
In brief: Intel has drummed up a rivalry between its new Gaudi2 accelerator and the now two-year-old market leader, the Nvidia A100. In two benchmarks suited to its area of focus, the new gaudily-named accelerator pulls ahead.
Gaudi2 is made for Intel by Habana Labs, an Israeli company that it acquired at the end of 2019 for $2 billion. Habana actually makes two kinds of specialized accelerators: some for training neural networks, like Gaudi2; and others for running (i.e., “inferencing”) them, such as Goya and Greco.
Performance
Habana and Intel launched Gaudi2 in May but waited until last week to add its benchmark scores to the public MLPerf database. In their graphs, they compare the scores of their Gaudi2 system against the public scores of A100-equipped systems from Nvidia and Dell.
ResNet-50 tests hardware’s ability to train an AI to classify images. Habana’s Gaudi2 system took just 18 minutes to train the AI well enough for it to pass the test, easily surpassing Nvidia’s A100 system, which needed almost half an hour.
Habana’s Gaudi2 system took just 17 minutes to train the BERT model, beating Nvidia’s A100 system’s time by about a minute. BERT is a natural language processing model, and in this test, it is trained on Wikipedia articles.
For both benchmarks, all the systems used eight accelerators/GPUs. Habana’s system paired them with two 40-core Intel Xeon 8380 CPUs, and Nvidia’s used two 64-core AMD Epyc 7742 CPUs.
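To put the two results side by side, the training times can be turned into rough speedup ratios. The sketch below is only a back-of-envelope illustration: the A100 times are not stated exactly in the text, so it assumes 30 minutes for ResNet-50 (an upper bound for "almost half an hour") and 18 minutes for BERT ("about a minute" slower than 17). The function and variable names are made up for this example.

```python
# Rough speedup ratios from the training times quoted in the article.
# The A100 figures are ASSUMPTIONS derived from the article's wording,
# not exact MLPerf results.

def speedup(baseline_minutes: float, contender_minutes: float) -> float:
    """Return how many times faster the contender is than the baseline."""
    if baseline_minutes <= 0 or contender_minutes <= 0:
        raise ValueError("training times must be positive")
    return baseline_minutes / contender_minutes

# Gaudi2 times are as quoted; A100 times are approximations (see above).
TRAINING_MINUTES = {
    "ResNet-50": {"gaudi2": 18.0, "a100_assumed": 30.0},  # "almost half an hour"
    "BERT": {"gaudi2": 17.0, "a100_assumed": 18.0},       # "about a minute" slower
}

for benchmark, times in TRAINING_MINUTES.items():
    ratio = speedup(times["a100_assumed"], times["gaudi2"])
    print(f"{benchmark}: Gaudi2 is roughly {ratio:.2f}x faster (approximate)")
```

Under these assumptions the ResNet-50 gap is large while the BERT gap is only a few percent, which matches the article's description of the two results.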
Specifications
Gaudi2 features 24 TPCs (tensor processor cores) and two MMEs (matrix multiplication engines) that run partially in parallel. It supports a broad array of data types, including FP32, TF32, BF16, FP16, and FP8. It also has a dedicated media engine for processing audio and visual media as inputs.
For memory, Gaudi2 has six 16 GB stacks of HBM2e that total 96 GB and 2.45 TB/s of total memory bandwidth. On-chip, it has a 48 MB cache. For connectivity, it uses an x16 PCIe 4.0 connection and has 24 100 Gbps RoCE v2 (RDMA over Converged Ethernet version 2) ports.
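The memory and PCIe figures above can be sanity-checked with some simple arithmetic. The sketch below is illustrative only: the HBM numbers come from the text, while the PCIe 4.0 per-lane rate (16 GT/s) and 128b/130b encoding are general properties of that interface generation, not figures from Habana. It also assumes bandwidth is spread evenly across the six stacks, and all names are invented for this example.

```python
# Back-of-envelope checks on the Gaudi2 memory and PCIe figures.

HBM_STACKS = 6
GB_PER_STACK = 16
TOTAL_HBM_BANDWIDTH_TB_S = 2.45  # as quoted in the article

# General PCIe 4.0 characteristics (not Gaudi2-specific):
PCIE4_GT_PER_LANE = 16.0      # giga-transfers per second, per lane
PCIE4_ENCODING = 128 / 130    # 128b/130b line-encoding efficiency
PCIE_LANES = 16

def hbm_capacity_gb(stacks: int = HBM_STACKS, gb_per_stack: int = GB_PER_STACK) -> int:
    """Total HBM capacity in GB."""
    return stacks * gb_per_stack

def hbm_bandwidth_per_stack_gb_s(
    total_tb_s: float = TOTAL_HBM_BANDWIDTH_TB_S, stacks: int = HBM_STACKS
) -> float:
    """Average bandwidth per stack in GB/s, assuming an even split."""
    return total_tb_s * 1000 / stacks

def pcie_bandwidth_gb_s(
    lanes: int = PCIE_LANES,
    gt_per_lane: float = PCIE4_GT_PER_LANE,
    encoding: float = PCIE4_ENCODING,
) -> float:
    """Theoretical one-direction PCIe bandwidth in GB/s (before protocol overhead)."""
    # GT/s times encoding efficiency gives usable Gb/s per lane; divide by 8 for GB/s.
    return lanes * gt_per_lane * encoding / 8

print(f"HBM capacity: {hbm_capacity_gb()} GB")
print(f"HBM bandwidth per stack: {hbm_bandwidth_per_stack_gb_s():.1f} GB/s")
print(f"PCIe 4.0 x16, one direction: {pcie_bandwidth_gb_s():.1f} GB/s")
```

The comparison shows why on-package memory matters for training: the host PCIe link offers on the order of tens of GB/s, while the HBM delivers thousands.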
Competition
Habana has clearly created a real A100 competitor for Intel. Its timing could be better, given that Nvidia announced the H100 three months ago, but the two are such different products that even though they may compete in benchmarks, they won’t really be competing for motherboard slots.
Whereas the A100 and H100 are versatile behemoths, Gaudi2 is a streamlined accelerator trying to do something different, and it will be interesting to see whether it succeeds.