Azure delivers strong MLPerf inferencing v2.0 results from 1 to 8 GPUs

This blog has been co-authored by Jon Shelley, Principal PM Manager, Azure Compute.

Microsoft Azure is committed to providing its customers with industry-leading real-world AI capabilities. In December 2021, Azure demonstrated leadership performance in the MLPerf training v1.1 results, debuting at number one among cloud providers and number two overall at scale among all submitters. The same building blocks that make up Azure’s supercomputer were used to generate our v2.0 submissions for the MLPerf inferencing results published on April 6, 2022.

These industry-leading results are driven by Microsoft’s publicly available supercomputing capabilities designed for real-world AI inferencing workloads. Microsoft enables customers of all scales to deploy powerful AI solutions, whether at a focused local scale or at the scale of the largest supercomputers in the world.

Microsoft Azure’s publicly available AI inferencing capabilities are led by the NDm A100 v4, ND A100 v4, and NC A100 v4 virtual machines (VMs), which are powered by NVIDIA A100 SXM and PCIe Tensor Core graphics processing units (GPUs). These results showcase Azure’s commitment to making AI inferencing as accessible as possible to everyone, while raising the bar for AI inferencing in Azure.

In our quest to continually provide the best technology for our customers, Azure has recently announced the preview of the NC A100 v4 series. The series gives our customers three VM sizes, ranging from one to four GPUs. In our benchmarking, we have seen more than twice the performance of the previous generation. Azure’s customers can get access to these new systems today by signing up for the preview program.

Some highlights for this round of MLPerf inferencing submissions can be seen in the following tables.

Highlights from the results

ND96amsr A100 v4 powered by NVIDIA A100 80G SXM Tensor Core GPU

Benchmark	Samples/second	Queries/second	Scenarios
bert-99	27,500 plus	~22,500 plus	Offline and server
resnet	300,000 plus	~200,000 plus	Offline and server
3d-unet	24.87		Offline

NC96ads A100 v4 powered by NVIDIA A100 80G PCIe Tensor Core GPU

Benchmark	Samples/second	Queries/second	Scenarios
bert-99	~6,300	~5,300	Offline and server
resnet	144,000	~119,600	Offline and server
3d-unet	11.7		Offline

The tables above show three of the six benchmarks the team ran on NVIDIA A100 SXM and PCIe Tensor Core GPUs in the offline and server scenarios. Take a look at the full list of results for the various divisions.
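To make the tables easier to compare, here is a minimal Python sketch that derives two rough figures from them: the ratio of server to offline throughput, and offline throughput per GPU. The throughput numbers are copied from the tables, with "plus" and "~" values taken at face value, so every derived figure is approximate. The GPU counts per VM (eight for ND96amsr A100 v4, four for NC96ads A100 v4) are our assumption for illustration and are not stated in the tables; the function and variable names are likewise hypothetical.

```python
# Figures copied from the two tables above. "plus" and "~" values are treated
# as the stated number, so every derived value is only approximate.
RESULTS = {
    "ND96amsr A100 v4": {
        "bert-99": {"offline": 27_500, "server": 22_500},
        "resnet": {"offline": 300_000, "server": 200_000},
        "3d-unet": {"offline": 24.87, "server": None},  # offline only
    },
    "NC96ads A100 v4": {
        "bert-99": {"offline": 6_300, "server": 5_300},
        "resnet": {"offline": 144_000, "server": 119_600},
        "3d-unet": {"offline": 11.7, "server": None},  # offline only
    },
}

# ASSUMPTION (not stated in the tables): number of GPUs in each VM size.
ASSUMED_GPUS = {"ND96amsr A100 v4": 8, "NC96ads A100 v4": 4}


def server_to_offline_ratio(vm, benchmark):
    """Return server queries/second divided by offline samples/second.

    Returns None when the table lists no server result for the benchmark.
    """
    row = RESULTS[vm][benchmark]
    if row["server"] is None:
        return None
    return row["server"] / row["offline"]


def offline_per_gpu(vm, benchmark):
    """Return offline samples/second divided by the assumed GPU count."""
    return RESULTS[vm][benchmark]["offline"] / ASSUMED_GPUS[vm]


if __name__ == "__main__":
    for vm, benchmarks in RESULTS.items():
        for name in benchmarks:
            ratio = server_to_offline_ratio(vm, name)
            ratio_text = "n/a" if ratio is None else f"{ratio:.2f}"
            print(
                f"{vm:18s} {name:8s} "
                f"offline per GPU: {offline_per_gpu(vm, name):>10,.2f}  "
                f"server/offline: {ratio_text}"
            )
```

Because the published figures are rounded or given as lower bounds, these derived values are best read as a rough orientation rather than as additional benchmark results.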

Azure works closely with NVIDIA

The results were generated by deploying the environment using the VM offerings and Azure’s Ubuntu 18.04-HPC marketplace image. We worked closely with NVIDIA to quickly deploy the environment and perform benchmarks with industry-leading results in performance and scalability.

These results are a testament to Azure’s focus on offering scalable supercomputing for any workload while enabling our customers to utilize “on-demand” supercomputing capabilities in the cloud to solve their most complex problems. Visit the Azure Tech Community blog to read the steps to reproduce the results.

More about MLPerf

MLPerf is a consortium of AI leaders from academia, research labs, and industry whose mission is to “build fair and useful benchmarks” that provide unbiased evaluations of training and inference performance for hardware, software, and services, all conducted under prescribed conditions. To stay on the cutting edge of industry trends, MLPerf continues to evolve, holding new tests at regular intervals and adding new workloads that represent state-of-the-art AI. MLPerf’s tests are transparent and objective, so users can rely on the results to make informed buying decisions. The industry benchmarking group, formed in May 2018, is backed by dozens of industry leaders. Its inferencing benchmarks are increasingly becoming the key tests that hardware and software vendors use to demonstrate performance. Take a look at the full list of results for MLPerf Inference v2.0.
