Azure announces general availability of scale-out NVIDIA A100 GPU Clusters: the fastest public cloud supercomputer

Today, Azure announces the general availability of the Azure ND A100 v4 Cloud GPU instances—powered by NVIDIA A100 Tensor Core GPUs—achieving leadership-class supercomputing scalability in a public cloud. For demanding customers chasing the next frontier of AI and high-performance computing (HPC), scalability is the key to unlocking improved total cost of ownership and time-to-solution.

Simply put, ND A100 v4—powered by NVIDIA A100 GPUs—is designed to let our most demanding customers scale up and scale out without slowing down.

Benchmarking with 164 ND A100 v4 virtual machines on a pre-release public supercomputing cluster yielded a High-Performance Linpack (HPL) result of 16.59 petaflops. This HPL result, delivered on public cloud infrastructure, would fall within the top 20 of the November 2020 TOP500 list of the fastest supercomputers in the world, or within the top 10 in Europe, based on the region where the job was run.

Measured via HPL-AI, a High-Performance Linpack variant focused on artificial intelligence (AI) and machine learning (ML), the same 164-VM pool achieved a result of 142.8 petaflops, placing it among the world’s top 5 fastest known AI supercomputers as measured by the official HPL-AI benchmark list. These HPL results, utilizing only a fraction of a single public Azure cluster, rank with the most powerful dedicated, on-premises supercomputing resources in the world.
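To put the two benchmark figures in per-GPU terms, the short Python sketch below divides each result by the number of GPUs in the 164-VM pool. It assumes every VM contributed all eight of its A100 GPUs to the run. The resulting values are simple averages derived from the numbers quoted above, not separately measured per-GPU results, and the function and variable names are illustrative only.

```python
# Figures quoted in this post.
VM_COUNT = 164
GPUS_PER_VM = 8            # assumption: all eight GPUs per VM took part in the run
HPL_PETAFLOPS = 16.59
HPL_AI_PETAFLOPS = 142.8


def average_tflops_per_gpu(total_petaflops: float,
                           vm_count: int = VM_COUNT,
                           gpus_per_vm: int = GPUS_PER_VM) -> float:
    """Average teraflops per GPU for a cluster-wide petaflops result."""
    total_gpus = vm_count * gpus_per_vm
    return total_petaflops * 1000 / total_gpus


total_gpus = VM_COUNT * GPUS_PER_VM
hpl_per_gpu = average_tflops_per_gpu(HPL_PETAFLOPS)
hpl_ai_per_gpu = average_tflops_per_gpu(HPL_AI_PETAFLOPS)
hpl_ai_over_hpl = HPL_AI_PETAFLOPS / HPL_PETAFLOPS

print(f"GPUs in the pool:      {total_gpus}")
print(f"HPL per GPU (avg):     {hpl_per_gpu:.2f} TFLOPS")
print(f"HPL-AI per GPU (avg):  {hpl_ai_per_gpu:.2f} TFLOPS")
print(f"HPL-AI / HPL ratio:    {hpl_ai_over_hpl:.2f}x")
```

Under that assumption, the pool holds 1,312 GPUs, and the mixed-precision HPL-AI result is roughly 8.6 times the double-precision HPL result.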

And today, as ND A100 v4 goes to general availability, we’re announcing the immediate availability of the world’s fastest public cloud supercomputers on-demand, near you, through four Azure regions: East United States, West United States 2, West Europe, and South Central United States.

The ND A100 v4 VM series starts with a single virtual machine (VM) and eight NVIDIA Ampere architecture-based A100 Tensor Core GPUs, and can scale up to thousands of GPUs in a single cluster. Each VM has an unprecedented 1.6 Tb/s of interconnect bandwidth, delivered via eight NVIDIA HDR 200 Gb/s InfiniBand links: one for each individual GPU. Additionally, every 8-GPU VM features a full complement of third-generation NVIDIA NVLink, enabling GPU-to-GPU connectivity within the VM in excess of 600 gigabytes per second.
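The 1.6 Tb/s per-VM figure follows directly from the link count and link speed. The Python sketch below reproduces that arithmetic and converts it to gigabytes per second for easier comparison with the NVLink figure. It uses raw line rates only and ignores protocol overhead, which is an assumption of this sketch; the helper names are hypothetical.

```python
GPUS_PER_VM = 8
IB_LINK_GBITS_PER_S = 200   # one HDR InfiniBand link per GPU, in gigabits per second


def vm_interconnect_gbits_per_s(links: int = GPUS_PER_VM,
                                link_gbits_per_s: int = IB_LINK_GBITS_PER_S) -> int:
    """Aggregate InfiniBand line rate of one VM, in gigabits per second."""
    return links * link_gbits_per_s


def gbits_to_gbytes(gbits_per_s: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbits_per_s / 8


per_vm_gbits = vm_interconnect_gbits_per_s()
per_vm_tbits = per_vm_gbits / 1000
per_vm_gbytes = gbits_to_gbytes(per_vm_gbits)
per_gpu_gbytes = gbits_to_gbytes(IB_LINK_GBITS_PER_S)

print(f"Per VM:  {per_vm_gbits} Gb/s = {per_vm_tbits} Tb/s = {per_vm_gbytes} GB/s")
print(f"Per GPU: {IB_LINK_GBITS_PER_S} Gb/s = {per_gpu_gbytes} GB/s")
```

At line rate, each GPU has about 25 GB/s of InfiniBand bandwidth to other VMs, compared with more than 600 GB/s of NVLink bandwidth to the GPUs inside the same VM.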

ND A100 v4 is built to take advantage of de facto industry-standard HPC and AI tools and libraries. Customers can use its GPUs and unique interconnect capabilities without any special software or frameworks, relying on the same NVIDIA NCCL2 libraries that most scalable GPU-accelerated AI and HPC workloads support out of the box, without any concern for underlying network topology or placement. Provisioning VMs within the same VM Scale Set automatically configures the interconnect fabric.

Anyone can bring demanding on-premises AI and HPC workloads to the cloud via ND A100 v4 with minimal fuss, but for customers who prefer an Azure-native approach, Azure Machine Learning provides a tuned virtual machine (pre-installed with the required drivers and libraries) and container-based environments optimized for the ND A100 v4 family. Sample recipes and Jupyter Notebooks help users get started quickly with multiple frameworks, including PyTorch and TensorFlow, and with training state-of-the-art models like BERT. With Azure Machine Learning, customers have access to the same tools and capabilities in Azure as our AI engineering teams.

Each NVIDIA A100 GPU offers 1.7 to 3.2 times the performance of prior V100 GPUs out of the box, and up to 20 times the performance for specific workloads when layering new architectural features like mixed-precision modes, sparsity, and Multi-Instance GPU (MIG). And at the heart of each VM is an all-new 2nd Generation AMD EPYC platform, featuring PCI Express Gen 4.0 for CPU-to-GPU transfers twice as fast as prior generations.

We can’t wait to see what you’ll build, analyze, and discover with the new Azure ND A100 v4 platform.

| Size | Physical CPU Cores | Host Memory | GPUs | Local NVMe Temporary Disk | NVIDIA InfiniBand Network | Azure Network |
|---|---|---|---|---|---|---|
| Standard_ND96asr_v4 | 96 | 900 GB | 8 x 40 GB NVIDIA A100 | 6,500 GB | 8 x 200 Gbps | 40 Gbps |
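For capacity planning, the per-VM specifications above can be multiplied out to cluster totals. The Python sketch below is one way to do that; the `VmSize` class and `cluster_totals` helper are hypothetical names introduced for illustration, and the 164-VM count is simply the benchmark pool size mentioned earlier in this post, not a limit of the service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmSize:
    """Per-VM resources, as listed in the specification table."""
    name: str
    physical_cpu_cores: int
    host_memory_gb: int
    gpus: int
    gpu_memory_gb: int      # memory per GPU
    local_nvme_gb: int


ND96ASR_V4 = VmSize(
    name="Standard_ND96asr_v4",
    physical_cpu_cores=96,
    host_memory_gb=900,
    gpus=8,
    gpu_memory_gb=40,
    local_nvme_gb=6500,
)


def cluster_totals(size: VmSize, vm_count: int) -> dict:
    """Multiply per-VM resources by the number of VMs in a cluster."""
    return {
        "gpus": size.gpus * vm_count,
        "gpu_memory_gb": size.gpus * size.gpu_memory_gb * vm_count,
        "host_memory_gb": size.host_memory_gb * vm_count,
        "physical_cpu_cores": size.physical_cpu_cores * vm_count,
    }


gpu_memory_per_vm_gb = ND96ASR_V4.gpus * ND96ASR_V4.gpu_memory_gb
cores_per_gpu = ND96ASR_V4.physical_cpu_cores // ND96ASR_V4.gpus
totals = cluster_totals(ND96ASR_V4, vm_count=164)

print(f"GPU memory per VM: {gpu_memory_per_vm_gb} GB")
print(f"CPU cores per GPU: {cores_per_gpu}")
print(totals)
```

Each VM carries 320 GB of GPU memory in total and 12 physical CPU cores per GPU.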

Learn more