HB-series Azure Virtual Machines achieve cloud supercomputing milestone

Posted on May 26, 2019

Principal Program Manager, Azure HPC

New HPC-targeted cloud virtual machines are first to scale to 10,000 cores

Azure HB-series Virtual Machines are the first on the public cloud to scale an MPI-based high-performance computing (HPC) job to 10,000 cores. This level of scaling has long been considered the realm of only the world’s most powerful and exclusive supercomputers, but it is now available to anyone using Azure.

HB-series virtual machines (VMs) are optimized for HPC applications requiring high memory bandwidth. For this class of workload, HB-series VMs are the most performant, scalable, and price-performant ever launched on Azure or elsewhere on the public cloud.

With AMD EPYC processors, the HB-series delivers more than 260 GB/s of memory bandwidth, 128 MB of L3 cache, and SR-IOV-based 100 Gb/s InfiniBand. At scale, a customer can use up to 18,000 physical CPU cores and more than 67 terabytes of memory for a single distributed-memory computational workload.

For memory-bandwidth-bound workloads, the HB-series delivers something many in HPC thought might never happen: Azure-based VMs are now as capable as, or more capable than, the bare-metal, on-premises status quo that dominates the HPC market, and at a highly competitive price point.

World-class HPC technology

HB-series VMs feature the cloud’s first deployment of AMD EPYC 7000-series CPUs explicitly for HPC customers. AMD EPYC features 33 percent more memory bandwidth than any x86 alternative, and even more than leading POWER and ARM server platforms. For context, the 263 GB/s of memory bandwidth the HB-series VM delivers is 80 percent more than competing cloud offerings in the same memory-per-core class.

HB-series VMs expose 60 non-hyperthreaded CPU cores and 240 GB of RAM, with a base clock of 2.0 GHz and an all-cores boost speed of 2.55 GHz. HB VMs also feature a 700 GB local NVMe SSD and support up to four Managed Disks, including the new Azure P60/P70/P80 Premium Disks.
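To put the "memory per core class" comparison in concrete terms, the short sketch below derives per-core ratios from the per-VM figures quoted in this post (60 cores, 240 GB of RAM, 263 GB/s of memory bandwidth), along with the VM count implied by the 18,000-core ceiling. The function names are illustrative only, and the aggregate memory figure is a nominal product of the quoted numbers, not a measurement of guest-visible memory.

```python
# Per-VM figures quoted in this post for the HB-series.
CORES_PER_VM = 60
RAM_GB_PER_VM = 240
MEM_BANDWIDTH_GB_PER_S = 263


def per_core_ratios(cores=CORES_PER_VM, ram_gb=RAM_GB_PER_VM,
                    bandwidth_gb_per_s=MEM_BANDWIDTH_GB_PER_S):
    """Return (GB of RAM per core, GB/s of memory bandwidth per core)."""
    return ram_gb / cores, bandwidth_gb_per_s / cores


def vms_for_cores(total_cores, cores_per_vm=CORES_PER_VM):
    """Number of whole VMs needed to supply total_cores (rounded up)."""
    return -(-total_cores // cores_per_vm)


if __name__ == "__main__":
    ram_per_core, bw_per_core = per_core_ratios()
    print(f"{ram_per_core:.1f} GB of RAM per core")
    print(f"{bw_per_core:.2f} GB/s of memory bandwidth per core")

    vms = vms_for_cores(18_000)
    nominal_ram_gb = vms * RAM_GB_PER_VM
    print(f"{vms} VMs for 18,000 cores, {nominal_ram_gb:,} GB nominal RAM")
```

The nominal total exceeds the "more than 67 terabytes" quoted earlier, which is consistent with that claim. The gap may reflect memory that is not exposed to the guest, but that is an assumption and not something this post states.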

A flagship feature of HB-series VMs is 100 Gb/s InfiniBand from Mellanox. HB-series VMs expose the Mellanox ConnectX-5 dedicated back-end NIC via SR-IOV, meaning customers can use the same OFED driver stack that they’re accustomed to in a bare-metal context. HB-series VMs deliver MPI latencies as low as 2.1 microseconds, with consistency, bandwidth, and message rates in line with bare-metal InfiniBand deployments.
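As a rough way to reason about what these two numbers mean for an application, the sketch below applies a simple first-order latency-plus-bandwidth model to the 100 Gb/s line rate and the 2.1 microsecond best-case latency quoted in this post. It is an idealized estimate under stated assumptions (full line rate, no protocol overhead, no contention), not a benchmark result, and the function names are illustrative.

```python
LATENCY_S = 2.1e-6        # best-case MPI latency quoted in this post
LINK_GBIT_PER_S = 100     # InfiniBand line rate, in gigabits per second


def link_bytes_per_second(gbit_per_s=LINK_GBIT_PER_S):
    """Convert a line rate in gigabits/s to bytes/s (8 bits per byte)."""
    return gbit_per_s * 1e9 / 8


def transfer_time_s(message_bytes, latency_s=LATENCY_S,
                    gbit_per_s=LINK_GBIT_PER_S):
    """Idealized time to move one message: fixed latency + size / bandwidth."""
    return latency_s + message_bytes / link_bytes_per_second(gbit_per_s)


def half_bandwidth_message_bytes(latency_s=LATENCY_S,
                                 gbit_per_s=LINK_GBIT_PER_S):
    """Message size at which latency and wire time contribute equally."""
    return latency_s * link_bytes_per_second(gbit_per_s)


if __name__ == "__main__":
    print(f"Line rate: {link_bytes_per_second() / 1e9:.1f} GB/s")
    for size in (8, 4 * 1024, 1024 * 1024):
        print(f"{size:>8} bytes: {transfer_time_s(size) * 1e6:.2f} microseconds")
    print(f"Break-even size: {half_bandwidth_message_bytes():.0f} bytes")
```

Under this model, messages well below the break-even size are dominated by latency and messages well above it by bandwidth, which is why both figures matter for tightly coupled MPI workloads.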

Cloud HPC scaling achievement

As part of early acceptance testing, the Azure HPC team benchmarked many widely used HPC applications. One common class of applications is computational fluid dynamics (CFD) simulation. To see how far HB-series VMs could scale, we selected the Le Mans 100 million cell model available to Star-CCM+ customers, with results as follows:

Graph of Siemens Star-CCM+ V.14.02 Le Mans 100M couple scaling - Speed up vs nodes


Graph of Siemens Star-CCM+ V.14.02 Le Mans 100M couple scaling - parallel efficiency vs nodes

Table showing number of hosts, cores, PPN, sample elapsed time, speed up node, and parallel efficiency

Table showing number of hosts, cores, PPN, sample elapsed time, speed up node, and parallel efficiency

The Le Mans 100 million cell model scaled to 256 VMs across multiple configurations, accounting for as many as 11,520 CPU cores. Our testing revealed that maximum scaling efficiency was achieved with two MPI ranks per NUMA domain, yielding a top-end scaling efficiency of 71.3 percent. For top-end performance, three MPI ranks per NUMA domain yielded the fastest overall time to solution. Customers can choose which metric they find most valuable based on a wide variety of factors.
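The relationship between ranks per NUMA domain, core counts, and scaling efficiency can be sketched as follows. Two things here are assumptions not stated in this post: the figure of 15 NUMA domains per VM is inferred from 11,520 cores across 256 VMs at three ranks per NUMA domain, and the efficiency formula is the conventional definition (speedup divided by the increase in node count), which may differ in detail from how the tables above were computed. The timings in the usage example are hypothetical placeholders, not benchmark results.

```python
# Inferred, not stated in the post: 11,520 cores / (256 VMs * 3 ranks per
# NUMA domain) gives 15 NUMA domains per VM.
NUMA_DOMAINS_PER_VM = 11_520 // (256 * 3)


def ranks_per_vm(ranks_per_numa, numa_domains=NUMA_DOMAINS_PER_VM):
    """MPI ranks (processes per node, PPN) placed on one VM."""
    return ranks_per_numa * numa_domains


def speedup(base_time_s, time_s):
    """How many times faster the larger run is than the baseline run."""
    return base_time_s / time_s


def parallel_efficiency(base_time_s, base_nodes, time_s, nodes):
    """Conventional definition: achieved speedup / ideal (linear) speedup."""
    return speedup(base_time_s, time_s) / (nodes / base_nodes)


if __name__ == "__main__":
    for rpn in (2, 3):
        ppn = ranks_per_vm(rpn)
        print(f"{rpn} ranks per NUMA domain: {ppn} ranks per VM, "
              f"{ppn * 256:,} cores on 256 VMs")

    # Hypothetical timings for illustration only.
    eff = parallel_efficiency(base_time_s=100.0, base_nodes=1,
                              time_s=12.5, nodes=16)
    print(f"Hypothetical run: {eff:.0%} parallel efficiency")
```

Using fewer ranks per NUMA domain leaves more memory bandwidth available to each rank, which is one plausible reason the two-rank layout scaled more efficiently while the three-rank layout, with more cores in play, finished fastest.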

Delighting HPC customers on Azure

The unique capabilities and cost-performance of HB-series VMs are a big win for scientists and engineers who depend on high-performance computing to drive their research and productivity to new heights. Organizations spanning aerospace, automotive, defense, financial services, heavy equipment, manufacturing, oil & gas, public sector academic, and government research have shared feedback on how the HB-series has increased HPC application performance and provided new insights through more detailed simulation models.

Rescale partners with Azure to provide HPC resources for computationally complex simulations and analytics. Launching today, Azure HB-series VMs can be consumed through Rescale’s ScaleX® as the new “Amber” compute resource.

“As the only fully managed HPC cloud service in the market, Rescale creates an elegant way to move on-premises HPC workloads to the cloud. We have been waiting with great anticipation for Microsoft to introduce cloud building blocks specifically engineered for HPC,” said Adam McKenzie, CTO of Rescale. “Now, new HB-series VMs on Azure enable MPI workloads to scale to tens of thousands of cores with the kind of cost-performance that rivals on-premises supercomputers.”


Available now

Azure HB-series VMs are currently available in South Central US and Western Europe, with additional regions rolling out soon.