Availability of Linux RDMA on Microsoft Azure

By Tejas Karmarkar, Principal Program Manager, Azure Compute

Posted on July 9, 2015

We are excited to announce the availability of Linux RDMA on Microsoft Azure. This release marks a new milestone in our cloud journey and in our vision to make HPC and Big Compute more accessible and cost-effective for a broader set of users. Linux RDMA makes high-speed, low-latency networking accessible to the engineering and scientific community across the globe, helping them solve complex problems with the applications they use today.

Remote Direct Memory Access, or RDMA, is a technology that provides a low-latency network connection between processes running on two servers, or on two virtual machines in Azure. This technology is essential for engineering simulations and other compute applications that are too large to fit in the memory of a single machine. The A8 and A9 VM sizes in Azure use the InfiniBand network to provide RDMA virtualized through Hyper-V, with near “bare metal” performance of less than 3 microseconds of latency and greater than 3.5 Gbps of bandwidth.
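To get a feel for what these figures mean for an application, a common first-order estimate is the latency-plus-bandwidth cost model: the time to send a message is roughly the fixed latency plus the message size divided by the bandwidth. The sketch below is only an illustration under that assumption. The function name and the message sizes are hypothetical, and the defaults use the 3 microsecond and 3.5 Gbps figures quoted above, which are bounds rather than measured values.

```python
def transfer_time_us(message_bytes: int,
                     latency_us: float = 3.0,
                     bandwidth_gbps: float = 3.5) -> float:
    """Estimate one-way transfer time in microseconds.

    Simple cost model (an assumption, not a measurement):
        time = latency + message_size / bandwidth
    """
    if message_bytes < 0:
        raise ValueError("message_bytes must be non-negative")
    bits = message_bytes * 8
    # 1 Gbps = 1e9 bits per second = 1e3 bits per microsecond.
    bits_per_us = bandwidth_gbps * 1e3
    return latency_us + bits / bits_per_us


if __name__ == "__main__":
    # Hypothetical message sizes: a single double, 1 KiB, and 1 MiB.
    for size in (8, 1024, 1024 * 1024):
        print(f"{size:>8} bytes: {transfer_time_us(size):.2f} us")
```

Under this model, small messages are dominated by latency and large messages by bandwidth, which is why tightly coupled simulations that exchange many small messages benefit most from a low-latency interconnect.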

The current release of Azure Linux RDMA supports SUSE Linux Enterprise Server 12 (SLES 12). We will continue to work with other Linux distributions and will have more to say about other supported distributions in the near future. A SLES 12 image with fully integrated RDMA drivers, specifically tuned for HPC workloads, is available now in the Azure Marketplace.

“As a leader in HPC, SUSE is excited to help enable compute-intensive workloads to be published in the Microsoft Azure cloud,” said Pete Chadwick, senior product manager of cloud solutions at SUSE. “With full enterprise support and managed images from SUSE and Microsoft, this offering is the latest in a line of customer solutions for the enterprise and hybrid cloud engineered by Microsoft and SUSE since 2006.”

We partnered with Intel and our early adopter customers to test a range of commercial and open source applications using the Intel MPI Library for performance and scalability. The Message Passing Interface (MPI) is a standard that facilitates communication between compute nodes, so simulations can be larger and complete faster by using cores across many machines. We’re continuing to test with other popular MPI stacks and will share best practices in future blog posts.
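As a rough illustration of how work is spread across machines, MPI programs typically give each process (a "rank") its own contiguous slice of the problem and exchange only boundary data over the network. The sketch below shows one common way to compute such slices. It is plain Python and makes no MPI calls; the function name and the problem size are hypothetical and are not taken from any of the applications mentioned in this post.

```python
def block_range(n_items: int, n_ranks: int, rank: int) -> tuple[int, int]:
    """Return the half-open range [start, stop) of items owned by `rank`.

    Items are split as evenly as possible; the first `n_items % n_ranks`
    ranks each receive one extra item.
    """
    if n_ranks <= 0 or not 0 <= rank < n_ranks:
        raise ValueError("rank must satisfy 0 <= rank < n_ranks")
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


if __name__ == "__main__":
    # Hypothetical example: 1,000,003 mesh elements on 32 ranks
    # (for instance, two 16-core virtual machines).
    n_items, n_ranks = 1_000_003, 32
    for rank in (0, 1, n_ranks - 1):
        start, stop = block_range(n_items, n_ranks, rank)
        print(f"rank {rank:2d}: elements [{start}, {stop}) -> {stop - start} items")
```

In a real MPI application, each rank would obtain its rank number and the total rank count from the MPI runtime rather than from a loop.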

Achieving More on Azure

The true HPC capabilities that RDMA enables for Linux and Windows applications give our customers greater agility and help partners deliver new services without large up-front capital investment.

Software-as-a-Service

Engineers and scientists want to focus on developing better solutions and making better decisions. Managing infrastructure gets in the way of the fast, project-based reality of business today. Software-as-a-Service (SaaS) providers can package together workflows and best practices to help their customers explore new ideas and work more efficiently.

d3View has built an end-to-end SaaS offering for engineering simulation and data management on the Azure platform. Azure provides d3View’s customers with on-demand HPC clusters to run large multi-core and multi-node simulations through a rich web interface.

“Microsoft understands big-compute needs for the simulation industry and provides great Azure support for HPC-centric applications. We look forward to using Azure Linux RDMA to help customers use large-scale cloud computing for LS-DYNA® simulation-based product development.” Suri Bala, Founder and Chief Executive Officer, d3VIEW & Scientist at LSTC

[Image: d3View screenshot]

The premier engineering simulation application supported by d3View is LS-DYNA, from Livermore Software Technology Corporation (LSTC). LS-DYNA is a general-purpose finite element program capable of simulating complex real-world problems. It is used by the automobile, aerospace, construction, military, manufacturing, and bioengineering industries. The code’s origins lie in highly nonlinear, transient dynamic finite element analysis using explicit time integration.

LSTC and Microsoft worked closely during the private preview, and we see great scaling of LS-DYNA on Azure with both Linux and Windows RDMA. Below is a chart showing the scaling of a model on Linux RDMA from two to sixteen 16-core A9 virtual machines. This scaling makes it possible to run many more simulations, testing different parameters.

[Chart: LS-DYNA scaling on Linux RDMA, from two to sixteen A9 virtual machines]
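Scaling results of this kind are usually summarized as speedup and parallel efficiency relative to the smallest run. The sketch below shows that arithmetic. The function name is hypothetical, and the timings in the example are placeholder values chosen only to demonstrate the calculation; they are not the LS-DYNA measurements from the chart.

```python
def scaling_table(runs: dict[int, float]) -> list[tuple[int, float, float]]:
    """Compute speedup and efficiency relative to the smallest run.

    `runs` maps a VM count to the elapsed time of that run (any time unit).
    Returns rows of (vm_count, speedup, efficiency), sorted by VM count, where
    speedup = baseline_time / time and efficiency = speedup / ideal_speedup.
    """
    if not runs:
        raise ValueError("runs must not be empty")
    base_vms = min(runs)
    base_time = runs[base_vms]
    rows = []
    for vms in sorted(runs):
        speedup = base_time / runs[vms]
        ideal = vms / base_vms  # perfect scaling from the baseline
        rows.append((vms, speedup, speedup / ideal))
    return rows


if __name__ == "__main__":
    # Placeholder timings in seconds (NOT measured data).
    example_runs = {2: 800.0, 4: 420.0, 8: 230.0, 16: 125.0}
    for vms, speedup, efficiency in scaling_table(example_runs):
        print(f"{vms:2d} VMs: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

An efficiency close to 100% at the largest VM count indicates that communication overhead stays small as the job is spread across more machines.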

Marketplace

Another approach to enabling self-service applications for engineers and scientists is through the Azure Marketplace and easy-to-use tools. ISVs can offer their service through the marketplace for broader reach.

UberCloud provides a one-stop shop for on-demand access to computing resources and bundled application services. Their community can help users discover solutions and get started, or provide advanced consulting services when needed. Azure’s HPC capabilities give these solutions scale and performance.

“At UberCloud we work with engineers and scientists who require dense compute power to be innovative and productive. They use highly sophisticated math models to simulate anything from car bumpers to turbo machines to personalized medicines and time to market is driven by how fast they can crunch numbers. However, technical computing clusters not only require a significant budget to procure, but also are challenging to operate. Therefore, access to dense compute power has traditionally been a luxury; available only to a lucky few while the majority is limited to desktop grade computing equipment. With Azure Linux RDMA, Microsoft is bringing together the self-service, pay-per-use characteristics of the Cloud and high performance together. We believe this is a powerful combination and we are excited to be experimenting with engineering and scientific applications on Azure” Burak Yenier Co-founder and CEO TheUberCloud

A virtual machine image with OpenFOAM pre-installed and MPI pre-configured is available now in the Azure Marketplace.

Enterprise and Hybrid Cloud

Even customers with their own HPC infrastructure need additional capacity. There are two common deployment models. For some projects, customers will deploy a cluster entirely in the cloud. This provides full control over the applications and environment in an isolated deployment.

Azure Resource Manager and its templates provide a great way to automate this kind of deployment. We recently published a template for deploying a SLURM cluster, and we will be adding templates for additional job schedulers.

Microsoft HPC Pack takes this a step further by integrating Azure IaaS VM deployment scripts or templates with a rich job scheduler and cluster management tool. We’re excited that HPC Pack now also supports Linux VMs in Azure, providing customers with an additional option.

Other customers want to extend their on-premises clusters with dynamic compute resources in the cloud. The VMs in Azure can be part of the enterprise network, securely connected over a VPN or ExpressRoute connection and not traversing the public Internet.

Altair Engineering and Microsoft have been working together to help customers using PBS Works and the PBS Professional job scheduler take advantage of Microsoft Azure for their Linux and Windows HPC workloads. Many of Altair’s products for computer-aided engineering (CAE) simulations have also been tested and work well in Azure.

“We’re pleased that Microsoft has chosen Altair’s PBS Works as their preferred workload management suite for the manufacturing industry on Azure. By working together, Altair and Microsoft now provide a turnkey solution that allows our mutual clients to access cloud high-performance computing resources from any web-enabled device.” Sam Mahalingam, Altair CTO

Rescale’s ScaleX™ Enterprise is the enterprise deployment of the company’s industry-leading cloud simulation and HPC platform. ScaleX Enterprise features a unified enterprise simulation platform and a powerful administrative portal, along with direct integration and management of on-premises and cloud HPC resources, schedulers, and software licenses.

“We are excited to add to our global HPC infrastructure network by partnering with Microsoft Azure, further ensuring that Rescale’s customers always have access to the latest in leading cloud computing hardware technologies. The release of Linux RDMA provides a compelling new infrastructure solution for customers running simulation software requiring low latency and high interconnect speeds. Powerful simulation software on top of an agile, responsive IT infrastructure layer, such as Microsoft Azure, is critical to our enterprise customers across aerospace, automotive, energy, and life sciences verticals. This seamless combination enables Fortune 500 leaders to accelerate product development and drive innovation.” Joris Poort, CEO Rescale

Get Started on Azure

Learn more about Linux RDMA on Azure A8 and A9 Virtual Machines. The Azure Big Compute team is committed to helping customers run the applications and tools they need. We are excited to support both Linux and Windows with RDMA, and we welcome all the partners that work with us to help customers achieve more with the compute power available in the cloud. We’d love to hear your stories and your requests. Feel free to send us an email to share your thoughts.
