Microsoft at Supercomputing: Yes, we support Linux

Compute-intensive modeling, simulation, and analysis are used across industries to make new discoveries and draw insights that transform organizations. Microsoft has been investing in HPC and exhibiting at the Supercomputing conference since 2004. As our customers bring their workloads to the cloud to take advantage of on-demand scale, without the overhead of managing infrastructure, they ask if Microsoft can support them with Linux. The answer is yes. On Microsoft Azure, we are committed to supporting the operating systems, job schedulers, applications, tools, databases, and devices you use.

Linux is part of our everyday work in the Azure Big Compute team. We are building an enterprise-ready, intelligent cloud with True HPC capabilities. We work with partners across the industry to deliver world-class performance, scalability, and support for the applications and tools you run. And we are pushing the state of the art in modern, cloud-native applications for greater agility and lower operating cost.

We’ll be showcasing the work from our team and partners at SC15 in Austin this week, and we invite you to see what’s new on Azure for Windows and Linux HPC.

Enterprise-ready intelligent cloud

Microsoft is the only public cloud provider investing deeply to provide True HPC capabilities for customers. True HPC to us means performance, scalability, and features like low-latency RDMA networking and GPU accelerators delivered with cloud economics. What’s exciting about working on HPC in Azure is that these specialized capabilities are available to more customers who previously didn’t have access to them. We believe in expanding the reach of HPC to any researcher, engineer, scientist, or tinkerer.

HPC application scalability

True HPC applications rely on low-latency interconnects like InfiniBand for scalability. Azure provides compute-optimized virtual machines with InfiniBand for remote direct memory access (RDMA) communication used by message passing interface (MPI) and other highly scalable applications. The RDMA network in Azure delivers near bare-metal performance for both Windows and Linux MPI applications.

We’ve worked closely with Intel to tune Intel MPI for Azure and SUSE Linux. This work has been noticed by customers like Altair and NASA, who are now taking advantage of these capabilities for their applications. We are partnering with Altair to provide their PBS Works Suite and applications for customers on Azure.

“Altair and the Microsoft Azure team started working together in early 2015 to provide an end-to-end seamless engineering simulation workflow in the Azure cloud. The solution spans industrial design, non-linear dynamic simulation, topology optimization, and engineering data analytics. While there is still a lot of work to do together, we are excited by the progress we have made to provide this experience for our mutual customers. Our partnership with Microsoft aligns very well with Altair’s vision to broaden the use of simulation technology to foster product innovations and optimize designs, processes, and decisions to improve our customers' business performance. As a result of this partnership, we are excited to announce availability of Altair RADIOSS on Microsoft Azure.” – Sam Mahalingam, CTO HPC/Cloud Solutions, Altair.

GPU for compute and visualization

GPUs are an important part of the HPC toolkit, and we recently announced that this capability is coming to Azure with N-Series virtual machines. The new virtual machine family supports both compute and high-end visualization scenarios and technologies such as OpenGL and CUDA. Our strong partnership with NVIDIA will make GPUs in Azure more accessible and cost-effective for a broader set of users, and will enable new workloads and applications to run in Azure.

“Our vision is to deliver accelerated graphics and high performance computing to any connected device, regardless of location,” said Jen-Hsun Huang, co-founder and CEO of NVIDIA. “We are excited to collaborate with Microsoft Azure to give engineers, designers, content creators, researchers and other professionals the ability to visualize complex, data-intensive designs accurately from anywhere.”

Flexibility and choice

Microsoft Azure supports the operating systems, job schedulers, applications, databases, tools and devices that you run. Azure Resource Manager (ARM) is the glue to configure and orchestrate deployments. We have developed ARM templates for deploying HPC clusters, and feature a range of partner solutions in the Azure Marketplace.

Easy Linux cluster deployment

Supporting Linux is just the start; we want to make it as simple and efficient as possible to create and scale clusters on demand. In addition to ARM templates, Azure Virtual Machine Scale Sets can be used to deploy and manage a collection of virtual machines as a set, enabling you to scale out and in easily and rapidly.

We’ve written a set of templates to help you get started deploying clusters directly from the Azure portal by simply providing configuration parameters. Cluster templates include Torque and SLURM, as well as Windows HPC Pack with Windows compute nodes or Linux compute nodes. All the ARM quickstart templates are available here.
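To make "providing configuration parameters" concrete, here is a minimal Python sketch that wraps a few values in the layout an ARM deployment parameters file uses, where each parameter is an object with a "value" key. The parameter names (adminUsername, vmSize, workerNodeCount) and the helper name are hypothetical, chosen only for illustration; each cluster template defines its own parameter names, so check the template you deploy.

```python
import json


def build_parameters_file(values):
    """Wrap plain name/value pairs in the ARM deployment-parameters layout."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
        "contentVersion": "1.0.0.0",
        # ARM expects each parameter as {"value": ...}, not as a bare value.
        "parameters": {name: {"value": value} for name, value in values.items()},
    }


# Hypothetical parameter names; a real template declares its own.
cluster_values = {
    "adminUsername": "hpcadmin",
    "vmSize": "Standard_A9",
    "workerNodeCount": 4,
}

parameters_file = build_parameters_file(cluster_values)
print(json.dumps(parameters_file, indent=2))
```

The resulting JSON can be saved alongside a template and supplied at deployment time, or the same values can be typed into the portal form that the template generates.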

HPC Pack with Linux

HPC Pack is Microsoft’s cluster management and job scheduling tool. The newest release, HPC Pack 2012 R2 Update 3 with Microsoft MPI v7, supports scheduling to Windows and Linux compute nodes, both on-premises or in a private cloud and in Microsoft Azure. HPC Pack provides a rich admin interface to manage the cluster, and a full-featured job scheduler with batch and interactive policies. HPC Pack is widely used by financial services customers and ISVs.

This latest update of HPC Pack includes support for scheduling GPUs as a resource on compute nodes, new policies for SOA workloads, customizable idle detection for cycle harvesting, an unlimited number of parametric sweep tasks in a job, and other updates. HPC Pack also now supports “bursting” to Azure Batch, using Batch to provide resource management and task dispatching as a service for greater scalability in Azure.
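For readers new to the term, a parametric sweep runs the same command many times, varying only an index. The short Python sketch below shows the idea by expanding one command template into a list of task command lines. It is an illustration only, not HPC Pack's API: the function name, the use of `*` as the placeholder, and the `render.exe` command are assumptions made for this example.

```python
def expand_sweep(command_template, start, end, increment=1, placeholder="*"):
    """Return one command line per sweep index, from start to end inclusive."""
    if increment <= 0:
        raise ValueError("increment must be positive")
    return [
        command_template.replace(placeholder, str(index))
        for index in range(start, end + 1, increment)
    ]


# Hypothetical command: process every fourth input file from 1 to 9.
tasks = expand_sweep("render.exe frame_*.dat", start=1, end=9, increment=4)
for task in tasks:
    print(task)
```

Because each generated command is independent of the others, a scheduler is free to run them in any order and on any available node.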

HPC tools

Alongside your applications comes data. Data is a vital part of HPC workloads such as engineering simulations, genomic analysis, rendering, and risk analysis. We’ve been working with solution providers in this space to make it easier to use and manipulate your data in the cloud. We’re excited by many new HPC solutions from partners being made available on Azure as ARM templates and through the Azure Marketplace.

UberCloud has made STAR-CCM+ v10 available and pre-configured with a Linux desktop, MPI, and support for Power on Demand licensing.

Intel Cloud Edition for Lustre Software is available for evaluation in the Azure Marketplace. Lustre is a scalable, parallel file system purpose-built for HPC and widely used in compute centers around the world. We’ve been working with a number of customers to scale their applications on Azure using Lustre, and it’s an easy transition from how they run on-premises.

“Intel® is excited to be working closely with Microsoft to bring Intel's Cloud Edition for Lustre* software to the Azure cloud platform. The Lustre file system is the most widely used parallel file system in HPC and provides massive scalability and performance on-demand. Lustre storage helps accelerate even the most demanding HPC and technical computing workloads.” – Brent Gorda, General Manager of Intel’s High Performance Data Division.

We are excited that Avere is bringing their scalable file system to Microsoft Azure. Avere Hybrid Cloud is a clustered NAS solution called a virtual FXT Edge Filer (vFXT). The vFXT supports the SMB and NFS protocols, enabling file-based applications to run on Azure without changes. It scales to support tens of thousands of compute cores and automatically caches active data in Azure, hiding latency when connecting back to on-premises storage.

Cloud native applications

Many developers building applications on Azure use a parallel task execution pattern to process files or data. We’re able to apply what we’ve learned from supporting HPC over the years, and help customers scale out work on Azure without needing to manage compute clusters.
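The parallel task execution pattern mentioned above can be sketched in a few lines of Python. This is a local illustration using only the standard library, not the Azure Batch API: the in-memory "files", the word-count task, and the function names are stand-ins invented for the example. The point is the shape of the work, which is one independent task per input with the results gathered at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def process_item(text):
    """Stand-in for per-file work: count the words in one input."""
    return len(text.split())


def run_tasks(inputs, max_workers=4):
    """Run one independent task per input and return results keyed by name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with the keys.
        results = pool.map(process_item, inputs.values())
        return dict(zip(inputs.keys(), results))


# Hypothetical inputs standing in for files in storage.
inputs = {"a.txt": "one two three", "b.txt": "four five", "c.txt": ""}
word_counts = run_tasks(inputs)
print(word_counts)
```

Since no task depends on another, the same structure can be spread across many machines; a managed service takes over the role the thread pool plays here.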

Azure Batch

Azure Batch is “job scheduling as a service.” It’s the next step for many customers moving their HPC and batch computing workloads to the cloud, as well as for developers who want task execution managed for their services.

Batch is a service that runs large-scale and computationally-intensive jobs. Instead of managing a cluster or sets of virtual machines, you tell Batch the resources that you need, submit jobs and tasks, and let it manage the execution. Customers are running jobs with tens of thousands of cores with Batch.

At SC15, we’ll be previewing two new features coming soon to Batch: support for Linux virtual machines and scheduling of MPI jobs. We wanted to take a platform-neutral approach for Linux on Batch, and developed our agent in Python for easy portability. We’re also using Virtual Machine Scale Sets, which will enable additional capabilities like custom VM images in the future.

Scheduling MPI jobs on Windows and Linux virtual machines is coming soon as well. MPI applications scale well on the Azure HPC hardware and A9 virtual machines. We are working closely with several partners to cloud-enable their applications on Batch and make it easier for their customers to scale out jobs to the cloud.

Stop by our booth if you are in Austin this week. You can also reach us through comments on this post, the Azure Batch forum, the HPC Pack forum, or by sending us an email. We are ready to help you with HPC in the cloud.