Skip Navigation

Microsoft Azure, the cloud for high performance computing

Posted on August 29, 2018

Partner PM Manager, Azure Specialized Compute

Today, we continue to see customers leveraging Azure to push through new frontiers in high performance and accelerated computing. From Neuroiniative’s quest to accelerate drug discovery for Parkinson’s and Alzheimer’s diseases to EFS’s building of self-driving car technologies, a vast number of customers are leveraging Azure for breakthrough innovation.

We continue to invest to deliver the broadest range of Accelerated and high-performance computing (HPC) capabilities in the public cloud. From InfiniBand-enabled Virtual Machine families for artificial intelligence and HPC, to Hyperscale services like Cray supercomputing, Azure enables customers to deliver the full spectrum of AI and machine learning applications.

Azure CycleCloud – the simplest way to execute HPC on Azure

We are excited to announce the general availability of Azure CycleCloud, a tool for creating, managing, operating, and optimizing HPC clusters of any scale in Azure.

With Azure CycleCloud, we are making it even easier for everyone to deploy, use, and optimize HPC burst, hybrid, or cloud-only clusters. For users running traditional HPC clusters, using schedulers including SLURM, PBS Pro, Grid Engine, LSF, HPC Pack, or HTCondor, this will be the easiest way to get clusters up and running in the cloud, and manage the compute/data workflows, user access, and costs for their HPC workloads over time.

With a few clicks, HPC IT administrators can deploy high-performance clusters of compute, storage, filesystem, and application capability in Azure. Azure CycleCloud’s role-based policies and governance features make it easy for their organizations to deliver the hybrid compute power where needed while avoiding runaway costs. Users can rely on Azure CycleCloud to orchestrate their job and data workflows across these clusters.

CycleCloud

Customers including GE, Johnson & Johnson, and Ramboll leverage CycleCloud technology to deploy HPC clusters, control access and costs, and simplify management for compute and data workloads on the cloud. As an example of an innovative HPC simulation and AI workload, Silicon Therapeutics is using Azure CycleCloud to orchestrate a large Slurm HPC cluster with GPUs to simulate a large number of proteins to assess if and how these proteins can be targeted in their drug design projects.

Silicon Therapeutics – next generation drug discovery using quantum physics and machine learning

Silicon Therapeutics has created a unique quantum physics simulation technology to identify targets and design drugs to fight diseases that have been considered difficult for traditional approaches. These challenging protein targets typically involve large changes in their shape “conformational changes” associated with their biological function.

The company’s proprietary platform couples biological data with the dynamic nature of proteins to identify new disease targets. The integration of experimental data with physics-based simulations and machine learning can be performed at the genome scale, which is extremely computationally demanding, but tractable in the modern era of computing. Once targets have been identified, the platform is used to study thousands of molecules at the atomic level to gain insights that are used to guide the design of new, better drug candidates, which they synthesize and test in the lab.

Here, Silicon Therapeutics ran molecular dynamics simulations on thousands of targets—both to explore “flexibility” and to identify potential “hotspots” for designing new medicines. The simulations entailed millions of steps computing interactions between tens of thousands of atoms, which they ran on thousands of proteins.

The computations used five years of GPU compute-time, but was run in only 20 hours on 2048 NCv1 GPU instances in Azure. The auto-scaling capabilities of Azure CycleCloud created a Slurm cluster using Azure’s NCv1 VMs with full-performance NVIDIA K80 GPUs, and a BeeGFS file system. This environment mirrored their internal cluster, so their on-premise jobs could run seamlessly without any bottlenecks in Azure. This search for potential protein “hotspots” where drug candidates might be able to fight disease, generated over 50 TB of data. At peak, the 2048 K80 GPUs used over 25 GB/second of bandwidth between the BeeGFS and the compute nodes.

Using CycleCloud, Silicon Therapeutics could run the same platform they ran inhouse, and simply scale a Slurm HPC cluster with low-priority GPU execute nodes and a 80TB BeeGFS parallel filesystem to execute the molecular dynamics simulations and machine learning workloads to search for potential new drug candidates.

“In our work, where simulations are central to our decisions, time-to-solution is critical. Even with our significant internal compute resources, the Microsoft Azure cloud offers the opportunity to scale up resources with minimal effort. Running thousands of GPUs, as in this work, was a smooth process, and the Azure support team was excellent” says Woody Sherman, CSO at Silicon Therapeutics.

Azure CycleCloud is free to download and use to help get innovative HPC workloads like this one running on Azure, with easy management and cost control. If you have HPC and AI workloads that need to leverage Azure’s specialized compute capabilities to get answers back faster, try it for free today!

NVIDIA GPU Cloud with Azure

As GPUs provide outstanding performance for AI and HPC, Microsoft Azure provides a variety of virtual machines enabled with NVIDIA GPUs. Starting today, Azure users and cloud developers have a new way to accelerate their AI and HPC workflows with powerful GPU-optimized software that takes full advantage of supported NVIDIA GPUs on Azure.

Containers from the NVIDIA GPU Cloud (NGC) container registry are now supported on NVIDIA Volta and Pascal-powered Azure NCv3, NCv2 and ND. This brings together the power of NVIDIA GPUs in Azure cloud infrastructure with the comprehensive library of deep learning and HPC containers from NGC.

The NGC container registry includes NVIDIA tuned, tested, and certified containers for deep learning software such as Microsoft Cognitive Toolkit, TensorFlow, PyTorch, and NVIDIA TensorRT. Through extensive integration and testing, NVIDIA creates an optimal software stack for each framework – including required operating system patches, NVIDIA deep learning libraries, and the NVIDIA CUDA Toolkit – to allow the containers to take full advantage of NVIDIA GPUs. The deep learning containers from NGC are refreshed monthly with the latest software and component updates.

NGC also provides fully tested, GPU-accelerated applications and visualization tools for HPC, such as NAMD, GROMACS, LAMMPS, ParaView, and VMD. These containers simplify deployment and get you up and running quickly with the latest features.

To make it easy to use NGC containers with Azure, a new image called NVIDIA GPU Cloud Image for Deep Learning and HPC is available on Azure Marketplace. This image provides a pre-configured environment for using containers from NGC on Azure. Containers from NGC on Azure NCv2, NCv3, and ND virtual machines can also be run with Azure Batch AI by following these GitHub instructions.

To access NGC containers from this image, simply signup for a free account and then pull the containers into your Azure instance. To learn more accelerating HPC and AI projects with Azure and NGC, sign up for the webinar on October 2nd.

Azure: Investing to make HPC, AI, and GPU in the cloud easy

Microsoft is committed to making Azure the cloud of choice for HPC. Azure CycleCloud and NVIDIA GPUs ease integration and the ability to manage and scale. Near-term developments around hybrid cloud performance with the Avere vFXT will enhance your ability to minimize latency while leveraging on-premises NAS or Azure blob storage alongside Azure CycleCloud and Azure Batch workloads. With this portfolio of HPC solutions in-hand, we’re excited to see the new innovations you create!