Deep Learning, Simulation and HPC Applications with Docker and Azure Batch

By Fred Park, Software Engineer

Posted on September 22, 2016

The Azure Big Compute team is happy to announce version 1.0.0 of the Batch Shipyard toolkit, which enables easy deployment of batch-style Dockerized workloads to Azure Batch compute pools. Azure Batch enables you to run parallel jobs in the cloud without having to manage the infrastructure. It’s ideal for parametric sweeps, Deep Learning training with NVIDIA GPUs, and simulations using MPI and InfiniBand.
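As a rough illustration of what a parametric sweep looks like in practice, the sketch below expands a grid of parameter values into one containerized command line per combination. It is plain Python and does not use the Batch Shipyard or Azure Batch APIs; the image name `example/trainer:latest`, the program `train.py`, and the parameter names are hypothetical placeholders.

```python
from itertools import product


def sweep_commands(image, program, grid):
    """Build one `docker run` command line per point in the parameter grid.

    `grid` maps a parameter name to the list of values to sweep over.
    Parameter names are sorted so the output order is deterministic.
    """
    names = sorted(grid)
    commands = []
    for values in product(*(grid[name] for name in names)):
        args = " ".join(f"--{name} {value}" for name, value in zip(names, values))
        commands.append(f"docker run --rm {image} {program} {args}")
    return commands


# Hypothetical sweep: 2 learning rates x 3 batch sizes = 6 independent tasks.
grid = {"learning-rate": [0.1, 0.01], "batch-size": [32, 64, 128]}
commands = sweep_commands("example/trainer:latest", "train.py", grid)
for command in commands:
    print(command)
```

Each generated command is independent of the others, which is what makes this kind of workload a natural fit for a pool of compute nodes running tasks in parallel.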

Whether you need to run your containerized jobs on a single machine or on hundreds or even thousands of machines, Batch Shipyard blends the features of Azure Batch with the power of Docker containers for application packaging. Azure Batch handles the complexities of large-scale VM deployment and management, provides high-throughput, highly available job scheduling, and auto-scales so that you pay only for what you use. Batch Shipyard lets you harness the deployment consistency and isolation of Docker for your batch-style and HPC containerized workloads, and run them at any scale without having to develop directly against the Azure Batch SDK.

The initial release of Batch Shipyard has the following major features:

- Automated Docker Host Engine installation tuned for Azure Batch compute nodes
- Automated deployment of required Docker images to compute nodes
- Accelerated Docker image deployment at scale to compute pools consisting of a large number of VMs via private peer-to-peer distribution of Docker images among the compute nodes
- Automated Docker Private Registry instance creation on compute nodes with Docker images backed to Azure Storage if specified
- Automatic shared data volume support for:
  - Azure File Docker Volume Driver installation and share setup for SMB/CIFS backed to Azure Storage if specified
  - GlusterFS distributed network file system installation and setup if specified
- Seamless integration with Azure Batch job, task and file concepts along with full pass-through of the Azure Batch API to containers executed on compute nodes
- Support for Azure Batch task dependencies allowing complex processing pipelines and graphs with Docker containers
- Transparent support for GPU-accelerated Docker applications on Azure N-Series VM instances (Preview)
- Support for multi-instance tasks to accommodate Dockerized MPI and multi-node cluster applications on compute pools with automatic job cleanup
- Transparent assistance for running Docker containers utilizing InfiniBand/RDMA for MPI on HPC low-latency Azure VM instances (i.e., STANDARD_A8 and STANDARD_A9)
- Automatic setup of SSH tunneling to Docker Hosts on compute nodes if specified
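To make the task dependency feature in the list above more concrete, here is a minimal sketch of how a dependency graph determines which tasks can run together. It uses only the Python standard library (`graphlib`, Python 3.9 or later) and is not the Batch Shipyard or Azure Batch API; the task names and the `execution_waves` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies):
    """Group tasks into waves; every task in a wave can run in parallel.

    `dependencies` maps each task name to the set of tasks that must
    finish before it may start.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


# Hypothetical pipeline: one preprocessing task fans out to two training
# tasks, whose results are then combined by a single evaluation task.
pipeline = {
    "preprocess": set(),
    "train-a": {"preprocess"},
    "train-b": {"preprocess"},
    "evaluate": {"train-a", "train-b"},
}
waves = execution_waves(pipeline)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

In this sketch the two training tasks land in the same wave, so a scheduler that honors dependencies is free to run them on different compute nodes at the same time.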

We’ve also made available an initial set of recipes that enable scenarios such as Deep Learning, Computational Fluid Dynamics (CFD), Molecular Dynamics (MD) and Video Processing with Batch Shipyard. In fact, we are aiming to make Deep Learning on Azure Batch an easy, low-friction experience. Once you have the toolkit installed and have your Azure Batch and Azure Storage credentials, you can get CNTK, Caffe or TensorFlow running in an Azure Batch compute pool in under 15 minutes. Below is a screenshot of CNTK running on a GPU-enabled STANDARD_NC6 VM via Batch Shipyard with nvidia-smi:

[Screenshot: CNTK]

We hope to continue to expand the repertoire of recipes available for Batch Shipyard in the future.

The Batch Shipyard toolkit can be found on GitHub. We welcome any feedback and contributions!
