Run massive parallel R Jobs in Azure, now at a fraction of the price

By JS Tan Program Manager

Run massive parallel R Jobs in Azure, now at a fraction of the price • 2 min read

Posted on May 31, 2017
2 min read

We continue to add new capabilities to our lightweight R package, doAzureParallel, built on top of Azure Batch that allows you to easily use Azure's flexible compute resource right from your R session. Combined with the recently announced low-priority VMs on Azure Batch, you can now run your parallel R jobs at a fraction of the price. We also included other commonly requested capabilities to enable you to do more, and to do it more easily, with doAzureParallel.

Using R with low priority VMs to reduce cost

Our second major release comes with full support for low-priority VMs, letting R users run their jobs on Azure’s surplus compute capacity at up to an 80% discount.

doazureparallel-v0.3.0-graphic2

For data scientists, low-priority is great way to save costs when experimenting and testing their algorithms, such as parameter tuning (or parameter sweeps) or comparing different models entirely. And Batch takes care of any pre-empted low-priority nodes by automatically rescheduling the job to another node.

You can also mix both on-demand nodes and low-priority nodes. Supplementing your regular nodes with low-priority nodes gives you a guaranteed baseline capacity and more compute power to finish your jobs faster. You can also spin up regular nodes using autoscale to replace any pre-empted low-priority nodes to maintain your capacity and to ensure that your job completes when you need it.

Other new features

Aside from the scenarios that low-priority VMs enable, this new release includes additional tools and common feature asks to help you do the following:

Parameter tuning & cross validation with Caret
Job management and monitoring to make it easier to run long-running R jobs
Leverage resource files to preload data to your cluster
Additional utility to help you read from and write to Azure Blob storage
ETL and data prep with Hadley Wickham’s plyr

Getting started with doAzureParallel

doAzureParallel is extremely easy to use. With just a few lines of code, you can register Azure as your parallel backend which can be used by foreach, caret, plyr and many other popular open source packages.

Once you install the package, getting started is as simple as few lines of code:

# 1. Generate your credentials config and fill it out with your Azure information
generateCredentialsConfig(“credentials.json”)

# 2. Set your credentials
setCredentials(“credentials.json”) 

# 3. Generate your cluster config to customize your cluster
generateClusterConfig(“cluster.json”)

# 4. Create your cluster in Azure passing, it your cluster config file.
cluster <- makeCluster(“cluster.json”)

# 5. Register the cluster as your parallel backend
registerDoAzureParallel(cluster)

# Now you are ready to use Azure as your parallel backend for foreach, caret, plyr, and many more

For more information, visit the doAzureParallel Github page for a full getting started guide, samples and documentation.

We look forward to you using these capabilities and hearing your feedback. Please contact us at razurebatch@microsoft.com for feedback or feel free to contribute to our Github repository.

Additional information:

Download and get started with doAzureParallel
For questions related to using the doAzureParallel package, please see our docs, or feel free to reach out to razurebatch@microsoft.com
Please submit issues via Github

Additional resources:

See Azure Batch, the underlying Azure service used by the doAzureParallel package
More general purpose HPC on Azure
Learn more about low-priority VMs
Visit our previous blog post on doAzureParallel

Run massive parallel R Jobs in Azure, now at a fraction of the price

Using R with low priority VMs to reduce cost

Other new features

Getting started with doAzureParallel

Additional information:

Additional resources:

Explore

Related posts

Enabling Diagnostic Logging in Azure API for FHIR®

Conformité avec le niveau de classification IRAP « Protected » de la couche infra à la couche d'application SAP sur Azure

MileIQ and Azure Event Hubs: Billions of miles streamed

Azure Stack IaaS – part ten

Join the conversation

Sélection

IA + Machine Learning

Analyse

Calcul

Conteneurs

Bases de données

DevOps

Outils de développement

Hybride + multicloud

Identité

Intégration

Internet des Objets

Gestion et gouvernance

Données multimédias

Migration

Réalité mixte

Mobile

Mise en réseau

Sécurité

Stockage

Web

Bureau virtuel Windows

Cas d'utilisation

Développement d’applications

IA

Migration et modernisation cloud

Données et analyse

Cloud hybride et infrastructure

Internet des Objets

Sécurité et gouvernance

Type d’organisation

Ressources

Using R with low priority VMs to reduce cost

​Other new features

​Getting started with doAzureParallel

Additional information:

Additional resources:

Explore

Related posts

Join the conversation

Other new features

Getting started with doAzureParallel