Run massive parallel R Jobs in Azure, now at a fraction of the price

By JS Tan, Program Manager


Posted on May 31, 2017

We continue to add new capabilities to our lightweight R package, doAzureParallel, which is built on top of Azure Batch and lets you use Azure's flexible compute resources directly from your R session. Combined with the recently announced low-priority VMs on Azure Batch, you can now run your parallel R jobs at a fraction of the price. This release also includes other commonly requested capabilities that let you do more with doAzureParallel, and do it more easily.

Using R with low-priority VMs to reduce cost

Our second major release comes with full support for low-priority VMs, letting R users run their jobs on Azure’s surplus compute capacity at up to an 80% discount.
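To put the discount in concrete terms, here is a minimal R sketch that estimates the compute cost of a pool. `estimate_cost()` is a hypothetical helper, not part of doAzureParallel; the hourly rate is a placeholder rather than an Azure price, and the sketch assumes the full 80% discount applies to every low-priority node-hour, whereas real rates and discounts vary by VM size and region.

```r
# Hypothetical helper (not part of doAzureParallel): estimated cost of a pool
# that runs `dedicated` on-demand nodes and `low_priority` nodes for `hours`.
# Assumes low-priority node-hours are billed at (1 - discount) of the rate.
estimate_cost <- function(dedicated, low_priority, hours, hourly_rate,
                          discount = 0.80) {
  stopifnot(discount >= 0, discount <= 1)
  dedicated_node_hours <- dedicated * hours
  low_node_hours       <- low_priority * hours
  hourly_rate * (dedicated_node_hours + low_node_hours * (1 - discount))
}

rate <- 0.50  # placeholder price per node-hour (assumption, not an Azure price)

all_dedicated <- estimate_cost(dedicated = 20, low_priority = 0,  hours = 3, hourly_rate = rate)
all_low       <- estimate_cost(dedicated = 0,  low_priority = 20, hours = 3, hourly_rate = rate)
mixed         <- estimate_cost(dedicated = 4,  low_priority = 16, hours = 3, hourly_rate = rate)

cat(sprintf("all dedicated:    $%.2f\n", all_dedicated))  # $30.00
cat(sprintf("all low-priority: $%.2f\n", all_low))        # $6.00
cat(sprintf("mixed 4 + 16:     $%.2f\n", mixed))          # $10.80
```

The same 20-node, 3-hour workload drops from $30.00 to $6.00 under these assumptions, and a mixed pool lands in between.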


For data scientists, low-priority VMs are a great way to save costs when experimenting with and testing algorithms, for example during parameter tuning (parameter sweeps) or when comparing entirely different models. Batch also handles pre-emption for you: if a low-priority node is pre-empted, Batch automatically reschedules the interrupted work on another node.
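A parameter sweep suits this model because each candidate setting can be evaluated independently. The sketch below uses only base R and the built-in `mtcars` data to choose the polynomial degree for `mpg ~ hp` by 5-fold cross-validation. The model, data, and names such as `cv_rmse()` are illustrative assumptions; the candidates run locally with `lapply()`, and that loop is the part you would express as a `foreach(...) %dopar%` loop to spread the candidates across Azure nodes.

```r
# Fixed folds so every candidate degree is scored on the same split.
set.seed(42)
k <- 5
folds <- sample(rep(seq_len(k), length.out = nrow(mtcars)))

# Hypothetical scoring function: mean out-of-fold RMSE of mpg ~ poly(hp, degree).
cv_rmse <- function(degree) {
  fold_errors <- vapply(seq_len(k), function(f) {
    train <- mtcars[folds != f, ]
    test  <- mtcars[folds == f, ]
    fit   <- lm(mpg ~ poly(hp, degree), data = train)
    sqrt(mean((test$mpg - predict(fit, newdata = test))^2))
  }, numeric(1))
  mean(fold_errors)
}

# The parameter grid: one row per candidate setting. With more parameters,
# expand.grid() builds every combination.
grid <- expand.grid(degree = 1:4)

# Each candidate is independent; this is the loop to parallelize.
grid$rmse <- unlist(lapply(grid$degree, cv_rmse))

best <- grid[which.min(grid$rmse), ]
print(grid)
print(best)
```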

You can also mix on-demand (dedicated) nodes with low-priority nodes. The dedicated nodes give you a guaranteed baseline capacity, and the low-priority nodes add extra compute power to finish your jobs faster. You can also use autoscale to spin up regular nodes that replace any pre-empted low-priority nodes, maintaining your capacity and ensuring that your job completes when you need it.
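The capacity arithmetic behind a mixed pool can be sketched as follows. `pool_status()` is a hypothetical helper, not a doAzureParallel or Azure Batch API: it treats dedicated nodes as the guaranteed baseline, subtracts pre-empted low-priority nodes from the available total, and reports how many dedicated nodes an autoscale rule would need to add to get back to a target size.

```r
# Hypothetical helper (not a doAzureParallel or Azure Batch API).
# target defaults to the pool's full size before any pre-emption.
pool_status <- function(dedicated, low_priority, preempted,
                        target = dedicated + low_priority) {
  stopifnot(preempted >= 0, preempted <= low_priority)
  available <- dedicated + (low_priority - preempted)
  list(
    baseline  = dedicated,                  # dedicated nodes are not pre-empted
    available = available,                  # nodes currently able to run tasks
    to_add    = max(0, target - available)  # dedicated nodes needed to reach target
  )
}

# 4 dedicated + 16 low-priority nodes, 6 of the low-priority nodes pre-empted.
s <- pool_status(dedicated = 4, low_priority = 16, preempted = 6)
str(s)
```

In this example the pool keeps its 4-node baseline, has 14 nodes available, and would need 6 replacement nodes to return to 20.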

Other new features

Aside from the scenarios that low-priority VMs enable, this release includes additional tools and commonly requested features to help you do the following:

Parameter tuning and cross-validation with caret
Job management and monitoring to make long-running R jobs easier to run
Resource files to preload data onto your cluster
Additional utilities for reading from and writing to Azure Blob storage
ETL and data prep with Hadley Wickham’s plyr

Getting started with doAzureParallel

doAzureParallel is extremely easy to use. With just a few lines of code, you can register Azure as a parallel backend for foreach, caret, plyr, and many other popular open-source packages.

Once you have installed the package, getting started looks like this:

```r
# Load the package (installed from GitHub, e.g. devtools::install_github("Azure/doAzureParallel"))
library(doAzureParallel)

# 1. Generate your credentials config and fill it out with your Azure information
generateCredentialsConfig("credentials.json")

# 2. Set your credentials
setCredentials("credentials.json")

# 3. Generate your cluster config to customize your cluster
generateClusterConfig("cluster.json")

# 4. Create your cluster in Azure, passing it your cluster config file
cluster <- makeCluster("cluster.json")

# 5. Register the cluster as your parallel backend
registerDoAzureParallel(cluster)

# Now you are ready to use Azure as your parallel backend for foreach, caret, plyr, and many more
```
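Once a backend is registered, ordinary `foreach` loops written with `%dopar%` run on it. The sketch below estimates pi by Monte Carlo simulation, with each iteration as an independent task. So that it can be tried without an Azure account, it registers foreach's sequential backend with `registerDoSEQ()`; on Azure you would skip that line and rely on the `registerDoAzureParallel(cluster)` call above. The task count and points per task are arbitrary choices.

```r
library(foreach)

# Local dry run only: registerDoSEQ() makes %dopar% run sequentially.
# On Azure, registerDoAzureParallel(cluster) replaces this call.
registerDoSEQ()

# set.seed() makes this sequential dry run reproducible; it does not by
# itself control random numbers on remote workers.
set.seed(1)
points_per_task <- 100000
n_tasks <- 10

# Each task samples points in the unit square and counts how many fall
# inside the quarter circle of radius 1.
hits <- foreach(i = seq_len(n_tasks), .combine = c) %dopar% {
  x <- runif(points_per_task)
  y <- runif(points_per_task)
  sum(x^2 + y^2 <= 1)
}

# Area ratio (quarter circle / unit square) is pi / 4.
pi_estimate <- 4 * sum(hits) / (points_per_task * n_tasks)
print(pi_estimate)
```

Because the tasks share no state, the same loop body runs unchanged whether the backend is sequential, local, or an Azure Batch pool.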

For more information, visit the doAzureParallel GitHub page for a full getting-started guide, samples, and documentation.

We look forward to you using these capabilities and hearing your feedback. Please send feedback to razurebatch@microsoft.com, or feel free to contribute to our GitHub repository.

Additional information:

Download and get started with doAzureParallel
For questions related to using the doAzureParallel package, please see our docs, or feel free to reach out to razurebatch@microsoft.com
Please submit issues via GitHub

Additional resources:

See Azure Batch, the underlying Azure service used by the doAzureParallel package
More general-purpose HPC on Azure
Learn more about low-priority VMs
Visit our previous blog post on doAzureParallel
