We continue to add new capabilities to our lightweight R package, doAzureParallel, built on top of Azure Batch, which lets you easily use Azure's flexible compute resources right from your R session. Combined with the recently announced low-priority VMs on Azure Batch, you can now run your parallel R jobs at a fraction of the price. We have also included other commonly requested capabilities to help you do more, and do it more easily, with doAzureParallel.
Using R with low priority VMs to reduce cost
Our second major release comes with full support for low-priority VMs, letting R users run their jobs on Azure’s surplus compute capacity at up to an 80% discount.
For data scientists, low-priority VMs are a great way to save costs when experimenting with and testing algorithms, such as parameter tuning (parameter sweeps) or comparing different models entirely. Batch handles any pre-empted low-priority nodes by automatically rescheduling their work on other nodes.
You can also mix both on-demand nodes and low-priority nodes. Supplementing your regular nodes with low-priority nodes gives you a guaranteed baseline capacity and more compute power to finish your jobs faster. You can also spin up regular nodes using autoscale to replace any pre-empted low-priority nodes to maintain your capacity and to ensure that your job completes when you need it.
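The mix of dedicated and low-priority nodes is declared in the cluster config file used in the getting-started steps below. The fragment here is only an illustrative sketch: the field names, VM size, and autoscale formula name are assumptions and may differ between package versions, so treat the generated cluster.json as the source of truth.

```json
{
  "name": "my-r-cluster",
  "vmSize": "Standard_D2_v2",
  "maxTasksPerNode": 1,
  "poolSize": {
    "dedicatedNodes": { "min": 2, "max": 2 },
    "lowPriorityNodes": { "min": 4, "max": 10 },
    "autoscaleFormula": "QUEUE"
  }
}
```

Here the two dedicated nodes provide the guaranteed baseline capacity, while up to ten low-priority nodes supplement it at the discounted price.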
Other new features
Aside from the scenarios that low-priority VMs enable, this release includes additional tools and commonly requested features to help you do the following:
- Parameter tuning & cross-validation with caret
- Job management and monitoring to make it easier to run long-running R jobs
- Resource files to preload data onto your cluster
- Additional utilities to help you read from and write to Azure Blob storage
- ETL and data prep with Hadley Wickham's plyr
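As a sketch of the parameter-tuning scenario, a sweep over a small grid can be expressed with plain foreach. This is an illustrative example, not taken from the package's samples; the grid and the scoring expression are stand-ins for a real model-fitting call, and with `registerDoAzureParallel(cluster)` each iteration would run on a cluster node.

```r
library(foreach)

# Hypothetical parameter grid for a sweep
grid <- expand.grid(alpha = c(0.1, 0.5, 1.0), lambda = c(0.01, 0.1))

# Evaluate each combination in parallel; each %dopar% iteration
# becomes an independent task on the registered backend
scores <- foreach(i = seq_len(nrow(grid)), .combine = c) %dopar% {
  p <- grid[i, ]
  # Stand-in for a real model-fitting/scoring call
  p$alpha + p$lambda
}
```

You would then pick the row of `grid` whose score is best, exactly as in a local sweep; only the backend changes.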
Getting started with doAzureParallel
doAzureParallel is extremely easy to use. With just a few lines of code, you can register Azure as your parallel backend for foreach, caret, plyr, and many other popular open source packages.
Once you install the package, getting started is as simple as a few lines of code:
# 1. Generate your credentials config and fill it out with your Azure information
generateCredentialsConfig("credentials.json")

# 2. Set your credentials
setCredentials("credentials.json")

# 3. Generate your cluster config to customize your cluster
generateClusterConfig("cluster.json")

# 4. Create your cluster in Azure, passing it your cluster config file
cluster <- makeCluster("cluster.json")

# 5. Register the cluster as your parallel backend
registerDoAzureParallel(cluster)

# Now you are ready to use Azure as your parallel backend for foreach, caret, plyr, and many more
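Once the backend is registered, any foreach loop written with %dopar% runs across the cluster. As an illustrative sketch (not one of the package's own samples), here is a simple Monte Carlo estimate of pi, where each iteration samples points independently:

```r
library(foreach)

# Each iteration samples 1,000 points in the unit square and
# estimates pi from the fraction that fall inside the unit circle
results <- foreach(i = 1:10, .combine = c) %dopar% {
  pts <- matrix(runif(2000), ncol = 2)
  mean(rowSums(pts^2) <= 1) * 4
}

# Average the per-iteration estimates
estimate <- mean(results)
```

Because the iterations are independent, the same loop runs unchanged whether the registered backend is a local machine or an Azure Batch cluster.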
For more information, visit the doAzureParallel GitHub page for a full getting started guide, samples, and documentation.
We look forward to you using these capabilities and hearing your feedback. Please contact us at razurebatch@microsoft.com with feedback, or feel free to contribute to our GitHub repository.
Additional information:
- Download and get started with doAzureParallel
- For questions related to using the doAzureParallel package, please see our docs, or feel free to reach out to razurebatch@microsoft.com
- Please submit issues via GitHub
Additional resources:
- See Azure Batch, the underlying Azure service used by the doAzureParallel package
- More general purpose HPC on Azure
- Learn more about low-priority VMs
- Visit our previous blog post on doAzureParallel