doAzureParallel: Take advantage of Azure’s flexible compute directly from your R session

By JS Tan, Program Manager


Posted on March 16, 2017

Users of the R language often need more compute capacity than their local machines can provide. However, scaling up their work to take advantage of cloud capacity can be complex and troublesome, and it often distracts R users from focusing on their algorithms.

We are excited to announce doAzureParallel, a lightweight R package built on top of Azure Batch that allows you to easily use Azure’s flexible compute resources right from your R session. doAzureParallel complements Microsoft R Server by providing the infrastructure for running massively parallel simulations.

At its core, the doAzureParallel package is a parallel backend for the widely used foreach package that lets you execute multiple processes across a cluster of Azure virtual machines. In just a few lines of code, the package helps you create and manage a cluster in Azure and register it as a parallel backend for foreach.

With doAzureParallel, there’s no need to manually create, configure, and manage a cluster of individual virtual machines. Instead, this package makes running your jobs at scale no more complex than running your algorithms on your local machine. With Azure Batch’s autoscaling capabilities, you can also increase or decrease the size of your cluster to fit your workloads, helping you to save time and/or money.

doAzureParallel also uses the Azure Data Science Virtual Machine (DSVM), which lets Azure Batch configure the appropriate environment on each node quickly.

There is no additional cost for these capabilities; you pay only for the Azure VMs you use.

doAzureParallel is ideal for running embarrassingly parallel work such as parametric sweeps or Monte Carlo simulations, making it a great fit for many financial modelling algorithms (back-testing, portfolio scenario modelling, etc.).
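To make "embarrassingly parallel" concrete: in a parametric sweep, every parameter combination is an independent task, so no task needs another task's output and the tasks can be spread across nodes. The sketch below is a local, sequential illustration in base R; the parameter values and the simulateClose helper are hypothetical and chosen only for illustration. With doAzureParallel, each row of the grid would instead become one iteration of a foreach loop.

```r
# Hypothetical sweep: every (mean_change, volatility) pair is an independent task
grid <- expand.grid(mean_change = c(1.0005, 1.001), volatility = c(0.01, 0.02))

# Hypothetical model, for illustration only: one simulated 5-year closing price
simulateClose <- function(mean_change, volatility, days = 1825, opening_price = 100) {
    opening_price * prod(rnorm(days, mean = mean_change, sd = volatility))
}

set.seed(3)  # arbitrary seed, only for reproducibility
# No row depends on another, so rows could be farmed out to separate nodes;
# here mapply simply runs them one after another.
grid$close <- mapply(simulateClose, grid$mean_change, grid$volatility)
grid
```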

Installation / Prerequisites

To use doAzureParallel, you need to have a Batch account and a Storage account set up in Azure. More information on setting up your Azure accounts.

You can install the package directly from GitHub. More information on install instructions and dependencies.

Getting Started

Once you install the package, getting started is as simple as a few lines of code:

Load the package:

library(doAzureParallel)

Set up your parallel backend (which is your pool of virtual machines) with Azure:

# 1. Generate your credentials config and fill it out with your Azure information
generateCredentialsConfig("credentials.json")

# 2. Set your credentials
setCredentials("credentials.json")

# 3. Generate your cluster config to customize your cluster
generateClusterConfig("cluster.json")

# 4. Create your cluster in Azure, passing it your cluster config file
cluster <- makeCluster("cluster.json")

# 5. Register the cluster as your parallel backend
registerDoAzureParallel(cluster)

Run your foreach loop in parallel with the %dopar% operator. foreach returns the results of your parallel code (as a list, by default).

number_of_iterations <- 10
results <- foreach(i = 1:number_of_iterations) %dopar% {
    # This code is executed, in parallel, across your Azure pool.
    myAlgorithm(…)
}

When developing at scale, it is good practice to test and debug your code locally first. Switch between %dopar% and %do% to toggle between running in parallel on Azure and running sequentially on your local machine.

# run your code sequentially on your local machine
results <- foreach(i = 1:number_of_iterations) %do% { … }

# use the doAzureParallel backend to run your code in parallel across your Azure pool
results <- foreach(i = 1:number_of_iterations) %dopar% {…}

After you finish running your R code at scale, you may want to shut down your pool of VMs to make sure that you aren’t being charged anymore:

# shut down your pool
stopCluster(cluster)

Monte Carlo Pricing Simulation Demo

The following demo will show you a simplified version of predicting a stock price after 5 years by simulating 5 million different outcomes of a single stock.

Let's imagine that each day, Contoso's stock price is multiplied by a random factor with a mean of 1.001 (an average daily gain of 0.1%) and a volatility (standard deviation) of 0.01. Given a starting price of $100, we can use a Monte Carlo pricing simulation to estimate what Contoso's stock price will be after 5 years.

First, define the assumptions:

mean_change <- 1.001
volatility <- 0.01
opening_price <- 100

Create a function that simulates the movement of the stock price for one possible outcome over 5 years: draw a daily change factor for each day from a normal distribution with the parameters defined above, then take the cumulative product starting from the opening price.

simulateMovement <- function() {
    days <- 1825 # ~ 5 years
    movement <- rnorm(days, mean=mean_change, sd=volatility)
    path <- cumprod(c(opening_price, movement))
    return(path)
}
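In symbols, the function above draws n = 1825 independent daily factors and multiplies them onto the opening price. The notation here is introduced only for convenience: P_0 stands for opening_price, μ for mean_change (1.001) and σ for volatility (0.01).

$$
X_1, \dots, X_n \overset{\text{iid}}{\sim} \mathcal{N}(\mu, \sigma^2), \qquad
P_k = P_0 \prod_{j=1}^{k} X_j, \quad k = 0, 1, \dots, n
$$

The returned path is therefore the vector $(P_0, P_1, \dots, P_n)$ of length $n + 1 = 1826$, and the price after the final day is its last element, $P_n$.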

On our local machine, simulate 30 possible outcomes and graph the results:

simulations <- replicate(30, simulateMovement())
matplot(simulations, type='l') # plots all 30 simulations on a graph
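Before scaling out, it helps to check what replicate() returns here. The sketch below re-creates the local run (the seed is an arbitrary addition for reproducibility) and shows that the result is a matrix with one column per simulated path and one row per day, where row 1 holds the opening price. The closing price of each path is therefore the last row of the matrix.

```r
# Assumptions from the demo
mean_change <- 1.001
volatility <- 0.01
opening_price <- 100

simulateMovement <- function() {
    days <- 1825 # ~ 5 years
    movement <- rnorm(days, mean = mean_change, sd = volatility)
    path <- cumprod(c(opening_price, movement))
    return(path)
}

set.seed(42)  # arbitrary seed, only for reproducibility
simulations <- replicate(30, simulateMovement())

# One column per path, one row per day: 1826 rows (opening day + 1825 days) by 30 columns
dim(simulations)

# Every path starts at the opening price
all(simulations[1, ] == opening_price)

# The closing price of each path is the last row
closing_30 <- simulations[nrow(simulations), ]
summary(closing_30)
```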

To understand where Contoso's stock price will be in 5 years, we need to understand the distribution of the closing price across simulations (each represented by a line in the plot). But instead of looking at the distribution of just 30 possible outcomes, let's simulate 5 million outcomes to get a massive sample for the distribution.

Create a function that simulates the movement of the stock price for one possible outcome, but returns only the closing price.

getClosingPrice <- function() {
    days <- 1825 # ~ 5 years
    movement <- rnorm(days, mean=mean_change, sd=volatility)
    path <- cumprod(c(opening_price, movement))
    # path holds the opening price plus one price per day (days + 1 values),
    # so the closing price is the last element
    closingPrice <- path[days + 1]
    return(closingPrice)
}

Using the foreach package and doAzureParallel, we can simulate 5 million outcomes in Azure. To parallelize this, let's run 50 iterations of 100,000 outcomes each:

closingPrices <- foreach(i = 1:50, .combine='c') %dopar% {
    replicate(100000, getClosingPrice())
}
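Following the advice to test locally first, the same split-and-combine pattern can be run sequentially in base R at a much smaller scale. In this sketch the chunk counts are scaled down arbitrarily, lapply() plays the role of the foreach iterations (returning a list, as foreach does by default), and do.call(c, ...) concatenates the chunks the way .combine='c' does. The closing price is taken as the last element of the path.

```r
# Assumptions from the demo
mean_change <- 1.001
volatility <- 0.01
opening_price <- 100

getClosingPrice <- function() {
    days <- 1825 # ~ 5 years
    movement <- rnorm(days, mean = mean_change, sd = volatility)
    path <- cumprod(c(opening_price, movement))
    return(path[days + 1])  # last element: price after the final day
}

n_chunks <- 5       # stands in for the 50 iterations
chunk_size <- 1000  # stands in for 100,000 outcomes per iteration

set.seed(1)  # arbitrary seed, only for reproducibility
# Each chunk is independent, like one foreach iteration running on an Azure node
chunks <- lapply(seq_len(n_chunks), function(i) replicate(chunk_size, getClosingPrice()))

# Concatenate the per-chunk vectors into one vector, as .combine='c' would
closingPrices_local <- do.call(c, chunks)
length(closingPrices_local)
```

Once this small run behaves as expected, switching the loop to foreach with %dopar% and the full counts moves the same work to Azure.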

While the foreach loop runs against the doAzureParallel backend, you can open your Azure Batch account in the Azure portal to see your pool of VMs running the simulation.

As the nodes in the heat map change color, you can see them busy working on the pricing simulation.

When the simulation finishes, the package automatically pulls the results of each iteration down from the nodes and merges them (here into a single vector, as requested by .combine='c'), so the results are ready to use in your R session.

Finally, we'll plot the results to get a sense of the distribution of closing prices over the 5 million possible outcomes.

# plot the 5 million closing prices in a histogram
hist(closingPrices)

Based on the distribution above, Contoso's stock price will most likely move from the opening price of $100 to a closing price of roughly $500 after a 5-year period.
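As a sanity check on the simulation, the mean closing price can be computed exactly, and the median and most likely value (the histogram's peak) can be approximated by treating the closing price as lognormal. The lognormal step is an approximation introduced here, not part of the original demo. Because the distribution is right-skewed, its peak sits below its mean, which is consistent with the histogram peaking near $500 while the average closing price is higher.

```r
# Assumptions from the demo
mean_change <- 1.001
volatility <- 0.01
opening_price <- 100
days <- 1825

# Exact mean: the daily factors are independent, so E[close] = opening * mean_change^days
expected_close <- opening_price * mean_change^days  # about 620

# Second-order approximation of the mean and variance of each log daily factor
mu_step <- log(mean_change) - volatility^2 / (2 * mean_change^2)
var_step <- volatility^2 / mean_change^2
mu <- log(opening_price) + days * mu_step
s2 <- days * var_step

# Lognormal summaries of the closing price
approx_median <- exp(mu)      # about 566
approx_mode <- exp(mu - s2)   # about 470, the peak of the histogram

# Small local Monte Carlo run for comparison (seed chosen arbitrarily)
set.seed(7)
sim_close <- replicate(2000, opening_price * prod(rnorm(days, mean_change, volatility)))
c(simulated_mean = mean(sim_close), simulated_median = median(sim_close))
```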

We look forward to you using these capabilities and hearing your feedback. Please contact us at razurebatch@microsoft.com for feedback, or feel free to contribute to our GitHub repository.

Additional information:

Download and get started with doAzureParallel
For questions related to using the doAzureParallel package, please see our docs, or feel free to reach out to razurebatch@microsoft.com
Please submit issues via GitHub

Additional Resources:

See Azure Batch, the underlying Azure service used by the doAzureParallel package
More general purpose HPC on Azure
