Autoscaling Windows Azure Applications

Editor’s Note: This post comes from the Windows Azure Monitoring Team.

One of the key benefits of Windows Azure is that you can quickly scale your application in response to changing demand. In the past, you had to either set the scale of your application manually or use additional tooling (such as WASABi or MetricsHub) to scale it automatically. However, with these solutions, you may not be able to easily find the ideal balance between performance and cost.

Today, we’re announcing that autoscale is built directly into Windows Azure for:

  • Cloud Services
  • Virtual Machines
  • Web Sites

Autoscale allows Azure to scale your application dynamically on your behalf (without any manual intervention) so you can achieve the ideal performance and cost balance. It’s reactive — regularly adjusting the number of instances in response to the load in your application. Currently, we support two different load metrics:

  • CPU percentage
  • Storage queue depth (Cloud Services and Virtual Machines only)

How to Enable Autoscale

The following are the recommended criteria for a component of your service to use autoscale:

  1. The component can scale horizontally (e.g. it can be duplicated to multiple instances)
  2. The component’s load changes over time

If it meets these criteria, then you can leverage autoscale, although the benefit you get out of it depends on how dynamic the load is over time.

To enable autoscale, navigate to the Scale tab in the portal for the service you wish to enable (note that there is no API available to do this programmatically at this time). For Cloud Services, autoscale is configured for each role. For a Virtual Machine, autoscale is configured for each availability set.


Clicking the CPU button exposes all of the controls you need to configure autoscale for scaling by average CPU percentage, and clicking the Queue button exposes the options for scaling by a Storage account queue. No matter the metric, you always define a minimum and maximum number of instances so you can be sure your service will always have a baseline level of performance, and will also never exceed a certain spending level. 


Below the instance range slider, you have controls to adjust the target CPU (or, in the case of Queue, the target queue depth). The target is the range within which Azure attempts to keep the metric by adding or removing instances.

Once you’ve turned autoscale on, you can return to the Scale tab at any point and select Off to manually set the number of instances. 

Guidance for Autoscale Instance Range

When you first set up autoscale, setting the minimum number of instances appropriately is very important. If your application or site currently has very low load we recommend:

  • For cloud services and virtual machines, 2 instances for high availability
    • The Azure platform requires at least two instances to meet SLAs
  • For web sites, only 1 instance is required for SLAs

However, if you currently have baseline load that exceeds one instance, or if you have sudden usage spikes on your service, be sure to set a higher minimum number of instances to handle the load.

If you have a sudden spike of traffic before Windows Azure checks your CPU usage, your service might not be responsive during that time. If you expect sudden, large amounts of traffic, set the minimum instance count higher to anticipate these bursts.

Guidance for Autoscale Target Metrics

When choosing to autoscale by CPU, you’ll see a new slider. This range represents average CPU usage for the entire role. By default, we recommend 60% to 80% for your target CPU. This means your machines can run very hot ( > 80%) before scaling up, so if you want more conservative metrics, you can reduce both the minimum and maximum.

We do not recommend setting a range that puts the sliders too close to the ends or to each other. If you drag the sliders to the ends (e.g. 0% to 100%), you will never see any scale actions. If you set the sliders very close to each other (e.g. 74% to 75%), you will see too many scale actions.
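To see why the width of the target range matters, here is a minimal Python sketch of a range-based decision. It is an illustration only, not Azure's implementation: the function names, the strict comparisons at the range boundaries, and the sample readings are all assumptions, and it ignores wait times and instance limits.

```python
def cpu_scale_decision(avg_cpu, target_min, target_max):
    """Return 'up', 'down', or 'none' for an average CPU percentage.

    Hypothetical helper: strict comparisons at the boundaries are an assumption.
    """
    if not 0 <= target_min <= target_max <= 100:
        raise ValueError("expected 0 <= target_min <= target_max <= 100")
    if avg_cpu > target_max:
        return "up"
    if avg_cpu < target_min:
        return "down"
    return "none"


def count_scale_actions(readings, target_min, target_max):
    """Count how many readings would trigger a scale action."""
    return sum(
        1
        for cpu in readings
        if cpu_scale_decision(cpu, target_min, target_max) != "none"
    )


# Made-up average CPU readings, for illustration only.
readings = [50, 72, 76, 73, 78, 74.5, 90]

print(count_scale_actions(readings, 60, 80))   # 2: only the clear outliers
print(count_scale_actions(readings, 0, 100))   # 0: the range can never be left
print(count_scale_actions(readings, 74, 75))   # 6: almost every reading triggers
```

With the same readings, the default 60% to 80% range reacts only to the two outliers, the 0% to 100% range never reacts, and the 74% to 75% range reacts to nearly every sample.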

When scaling by storage queue, you first need to select the storage account that contains a queue, and the queue that you want to scale by. Each role can scale by only one queue.

The number of machines that autoscale sets is determined by the target number of messages per machine: we divide the number of messages in the queue by the target to get the desired number of machines. Thus, the target should be the average number of messages in the queue that you expect one worker instance to handle.
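The division described above can be sketched in a few lines of Python. This is only an illustration: the function and parameter names are hypothetical, rounding up is an assumption (the post does not say how fractional results are handled), and the clamp reflects the minimum and maximum instance counts you always define.

```python
import math


def desired_instances(queue_length, target_per_machine, min_instances, max_instances):
    """Estimate the instance count for a queue-based target.

    Hypothetical helper: rounding up is an assumption, not documented behavior.
    """
    if target_per_machine <= 0:
        raise ValueError("target_per_machine must be positive")
    raw = math.ceil(queue_length / target_per_machine)
    # Keep the result inside the configured instance range.
    return max(min_instances, min(max_instances, raw))


print(desired_instances(1000, 200, 2, 10))  # 5
print(desired_instances(1000, 200, 2, 4))   # 4 (capped by the maximum)
print(desired_instances(0, 200, 2, 10))     # 2 (held at the minimum)
```

For example, with 1,000 messages in the queue and a target of 200 messages per machine, the desired count is 5, unless the instance range caps it.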

Virtual Machine Autoscaling

For virtual machines, autoscale will turn machines in an availability set on or off. Because of the recent stop-without-billing work, you won’t pay for any machines that are stopped. Moreover, Virtual Machines are now billed by the minute. This means that if we turn a machine on for 30 minutes to handle additional load, you’ll only be charged for that half an hour!

Unlike autoscale for web sites or cloud services, autoscale for virtual machines cannot create new instances or delete existing ones. This means you must provision all of the machines you think you will need in advance, and add them to the availability set you want to autoscale. Once they are added, autoscale will manage which VMs are running by looking at your load.

At this time, there is no mechanism to choose which machines are turned on or off – if you have one or more machines that always must remain on, we recommend putting them in a separate availability set. 
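The practical consequence is that the provisioned machines in the availability set act as a hard ceiling. The sketch below illustrates that constraint in Python; the function name and the (to_start, to_stop) return shape are hypothetical, and it returns only counts because you cannot choose which machines are affected.

```python
from typing import Tuple


def plan_vm_actions(provisioned: int, running: int, desired: int) -> Tuple[int, int]:
    """Return (to_start, to_stop) for an availability set.

    Hypothetical helper: machines can only be started or stopped,
    never created or deleted, so 'desired' is capped at 'provisioned'.
    """
    achievable = max(0, min(desired, provisioned))
    if achievable > running:
        return (achievable - running, 0)
    return (0, running - achievable)


print(plan_vm_actions(provisioned=5, running=2, desired=4))  # (2, 0)
print(plan_vm_actions(provisioned=5, running=2, desired=8))  # (3, 0): only 5 exist
print(plan_vm_actions(provisioned=5, running=4, desired=2))  # (0, 2)
```

In the second call the load asks for 8 machines, but only 5 were provisioned, so at most 3 more can be started.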

How Fast is Autoscale?

How quickly we autoscale your service depends on the metric used to scale. For CPU on cloud services and virtual machines, the metric is the average CPU across all of the instances over the past hour. This means that if you have a sudden increase in traffic, scaling will not be immediate – it will take some time for the 60-minute running average to increase.

For queue depth, and CPU on web sites, the metric is checked every five minutes, and is not a running hourly average.
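To get a feel for the delay that an hourly running average introduces, here is a small Python simulation. It assumes one CPU sample per minute and a simple unweighted mean over the last 60 samples; neither detail is stated in this post, and the function name is hypothetical.

```python
from collections import deque


def minutes_until_average_exceeds(baseline, spike, threshold, window=60):
    """Minutes after a jump from 'baseline' to 'spike' CPU until the
    running average over 'window' one-minute samples exceeds 'threshold'.

    Returns None if the average never exceeds the threshold within one window.
    """
    # Start with a full window of baseline samples (assumed steady state).
    samples = deque([baseline] * window, maxlen=window)
    for minute in range(1, window + 1):
        samples.append(spike)  # the oldest baseline sample drops out
        if sum(samples) / window > threshold:
            return minute
    return None


print(minutes_until_average_exceeds(baseline=40, spike=100, threshold=80))  # 41
print(minutes_until_average_exceeds(baseline=40, spike=70, threshold=80))   # None
```

Under these assumptions, a service idling at 40% CPU that suddenly jumps to 100% would take 41 minutes for the hourly average to rise above an 80% target, which is why a higher minimum instance count is the safer answer to sudden bursts.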

For Cloud Services and Virtual Machines, we also expose controls so you can adjust the rate at which you scale up or down. You can set the step size (e.g. 2 instances at a time), or the wait time between each action (e.g. wait 30 minutes before taking a scale down action). For web sites, the autoscale speed is fixed based on the capabilities of the service.

Setting                 Cloud Services and Virtual Machines   Web Sites
Scale steps             1 – full range                        1 at a time
Scale up wait time      5 minutes* – 2 hours                  5 minutes
Scale down wait time    5 minutes* – 2 hours                  2 hours

* Although you can set 5 minutes, this does not guarantee a scale action will be taken every 5 minutes. Azure always waits for the previous deployment to complete before taking the next scale action. Thus, depending on how long it takes for your service to deploy, it can take 10 or even 15 minutes between scale actions even if you select the wait time as 5 minutes.

If you want to be more aggressive about scaling up than scaling down, then we recommend setting a higher “scale up by” value than “scale down by” value, or a lower scale up wait time than scale down wait time.
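As a rough illustration of how step size and wait time interact, the Python sketch below applies one scale action under a set of hypothetical settings. The class, field, and function names are invented for this example and do not correspond to any Azure API; the logic is a simplified model that does not account for deployment time.

```python
from dataclasses import dataclass


@dataclass
class ScaleSettings:
    """Hypothetical container for the controls described above."""
    min_instances: int
    max_instances: int
    scale_up_by: int
    scale_down_by: int
    scale_up_wait_min: int
    scale_down_wait_min: int


def next_instance_count(current, direction, minutes_since_last_action, settings):
    """Apply one scale step if the wait time for that direction has elapsed."""
    if direction == "up" and minutes_since_last_action >= settings.scale_up_wait_min:
        return min(settings.max_instances, current + settings.scale_up_by)
    if direction == "down" and minutes_since_last_action >= settings.scale_down_wait_min:
        return max(settings.min_instances, current - settings.scale_down_by)
    return current  # still waiting, or no action requested


# Aggressive scale up (2 at a time, 5-minute wait), cautious scale down.
settings = ScaleSettings(
    min_instances=2, max_instances=10,
    scale_up_by=2, scale_down_by=1,
    scale_up_wait_min=5, scale_down_wait_min=30,
)

print(next_instance_count(4, "up", 5, settings))     # 6
print(next_instance_count(4, "down", 10, settings))  # 4 (wait time not elapsed)
print(next_instance_count(4, "down", 30, settings))  # 3
```

With these settings, the service adds two instances as soon as five minutes have passed, but gives up only one instance at a time and only after half an hour.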


With this latest update of Azure, you can now, in just a few minutes, have Azure automatically adjust the number of instances you have running to keep your service performant at your desired cost.

Autoscale is a preview feature and will be free for a limited time, until General Availability. During preview, each subscription is limited to 10 autoscale rules across all of the resources we support (Web Sites, Cloud Services, or Virtual Machines). If you reach this limit, you can disable autoscale for one resource to enable it for another.

For more details of how autoscale works, check out our help topics: