Azure #CosmosDB: Introducing Per Minute (RU/m) provisioning to lower your cost, increase your performance

Posted on May 17, 2017

Senior Program Manager

Azure Cosmos DB Request unit per minute capability

Last week at our annual Build conference, we announced Azure Cosmos DB, our globally distributed, multi-model database service, along with a set of new capabilities that enable developers to build apps that are out of this world. As part of those capabilities, customers can now provision request unit (RU) throughput at per-minute granularity; we call it RU/m. This new option, currently in preview with a 50% discount, complements the existing request unit per second (RU/s) provisioning model. With RU/s, you get predictable performance at the granularity of a second, but you must also provision for spikes and bursty workloads to avoid throttling. With RU/m, you can consume more of what you provision and save on costs. No need to provision for peak anymore!

By combining provisioning per second with provisioning per minute, you now can:

  • Address workloads with large spikes
  • Fit the workload patterns that need minute granularity (common in IoT)
  • Gain flexibility in dev/test environments: developers want to start coding first, not work out how many request units they need
  • Substantially lower your per-second provisioning needs and save up to 60% in costs, since you don’t need to provision for your peak workloads anymore

With Azure Cosmos DB, our philosophy is to continuously innovate to deliver more value to our customers at a lower cost. This new option combines both. Here is what some of our early adopter customers say about this new and exciting capability:

  • “RU/m is a real game changer for us, we see more than doubled “performance” in our load tests that simulate typical user’s behavior. And more importantly we are not blocked during temporary spikes of user’s activity.” - Sergii Kram, Lead Software Engineer, Quest
  • “The RU/m feature is exactly what our project needed.  Previously we had to provision our service to four times our normal max load so that we didn’t throttle requests during spikes in traffic.  With the new RU/m feature, we were able to drastically reduce our DocDB cost and completely eliminate throttling during those spikes.” - Tyler Hennessy, Senior Software Engineer, Xbox
  • “We will definitely use this feature to avoid overprovisioning and save money. Our traffic pattern is very “spiky” (multiple parallel data collection threads dump data hourly) and enabling RU/m provisioning provided the same service quality with a much lower overall throughput. An iterative tuning approach of “adjust-and-monitor” allowed us to scale the setup to a usable production configuration in a few days.” - Andreas Schiffler, Senior Software Engineer, Windows Servicing & Delivery – Data Analytics (WSD DA)

How does RU/m work?

RU/m is aligned with RU/s. Most importantly, RU/m can be enabled with a click in the portal or with a single line of code using the SDKs. The amount of RU/m you get scales linearly with the RU/s you provision.

  • RU/m is billed hourly, in addition to reserved RU/s. You can think of RU/m as a flexible budget of RUs to consume within a minute. Pricing is fixed, so you always get low cost and financial predictability without the risk of variable pricing.
  • RU/m can be enabled at the container level, either through the SDKs (Node.js, Java or .NET) or through the portal (which also covers MongoDB API workloads).
  • For every 100 RU/s provisioned, you also get 1,000 RU/m provisioned (the ratio is 10x). This means that if you get 1,000 RU/s with 10,000 RU/m for a full month, you will spend $80/month ($60 for 1,000 RU/s+$20 for 10,000 RU/m with preview pricing).
  • In any given second, requests draw on your RU/m budget only once you have exceeded your per-second provisioning.
  • The per-minute budget is refilled every 60 seconds (UTC).
  • RU/m can be enabled only on containers with no more than 5,000 RU/s per partition provisioned.
  • You can decide which types of operations can access the RU/m budget. For example, you can reserve the RU/m budget for critical operations and disable RU/m for ad-hoc operations such as queries (see the documentation for details).
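The ratio and pricing rules above can be expressed as a few lines of arithmetic. The following is a minimal sketch, not an official calculator: the function and constant names are hypothetical, and the prices are the preview figures quoted in this post, which may not match the current pricing page.

```python
# Assumption: preview prices quoted in this post, in USD per month.
PRICE_PER_1000_RU_S = 60.0     # 1,000 RU/s
PRICE_PER_10000_RU_M = 20.0    # 10,000 RU/m (preview pricing)
RU_M_PER_RU_S = 10             # 1,000 RU/m for every 100 RU/s


def ru_per_minute_budget(ru_per_second: int) -> int:
    """RU/m budget that accompanies a given RU/s provisioning (10x ratio)."""
    return ru_per_second * RU_M_PER_RU_S


def monthly_cost(ru_per_second: int, ru_m_enabled: bool = True) -> float:
    """Estimated monthly cost in USD for one container."""
    cost = ru_per_second / 1000 * PRICE_PER_1000_RU_S
    if ru_m_enabled:
        cost += ru_per_minute_budget(ru_per_second) / 10000 * PRICE_PER_10000_RU_M
    return cost


def saving_vs_peak(ru_per_second: int, peak_ru_per_second: int) -> float:
    """Percent saved by RU/s + RU/m compared with provisioning RU/s for peak."""
    with_ru_m = monthly_cost(ru_per_second, ru_m_enabled=True)
    for_peak = monthly_cost(peak_ru_per_second, ru_m_enabled=False)
    return (1 - with_ru_m / for_peak) * 100


if __name__ == "__main__":
    print(ru_per_minute_budget(1000))   # 10000
    print(monthly_cost(1000))           # 80.0
    print(round(saving_vs_peak(10000, 50000)))  # 73
```

Under these assumed prices, 10,000 RU/s with RU/m enabled costs $800 per month, against $3,000 for 50,000 RU/s provisioned for peak, which is roughly a 73% saving.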

A concrete example

Below is a concrete example, in which a customer can provision 10k RU/s with 100k RU/m, saving 73% in cost against provisioning for peak (at 50k RU/s). During a 90-second period on a collection that has 10,000 RU/s and 100,000 RU/m provisioned:

  • Second 1: The RU/m budget is set at 100,000
  • Second 3: During that second the consumption of request units was 11,010 RUs, 1,010 RUs above the RU/s provisioning. Therefore, 1,010 RUs are deducted from the RU/m budget. 98,990 RUs are available for the next 57 seconds in the RU/m budget.
  • Second 29: During that second, a large spike happened (>4x the per second provisioning) and the consumption of request units was 46,920 RUs. 36,920 RUs are deducted from the RU/m budget that dropped from 92,323 RUs (28th second) to 55,403 RUs (29th second).
  • Second 61: RU/m budget is refilled to 100,000 RUs.
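The bookkeeping in this walkthrough can be reproduced with a small simulation. This is a simplified sketch under stated assumptions, not the service's actual algorithm: the function name is hypothetical, throttling is modeled per RU rather than per request, and the budget is refilled at seconds 1, 61, 121 and so on instead of on UTC minute boundaries. The post does not list consumption for seconds 4 to 28, so a single made-up overage of 6,667 RUs is placed at second 10 purely so that the budget reaches the quoted 92,323 RUs by second 28.

```python
from typing import Iterable, List, Tuple


def simulate_ru_m(
    consumption_per_second: Iterable[int],
    ru_per_second: int,
    ru_per_minute: int,
) -> List[Tuple[int, int, int]]:
    """Return (second, remaining RU/m budget, throttled RUs) for each second.

    Seconds are numbered from 1. Only consumption above the RU/s
    provisioning is deducted from the RU/m budget.
    """
    results = []
    budget = ru_per_minute
    for second, consumed in enumerate(consumption_per_second, start=1):
        if (second - 1) % 60 == 0:
            budget = ru_per_minute            # refill every 60 seconds
        overage = max(0, consumed - ru_per_second)
        from_budget = min(overage, budget)    # cannot spend more than is left
        budget -= from_budget
        throttled = overage - from_budget     # demand that could not be served
        results.append((second, budget, throttled))
    return results


# 90 seconds at exactly the provisioned rate, plus the spikes from the text.
load = [10_000] * 90
load[2] = 11_010             # second 3 (from the post)
load[9] = 10_000 + 6_667     # second 10 (hypothetical, see lead-in)
load[28] = 46_920            # second 29 (from the post)

timeline = simulate_ru_m(load, ru_per_second=10_000, ru_per_minute=100_000)
for second in (3, 28, 29, 61):
    print(timeline[second - 1])
```

With this input the remaining budget is 98,990 after second 3, 92,323 after second 28, 55,403 after second 29, and back to 100,000 at second 61, with no throttled RUs.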

Azure Cosmos DB RU Consumption and Provisioning

Enabling/Disabling RU/m

You can enable RU/m at the container level through the SDK or the portal. In the portal, you only need to click Scale, select the container you want, and enable RU/m.


To learn how to provision RU/m through the SDK, please refer to the documentation. Currently RU/m is available for the following SDKs:

  • .NET 1.14.1
  • Java 1.11.0
  • Node.js 1.12.0
  • Python 2.2.0

Support for other SDKs will be added soon.

Scenarios and Impact with Early Adoption Customers

During our beta preview, we identified several illustrative scenarios that show how much performance improvement and cost savings our customers achieved with RU/m at scale and worldwide. By following these scenarios and our multi-step approach to gradually optimizing throughput, we hope you can replicate the same improvements. The documentation shows how to use the portal metrics to monitor throttling and RU consumption.

Example 1: Leverage RU/m to reduce throttling

In an e-commerce scenario, a retailer may expect spikes when a merchant registers a new batch of items in their inventory. A customer had a container with 400,000 RU/s provisioned, and 1.68% of the requests were throttled due to insufficient provisioning for spikes.

  1. As soon as RU/m provisioning was enabled, this customer experienced an 88% drop in throttled requests (down to 0.2%).
  2. As a second step, this customer lowered their per-second provisioned capacity from 400,000 RU/s to 300,000 RU/s, with a resulting throttling rate of 0.25%.
  3. As a third step, the customer lowered throughput provisioning to 200,000 RU/s (and 2 million RU/m), with a resulting throttling rate of 1.12%.
  4. Finally, their ideal provisioning level was found at 250K RU/s with 2.5 million RU/m.


  • 17% cost saving on provisioning
  • 80% of throttling eliminated

Request Consumption 5min-granularity

Example 2: Reduce throttling and lower provisioning costs with a spiky workload

In this case, a customer was storing telemetry data for devices, with very spiky needs due to sporadic queries. This customer had a partitioned container with 100,000 RU/s provisioned. Due to the spiky needs and despite high provisioning, this customer experienced some throttling (0.0109% of requests were throttled).

  1. Right after enabling RU/m, the ratio of throttled requests dropped to 0.000567%, representing 95% elimination of throttling.
  2. As a second step, they lowered the provisioning to 80K RU/s + 800K RU/m and were still able to hold nearly the same ratio, with 0.000677% of requests throttled.
  3. As a third step, they decreased the provisioning to 50K RU/s + 500K RU/m. Throttling increased to 0.0121%, so the customer decided to raise the provisioning back up to 60K RU/s + 600K RU/m. Throttling dropped back to 0.00199%.


  • 20% cost saving on provisioning
  • 80% of throttling eliminated
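The throttling percentages quoted in these examples are relative reductions in the throttled-request ratio. A minimal sketch of that arithmetic, using the ratios reported above (the helper name is hypothetical):

```python
def relative_reduction(before: float, after: float) -> float:
    """Percent reduction when a ratio goes from `before` to `after`."""
    return (before - after) / before * 100


if __name__ == "__main__":
    # Example 1, step 1: 1.68% of requests throttled, then 0.2%.
    print(round(relative_reduction(1.68, 0.2), 1))
    # Example 2, step 1: 0.0109%, then 0.000567%.
    print(round(relative_reduction(0.0109, 0.000567), 1))
    # Example 2, final configuration: 0.0109%, then 0.00199%.
    print(round(relative_reduction(0.0109, 0.00199), 1))
```

These work out to roughly 88%, 95% and 82%, in line with the "88% drop", "95% elimination" and "80% of throttling eliminated" figures reported for those steps.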

Example 2 - Request Consumption

Example 3: Lower provisioning cost and eliminate small throttling

A customer from the gaming industry stored data with mostly predictable access and just a few small spikes. They had provisioned 8,000 RU/s for a single partition and experienced a small amount of throttling (0.000053% of requests were throttled). RU/m was a perfect capability to eliminate any throttling and give the customer peace of mind. Working together, we also quickly realized that their workload had the potential to be further optimized.

  1. First, to enable RU/m, we had to lower their single partition provisioning to 5,000 RU/s (RU/m works only on partitions with a maximum of 5,000 RU/s). Despite a drop of 3,000 RU/s in provisioning, we were able to eliminate all throttling.
  2. Since the consumption of RU/m was minimal, this was a signal that we could lower the provisioning to 4,000 RU/s while keeping RU/m. They didn’t experience any throttling and were able to use more than 18% of what they provisioned.
  3. As seen in the graph below, we ended up provisioning only 2,000 RU/s with 20,000 RU/m while eliminating all the throttling.
    Their average cost of consumed RU was lower than any existing cloud service with throughput provisioning or consumption. Their average cost amounted to less than $0.10 per million RUs consumed, 75% cheaper than object store read transactions.

Example 3


  • 53% cost saving on provisioning
  • 100% of throttling eliminated (initially at low level)

Example Use-Cases Summary

                         Example 1                        Example 2                     Example 3
Initial throughput       400,000 RU/s                     100,000 RU/s                  8,000 RU/s
Final throughput         250,000 RU/s + 2,500,000 RU/m    60,000 RU/s + 600,000 RU/m    2,000 RU/s + 20,000 RU/m
Cost saving              17%                              20%                           53%
Throttling eliminated    80%                              80%                           100%

Our vision is to be the most trusted database service for all modern applications. We want to enable developers to truly transform the world we live in through the apps they build, which is even more important than the individual features we put into Azure Cosmos DB. We spend countless hours talking to customers every day and adapting Azure Cosmos DB to make the experience truly stellar and fluid. We hope that the RU/m capability will enable you to do more and will make your development and maintenance even easier!

So, what are the next steps you should take?

  • First, understand the core concepts of Azure Cosmos DB
  • Learn more about RU/m by reading the documentation:
    • How RU/m works
    • Enabling and disabling RU/m
    • Good use cases
    • Optimize your provisioning
    • Specify access to RU/m for specific operations
  • Visit the pricing page to understand billing implications

If you need any help or have questions or feedback, please reach out to us. Stay up-to-date on the latest Azure Cosmos DB news (#CosmosDB) and features by following us on Twitter @AzureCosmosDB and joining our LinkedIn Group.

About Azure Cosmos DB

Azure Cosmos DB started as “Project Florence” in late 2010 to address developer pain points faced by large-scale applications inside Microsoft. Observing that the challenges of building globally distributed apps are not unique to Microsoft, in 2015 we made the first generation of this technology available to Azure developers in the form of Azure DocumentDB. Since that time, we’ve added new features and introduced significant new capabilities. Azure Cosmos DB is the result. It represents the next big leap in globally distributed, at-scale cloud databases.