During //Build 2016, we announced a number of major updates to the Azure DocumentDB service. One of the biggest is that DocumentDB now has built-in, server-side support for partitioning. Partitioned collections are a great option for applications that ingest large volumes of data at high rates, or that require high-throughput, low-latency access to data. In this blog post, we cover the top 10 things you need to know about the new partitioning support and how you can build massively scalable applications and services using DocumentDB.
1. What are Collections in DocumentDB?
In DocumentDB, you can store and query schema-less JSON documents with order-of-millisecond response times at any scale. DocumentDB provides containers for data called collections. Each collection can be reserved with throughput ranging from hundreds up to millions of request units per second. The provisioned throughput can be adjusted throughout the life of a collection to adapt to the changing processing needs and access patterns of your application.
Collections are logical resources and can span 1-N physical partitions or servers. The number of partitions is determined by DocumentDB based on the storage size and the provisioned throughput of the collection. Prior to this release, collections mapped 1:1 to partitions.
2. What are Partitions in DocumentDB?
Every partition in DocumentDB has a fixed amount of SSD-backed storage associated with it, and is replicated for high availability (with 99.99% availability SLAs). Partition management is fully performed by Azure DocumentDB, and you, the user, do not have to write complex code or manage your partitions. The provisioned throughput of a collection is distributed evenly among the partitions within a collection.
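As a rough illustration of the even split described above, the arithmetic can be sketched in a few lines of Python. The function name and the sample numbers are hypothetical; the actual partition count is chosen by DocumentDB, not by you.

```python
def throughput_per_partition(provisioned_rus: int, partition_count: int) -> float:
    """Even split of a collection's provisioned RU/s across its partitions."""
    if partition_count < 1:
        raise ValueError("a collection has at least one partition")
    return provisioned_rus / partition_count


# Hypothetical collection: 100,000 RU/s spread over 10 partitions.
per_partition = throughput_per_partition(100_000, 10)
print(per_partition)  # 10000.0
```

One consequence of this split is that a workload concentrated on a single partition can only use that partition's share of the throughput, not the collection's total.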
3. What are Partition Keys in DocumentDB?
When you create a collection, you can now specify a partition key property. This is the JSON property (or path) within your documents that can be used by DocumentDB to distribute data among multiple partitions. DocumentDB will hash the partition key value and use the hashed result to determine the partition in which the JSON document will be stored. All documents with the same partition key will be stored in the same partition, but multiple partition keys may share the same partition.
For example, let's say that you're storing JSON data about employees and your partition key is "department." Then all documents with the value of "department" equal to "engineering" will be stored in the same partition. Similarly, all documents with "department" of "marketing" will be stored in the same partition.
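To make the idea concrete, here is a minimal Python sketch of hash-based placement using the employee example. It is only an illustration: the hash function DocumentDB uses internally is not described in this post, so MD5, the modulo step, the partition count of 4, and the `partition_for` helper are all assumptions.

```python
import hashlib


def partition_for(partition_key_value: str, partition_count: int) -> int:
    """Map a partition key value to a partition index by hashing it.

    MD5 and the modulo step are illustrative stand-ins, not the scheme
    DocumentDB actually uses.
    """
    digest = hashlib.md5(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


# Hypothetical employee documents, partitioned on "department".
employees = [
    {"id": "1", "name": "Ana", "department": "engineering"},
    {"id": "2", "name": "Raj", "department": "engineering"},
    {"id": "3", "name": "Mei", "department": "marketing"},
]

# Group document ids by the partition their department hashes to.
placement = {}
for doc in employees:
    index = partition_for(doc["department"], 4)
    placement.setdefault(index, []).append(doc["id"])

print(placement)
```

Because the hash is deterministic, documents 1 and 2 always land in the same partition. Document 3 may or may not share it, which mirrors the point that multiple partition keys can share a partition.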
For more information see Partitioning in DocumentDB.
4. Do I have to manage partitioning?
No. Partition management is handled entirely by DocumentDB: the service determines how many partitions a collection needs and places documents in them based on the partition key. You only choose the partition key property when you create the collection.
5. Should I partition my collection?
While partitioning is now a core feature of DocumentDB, you can still create a "single partition" collection by omitting the partition key during collection creation. In order to decide if you should partition your collection, here are some key points to consider.
- Single-partition collections have lower price options and the ability to execute queries and perform transactions across all collection data. They have the scalability and storage limits of a single partition (10 GB and 10,000 RU/s). You do not have to specify a partition key for these collections. For scenarios that do not need large volumes of storage or throughput, single-partition collections are a good fit.
- Partitioned collections can span multiple partitions and support very large amounts of storage and throughput. You must specify a partition key for these collections.
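The size and throughput side of this decision can be sketched as a simple check against the single-partition limits quoted above. The function and constant names are hypothetical, and the check deliberately ignores the other factors in the list, such as pricing and the need for transactions across all collection data.

```python
# Single-partition limits as stated in this post.
SINGLE_PARTITION_MAX_STORAGE_GB = 10
SINGLE_PARTITION_MAX_RUS = 10_000


def needs_partitioned_collection(expected_storage_gb: float, expected_rus: int) -> bool:
    """Return True when the expected workload exceeds what one partition offers."""
    return (
        expected_storage_gb > SINGLE_PARTITION_MAX_STORAGE_GB
        or expected_rus > SINGLE_PARTITION_MAX_RUS
    )


# A small catalog fits in a single partition; a telemetry store does not.
print(needs_partitioned_collection(2, 1_000))     # False
print(needs_partitioned_collection(500, 50_000))  # True
```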
6. How do I choose the right partition key property?
It is important to choose a partition key property that has a large number of distinct values and lets you distribute your workload evenly across those values. As a natural artifact of partitioning, requests involving the same partition key are limited by the maximum throughput of a single partition. Additionally, the storage size for documents belonging to the same partition key is limited to 10 GB. An ideal partition key is one that appears frequently as a filter in your queries and has sufficient cardinality to ensure your solution is scalable.
A partition key is also the boundary for transactions in DocumentDB's stored procedures and triggers. You should choose the partition key so that documents that occur together in transactions share the same partition key value. For more information see Designing for Partitioning.
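One way to sanity-check a candidate partition key before committing to it is to measure its cardinality and skew over a sample of your documents. The sketch below is a hypothetical helper, not part of any DocumentDB SDK, and it uses document counts as a stand-in for storage and request volume.

```python
from collections import Counter


def partition_key_report(docs, key):
    """Summarize how evenly a candidate partition key spreads a sample of documents."""
    if not docs:
        raise ValueError("need at least one sample document")
    counts = Counter(doc[key] for doc in docs)
    hottest_value, hottest_count = counts.most_common(1)[0]
    return {
        "distinct_values": len(counts),
        "hottest_value": hottest_value,
        "hottest_share": hottest_count / len(docs),
    }


# Hypothetical sample: "department" is skewed, "id" is not.
sample = [
    {"id": "1", "department": "engineering"},
    {"id": "2", "department": "engineering"},
    {"id": "3", "department": "engineering"},
    {"id": "4", "department": "marketing"},
]

print(partition_key_report(sample, "department"))
# {'distinct_values': 2, 'hottest_value': 'engineering', 'hottest_share': 0.75}
print(partition_key_report(sample, "id")["hottest_share"])  # 0.25
```

A key with few distinct values or a large hottest share concentrates load on one partition key, which runs into the per-key throughput and storage limits described above. A high-cardinality key is not automatically the right choice, though: it still needs to match your query filters and transaction boundaries.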
7. How do I create a collection with TBs of data or handle billions of requests per second?
By default, partitioned collections can scale up to 250 GB of storage and 250,000 request units per second. If you need more storage or throughput per collection, please contact Azure Support to have these limits increased for your account. DocumentDB supports collections that hold tens of TB of data and are provisioned to handle millions of request units per second.
8. What versions of the SDKs support partitioning?
Partitioning is supported via the REST API version 2015-12-16, and SDK versions 1.6.0 and newer in .NET, Node.js, Java, and Python. The programming model introduces partition keys during collection creation and as an argument for read, write, query, and execute stored-procedure operations on documents. You can also create and manage collections via the Azure Portal.
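To show where the partition key appears in the programming model, here is a hedged Python sketch that only builds REST request headers and bodies; it makes no network calls and omits authentication headers. The header names (`x-ms-version`, `x-ms-offer-throughput`, `x-ms-documentdb-partitionkey`) and the `partitionKey` body shape reflect our understanding of the REST API and should be verified against the REST reference. The helper names and sample values are hypothetical.

```python
import json

API_VERSION = "2015-12-16"


def create_collection_request(collection_id, partition_key_path, throughput_rus):
    """Build headers and body for creating a partitioned collection (assumed shape)."""
    headers = {
        "x-ms-version": API_VERSION,
        "x-ms-offer-throughput": str(throughput_rus),
        "Content-Type": "application/json",
    }
    body = {
        "id": collection_id,
        # The partition key is declared once, at collection creation.
        "partitionKey": {"paths": [partition_key_path], "kind": "Hash"},
    }
    return headers, json.dumps(body)


def document_request_headers(partition_key_value):
    """Build headers for a document operation that targets one partition key value."""
    return {
        "x-ms-version": API_VERSION,
        # Assumed format: the key value is passed as a JSON array.
        "x-ms-documentdb-partitionkey": json.dumps([partition_key_value]),
    }


headers, body = create_collection_request("employees", "/department", 20_000)
print(body)
print(document_request_headers("engineering"))
```

The SDKs wrap the same two ideas: a partition key definition supplied when the collection is created, and a partition key value supplied with each document operation.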
9. How do I migrate to partitioned collections?
You can use the DocumentDB Data Migration Tool to migrate to a partitioned collection. First export your old collection(s) to JSON files, then import them into a new DocumentDB collection created with a partition key definition and more than 10,000 RU/s of throughput. If you use client-side partition resolvers, you might be able to simplify your application code by consolidating your data into a single collection using server-side partitioning.
10. How does pricing work with partitioned collections?
Along with partitioned collections, we also introduced a new pricing option for DocumentDB that decouples storage size from throughput, enabling pay-as-you-go storage and a user-defined throughput level that you can customize to your application's needs. Learn more about the pricing options here.
Get started scaling your database to handle massive storage sizes and throughput with DocumentDB partitioned collections, using the Azure Portal or one of the supported SDKs. If you need any help or have questions or feedback, please reach out to us on the developer forums on Stack Overflow or schedule a 1:1 chat with the DocumentDB engineering team.
Stay up-to-date on the latest DocumentDB news and features by following us on Twitter @DocumentDB.