Earlier this year, we deployed a flat network for Windows Azure across all of our datacenters to create Flat Network Storage (FNS) for Windows Azure Storage. We used a flat network design in order to provide very high bandwidth network connectivity for storage clients. This new network design and the resulting bandwidth improvements allow us to support Windows Azure Virtual Machines, where we store VM persistent disks as durable network attached blobs in Windows Azure Storage. Additionally, the new network design enables scenarios such as MapReduce and HPC that can require significant bandwidth between compute and storage.
From the start of Windows Azure, we decided to separate customer VM-based computation from storage, allowing each of them to scale independently, making it easier to provide multi-tenancy, and making it easier to provide isolation. To make this work for the scenarios we need to address, a quantum leap in network scale and throughput was required. This resulted in FNS, where the Windows Azure Networking team (under Albert Greenberg) along with the Windows Azure Storage, Fabric and OS teams made and deployed several hardware and software networking improvements.
The changes to new storage hardware and to a high bandwidth network comprise the significant improvements in our second generation storage (Gen 2), when compared to our first generation (Gen 1) hardware, as outlined below:
- Storage Node Network Speed
- Networking Between Compute and Storage
- Storage Device Used for Journaling
- Hardware Load Balancer
- Software Load Balancer
The deployment of our Gen 2 SKU, along with software improvements, provides significant bandwidth between compute and storage using a flat network topology. The specific implementation of our flat network for Windows Azure is referred to as the “Quantum 10” (Q10) network architecture. Q10 provides a fully non-blocking, fully meshed 10 Gbps network with an aggregate backplane in excess of 50 Tbps of bandwidth for each Windows Azure datacenter. Another major improvement in reliability and throughput is the move from a hardware load balancer to a software load balancer. The storage architecture and design described here have been tuned to fully leverage the new Q10 network to provide flat network storage for Windows Azure Storage.
With these improvements, we are pleased to announce an increase in the scalability targets for Windows Azure Storage, where all new storage accounts are created on the Gen 2 hardware SKU. These new scalability targets apply to all storage accounts created after June 7th, 2012. Storage accounts created before this date have the prior scalability targets described here. Unfortunately, we do not offer the ability to migrate storage accounts, so only storage accounts created after June 7th, 2012 have these new scalability targets.
To find out the creation date of your storage account, you can go to the new portal, click on the storage account, and see the creation date in the quick glance section on the right.
Storage Account Scalability Targets
By the end of 2012, we will have finished rolling out the software improvements for our flat network design. This will provide the following scalability targets for a single storage account created after June 7th, 2012:
- Capacity – Up to 200 TB
- Transactions – Up to 20,000 entities/messages/blobs per second
- Bandwidth for a Geo Redundant storage account
  - Ingress - up to 5 gigabits per second
  - Egress - up to 10 gigabits per second
- Bandwidth for a Locally Redundant storage account
  - Ingress - up to 10 gigabits per second
  - Egress - up to 15 gigabits per second
Storage accounts have geo-replication on by default to provide what we call Geo Redundant Storage. Customers can turn geo-replication off to use what we call Locally Redundant Storage, which results in a discounted price relative to Geo Redundant Storage and higher ingress and egress targets (by the end of 2012) as described above. For more information on Geo Redundant Storage and Locally Redundant Storage, please see here.
Note that the actual transaction and bandwidth targets achieved by your storage account will depend heavily on the size of objects, access patterns, and the type of workload your application exhibits. To go above these targets, a service should be built to use multiple storage accounts and to partition its blob containers, tables, queues, and objects across those storage accounts. By default, a single Windows Azure subscription gets 20 storage accounts. However, you can contact customer support to get more storage accounts if you need to store more data than that allows (e.g., petabytes).
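As a rough illustration of spreading a service across multiple storage accounts, the sketch below estimates how many accounts a workload would need from the capacity and transaction targets above, and routes each object key to an account with a stable hash. The function names, the account names, and the hash-modulo routing scheme are assumptions made for illustration; they are not part of any Windows Azure API, and a real service may need a different placement scheme.

```python
import hashlib
import math
from typing import Sequence

# Per-account targets quoted in this post.
ACCOUNT_CAPACITY_TB = 200
ACCOUNT_TRANSACTIONS_PER_SEC = 20_000


def accounts_needed(total_tb: float, peak_tps: float) -> int:
    """Smallest account count that covers both the capacity and transaction targets."""
    by_capacity = math.ceil(total_tb / ACCOUNT_CAPACITY_TB)
    by_transactions = math.ceil(peak_tps / ACCOUNT_TRANSACTIONS_PER_SEC)
    return max(1, by_capacity, by_transactions)


def pick_account(key: str, accounts: Sequence[str]) -> str:
    """Map an object key to one account using a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(accounts)
    return accounts[index]


if __name__ == "__main__":
    # Hypothetical workload: 1,000 TB of data at a peak of 50,000 transactions/sec.
    count = accounts_needed(total_tb=1000, peak_tps=50_000)
    accounts = [f"mystorage{i:02d}" for i in range(count)]  # made-up account names
    print(count, pick_account("photos/2012/cat.jpg", accounts))
```

Note that plain hash-modulo routing remaps most keys when the number of accounts changes, so a service that expects to add accounts later may prefer a lookup table or consistent hashing.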
Partition Scalability Targets
Within a storage account, all of the objects are grouped into partitions as described here. Therefore, it is important to understand the performance targets of a single partition for our storage abstractions. These targets are listed below; the Queue and Table throughputs were achieved using an object size of 1KB:
- Single Queue – all of the messages in a queue are accessed via a single queue partition. A single queue is targeted to be able to process:
  - Up to 2,000 messages per second
- Single Table Partition – a table partition is all of the entities in a table with the same partition key value, and tables usually have many partitions. The throughput target for a single table partition is:
  - Up to 2,000 entities per second
  - Note that this is for a single partition, not a single table. Therefore, a table with good partitioning can process up to 20,000 entities/second, which is the overall account target described above.
- Single Blob – the partition key for blobs is the “container name + blob name”, so we can partition blobs down to a single blob per partition to spread out blob access across our servers. The target throughput of a single blob is:
  - Up to 60 MBytes/sec
The above throughputs are the high end targets. What can be achieved by your application very much depends upon the size of the objects being accessed, the operation types (workload) and the access patterns. We encourage all services to test the performance at the partition level for their workload.
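To make the relationship between partition targets and account targets concrete, here is a small back-of-the-envelope sketch. It assumes load is spread evenly across partitions and that "MBytes" and "gigabits" are decimal units (10^6 bytes and 10^9 bits); both are simplifying assumptions, and the function names are illustrative only.

```python
import math

# Targets quoted in this post.
PARTITION_ENTITIES_PER_SEC = 2_000   # single table partition
ACCOUNT_ENTITIES_PER_SEC = 20_000    # whole storage account
BLOB_MBYTES_PER_SEC = 60             # single blob


def table_throughput_ceiling(active_partitions: int) -> int:
    """Upper bound on entities/sec for a table, assuming evenly spread load."""
    if active_partitions < 1:
        raise ValueError("need at least one partition")
    return min(active_partitions * PARTITION_ENTITIES_PER_SEC,
               ACCOUNT_ENTITIES_PER_SEC)


def blobs_needed_for_bandwidth(gigabits_per_sec: float) -> int:
    """Blobs that must be accessed in parallel to reach a bandwidth target."""
    mbytes_per_sec = gigabits_per_sec * 1000 / 8  # decimal units assumed
    return math.ceil(mbytes_per_sec / BLOB_MBYTES_PER_SEC)


if __name__ == "__main__":
    print(table_throughput_ceiling(4))     # 4 partitions stay under the account target
    print(table_throughput_ceiling(50))    # capped by the account target
    print(blobs_needed_for_bandwidth(10))  # blobs needed to approach 10 gigabits/sec
```

Under these assumptions, a table needs at least 10 evenly loaded partitions before the account target becomes the limit, and no single blob can reach the account bandwidth targets on its own.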
When your application reaches the limit of what a partition can handle for your workload, it will start to get back “503 Server Busy” or “500 Operation Timeout” responses. When this occurs, the application should use exponential backoff for retries. Exponential backoff allows the load on the partition to decrease and smooths out spikes in traffic to that partition.
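A minimal sketch of such a retry loop follows. It assumes the caller supplies a hypothetical `operation` callable that performs one storage request and returns a `(status_code, body)` pair; this is not the storage client library's API. The delay values and the added jitter are illustrative choices rather than recommendations from this post, and the sketch treats every 500 as retryable, which a real client may want to narrow to the timeout case.

```python
import random
import time

RETRYABLE_STATUS = {500, 503}  # "500 Operation Timeout" and "503 Server Busy"


def call_with_backoff(operation, max_attempts=6, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call operation() until it returns a non-retryable status or attempts run out.

    operation is a hypothetical callable returning (status_code, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = operation()
        if status not in RETRYABLE_STATUS:
            return status, body
        if attempt == max_attempts - 1:
            break  # out of attempts; do not sleep again
        # Double the delay on each retry, capped at max_delay.
        delay = min(max_delay, base_delay * (2 ** attempt))
        # Jitter (an addition, not from the post): wait 50% to 100% of the delay
        # so that many clients do not retry in lockstep.
        sleep(delay * (0.5 + rng() / 2))
    return status, body  # last retryable response; the caller decides what to do
```

Passing `sleep` and `rng` as parameters keeps the function easy to exercise in tests without real waiting.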
In summary, we are excited to announce our first step towards providing flat network storage. We plan to continue to invest in improving bandwidth between compute and storage as well as increase the scalability targets of storage accounts and partitions over time.
Brad Calder and Aaron Ogus
Windows Azure Storage