We recently published a paper describing the internal details of Windows Azure Storage at the 23rd ACM Symposium on Operating Systems Principles (SOSP).
The paper can be found here. The conference also posted a video of the talk here. The slides for the talk are available here.
The paper describes how we provision and scale out capacity within and across data centers via storage stamps, and how the storage location service is used to manage our stamps and storage accounts. It then focuses on the details of the three layers of our architecture within a stamp (front-end layer, partition layer, and stream layer): why we have these layers, what their functionality is, how they work, and how the two replication engines (intra-stamp and inter-stamp) operate. In addition, the paper summarizes some of the design decisions and tradeoffs we have made, as well as lessons learned from building this large-scale distributed system.
A key design goal for Windows Azure Storage is to provide Consistency, Availability, and Partition tolerance (CAP) (all three of these together, instead of just two) for the types of network partitioning we expect to see in our architecture. This is achieved by co-designing the partition layer and stream layer to provide strong consistency and high availability while remaining partition tolerant for the common types of partitioning/failures that occur within a stamp, such as node-level and rack-level network partitioning.
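To make that co-design concrete, here is a minimal sketch of the stream-layer behavior described above; the names (Node, Extent, Stream, allocate) are hypothetical, not WAS code. An append commits only when every replica of the current extent acknowledges it; when a replica becomes unreachable, the extent is sealed at its last committed record and appends continue on a fresh extent allocated on healthy nodes, so writes stay consistent and available through a node-level partition.

```python
REPLICAS = 3  # the paper's intra-stamp replication factor

class Node:
    """A storage node that can become unreachable (partitioned away)."""
    def __init__(self, name):
        self.name, self.reachable, self.data = name, True, []

    def write(self, record):
        if not self.reachable:
            return False  # no acknowledgement from this replica
        self.data.append(record)
        return True

class Extent:
    """The unit of replication; immutable once sealed."""
    def __init__(self, nodes):
        self.nodes, self.sealed = nodes, False

    def append(self, record):
        # An append commits only when every replica acknowledges it.
        return all(node.write(record) for node in self.nodes)

class Stream:
    """An ordered list of extents; only the last one accepts appends."""
    def __init__(self, allocate):
        self.allocate = allocate  # hypothetical: picks REPLICAS healthy nodes
        self.extents = [Extent(allocate())]

    def append(self, record):
        if self.extents[-1].append(record):
            return
        # A replica did not acknowledge: seal the extent at its last
        # committed record, allocate a fresh extent on healthy nodes, and
        # retry. Appends stay available; re-replicating the sealed extent
        # happens in the background (not modeled here).
        self.extents[-1].sealed = True
        self.extents.append(Extent(self.allocate()))
        self.extents[-1].append(record)

# Usage: a node drops out mid-stream, yet the next write still commits.
nodes = [Node("n%d" % i) for i in range(6)]
def allocate():
    return [n for n in nodes if n.reachable][:REPLICAS]
stream = Stream(allocate)
stream.append("record-1")
nodes[0].reachable = False        # simulate a node-level partition
stream.append("record-2")         # seals extent 0, continues on extent 1
print(len(stream.extents), stream.extents[0].sealed)  # -> 2 True
```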
In this short conference talk we touch on the key details of how the partition layer provides an automatically load-balanced object index that scales to hundreds of billions of objects per storage stamp, how the stream layer performs its intra-stamp replication and deals with failures, and how the two layers are co-designed to provide consistency, availability, and partition tolerance for node- and rack-level network partitioning and failures.
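As an illustration of the first point, the sketch below (again hypothetical names, not the actual partition layer implementation) shows the core idea behind a range-partitioned object index: each key range is served by one partition server, lookups binary-search the sorted range boundaries, and a hot range can be split and its upper half reassigned to another server for load balancing.

```python
import bisect

class RangeIndex:
    """Maps object keys to the partition server serving their key range."""
    def __init__(self, first_server):
        self.lows = [""]               # inclusive low key of each range
        self.servers = [first_server]  # server assigned to each range

    def lookup(self, key):
        # Binary-search the sorted low-key boundaries for the owning range.
        return self.servers[bisect.bisect_right(self.lows, key) - 1]

    def split(self, split_key, new_server):
        # Split the range containing split_key and hand the upper half to
        # another server -- the basic load-balancing move.
        i = bisect.bisect_right(self.lows, split_key)
        self.lows.insert(i, split_key)
        self.servers.insert(i, new_server)

# Usage: everything starts on one server; splitting offloads a hot range.
idx = RangeIndex("ps1")
print(idx.lookup("account42/blob/photo.jpg"))  # -> ps1
idx.split("m", "ps2")                          # keys >= "m" move to ps2
print(idx.lookup("account42/blob/photo.jpg"))  # -> ps1 (below "m")
print(idx.lookup("storage-demo/table/row7"))   # -> ps2 (at or above "m")
```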
Brad Calder