Windows Azure Storage at PDC 2009

Last week at PDC 2009, we announced several new features for Windows Azure Storage. Windows Azure Storage enables applications to store and manipulate large objects and files in the cloud via Blobs, store and manipulate service state via Tables, and provide reliable delivery of messages using Queues. We also announced two new features coming soon to storage:

  • Windows Azure XDrive – This allows your Windows Azure compute applications running in our cloud to use the existing NTFS APIs to store their data in a durable drive. The drive is backed by a Windows Azure Page Blob formatted as a single NTFS volume VHD. The Page Blob can be mounted as a drive within the Windows Azure cloud, where all non-buffered/flushed NTFS writes are made durable to the drive (Page Blob). If the application using the drive crashes, the data persists in the Page Blob, and the drive can be remounted when the application instance is restarted, or remounted elsewhere for a different application instance to use. Since the drive is an NTFS formatted Page Blob, you can also use the standard blob interfaces to upload and download your NTFS VHDs to and from the cloud.
  • Geo-Replication – We have been working hard to provide geo-replication between our geo-locations for your data. We described at PDC 2008 how we replicate your data to keep multiple copies within the geo-location your storage account is allocated to. When you update your storage account, we return success once the update has been committed to multiple copies in that geo-location. This keeps your data durable at the geo-location, and this hasn’t changed. Geo-replication adds to this: after your update has been committed to your storage account in its primary geo-location, we quickly geo-replicate the update to another geo-location in the same geo-region (e.g., within the US, within Europe, within Asia). This is an important feature for Geo-Disaster Contingency Planning, since it keeps multiple copies of your data in different geo-locations.

For more details on Windows Azure XDrive and Geo-Replication as well as the new Blob features listed below, please see the talk and slides from PDC 2009 on “Windows Azure Blob and Drive Deep Dive”.

At PDC 2009 there were also two other talks focused on Windows Azure Storage that you may want to check out:

  • Windows Azure Tables and Queues Deep Dive – Deep dive into key areas for Windows Azure Tables and Queues. This includes how to choose partitioning keys and what makes for fast and efficient queries for Tables. For Queues, we describe some future features we will be providing, such as (a) removing the time limit for how long a message can stay in the queue, (b) removing the time limit for how long the invisibility time can be, and (c) allowing you to change the invisibility time of a dequeued message at any time.
  • Building Scalable and Reliable Applications with Windows Azure – In terms of Windows Azure Storage, this talk describes the design target for scalability of storage accounts, Blobs, Entities, and Messages for commercial availability, and describes at a high level how we automatically load balance your data within a geo-location to meet the peak traffic demands for your data.   It also describes how to use Queues to create scalable and reliable workflow for your computation, and describes how to use rolling upgrade to perform a schema change (add a new property) for your Tables.

The new PDC 2009 features are versioned using “x-ms-version: 2009-09-19”. All prior versions of commands executed against the storage system will continue to work, as we extend the capabilities of the existing commands and introduce new commands.
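
To make the versioning concrete, here is a minimal Python sketch of how a client might build the headers that opt a request in to the new version. The helper name `versioned_headers` is hypothetical, and the `x-ms-date` and `x-ms-blob-type` headers are assumptions drawn from the storage REST interface rather than from this post; authentication is left out entirely.

```python
from email.utils import formatdate

# The version string announced in this post.
STORAGE_VERSION = "2009-09-19"


def versioned_headers(extra=None):
    """Return request headers that select the 2009-09-19 storage API version.

    Hypothetical helper: it only assembles headers and performs no signing.
    """
    headers = {
        "x-ms-version": STORAGE_VERSION,
        # Assumption: requests carry an RFC 1123 timestamp in x-ms-date.
        "x-ms-date": formatdate(usegmt=True),
    }
    if extra:
        headers.update(extra)
    return headers


# Example: headers for a request that also names a blob type (assumed header).
example = versioned_headers({"x-ms-blob-type": "PageBlob"})
```

A request sent without the version header would be handled with the older semantics, which is why existing applications keep working unchanged.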

With the PDC 2009 release, we now support two types of blobs:

  • Block Blob – This is the blob type that we have offered since PDC 2008, which is optimized for streaming workloads. Each blob consists of a sequence of blocks, and each block is identified by a unique Block ID relative to the blob. This type of blob allows blocks to be uploaded with PutBlock and then committed with PutBlockList. Block Blobs can now be up to 200GB in size.
  • Page Blob – With this release we have added a new type of blob optimized for random reads/writes called Page Blob. A Page Blob consists of an array of pages and each page is identified by its offset from the start of the blob. PutPage is used to perform a ranged put on the blob, and the update is applied immediately to the blob. In addition, regions of the blob can be cleared with ClearPage, and these cleared regions do not consume storage space. This means a storage account is only charged for the pages within a Page Blob with data stored in them. A Page Blob can be up to 1TB in size.
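
The difference between the two blob types can be sketched in a few lines of Python. This is an illustration only: `plan_blocks` and `page_range` are hypothetical helpers, and the 4MB block size, the Base64 encoding of Block IDs, and the 512-byte page alignment are assumptions based on the REST documentation rather than on this post. No network calls are made.

```python
import base64

BLOCK_SIZE = 4 * 1024 * 1024  # assumed maximum size of one block
PAGE_SIZE = 512               # assumed page alignment for Page Blobs


def plan_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into (block_id, chunk) pairs for a Block Blob.

    Each chunk would be sent with PutBlock; the ordered list of IDs
    would then be committed with PutBlockList.
    """
    blocks = []
    for index, start in enumerate(range(0, len(data), block_size)):
        # Fixed-width counter so every encoded ID has the same length.
        raw_id = f"block-{index:06d}".encode("ascii")
        block_id = base64.b64encode(raw_id).decode("ascii")
        blocks.append((block_id, data[start:start + block_size]))
    return blocks


def page_range(offset: int, length: int) -> str:
    """Return the byte range for a PutPage or ClearPage on a Page Blob.

    Pages are addressed by offset from the start of the blob, so both
    the offset and the length must fall on page boundaries.
    """
    if length <= 0 or offset % PAGE_SIZE or length % PAGE_SIZE:
        raise ValueError("offset and length must be multiples of the page size")
    return f"bytes={offset}-{offset + length - 1}"
```

The point of the contrast: a Block Blob only changes when the block list is committed, while each ranged write to a Page Blob takes effect immediately.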

Enhancements for both types of blobs:

  • Content Delivery Network – Windows Azure CDN can be used to cache your Windows Azure Blobs at strategically placed locations to provide maximum bandwidth for delivering your content to users. You can now specify the HTTP Cache-Control policy for each blob, and that will determine the length of time in which the blob will be cached in the Windows Azure CDN. You can specify the time-to-live (TTL) in the Cache-Control to be as small as you want, but remember you only benefit from the CDN if your TTL is long enough and content popular enough to get cache hits when serving the data out of Windows Azure CDN. See here for more details on Windows Azure CDN:
    http://blogs.msdn.com/windowsazure/archive/2009/11/05/introducing-the-windows-azure-content-delivery-network.aspx
  • Custom Storage Domain Names – The custom storage domain name feature allows you to register a custom domain name for a given storage account, and to use that custom domain to access your blobs instead of the blob service URL: http://<account>.blob.core.windows.net/<container>/<blobname>. With this release, custom storage domain names now work with authenticated access as well as anonymous access. See here for more details on custom storage domain names:
    http://blogs.msdn.com/windowsazure/archive/2009/11/05/accessing-windows-azure-blobs-using-custom-storage-domain-names.aspx
  • Snapshot Blob – Allows the creation of read-only versions of a blob, which can be used for creating blob backups or blob versioning. An account is only charged for the unique blocks or pages; blocks or pages shared across snapshots and the base blob from which they were derived do not accrue additional storage charges.
  • Lease Blob – Clients can now acquire a lease on a blob for exclusive write access to that blob. A lease will lock the blob for exclusive writing until the lease expires, while still allowing non-exclusive read access to the blob. The initial version of Lease Blob supports only one-minute leases, but the leases can be renewed to allow clients to maintain the lock for longer periods of time. Lease Blob is useful when dealing with highly concurrent writes to a Page Blob.
  • Get Blob – This is used to retrieve both block and page blobs. In addition, it now provides an option to return a dynamically generated MD5 for ranged reads that are less than or equal to 4MB in size.
  • List Blobs – Applications can now retrieve each blob’s application metadata and MD5 information when listing blobs.
  • Blob Properties – Applications can now update a blob’s properties independently of the blob, and can specify the standard HTTP Cache-Control property for blobs.
  • Root Container – Anonymous access is now provided for blobs stored in the root container. This was an important missing feature for supporting cross domain policy access for Silverlight. For example, you can now specify the following cross domain policy file in the root blob container:  http://account.blob.core.windows.net/clientaccesspolicy.xml
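
As an illustration of the ranged-read MD5 option on Get Blob described above, the following Python sketch shows how a client might request the checksum and verify the returned bytes. The function names are hypothetical, and the `x-ms-range` and `x-ms-range-get-content-md5` header names, as well as the Base64 encoding of the returned MD5, are assumptions taken from the REST documentation; check them against the MSDN reference before relying on them.

```python
import base64
import hashlib

# The post states that the MD5 is available for ranges up to 4MB.
MAX_MD5_RANGE = 4 * 1024 * 1024


def range_headers(start: int, end: int) -> dict:
    """Build headers for a ranged read covering bytes start..end inclusive.

    The MD5 option is only requested when the range is small enough.
    """
    length = end - start + 1
    if length <= 0:
        raise ValueError("end must not be smaller than start")
    headers = {"x-ms-range": f"bytes={start}-{end}"}  # assumed header name
    if length <= MAX_MD5_RANGE:
        headers["x-ms-range-get-content-md5"] = "true"  # assumed header name
    return headers


def range_matches_md5(body: bytes, content_md5: str) -> bool:
    """Compare the bytes received with the Base64 MD5 sent by the service."""
    digest = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")
    return digest == content_md5
```

A client reading a large blob in 4MB ranges could call `range_matches_md5` on each response and retry only the range that fails.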

Enhancement for Queue:

  • Dequeue Count – We now return a dequeue count for each message retrieved from Windows Azure Queue. This allows applications to see how many times a message has been dequeued. 
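
One common use of the dequeue count is to stop retrying a message that keeps failing. The Python sketch below shows the idea with plain dictionaries standing in for queue messages; the `handle` function, the `MAX_ATTEMPTS` threshold, and the message layout are all hypothetical and not part of any storage library.

```python
MAX_ATTEMPTS = 3  # hypothetical threshold chosen by the application


def handle(message, process, dead_letters):
    """Process one dequeued message and say whether to delete it.

    message: dict with 'dequeue_count' and 'body' (assumed layout).
    process: callable that does the real work and may raise.
    dead_letters: list collecting messages that were given up on.
    Returns True if the caller should delete the message from the queue.
    """
    if message["dequeue_count"] > MAX_ATTEMPTS:
        # The message has already been handed out too many times:
        # set it aside instead of failing on it again.
        dead_letters.append(message)
        return True
    try:
        process(message["body"])
    except Exception:
        # Leave the message in the queue; it becomes visible again
        # once its invisibility time expires, with a higher count.
        return False
    return True
```

Setting the message aside, rather than silently dropping it, keeps the failing input available for later inspection.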

In addition to the above new features, we made the following semantic changes as part of this versioned CTP release:

  • Anonymous Blob Access – All blob containers having their access set to public using the “x-ms-version: 2009-09-19” version of the blob API will have their anonymous requests processed using the 2009-09-19 version of the blob APIs. Containers that are set to public with a prior version will still have their anonymous requests processed with the CTP2008 version of the blob APIs.
  • Blob and Queue Authentication – Support has been added for an improved signing algorithm for enhanced security by including additional information as part of the canonicalization of the String-to-Sign.
  • Listing Containers, Queues and Blobs – Changed the response format for listing operations to be more XML friendly.
  • Blob and Queue Metadata Naming – Metadata for a container or blob resource is stored as a name-value pair associated with the resource. Metadata names must now adhere to the naming rules for C# identifiers.
  • Table Query – A table query is allowed to execute for up to 5 seconds before returning a result and potential continuation.
  • Table DataService Version – All REST calls to the Table service must now include the DataServiceVersion and MaxDataServiceVersion headers on every request. Applications using the Astoria client library already send the required headers.
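
Because a table query can now return after 5 seconds with a continuation, clients should be prepared to follow continuations until none is returned. The Python sketch below shows that loop. The `fetch_page` callable is hypothetical and stands in for one REST query; the `x-ms-continuation-NextPartitionKey` and `x-ms-continuation-NextRowKey` header names are assumptions based on the Table service documentation.

```python
def query_all(fetch_page):
    """Collect every entity of a query by following continuations.

    fetch_page(next_pk, next_rk) is a hypothetical callable that runs one
    query and returns (entities, response_headers). It is assumed to
    return header names exactly as spelled below.
    """
    entities = []
    next_pk = next_rk = None
    while True:
        page, headers = fetch_page(next_pk, next_rk)
        # A page may be empty and still carry a continuation, so the
        # loop ends only when no continuation header is present.
        entities.extend(page)
        next_pk = headers.get("x-ms-continuation-NextPartitionKey")
        next_rk = headers.get("x-ms-continuation-NextRowKey")
        if next_pk is None and next_rk is None:
            return entities
```

Treating an empty page as the end of the results is the usual mistake here; the continuation headers, not the page size, signal completion.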

The above features for this new release are available via the Windows Azure Storage REST interface, and they are also supported as part of the new Storage Client Library just released with the Windows Azure SDK.

Details about these new features can be found in the MSDN documentation here: http://msdn.microsoft.com/en-us/library/dd894041.aspx.

As always, we appreciate any feedback you might have.

Brad Calder
Windows Azure Storage