MSN.com is one of the most trusted web brands. It is the #1 portal in 26 markets and currently serves over half a billion monthly users worldwide. MSN.com, which is part of the Information and Content Experiences (ICE) group at Microsoft, consists of a rich set of consumer-facing vertical experiences, including Health and Fitness, Finance, Autos, Entertainment, Weather, Food and Drink, Video, Travel, News, and Sports. Each of these vertical experiences is designed to be highly scalable and loosely coupled, and to support millions of users worldwide. These verticals are also released as separate applications across multiple platforms (e.g. Windows Phone, iOS, and Android) as well as in the browser.
Looking for a massively scalable, schema-free and queryable distributed data store
Over the years, MSN.com evolved to use various purpose-built data stores and caching solutions. Earlier this year, various teams within MSN realized the need for a unified distributed storage system across all of the MSN verticals. The unification was crucial to enable a variety of new vertical and social computing capabilities across devices. The ICE teams decided to replace their existing storage solutions with a single Azure-based distributed storage system called User Data Store (UDS). The UDS needed to support the following:
- Scale to support 425M+ unique MSN users with 100M+ direct authenticated users, and an initial capacity of 20TB of document storage.
- Under 15ms write latency and single-digit millisecond read latency for 99% of requests.
- Authorization scopes across the same underlying data.
- Schema-free storage with rich query and transaction support.
- Data model extensions to support the diverse set of vertical schemas.
- Hadoop-based analytics on top of the data.
- Available globally to serve all MSN markets and users.
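As a rough illustration of the latency requirement above, the following sketch checks a batch of measured latencies against the stated 99th-percentile targets. The function names, the nearest-rank percentile method, and the reading of "single digit" as under 10 ms are assumptions made for this example; the post does not describe how MSN measured these targets.

```python
def percentile(samples, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def meets_targets(write_ms, read_ms, write_limit=15.0, read_limit=10.0):
    """True when the p99 write and read latencies are under their limits.

    The limits mirror the targets in the list above: under 15 ms for
    writes, single-digit milliseconds (assumed < 10 ms) for reads.
    """
    return (percentile(write_ms, 99) < write_limit
            and percentile(read_ms, 99) < read_limit)


# Hypothetical measurements: one slow write out of 100 still meets the target.
writes = [5.0] * 99 + [20.0]
reads = [3.0] * 100
print(meets_targets(writes, reads))
```

Because the target applies to 99% of requests, a single outlier in 100 samples does not fail the check, while two outliers would.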
After evaluating various solutions on Azure, MSN decided to use Azure DocumentDB as a core component of UDS. Health and Fitness is one of the first MSN verticals going live on Azure DocumentDB with the new MSN launch. The other MSN verticals will soon start using an updated UDS architecture layered on top of Azure DocumentDB.
Health and Fitness Scenarios
The Health and Fitness vertical includes the following features:
- Diet Tracker: Users can maintain a daily diet journal. Each diet entry is associated with a meal and records calories, fat, carbs, protein, and so on.
- Exercise Tracker: Users can log cardio exercise with distance, time, and calories.
- GPS Tracker: A user can run with her GPS-enabled phone. The run is recorded along with GPS coordinates. Run metadata is stored in DocumentDB.
- Pedometer: User steps are tracked on pedometer-enabled phones (all new Nokia phones) and stored in UDS.
- Weight Tracker: Users can track their weight.
- Analysis: Analysis of historical diet, exercise, GPS, and step data.
- Favorites and custom: A user can store favorite foods and exercises. Users can also create custom foods and exercises with associated metadata. There are plans to extend the favorites feature to other data types such as articles, health conditions, and yoga.
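To make the journal entries above more concrete, here is a minimal sketch of how a diet entry and an exercise entry might look as schema-free documents stored side by side. All field names (`userId`, `type`, `meal`, `distanceKm`, and so on) and all values are illustrative assumptions, not the actual Health and Fitness schema.

```python
import json

# Hypothetical documents; field names and values are illustrative only.
diet_entry = {
    "userId": "user-123",
    "type": "diet",
    "meal": "breakfast",
    "calories": 320,
    "fat": 9,
    "carbs": 45,
    "protein": 14,
}

exercise_entry = {
    "userId": "user-123",
    "type": "exercise",
    "activity": "running",
    "distanceKm": 5.0,
    "durationMinutes": 28,
    "calories": 410,
}


def net_calories(entries):
    """Calories eaten minus calories burned across mixed entry types."""
    total = 0
    for entry in entries:
        if entry["type"] == "diet":
            total += entry["calories"]
        elif entry["type"] == "exercise":
            total -= entry["calories"]
    return total


# The two documents share no fixed schema beyond the fields this code reads.
journal = [diet_entry, exercise_entry]
print(json.dumps(journal, indent=2))
print(net_calories(journal))
```

The two entries carry different fields, yet the same code can process both, which is the kind of flexibility a schema-free store is meant to allow.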
With the launch of the new MSN portal, users interacting with the Health and Fitness vertical are storing and querying their data using Azure DocumentDB on the back end. At the time of the launch, the Health and Fitness vertical had provisioned 150 capacity units (CUs) of SSD-backed document storage and provisioned throughput across three geographic regions. Health and Fitness applications create a single DocumentDB database account in each region, each with a single DocumentDB database. The database accounts are configured with Session consistency, which provides monotonic reads, monotonic writes, and read-your-own-writes guarantees. This delivers very high read scalability and meets the latency targets for writes.
Each DocumentDB database in turn contains a set of collections, each containing documents belonging to a set of MSN users. The documents vary from 1KB to 10KB in size. Documents within a collection are not expected to have any schema in common. Most collections are configured for optimal write throughput, with automatic indexing for certain document paths and minimal index overhead.
The overall architecture of the User Data Store that will support various verticals across multiple platforms is depicted below. The architecture leverages the full capabilities of DocumentDB, including the ability to store a divergent set of schemas while providing rich query capabilities across all verticals.
The new MSN User Data Store is built on Azure DocumentDB
The UDS framework distributes user information across multiple collections based on available capacity. Each user’s data is saved in a document. UDS scales horizontally by distributing documents across a set of collections based on their user ID, thereby achieving scalability, high throughput, and efficient queries.
Saud Alshibani, the Principal Architect behind MSN’s User Data Store, and his team have been working closely with the Azure DocumentDB team.
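The idea of spreading documents across collections by user ID, described above, can be sketched as follows. This is a minimal illustration only: the collection names, the `userId` field, and the hash-and-modulo scheme are all assumptions, and the post does not say how UDS actually maps users to collections. In particular, a plain modulo scheme does not account for available capacity, which the post says UDS considers.

```python
import hashlib

# Hypothetical collection names; the real UDS layout is not described.
COLLECTIONS = ["users-00", "users-01", "users-02", "users-03"]


def collection_for_user(user_id, collections=COLLECTIONS):
    """Map a user ID to one collection using a stable hash.

    hashlib is used instead of the built-in hash() so that the mapping
    stays the same across processes and restarts.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(collections)
    return collections[index]


def route(documents, collections=COLLECTIONS):
    """Group documents by target collection based on each document's user ID."""
    routed = {name: [] for name in collections}
    for doc in documents:
        routed[collection_for_user(doc["userId"], collections)].append(doc)
    return routed


# Documents with different shapes for the same user land in the same collection.
docs = [
    {"userId": "user-123", "type": "diet", "calories": 320},
    {"userId": "user-123", "type": "weight", "kg": 70.5},
    {"userId": "user-456", "type": "steps", "count": 8042},
]
for name, items in route(docs).items():
    print(name, len(items))
```

Keeping all of a user's documents in one collection means a query for that user touches a single collection, which fits the efficient-query goal mentioned above.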
Saud summarizes his experience with DocumentDB in the following words.
Congratulations to the MSN team on the launch of the new portal. The DocumentDB team is thrilled to be a part of this journey!