
Cortana Analytics Gallery - a scalable community site built on Azure DocumentDB

Posted on 19 November, 2015

Program Manager, Azure DocumentDB

Co-authored with Elena Apreutesei, Principal Software Engineer, Azure Machine Learning.


The Cortana Analytics Gallery is a community-driven website for discovering, sharing, and learning about solutions built with the Cortana Analytics Suite. The Gallery hosts a wide range of solutions, from a Retail Forecasting experiment to the ever-popular Face APIs used in the "How old do I look?" app. Machine Learning enthusiasts can share their own experiments using the Azure Machine Learning Studio, a private space focused on Machine Learning experimentation and model creation.

The Cortana Analytics Gallery follows a microservice architecture in which single-purpose components work together to deliver reliable and durable functionality. This post focuses on the Gallery Catalog API, which serves as the data master for the Gallery user experience.

[Figure: Gallery architecture diagram]

*Gallery architecture is subject to change

Gallery UX: Web rendering layer that supports both MVC-based server-side rendering and client-side, single-page-application-style rendering.
Gallery Catalog API: Canonical store for Gallery entity metadata. The underlying data store is Azure DocumentDB for the Gallery entities and Azure Blob Storage for embedded content such as images. CRUD (create, read, update, delete) operations are exposed through an OData-like RESTful endpoint.
Gallery Index API: Full-text search endpoint for Gallery entities. The Index API relies on Azure Search for its search technology.
Storage API: Endpoint used to store the physical packaged content of a Studio experiment so that it can be published to the Gallery.
Packaging API: Endpoint used to prepare packaged content, which is a bundled copy of the experiment the user intends to publish to the Gallery. The output is a package URL stored with the Catalog entity metadata.
Activity API: Endpoint used to log user activity that affects trending and dynamic values such as download count and view count.

Gallery Catalog API

Every Gallery entity has a JSON metadata document stored in Azure DocumentDB, and all CRUD (create, read, update, and delete) operations run against DocumentDB. As a microservice, the Gallery Catalog API provides REST APIs and supports OData, both of which are built on the ASP.NET Web API library. In general, the Catalog API passes the query filters supplied through OData directly to the DocumentDB LINQ provider as an expression tree, and the query is then executed at the database.
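
To illustrate the pushdown idea, here is a minimal sketch of how an OData-style equality filter could be turned into an expression tree and handed to an IQueryable. The post does not show the actual Catalog API code, so CatalogEntity, FilterPushdown, and both method names are hypothetical, and an in-memory list would stand in for the DocumentDB query source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical, trimmed-down entity; the real data contracts are richer.
public class CatalogEntity
{
    public string Name { get; set; }
    public string EntityType { get; set; }
}

public static class FilterPushdown
{
    // Builds the expression tree for "entity => entity.<propertyName> == value",
    // roughly what an OData "$filter=<property> eq '<value>'" clause becomes.
    public static Expression<Func<CatalogEntity, bool>> BuildEquals(
        string propertyName, string value)
    {
        ParameterExpression entity =
            Expression.Parameter(typeof(CatalogEntity), "entity");
        MemberExpression property = Expression.Property(entity, propertyName);
        BinaryExpression equals =
            Expression.Equal(property, Expression.Constant(value, typeof(string)));
        return Expression.Lambda<Func<CatalogEntity, bool>>(equals, entity);
    }

    // The filter reaches the IQueryable as an expression tree rather than a
    // compiled delegate, so the provider behind the source decides how to run it.
    public static List<string> NamesMatching(
        IQueryable<CatalogEntity> source, string propertyName, string value)
    {
        return source.Where(BuildEquals(propertyName, value))
                     .Select(e => e.Name)
                     .ToList();
    }
}
```

Calling FilterPushdown.NamesMatching(list.AsQueryable(), "EntityType", "Experiment") evaluates the filter in memory. With the DocumentDB .NET SDK, the IQueryable would presumably come from DocumentClient.CreateDocumentQuery<T> instead, in which case the same tree is translated into a query that runs at the database.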

The Gallery Catalog service exposes GET / POST / PATCH / DELETE operations on standard WebAPI routes, such as the generic route /entities/{entityId} and the specialized routes /experiments/{entityId}, /collections/{entityId}, /tutorials/{entityId}, etc.

The JSON objects in the REST API payload are simply the serialized form of the Catalog entity data contracts. These contracts derive from a common class, EntityBase, which provides the flexibility to store other entity types.

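    using System.Collections.Generic;    // IList<T>
    using System.Runtime.Serialization;  // DataContract, DataMember

    // Data contract for an experiment. EntityBase and EntityType are
    // defined elsewhere and are not shown in this post.
    [DataContract]
    public class Experiment : EntityBase
    {
        public Experiment()
        {
            this.EntityType = EntityType.Experiment;
        }

        // Serialized as the "modules" property of the JSON document.
        [DataMember(Name = "modules")]
        public IList<string> Modules { get; set; }
    }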
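
As a rough illustration of how such a data contract maps to JSON property names, the sketch below serializes an Experiment with DataContractJsonSerializer. The post does not say which serializer the service uses, and the real EntityBase is not shown, so the EntityBase here, the string-valued entity_type member, the enum values other than Experiment, and the CatalogJson helper are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Text;

public enum EntityType { Experiment, Collection, Tutorial }

// Hypothetical stand-in for the real EntityBase.
[DataContract]
public class EntityBase
{
    // Not a data member itself; written to JSON through EntityTypeName.
    public EntityType EntityType { get; set; }

    // Assumption: the enum is written by name, as in "entity_type": "Experiment".
    [DataMember(Name = "entity_type")]
    public string EntityTypeName
    {
        get { return EntityType.ToString(); }
        set { EntityType = (EntityType)Enum.Parse(typeof(EntityType), value); }
    }

    [DataMember(Name = "name")]
    public string Name { get; set; }
}

[DataContract]
public class Experiment : EntityBase
{
    public Experiment()
    {
        this.EntityType = EntityType.Experiment;
    }

    [DataMember(Name = "modules")]
    public IList<string> Modules { get; set; }
}

public static class CatalogJson
{
    // Writes the data contract to a UTF-8 JSON string.
    public static string Serialize(Experiment experiment)
    {
        var serializer = new DataContractJsonSerializer(typeof(Experiment));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, experiment);
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }
}
```

The DataMember names, not the C# property names, decide what appears in the payload, which is why the JSON documents in this post use snake_case keys such as entity_type.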

Catalog entities contain the essential metadata, such as name and summary, that gives users context. For example, an Experiment entity is stored as a JSON document like the one below, shown here in pieces:

{
  "entity_type": "Experiment",
  "name": "Predict the remaining useful life of an aircraft engine",
  "summary": "This experiment aims to build a regression model to predict the RUL (Remaining Useful Life) of a specific aircraft engine. ",
  "description": "This experiment aims to build a regression model to predict the RUL (Remaining Useful Life) of a specific aircraft engine, including data...",

In addition, entities contain reference links, such as image_url and package_link. The raw image and package data are both stored in Blob Storage.

  "image_url": "https://contentmamluswest001.blob.core.windows.net/content/14b2744cf8d6418c87ffddc3f3127242/9502630827244d60a1214f250e3bbca7/42332590d22a44e5a6e0dac7293e61b3/image",
  "created_at": "2015-02-10T22:25:45.0510842+00:00",
  "updated_at": "2015-02-13T19:25:00+00:00",
  "content": {
    "service_type": "Azure",
    "package_link": "https://storage.azureml.net/directories/f2c4da7bcd4847259f6879a1d4c73889"
  },
  "hidden": false,

The tags and algorithms properties are both arrays of strings, which enables DocumentDB queries to find all Catalog entities related to specific tags and algorithms in real time.

 

  "tags": [
    "regression",
    "predictive maintenance",
    "prognostics"
  ],
  "algorithms": [
    "Decision Forest Regression",
    "Boosted Decision Tree Regression"
  ],
  "author": {
    "name": "Yan Zhang",
    "avatar_url": "https://cid-335e4711451a0d9f.users.storage.live.com/users/0x335e4711451a0d9f/myprofile/expressionprofile/profilephoto:UserTileStatic,UserTileMedium,Win8Static",
    "id": "FD02E574E0801687BA770AE1993A458FD3DB7090DBD739116F411328FEA31939"
  },
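
Because tags and algorithms are plain string arrays, a membership test is all a query needs. The sketch below shows what such a filter could look like in LINQ; TaggedEntity and TagQueries are hypothetical names, and the example is written against a generic IQueryable rather than the actual Catalog code.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity shape with just the array-valued properties.
public class TaggedEntity
{
    public string Name { get; set; }
    public List<string> Tags { get; set; }
    public List<string> Algorithms { get; set; }
}

public static class TagQueries
{
    // Entities whose tags array contains the given tag.
    public static IQueryable<TaggedEntity> WithTag(
        IQueryable<TaggedEntity> source, string tag)
    {
        return source.Where(e => e.Tags.Contains(tag));
    }

    // Entities that carry the tag and also use the given algorithm.
    public static IQueryable<TaggedEntity> WithTagAndAlgorithm(
        IQueryable<TaggedEntity> source, string tag, string algorithm)
    {
        return source.Where(e => e.Tags.Contains(tag)
                              && e.Algorithms.Contains(algorithm));
    }
}
```

In DocumentDB SQL, the equivalent membership test is written with the built-in ARRAY_CONTAINS function, for example ARRAY_CONTAINS(c.tags, "regression").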

 

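The entities also contain download_count, view_count, share_count, and trending properties, which are used to order returned results by popularity. The etag property is provided by DocumentDB and is used to detect conflicting edits: an update that carries a stale etag can be rejected instead of silently overwriting a newer version.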

  "download_count": 344,
  "view_count": 730,
  "share_count": 1,
  "trending": 12.34544664797978,
  "index_eid": "e9e50264-b77f-4eb0-bc08-62e20f3ac6ed",
  "id": "42332590d22a44e5a6e0dac7293e61b3",
  "_links": {
    "self": "http://catalog.azureml.net/tenants/14b2744cf8d6418c87ffddc3f3127242/communities/9502630827244d60a1214f250e3bbca7/experiments/42332590d22a44e5a6e0dac7293e61b3"
  },
  "etag": "\"00003501-0000-0000-0000-563ae96c0000\""
}
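
The etag check can be pictured with a small in-memory model. This is only a sketch of optimistic concurrency in general, not the Catalog API's code: CounterDocument and EtagCheckedStore are hypothetical, and a GUID stands in for the etag value that DocumentDB generates.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, minimal document shape; only the fields the example needs.
public class CounterDocument
{
    public string Id { get; set; }
    public string ETag { get; set; }
    public int ViewCount { get; set; }
}

// In-memory stand-in for a store that issues a new etag on every write.
public class EtagCheckedStore
{
    private readonly Dictionary<string, CounterDocument> documents =
        new Dictionary<string, CounterDocument>();

    public CounterDocument Create(string id)
    {
        var document = new CounterDocument { Id = id, ETag = NewETag(), ViewCount = 0 };
        documents[id] = document;
        return Copy(document);
    }

    // Returns a copy so callers cannot modify the stored version directly.
    public CounterDocument Read(string id)
    {
        return Copy(documents[id]);
    }

    // Replaces the stored document only if the caller's etag still matches.
    public bool TryReplace(CounterDocument updated)
    {
        CounterDocument current = documents[updated.Id];
        if (current.ETag != updated.ETag)
        {
            return false; // stale copy: the caller should re-read and retry
        }
        CounterDocument stored = Copy(updated);
        stored.ETag = NewETag();
        documents[updated.Id] = stored;
        return true;
    }

    private static string NewETag()
    {
        return Guid.NewGuid().ToString();
    }

    private static CounterDocument Copy(CounterDocument source)
    {
        return new CounterDocument
        {
            Id = source.Id,
            ETag = source.ETag,
            ViewCount = source.ViewCount
        };
    }
}
```

If two clients read the same document and both try to write, the first TryReplace succeeds and the second returns false. With DocumentDB, the service performs the comparison itself when a request carries the etag as an If-Match condition, and a mismatch is reported as HTTP 412 (Precondition Failed).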

Sample query used to retrieve all experiments from a specific author:

GET /entities?$orderby=updated_at desc&$filter=author/id eq '151C1FC0FFA9AE788F7872766C4076EA49F8EDD487E674688F98D8F77E22FA4B' and (entity_type eq Microsoft.MachineLearning.Community.Contracts.Catalog.EntityType'Experiment')

Sample query used to get all collections, sorted by trending:

GET /entities?$orderby=trending desc&$filter=(entity_type eq Microsoft.MachineLearning.Community.Contracts.Catalog.EntityType'Collection')
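
For readers more familiar with LINQ than with OData, the two sample queries correspond roughly to the expressions below. This is a sketch under assumptions: GalleryEntity, Author, and SampleQueries are hypothetical names, entity_type is modeled as a string for brevity, and the post does not show how the service itself performs the translation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Author
{
    public string Id { get; set; }
}

// Hypothetical entity with only the properties the two queries touch.
public class GalleryEntity
{
    public string Name { get; set; }
    public string EntityType { get; set; }
    public Author Author { get; set; }
    public DateTimeOffset UpdatedAt { get; set; }
    public double Trending { get; set; }
}

public static class SampleQueries
{
    // $filter=author/id eq '<authorId>' and (entity_type eq ...'Experiment')
    // $orderby=updated_at desc
    public static IQueryable<GalleryEntity> ExperimentsByAuthor(
        IQueryable<GalleryEntity> entities, string authorId)
    {
        return entities
            .Where(e => e.Author.Id == authorId && e.EntityType == "Experiment")
            .OrderByDescending(e => e.UpdatedAt);
    }

    // $filter=(entity_type eq ...'Collection')
    // $orderby=trending desc
    public static IQueryable<GalleryEntity> CollectionsByTrending(
        IQueryable<GalleryEntity> entities)
    {
        return entities
            .Where(e => e.EntityType == "Collection")
            .OrderByDescending(e => e.Trending);
    }
}
```

The OData path author/id becomes the nested property access e.Author.Id, and each $orderby ... desc clause becomes an OrderByDescending call.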

The Gallery Catalog microservice leverages DocumentDB's schema-free nature and LINQ support to build a reliable, flexible, and scalable canonical store for Gallery entity metadata.

Next steps

Try out DocumentDB today by signing up for a free trial and creating a DocumentDB account. If you need any help or have questions, please reach out to us through the developer forums on Stack Overflow or schedule a 1:1 chat with the DocumentDB engineering team.

Stay up-to-date on the latest DocumentDB news and features by following us on Twitter @DocumentDB.