Planet scale aggregates with Azure DocumentDB

Udgivet den 14 marts, 2017

Program Manager, Azure DocumentDB

We’re excited to announce that we have expanded the SQL grammar in DocumentDB to support aggregate functions with the last service update. Support for aggregates is the most requested feature on the user voice site, so we are thrilled to roll this out everyone that's voted for it.

Azure DocumentDB is a fully managed NoSQL database service built for fast and predictable performance, high availability, elastic scaling, global distribution, and ease of development. DocumentDB provides rich and familiar SQL query capabilities with consistent low latencies on JSON data. These unique benefits make DocumentDB a great fit for web, mobile, gaming, IoT, and many other applications that need seamless scale and global replication.

DocumentDB is truly schema-free. By virtue of its commitment to the JSON data model directly within the database engine, it provides automatic indexing of JSON documents without requiring explicit schema or creation of secondary indexes. DocumentDB supports querying JSON documents using SQL. DocumentDB query is rooted in JavaScript's type system, expression evaluation, and function invocation. This, in turn, provides a natural programming model for relational projections, hierarchical navigation across JSON documents, self joins, spatial queries, and invocation of user defined functions (UDFs) written entirely in JavaScript, among other features. We have now expanded the SQL grammar to include aggregations in addition to these capabilities.

graphic

Aggregates for planet scale applications

Whether you’re building a mobile game that needs to calculate statistics based on completed games, designing an IoT platform that triggers actions based on the number of occurrences of a certain event, or building a simple website or paginated API, you need to perform aggregate queries against your operational database. With DocumentDB you can now perform aggregate queries against data of any scale with low latency and predictable performance.

Aggregate support has been rolled out to all DocumentDB production datacenters. You can start running aggregate queries against your existing DocumentDB accounts or provision new DocumentDB accounts via the SDKs, REST API, or the Azure Portal. You must however download the latest version of the SDKs in order to perform cross-partition aggregate queries or use LINQ aggregate operators in .NET.

Aggregates with SQL

DocumentDB supports the SQL aggregate functions COUNT, MIN, MAX, SUM, and AVG. These operators work just like in relational databases, and return the computed value over the documents that match the query. For example, the following query retrieves the number of readings from the device xbox-1001 from DocumentDB:

SELECT VALUE COUNT(1) 
FROM telemetry T 
WHERE T.deviceId = "xbox-1001"

(If you’re wondering about the VALUE keyword – all queries return JSON fragments back. By using VALUE, you can get the scalar value of count e.g., 100, instead of the JSON document {"$1": 100})

We extended aggregate support in a seamless way to work with the existing query grammar and capabilities. For example, the following query returns the average temperature reading among devices within a specific polygon boundary representing a site location (combines aggregation with geospatial proximity searches):

SELECT VALUE AVG(T.temperature?? 0) 
FROM telemetry T 
WHERE ST_WITHIN(T.location, {"type": "polygon": … })

As an elastically scalable NoSQL database, DocumentDB supports storing and querying data of any storage or throughput. Regardless of the size or number of partitions in your collection, you can submit a simple SQL query and DocumentDB handles the routing of the query among data partitions, runs it in parallel against the local indexes within each matched partition, and merges intermediate results to return the final aggregate values. You can perform low latency aggregate queries using DocumentDB.

In the .NET SDK, this can be performed via the CreateDocumentQuery<T> method as shown below:

 client.CreateDocumentQuery<int>(
    "/dbs/devicedb/colls/telemetry",
    "SELECT VALUE COUNT(1) FROM telemetry T WHERE T.deviceId  = 'xbox-1001'",
     new FeedOptions { MaxDegreeOfParallelism = -1 });

For a complete example, you can take a look at our query samples in Github

Aggregates with LINQ

With the .NET SDK 1.13.0, you can query for aggregates using LINQ in addition to SQL. The latest SDK supports the operators Count, Sum, Min, Max, Average and their asynchronous equivalents CountAsync, SumAsync, MinAsync, MaxAsync, AverageAsync. For example, the same query shown previously can be written as the following LINQ query:

client.CreateDocumentQuery<DeviceReading>("/dbs/devicedb/colls/telemetry",
     new FeedOptions { MaxDegreeOfParallelism = -1 })
      .Where(r => r.DeviceId == "xbox-1001")
      .CountAsync();

Learn more about DocumentDB’s LINQ support, including how asynchronous pagination is performed during aggregate queries.

Aggregates using the Azure Portal

You can also start running aggregate queries using the Azure Portal right away.

screenshot

Next Steps

In this blog post, we looked at support for aggregate functions and query in Azure DocumentDB. To get started running queries, create a new DocumentDB account from the Azure Portal.

Stay up-to-date on the latest DocumentDB news and features by following us on Twitter @DocumentDB or reach out to us on the developer forums on Stack Overflow.