Introducing Change Feed support in Azure DocumentDB

Posted on December 14, 2016

Principal Program Manager, Azure Cosmos DB

As of May 10th 2017,

Azure Cosmos DB is Microsoft’s globally distributed multi-model database. Azure Cosmos DB was built from the ground up with global distribution and horizontal scale at its core. It offers turnkey global distribution across any number of Azure regions by transparently scaling and replicating your data wherever your users are. You can elastically scale throughput and storage worldwide, and pay only for the throughput and storage you need. Azure Cosmos DB guarantees single-digit-millisecond latencies at the 99th percentile anywhere in the world, offers multiple well-defined consistency models to fine-tune performance, and guarantees high availability with multi-homing capabilities, all backed by industry-leading service level agreements (SLAs).

Azure Cosmos DB is truly schema-agnostic; it automatically indexes all the data without requiring you to deal with schema and index management. It’s also multi-model, natively supporting document, key-value, graph, and column-family data models. With Azure Cosmos DB, you can access your data using APIs of your choice, as DocumentDB SQL (document), MongoDB (document), Azure Table Storage (key-value), and Gremlin (graph) are all natively supported.


We’re excited to announce the availability of Change Feed support in Azure DocumentDB! With Change Feed support, DocumentDB provides a sorted list of documents within a DocumentDB collection in the order in which they were modified. This feed can be used to listen for modifications to data within the collection and perform actions such as:

  • Trigger a call to an API when a document is inserted or modified
  • Perform real-time (stream) processing on updates
  • Synchronize data with a cache, search engine, or data warehouse

DocumentDB's Change Feed is enabled by default for all accounts and does not incur any additional costs on your account. You can use your provisioned throughput in your write region or any read region to read from the change feed, just like any other operation in DocumentDB.

In this blog, we look at the new Change Feed support, and how you can build responsive, scalable and robust applications using Azure DocumentDB.

Change Feed support in Azure DocumentDB

Azure DocumentDB is a fast and flexible NoSQL database service that is used for storing high-volume transactional and operational data with predictable single-digit millisecond latency for reads and writes. This makes it well-suited for IoT, gaming, retail, and operational logging applications. These applications often need to track changes made to DocumentDB data and perform actions based on those changes, such as updating materialized views, performing real-time analytics, or triggering notifications. Change Feed support allows you to build efficient and scalable solutions for these patterns.

Many modern application architectures, especially in IoT and retail, process streaming data in real time to produce analytic computations. These application architectures (“lambda pipelines”) have traditionally relied on a write-optimized storage solution for rapid ingestion, and a separate read-optimized database for real-time query. With support for Change Feed, DocumentDB can be used as a single system for both ingestion and query, allowing you to build simpler and more cost-effective lambda pipelines. For more details, read the paper on DocumentDB TCO.

 

Stream processing: Stream-based processing offers a faster alternative to querying entire datasets to identify what has changed. For example, a game built on DocumentDB can use Change Feed to implement real-time leaderboards based on scores from completed games. You can use DocumentDB to receive and store event data from devices, sensors, infrastructure, and applications, and process these events in real time with Azure Stream Analytics, Apache Storm, or Apache Spark using Change Feed support.

Triggers/event computing: You can now perform additional actions like calling an API when a document is inserted or modified. For example, within web and mobile apps, you can track events such as changes to your customer's profile, preferences, or location to trigger certain actions like sending push notifications to their devices using Azure Functions or App Services.

Data synchronization: If you need to keep data stored in DocumentDB in sync with a cache, search index, or data lake, then Change Feed provides a robust API for building your data pipeline. Change Feed allows you to replicate updates as they happen on the database, recover and resume syncing when workers fail, and distribute processing across multiple workers for scalability.
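
The resume-after-failure behavior described above can be sketched with a toy in-memory feed. This is a minimal illustration under stated assumptions, not the DocumentDB SDK: the feed is a plain Python list of (position, document id, value) tuples, and `sync`, `WorkerCrash`, the `checkpoint` dictionary, and the `crash_at` parameter are hypothetical names introduced only for this example. The worker saves its position after each batch, so a restarted worker re-reads only the changes after the last saved checkpoint.

```python
class WorkerCrash(Exception):
    """Simulated worker failure (hypothetical, for illustration only)."""


def sync(feed, cache, checkpoint, batch_size=2, crash_at=None):
    """Apply changes newer than the checkpoint to the cache, batch by batch.

    feed       -- list of (position, doc_id, value) tuples in modification order
    cache      -- dict standing in for the downstream cache or search index
    checkpoint -- dict holding the last fully processed position
    crash_at   -- position at which to simulate a failure (None = never)
    """
    pending = [change for change in feed if change[0] > checkpoint["position"]]
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        for position, doc_id, value in batch:
            if position == crash_at:
                raise WorkerCrash(f"simulated failure at change {position}")
            cache[doc_id] = value  # idempotent upsert, safe to replay
        # Persist progress only after the whole batch has been applied.
        checkpoint["position"] = batch[-1][0]


feed = [(1, "a", 10), (2, "b", 20), (3, "a", 11), (4, "c", 30), (5, "b", 21)]
cache, checkpoint = {}, {"position": 0}

try:
    sync(feed, cache, checkpoint, crash_at=4)
except WorkerCrash:
    pass  # the first batch was checkpointed; the second was not

sync(feed, cache, checkpoint)  # a restarted worker resumes after position 2
print(cache, checkpoint)  # {'a': 11, 'b': 21, 'c': 30} {'position': 5}
```

Because each change is applied as an idempotent upsert, replaying the partially processed batch after a restart leaves the cache in the same state as an uninterrupted run.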

 

Working with the Change Feed API

Change Feed is available as part of REST API 2016-07-11 and SDK versions 1.11.0 and above. See Change Feed API for how to get started with code.

 

The change feed has the following properties:

  • Changes are persistent in DocumentDB and can be processed asynchronously.
  • Changes to documents within a collection are available immediately in the change feed.
  • Each change to a document appears only once in the change feed, and only the most recent change for a given document is included. Intermediate changes may not be available.
  • The change feed is sorted by order of modification within each partition key value. There is no guaranteed order across partition-key values.
  • Changes can be synchronized from any point in time; that is, there is no fixed data retention period for which changes are available.
  • Changes are available in chunks of partition key ranges. This capability allows changes from large collections to be processed in parallel by multiple consumers/servers.
  • Applications can request multiple change feeds simultaneously on the same collection.
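
As a rough illustration of several properties in this list, the following sketch models a feed in memory. It is not the DocumentDB API: `ToyChangeFeed`, `upsert`, `read`, and the integer continuation value are hypothetical names chosen for this example. The model always drops intermediate changes, whereas the list above only says they may not be available.

```python
from itertools import count


class ToyChangeFeed:
    """In-memory stand-in for a collection's change feed (hypothetical)."""

    def __init__(self):
        self._clock = count(1)  # logical modification timestamps
        self._latest = {}       # partition key -> {doc id: (timestamp, document)}

    def upsert(self, partition_key, doc_id, body):
        """Insert or replace a document, keeping only its latest version."""
        ts = next(self._clock)
        docs = self._latest.setdefault(partition_key, {})
        docs[doc_id] = (ts, {"id": doc_id, **body})
        return ts

    def read(self, partition_key, after=0):
        """Return (documents, continuation) for changes newer than `after`.

        Documents come back in modification order within the partition key.
        The continuation can be passed back later to read only newer changes.
        """
        entries = [
            entry
            for entry in self._latest.get(partition_key, {}).values()
            if entry[0] > after
        ]
        entries.sort(key=lambda entry: entry[0])
        continuation = entries[-1][0] if entries else after
        return [doc for _, doc in entries], continuation


feed = ToyChangeFeed()
feed.upsert("device-1", "a", {"v": 1})
feed.upsert("device-1", "b", {"v": 1})
feed.upsert("device-1", "a", {"v": 2})  # supersedes the first change to "a"
feed.upsert("device-2", "c", {"v": 9})  # different partition key

changes, token = feed.read("device-1")
print(changes, token)  # [{'id': 'b', 'v': 1}, {'id': 'a', 'v': 2}] 3
```

Two consumers can each hold their own continuation value and read the same partition key independently, which mirrors the idea of several feeds being read from one collection at once.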

Next Steps

In this blog post, we looked at the new Change Feed support in Azure DocumentDB.