Azure Stream Analytics and DocumentDB for your IoT application

Posted on October 13, 2015

Principal Program Manager, Azure Stream Analytics

Microsoft’s trusted Azure cloud platform is used by businesses worldwide to harness the power of data and create actionable intelligence. As the Internet of Things emerges as a change agent in the enterprise, having the right tools to easily develop and deploy IoT solutions is more important than ever.

In addition to our recent launch of the Azure IoT Suite, Microsoft has enhanced the integration between Azure Stream Analytics and Azure DocumentDB for faster configuration and decreased time to market. Stream Analytics jobs can now write processed data to DocumentDB, enabling archival and low-latency querying over unstructured JSON data. The ability to output Stream Analytics data to DocumentDB has been highly requested and was the top-voted idea on the Azure Feedback Forum.

In this blog post, we will highlight the benefits of using these two services on top of your streaming data and explore how to configure them together.

Built for scale and performance

Many customers with stream processing solutions need to execute flexible, low-latency queries over their processed data. For example, in the toll booth scenario, millions of cars cross tolling stations every minute and require a near-real-time record of their account balance and toll station activities. Such an application requires a highly scaled-out design in which events from several toll booths can be processed and persisted in parallel.

DocumentDB integrates naturally with Azure Stream Analytics because it stores data natively as JSON, without requiring any schema or transformations. DocumentDB’s automatic indexing at scale allows relational, hierarchical and spatial querying of rapidly evolving IoT data without requiring a schema or any secondary indexes. You can learn more about DocumentDB’s query capabilities here.

Stream Analytics allows you to specify a list of DocumentDB collections, and automatically partitions incoming data into those collections based on the chosen PartitionKey value. This allows you to ingest and index large volumes of data by scaling out horizontally. You can also configure the performance levels of individual DocumentDB collections, and modify DocumentDB consistency levels and indexing policies to make fine-grained performance tradeoffs.
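To make the idea of partitioning across collections concrete, here is a minimal Python sketch. It is only an illustration: this post does not describe the routing scheme Stream Analytics actually uses, so the hash-and-modulo approach, the pick_collection function, the collection names and the tollId field are all assumptions made for the example.

```python
import hashlib

def pick_collection(partition_key_value, collections):
    """Map a partition key value to one collection name.

    Assumption: a stable hash modulo the number of collections. This is an
    illustrative scheme, not the one Stream Analytics uses internally.
    """
    digest = hashlib.sha256(str(partition_key_value).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(collections)
    return collections[index]

# Hypothetical collection names and toll booth events.
collections = ["tolls0", "tolls1", "tolls2"]
events = [
    {"tollId": 17, "licensePlate": "ABC-123"},
    {"tollId": 42, "licensePlate": "XYZ-789"},
    {"tollId": 17, "licensePlate": "JKL-456"},
]

# Group events by the collection their partition key maps to.
routed = {name: [] for name in collections}
for event in events:
    routed[pick_collection(event["tollId"], collections)].append(event)
```

Because the hash is deterministic, every event with the same tollId lands in the same collection, while different toll booths spread across the collections and can be written in parallel.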

Multiple write modes

Stream Analytics supports two modes of writing to DocumentDB: upsert and append.

When the data contains an “id” property, Stream Analytics will upsert (insert or replace) documents when writing to DocumentDB. For example, this is useful when updating the latest activity data for a vehicle in the toll booth scenario.

When the data has no “id” property, or when each event’s “id” is set to a unique identifier, Stream Analytics will create a new document for every event. This is useful for archiving the entire history of a vehicle’s activity data.
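The difference between the two modes can be sketched with a small in-memory simulation in Python. This is not the DocumentDB API: the write_documents function, the dictionary standing in for a collection, the generated identifiers and the field names are all hypothetical and serve only to show the behavior described above.

```python
import uuid

def write_documents(store, events):
    """Write events into `store`, a dict keyed by document id.

    An event that carries an "id" replaces any stored document with the same
    id (upsert). An event without one is given a generated id, so it is always
    stored as a new document (append).
    """
    for event in events:
        doc = dict(event)
        if "id" not in doc:
            doc["id"] = str(uuid.uuid4())  # hypothetical generated identifier
        store[doc["id"]] = doc
    return store

# Upsert: the license plate is used as the "id", so only the latest activity is kept.
latest = write_documents({}, [
    {"id": "ABC-123", "tollId": 1, "balance": 20.0},
    {"id": "ABC-123", "tollId": 4, "balance": 15.5},
])

# Append: no "id" in the events, so every event becomes its own document.
history = write_documents({}, [
    {"licensePlate": "ABC-123", "tollId": 1, "balance": 20.0},
    {"licensePlate": "ABC-123", "tollId": 4, "balance": 15.5},
])
```

In this sketch, latest ends up with a single document for the vehicle (its most recent toll station), while history keeps both events.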

How to set up your DocumentDB output

  1. In either the Azure Preview Portal or the Azure Management Portal, navigate to the Outputs of a Stream Analytics job and click Add.

  2. Select DocumentDB as the output type and provide the required connection and configuration information. Details on each of these fields can be found in Understanding Stream Analytics outputs.

  3. Click the Create button to finish setting up your output.

Stay tuned for more features and updates in the IoT Suite. We look forward to hearing your feedback on the integration between Stream Analytics and DocumentDB on the Stream Analytics Forum and Azure Feedback Forum.