Real-time analytics on big data architecture

Azure Analysis Services
Azure Event Hubs
Azure Synapse Analytics

Solution ideas

This article is a solution idea. If you'd like us to expand the content with more information, such as potential use cases, alternative services, implementation considerations, or pricing guidance, let us know by providing GitHub feedback.

This solution idea describes how you can get insights from live streaming data. Capture data continuously from any IoT device or from website clickstream logs, and process it in near real time.

Architecture

Diagram of a real-time analytics solution on a big data architecture that uses Azure Synapse Analytics with Azure Data Lake Storage, Event Hubs, Azure Analysis Services, Azure Cosmos DB, and Power BI.

Download a Visio file of this architecture.

Dataflow

  1. Ingest live streaming data from an application by using Azure Event Hubs.
  2. Use Synapse Pipelines to bring all your structured data together in Azure Blob Storage.
  3. Use Apache Spark pools to clean, transform, and analyze the streaming data, and combine it with structured data from operational databases or data warehouses.
  4. Apply scalable machine learning and deep learning techniques in the notebook experience of Apache Spark pools, using Python, Scala, or .NET, to derive deeper insights from this data.
  5. Use Apache Spark pools and Synapse Pipelines in Azure Synapse Analytics to access and move data at scale.
  6. Build analytics dashboards and embedded reports on data in a dedicated SQL pool to share insights within your organization, and use Azure Analysis Services to serve this data to thousands of users.
  7. Move the insights from Apache Spark pools to Azure Cosmos DB to make them accessible to real-time apps.
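The transform-and-serve part of this flow (steps 3 and 7) can be illustrated with a small, in-memory Python sketch. It is not Spark or Azure Cosmos DB code: the clickstream event schema, the `PAGE_CATEGORIES` lookup table, the 60-second tumbling window, and the functions `clean`, `aggregate`, and `to_documents` are all hypothetical. The sketch only shows the shape of the work: drop malformed records, enrich events with structured reference data, count page views per time window, and emit documents with a stable `id` that a serving store could upsert.

```python
import json
from collections import Counter
from datetime import datetime, timezone

# Hypothetical raw clickstream payloads, as they might be read from an event stream.
RAW_EVENTS = [
    '{"user_id": "u1", "page": "/home", "ts": "2024-01-01T10:00:05+00:00"}',
    '{"user_id": "u2", "page": "/video/42", "ts": "2024-01-01T10:00:40+00:00"}',
    'not valid json',
    '{"user_id": "u3", "page": "/video/42", "ts": "2024-01-01T10:00:50+00:00"}',
    '{"user_id": "u1", "page": "/video/7", "ts": "2024-01-01T10:01:10+00:00"}',
    '{"user_id": "u4", "page": "/video/7"}',  # missing timestamp
]

# Hypothetical structured reference data, e.g. exported from an operational database.
PAGE_CATEGORIES = {"/home": "landing", "/video/42": "sports", "/video/7": "news"}

WINDOW_SECONDS = 60  # assumed tumbling-window size


def clean(raw_events):
    """Parse JSON payloads and skip records that are malformed or incomplete."""
    for raw in raw_events:
        try:
            event = json.loads(raw)
            record = {
                "user_id": event["user_id"],
                "page": event["page"],
                "ts": datetime.fromisoformat(event["ts"]),
            }
        except (KeyError, TypeError, ValueError):  # JSONDecodeError is a ValueError
            continue
        yield record


def window_start(ts):
    """Round a timezone-aware timestamp down to the start of its tumbling window."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS, tz=timezone.utc)


def aggregate(events, categories):
    """Enrich each event with its page category and count views per (window, category)."""
    counts = Counter()
    for event in events:
        category = categories.get(event["page"], "unknown")
        counts[(window_start(event["ts"]), category)] += 1
    return counts


def to_documents(counts):
    """Shape the aggregates as JSON-ready documents keyed by a stable, hypothetical id."""
    docs = {}
    for (start, category), views in counts.items():
        doc_id = f"{category}:{start.isoformat()}"
        docs[doc_id] = {
            "id": doc_id,
            "category": category,
            "window_start": start.isoformat(),
            "views": views,
        }
    return docs


if __name__ == "__main__":
    for doc in to_documents(aggregate(clean(RAW_EVENTS), PAGE_CATEGORIES)).values():
        print(json.dumps(doc))
```

In an Apache Spark pool, the same logic would typically be written with DataFrame operations (parsing, filtering, a join against the reference data, and a windowed grouping) so that it scales out across the pool; the plain-Python version only makes the data shapes explicit.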

Components

  • Azure Synapse Analytics is a fast, flexible, and trusted cloud data warehouse that lets you scale compute and storage elastically and independently, with a massively parallel processing architecture.
  • Synapse Pipelines lets you create, schedule, and orchestrate your ETL/ELT workflows.
  • Azure Data Lake Storage provides massively scalable, secure data lake functionality built on Azure Blob Storage.
  • Apache Spark pools in Azure Synapse Analytics provide a fast, easy, and collaborative Apache Spark-based analytics platform.
  • Azure Event Hubs is a big data streaming platform and event ingestion service.
  • Azure Cosmos DB is a globally distributed, multi-model database service. You can replicate your data across any number of Azure regions and scale throughput independently of storage.
  • Azure Synapse Link for Azure Cosmos DB lets you run near real-time analytics over operational data in Azure Cosmos DB, without any performance or cost impact on your transactional workload, by using the two analytics engines available in your Azure Synapse workspace: serverless SQL pool and Apache Spark pools.
  • Azure Analysis Services is an enterprise-grade analytics-as-a-service offering that lets you govern, deploy, test, and deliver your BI solution with confidence.
  • Power BI is a suite of business analytics tools that deliver insights throughout your organization. Connect to hundreds of data sources, simplify data preparation, and drive ad hoc analysis. Produce beautiful reports, then publish them for your organization to consume on the web and across mobile devices.
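Azure Synapse Link can run analytics without affecting the transactional workload because analytical queries read from the Azure Cosmos DB analytical store, an isolated, column-oriented copy of the operational data that the service keeps in sync automatically, rather than from the row-oriented transactional store. The toy Python model below only illustrates that row-versus-column idea. It is not how Azure Cosmos DB is implemented, and `TransactionalStore`, `AnalyticalStore`, `sync`, and `sum_column` are hypothetical names.

```python
class TransactionalStore:
    """Row-oriented store: each item is a whole document addressed by its id."""

    def __init__(self):
        self.items = {}

    def upsert(self, item):
        # Insert or replace the whole document (a copy, so callers can't mutate it).
        self.items[item["id"]] = dict(item)


class AnalyticalStore:
    """Column-oriented copy: one list of values per property, aligned by position."""

    def __init__(self):
        self.columns = {}

    def sync(self, rows):
        """Rebuild every column from the rows (the managed service syncs automatically)."""
        rows = list(rows)
        names = sorted({name for row in rows for name in row})
        # Items that lack a property get None in that column.
        self.columns = {name: [row.get(name) for row in rows] for name in names}

    def sum_column(self, name):
        """Aggregate by scanning one column only, skipping items that lack the property."""
        return sum(v for v in self.columns.get(name, []) if v is not None)


if __name__ == "__main__":
    orders = TransactionalStore()
    for item in (
        {"id": "o1", "customer": "c1", "amount": 20},
        {"id": "o2", "customer": "c2", "amount": 35},
        {"id": "o3", "customer": "c1"},  # no amount yet
    ):
        orders.upsert(item)
    analytics = AnalyticalStore()
    analytics.sync(orders.items.values())
    print(analytics.sum_column("amount"))  # 55
```

An aggregate such as a total needs only one property, so a column layout lets the query scan a single list instead of reading every full document. In the real service, the analytical store is isolated from the transactional store, so such scans don't consume the throughput provisioned for the transactional workload.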

Alternatives

  • Azure Synapse Link is the Microsoft-preferred solution for analytics on top of Azure Cosmos DB data.
  • Azure IoT Hub can be used instead of Azure Event Hubs. IoT Hub is a managed service hosted in the cloud that acts as a central message hub for communication between an IoT application and its attached devices. You can connect millions of devices and their backend solutions reliably and securely. Almost any device can be connected to an IoT hub.

Scenario details

This scenario illustrates how you can get insights from live streaming data. You can capture data continuously from any IoT device or from website clickstream logs, and process it in near real time.

Potential use cases

This solution is ideal for the media and entertainment industry, where it can be used to build analytics from live streaming data.

Considerations

These considerations implement the pillars of the Azure Well-Architected Framework, which is a set of guiding tenets that can be used to improve the quality of a workload. For more information, see Microsoft Azure Well-Architected Framework.

Cost optimization

Cost optimization is about looking at ways to reduce unnecessary expenses and improve operational efficiencies. For more information, see Overview of the cost optimization pillar.

You can use the Azure pricing calculator to get a customized pricing estimate.

Next steps