Solution architecture: Anomaly detection with machine learning

The services run by modern IT departments generate large volumes of telemetry that track operational health, system performance, usage, business metrics, alerting, and more. Monitoring this data and extracting insights from it, however, is often only partly automated and error-prone, which makes it hard to determine accurately the health of the system at any given point in time.

This customizable anomaly-detection solution uses machine learning to help keep IT systems highly available. It provides an end-to-end pipeline that ingests data from on-premises and cloud data sources and reports anomalous events to downstream monitoring and ticketing systems.

With this solution, you can quickly detect and fix issues based on health metrics from the underlying IT infrastructure (CPU, memory, etc.), services (timeouts, SLA variations, brownouts, etc.), and other key performance indicators (order backlog, login and payment failures, etc.).

Deploy to Azure

Use the pre-built template, which is also available to browse on GitHub, to deploy this architecture to Azure.

Diagram: Anomaly detection with machine learning (10 connected icons). Time series data flows into Event Hubs (event queue) and on to Stream Analytics (real-time analytics). Stream Analytics sends data to Azure SQL Database (anomaly detection results), which feeds Power BI, and to Table Storage (big data store). Table Storage and Data Factory exchange data in both directions. Data Factory in turn exchanges data with Machine Learning (anomaly detection) to score each dataset, sends monitoring data and telemetry to Application Insights, saves the Machine Learning output to Azure SQL Database, and publishes detected anomalies to Service Bus topics (publish/subscribe).

Implementation guidance

Event Hubs

Event Hubs is the entry point of the pipeline, where the raw time-series data is ingested.
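The following is a minimal sketch of how a single raw data point might be serialized as an Event Hubs event body. The field names (`timestamp`, `metric`, `value`, `metadata`) and the JSON encoding are assumptions chosen to match the metric-name aggregation and optional metadata described in this section. The deployment template defines the actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class RawDataPoint:
    """One raw time-series sample, shaped as a hypothetical Event Hubs payload.

    Field names are illustrative; the real schema comes from the deployment
    template, not from this sketch.
    """
    timestamp: str                                  # ISO 8601 timestamp in UTC
    metric: str                                     # metric name, later used for aggregation
    value: float                                    # the measured value
    metadata: dict = field(default_factory=dict)    # optional context, e.g. host name


def to_event_body(point: RawDataPoint) -> str:
    """Serialize a data point to a JSON string suitable as an event body."""
    return json.dumps(asdict(point), sort_keys=True)


if __name__ == "__main__":
    sample = RawDataPoint(
        timestamp=datetime(2024, 1, 1, 12, 3, tzinfo=timezone.utc).isoformat(),
        metric="cpu_percent",
        value=73.5,
        metadata={"host": "web-01"},
    )
    print(to_event_body(sample))
```

The sketch does not cover sending. With the `azure-eventhub` Python package, the body would be wrapped in an `EventData` object and sent through an `EventHubProducerClient`, which requires a live namespace and connection string.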

Stream Analytics

Stream Analytics aggregates the raw data points by metric name over 5-minute intervals.
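In Stream Analytics itself, this grouping is written in the job's SQL-like query language, typically with a `TumblingWindow(minute, 5)` in the `GROUP BY` clause. The Python sketch below mimics the same logic for illustration only. It buckets `(timestamp, metric, value)` tuples into non-overlapping 5-minute windows per metric name. The section does not say which aggregate the job computes, so the mean used here is an assumption.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean

WINDOW = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts: datetime, window: timedelta = WINDOW) -> datetime:
    """Floor a timezone-aware timestamp to the start of its tumbling window."""
    whole_windows = (ts - EPOCH) // window   # timedelta // timedelta -> int
    return EPOCH + whole_windows * window


def aggregate(points, window: timedelta = WINDOW) -> dict:
    """Group (timestamp, metric, value) tuples by metric name and window.

    Returns {(metric, window_start): mean_value}. The mean is an assumed
    aggregate; the real Stream Analytics job may compute something else.
    """
    buckets = defaultdict(list)
    for ts, metric, value in points:
        buckets[(metric, window_start(ts, window))].append(value)
    return {key: mean(values) for key, values in buckets.items()}


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    raw = [
        (t0 + timedelta(minutes=1), "cpu", 40.0),
        (t0 + timedelta(minutes=4), "cpu", 60.0),
        (t0 + timedelta(minutes=6), "cpu", 90.0),   # falls in the next window
        (t0 + timedelta(minutes=2), "mem", 70.0),
    ]
    for (metric, start), avg in sorted(aggregate(raw).items()):
        print(metric, start.isoformat(), avg)
```

Tumbling windows do not overlap, so each raw point contributes to exactly one aggregate for its metric.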

Storage

Azure Storage (Table Storage in the diagram) stores the data aggregated by the Stream Analytics job.
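In Table Storage, every entity needs a `PartitionKey` and a `RowKey`. The sketch below shows one possible key design for the aggregated data. This design is an assumption, not the template's. The metric name serves as the partition key, so a metric's history lives in one partition. A fixed-width UTC timestamp serves as the row key, so rows within a partition sort chronologically. The `Value` property name and the `to_entity` helper are hypothetical.

```python
from datetime import datetime, timezone

# Characters Table Storage disallows in PartitionKey/RowKey values.
# (Control characters are also disallowed but are not checked here.)
FORBIDDEN_KEY_CHARS = set("/\\#?")


def to_entity(metric: str, window_start: datetime, value: float) -> dict:
    """Build a Table Storage entity (as a plain dict) for one aggregated point.

    Hypothetical key design: PartitionKey = metric name, RowKey = window start
    formatted as a fixed-width UTC string so lexicographic order = time order.
    """
    if window_start.tzinfo is None:
        raise ValueError("window_start must be timezone-aware")
    if any(ch in FORBIDDEN_KEY_CHARS for ch in metric):
        raise ValueError(f"metric name {metric!r} is not a valid key value")
    row_key = window_start.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return {"PartitionKey": metric, "RowKey": row_key, "Value": value}


if __name__ == "__main__":
    print(to_entity("cpu", datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc), 50.0))
```

Because Table Storage orders entities by `PartitionKey` and then by `RowKey` as strings, a fixed-width timestamp keeps a metric's points in time order without extra sorting on read.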

Data Factory

At regular intervals (every 15 minutes by default), Data Factory calls the Anomaly Detection API on the data in Azure Storage and stores the results in SQL Database.
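The timing of this step can be sketched as a fixed-interval trigger that hands each run a complete 15-minute slice of data. This sketch illustrates only the scheduling arithmetic. It assumes that runs cover back-to-back, non-overlapping intervals. In Data Factory, the schedule is configured as a pipeline trigger rather than written in Python. `run_windows` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple

INTERVAL = timedelta(minutes=15)  # default cadence stated in the text


def run_windows(start: datetime, end: datetime,
                interval: timedelta = INTERVAL) -> Iterator[Tuple[datetime, datetime]]:
    """Yield the [window_start, window_end) slices that a fixed-interval
    trigger would process between start and end.

    Only complete intervals are yielded, so a partially filled interval is
    left for a later run.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    current = start
    while current + interval <= end:
        yield current, current + interval
        current += interval


if __name__ == "__main__":
    s = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    for lo, hi in run_windows(s, s + timedelta(minutes=50)):
        print(lo.time(), "->", hi.time())
```

If the 15-minute runs are aligned with the 5-minute aggregates from Stream Analytics, each run would cover three new aggregated points per metric.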

SQL Database

SQL Database stores the results from the Anomaly Detection API, including binary detections and detection scores. It also stores optional metadata sent with the raw data points to support richer reporting.
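The following is a minimal sketch of a results table. It uses Python's built-in `sqlite3` as a local stand-in for Azure SQL Database. The table name, column names, and `save_result` helper are hypothetical. The columns mirror what this section lists: the binary detection, the detection score, and optional metadata, which is stored here as JSON text. On Azure SQL Database, the same idea would use T-SQL types such as `BIT` and `DATETIME2`, and an upsert would typically use `MERGE` rather than SQLite's `INSERT OR REPLACE`.

```python
import json
import sqlite3

# Hypothetical schema; names are illustrative, not the template's.
SCHEMA = """
CREATE TABLE IF NOT EXISTS anomaly_results (
    metric      TEXT    NOT NULL,
    window_end  TEXT    NOT NULL,   -- ISO 8601 UTC timestamp
    value       REAL    NOT NULL,
    is_anomaly  INTEGER NOT NULL CHECK (is_anomaly IN (0, 1)),  -- binary detection
    score       REAL    NOT NULL,   -- detection score
    metadata    TEXT,               -- optional JSON sent with the raw points
    PRIMARY KEY (metric, window_end)
)
"""


def save_result(conn, metric, window_end, value, is_anomaly, score, metadata=None):
    """Insert one detection result, replacing any earlier row for the same key."""
    conn.execute(
        "INSERT OR REPLACE INTO anomaly_results "
        "(metric, window_end, value, is_anomaly, score, metadata) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (metric, window_end, value, int(is_anomaly), score,
         json.dumps(metadata) if metadata is not None else None),
    )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
save_result(conn, "cpu", "2024-01-01T12:15:00Z", 97.0, True, 0.93, {"host": "web-01"})
save_result(conn, "cpu", "2024-01-01T12:20:00Z", 41.0, False, 0.12)

# A simple report: number of detected anomalies per metric.
rows = conn.execute(
    "SELECT metric, COUNT(*) FROM anomaly_results "
    "WHERE is_anomaly = 1 GROUP BY metric"
).fetchall()
print(rows)
```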

Machine Learning

Machine Learning hosts the Anomaly Detection API. The API itself is stateless, so each call must include the historical data points that the detection relies on.
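The sketch below illustrates what statelessness means for the caller. The scoring function keeps nothing between calls, so the caller slices the stored series into history plus new points and resends the history every time. The section does not say which algorithm the Anomaly Detection API uses. The z-score test here is a swapped-in placeholder, and `score_points`, `build_call`, `history_size`, and `threshold` are all hypothetical names.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def score_points(history: Sequence[float], new_points: Sequence[float],
                 threshold: float = 3.0) -> List[dict]:
    """Stateless stand-in for one anomaly-detection call.

    Everything needed to score new_points arrives in the arguments, and no
    state survives the call. The z-score test is a placeholder, not the
    Anomaly Detection API's actual algorithm.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical points")
    mu = mean(history)
    sigma = pstdev(history) or 1e-9  # avoid division by zero on a flat history
    results = []
    for value in new_points:
        score = abs(value - mu) / sigma
        results.append({"value": value, "score": score, "is_anomaly": score > threshold})
    return results


def build_call(series: Sequence[float], n_new: int,
               history_size: int) -> Tuple[List[float], List[float]]:
    """Split a stored series into (history, new_points) for one call.

    history_size (hypothetical) is how many past points the caller resends
    each time, because the service does not remember earlier calls.
    """
    if n_new < 1 or history_size < 0:
        raise ValueError("n_new must be >= 1 and history_size >= 0")
    new_points = list(series[-n_new:])
    history = list(series[-(n_new + history_size):-n_new])
    return history, new_points


if __name__ == "__main__":
    series = [10.0, 11.0, 10.0, 9.0, 10.0, 11.0, 10.0, 50.0]
    hist, new = build_call(series, n_new=1, history_size=7)
    print(score_points(hist, new))
```

Because the service holds no state, calling it twice with the same history and new points should return the same result. The cost is that every request carries the full history window.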

Service Bus

Detected anomalies are published to a Service Bus topic so that external monitoring services can consume them.
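A topic differs from a plain queue because each subscription receives its own copy of every published message. This lets a monitoring system and a ticketing system consume the same anomaly independently. The in-memory class below is a toy model of that fan-out, not the Service Bus SDK. A real publisher would use `ServiceBusClient.get_topic_sender` from the `azure-servicebus` package. The subscription names and message fields are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Optional


class InMemoryTopic:
    """Toy model of topic/subscription fan-out.

    Every subscription gets its own copy of each published message, so
    consumers read at their own pace without affecting each other.
    """

    def __init__(self) -> None:
        self._subscriptions: Dict[str, Deque[dict]] = {}

    def add_subscription(self, name: str) -> None:
        self._subscriptions.setdefault(name, deque())

    def publish(self, message: dict) -> None:
        for queue in self._subscriptions.values():
            queue.append(dict(message))  # independent copy per subscriber

    def receive(self, subscription: str) -> Optional[dict]:
        """Pop the oldest message for a subscription, or None if it is empty."""
        queue = self._subscriptions[subscription]
        return queue.popleft() if queue else None


if __name__ == "__main__":
    topic = InMemoryTopic()
    topic.add_subscription("monitoring")   # hypothetical subscriber names
    topic.add_subscription("ticketing")
    topic.publish({"metric": "cpu", "window_end": "2024-01-01T12:15:00Z", "score": 0.93})
    print(topic.receive("monitoring"))
    print(topic.receive("ticketing"))
```

Real Service Bus subscriptions can also apply filter rules, so that, for example, a ticketing subscription receives only high-score anomalies. The toy model omits filters.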

Application Insights

Application Insights provides monitoring and telemetry for the pipeline.

Power BI

Power BI provides dashboards showing the raw data, as well as detected anomalies.

Related solution architectures