Solution architecture: Anomaly detection with machine learning

The services used by modern IT departments generate large volumes of telemetry data that track operational health, system performance, usage, business metrics, alerts and more. Often, however, monitoring this data and extracting insights from it isn’t fully automated and is error-prone, which makes it hard to determine the health of the system accurately at any given point in time.

This customisable anomaly-detection solution uses machine learning to help maintain high availability of IT systems. It provides an end-to-end pipeline that ingests data from on-premises and cloud data sources and reports anomalous events to downstream monitoring and ticketing systems.

With this solution, you can quickly detect and fix issues based on underlying health metrics from IT infrastructure (CPU, memory etc.), services (timeouts, SLA variations, brownouts etc.) and other key performance indicators (order backlog, login and payment failures etc.).

Deploying to Azure

Use the following pre-built template to deploy this architecture to Azure

Architecture diagram. Components: Event Hub (event queue), Stream Analytics (real-time analytics), Table Storage (big data store), Data Factory, Machine Learning (anomaly detection), Azure SQL DB (anomaly detection results), Service Bus topics (publish/subscribe capabilities), Power BI, and Visual Studio Application Insights (monitoring and telemetry). Flow labels: time series data, metadata, score each dataset, save ML output, publish anomalies detected.

Implementation guidance

Event Hubs

Event Hubs is the entry point of the pipeline, where the raw time series data is ingested.

Stream Analytics

Stream Analytics aggregates the raw data points by metric name at five-minute intervals.
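The aggregation step can be illustrated outside Stream Analytics with a minimal Python sketch. It groups data points by metric name into five-minute tumbling windows. The tuple layout, the function names, and the choice of the mean as the aggregate are assumptions made for illustration; the solution's actual Stream Analytics query may use a different aggregate.

```python
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 5 * 60  # five-minute tumbling window


def window_start(ts: datetime) -> datetime:
    """Round a timezone-aware timestamp down to the start of its window."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS, tz=timezone.utc)


def aggregate(points):
    """Average values per (metric name, window start).

    `points` is an iterable of (metric_name, timestamp, value) tuples.
    This layout is hypothetical, and the mean is an assumed aggregate.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for name, ts, value in points:
        bucket = sums[(name, window_start(ts))]
        bucket[0] += value
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

Each raw data point lands in exactly one window, so the output contains one aggregated value per metric per five-minute interval.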

Storage

Azure Storage stores data aggregated by the Stream Analytics job.

Data Factory

Data Factory calls the Anomaly Detection API at regular intervals (every 15 minutes by default) on the data in Azure Storage. It stores the results in a SQL database.

SQL Database

SQL Database stores the results from the Anomaly Detection API, including binary detections and detection scores. It also stores optional metadata sent with the raw data points to allow for more complicated reporting.
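As a rough illustration of what such a results store might hold, the sketch below uses an in-memory SQLite database as a stand-in for Azure SQL Database. The table name, column names, and helper functions are all hypothetical; the solution's real schema is not described here.

```python
import json
import sqlite3


def create_results_table(conn: sqlite3.Connection) -> None:
    """Create a hypothetical results table: one row per metric per window."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS anomaly_results (
            metric_name  TEXT    NOT NULL,
            window_start TEXT    NOT NULL,  -- ISO 8601 timestamp
            is_anomaly   INTEGER NOT NULL CHECK (is_anomaly IN (0, 1)),
            score        REAL    NOT NULL,  -- detection score
            metadata     TEXT,              -- optional JSON sent with the raw point
            PRIMARY KEY (metric_name, window_start)
        )
        """
    )


def save_result(conn, metric_name, window_start, is_anomaly, score, metadata=None):
    """Insert or overwrite the detection result for one metric window."""
    conn.execute(
        "INSERT OR REPLACE INTO anomaly_results VALUES (?, ?, ?, ?, ?)",
        (
            metric_name,
            window_start,
            int(is_anomaly),
            score,
            json.dumps(metadata) if metadata is not None else None,
        ),
    )


def detected_anomalies(conn):
    """Return (metric_name, window_start, score) for rows flagged as anomalous."""
    return conn.execute(
        "SELECT metric_name, window_start, score FROM anomaly_results "
        "WHERE is_anomaly = 1 ORDER BY window_start"
    ).fetchall()
```

Keeping the binary detection and the score in separate columns lets a report filter on the flag while still ranking anomalies by score, and the optional metadata column carries any extra fields needed for more detailed reporting.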

Machine Learning Studio

Machine Learning Studio hosts the Anomaly Detection API. The API itself is stateless, so each API call must include the historical data points it needs.
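Because the API keeps no state between calls, the caller has to assemble the history itself on every scoring run. The following sketch shows one way to do that, assuming a seven-day history window. The function name, the window length, and the payload shape are hypothetical and do not reflect the actual request format of the Anomaly Detection API.

```python
from datetime import datetime, timedelta, timezone


def build_scoring_payload(series, history=timedelta(days=7)):
    """Build a request body containing the most recent `history` of data.

    `series` is a list of (timestamp, value) pairs for one metric.
    The returned dict is a hypothetical payload, not the real API schema.
    """
    if not series:
        return {"points": []}
    ordered = sorted(series)  # oldest first
    cutoff = ordered[-1][0] - history  # measured back from the newest point
    return {
        "points": [
            {"timestamp": ts.isoformat(), "value": value}
            for ts, value in ordered
            if ts >= cutoff
        ]
    }
```

A scheduled caller would rebuild this payload from the aggregated data in storage on each run, so the newest points are always scored against the same length of history.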

Service Bus

Detected anomalies are published to a Service Bus topic so that external monitoring services can consume them.

Application Insights

Application Insights allows for monitoring of the pipeline.

Power BI

Power BI provides dashboards showing the raw data, as well as detected anomalies.
