Anomaly Detection in Real-Time Data Streams

The Cortana Intelligence IT Anomaly Insights solution helps IT departments in large organisations quickly detect and fix issues, in an automated and scalable manner, based on underlying health metrics from IT infrastructure (CPU, memory, etc.), services (timeouts, SLA variations, brownouts, etc.), and other key performance indicators (KPIs) (order backlog, login and payment failures, etc.). The solution offers a “Try it Now” experience that lets you evaluate it with your own data. The “Deploy” experience gets you started on Azure quickly by deploying the end-to-end solution components into your Azure subscription, with full control to customise them as needed.



Connect with one of our Advanced Analytics partners to arrange a proof of concept in your environment: Neal Analytics, Empired

Estimated provisioning time: 30 minutes

Modern services generate large volumes of telemetry data to track operational health, system performance, usage insights, business metrics, alerting and more. However, for IT departments, monitoring this data and gathering insights from it is often not fully automated and is error prone (it generally relies on rules or threshold-based alerts), which makes it hard to determine the health of the system accurately at any given point in time.

Cortana Intelligence IT Anomaly Insights addresses this problem with a low barrier to entry. It is based on Cortana Intelligence Solutions (for easy deployment of Azure services) and the Azure Machine Learning Anomaly Detection API (for fully automated tracking of historical and real-time data), so a business decision maker can evaluate it and realise value within minutes. Customers can also bring their own data and customise and extend the solution to fit their particular scenarios through quick proofs of concept. With this solution, organisations will be able to:

  • Leverage the state-of-the-art Azure Machine Learning Anomaly Detection API to learn from and react to anomalies in both historical and real-time data. This removes the need for a human in the loop, who would otherwise have to recalibrate thresholds to catch missed anomalies and minimise false positives.
  • Quickly realise the potential of the solution by trying it out with their own data without any upfront investment. The “Try it Now” experience also provides users with the ability to determine the right set of sensitivity parameters for the relevant use case.
  • Deploy an end-to-end pipeline into their subscription to ingest data from on-premises and cloud data sources and report anomalous events to downstream monitoring and ticketing systems in a plug-and-play manner in a matter of minutes.

Try It experience with Power BI

IT Anomaly Insights Preconfigured Solution Dashboard

Solution Diagram

See solution architecture and detailed instructions on GitHub.

As shown in the solution diagram, real-time metric streams originating from both on-premises and cloud-based systems can be pushed into an Azure Event Hub queue. These events (time series data points) are processed by Azure Stream Analytics, where they are aggregated at five-minute intervals. Each time series is sent to the Azure Anomaly Detection API for evaluation at a 15-minute cadence. The results from the API, along with the dimensions provided with the input, are then stored in Azure SQL DB. The detected anomalies are also published to Azure Service Bus so that they can be consumed by downstream ticketing systems. The solution also provides instructions for setting up the Power BI dashboard so that anomalies can be visualised quickly for root cause analysis.
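
The aggregation step described above can be sketched in a few lines of Python. This is only an illustration of five-minute tumbling windows, not the Azure Stream Analytics query that the solution deploys. The event shape `(metric, timestamp, value)`, the function names and the choice of the mean as the aggregate are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean

WINDOW = timedelta(minutes=5)  # aggregation interval mentioned in the text
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts, width=WINDOW):
    """Floor a timezone-aware timestamp to the start of its tumbling window."""
    return EPOCH + ((ts - EPOCH) // width) * width


def aggregate(events):
    """Group (metric, timestamp, value) events into five-minute windows.

    Returns {metric: [(window_start, mean_value), ...]} in time order.
    Using the mean is an assumption; the source does not name the aggregate.
    """
    buckets = defaultdict(list)
    for metric, ts, value in events:
        buckets[(metric, window_start(ts))].append(value)

    series = defaultdict(list)
    for (metric, start), values in sorted(buckets.items()):
        series[metric].append((start, mean(values)))
    return dict(series)


# Hypothetical sample events for a single CPU metric.
utc = timezone.utc
events = [
    ("cpu", datetime(2017, 1, 1, 0, 1, tzinfo=utc), 40.0),
    ("cpu", datetime(2017, 1, 1, 0, 4, tzinfo=utc), 60.0),
    ("cpu", datetime(2017, 1, 1, 0, 7, tzinfo=utc), 90.0),
]
result = aggregate(events)
for start, value in result["cpu"]:
    print(start.isoformat(), value)
```

With five-minute windows and a 15-minute evaluation cadence, each evaluation would see three new aggregated points per time series.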

Anomaly Detection API

The Anomaly Detection API is used in both the “Try It Now” experience and the deployed solution. It helps you detect different types of anomalous patterns in your time series data. It assigns an anomaly score to each data point in the time series, which can be used for generating alerts, monitoring through dashboards or connecting with your ticketing systems. The Anomaly Detection API can detect the following types of anomalies in time series data:

  • Spikes and dips: for example, when monitoring the number of login failures to a service or the number of checkouts on an e-commerce site, unusual spikes or dips could indicate security attacks or service disruptions.
  • Positive and negative trends: for example, when monitoring memory usage, a steadily shrinking amount of free memory can indicate a memory leak; when monitoring service queue length, a persistent upward trend may indicate an underlying software issue.
  • Level changes and changes in dynamic range of values: for example, a change in the latency level of a service after an upgrade, or a lower level of exceptions after an upgrade, is worth monitoring.
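
To make the three anomaly types above concrete, here is a minimal Python sketch. It does not reproduce the Anomaly Detection API or its scoring method, which this page does not describe. It swaps in three simple textbook stand-ins: a trailing z-score for spikes and dips, a least-squares slope for trends, and a difference of means for level changes. All function names and sample values are hypothetical.

```python
from statistics import mean, stdev


def spike_scores(values, history=12):
    """Score each point by its distance from the trailing mean, in standard deviations."""
    scores = []
    for i, v in enumerate(values):
        past = values[max(0, i - history):i]
        if len(past) < 2:
            scores.append(0.0)  # not enough history to judge this point
            continue
        sd = max(stdev(past), 1e-9)  # floor avoids division by zero on flat series
        scores.append(abs(v - mean(past)) / sd)
    return scores


def trend_slope(values):
    """Least-squares slope of the values against their index (needs at least 2 points)."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def level_shift(values, split):
    """Difference between the mean after and the mean before a split index."""
    return mean(values[split:]) - mean(values[:split])


# Hypothetical series, one per anomaly type.
login_failures = [5, 6, 5, 7, 6, 5, 6, 40, 6, 5]   # spike at index 7
free_memory_mb = [100, 98, 96, 94, 92]             # steady downward trend
latency_ms = [200, 200, 200, 260, 260, 260]        # level change after index 3

scores = spike_scores(login_failures)
print("largest spike score at index", scores.index(max(scores)))
print("free memory slope per interval:", trend_slope(free_memory_mb))
print("latency level shift:", level_shift(latency_ms, 3))
```

In practice a score like these would be compared against a sensitivity threshold before raising an alert; the right threshold depends on the metric and would need tuning on real data.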


©2017 Microsoft Corporation. All rights reserved. This information is provided “as is” and may change without notice. Microsoft makes no warranties, express or implied, with respect to the information provided here. Third-party data was used to generate the solution. You are responsible for respecting the rights of others, including procuring and complying with relevant licences in order to create similar datasets.

Related solution architectures

Predictive maintenance

This Predictive Maintenance solution monitors aircraft and predicts the remaining useful life of aircraft engine components.

Quality assurance

Quality assurance systems allow businesses to prevent defects throughout their processes of delivering goods or services to customers. Building such a system that collects data and identifies potential problems along a pipeline can provide enormous advantages. For example, in digital manufacturing, quality assurance across the assembly line is imperative. Identifying slowdowns and potential failures before they occur rather than after they are detected can help companies reduce the cost of scrap and rework, while improving productivity.