Smart failure anomaly detection

By Yossi Yossifon, Senior Program Manager, Microsoft Azure

Posted on April 11, 2016
2 min read

We recently added an automatic alert that will tell you if there’s a sudden disruption or degradation in your web app’s performance. If there’s an abnormal rise in the rate of failed requests, we’ll let you know within minutes, so you can investigate while most users are still unaware of the problem.

Best of all, you don’t need to do anything to configure it, provided your app is set up with Visual Studio Application Insights and is sending a certain minimum volume of request telemetry. It works for both .NET and Java web apps, whether hosted in the cloud or on your own servers. It automatically learns the normal pattern of failure rates for your app, and raises the alert when there is an abnormal rise. Learn more.

Diagnostics drill-in

The email alert doesn’t just warn you of the problem. It carries valuable diagnostic information, highlighting characteristics that are common across the failures, such as the response code, the operation name, the application version, or other properties. It also includes a related exception, trace, and dependency call when these are relevant to the problem. From the links in the email, you can click straight through to the specific failed requests in the Application Insights portal, and from there to the related dependency failure, exception, call stack, or other telemetry.
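To make the idea of "characteristics that are common across the failures" concrete, here is a minimal sketch in Python. It is not how Application Insights implements its analysis; the function name `common_characteristics`, the `min_share` parameter, and the property names in the sample data are all assumptions made for illustration. It simply counts how often each property value appears among the failed requests and keeps the ones shared by most of them.

```python
from collections import Counter


def common_characteristics(failed_requests, min_share=0.8):
    """Return (property, value, share) triples found in at least
    min_share of the failed requests, most widely shared first.

    Each request is a dict of property name -> hashable value.
    This is an illustrative sketch, not the service's algorithm.
    """
    total = len(failed_requests)
    if total == 0:
        return []

    # Count every (property, value) pair across the failed requests.
    counts = Counter()
    for request in failed_requests:
        for prop, value in request.items():
            counts[(prop, value)] += 1

    shared = [
        (prop, value, count / total)
        for (prop, value), count in counts.items()
        if count / total >= min_share
    ]
    shared.sort(key=lambda item: item[2], reverse=True)
    return shared


# Hypothetical telemetry for four failed requests.
failed = [
    {"response_code": 500, "operation": "GET /cart", "app_version": "1.4.2"},
    {"response_code": 500, "operation": "GET /cart", "app_version": "1.4.2"},
    {"response_code": 500, "operation": "POST /checkout", "app_version": "1.4.2"},
    {"response_code": 503, "operation": "GET /cart", "app_version": "1.4.2"},
]

for prop, value, share in common_characteristics(failed, min_share=0.75):
    print(f"{prop} = {value!r} in {share:.0%} of failed requests")
```

In this made-up data, every failure comes from the same application version, which is the kind of shared property that points you toward a recent deployment as a likely suspect.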

NRT Proactive Diagnostics drill-in example

What's the benefit of these alerts?

There are two great advantages of these alerts: automatic adaptation to the behavior of your app and appropriate diagnostic information.

As you probably know, you have always been able to set alerts on a chosen threshold of any metric. The drawback is that it can be difficult to determine the appropriate threshold for each metric. It takes time to become familiar with the normal behavior of your system, and during this period you learn what abnormal behavior looks like. You gradually find a threshold that enables detection without too many false alarms. Even then, there is no single ideal threshold: the failure rate may vary under load, some requests are more failure-prone than others, and so on. By contrast, the new Proactive Diagnostics alert does that learning for you, and raises the alarm when there’s a rise that is unexpected in the light of other factors.
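The difference between a fixed threshold and a learned baseline can be sketched in a few lines of Python. This is only an illustration under simple assumptions: it uses a rolling mean and standard deviation, which is a stand-in technique and not the model that Application Insights actually uses. The function names, the `window`, `sensitivity`, and `min_spread` parameters, and the sample failure rates are all hypothetical.

```python
from statistics import mean, stdev


def fixed_threshold_alerts(failure_rates, threshold):
    """Indices of intervals whose failure rate exceeds a fixed threshold."""
    return [i for i, rate in enumerate(failure_rates) if rate > threshold]


def adaptive_alerts(failure_rates, window=6, sensitivity=3.0, min_spread=0.005):
    """Indices of intervals whose failure rate is abnormally high compared
    with the preceding `window` intervals (rolling mean plus a multiple of
    the rolling standard deviation). Illustrative stand-in only.
    """
    alerts = []
    for i in range(window, len(failure_rates)):
        history = failure_rates[i - window:i]
        baseline = mean(history)
        # Floor the spread so a very steady history does not trigger on noise.
        spread = max(stdev(history), min_spread)
        if failure_rates[i] > baseline + sensitivity * spread:
            alerts.append(i)
    return alerts


# Hypothetical failure rates per interval for two different apps.
quiet_app = [0.010, 0.012, 0.009, 0.011, 0.010, 0.012, 0.011, 0.040]
noisy_app = [0.080, 0.075, 0.085, 0.078, 0.082, 0.079, 0.081, 0.083]

for name, rates in (("quiet_app", quiet_app), ("noisy_app", noisy_app)):
    print(name, "fixed:", fixed_threshold_alerts(rates, threshold=0.05))
    print(name, "adaptive:", adaptive_alerts(rates))
```

With a single 5% threshold, the quiet app's jump from about 1% to 4% goes unnoticed, while the app that normally runs at about 8% alerts in every interval. The baseline approach flags only the quiet app's final interval, because that is the only rise that is unusual for the app in question.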

Once a detection is made and you are aware of an issue, you still need to have more information when triaging it. What is the scale of the problem and its urgency? How many users are affected? Some of the information might be available in your dashboards, but often you have to perform some analysis on telemetry to get a sufficient view.
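As a rough idea of the kind of analysis this involves, the following Python sketch summarizes the scale of a problem from a list of request records. The record fields `success` and `user_id` and the function name `triage_summary` are assumptions for illustration, not the Application Insights telemetry schema.

```python
def triage_summary(requests):
    """Summarize how widespread failures are in a batch of request records.

    Each record is a dict with a boolean "success" and a "user_id".
    Field names are hypothetical.
    """
    total = len(requests)
    failed = [r for r in requests if not r["success"]]
    all_users = {r["user_id"] for r in requests}
    affected_users = {r["user_id"] for r in failed}
    return {
        "total_requests": total,
        "failed_requests": len(failed),
        "failure_rate": len(failed) / total if total else 0.0,
        "affected_users": len(affected_users),
        "affected_user_share": (
            len(affected_users) / len(all_users) if all_users else 0.0
        ),
    }


# Hypothetical request telemetry.
sample = [
    {"user_id": "u1", "success": True},
    {"user_id": "u1", "success": False},
    {"user_id": "u2", "success": False},
    {"user_id": "u3", "success": True},
    {"user_id": "u4", "success": True},
]

print(triage_summary(sample))
```

The share of affected users is often more useful for judging urgency than the raw failure count, since a handful of users retrying repeatedly can inflate the number of failed requests.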

Diagnosis of the problem can be a difficult task, as the problem might be caused by a bug in the code, configuration, storage or other external services (databases, REST services) that the app is using. But in the Proactive Diagnostics alert, we collect information about the anomaly to highlight what is likely to be the cause.

The smart failure anomaly detection alert helps you detect service disruption or degradation within minutes, and provides supporting information that simplifies and speeds up diagnosis of the root cause.

Please share your ideas for new or improved features on the Application Insights UserVoice page and submit your questions to the Application Insights Forum.
