Application Insights (AI) released the telemetry processor feature, which enables you to inspect, filter, and modify each data point generated by the AI SDK before it is sent to the portal. I’ve been playing with this capability and realized that it gives you a programmatic way to identify your application’s remote dependencies at runtime. That enabled me to implement a custom extension that performs network diagnostic tests against my application’s dependencies, producing data that extends and complements AI’s default telemetry. The flexibility and richness of the telemetry processor enable many scenarios, but digging into network health is what initially jumped out at me. This post shares what I’ve learned from the project.
I created a simple ASP.NET project that has a single remote HTTP dependency to fetch historical stock values. The application is instrumented with the AI SDK and implements a “ping mesh” telemetry processor extension that inspects AI remote-dependency-data (RDD) events, parses the distinct URIs, and performs an ongoing series of TCP port pings. The diagnostic results are then submitted as AI metrics that help isolate whether a remote dependency slowdown is network related. The value of this telemetry is that it either implicates the network for slowdowns/failures or clarifies that the issue is higher up the stack.
- Illustrate how you can register a custom AI telemetry processor extension and then process individual AI telemetry events.
- Show how AI’s remote dependency telemetry events can be used to dynamically identify your application’s dependencies at runtime.
- Demonstrate how you can integrate custom code within the AI SDK to evaluate network health and performance for your dependencies.
- Show how to log custom telemetry to your AI application.
The Application Insights SDK enables you to register custom extensions which are integrated into the AI processing pipeline and can be used to process each telemetry item generated by AI. In our example we are interested in determining the remote dependencies for our application and therefore will process AI’s remote dependency events which are generated whenever your application calls a remote endpoint. The diagram below represents a simplified flow for the solution:
- Your application code makes calls to remote dependencies.
- The AI SDK intercepts each remote call and sends a remote dependency event to the telemetry processor extension you registered.
- The endpoint properties (host-name and port) are parsed from the remote dependency event and stored in a static hash table within the Ping Mesh object.
- The Ping Mesh object continually polls the hash table and then creates ‘Ping’ threads that evaluate network health and performance for the endpoint.
- As each Ping thread completes it submits AI metrics for network health and response time.
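The “Ping” in the flow above is a TCP port ping rather than an ICMP echo. A minimal sketch of that idea is below; the `TcpPing` class name and timeout value are illustrative, not the actual PingMesh implementation:

```csharp
using System;
using System.Diagnostics;
using System.Net.Sockets;

public static class TcpPing
{
    // Attempts a TCP connect to host:port and returns the elapsed time,
    // or null if the connection failed or timed out.
    public static TimeSpan? Ping(string host, int port, int timeoutMs = 5000)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            using (var client = new TcpClient())
            {
                if (!client.ConnectAsync(host, port).Wait(timeoutMs))
                    return null; // timed out
                stopwatch.Stop();
                return stopwatch.Elapsed;
            }
        }
        catch (SocketException)
        {
            return null; // connection refused, host unreachable, etc.
        }
        catch (AggregateException)
        {
            return null; // connect task faulted
        }
    }
}
```

A TCP connect test works against any endpoint your application can actually reach, which matters in cloud environments where ICMP is often blocked.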
Further improvements could be added to capture client-side network state (e.g., netstat, tracert, ipconfig) at the time of failure and submit it to AI as Trace events (SDK guidance here).
Before going deeper into the implementation, I wanted to show the value of the solution by sharing a scenario. My application architecture has a front-end ASP.NET website that uses a WebAPI controller to call into a remote Yahoo finance API. Historical stock pricing data is fetched from the remote dependency and then returned to the user. The following diagram illustrates the flow:
If there is a network slowdown between my ASP.NET VM and the Yahoo API server, it will result in slow page loads for the website users. To help you understand the health of your remote dependencies, AI provides you with nice views of historical response times and failure rates. The example below shows the out-of-box metrics I get from AI for overall dependency call durations for my app:
However, it can be hard to tell whether a slowdown is due to the network. The custom Ping Mesh extension provides ongoing baseline metrics that help identify where the issue is. The basic patterns you are looking for are:
AI’s remote dependency telemetry provides you with free insights into the overall health and performance of calls into your remote endpoint (i.e., it gives you the blue line). With a bit of custom code implemented as an AI Telemetry Processor extension you can generate your own network health and performance metrics (i.e., the orange line).
To simulate a network slowdown and show how it looks in Application Insights, I set up my website on an Azure VM and used a network WAN simulator to introduce high network latency for the calls to the Yahoo API endpoint. Below you can see how the metrics in AI provide you with clear data that matches the “Network Issue” pattern above (i.e., the network slowdown directly correlates to the remote dependency slowdown):
So, with a little bit of custom code I was able to extend the AI telemetry to give me ongoing network diagnostics that are dynamically spun up as my application connects to new endpoints. That is pretty cool!
The steps needed to implement the Ping Mesh solution were minimal in terms of hooking into the AI Telemetry pipeline; most of the code lives in a custom PingMesh.dll. The high-level steps I followed to get the solution working were:
- Instrument my application with the latest AI SDK and set up an application within AI (right-click your VS project and a wizard steps you through the process).
- Add a reference in my project for the custom ping mesh library that implements the network diagnostics.
- Add code for my telemetry processor (AITelemetryProcessor.cs) and instantiate the PingMesh object in its constructor.
- Update the ApplicationInsights.config with the configuration for the telemetry processor.
- Build and run my project.
You can find the source code on GitHub.
The image below shows how the ITelemetryProcessor code was implemented:
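For readers without the screenshot handy, the shape of the processor looks roughly like the sketch below. This is an illustrative reconstruction, not the repo’s exact source: `PingClient` and `SubmitEndpointToTargetList` are names taken from the description above, and it assumes a recent SDK where the dependency event is `DependencyTelemetry` with a `Target` property (older SDK versions use different type and property names):

```csharp
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;

public class AITelemetryProcessor : ITelemetryProcessor
{
    private readonly ITelemetryProcessor _next;
    private readonly PingClient _pingClient;

    // The SDK hands us the next processor in the chain; the PingMesh
    // object is instantiated here, in the constructor.
    public AITelemetryProcessor(ITelemetryProcessor next)
    {
        _next = next;
        _pingClient = new PingClient();
    }

    public void Process(ITelemetry item)
    {
        // Only remote dependency events interest us; everything else
        // flows through untouched.
        var dependency = item as DependencyTelemetry;
        if (dependency != null)
        {
            // Hand the endpoint to the ping mesh so it can start testing it.
            _pingClient.SubmitEndpointToTargetList(dependency.Target);
        }

        _next.Process(item); // always pass the item along the pipeline
    }
}
```

The key contract is the last line: a telemetry processor must forward each item to the next processor, or downstream telemetry is silently dropped.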
After creating my telemetry processor code I had to register it in the AI configuration:
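Registration happens in ApplicationInsights.config. A minimal fragment looks like the following, where the namespace and assembly name (`MyApp`) are placeholders for your own project:

```xml
<TelemetryProcessors>
  <Add Type="MyApp.AITelemetryProcessor, MyApp" />
</TelemetryProcessors>
```

The `Type` attribute is the fully qualified class name followed by the assembly name; the SDK reads this element at startup and inserts the processor into the pipeline.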
The SubmitEndpointToTargetList method, in the Ping Client, is responsible for parsing the endpoint into a format that can be pinged by our network diagnostics code:
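A hypothetical sketch of that parsing logic is below. The dependency target can arrive as a bare host, a `host:port` pair, or a full URI, so the method normalizes all three into a `host:port` key and stores it in a thread-safe set (a `ConcurrentDictionary` standing in for the static hash table described earlier):

```csharp
using System;
using System.Collections.Concurrent;

public partial class PingClient
{
    // Acts as the "PingTarget hash set": keys are "host:port" strings.
    private static readonly ConcurrentDictionary<string, byte> PingTargets =
        new ConcurrentDictionary<string, byte>();

    public void SubmitEndpointToTargetList(string target)
    {
        if (string.IsNullOrWhiteSpace(target)) return;

        string host;
        int port = 80; // default when no port is present

        Uri uri;
        if (Uri.TryCreate(target, UriKind.Absolute, out uri))
        {
            host = uri.Host;
            port = uri.Port; // Uri supplies 80/443 defaults for http/https
        }
        else
        {
            var parts = target.Split(':');
            host = parts[0];
            int parsed;
            if (parts.Length > 1 && int.TryParse(parts[1], out parsed))
                port = parsed;
        }

        // TryAdd is a no-op for endpoints we have already seen.
        PingTargets.TryAdd(host + ":" + port, 0);
    }
}
```

Deduplicating here keeps the ping mesh from spinning up redundant tests when the same dependency is called thousands of times.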
As mentioned earlier, a “ping” thread is created when the Ping Client object is initialized. This ping thread executes every 30 seconds (I should move this setting into my extension config but have hard-coded it for this demo) and, as we’ll see, polls any endpoints added to the PingTarget hash set above.
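A sketch of that loop, under the same naming assumptions as the earlier fragments (it reads the `PingTargets` set populated by `SubmitEndpointToTargetList`, and `PingEndpointAndReportMetrics` is a hypothetical method that runs the TCP/DNS test and reports the result):

```csharp
using System.Threading;
using System.Threading.Tasks;

public partial class PingClient
{
    private const int PingIntervalMs = 30000; // 30s, hard-coded for the demo

    private void StartPingLoop()
    {
        var thread = new Thread(() =>
        {
            while (true)
            {
                // Snapshot the current endpoints and test each one
                // concurrently so a slow endpoint can't stall the others.
                foreach (var endpoint in PingTargets.Keys)
                {
                    var target = endpoint; // capture for the closure
                    Task.Run(() => PingEndpointAndReportMetrics(target));
                }
                Thread.Sleep(PingIntervalMs);
            }
        });
        thread.IsBackground = true; // don't keep the host process alive
        thread.Start();
    }
}
```

Marking the thread as a background thread matters: a foreground loop would prevent the application from shutting down cleanly.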
Finally, in the Ping Client, all network diagnostics results are submitted to Application Insights as custom metrics that show if DNS or TCP issues were occurring during the test (Note: I replaced the “:” with an “_” to avoid potential issues with special characters in the metric name):
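A sketch of that reporting step is below. `TelemetryClient.TrackMetric` is the SDK API for custom metrics; the metric names and the `ReportResult` signature are illustrative:

```csharp
using Microsoft.ApplicationInsights;

public partial class PingClient
{
    private static readonly TelemetryClient Telemetry = new TelemetryClient();

    private void ReportResult(string host, int port,
                              bool dnsOk, bool tcpOk, double responseTimeMs)
    {
        // ":" is replaced with "_" to avoid special-character issues
        // in metric names.
        var endpointName = host + "_" + port;

        Telemetry.TrackMetric("PingMesh." + endpointName + ".DnsSuccess", dnsOk ? 1 : 0);
        Telemetry.TrackMetric("PingMesh." + endpointName + ".TcpSuccess", tcpOk ? 1 : 0);
        if (tcpOk)
            Telemetry.TrackMetric("PingMesh." + endpointName + ".ResponseTimeMs", responseTimeMs);
    }
}
```

Submitting success as a 0/1 metric lets the AI portal average it into a failure rate over any time window, alongside the response-time baseline.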
In this post we looked at how you can author code that plugs into the AI Telemetry Processor pipeline to extend the default analytics provided by AI. In our example we leveraged the remote dependency discovery that AI performs to implement diagnostics that provide a historical baseline for your network health. You could extend this implementation further to capture network state on your operating system when issues occur. Hopefully this gives you an idea of how flexible and powerful the AI telemetry processor feature is.