Digging into network health using Application Insights extensibility

Published on May 23, 2016

Principal Service Engineer Manager

Application Insights (AI) released the telemetry processor feature, which enables you to inspect, filter, and modify each data point generated by the AI SDK before it is sent to the portal. I’ve been playing with this capability and realized that it gives you a programmatic way to identify your application’s remote dependencies at runtime. This enabled me to implement a custom extension that performs network diagnostic tests against my application’s dependencies. The resulting data extends and complements AI’s default telemetry. The flexibility and richness of the telemetry processor enable many scenarios, but digging into network health is what initially jumped out at me. This post shares what I’ve learned from the project.

Overview

I created a simple ASP.NET project that has a single remote HTTP dependency to fetch historical stock values. The application is instrumented with the AI SDK and implements a “ping mesh” telemetry processor extension that inspects AI remote-dependency-data (RDD) events, parses the distinct URIs, and performs an ongoing series of TCP port pings. The diagnostic results are then submitted as AI metrics that help isolate whether a remote dependency slowdown is network related. The value of this telemetry is that it helps either to implicate the network in slowdowns and failures or to clarify that the issue is higher up the stack.

Objectives

  1. Illustrate how you can register a custom AI telemetry processor extension and then process individual AI telemetry events.
  2. Show how AI’s remote dependency telemetry events can be used to dynamically identify your application’s dependencies at runtime.
  3. Demonstrate how you can integrate custom code within the AI SDK to evaluate network health and performance for your dependencies.
  4. Show how to log custom telemetry to your AI application.

Implementation summary

The Application Insights SDK enables you to register custom extensions which are integrated into the AI processing pipeline and can be used to process each telemetry item generated by AI. In our example we are interested in determining the remote dependencies for our application and therefore will process AI’s remote dependency events which are generated whenever your application calls a remote endpoint. The diagram below represents a simplified flow for the solution:

2016-05-18_19h11_45

  • Your application code makes calls to remote dependencies.
  • The AI SDK intercepts each remote call and sends a remote dependency event to the telemetry processor extension you registered.
  • The endpoint properties (host-name and port) are parsed from the remote dependency event and stored in a static hash table within the Ping Mesh object.
  • The Ping Mesh object continually polls the hash table and then creates ‘Ping’ threads that evaluate network health and performance for the endpoint.
  • As each Ping thread completes it submits AI metrics for network health and response time.
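The first three steps of this flow can be modeled in a few lines. The project itself is C# and is shown only as screenshots, so the following is a language-neutral sketch in Python rather than the actual extension: `Telemetry`, `PingMeshProcessor`, and `Sink` are hypothetical stand-ins, not AI SDK types. It shows the one property that matters for a processor in a chain: it may inspect an item, but it always forwards the item to the next processor so the telemetry still reaches the portal.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Telemetry:
    """Hypothetical stand-in for a telemetry item (not an SDK type)."""
    kind: str  # e.g. "dependency" or "request"
    data: str  # for dependency items: the URI that was called


class PingMeshProcessor:
    """Records where dependency calls go, then forwards every item."""

    targets = set()  # shared across instances, like the static table above

    def __init__(self, next_processor):
        self.next_processor = next_processor

    def process(self, item):
        if item.kind == "dependency":
            netloc = urlsplit(item.data).netloc
            if netloc:
                PingMeshProcessor.targets.add(netloc)
        # Never swallow the item: the rest of the pipeline still needs it.
        self.next_processor.process(item)


class Sink:
    """End of the chain; a real pipeline would transmit the item here."""

    def __init__(self):
        self.sent = []

    def process(self, item):
        self.sent.append(item)


sink = Sink()
chain = PingMeshProcessor(sink)
chain.process(Telemetry("dependency", "https://example.com/quotes?s=MSFT"))
chain.process(Telemetry("request", "GET /home"))
print(PingMeshProcessor.targets)  # the one distinct endpoint seen so far
print(len(sink.sent))             # both items were forwarded
```

Keeping the target collection as a set means repeated calls to the same endpoint add nothing new, so the number of ping targets grows only with the number of distinct endpoints.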

Further improvements could capture client-side network state (e.g. netstat, tracert, ipconfig) at the time of failure and submit it to AI as Trace events (see the SDK guidance).

Results

Before going deeper into the implementation, I wanted to show the value of the solution by sharing a scenario. My application architecture has a front-end ASP.NET website that uses a WebAPI controller to call into a remote Yahoo finance API. Historical stock pricing data is fetched from the remote dependency and then returned to the user. The following diagram illustrates the flow:

2016-05-18_19h15_17

If there is a network slowdown between my ASP.NET VM and the Yahoo API server, it will result in slow page loads for the website users. To help you understand the health of your remote dependencies, AI provides you with nice views of historical response times and failure rates. The example below shows the out-of-box metrics I get from AI for overall dependency call durations for my app:

2016-05-18_19h18_16

However, it can be hard to tell whether a slowdown is due to the network. The custom Ping Mesh extension provides ongoing baseline metrics that help to identify where the issue is. The basic patterns you are looking for are:

image

AI’s remote dependency telemetry provides you with free insights into the overall health and performance of calls into your remote endpoint (i.e. it gives you the blue line). With a bit of custom code implemented as an AI telemetry processor extension, you can generate your own network health and performance metrics (i.e. the orange line).

To simulate a network slowdown and to show how it looks in Application Insights, I set up my website on an Azure VM and used a network WAN simulator to introduce high network latency for the calls to the Yahoo API endpoint. Below you can see how the metrics in AI provide you with clear data that matches the “Network Issue” pattern above (i.e. the network slowdown directly correlates to the remote dependency slowdown):

2016-05-18_19h00_13
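Reading the two lines together can also be expressed as a rough rule of thumb. The sketch below is an illustration in Python, not part of the Ping Mesh project: the function name `classify`, the use of medians, and the `factor=2.0` threshold are all assumptions chosen for the example. It compares recent dependency durations and recent TCP ping times against their own baselines and reports whether the network looks implicated.

```python
from statistics import median


def classify(dependency_ms, ping_ms, baseline_dependency_ms,
             baseline_ping_ms, factor=2.0):
    """Label a time window from two series of millisecond samples.

    A series counts as "slow" when its median exceeds `factor` times its
    baseline. The threshold is an assumption for this sketch.
    """
    dependency_slow = median(dependency_ms) > factor * baseline_dependency_ms
    network_slow = median(ping_ms) > factor * baseline_ping_ms

    if dependency_slow and network_slow:
        return "network issue"
    if dependency_slow:
        return "issue higher up the stack"
    if network_slow:
        return "network slow, dependency unaffected"
    return "healthy"


# Dependency calls and TCP pings slow down together: the network is implicated.
print(classify([900, 1000, 1100], [200, 220, 210], 300, 20))
# Dependency calls slow down while pings stay flat: look above the network.
print(classify([900, 1000, 1100], [20, 22, 21], 300, 20))
```

In practice you would make this judgment by eye on the AI charts; the point of the sketch is only that the ping metric gives you a second, independent series to compare against.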

So, with a little bit of custom code I was able to extend the AI telemetry to give me ongoing network diagnostics that are dynamically spun up as my application connects to new endpoints. That is pretty cool!

Implementation details

Hooking the Ping Mesh solution into the AI telemetry pipeline took only a few steps. Most of the code was implemented in a custom PingMesh.dll. The high-level steps I followed to get the solution working were:

  1. Instrument my application with the latest AI SDK and set up an application within AI (right-click your VS project and a wizard steps you through the process).
  2. Add a reference in my project for the custom ping mesh library that implements the network diagnostics.
  3. Add code for my telemetry processor (AITelemetryProcessor.cs) and instantiate the PingMesh object in its constructor.
  4. Update the ApplicationInsights.config with the configuration for the telemetry processor.
  5. Build and run my project.

You can find the source code on GitHub.

The image below shows how the ITelemetryProcessor code was implemented:

2016-05-18_18h51_59

After creating my telemetry processor code I had to register it in the AI configuration:

2016-05-18_18h50_15[9]

The SubmitEndpointToTargetList method, in the Ping Client, is responsible for parsing the endpoint into a format that can be pinged by our network diagnostics code:

2016-05-18_18h57_55
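Since the method appears only as a screenshot, here is a minimal Python sketch of the same kind of parsing, not a transcription of SubmitEndpointToTargetList. The function name `parse_endpoint` and the `DEFAULT_PORTS` table are assumptions for the example. It reduces a dependency URI to a (host, port) pair, falls back to the scheme's well-known port when none is given, and rejects input that cannot be pinged.

```python
from urllib.parse import urlsplit

# Assumed defaults for this sketch; only HTTP(S) dependencies are covered.
DEFAULT_PORTS = {"http": 80, "https": 443}


def parse_endpoint(uri):
    """Return (host, port) for a dependency URI, or None if unusable."""
    parts = urlsplit(uri)
    host = parts.hostname  # lower-cased, without credentials or brackets
    if not host:
        return None
    try:
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return None
    if port is None:
        port = DEFAULT_PORTS.get(parts.scheme)
    if port is None:
        return None
    return host, port


targets = set()
for uri in [
    "https://Example.com/a",
    "https://example.com/b",       # same endpoint as above, different path
    "http://example.com:8080/",
    "not a uri",
]:
    endpoint = parse_endpoint(uri)
    if endpoint is not None:
        targets.add(endpoint)

print(sorted(targets))
```

Normalizing to (host, port) before storing is what keeps the target list small: many different URIs on the same server collapse into a single ping target.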

As mentioned earlier, a “ping” thread is created when the Ping Client object is initialized. This ping thread executes every 30 seconds (I should move this setting into my extension config, but I have hard-coded it for this demo) and, as we’ll see, polls any endpoints added to the PingTarget hash set above.

2016-05-18_18h58_55

2016-05-18_18h59_24
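For readers who cannot see the screenshots, the idea of a TCP port ping on a polling loop can be sketched as follows. This is a Python illustration under stated assumptions, not the project's C# code: `tcp_ping`, `poll_forever`, the result dictionary keys, and the 3-second timeout are all hypothetical. The ping separates the DNS lookup from the TCP connect so that each can fail, and be reported, independently.

```python
import socket
import threading
import time


def tcp_ping(host, port, timeout=3.0):
    """Resolve `host`, then time a TCP connect to `port`."""
    result = {"dns_ok": False, "tcp_ok": False, "connect_ms": None}
    try:
        family, socktype, proto, _, address = socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM)[0]
        result["dns_ok"] = True
    except OSError:  # socket.gaierror is a subclass of OSError
        return result

    start = time.perf_counter()
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(address)
        result["tcp_ok"] = True
        result["connect_ms"] = (time.perf_counter() - start) * 1000.0
    except OSError:  # refused, unreachable, or timed out
        pass
    return result


def poll_forever(targets, report, interval_s=30.0, stop=None):
    """Ping every target, hand each result to `report`, sleep, repeat."""
    if stop is None:
        stop = threading.Event()
    while not stop.is_set():
        # Iterate over a copy: other threads may add targets meanwhile.
        for host, port in list(targets):
            report(host, port, tcp_ping(host, port))
        stop.wait(interval_s)  # returns early once stop is set
```

To run the loop in the background you could start it with `threading.Thread(target=poll_forever, args=(targets, print), daemon=True).start()`. Waiting on an event instead of calling `time.sleep` lets the loop shut down promptly when `stop.set()` is called.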

Finally, in the Ping Client, all network diagnostics results are submitted to Application Insights as custom metrics that show whether DNS or TCP issues were occurring during the test (Note: I replaced the “:” with an “_” to avoid potential issues with special characters in the metric name):

2016-05-18_18h59_53
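The naming step is simple enough to show in isolation. The sketch below is a Python illustration, not the project's code: the function `build_metrics`, the metric name suffixes, and the shape of the `result` dictionary (`dns_ok`, `tcp_ok`, `connect_ms`) are assumptions for the example. It turns one ping result into name/value pairs, replacing “:” with “_” in the endpoint as described above. Each pair would then be handed to whatever metric-tracking call your SDK provides.

```python
def build_metrics(host, port, result):
    """Turn one ping result into {metric name: value} pairs.

    `result` is assumed to look like:
    {"dns_ok": bool, "tcp_ok": bool, "connect_ms": float or None}
    """
    # "host:port" contains a colon (IPv6 hosts contain several), so
    # replace every one to keep the metric name free of special characters.
    endpoint = f"{host}:{port}".replace(":", "_")

    metrics = {
        f"{endpoint} DNS failure": 0.0 if result["dns_ok"] else 1.0,
        f"{endpoint} TCP failure": 0.0 if result["tcp_ok"] else 1.0,
    }
    # Only report a timing when the connect actually succeeded.
    if result["connect_ms"] is not None:
        metrics[f"{endpoint} TCP connect ms"] = result["connect_ms"]
    return metrics


ok = {"dns_ok": True, "tcp_ok": True, "connect_ms": 12.5}
for name, value in build_metrics("example.com", 443, ok).items():
    print(name, value)
```

Reporting failures as 0 or 1 rather than skipping healthy samples means the chart always has a data point per interval, so a gap in the series is itself a signal that the ping loop stopped running.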

In summary

In this post we looked at how you can author code that plugs into the AI telemetry processor pipeline to extend the default analytics provided by AI. In our example we leveraged the remote dependency discovery that AI performs to implement diagnostics that provide a historical baseline for your network health. You could extend this implementation further to capture network state on your operating system when issues occur. Hopefully this gives you an idea of how flexible and powerful the AI telemetry processor feature is.