Solve Node.js issues faster with Application Insights for Node.js

Posted on June 15, 2017

Program Manager

Azure Application Insights is an application performance management (APM) platform that provides performance and diagnostic information about your running services and applications to help you discover and diagnose issues quickly. Use App Insights wherever your Node.js application runs: containers, PaaS, IoT, and even Electron desktop apps. Just follow the instructions to drop the Node.js SDK into your app and watch helpful information flow into the Azure Portal within minutes.

APMs help you understand and act on what’s happening in your application, so one of our missions is to automatically collect and display the information you need to pinpoint issues. We hope the following capabilities, available in our latest SDK release (0.21.0), contribute to that.

Find related events

When pinpointing issues, reviewing all your traces and logs can help, but quickly filtering down to only those directly related to the problem at hand helps even more. For example, if your API service returns an error response, we’d like to help you quickly find all other traces related to that error. There’s a good chance some of them will lead you much closer to the source of the problem!

In this release we’ve made this correlation possible by including a shared correlation identifier in each App Insights item. Once you’ve drilled into the details view of any item, click “All available telemetry for this root operation” and instantly get a filtered view of related items based on correlation ID. For example, in the following screenshot a Node.js API request in turn leads to a MongoDB call, and the Request and Dependency items can be instantly viewed together.

[Screenshot: a Node.js API request and its correlated MongoDB dependency call]

Get started and learn more about Navigation and Dashboards in the Application Insights portal.
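Conceptually, the "All available telemetry for this root operation" view is a filter on the shared correlation ID. The following is a minimal sketch assuming a simplified, hypothetical item shape (`type`, `name`, `operationId`). It illustrates the idea only and is neither the Portal's query nor the SDK's data model.

```javascript
// Hypothetical, simplified telemetry items; the real App Insights schema has more fields.
const telemetry = [
  { type: 'Request',    name: 'GET /api/items',  operationId: 'op-1' },
  { type: 'Dependency', name: 'mongodb find',    operationId: 'op-1' },
  { type: 'Request',    name: 'GET /api/health', operationId: 'op-2' },
  { type: 'Trace',      name: 'cache miss',      operationId: 'op-1' },
];

// Return every item that shares the given item's correlation (operation) ID.
function relatedTo(item, items) {
  return items.filter(other => other.operationId === item.operationId);
}

// Start from the failing request and pull in everything correlated with it.
const related = relatedTo(telemetry[0], telemetry);
console.log(related.map(i => `${i.type}: ${i.name}`));
```

The health-check request carries a different ID, so it drops out of the filtered view.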

How it works

Instant filtering of correlated items in the Azure Portal requires adding the same correlation identifier (ID) to all related items sent by the SDK. That means the SDK has to share that ID between the original request’s context and the context of later operations, such as database and HTTP calls or sending a response. Sharing such data across callbacks and other asynchronous tasks is challenging because neither JavaScript nor Node.js yet includes a standard way to propagate context across callbacks, though several efforts are in progress.
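To see why this is hard, consider the most obvious approach: a single module-level variable holding the "current" correlation ID. The sketch below is hypothetical and uses `setTimeout` to stand in for an outgoing database or HTTP call. With two overlapping requests, the later request overwrites the variable before the earlier request’s callback runs.

```javascript
// Naive approach: one module-level variable holds the "current" correlation ID.
let currentId = null;
const log = [];

function handleRequest(id, delayMs) {
  currentId = id; // set when the request arrives
  // Simulate an asynchronous outgoing call (database, HTTP) that completes later.
  setTimeout(() => {
    // Read the ID when the callback finally runs; another request may have changed it.
    log.push({ request: id, taggedWith: currentId });
  }, delayMs);
}

// Two overlapping requests: B arrives before A's outgoing call completes.
handleRequest('A', 20);
handleRequest('B', 10);

// Request B is tagged 'B', but request A is wrongly tagged 'B' as well.
setTimeout(() => console.log(log), 50);
```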

Let’s illustrate the problem with a simple weather API service. A request to this API returns forecast details for a US zip code specified in an HTTP query parameter. The service itself gets these forecast details from the OpenWeatherMap HTTP API. The main handler follows; the full code is in this gist.

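const http = require('http');
const url = require('url');

// OpenWeatherMap API key; reading it from this environment variable is an assumption
const appid = process.env.OPENWEATHERMAP_APPID;

function httpHandler (request, response) {
  // `true` parses the query string into an object so `.zip` and `.country` work
  const parsed_url = url.parse(request.url, true);
  const zip = parsed_url.query.zip;
  const country = parsed_url.query.country || 'us';

  const query_string = `zip=${encodeURIComponent(zip)},${encodeURIComponent(country)}&APPID=${appid}`;
  http.get(`http://api.openweathermap.org/data/2.5/forecast?${query_string}`,
    // Stream the forecast back to the original caller
    weather_response => { weather_response.pipe(response); }
  ).on('error', err => {
    // Reply with 502 Bad Gateway if the upstream call fails
    response.statusCode = 502;
    response.end(err.message);
  });
}
http.createServer(httpHandler).listen(8080);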

The following screenshot shows the topology of this app as discovered by App Insights’ App Map.

[Screenshot: App Map topology of the weather service]

In this code, when Node’s HTTP server receives an incoming request, the App Insights SDK collects information about that request for you and assigns it a correlation identifier (ID). When the service then sends the associated HTTP request to the OpenWeatherMap API, the SDK collects information about that outgoing request too, including the same correlation ID. When the service receives the response from the OpenWeatherMap API and passes it along to the original caller, App Insights sends more events, which also include the original correlation ID.

To share this identifier across all these operations and callbacks, the App Insights SDK needs access to storage that all of them share. To meet this challenge, App Insights now uses the popular zone.js library by default to provide a persistent context: the SDK stores the correlation ID there when a request starts, then retrieves it whenever other information is collected and sent to you. As a result, you (and the Portal on your behalf) can filter by this identifier to discover and fix problems in your app faster.
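The core idea behind such a persistent context is to capture the active context when a callback is scheduled and reinstate it when the callback runs. zone.js does this automatically by patching asynchronous APIs. The sketch below does it by hand with two hypothetical helpers, `runInContext` and `bind`, to show the mechanism. It is not zone.js’s API, and a real implementation has to cover every asynchronous API (timers, promises, event emitters, sockets), not just the one wrapped here.

```javascript
// The context active for the code that is currently running.
let currentContext = null;

// Run fn with ctx as the active context, restoring the previous one afterwards.
function runInContext(ctx, fn) {
  const previous = currentContext;
  currentContext = ctx;
  try {
    return fn();
  } finally {
    currentContext = previous;
  }
}

// Capture the context active at scheduling time and reinstate it when fn runs.
// zone.js achieves the same effect by patching setTimeout, Promise, and friends.
function bind(fn) {
  const captured = currentContext;
  return (...args) => runInContext(captured, () => fn(...args));
}

const log = [];

function handleRequest(id, delayMs) {
  // Each incoming request gets its own context holding its correlation ID.
  runInContext({ correlationId: id }, () => {
    setTimeout(bind(() => {
      log.push({ request: id, taggedWith: currentContext.correlationId });
    }), delayMs);
  });
}

// Overlapping requests no longer clobber each other's ID.
handleRequest('A', 20);
handleRequest('B', 10);

setTimeout(() => console.log(log), 50);
```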

Find related events *across services*

With zone.js included, App Insights can get you to the root of a problem in a single service faster. However, your application may consist of multiple services, and the root problem may lie in another one. Providing a filtered view of related traces and logs across all your services takes another step, which we and the .NET team have begun to implement in this release.

We can now correctly relate traces between Node.js and .NET services that communicate over HTTP, as long as both use the same instrumentation key (ikey). As an example, in the following screenshot I’m investigating a failed request in an application with a Node.js API service that in turn invokes a .NET service. Drilling into one of these failures to find related events, I find some events from the Node.js service and some from the .NET service, and I can quickly identify that an exception in the .NET service led to the failed Node.js API call!

[Screenshot: end-to-end transaction trace spanning the Node.js and .NET services]
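Once a request crosses a process boundary, in-memory context is no longer shared, so the ID has to travel with the HTTP request itself, typically as a request header. A minimal sketch of that propagation, assuming a hypothetical `x-correlation-id` header. This post doesn’t specify the header names the Node.js and .NET SDKs actually exchange, and the second service is played here by JavaScript rather than .NET.

```javascript
const crypto = require('crypto');

// Hypothetical header name, for illustration only.
// Node.js lowercases incoming header names, so a lowercase key is used throughout.
const CORRELATION_HEADER = 'x-correlation-id';

// Incoming request: reuse the caller's ID if present, otherwise start a new operation.
function correlationIdFromIncoming(headers) {
  return headers[CORRELATION_HEADER] || crypto.randomBytes(8).toString('hex');
}

// Outgoing request: copy the current ID into the headers sent downstream.
function withCorrelation(headers, correlationId) {
  return Object.assign({}, headers, { [CORRELATION_HEADER]: correlationId });
}

// Front-end service: a request arrives from a browser without an ID, so a new one is created.
const nodeId = correlationIdFromIncoming({});
// The front-end service calls the downstream service, forwarding the ID.
const outgoing = withCorrelation({ accept: 'application/json' }, nodeId);
// The downstream service reads the same ID, so telemetry from both services shares it.
const dotnetId = correlationIdFromIncoming(outgoing);

console.log(nodeId === dotnetId);
```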

Better App Map support

The correlation work described above also allows us to better represent your app’s topology in App Insights’ App Map. The same Node.js and .NET app mentioned above is represented in App Map as follows, with the Node.js API as “appinsights-node-02” and the .NET service as “api-11”. The error (“!”) icon in the api-11 node could serve as another entry point for the investigation described in the previous section.

[Screenshot: App Map showing appinsights-node-02 calling api-11, with an error icon on api-11]

We continue to experiment with the details App Map provides and would like your feedback on what you need from it.
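One way a topology map could be derived from correlated telemetry is to pair each outgoing dependency call with the request it triggered in the downstream service. The sketch below assumes a hypothetical, simplified item shape (`service`, `type`, `id`, `parentId`, `success`) and reuses the service names from the example above. It is not the App Insights schema or App Map’s actual algorithm.

```javascript
// Hypothetical, simplified telemetry from two services sharing one operation.
const items = [
  { service: 'appinsights-node-02', type: 'Request',    id: 'r1', parentId: null, success: false },
  { service: 'appinsights-node-02', type: 'Dependency', id: 'd1', parentId: 'r1', success: false },
  { service: 'api-11',              type: 'Request',    id: 'r2', parentId: 'd1', success: false },
];

// An edge runs from the caller's service to the service whose request was
// triggered by the caller's dependency call (request.parentId === dependency.id).
function buildEdges(telemetry) {
  const byId = new Map(telemetry.map(item => [item.id, item]));
  const edges = [];
  for (const item of telemetry) {
    const parent = item.type === 'Request' ? byId.get(item.parentId) : undefined;
    if (parent && parent.type === 'Dependency' && parent.service !== item.service) {
      edges.push({ from: parent.service, to: item.service, failed: !item.success });
    }
  }
  return edges;
}

const edges = buildEdges(items);
console.log(edges); // one edge: appinsights-node-02 -> api-11, marked as failed
```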

Info from third-party modules

Next, we know how much value you derive from Node.js’s massive module ecosystem. To truly help you pinpoint problems you’ll often need insight into third-party modules too. For example, if a MongoDB call fails, it would help to know what message was sent to the Mongo service and what error was received.

App Insights provides a user API you can use to trace this activity yourself, but with this release we begin collecting this information for you automatically for MongoDB, MySQL, and Redis, as well as the Bunyan logging framework and console APIs. So now more detail about that failed database call is automatically available to you.

For example, you may have noticed in an earlier example that a Node.js API call which led to a MongoDB call actually listed *2* calls to Mongo. Drilling in helped me determine that both find and getMore commands were sent, as shown in the following screenshot.

[Screenshot: transaction trace showing the MongoDB find and getMore commands]

Because maintaining patches for third-party modules consistently and reliably is hard, we’ve published these patches, and the mechanism we use to apply them, as the open source node-diagnostic-channel project on GitHub. We’re working on sharing this project with Project Glimpse and would love feedback from other tool providers and module authors on how we can collaborate.
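The general technique behind such patches is to wrap a module’s method so that each call publishes an event (what was sent, how long it took, any error) to interested subscribers such as a telemetry SDK. The sketch below uses Node’s built-in `EventEmitter` as the channel and a hypothetical `fakeDb` client in place of a real driver. It illustrates the idea only and is not node-diagnostic-channel’s API.

```javascript
const { EventEmitter } = require('events');

// A channel that collectors (for example, a telemetry SDK) can subscribe to.
const channel = new EventEmitter();

// Hypothetical callback-style database client standing in for a third-party module.
const fakeDb = {
  find(query, callback) {
    setImmediate(() => {
      if (query.fail) callback(new Error('connection reset'));
      else callback(null, [{ zip: query.zip }]);
    });
  },
};

// Wrap client.find so every call publishes its command, duration, and error.
function patchFind(client, publish) {
  const original = client.find;
  client.find = function (query, callback) {
    const start = Date.now();
    return original.call(this, query, (err, result) => {
      publish({ command: 'find', query, durationMs: Date.now() - start, error: err ? err.message : null });
      callback(err, result); // the application still sees the original result
    });
  };
}

patchFind(fakeDb, event => channel.emit('db', event));

// A subscriber collecting events, as an APM SDK would.
const events = [];
channel.on('db', event => events.push(event));

fakeDb.find({ zip: '98052' }, () => {});
fakeDb.find({ zip: '98052', fail: true }, () => {});

setTimeout(() => console.log(events), 50);
```

Keeping the patch separate from the subscriber means the same patch can feed any tool that listens on the channel.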

Sampling

Last but not least, this release adds support for percentage-based sampling, so you can reduce the amount of data sent to your App Insights resource and thereby reduce costs. Don’t worry: our sampling algorithm is aware of the correlation work described above, so even with sampling enabled we still send all related events for each sampled request.

To enable sampling, specify a percentage before starting the client as follows:

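const appInsights = require("applicationinsights");
// setup() creates appInsights.client, so call it before configuring sampling
appInsights.setup("<instrumentation_key>");
appInsights.client.config.samplingPercentage = 33; // keep roughly 33% of telemetry
appInsights.start();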
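One way to keep sampling consistent with correlation is to derive the keep-or-drop decision deterministically from the correlation ID, so every item in an operation gets the same verdict. A minimal sketch of that idea follows. The SHA-256 hash and the 0 to 100 score are assumptions for illustration, not necessarily what the SDK does internally.

```javascript
const crypto = require('crypto');

// Map a correlation ID to a stable score in [0, 100).
function samplingScore(correlationId) {
  const digest = crypto.createHash('sha256').update(correlationId).digest();
  return (digest.readUInt32BE(0) / 0x100000000) * 100;
}

// Keep an item only if its operation's score falls under the sampling percentage.
// Items sharing a correlation ID always get the same decision.
function isSampledIn(item, samplingPercentage) {
  return samplingScore(item.operationId) < samplingPercentage;
}

// Every item in one operation is kept or dropped together.
const operation = ['Request', 'Dependency', 'Trace'].map(type => ({ type, operationId: 'op-42' }));
const decisions = operation.map(item => isSampledIn(item, 33));
console.log(decisions); // all true or all false

// Across many operations, roughly a third are kept.
let kept = 0;
for (let i = 0; i < 10000; i++) {
  if (isSampledIn({ operationId: `op-${i}` }, 33)) kept++;
}
console.log(kept / 10000);
```

Because the decision depends only on the ID, services that share an ID and a sampling percentage would make the same choice without coordinating.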

Conclusion

For all the details on these and other updates, see the changelogs for v0.20.0, v0.20.1, and v0.21.0.

Our goal is to help you quickly discover and diagnose performance and functional issues in your Node.js services and applications. We’d love your feedback here and in GitHub on how we’re doing and what’s most important to you. Thanks!