Editor’s note: This post was written by Ewan Fairweather.
Welcome to the fourth blog entry on designing and implementing the Telemetry component in Cloud Service Fundamentals (CSF) on Windows Azure! So far, we have covered basic principles around application health in Telemetry Basics and Troubleshooting, including an overview of the fundamental tools, information sources, and scripts you can use to gain insight into your deployed Windows Azure solutions. In our second entry, Telemetry – Application Instrumentation, we described how your applications are the greatest source of information when it comes to monitoring, and why you must first properly instrument your application in order to achieve your manageability goals after it goes into production. In our third article we described how to automate and scale a data acquisition pipeline that collects monitoring and diagnostics information from the different components and services in your solution and consolidates it in a queryable operational store.
The topic of this fourth blog is reporting: showing you how to get the information you need about your system to suit the different types of analytical and reporting requirements in your organization. In this blog we provide an overview of the solution, and in the corresponding WIKI entry we walk through the implementation in detail. Specifically, we will show you how to quickly extract things like database tier resource utilization and end-to-end execution time analysis, and how to turn these into reports and dashboards. In the WIKI we then walk through the underlying implementation of the operational store, along with examples of analytical queries against it. We also cover the reporting package that we provide and how to use Excel for a deeper level of analysis. Finally, we show you how to extend the provided helper functions to get further information that suits your requirements.
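To give you a flavor of the kind of analytical query the WIKI entry walks through, here is a minimal sketch in Python (using pyodbc) that pulls database tier CPU utilization for the last 24 hours out of the operational store. The connection string, table, and column names (DatabaseResourceUsage, EventDateTime, DatabaseName, AvgCpuPercent) are hypothetical placeholders rather than the actual CSF schema; see the WIKI entry for the real table definitions and helper functions.

```python
# Illustrative only: the table and column names below (DatabaseResourceUsage,
# EventDateTime, DatabaseName, AvgCpuPercent) are hypothetical stand-ins for
# the real CSF telemetry schema, which the WIKI entry documents in detail.
import pyodbc

CONNECTION_STRING = (
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:<your-server>.database.windows.net,1433;"
    "Database=Telemetry;Uid=<user>;Pwd=<password>;Encrypt=yes;"
)

QUERY = """
SELECT   DatabaseName,
         AVG(AvgCpuPercent) AS AvgCpu,
         MAX(AvgCpuPercent) AS PeakCpu
FROM     DatabaseResourceUsage
WHERE    EventDateTime >= DATEADD(hour, -24, GETUTCDATE())
GROUP BY DatabaseName
ORDER BY PeakCpu DESC;
"""

def database_tier_utilization():
    """Return per-database CPU utilization for the last 24 hours."""
    with pyodbc.connect(CONNECTION_STRING) as conn:
        cursor = conn.cursor()
        cursor.execute(QUERY)
        return cursor.fetchall()

if __name__ == "__main__":
    for name, avg_cpu, peak_cpu in database_tier_utilization():
        print(f"{name}: avg {avg_cpu:.1f}% CPU, peak {peak_cpu:.1f}% CPU")
```

The same query shape (aggregate a counter per database over a rolling time window) is the building block behind the resource utilization reports and dashboards described later.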
Telemetry Database in CSF
The previous article in this series discussed the data pipeline, which is the CSF implementation of the collector tasks shown in the data flow diagram below. These collector tasks are used by the CSF telemetry worker role and scheduler to populate the Telemetry database on a configurable periodic basis. In this article we describe the thought process you need to go through to determine your analytical and reporting requirements; in the corresponding WIKI entry you can then learn how to extract that information (shown on the right-hand side of the diagram) and surface it through Reporting Services, SSMS, and Excel.
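To illustrate the Excel end of that flow, here is a minimal sketch that reads a slice of telemetry data into a pandas DataFrame and hands a per-operation execution time summary to Excel for pivoting or charting. Again, the table and column names (RequestExecutionTimes, EventDateTime, OperationName, DurationMs) are hypothetical stand-ins, not the CSF schema; substitute the actual operational store tables described in the WIKI entry.

```python
# Minimal sketch: pull a slice of telemetry data into Excel for ad-hoc analysis.
# The table and column names (RequestExecutionTimes, EventDateTime,
# OperationName, DurationMs) are hypothetical; substitute the actual
# telemetry database schema described in the WIKI entry.
import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine(
    "mssql+pyodbc://<user>:<password>@<your-server>.database.windows.net/"
    "Telemetry?driver=ODBC+Driver+17+for+SQL+Server"
)

query = """
SELECT EventDateTime, OperationName, DurationMs
FROM   RequestExecutionTimes
WHERE  EventDateTime >= DATEADD(day, -1, GETUTCDATE())
"""

# Load into a DataFrame, summarize end-to-end execution time per operation,
# and write the result to a workbook for further analysis in Excel.
df = pd.read_sql(query, engine)
summary = df.groupby("OperationName")["DurationMs"].describe()
summary.to_excel("execution_time_summary.xlsx")
```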
Defining Our Reporting Scenario and Requirements
A key first step to gaining effective insight into your telemetry data is to define the reporting scenarios and their key requirements. When defining the CSF telemetry solution, a useful technique was to first define three scenarios: operational reporting, alerting, and root-cause isolation. We then used the “I can” approach to capture the key requirements for each scenario, for example, “I can see the resource utilization of my database tier over a given time window.” These requirements were then prioritized, with the majority of the operational reporting and root-cause isolation scenarios implemented in the out-of-the-box CSF experience; the underlying data structure is there to serve your alerting needs.
This technique allowed us to consider how the underlying schema would support both current and potential future requirements, first when defining the telemetry database and later when extending it. It is an important first step, and one you should use especially if you plan to extend your own telemetry database.
The following picture highlights the specific parts of the CSF package which are related to the telemetry database and reporting solution.
We hope this gives you an insight into the solution. If you want to understand how to actually use it, please continue reading the WIKI article, which has all the details.