
Azure HDInsight Integration with Azure Log Analytics is now generally available

I am excited to announce the general availability of HDInsight Integration with Azure Log Analytics. With this release we bring a number of new monitoring and troubleshooting capabilities to your HDInsight environment. The new capabilities are aimed at helping our customers operate Big Data workloads at scale.



Azure HDInsight is a fully managed cloud service for customers to do analytics at scale using the most popular open-source engines, such as Hadoop, Hive/LLAP, Presto, Spark, Kafka, Storm, and HBase.

Thousands of our customers run their big data analytical applications on HDInsight at global scale. The ability to monitor this infrastructure, detect failures quickly and take quick remedial action is key to ensuring a better customer experience.

Log Analytics is part of Microsoft Azure's overall monitoring solution. Log Analytics helps you monitor cloud and on-premises environments to maintain availability and performance.

Our integration with Log Analytics makes it simpler for our customers to operate their big data production workloads effectively.

Monitor & debug the full spectrum of big data open source engines at global scale

Typical big data pipelines use multiple open source engines, such as Kafka for ingestion, Spark Streaming or Storm for stream processing, Hive and Spark for ETL, and Interactive Query (LLAP) for fast querying of big data.

Additionally, these pipelines may be running in different datacenters across the globe.

With the new HDInsight monitoring capabilities, our customers can connect different HDInsight clusters to a single Log Analytics workspace and monitor them through a single pane of glass.

Image: Monitoring your global big data deployments with a single pane of glass

Collect logs and metrics from open source analytics engines

Once Azure Log Analytics is enabled on your cluster, you will see important logs and metrics from a number of different open source frameworks, as well as cluster VM-level metrics such as CPU usage and memory utilization. Customers get a full view into their cluster from one location.

Many of our customers take advantage of the elasticity of the cloud by creating and deleting clusters to minimize their costs. However, they want to retain the job logs and other useful information even after the cluster is terminated. With Azure Log Analytics, customers can retain the job information even after the cluster is deleted.

Below are some of the key metrics and logs collected from your HDInsight clusters.

YARN Resource Manager, YARN Applications, Hive, MapReduce, Kafka, Storm, Hive Server 2, Hive Server Interactive, Oozie, Spark, Spark executor and driver, Livy, HBase, Phoenix, Jupyter, LLAP, ZooKeeper, and many more.



Image: Logs & Metrics from various Open Source engines.

Visualize key metrics with solution templates

To make it easier for our customers to understand important metrics, we have created a number of visualizations. We have published multiple solution templates so you can get started quickly. You can install these solution templates directly from the Azure portal, under Monitoring + Management.


Image: Installing HDInsight solution templates from Azure portal

Once the templates are installed, you can visualize the key metrics. The example below shows the dashboard for your Spark clusters.


Image: Spark dashboard

Troubleshoot issues faster

It’s important to be able to detect and troubleshoot issues faster and find the root cause when developing big data applications in Hive, Spark or Kafka.

With the Log Analytics portal, you can:

  • Write queries to quickly find issues or important data in your logs and metrics
  • Filter, sort, and group results within a time range
  • See your data in tabular format or in a chart

Below is an example query that looks at application metrics from a Hive query:


search *
| where Type contains "application_stats_dag_CL" and ClusterName_s contains "testhive02"
| order by TimeGenerated desc


Image: troubleshooting hive jobs 

Enabling Log Analytics

Log Analytics integration with HDInsight is enabled via the Azure portal, PowerShell or the Azure SDK. 

        [-ResourceGroupName ]
        [-DefaultProfile ]



Image: Enabling Log Analytics from the Azure portal

Get started today

HDInsight integration with Azure Log Analytics helps you gain greater visibility into your Big Data environment. Learn more about these capabilities to simplify monitoring of your Big Data applications.

Please reach out to AskHDInsight@Microsoft.com in case of any questions.