
Azure HDInsight training resources – Learn about big data using open source technologies


Azure HDInsight is the only fully-managed cloud Hadoop & Spark offering that gives you optimized open-source analytic clusters for Spark, Hive, MapReduce, HBase, Storm, Kafka, and Microsoft R Server backed by a 99.9% SLA. You can deploy these big data technologies and ISV applications as managed clusters with enterprise-level security and monitoring.

Given the broad scope of open source big data technologies, we have received many requests for a detailed guide to HDInsight. This post outlines the key areas the guide covers and points to other options for learning HDInsight.

HDInsight Developer Guide

Customers have asked us for detailed documentation on how to architect, deploy, manage, monitor, and secure big data solutions for use cases such as advanced analytics, streaming, business intelligence, ETL, and many more. In essence, they have been looking for a guide that walks them step by step through deploying and operating a big data solution. We are pleased to announce the release of the HDInsight Developer Guide, which covers both basic and advanced scenarios and is useful for any developer, data scientist, or data engineer who is getting started with Azure HDInsight or wants to learn more. The guide starts with a basic overview and use cases, followed by best practices on how to configure a cluster, plan capacity, and develop and optimize applications for different workloads such as Hive and Spark. Finally, the guide concludes with advanced use cases and scenarios along with samples.

HDInsight training resources

In addition to the guide, we would like to highlight other resources for learning about HDInsight. The resources below include self-paced training, documentation, videos, hands-on labs, and more.

Self-paced online trainings

  • Self-paced online training on edX, an online learning destination that offers high-quality courses from the world’s best universities and institutions to learners everywhere. These courses are part of the Microsoft Professional Program for Big Data; they are free, or you can add a verified certificate for a fee. The courses have been updated, and three of them focus specifically on HDInsight.
  • Self-paced online training on Microsoft Virtual Academy, which provides free online training by world-class experts to help you build your technical skills and advance your career. Ready to continue your big data deep dive? The in-depth courses on Hadoop and Spark on HDInsight are a key part of the analytics portion of the MVA Data Series.

Self-serve documentation

  • HDInsight Documentation: This is the landing page for HDInsight documentation, useful to any developer, data scientist, or big data administrator. This documentation set of more than 1,385 pages covers everything from getting started to specific topics on different scenarios and use cases with HDInsight. You can also download the whole documentation set using the “Download as PDF” option at the bottom left of the page, or search for specific topics in the search box at the top left.
  • HDInsight Troubleshooting Guide: We are constantly updating the troubleshooting guide so that you can easily debug or troubleshoot issues.

Instructor-led training

Whether you’re looking to enhance your proficiency in Azure Machine Learning, ML Server, the Cognitive Services CNTK toolkit, or another cloud specialty, we’ve likely got a course that can get you on your way. The instructor-led and self-paced video courses range from short webinars to multi-day workshops to longer-term, on-demand deep dives. Check back frequently, because Microsoft and our training partners keep adding new offerings.


The following videos are a good watch to learn about the scope and features in HDInsight.

2016-17 conference recordings

Ignite 2016

Ignite 2017


Hadoop Summit


Hands-on labs

  • Data science lab: This lab specifically focuses on the Spark ML component of Spark and highlights its value proposition in the Apache Spark Big Data processing framework.
  • Hive lab: This lab focuses on how customers can leverage HDInsight Hive to analyze big data stored in Azure Blob Storage.
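To give a flavor of what the Hive lab works toward, here is a minimal sketch of pointing a Hive external table at files in Azure Blob Storage. It is not taken from the lab itself: the helper functions, the container, account, and table names, and the column layout are all hypothetical placeholders. The sketch only builds the `wasbs://` location and the HiveQL text; you would still run the statement through a Hive client of your choice on the cluster.

```python
def wasbs_uri(container: str, account: str, path: str = "") -> str:
    """Build a wasbs:// URI for a path inside an Azure Blob Storage container."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def external_table_ddl(table: str, columns: dict, location: str) -> str:
    """Return HiveQL that defines an external table over comma-delimited text files."""
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} ({cols}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        f"STORED AS TEXTFILE LOCATION '{location}'"
    )


# Hypothetical names, used only for illustration.
location = wasbs_uri("mycontainer", "myaccount", "/logs/2017/")
ddl = external_table_ddl(
    "weblogs",
    {"ts": "STRING", "url": "STRING", "status": "INT"},
    location,
)
print(ddl)
```

Because the table is external, dropping it in Hive removes only the table definition and leaves the underlying blobs in place, which is usually what you want for data that lives in storage independently of the cluster.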

Get Microsoft certified on HDInsight 


We hope you find the developer guide and the other resources helpful. If you have any feedback or questions, feel free to send us an email at hdiask@microsoft.com. We’d love to hear from you.

Table of contents for the HDInsight Developer Guide

This is a short version of the table of contents. This should give you a good idea of what you can expect from this guide.

  • Overview
  • Azure HDInsight and Hadoop Architecture
    • Configuring the Cluster
    • Configuring Identity and Access Controls
    • Monitoring and managing the HDInsight cluster
  • Developing Hive applications
  • Developing Spark applications
    • Use Spark with notebooks
    • Use Spark with IntelliJ
    • Spark samples
  • Developing Spark ML applications
  • Deep Learning with Spark
  • Developing R scripts on HDInsight
  • Developing Spark Streaming applications
  • Optimizing Spark Performance
  • Use HBase
    • Use Phoenix with HBase on HDInsight
  • Apache Open Source Ecosystem
  • Advanced Scenarios and Deep Dives
  • Troubleshooting