Get up to speed with Azure HDInsight: The comprehensive guide

HDInsight covers a wide variety of big data technologies, and we have received many requests for a detailed guide. Whether you want to just get started with HDInsight, or become a Big Data expert, this post has you covered with all the latest resources.

Azure HDInsight is an easy, cost-effective, enterprise-grade service for open source analytics. With HDInsight, you get managed clusters for various Apache big data technologies, such as Spark, MapReduce, Kafka, Hive, HBase, Storm and ML Services, all backed by a 99.9% SLA. In addition, you can take advantage of HDInsight’s rich ISV application ecosystem to tailor the solution for your specific scenario.
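To put the 99.9% figure in perspective, here is a small arithmetic sketch of how much downtime such a percentage permits over a given period. It is illustrative only: the function name `allowed_downtime_minutes` and the 30-day month are assumptions made for this example, and how availability is actually measured is defined by the SLA terms, not by this calculation.

```python
# Illustrative arithmetic only. The 30-day default period is an assumption
# for this example; the real SLA terms define how availability is measured.

def allowed_downtime_minutes(availability_pct: float, period_days: float = 30.0) -> float:
    """Return the maximum downtime, in minutes, consistent with the given availability."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    per_month = allowed_downtime_minutes(99.9)
    per_year_hours = allowed_downtime_minutes(99.9, period_days=365) / 60
    print(f"{per_month:.1f} minutes per 30-day month")
    print(f"{per_year_hours:.2f} hours per 365-day year")
```

Under these assumptions, 99.9% availability corresponds to roughly 43 minutes of downtime in a 30-day month.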

Latest content

The HDInsight team has been working hard releasing new features, including the launch of HDInsight 4.0. We make major product announcements on the Azure HDInsight and Big Data blogs. Here is a selection of the most recent updates:

HDInsight Developer Guide

The HDInsight Developer Guide covers both basic and advanced scenarios for developers, data scientists, and data engineers who are getting started with Azure HDInsight or want to learn more about it. This step-by-step guide starts with a basic overview and use cases, followed by best practices for configuring clusters, planning capacity, and developing applications for workloads such as Hive, Spark, and HBase. The guide concludes with advanced use cases and scenarios, along with samples.

HDInsight training resources

In addition to the guide, we would like to highlight the other resources available for learning more about HDInsight. The sections below cover them, including self-paced training, documentation, videos, and more.

Self-paced online trainings

edX is an online learning destination that offers high-quality courses from the world’s best universities and institutions to learners everywhere. Its self-paced training courses are available for free as part of the Microsoft Professional Program for Big Data, or you can add a verified certificate for a fee. These courses have been updated; below are the three courses specific to HDInsight.

Also see the self-paced online training on Microsoft Virtual Academy (MVA), which provides free online training by world-class experts to help you build your technical skills and advance your career. Ready to continue your big data deep dive? Below are the in-depth courses for exploring Hadoop and Spark on HDInsight, which are a key part of the analytics portion of the MVA Data Series.

Self-serve documentation

HDInsight Documentation: This is the landing page for HDInsight documentation, useful to any developer, data scientist, or big data administrator. It includes everything from getting started to specific scenarios and use cases with HDInsight. You can download the complete documentation using the “Download as PDF” option at the bottom left of the page, or search for specific topics in the search box at the top left.

HDInsight Troubleshooting Guide: We are constantly updating the troubleshooting guide so that you can easily debug or troubleshoot issues.

Instructor-led training

Whether you’re looking to enhance your proficiency in specific technologies like Azure Machine Learning Studio or in the overall architecture of Big Data and Analytics, we’ve likely got a course that can get you on your way. The instructor-led and self-paced video courses range from short webinars to multi-day workshops to longer-term, on-demand deep dives. Check back frequently because new offerings are regularly added by Microsoft and our training partners.

Videos

HDInsight videos: In addition to the resources above, you can search Channel 9 or YouTube for specific topics, from getting started to advanced.

The following videos are a good way to learn about the scope and features of HDInsight.

2017-18 conference recordings

Ignite 2018

DataWorks Summit 2018

//build

Connect()

Hands-on labs

  • Data science lab: This lab specifically focuses on the Spark ML component of Spark and highlights its value proposition in the Apache Spark Big Data processing framework.
  • Hive lab: This lab focuses on how customers can leverage HDInsight Hive to analyze big data stored in Azure Blob Storage.

Get Microsoft certified on HDInsight

Other Resources

We hope that you will find the developer guide and all the other resources helpful. If you have any feedback or questions, feel free to send us an email at AskHDInsight@microsoft.com. We’d love to hear from you. You can also stay up to date on the latest Azure HDInsight news and features by following us on Twitter at #HDInsight and @AzureHDInsight.