Azure HDInsight is the only fully managed cloud Hadoop and Spark offering that gives you optimized open-source analytic clusters for Spark, Hive, MapReduce, HBase, Storm, Kafka, and Microsoft R Server, backed by a 99.9% SLA. You can deploy these big data technologies and ISV applications as managed clusters with enterprise-level security and monitoring.
Given the broad scope of open-source big data technologies, we have received many requests for a detailed guide to HDInsight. This post outlines the key areas the guide covers and points to other options for learning HDInsight.
HDInsight Developer Guide
Customers have asked us for detailed documentation on how to architect, deploy, manage, monitor, and secure big data solutions for use cases and scenarios such as advanced analytics, streaming, business intelligence, ETL, and many more. In essence, they have been looking for a guide that walks them step by step through deploying and operating a big data solution. We are pleased to announce the release of the HDInsight Developer Guide, which covers both basic and advanced scenarios and is useful for any developer, data scientist, or data engineer who is getting started with Azure HDInsight or wants to learn more. The guide starts with a basic overview and use cases, followed by best practices on how to configure clusters, plan capacity, develop applications for workloads such as Hive and Spark, and optimize those workloads. Finally, the guide concludes with advanced use cases and scenarios along with samples.
HDInsight training resources
In addition to the guide, we would like to highlight other resources for learning about HDInsight. The sections below list them, including self-paced training, documentation, videos, and many more.
Self-paced online trainings
- Self-paced online training on edX, an online learning destination that offers high-quality courses from the world’s best universities and institutions to learners everywhere. These self-paced courses are part of the Microsoft Professional Program for Big Data; they are available for free, or you can add a verified certificate for a fee. The courses have been updated, and the three that focus on HDInsight are listed below.
- Processing Big Data in Azure HDInsight: This course teaches you how to use the Hadoop technologies in Microsoft Azure HDInsight to build batch processing solutions that cleanse and reshape data for analysis.
- Implementing Real Time Analytics in Azure HDInsight: In this course, you’ll learn how to implement low-latency and streaming big data solutions using Hadoop technologies like HBase, Storm, and Spark on Microsoft Azure HDInsight.
- Implementing Predictive Analytics in Azure HDInsight: In this course, learn how to implement predictive analytics solutions for big data using Apache Spark in Microsoft Azure HDInsight.
- Self-paced online training on Microsoft Virtual Academy, which provides free online training by world-class experts to help you build your technical skills and advance your career. Ready to continue your big data deep dive? The in-depth courses on Hadoop and Spark on HDInsight are a key part of the analytics portion of the MVA Data Series.
Self-serve documentation
- HDInsight Documentation: This is the landing page for HDInsight documentation, useful to any developer, data scientist, or big data administrator. At more than 1,385 pages, the documentation includes all the sections, from getting started to specific topics on different scenarios and use cases with HDInsight. You can also download the whole documentation using the “Download as PDF” option at the bottom left of the page, or search for specific topics using the search box at the top left.
- HDInsight Troubleshooting Guide: We are constantly updating the troubleshooting guide so that you can easily debug or troubleshoot issues.
Instructor led training
Whether you’re looking to enhance your proficiency in Azure Machine Learning, ML Server, the Cognitive Services CNTK toolkit, or another cloud specialty, we’ve likely got a course that can get you on your way. The instructor-led and self-paced video courses range from short webinars to multi-day workshops to longer-term deep dives on demand. Check back frequently, because new offerings are added by Microsoft and our training partners.
Videos
- HDInsight videos: Beyond the resources above, you can search Channel 9 or YouTube for videos on specific topics, from getting started to advanced.
The following videos are a good introduction to the scope and features of HDInsight.
- Deep Dive on Apache Spark Performance Tuning on HDInsight: Part 1, Part 2, Part 3, and Part 4
- Optimizing HBase Performance in HDInsight
- Apache Kafka on Azure HDInsight
- Compliance Standards on HDInsight
- Securing Azure HDInsight
- Big Data Partner Program
2016-17 conference recordings
Ignite 2016
- Build successful Big Data infrastructure using Azure HDInsight
- Secure your Enterprise Hadoop environments on Azure
- Explore Spark 2.0 and structured streaming in Microsoft Azure HDInsight
- Leverage R and Spark in Azure HDInsight for scalable machine learning
- Establish modern data ethos with Big Data on Microsoft Azure
Ignite 2017
- Building Petabyte scale Interactive Data warehouse in Azure HDInsight
- Enterprise security and monitoring for big data solutions on Azure HDInsight
- Streaming Big Data on Azure with HDInsight Kafka, Storm and Spark
- Building modern data pipelines with Spark on Azure HDInsight
- Patterns, Architecture, & Best Practices: Scaling Machine Learning Algorithms with Azure HDInsight
- Operationalizing Microsoft Cognitive Toolkit and TensorFlow models with HDInsight Spark
Strata
- Big data, AI, the genome, and everything
- Spark at Scale in Bing
- Using Big Data, Cloud and AI to Enable Intelligence at Scale
Hadoop Summit
- Build Big Data Enterprise solutions faster on Azure HDInsight
- Big Data in the Cloud
- Big Data Application Architectures IoT
//build
Hands on labs
- Data science lab: This lab specifically focuses on the Spark ML component of Spark and highlights its value proposition in the Apache Spark Big Data processing framework.
- Hive lab: This lab focuses on how customers can leverage HDInsight Hive to analyze big data stored in Azure Blob Storage.
Get Microsoft certified on HDInsight
- Perform Data Engineering on Microsoft Azure HDInsight
- Designing and Implementing Big Data Analytics Solutions
Resources
We hope you find the developer guide and all the other resources helpful. If you have any feedback or questions, feel free to send us an email at hdiask@microsoft.com. We’d love to hear from you.
Table of contents for the HDInsight Developer Guide
This is a short version of the table of contents, which should give you a good idea of what to expect from the guide.
- Overview
- Azure HDInsight and Hadoop Architecture
- Configuring the Cluster
- Configuring Identity and Access Controls
- Monitoring and managing the HDInsight cluster
- Developing Hive applications
- Developing Spark applications
- Use Spark with notebooks
- Use Spark with IntelliJ
- Spark samples
- Developing Spark ML applications
- Deep Learning with Spark
- Developing R scripts on HDInsight
- Developing Spark Streaming applications
- Optimizing Spark Performance
- Use HBase
- Use Phoenix with HBase on HDInsight
- Apache Open Source Ecosystem
- Advanced Scenarios and Deep Dives
- Troubleshooting