Tutorials and Guides


What's new in the Hadoop cluster versions provided by HDInsight?

Learn what Hadoop components and versions are included in HDInsight.

Learning map for HDInsight

This page provides a quick overview of the learning resources for Azure HDInsight. Use the diagram to find the most effective learning path.

Introduction to Hadoop in HDInsight

Get an overview of HDInsight components, common terminology, and scenarios. HDInsight deploys and provisions Apache Hadoop clusters in the cloud, providing a software framework designed to manage, analyze, and report on big data.

Get started using Hadoop in HDInsight

Learn how to provision clusters, query data with Hive, and output to Excel for analysis.

Overview of HBase in HDInsight

Learn about HBase, a NoSQL database built on Hadoop and designed for very large amounts of data. HBase clusters on HDInsight are configured to store data directly in Azure Blob storage, which provides low latency and greater elasticity in balancing performance against cost.

Get started using HBase with Hadoop in HDInsight

Apache HBase is an open source, distributed, large-scale data store that provides low latency for random reads and writes. In this tutorial, learn how to create and query HBase tables with HDInsight.

Overview of Storm in HDInsight

Storm in HDInsight allows you to process streaming data in real time. As a managed cluster integrated into the Azure ecosystem, Storm can be configured to work with other Azure services for a complete real-time data processing and analysis solution.

Get started with a word count topology on Storm in HDInsight

Learn how to set up and run a word count topology on Storm in HDInsight. This tutorial will guide you through provisioning a Storm cluster, and then running, monitoring, and stopping a Storm topology.

Get started with the HDInsight Emulator

Learn how to use the Microsoft HDInsight Emulator for Azure, which provides a local development environment.


Azure HDInsight release notes

Keep up to date with improvements and updates in each HDInsight release. Learn about changes you may need to make to your HDInsight configuration, jobs, and so on.

Get started using HDInsight Tools for Visual Studio

Learn how to install and use HDInsight Tools for Visual Studio to connect to HDInsight and run Hive queries.

Run the Hadoop samples in HDInsight

The four included samples are intended to get you started quickly and to give you an extensible test bed for working through concepts. Create data sets, and observe the effects of data size on jobs.

Script Action development with HDInsight

Use Script Actions to install additional software such as R and Spark, or to change the configuration of applications running on a Hadoop cluster.

Customize HDInsight clusters using Script Actions

Customize HDInsight clusters to install additional components using custom scripts.

Install and use R on HDInsight Hadoop clusters

Data scientists and analysts can use R for big data processing on Hadoop clusters deployed in HDInsight.

Install and use Spark on HDInsight clusters

Use Script Actions to install Spark on HDInsight clusters.

Develop Java MapReduce programs for Hadoop in HDInsight

Follow this end-to-end scenario to learn how to develop and test a word-counting MapReduce job on the HDInsight Emulator, and then deploy and run it on HDInsight.

Develop C# Hadoop streaming programs for HDInsight

Learn how to develop and test a Hadoop streaming MapReduce program on the HDInsight Emulator, and then run it on HDInsight using a PowerShell script.

Use Maven to build Java applications that use HBase with Hadoop

Learn how to create and build an Apache HBase application in Java using Apache Maven. Then, use the application with HDInsight (Hadoop).

Submit Hadoop jobs in HDInsight

Learn how to submit MapReduce and Hive jobs using PowerShell and the HDInsight .NET SDK.

Use Hadoop MapReduce in HDInsight

Learn how to use Azure PowerShell from your workstation to submit a MapReduce program that counts word occurrences in text to an HDInsight cluster.
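As a rough illustration of what a word-counting MapReduce program computes, the sketch below runs the map and reduce steps locally in plain Python. It is not the sample that ships with HDInsight. The function names (`map_words`, `reduce_counts`, `word_count`) and the simple letters-and-apostrophes definition of a "word" are assumptions made for this example.

```python
import re
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for each word in one line of text."""
    # Assumption: a word is a run of letters or apostrophes, compared case-insensitively.
    return [(word.lower(), 1) for word in re.findall(r"[A-Za-z']+", line)]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each distinct word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines):
    """Run the map step over every line, then reduce all emitted pairs."""
    pairs = []
    for line in lines:
        pairs.extend(map_words(line))
    return reduce_counts(pairs)
```

On a cluster, the map step runs in parallel over blocks of the input and the framework groups the emitted pairs by key before the reduce step; this local version only shows the logic of the two steps.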

Use Hive with Hadoop in HDInsight

Use HiveQL to query data in an Apache log4j log file, and report basic statistics.

Use Pig with Hadoop in HDInsight

Write Pig Latin statements to analyze an Apache log4j log file, and run various queries on the data to generate output.

Use Python with Hive and Pig in HDInsight

Both Hive and Pig allow you to create User Defined Functions (UDF) using a variety of programming languages. Learn how to use a Python UDF from Hive and Pig.
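One common way to use Python from Hive is a streaming script: Hive passes each row to the script as a tab-separated line on standard input and reads tab-separated rows back from standard output. The sketch below shows the general shape of such a script. The three input columns (an ID, a device make, and a device model) and the function names are hypothetical and chosen only for illustration.

```python
import hashlib
import sys


def transform_row(line):
    """Turn one tab-separated input row into one tab-separated output row."""
    # Hypothetical input columns: an ID, a device make, and a device model.
    row_id, make, model = line.rstrip("\n").split("\t")
    label = make + " " + model
    # Derive a new column from the combined make and model.
    digest = hashlib.md5(label.encode("utf-8")).hexdigest()
    return "\t".join([row_id, label, digest])


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read rows from stdin and write transformed rows to stdout."""
    for line in stdin:
        if line.strip():  # skip blank lines
            stdout.write(transform_row(line) + "\n")
```

To run this as a standalone script, add a call to `main()` at the end of the file. The row logic is kept in `transform_row` so that it can be tested without a cluster.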

Move data with Sqoop between a Hadoop cluster and a database

Learn how to use Azure PowerShell from a workstation to run Sqoop import and export between an HDInsight cluster and an Azure SQL database.

Use Oozie with Hadoop in HDInsight

Learn how to run an Apache Oozie workflow to process a log4j log file to count the occurrences of each log level type. Then, export the results to an Azure SQL database table.
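The counting step of that workflow can be sketched in plain Python to show what result it produces. This is only an illustration of the computation, not the Oozie workflow or the Hive script from the tutorial. It assumes that each log line carries its level in square brackets, such as `[INFO]`; the function name and the pattern are assumptions for this example.

```python
import re
from collections import Counter

# Assumption: the level appears in square brackets, for example "[ERROR]".
LEVEL_PATTERN = re.compile(r"\[(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\]")


def count_log_levels(lines):
    """Count how many log lines were written at each log4j level."""
    counts = Counter()
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if match:  # lines without a recognizable level are ignored
            counts[match.group(1)] += 1
    return dict(counts)
```

The resulting mapping of level to count corresponds to the rows that the workflow would then export to a database table.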

Serialize data with the Microsoft .NET Library for Avro

Learn how to use the Microsoft .NET Library for Avro to serialize objects and other data structures into streams of bytes in order to persist them to memory, a database, or a file.

HDInsight SDK Reference Documentation

Configure and run jobs on HDInsight clusters using Azure HDInsight PowerShell. Develop applications that manage HDInsight jobs with the Azure HDInsight .NET SDK.

Debug Hadoop in HDInsight: Interpret error messages

Learn about errors that can occur when using PowerShell to manage HDInsight and the steps for recovering from them.

Develop streaming data processing applications with SCP.NET and C# on Storm

Learn how to develop streaming data processing applications with SCP.NET and C# on Storm in HDInsight.


Analyze real-time sensor data using Storm and Hadoop

Learn how to build a solution that uses a Storm cluster in HDInsight to process sensor data from Azure Event Hubs, and then displays the processed sensor data as near-real-time information on a web-based dashboard.

Analyze historical sensor data using Hive with Hadoop

Learn how to analyze sensor data using Hive with HDInsight (Hadoop), and then visualize the data in Microsoft Excel. In this sample, you'll use Hive to process historical data produced by HVAC systems to see which systems can't reliably maintain a set temperature.

Use Hive with HDInsight to analyze a website log

Learn how to use HiveQL in HDInsight to analyze website logs to get insight into the frequency of visits in a day from external websites, and a summary of website errors that the users experience.

Generate movie recommendations with Mahout

Learn how to use a Mahout recommendation engine to get movie recommendations based on user preferences. Mahout is a machine learning library for Apache Hadoop that contains algorithms for processing data, such as filtering, classification, and clustering.

Perform graph analysis with Giraph using Hadoop

Learn how to use Apache Giraph to find the shortest path between objects using Hadoop on HDInsight. With Giraph, you can gain insights into relationships, such as between friends on social networks (also called a "social graph") or between routers on a large network like the internet.

Analyze Twitter data with Hadoop in HDInsight

Learn how to use Hive to analyze Twitter data to find usage frequency of a particular word.

Analyze real-time Twitter sentiment with HBase

Using geo-tagged Tweets, learn how to do real-time sentiment analysis of big data using HBase in an HDInsight (Hadoop) cluster. Then, plot statistical results on Bing Maps.

Analyze flight delay data using Hadoop in HDInsight

Learn how to use Hive to calculate the average flight delay among airports, and how to use Sqoop to export the results to SQL Database.

Connect Excel to Hadoop with the Microsoft Hive ODBC Driver

Import data from Azure HDInsight into Excel using the Microsoft Hive ODBC Driver.

Connect Excel to Hadoop with Power Query

A key feature of Microsoft’s Big Data solution is solid integration of Apache Hadoop with Microsoft Business Intelligence (BI) components. Learn how to use Power Query to import HDInsight data into Excel.


Availability and reliability of Hadoop clusters in HDInsight

HDInsight clusters are enhanced to provide the reliability and availability required to manage enterprise workloads.

Manage Hadoop clusters in HDInsight using Management Portal

Learn how to use the Azure Management Portal to create an HDInsight cluster, and how to open the administrative tools.

Manage Hadoop clusters in HDInsight using PowerShell

Learn how to manage HDInsight clusters using a local Azure PowerShell console.

Manage Hadoop clusters in HDInsight using the command-line interface

Learn how to use the Cross-Platform Command-Line Interface to manage HDInsight clusters.

Monitor Hadoop clusters in HDInsight using the Ambari API

Use the Apache Ambari APIs for provisioning, managing, and monitoring Hadoop clusters. Ambari has intuitive operator tools and robust APIs that hide the complexity of Hadoop.

Use time-based Oozie coordinator with Hadoop in HDInsight

Learn how to define workflows and coordinators, and how to trigger the Hadoop jobs based on time.

Access HDInsight application logs programmatically

Learn how to programmatically enumerate the YARN applications that have completed on a Hadoop cluster and access the application logs.

Provision Hadoop clusters in HDInsight

Learn how to provision HDInsight clusters using the Azure Management Portal, PowerShell, the command-line interface, and the HDInsight .NET SDK.

Provision HBase clusters on Azure Virtual Network

Create an HBase cluster in HDInsight on Azure Virtual Network. Virtual network integration allows applications to communicate with HBase directly, improving performance and security.

Upload data for Hadoop jobs in HDInsight

Learn how to upload and access data in HDInsight using Azure Storage Explorer, Azure PowerShell, the Hadoop command line, or Sqoop.

Use Azure Blob storage with Hadoop in HDInsight

Learn how HDInsight works with data that is stored in Azure Blob storage, when to store data in HDFS, and when to store it in Blob storage.