Tutorials and Guides


What's new in the cluster versions provided by HDInsight?

Learn what Hadoop components and versions are included in HDInsight.

Get started using Hadoop 2.2 clusters with HDInsight

HDInsight cluster version 3.0 now supports Hadoop 2.2 and takes full advantage of this platform to provide a range of significant benefits to customers.

Get started with HDInsight

Provision an HDInsight version 2.0 cluster, run a sample MapReduce program from the sample gallery, examine the program's output, and connect to BI tools.

Introduction to HDInsight

Get an overview of HDInsight components, common terminology, and scenarios, and see resources for HDInsight, Apache Hadoop, and Microsoft Business Intelligence.

Get started with the HDInsight Emulator

Learn how to use the Microsoft HDInsight Emulator for Azure, which provides a local development environment.

Run HDInsight samples

The four included samples are intended to get you started quickly and to give you an extensible test bed for working through concepts. Create data sets, and observe the effects of data size on jobs.


Develop Java MapReduce programs for HDInsight

Follow this end-to-end scenario to learn how to develop and test a word-counting MapReduce job on HDInsight Emulator, and then deploy and run it on HDInsight.

Develop C# streaming programs for HDInsight

Learn how to develop and test a Hadoop streaming MapReduce program on HDInsight Emulator, and then run it on HDInsight using a PowerShell script.

HDInsight SDK Reference Documentation

Configure and run jobs on HDInsight clusters using Azure HDInsight PowerShell. Develop applications that manage HDInsight jobs with the Azure HDInsight .NET SDK.

Debug HDInsight: Interpret error messages

Learn about errors that can occur when using PowerShell to manage HDInsight and the steps for recovering from them.

Submit Hadoop jobs programmatically

Learn how to submit MapReduce and Hive jobs using PowerShell and the HDInsight .NET SDK.

Use MapReduce with HDInsight

Learn how to use Azure PowerShell from your workstation to submit a MapReduce program that counts word occurrences in text to an HDInsight cluster.

Use Hive with HDInsight

Use HiveQL to query data in an Apache log4j log file, and report basic statistics.

Use Pig with HDInsight

Write Pig Latin statements to analyze an Apache log4j log file, and run various queries on the data to generate output.

Use Sqoop with HDInsight

Learn how to use Azure PowerShell from a workstation to run Sqoop import and export between an HDInsight cluster and an Azure SQL database.

Use Oozie with HDInsight

Learn how to run an Apache Oozie workflow that processes a log4j log file to count the occurrences of each log level type, and then exports the results to an Azure SQL database table.

Serialize data with the Microsoft Avro Library

Learn how to use the Microsoft Avro Library (Avro.NET) to serialize objects and other data structures into streams of bytes in order to persist them to memory, a database, or a file.


Analyze Twitter data with HDInsight

Learn how to use Hive to analyze Twitter data to find usage frequency of a particular word.

Analyze flight delay data using HDInsight

Learn how to use Hive to calculate the average flight delay among airports, and how to use Sqoop to export the results to SQL Database.

Connect Excel to HDInsight with the Microsoft Hive ODBC Driver

Import data from Azure HDInsight into Excel using the Microsoft Hive ODBC Driver.

Connect Excel to HDInsight with Power Query

A key feature of Microsoft’s Big Data solution is solid integration of Apache Hadoop with Microsoft Business Intelligence (BI) components. Learn how to use Power Query to import HDInsight data into Excel.


Availability and reliability of HDInsight clusters

HDInsight clusters are enhanced to provide the reliability and availability required to manage enterprise workloads.

Administer HDInsight clusters using Management Portal

Learn how to use the Azure Management Portal to create an HDInsight cluster, and how to open the administrative tools.

Administer HDInsight using PowerShell

Learn how to manage HDInsight clusters using a local Azure PowerShell console.

Administer HDInsight using the command-line interface

Learn how to use the Cross-Platform Command-Line Interface to manage HDInsight clusters.

Monitor HDInsight clusters using Ambari API

Use the Apache Ambari APIs for provisioning, managing, and monitoring Hadoop clusters. Ambari has intuitive operator tools and robust APIs that hide the complexity of Hadoop.

Use time-based Oozie Coordinator with HDInsight

Learn how to define workflows and coordinators, and how to trigger the coordinator jobs based on time.

Provision HDInsight clusters

Learn how to provision HDInsight clusters using the Azure Management Portal, PowerShell, the command-line interface, and the HDInsight .NET SDK.

Upload data to HDInsight

Learn how to upload and access data in HDInsight using Azure Storage Explorer, Azure PowerShell, the Hadoop command line, or Sqoop.

Use Azure Blob storage with HDInsight

Learn how HDInsight works with data that is stored in Azure Blob storage, when to store data in HDFS, and when to store it in Blob storage.