HDInsight (Hadoop) documentation

  • Find documentation to get you started using Hadoop in the cloud.
  • Create Hadoop clusters in minutes, process big data, and analyze results with Excel.
  • Learn how to develop solutions with streaming or historical data.

Quick links

Service overview

Solutions you can deliver

Pricing details

Featured

Learning map: Get guidance for learning Hadoop

Get started

  • Introduction to Hadoop in HDInsight
  • Query: Start using Hadoop
  • NoSQL: Start using HBase
  • Stream analytics: Start using Storm
  • Get started with the HDInsight Emulator

What's new

  • Release notes
  • Cluster versions in HDInsight
  • Overview of Storm in HDInsight
  • Overview of HBase in HDInsight
  • HBase clusters on a virtual network

Reference

  • PowerShell Cmdlets
  • .NET SDK for Hadoop
  • .NET library for Avro
  • .NET SDK for HBase
  • Stream computing: SDK for SCP.NET

Analytics

Process data

  • Use Hive with Hadoop in HDInsight

    Use HiveQL to query data in an Apache log4j log file, and report basic statistics.

  • Use Pig with Hadoop in HDInsight

    Write Pig Latin statements to analyze an Apache log4j log file, and run various queries on the data to generate output.

  • Use Python with Hive and Pig in HDInsight

    Both Hive and Pig allow you to create User Defined Functions (UDFs) using a variety of programming languages. Learn how to use a Python UDF from Hive and Pig.

  • Use Hadoop MapReduce in HDInsight

    Learn how to use Azure PowerShell from your workstation to submit a MapReduce program that counts word occurrences in text to an HDInsight cluster.
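Hive can pass rows to an external script through its `SELECT TRANSFORM (...) USING '...'` clause, which writes each row to the script's standard input as a tab-separated line and reads tab-separated rows back from standard output. Here is a minimal sketch of such a script in Python. It pulls the log level out of each log4j line. It assumes the level appears as a whitespace-separated token such as `[INFO]`, which the tutorial does not confirm. The function name, the level list, and the `transform` command-line argument are illustrative conventions of this sketch.

```python
"""Sketch of a Hive TRANSFORM script that emits the log4j level of each row.

Assumption: the level is a whitespace-separated token, optionally wrapped in
square brackets (for example "[INFO]"). Rows with no recognizable level are
skipped.
"""
import sys
from typing import Iterable, Iterator

LOG4J_LEVELS = {"TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"}


def extract_levels(lines: Iterable[str]) -> Iterator[str]:
    """Yield the first log4j level token found on each input line."""
    for line in lines:
        for token in line.split():
            candidate = token.strip("[]")
            if candidate in LOG4J_LEVELS:
                yield candidate
                break  # count at most one level per log line


# Only read stdin when explicitly asked (hypothetical "transform" argument),
# so importing or testing this file never blocks waiting for input.
if __name__ == "__main__" and sys.argv[1:] == ["transform"]:
    for level in extract_levels(sys.stdin):
        # Each printed line becomes one output row for Hive.
        sys.stdout.write(level + "\n")
```

You would typically make the script available with Hive's `ADD FILE` and invoke it along the lines of `SELECT TRANSFORM (line) USING 'python levels.py transform' AS (level STRING) FROM ...`. The file name, column, and table are placeholders, and the exact interpreter command depends on the cluster.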

Business intelligence

  • Connect Excel to Hadoop with Power Query

    A key feature of Microsoft’s Big Data solution is solid integration of Apache Hadoop with Microsoft Business Intelligence (BI) components. Learn how to use Power Query to import HDInsight data into Excel.

  • Use Hive with HDInsight to analyze a website log

    Learn how to use HiveQL in HDInsight to analyze website logs to get insight into the frequency of visits in a day from external websites, and a summary of website errors that users experience.

  • Analyze historical sensor data using Hive with Hadoop

    Learn how to analyze sensor data using Hive with HDInsight (Hadoop), and then visualize the data in Microsoft Excel. This sample uses Hive to process historical data produced by HVAC systems to report on reliability.

  • Analyze flight delay data using Hadoop in HDInsight

    Learn how to use Hive to calculate the average flight delay among airports, and then use Sqoop to export the results to SQL Database.

  • Connect Excel to Hadoop with the Microsoft Hive ODBC Driver

    Import data from Azure HDInsight into Excel using the Microsoft Hive ODBC Driver.
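The flight-delay tutorial computes an average delay per airport in HiveQL, which amounts to a `GROUP BY` with `AVG`. The plain-Python sketch below shows what that aggregation does. The `(airport, delay_minutes)` record layout is a hypothetical stand-in, not the tutorial's actual schema. Like Hive's `AVG`, it ignores missing values.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def average_delay_by_airport(
    records: Iterable[Tuple[str, Optional[float]]],
) -> Dict[str, Optional[float]]:
    """Plain-Python equivalent of SELECT airport, AVG(delay) ... GROUP BY airport.

    As with Hive's AVG, missing delays (None, i.e. SQL NULL) are ignored, and
    an airport whose delays are all missing averages to None.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    airports: Set[str] = set()
    for airport, delay in records:
        airports.add(airport)
        if delay is not None:
            totals[airport] += delay
            counts[airport] += 1
    return {
        airport: totals[airport] / counts[airport] if counts[airport] else None
        for airport in sorted(airports)
    }
```

For example, `average_delay_by_airport([("SEA", 10.0), ("SEA", 20.0), ("BOS", None)])` returns `{"BOS": None, "SEA": 15.0}`. On a cluster, Hive distributes the same grouping across the data instead of holding it in one process.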

Social media

  • Analyze Twitter data with Hadoop in HDInsight

    Learn how to use Hive to analyze Twitter data to find usage frequency of a particular word.

  • Analyze real-time Twitter sentiment with HBase

    Using geo-tagged Tweets, learn how to do real-time sentiment analysis of big data using HBase in an HDInsight (Hadoop) cluster. Then, plot statistical results on Bing Maps.
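"Usage frequency of a particular word" can mean total occurrences or the number of tweets that mention the word. The sketch below computes both over a list of tweet texts, as a local illustration of the counting the Hive tutorial performs at scale. The tokenizer is an assumption: it matches whole tokens case-insensitively, where a token is a run of letters, digits, or underscores.

```python
import re
from typing import Iterable, Tuple

# \w+ matches runs of letters, digits and underscores, so "#data" yields "data"
# and "database" stays a single token that does not match "data".
TOKEN = re.compile(r"\w+")


def word_usage(tweets: Iterable[str], word: str) -> Tuple[int, int]:
    """Return (total occurrences of word, number of tweets containing it)."""
    target = word.lower()
    occurrences = 0
    tweets_with_word = 0
    for text in tweets:
        hits = sum(1 for token in TOKEN.findall(text.lower()) if token == target)
        occurrences += hits
        if hits:
            tweets_with_word += 1
    return occurrences, tweets_with_word
```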

Machine learning

  • Generate movie recommendations with Mahout

    Learn how to use a Mahout recommendation engine to get movie recommendations based on user preferences. Mahout is a machine learning library for Hadoop that contains algorithms for processing data.

  • Install and use R on HDInsight Hadoop clusters

    Learn how to install R on Hadoop clusters in HDInsight and use it for big data processing and machine learning.

  • Install and use Spark on HDInsight clusters

    Use Script Actions to install Spark on HDInsight clusters.
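Recommendation engines such as Mahout's score items a user has not rated by how similar those items are to the ones the user has rated. The sketch below is a small in-memory item-based collaborative filter that uses cosine similarity. It is plain Python, not Mahout's implementation or API, and the ratings are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def item_vectors(ratings: Ratings) -> Dict[str, Dict[str, float]]:
    """Invert user->item ratings into item->{user: rating} vectors."""
    items: Dict[str, Dict[str, float]] = defaultdict(dict)
    for user, prefs in ratings.items():
        for item, score in prefs.items():
            items[item][user] = score
    return items


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors keyed by user."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(ratings: Ratings, user: str, top_n: int = 3) -> List[Tuple[str, float]]:
    """Score unseen items by a similarity-weighted average of the user's ratings."""
    items = item_vectors(ratings)
    seen = ratings.get(user, {})
    scores: List[Tuple[str, float]] = []
    for candidate, vector in items.items():
        if candidate in seen:
            continue  # only recommend items the user has not rated
        weighted, total_sim = 0.0, 0.0
        for item, rating in seen.items():
            sim = cosine(vector, items[item])
            weighted += sim * rating
            total_sim += abs(sim)
        if total_sim > 0:
            scores.append((candidate, weighted / total_sim))
    scores.sort(key=lambda pair: (-pair[1], pair[0]))  # best score first
    return scores[:top_n]
```

Every item is compared against every rated item here. That cost is why real recommenders precompute similarities as a distributed job over the full ratings data.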

Development

Get started

  • Get started using HDInsight Tools for Visual Studio

    Learn how to install and use HDInsight Tools for Visual Studio to connect to HDInsight and run Hive queries.

  • Use Maven to build Java applications that use HBase with Hadoop

    Learn how to create and build an Apache HBase application in Java using Apache Maven. Then, use the application with HDInsight (Hadoop).

  • Submit Hadoop jobs in HDInsight

    Learn how to submit MapReduce and Hive jobs using PowerShell and the HDInsight .NET SDK.

  • Develop Java MapReduce programs for Hadoop in HDInsight

    Follow this end-to-end scenario to learn how to develop and test a word-counting MapReduce job on the HDInsight Emulator, and then deploy and run it on HDInsight.

  • Develop C# Hadoop streaming programs for HDInsight

    Learn how to develop and test a Hadoop streaming MapReduce program on the HDInsight Emulator, and then run it on HDInsight using a PowerShell script.

  • Run the Hadoop samples in HDInsight

    The four samples included are intended to get you started quickly and to give you an extensible testing bed to work through concepts. Create data sets, and observe the effects of data size on jobs.
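Hadoop streaming can run any executable as the mapper or reducer. The mapper reads input lines on standard input and writes `key<TAB>value` lines. The framework sorts those lines by key, and the reducer reads the sorted stream. The streaming tutorial above uses C#. The sketch below swaps in Python to show the same word-count contract, plus a hypothetical `run_locally` helper that imitates the map, sort, and reduce phases in one process for testing.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit "word<TAB>1" for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per word; relies on input being sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines: Iterable[str]) -> List[str]:
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    return list(reducer(sorted(mapper(lines))))


# Hypothetical invocation convention: "python wordcount.py map" or "... reduce".
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

On a cluster, the same file would be passed to the Hadoop streaming jar through its `-mapper` and `-reducer` options, with `-file` to ship it to the nodes. The file name and the `map`/`reduce` arguments are conventions of this sketch. The jar location and Python path on an HDInsight node are not covered here.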

Import & export data

  • Move data with Sqoop between a Hadoop cluster and a database

    Learn how to use Azure PowerShell from a workstation to run Sqoop import and export between an HDInsight cluster and an Azure SQL database.

  • Serialize data with the Microsoft .NET Library for Avro

    Learn how to use the Microsoft .NET Library for Avro to serialize objects and other data structures into streams of bytes in order to persist them to memory, a database, or a file.
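The .NET library produces Avro's compact binary encoding. The sketch below reproduces a few core rules from the Avro specification in Python:
- `int`/`long` values are zig-zag encoded and then written as a base-128 varint.
- A `string` is written as its byte length followed by the UTF-8 bytes.
- A record is written as its fields' encodings concatenated in schema order.

It is an illustration of the format, not the library's API. The `{id: long, name: string}` record is hypothetical.

```python
from typing import List


def zigzag(n: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return 2 * n if n >= 0 else -2 * n - 1


def encode_long(n: int) -> bytes:
    """Avro int/long: zig-zag, then varint with the low-order 7-bit group first."""
    value = zigzag(n)
    out: List[int] = []
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: byte length encoded as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_user(user_id: int, name: str) -> bytes:
    """Hypothetical record {id: long, name: string}: fields back to back,
    in schema order, with no field names or delimiters."""
    return encode_long(user_id) + encode_string(name)
```

Because field names are not written, a reader needs the writer's schema to decode the bytes. Avro files therefore store the schema alongside the data.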

Workflow coordination

  • Use Oozie with Hadoop in HDInsight

    Learn how to run an Apache Oozie workflow to process a log4j log file to count the occurrences of each log level type. Then, export the results to an Azure SQL database table.

  • Use a time-based Oozie coordinator with Hadoop in HDInsight

    Learn how to define workflows and coordinators, and how to trigger the Hadoop jobs based on time.
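A time-based Oozie coordinator is declared with `start`, `end`, and `frequency` attributes on its `<coordinator-app>` element, and it triggers the workflow once per nominal time. The sketch below lists those times for a fixed-length frequency. Two simplifying assumptions are built in: the end time is treated as exclusive, and calendar-aware frequencies (months, time zones, daylight saving) are ignored.

```python
from datetime import datetime, timedelta
from typing import List


def nominal_times(start: datetime, end: datetime, frequency: timedelta) -> List[datetime]:
    """Times at which a fixed-frequency coordinator would trigger its workflow:
    start, start + f, start + 2f, ... strictly before end."""
    if frequency <= timedelta(0):
        raise ValueError("frequency must be positive")
    times: List[datetime] = []
    current = start
    while current < end:
        times.append(current)
        current += frequency
    return times
```

In a coordinator definition, a daily schedule is usually written as `frequency="${coord:days(1)}"`, which corresponds to `timedelta(days=1)` here.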

Extensibility

  • Script Action development with HDInsight

    Use Script Actions to install additional software such as R and Spark or to change the configuration of applications running on a Hadoop cluster.

  • Customize HDInsight clusters using Script Actions

    Customize HDInsight clusters to install additional components using custom scripts.

  • Install and use Giraph on HDInsight clusters

    This tutorial shows you how to install Giraph on an HDInsight cluster using Script Action customization while it's being deployed. Giraph enables graph processing, which lets you model relationships between objects.

  • Install and use Solr on HDInsight clusters

    This tutorial shows you how to install Solr on an HDInsight cluster using Script Action customization while it's being deployed. Solr enables enterprise-level search capabilities on data managed by Hadoop.
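One thing a customization script can do is change a configuration setting. Hadoop's `*-site.xml` files share a simple layout: `<configuration><property><name>…</name><value>…</value></property></configuration>`. The Python sketch below shows an idempotent "set this property" step of the kind such a script might perform. The real Script Action would be written in whatever language the cluster's script-action mechanism expects. The example property and value in the usage note are illustrative only.

```python
import xml.etree.ElementTree as ET


def set_property(xml_text: str, name: str, value: str) -> str:
    """Set a property in a Hadoop-style *-site.xml document and return the XML.

    Updates the property if it exists and appends it otherwise, so applying the
    same change twice gives the same result. Note that ElementTree's default
    parser drops XML comments from the file.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "configuration":
        raise ValueError("expected a <configuration> root element")
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:  # no existing <property> with this name
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

For example, `set_property(text, "mapreduce.map.memory.mb", "2048")` rewrites that property's value in the given document, or adds the property if it is absent.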

Big data applications

  • Analyze real-time sensor data using Storm and HBase

    Learn how to build a solution that uses a Storm cluster in HDInsight to process sensor data from Azure Event Hubs, and then displays the processed sensor data as near-real-time information on a web-based dashboard.

  • Develop streaming data processing applications with SCP.NET and C# on Storm

    Learn how to develop streaming data processing applications with SCP.NET and C# on Storm in HDInsight.

  • Perform graph analysis with Giraph using Hadoop

    Learn how to use Apache Giraph to find the shortest path between objects using Hadoop on HDInsight. With Giraph, you can gain insights into relationships, such as between friends on social networks (also called a "social graph") or between routers on a large network like the internet.
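Giraph follows the vertex-centric, bulk-synchronous (Pregel) model: computation proceeds in supersteps, and vertices exchange messages between them. The single-process Python sketch below shows how shortest paths fit that model. Each vertex keeps the smallest distance it has heard of and forwards `distance + edge weight` to its neighbors whenever that distance improves. It is an illustration of the technique, not Giraph's API, and it assumes non-negative edge weights.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # vertex -> [(neighbor, edge weight)]


def shortest_paths(graph: Graph, source: str) -> Dict[str, float]:
    """Pregel-style single-source shortest paths (non-negative weights).

    Each superstep, a vertex takes the minimum of its incoming messages; if that
    improves its distance, it sends distance + weight along each outgoing edge.
    The loop halts when a superstep produces no messages.
    """
    vertices = set(graph) | {source}
    for edges in graph.values():
        vertices.update(target for target, _ in edges)
    distance = {v: math.inf for v in vertices}
    inbox: Dict[str, List[float]] = {source: [0.0]}
    while inbox:  # one iteration per superstep
        outbox: Dict[str, List[float]] = defaultdict(list)
        for vertex, messages in inbox.items():
            best = min(messages)
            if best < distance[vertex]:
                distance[vertex] = best
                for target, weight in graph.get(vertex, []):
                    outbox[target].append(best + weight)
        inbox = outbox
    return distance
```

Vertices that never receive a message keep a distance of infinity, which marks them as unreachable from the source.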

Debug & troubleshoot

  • Azure HDInsight release notes

    Keep up to date with improvements and updates in each HDInsight release. Learn about changes you may need to make to your HDInsight configuration, jobs, and so on.

  • Collect heap dumps for debugging and analysis

    Automatically collect heap dumps for debugging and analysis in your Blob storage account.

  • Access HDInsight application logs programmatically

    Learn how to programmatically enumerate the YARN applications that have completed on a Hadoop cluster and access the application logs.

  • Debug Hadoop in HDInsight: Interpret error messages

    Learn about errors that can occur when using PowerShell to manage HDInsight and the steps for recovering from them.
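The tutorial's own code for enumerating completed YARN applications is not reproduced here. As a swapped-in illustration, the Python sketch below uses the Hadoop YARN ResourceManager REST API, which lists applications at `/ws/v1/cluster/apps` and accepts a `states` filter. The sketch only builds the URL and parses a response. How the ResourceManager endpoint is reached on an HDInsight cluster (host, port, gateway, credentials) is an assumption left to the caller.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode


def finished_apps_url(resource_manager: str) -> str:
    """URL of the ResourceManager REST call that lists finished applications."""
    query = urlencode({"states": "FINISHED"})
    return f"{resource_manager.rstrip('/')}/ws/v1/cluster/apps?{query}"


def parse_apps(response_body: str) -> List[Dict[str, str]]:
    """Extract id, name and finalStatus from a /ws/v1/cluster/apps response.

    The ResourceManager may return {"apps": null} when nothing matches, so that
    case yields an empty list.
    """
    apps = json.loads(response_body).get("apps") or {}
    return [
        {
            "id": app["id"],
            "name": app.get("name", ""),
            "finalStatus": app.get("finalStatus", ""),
        }
        for app in apps.get("app", [])
    ]
```

With an application ID in hand, the aggregated logs can then be fetched separately, for example with `yarn logs -applicationId <id>` on a cluster node.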

Operations

Cluster management

  • Monitor Hadoop clusters in HDInsight using the Ambari API

    Use the Apache Ambari APIs for provisioning, managing, and monitoring Hadoop clusters. Ambari has intuitive operator tools and robust APIs that hide the complexity of Hadoop.

  • Manage Hadoop clusters in HDInsight using the Management Portal

    Learn how to use the Azure Management Portal to create an HDInsight cluster, and how to open the administrative tools.

  • Manage Hadoop clusters in HDInsight using PowerShell

    Learn how to manage HDInsight clusters using a local Azure PowerShell console.

  • Manage Hadoop clusters in HDInsight using the command-line interface

    Learn how to use the Cross-Platform Command-Line Interface to manage HDInsight clusters.

  • Availability and reliability of Hadoop clusters in HDInsight

    HDInsight clusters are enhanced to provide the reliability and availability required to manage enterprise workloads.
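Ambari exposes its REST resources under `/api/v1/clusters/<cluster name>` and authenticates requests with HTTP basic authentication. The Python sketch below builds, but does not send, a GET request for one such resource. The base URL of an HDInsight cluster's Ambari endpoint, the cluster name, and the credentials are all placeholders. The exact endpoint layout on HDInsight is an assumption, not something this page confirms.

```python
import base64
import urllib.request


def ambari_request(
    base_url: str, cluster: str, resource: str, user: str, password: str
) -> urllib.request.Request:
    """Build a GET request for /api/v1/clusters/<cluster>/<resource>.

    `resource` is a path relative to the cluster, such as "services".
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    url = f"{base_url.rstrip('/')}/api/v1/clusters/{cluster}/{resource.lstrip('/')}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}, method="GET"
    )
```

Sending the request would be `urllib.request.urlopen(req)` followed by `json.load` on the response, which returns the resource as JSON.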

Cluster provisioning

  • Provision Hadoop clusters using custom options

    Learn how to provision HDInsight clusters with custom options using the Azure Management Portal, PowerShell, the command-line interface, and the HDInsight .NET SDK.

  • Provision HBase clusters on Azure Virtual Network

    Create an HBase cluster in HDInsight on Azure Virtual Network. Virtual network integration allows applications to communicate with HBase directly, improving performance and security.

Data management

  • Upload data for Hadoop jobs in HDInsight

    Learn how to upload and access data in HDInsight using Azure Storage Explorer, Azure PowerShell, the Hadoop command line, or Sqoop.

  • Use Azure Blob storage with Hadoop in HDInsight

    Learn how HDInsight works with data that is stored in Azure Blob storage, when to store data in HDFS, and when to store it in Blob storage.
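HDInsight addresses Blob storage through URIs of the form `wasb[s]://<container>@<account>.blob.core.windows.net/<path>`, where `wasbs` uses TLS. The Python sketch below splits such a URI into its parts and rebuilds it. The shorthand `wasb:///path`, which refers to the cluster's default container, is deliberately not handled. The container, account, and path names in the usage note are hypothetical.

```python
from typing import NamedTuple
from urllib.parse import urlsplit

BLOB_SUFFIX = ".blob.core.windows.net"


class WasbLocation(NamedTuple):
    container: str
    account: str
    path: str
    secure: bool  # True for wasbs://


def parse_wasb(uri: str) -> WasbLocation:
    """Split wasb[s]://<container>@<account>.blob.core.windows.net/<path>."""
    parts = urlsplit(uri)
    if parts.scheme not in ("wasb", "wasbs"):
        raise ValueError(f"not a wasb URI: {uri}")
    container, sep, host = parts.netloc.partition("@")
    if not sep or not container or not host.lower().endswith(BLOB_SUFFIX):
        raise ValueError(f"expected <container>@<account>{BLOB_SUFFIX}: {uri}")
    account = host[: -len(BLOB_SUFFIX)]
    return WasbLocation(container, account, parts.path.lstrip("/"), parts.scheme == "wasbs")


def to_wasb(loc: WasbLocation) -> str:
    """Rebuild the URI from its parts."""
    scheme = "wasbs" if loc.secure else "wasb"
    return f"{scheme}://{loc.container}@{loc.account}{BLOB_SUFFIX}/{loc.path}"
```

For example, `parse_wasb("wasbs://logs@mystorage.blob.core.windows.net/example/data/sample.log")` yields container `logs`, account `mystorage`, and path `example/data/sample.log`.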

Videos

Create an HDInsight cluster using the Windows Azure Management portal

02-12-2014 04 min, 18 sec

Leveraging Hadoop 2 in Azure HDInsight

03-31-2014 00 min, 00 sec

Integrating HDInsight with your Azure Apps

03-31-2014 01 hr, 04 min, 12 sec

Make Your Apps Smarter with Azure HDInsight

06-25-2013 01 hr, 05 min, 07 sec

View more HDInsight videos

Looking for more resources?

  • Forums: Ask questions, share insights and discuss the platform
  • Reference: HDInsight SDK Documentation
  • Downloads: Azure PowerShell Cmdlets
  • Hadoop: Why choose Hadoop on Azure?

Have an idea or suggestion for HDInsight?

Share your ideas with Microsoft and the community

See more ideas from the community

© 2014 Microsoft