- Insights with familiar tools: Through deep integration with Microsoft BI tools such as PowerPivot and Power View, HDInsight enables you to easily find insights in data using Hadoop. Seamlessly combine data from several sources, including HDInsight, with Power Query. Easily map your data with the new Power Map, a 3D mapping tool in Excel 2013.
- Agility: HDInsight offers agility to meet the changing needs of your organization. With a rich array of PowerShell scripts you can deploy and provision a Hadoop cluster in minutes instead of hours or days. If you need a larger cluster, simply delete your cluster and create a bigger one in minutes without losing any data.
- Enterprise-ready Hadoop: HDInsight offers enterprise-class security and manageability. Thanks to a dedicated Secure Node, HDInsight helps you secure your Hadoop cluster. In addition, we simplify manageability of your Hadoop cluster through extensive support for PowerShell scripting.
- Rich Developer Experience: HDInsight offers powerful programming capabilities with a choice of languages, including .NET and Java. .NET developers can exploit the full power of language-integrated query with LINQ to Hive.
Getting Started with HDInsight
An HDInsight cluster can be created from the Windows Azure Management portal by clicking the New button and selecting HDInsight from the Data Services menu. To create an HDInsight cluster, specify a name for the cluster, the size of the cluster in number of data nodes, and a password for logging in. A cluster must have at least one storage account associated with it, which serves as the permanent storage mechanism for that cluster. The cluster is always created in the same region as the storage account chosen. At the time of general availability the storage account must reside in West US, East US or North Europe to be associated with an HDInsight cluster. Additional storage accounts can be associated with a cluster using the custom create option.
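The same provisioning steps can also be scripted. The following is a minimal sketch, not a definitive recipe: it assumes the legacy Windows Azure PowerShell module with the HDInsight cmdlets is installed and that a subscription has already been selected. The storage account name and container name are hypothetical placeholders, and parameter names may differ between versions of the cmdlets.

```powershell
# Sketch only: assumes the legacy Windows Azure PowerShell module is installed
# and a subscription has already been imported and selected.
$clusterName        = "HadoopIsAwesome"    # cluster name used in the examples in this post
$location           = "West US"            # must be the region of the storage account
$storageAccountName = "mystorageaccount"   # hypothetical placeholder
$containerName      = "mycontainer"        # hypothetical placeholder
$clusterNodes       = 4                    # size of the cluster in data nodes

# Look up the primary access key of the storage account.
$storageAccountKey = Get-AzureStorageKey $storageAccountName | ForEach-Object { $_.Primary }

# Prompt for the user name and password used to log in to the cluster.
$credential = Get-Credential

# Create the cluster with the storage account as its default (permanent) storage.
New-AzureHDInsightCluster -Name $clusterName -Location $location `
    -DefaultStorageAccountName "$storageAccountName.blob.core.windows.net" `
    -DefaultStorageAccountKey $storageAccountKey `
    -DefaultStorageContainerName $containerName `
    -ClusterSizeInNodes $clusterNodes `
    -Credential $credential
```

Because the data lives in the storage account rather than on the cluster nodes, a script along these lines can be rerun with a larger node count after an existing cluster has been deleted.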

$jarFile = "/example/jars/hadoop-examples.jar"
$className = "wordcount"
$statusDirectory = "/samples/wordcount/status"
$outputDirectory = "/samples/wordcount/output"
$inputDirectory = "/example/data/gutenberg"
$wordCount = New-AzureHDInsightMapReduceJobDefinition -JarFile $jarFile -ClassName $className -Arguments $inputDirectory, $outputDirectory -StatusFolder $statusDirectory
Run these commands to get your subscription information and start execution of the MapReduce program. MapReduce jobs are typically long-running, so this example shows how to use the asynchronous commands to kick off execution of the job.
$subscriptionId = (Get-AzureSubscription -Current).SubscriptionId
$wordCountJob = $wordCount | Start-AzureHDInsightJob -Cluster HadoopIsAwesome -Subscription $subscriptionId | Wait-AzureHDInsightJob -Subscription $subscriptionId
Finally, run this command to retrieve the results of execution and display them on the PowerShell command line.
Get-AzureHDInsightJobOutput -Subscription (Get-AzureSubscription -Current).SubscriptionId -Cluster HadoopIsAwesome -JobId $wordCountJob.JobId -StandardError
The result of a MapReduce job is the information on the execution of the job itself as shown below.


Use-AzureHDInsightCluster HadoopIsAwesome (Get-AzureSubscription -Current).SubscriptionID
Next, run this command to submit a HiveQL statement to the cluster. The statement uses a sample Hive table that is set up on the cluster by default when it is created.
Invoke-Hive "select country, state, count(*) as records from hivesampletable group by country, state order by records desc limit 5"
The query is a fairly simple select-groupby and when complete will display the results on the PowerShell command line.
- Getting Started with the HDInsight Service
- Provisioning HDInsight clusters
- Submit Hadoop jobs programmatically
- Connect Excel to Windows Azure HDInsight with Power Query