Editor’s Note: This post comes from Shayne Burgess of the Windows Azure HDInsight Team.
Yesterday we released an important preview of HDInsight Service on Windows. This second blog in our 5-part series provides a quick walkthrough of the updated HDInsight Service.
HDInsight provides everything you need to quickly deploy, manage and use Hadoop clusters running on Windows Azure.
Log in to the portal and select HDInsight from the menu that appears after you click the New button in the bottom left corner. Specify a name for the cluster, a password for logging in to the cluster, and the cluster size you need. The size of the cluster determines its price, so choose the size carefully.
A storage account is required to create a cluster, and in the current public preview the storage account must reside in the East US region. The Azure Storage account you associate with your cluster is where you store the data that you will analyze in HDInsight.
Creating a cluster takes a few minutes while the service creates and configures the Virtual Machines (VMs) that together make up your HDInsight cluster. The Hadoop components installed as part of an HDInsight cluster are outlined here. Once the cluster is created, drill into the dashboard view to see the cluster quick glance screen. The quick glance shows basic information about your cluster and gives you a simple way to connect to it.
To open the cluster’s main dashboard page, click the Manage button. The cluster will ask you to log in using the username and password you specified when creating the cluster (if you used the quick create option, the default username is admin).
Your cluster’s main portal page also contains a Samples tile that you can use to learn some of the basics of using Hadoop.
Each sample highlights a different scenario for using HDInsight. Exploring the samples will give you an overview of some of the capabilities of HDInsight and teach you how to do things like execute Hive queries and set up Sqoop connectors.
The WordCount sample, for instance, shows you how to execute a MapReduce job that calculates the number of times a word occurs in a text file. The samples all contain a Deploy to your cluster button that will execute the sample MapReduce job on your cluster.
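To make the idea behind the WordCount sample concrete, here is a minimal local sketch of the map, shuffle, and reduce steps in Python. It is not the code the sample deploys to your cluster. The function names (map_phase, shuffle, reduce_phase, word_count) and the tokenizing rule are assumptions chosen for illustration, and everything runs in a single process rather than on Hadoop.

```python
import re
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word found in one line of input."""
    # Assumed tokenizer: lowercase letters and apostrophes only.
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1


def shuffle(pairs):
    """Group the emitted values by word, as happens between map and reduce."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    """Sum all the counts that were emitted for a single word."""
    return word, sum(counts)


def word_count(lines):
    """Run the three steps in order over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    # Made-up input lines, not text from the sample data set.
    sample = ["the painter studies the light", "The light changes"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps run in parallel across many machines, with the framework handling the grouping in between. The sketch only shows how the counts come together.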
Examining Output in the Interactive Console
Run the WordCount sample to start a MapReduce job that calculates the number of times each word appears in the Project Gutenberg eBook of the Notebooks of Leonardo Da Vinci. When the job completes, you can use the Interactive Console to view the output that has been stored in your Blob Storage account.
To view the word count, enter the command file = fs.read("asv:///DaVinciAllTopWords") at the console prompt. Scroll back up to see the long list of words and their summary counts.
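If you copy that list out of the console, a short script can pick out the most frequent words instead of scrolling. The Python sketch below assumes each output line has the form word, a tab character, then the count. That is the usual layout for Hadoop word count output, but it is an assumption about this sample's file. The helper names parse_counts and top_words are hypothetical, and the input shown is made-up data.

```python
def parse_counts(text):
    """Parse 'word<TAB>count' lines into (word, count) tuples.

    Lines that do not match the assumed layout are skipped.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 2 or not parts[1].strip().isdigit():
            continue
        rows.append((parts[0], int(parts[1])))
    return rows


def top_words(text, n=3):
    """Return the n (word, count) pairs with the highest counts."""
    return sorted(parse_counts(text), key=lambda row: row[1], reverse=True)[:n]


if __name__ == "__main__":
    # Made-up lines in the assumed format, not real job output.
    pasted = "of\t12\nthe\t20\nand\t7\nbad line\nlight\t3"
    for word, count in top_words(pasted, n=2):
        print(word, count)
```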
To continue learning about HDInsight, visit our Getting Started page.
We hope you will find HDInsight a valuable new service and are looking forward to your feedback.
Visit us on Wednesday for the next blog in our 5-part series that will focus on HDInsight and Azure Storage.