Using the Azure preview portal, you can create Hadoop clusters in Azure HDInsight, change the Hadoop user password, and enable Remote Desktop Protocol (RDP) so you can access the Hadoop command console on the cluster.
The information in this article applies only to Windows-based HDInsight clusters. For information on managing Linux-based clusters, click the tab selector above.
Click the tab selector for information on creating Hadoop clusters in HDInsight using other tools.
The steps in this document use the Azure preview portal. Microsoft recommends using the Azure preview portal when creating new services. For an explanation of the advantages of the preview portal, see DevOps just got a whole lot more awesome.
Services and resources created in the Azure preview portal are not visible in the Azure portal, as they use a new resource model.
For a version of this document that uses the Azure portal, see the following link:
Before you begin this article, you must have the following:
After you open the portal, you can:
Click New from the left menu to create a new cluster:
Click HDInsight Clusters from the left menu.
If HDInsight doesn't appear in the left menu, click Browse.
For the creation instructions using the preview portal, see Create HDInsight clusters.
HDInsight works with a wide range of Hadoop components. For the list of the components that have been verified and supported, see What version of Hadoop is in Azure HDInsight. You can customize HDInsight by using one of the following options:
Some native Java components, like Mahout and Cascading, can be run on the cluster as JAR files. These JAR files can be uploaded to Azure Blob storage and submitted to HDInsight clusters through Hadoop job submission mechanisms. For more information, see Submit Hadoop jobs programmatically.
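As a minimal sketch of what such a submission looks like, the request below targets the cluster's WebHCat (Templeton) REST endpoint. The cluster name, login, and JAR path are placeholder values; a real client would POST this form body over HTTPS using the cluster's HTTP credentials.

```python
# Sketch: assemble a WebHCat (Templeton) MapReduce JAR submission request.
# All names and paths below are placeholders for illustration only.
from urllib.parse import urlencode

def build_jar_submission(cluster_name, user, jar_path, class_name, args):
    """Return the submission URL and URL-encoded form body for a JAR job."""
    url = "https://{0}.azurehdinsight.net/templeton/v1/mapreduce/jar".format(cluster_name)
    fields = [("user.name", user), ("jar", jar_path), ("class", class_name)]
    fields += [("arg", a) for a in args]  # one 'arg' field per program argument
    return url, urlencode(fields)

url, body = build_jar_submission(
    "mycluster",                                          # placeholder cluster name
    "admin",                                              # placeholder cluster login
    "wasb:///example/jars/hadoop-mapreduce-examples.jar", # JAR in Blob storage
    "wordcount",
    ["wasb:///example/data/davinci.txt", "wasb:///example/results"])
print(url)
```

Note that the JAR itself lives in the default Blob storage container (the `wasb:///` scheme), so it survives cluster deletion and re-creation.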
If you have issues deploying JAR files to HDInsight clusters or calling JAR files on HDInsight clusters, contact Microsoft Support.
Cascading is not supported by HDInsight, and is not eligible for Microsoft Support. For lists of supported components, see What's new in the cluster versions provided by HDInsight?.
Installation of custom software on the cluster by using Remote Desktop Connection is not supported. You should avoid storing any files on the drives of the head node, as they will be lost if you need to re-create the cluster. We recommend storing files in Azure Blob storage, which is persistent.
Click HDInsight Clusters from the left menu:
If HDInsight doesn't appear in the left menu, click Browse:
You will see a list of clusters, if any exist:
Use Filter items and the Subscription selector to narrow the list.
Double-click a cluster from the list to show the details.
Menu and essentials:
Users: Allows you to set permissions for portal management of this cluster for other users on your Azure subscription.
Tags: Tags allow you to set key/value pairs to define a custom taxonomy of your cloud services. For example, you may create a key named project, and then use a common value for all services associated with a specific project.
Documentation: Links to documentation for Azure HDInsight.
To manage the services provided by the HDInsight cluster, you must use Ambari Web or the Ambari REST API. For more information on using Ambari, see Manage HDInsight clusters using Ambari.
The properties list includes the following:
Deleting a cluster does not delete the default storage account or any linked storage accounts. You can re-create the cluster by using the same storage accounts and the same metastores.
See also Pause/shut down clusters.
The cluster scaling feature allows you to change the number of worker nodes used by a cluster that is running in Azure HDInsight without having to re-create the cluster.
Only clusters with HDInsight version 3.1.3 or higher are supported. If you are unsure of the version of your cluster, you can check the Properties page. See List and show clusters.
The impact of changing the number of data nodes varies for each type of cluster supported by HDInsight:
You can seamlessly increase the number of worker nodes in a Hadoop cluster that is running without impacting any pending or running jobs. New jobs can also be submitted while the operation is in progress. Failures in a scaling operation are gracefully handled so that the cluster is always left in a functional state.
When a Hadoop cluster is scaled down by reducing the number of data nodes, some of the services in the cluster are restarted. This causes all running and pending jobs to fail at the completion of the scaling operation. You can, however, resubmit the jobs once the operation is complete.
You can seamlessly add or remove nodes to your HBase cluster while it is running. Region servers are automatically balanced within a few minutes of completing the scaling operation. However, you can also manually balance the region servers by logging in to the head node of the cluster and running the following commands from a command prompt window:
```
>pushd %HBASE_HOME%\bin
>hbase shell
>balancer
```
For more information on using the HBase shell, see 
You can seamlessly add or remove data nodes from your Storm cluster while it is running. However, after the scaling operation completes successfully, you will need to rebalance the topology.
Rebalancing can be accomplished in two ways:
Please refer to the Apache Storm documentation for more details.
The Storm web UI is available on the HDInsight cluster:
Here is an example of how to use the CLI command to rebalance a Storm topology:
```
## Reconfigure the topology "mytopology" to use 5 worker processes,
## the spout "blue-spout" to use 3 executors, and
## the bolt "yellow-bolt" to use 10 executors
$ storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10
```
To scale clusters
Enter Number of Worker nodes. The limit on the number of cluster nodes varies among Azure subscriptions. You can contact billing support to increase the limit. The cost information reflects the changes you make to the number of nodes.
Most Hadoop jobs are batch jobs that are run only occasionally. For most Hadoop clusters, there are large periods of time when the cluster is not being used for processing. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it is not in use. You are charged for an HDInsight cluster even when it is not in use. Because the charges for the cluster are many times more than the charges for storage, it makes economic sense to delete clusters when they are not in use.
There are many ways you can program the process:
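One building block common to these approaches is the Azure Resource Manager DELETE call for the cluster resource. As a sketch only — the subscription ID, resource group, cluster name, and API version below are placeholder/assumed values — a script would construct and send an authenticated request like this:

```python
# Sketch: build the ARM REST URL that deletes an HDInsight cluster.
# A real script would send this as an authenticated DELETE request;
# all identifiers below are placeholders.
API_VERSION = "2015-03-01-preview"  # assumed API version for illustration

def delete_cluster_url(subscription_id, resource_group, cluster_name):
    """Return the management endpoint URL for deleting the cluster resource."""
    return ("https://management.azure.com/subscriptions/{0}"
            "/resourceGroups/{1}/providers/Microsoft.HDInsight"
            "/clusters/{2}?api-version={3}").format(
                subscription_id, resource_group, cluster_name, API_VERSION)

url = delete_cluster_url("00000000-0000-0000-0000-000000000000",
                         "myresourcegroup", "mycluster")
print(url)
```

Because the data lives in Azure Storage rather than on the cluster, a scheduled job can issue this call at the end of a processing window and re-create the cluster later against the same storage account.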
An HDInsight cluster can have two user accounts. The HDInsight cluster user account is created during the creation process. You can also create an RDP user account for accessing the cluster via RDP. See Enable remote desktop.
To change the HDInsight cluster user name and password
Change the Cluster Login Name and/or the Cluster Login Password, and then click Save.
HDInsight clusters have the following HTTP web services (all of these services have RESTful endpoints):
By default, access to these services is granted. You can revoke or grant the access from the Azure preview portal.
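When access is granted, clients authenticate to these endpoints with the cluster's HTTP credentials over HTTPS. As a sketch — the cluster name and credentials below are placeholders — a request to the WebHCat status endpoint would carry a basic-auth header like this:

```python
# Sketch: prepare an authenticated call to the WebHCat status endpoint.
# Cluster name and credentials are placeholders; if access has been revoked
# in the portal, a request like this is rejected.
import base64
from urllib.request import Request

cluster = "mycluster"                      # placeholder cluster name
user, password = "admin", "MyPassword1!"   # placeholder HTTP credentials

req = Request("https://{0}.azurehdinsight.net/templeton/v1/status".format(cluster))
token = base64.b64encode("{0}:{1}".format(user, password).encode()).decode()
req.add_header("Authorization", "Basic " + token)
print(req.full_url)
```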
To grant/revoke HTTP web services access
Each HDInsight cluster has a default storage account. The default storage account and its keys for a cluster appear under Settings/Properties/Azure Storage Keys. See List and show clusters.
In Azure Resource Manager (ARM) mode, each HDInsight cluster is created within an Azure resource group. The resource group that a cluster belongs to appears in:
The HDInsight Query console includes the following features:
Hive Editor: A GUI web interface for submitting Hive jobs. See Run Hive queries using the Query Console.
Job history: Monitor Hadoop jobs.
Click Query Name to show the details, including Job properties, Job Query, and Job Output. You can also download both the query and the output to your workstation.
File Browser: Browse the default storage account and the linked storage accounts.
From Hadoop UI, you can browse files and check logs.
To run Hive jobs from the Preview portal, click Hive Editor in the HDInsight Query console. See Open HDInsight Query console.
To monitor jobs from the Preview portal, click Job History in the HDInsight Query console. See Open HDInsight Query console.
To browse files stored in the default storage account and the linked storage accounts, click File Browser in the HDInsight Query console. See Open HDInsight Query console.
You can also use the Browse the file system utility from the Hadoop UI in the HDInsight console. See Open HDInsight Query console.
The Usage section of the HDInsight cluster blade displays information about the number of cores available to your subscription for use with HDInsight, as well as the number of cores allocated to this cluster and how they are allocated for the nodes within this cluster. See List and show clusters.
To monitor the services provided by the HDInsight cluster, you must use Ambari Web or the Ambari REST API. For more information on using Ambari, see Manage HDInsight clusters using Ambari.
To monitor the cluster, browse the file system, and check logs, click Hadoop UI in the HDInsight Query console. See Open HDInsight Query console.
To use Yarn user interface, click Yarn UI in the HDInsight Query console. See Open HDInsight Query console.
The credentials for the cluster that you provided at its creation give access to the services on the cluster, but not to the cluster itself through Remote Desktop. You can turn on Remote Desktop access when you provision a cluster or after a cluster is provisioned. For the instructions about enabling Remote Desktop at creation, see Create HDInsight cluster.
To enable Remote Desktop
Enter Expires On, Remote Desktop Username and Remote Desktop Password, and then click Enable.
The default value for Expires On is one week.
You can also use the HDInsight .NET SDK to enable Remote Desktop on a cluster. Use the EnableRdp method on the HDInsight client object in the following manner: client.EnableRdp(clustername, location, "rdpuser", "rdppassword", DateTime.Now.AddDays(6)). Similarly, to disable Remote Desktop on the cluster, you can use client.DisableRdp(clustername, location). For more information on these methods, see HDInsight .NET SDK Reference. This is applicable only for HDInsight clusters running on Windows.
To connect to a cluster by using RDP
To connect to the cluster by using Remote Desktop and use the Hadoop command line, you must first have enabled Remote Desktop access to the cluster as described in the previous section.
To open a Hadoop command line
From the desktop, double-click Hadoop Command Line.
For more information on Hadoop commands, see Hadoop commands reference.
In the previous screenshot, the folder name has the Hadoop version number embedded. The version number can change based on the version of the Hadoop components installed on the cluster. You can use Hadoop environment variables to refer to those folders. For example:
```
cd %hadoop_home%
cd %hive_home%
cd %hbase_home%
cd %pig_home%
cd %sqoop_home%
cd %hcatalog_home%
```
In this article, you have learned how to create an HDInsight cluster by using the preview portal, and how to open the Hadoop command-line tool. To learn more, see the following articles: