Using the Azure preview portal, you can provision Hadoop clusters in Azure HDInsight, change the Hadoop user password, and enable Remote Desktop Protocol (RDP) so that you can access the Hadoop command console on the cluster.
The steps in this document use the Azure preview portal. Microsoft recommends using the Azure preview portal when creating new services. For an explanation of the advantages of the preview portal, see DevOps just got a whole lot more awesome.
Services and resources created in the Azure preview portal are not visible in the Azure portal, as they use a new resource model.
For a version of this document that uses the Azure portal, see the following link:
The steps in this document are specific to working with Windows-based Hadoop clusters. For information on working with Linux-based clusters, see Manage Hadoop clusters in HDInsight by using the Azure preview portal.
There are also other tools available for administering HDInsight in addition to the preview portal.
Before you begin this article, you must have the following:
For the provision instructions using the preview portal, see Provision HDInsight clusters.
HDInsight works with a wide range of Hadoop components. For a list of the components that have been verified and are supported, see What version of Hadoop is in Azure HDInsight. You can customize HDInsight by using one of the following options:
Some native Java components, like Mahout and Cascading, can be run on the cluster as JAR files. These JAR files can be uploaded to Azure Blob storage and submitted to HDInsight clusters through Hadoop job submission mechanisms. For more information, see Submit Hadoop jobs programmatically.
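As an illustrative sketch, one such job submission mechanism is the WebHCat (Templeton) REST endpoint exposed by the cluster. The cluster name, credentials, and JAR path below are placeholder assumptions, not values from this article:

```python
# Sketch: submitting a JAR stored in Blob storage as a MapReduce job through
# the WebHCat (Templeton) REST endpoint of an HDInsight cluster.
# "mycluster", "admin", and the wasb:// paths are hypothetical placeholders.
cluster = "mycluster"
url = "https://{0}.azurehdinsight.net/templeton/v1/mapreduce/jar".format(cluster)

# Form fields accepted by the WebHCat mapreduce/jar resource. The JAR is read
# from the cluster's default Azure Blob storage container (wasb://).
payload = {
    "user.name": "admin",
    "jar": "wasb:///example/jars/hadoop-mapreduce-examples.jar",
    "class": "wordcount",
    "arg": ["wasb:///example/data/davinci.txt", "wasb:///example/output"],
}

print(url)
# To actually submit the job, POST the payload with HTTP basic authentication
# using the cluster login, for example:
#   import requests
#   requests.post(url, auth=("admin", password), data=payload)
```

The response from a real submission contains a job ID that you can poll through the same REST interface.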
If you have issues deploying JAR files to HDInsight clusters or calling JAR files on HDInsight clusters, contact Microsoft Support.
Cascading is not supported by HDInsight, and is not eligible for Microsoft Support. For lists of supported components, see What's new in the cluster versions provided by HDInsight?.
Installation of custom software on the cluster by using Remote Desktop Connection is not supported. You should avoid storing any files on the drives of the head node, as they will be lost if you need to re-create the clusters. We recommend storing files on Azure Blob storage. Blob storage is persistent.
To access the cluster
Click one of the clusters from the list to open the cluster blade:
Click Settings to see the configuration details and to configure the cluster:
| Setting | Description |
| --- | --- |
| Properties | Display the cluster properties. |
| Azure Storage Keys | Display the default Azure storage account information. |
| Cluster Login | Grant/revoke HTTP web services access, and configure the cluster login information. |
| External Metastores | Display the Hive/Oozie metastore information. |
| Scale cluster | Increase/decrease the number of worker nodes of the cluster. |
| Remote Desktop | Enable/disable Remote Desktop connectivity, and connect to the cluster via Remote Desktop. |
HDInsight clusters have the following HTTP web services (all of these services have RESTful endpoints):
By default, access to these services is granted. You can revoke or grant access from the Azure portal.
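As a rough sketch of what that access looks like from a client, the web services answer at HTTPS paths under the cluster's public endpoint. The cluster name is a placeholder, and the Oozie path is an assumption based on the standard Oozie REST API:

```python
# Sketch: public HTTPS endpoints of an HDInsight cluster's web services.
# "mycluster" is a hypothetical cluster name.
cluster = "mycluster"
base = "https://{0}.azurehdinsight.net".format(cluster)

endpoints = {
    "webhcat_status": base + "/templeton/v1/status",  # WebHCat (Templeton)
    "oozie_versions": base + "/oozie/versions",       # Oozie web service
}

for name, url in endpoints.items():
    print(name, url)

# While access is granted, each endpoint responds to HTTP basic authentication
# with the cluster login; once access is revoked, requests are rejected:
#   import requests
#   requests.get(endpoints["webhcat_status"], auth=("admin", password))
```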
To grant/revoke HTTP web services access
This can also be done through the Azure PowerShell cmdlets:
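A sketch of the PowerShell approach, assuming the classic Azure PowerShell module of this era and its `Grant-AzureHDInsightHttpServicesAccess`/`Revoke-AzureHDInsightHttpServicesAccess` cmdlets; the cluster name is a placeholder:

```powershell
# Sketch only: assumes the classic Azure PowerShell module and an
# existing cluster. "mycluster" is a hypothetical cluster name.
$clusterName = "mycluster"
$credential = Get-Credential   # HTTP user name and password to grant

# Grant HTTP web services access:
Grant-AzureHDInsightHttpServicesAccess -Name $clusterName -Credential $credential

# Revoke HTTP web services access:
Revoke-AzureHDInsightHttpServicesAccess -Name $clusterName
```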
An HDInsight cluster can have two user accounts. The HDInsight cluster user account is created during the provisioning process. You can also create an RDP user account for accessing the cluster via RDP. See Enable remote desktop.
To change the HDInsight cluster user name and password
Change the Cluster Login Name and/or the Cluster Login Password, and then click Save.
The cluster scaling feature allows you to change the number of worker nodes used by a cluster that is running in Azure HDInsight without having to re-create the cluster.
Only clusters with HDInsight version 3.1.3 or higher are supported. If you are unsure of the version of your cluster, you can check the Properties page. See Get familiar with the cluster portal interface.
The impact of changing the number of data nodes for each type of cluster supported by HDInsight:
You can seamlessly increase the number of worker nodes in a Hadoop cluster that is running without impacting any pending or running jobs. New jobs can also be submitted while the operation is in progress. Failures in a scaling operation are gracefully handled so that the cluster is always left in a functional state.
When a Hadoop cluster is scaled down by reducing the number of data nodes, some of the services in the cluster are restarted. This causes all running and pending jobs to fail at the completion of the scaling operation. You can, however, resubmit the jobs once the operation is complete.
You can seamlessly add or remove nodes to your HBase cluster while it is running. Region servers are automatically balanced within a few minutes of completing the scaling operation. However, you can also manually balance the region servers by logging in to the head node of the cluster and running the following commands from a command prompt window:
```
pushd %HBASE_HOME%\bin
hbase shell
balancer
```
For more information on using the HBase shell, see 
You can seamlessly add or remove data nodes to your Storm cluster while it is running. After the scaling operation completes successfully, you will need to rebalance the topology.
Rebalancing can be accomplished in two ways:
Please refer to the Apache Storm documentation for more details.
The Storm web UI is available on the HDInsight cluster:
Here is an example of how to use the CLI command to rebalance the Storm topology:
```
## Reconfigure the topology "mytopology" to use 5 worker processes,
## the spout "blue-spout" to use 3 executors, and
## the bolt "yellow-bolt" to use 10 executors
$ storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10
```
To scale clusters
Enter the Number of Worker nodes. The limit on the number of cluster nodes varies among Azure subscriptions. You can contact billing support to increase the limit. The cost information reflects the changes you make to the number of nodes.
The credentials for the cluster that you provided at its creation give access to the services on the cluster, but not to the cluster itself through Remote Desktop. You can turn on Remote Desktop access when you provision a cluster, or after a cluster is provisioned. For instructions on enabling Remote Desktop at provisioning time, see Provision HDInsight clusters.
To enable Remote Desktop
Enter Expires On, Remote Desktop Username and Remote Desktop Password, and then click Enable.
The default value for Expires On is one week.
You can also use the HDInsight .NET SDK to enable Remote Desktop on a cluster. Use the `EnableRdp` method on the HDInsight client object, for example: `client.EnableRdp(clustername, location, "rdpuser", "rdppassword", DateTime.Now.AddDays(6))`. Similarly, to disable Remote Desktop on the cluster, use `client.DisableRdp(clustername, location)`. For more information on these methods, see the HDInsight .NET SDK Reference. This is applicable only to HDInsight clusters running on Windows.
To connect to a cluster by using RDP
To connect to the cluster by using Remote Desktop and use the Hadoop command line, you must first have enabled Remote Desktop access to the cluster as described in the previous section.
To open a Hadoop command line
From the desktop, double-click Hadoop Command Line.
For more information on Hadoop commands, see Hadoop commands reference.
In the previous screenshot, the folder name has the Hadoop version number embedded. The version number can change based on the version of the Hadoop components installed on the cluster. You can use the Hadoop environment variables to refer to those folders. For example:
```
cd %hadoop_home%
cd %hive_home%
cd %hbase_home%
cd %pig_home%
cd %sqoop_home%
cd %hcatalog_home%
```
In this article, you have learned how to manage an HDInsight cluster by using the preview portal, and how to open the Hadoop command-line tool. To learn more, see the following articles: