We are excited to announce production support of Cloudera Enterprise on Azure. Customers can now deploy Cloudera Enterprise, Data Hub Edition via the Azure Marketplace. In this new offering on Azure, Cloudera has expanded support in two key areas:
- Support of Impala, HBase, Spark, and Solr components under all production workload types. This is suitable for higher resource-consuming services and production workloads running a variety of services.
- Support for configuration on DS14 instance types with up to ten 1TB Premium Storage VHDs attached to each worker node. This increases the storage density to 10TB per worker node. In addition, a 512GB VHD is allocated for logs.
Cloudera Enterprise can be deployed with a single click from Azure Marketplace, or from an Azure Resource Manager template hosted on GitHub, which offers an additional level of control for customization.
Cloudera provides an enterprise-ready, open source distribution that includes Apache Hadoop and related projects. Cloudera Enterprise includes CDH, the world’s most popular open source Hadoop-based platform, as well as advanced system management and data management. This massively scalable platform unites storage with an array of powerful processing and analytics frameworks and adds enterprise-class management, data security, and governance. Cloudera Enterprise includes core elements of Hadoop (HDFS, MapReduce, YARN) as well as HBase, Impala, Solr, Spark and more.
Cloudera Enterprise Architecture on Azure
The Cloudera cluster consists of DS14 virtual machine instances for both worker nodes and master nodes. All nodes are deployed in an Azure Virtual Network so that they can communicate with one another. Access to the nodes is protected with Network Security Groups (NSG) both at the subnet level and VM level. Edge nodes can be deployed separately to directly access the cluster’s internal network.
The nodes are provisioned with a CentOS 6.6 based Cloudera VM image. This image is configured to optimize the performance of Cloudera workloads. Each worker node can have up to ten 1TB Premium Storage disks attached. Each master node has three 512GB Premium Storage disks. In addition, each node has a 512GB Premium Storage disk attached for logs. Each node is in its own Azure Storage Account to maximize throughput.
A minimum cluster of four nodes, made up of three worker nodes and one master node, can be deployed for evaluation purposes. A production deployment consists of three master nodes and three to 30 worker nodes. High Availability (HA) is supported by provisioning a standby master node.
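To make the disk layout above concrete, here is a minimal sketch that adds up the raw Premium Storage attached to a cluster. The function name and the choice to count 1TB as 1024GB are assumptions made for illustration; the per-node disk counts come from the description above.

```python
GB_PER_TB = 1024          # assumption: 1TB is counted as 1024GB
LOG_DISK_GB = 512         # one log disk per node
MASTER_DATA_DISKS = 3     # three 512GB disks per master node
MASTER_DISK_GB = 512
MAX_WORKER_DISKS = 10     # up to ten 1TB disks per worker node


def raw_storage_gb(masters, workers, disks_per_worker=MAX_WORKER_DISKS):
    """Return the total raw Premium Storage (in GB) attached to the cluster."""
    if not 1 <= disks_per_worker <= MAX_WORKER_DISKS:
        raise ValueError("a worker node takes between 1 and 10 data disks")
    per_worker = disks_per_worker * GB_PER_TB + LOG_DISK_GB
    per_master = MASTER_DATA_DISKS * MASTER_DISK_GB + LOG_DISK_GB
    return masters * per_master + workers * per_worker


# Evaluation cluster: one master node and three worker nodes.
evaluation_gb = raw_storage_gb(masters=1, workers=3)
# Largest production cluster: three master nodes and 30 worker nodes.
production_gb = raw_storage_gb(masters=3, workers=30)
```

These figures are raw attached capacity only; usable HDFS capacity will be lower once replication is taken into account.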
Refer to this whitepaper for more details on Cloudera architecture on Azure.
Cloudera Enterprise Deployment on Azure Marketplace
To deploy a Cloudera cluster on Azure, you need a sufficient number of CPU cores in your Azure subscription. The cluster deploys a minimum of four DS14 VMs, each with 16 cores, so at least 64 cores are needed. To request a quota increase for cores, open a support ticket stating the number of cores you need and the region you need them in, and specify that the cores are for Azure Resource Manager.
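The core requirement is simple arithmetic, and a short sketch can help when sizing a quota request. The function names and the `current_quota` and `cores_in_use` parameters are hypothetical; only the figure of 16 cores per DS14 VM comes from the text above.

```python
CORES_PER_DS14 = 16  # each DS14 VM has 16 cores


def required_cores(masters, workers):
    """Cores consumed by a cluster in which every node is a DS14 VM."""
    return (masters + workers) * CORES_PER_DS14


def additional_cores_needed(masters, workers, current_quota, cores_in_use=0):
    """Cores to request in a support ticket; 0 if the quota already suffices."""
    available = current_quota - cores_in_use
    return max(0, required_cores(masters, workers) - available)


# Minimum evaluation cluster: one master node and three worker nodes.
minimum = required_cores(masters=1, workers=3)  # 64 cores
```

For example, a subscription with a hypothetical quota of 20 cores would need 44 more cores before the minimum cluster could be deployed.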
You can find the Cloudera Enterprise offering in the Azure Marketplace by navigating to Marketplace in the Azure portal, and searching for Cloudera:
Step 1: Follow the wizard to enter the “Basics” configuration for the cluster deployment such as cluster name, VM credentials, and resource group as shown below:
Step 2: Specify network topology by entering Azure Virtual Network and Subnet information for the cluster nodes to be deployed to:
Step 3: Enter Cloudera Manager credentials and cluster size:
Step 4: Enter user information. Please refer to the privacy statement for details about how user information is used:
Step 5: Review summary:
Step 6: Purchase and deploy the cluster:
Accessing the Provisioned Cloudera Cluster
After the cluster is provisioned successfully, access Cloudera Manager at http://[dnsName]-mn0.[region].cloudapp.azure.com:7180 using the Cloudera Manager user name and password specified during deployment.
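As a small illustration of the address pattern above, the following sketch builds the Cloudera Manager URL from the DNS name and region chosen at deployment. The function name and the sample values are hypothetical.

```python
def cloudera_manager_url(dns_name, region):
    """Build the Cloudera Manager URL for the first master node (mn0)."""
    return f"http://{dns_name}-mn0.{region}.cloudapp.azure.com:7180"


# "mycluster" and "westus" are placeholder values, not defaults.
url = cloudera_manager_url("mycluster", "westus")
```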
If you run into an error during deployment, please navigate to the resource group that contains the Cloudera cluster in the Azure portal:
Click on the failed deployment:
Scroll down to find the oldest failed event, then click on it to see the detailed error:
If the error appears to be transient, you can delete the resource group, provided it doesn’t contain any resources created outside the Cloudera cluster deployment, and try again.
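The oldest failed event matters because later failures are often consequences of the first one. If you have exported the events to work with them outside the portal, the same selection can be sketched as below. The record layout (`status`, `timestamp`, `message` keys) is a hypothetical simplification for illustration, not the schema Azure returns.

```python
from datetime import datetime


def oldest_failed_event(events):
    """Return the earliest event whose status is "Failed", or None."""
    failed = [e for e in events if e["status"] == "Failed"]
    if not failed:
        return None
    # ISO 8601 timestamps without a zone suffix parse with fromisoformat.
    return min(failed, key=lambda e: datetime.fromisoformat(e["timestamp"]))


# Hypothetical sample data for illustration only.
sample_events = [
    {"status": "Failed", "timestamp": "2015-11-02T10:15:00", "message": "later failure"},
    {"status": "Succeeded", "timestamp": "2015-11-02T10:01:00", "message": "network created"},
    {"status": "Failed", "timestamp": "2015-11-02T10:05:00", "message": "first failure"},
]
root_cause = oldest_failed_event(sample_events)
```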
Cloudera Enterprise Deployment from GitHub
If you need a greater level of customization when deploying a Cloudera cluster, you can use this Azure Resource Manager template published on GitHub. Clicking the “Deploy to Azure” button deploys the cluster with an experience similar to deploying from Marketplace, except that more parameters are exposed, for example, the address space for the virtual network and subnet. You can also use Azure PowerShell or Azure Cross Platform Client Tool to deploy the template.
If you need to customize the sub-templates for master nodes or data nodes, for example, to change the number of disks attached to each node, download all the template files and scripts from GitHub, modify them as needed, and upload them to your own GitHub repo. Finally, change the variable “scriptsUri” in AzureDeploy.json to point to your GitHub repo.
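The final step, repointing “scriptsUri”, can be done by hand in a text editor, or with a short script such as the sketch below. It assumes the variable sits in the template's top-level "variables" object, as is standard for Resource Manager templates; the function name, the sample template, and the repo URL are placeholders, not values taken from the published template.

```python
import json


def set_scripts_uri(template_text, scripts_uri):
    """Return the template JSON with variables.scriptsUri replaced."""
    template = json.loads(template_text)
    variables = template.get("variables", {})
    if "scriptsUri" not in variables:
        raise KeyError("template has no 'scriptsUri' variable")
    variables["scriptsUri"] = scripts_uri
    return json.dumps(template, indent=2)


# Minimal stand-in for AzureDeploy.json; the real file has many more entries.
sample_template = json.dumps({
    "variables": {"scriptsUri": "https://example.invalid/original"},
    "resources": [],
})

# Placeholder URL: substitute the raw-content address of your own repo.
updated = set_scripts_uri(sample_template, "https://example.invalid/my-fork")
```

To apply it to the real file, read AzureDeploy.json into a string, pass it through the function, and write the result back. Note that `json.loads` rejects comments, so a template containing them would need to be edited by hand instead.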