What are the different Hadoop components available with HDInsight?
Find out about the different service levels offered by HDInsight as well as the versions of the different Hadoop components included with HDInsight.
Azure HDInsight provides its big data cloud offerings in two categories: Standard and Premium. The following table lists the features that are available only as part of Premium. Features that are not explicitly called out in the table are available as part of Standard.
|HDInsight Premium feature||Description|
|Microsoft R Server (Preview)||Microsoft R Server is the most broadly deployed enterprise-class analytics platform for scalable R. The R language supports a variety of big data statistics, predictive modeling, and machine learning capabilities. As part of HDInsight Premium, you can now create an HDInsight cluster with R Server ready to be used with massive datasets and models. This new capability provides data scientists and statisticians a familiar R interface that can scale on-demand through HDInsight, without the overhead of cluster setup and maintenance. |
For more information, see Getting Started with R Server on HDInsight.
The following table lists the HDInsight cluster type and Premium support matrix.
This table will be updated as more cluster types are included in HDInsight Premium.
For information on pricing and SLA for HDInsight Premium, see HDInsight pricing.
Azure HDInsight supports multiple Hadoop cluster versions that can be deployed at any time. Each version choice creates a specific version of the Hortonworks Data Platform (HDP) distribution and a set of components that are contained within that distribution. The component versions associated with HDInsight cluster versions are itemized in the following table. The default cluster version used by Azure HDInsight is currently 3.4, which, as of 09/14/2016, is based on HDP 2.4.
The default version from the service may change without notice. If you have a version dependency, we recommend that you specify the version when you create clusters by using the .NET SDK, Azure PowerShell, or the Azure CLI.
|Component||HDInsight version 3.4 (Default)||HDInsight Version 3.3||HDInsight Version 3.2||HDInsight Version 3.1||HDInsight Version 3.0|
|Hortonworks Data Platform||2.4||2.3||2.2||2.1.7||2.0|
|Apache Hadoop & YARN||2.7.1||2.7.1||2.6.0||2.4.0||2.2.0|
|Apache Hive & HCatalog||1.2.1||1.2.1||0.14.0||0.13.1||0.12.0|
|Apache Spark||1.6.0 (Linux only)||1.5.2 (Linux only/Experimental build)||1.3.1 (Windows-only)|
Get current component version information
The component versions associated with HDInsight cluster versions may change in future updates to HDInsight. One way to determine the available components and to verify which versions are being used for a cluster is to use the Ambari REST API. The GetComponentInformation command can be used to retrieve information about a service component. For details, see the Ambari documentation. For Windows-based clusters, another way to obtain this information is to log in to the cluster by using Remote Desktop and examine the contents of the "C:\apps\dist\" directory directly.
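As a rough illustration of the Ambari REST API approach, the following Python sketch requests the details of a single service component. It assumes that the cluster exposes Ambari at https://CLUSTERNAME.azurehdinsight.net and accepts HTTP basic authentication with the cluster login; the cluster, service, and component names shown are placeholders, and the helper functions are hypothetical names introduced only for this example.

```python
import base64
import json
import urllib.request


def component_url(cluster_name, service, component):
    """Build the Ambari REST URL for one service component.

    Assumption: Ambari is reachable at <cluster>.azurehdinsight.net and the
    Ambari cluster name matches the HDInsight cluster name.
    """
    base = f"https://{cluster_name}.azurehdinsight.net/api/v1/clusters/{cluster_name}"
    return f"{base}/services/{service}/components/{component}"


def get_component_info(cluster_name, service, component, user, password):
    """Fetch the component description as a dict (needs network access)."""
    request = urllib.request.Request(component_url(cluster_name, service, component))
    # Ambari is assumed to accept HTTP basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder values; replace them with your own cluster and credentials.
    print(component_url("mycluster", "HDFS", "NAMENODE"))
```

Calling get_component_info with real credentials returns the JSON document that Ambari reports for that component. The exact fields in the response depend on the Ambari version deployed on the cluster.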
See HDInsight release notes for additional information on the latest versions of HDInsight.
The following table lists the versions of HDInsight currently available, the corresponding Hortonworks Data Platform versions that they use, and their release dates. When known, their support expiration and deprecation dates are also provided. Please note the following:
- Highly available clusters with two head nodes are deployed by default for HDInsight 2.1 and above. They are not available for HDInsight 1.6 clusters.
- Once support has expired for a particular version, it may not be available through the Azure portal. The following table indicates which versions are available on the Azure Classic Portal. Cluster versions will continue to be available using the Version parameter in the Windows PowerShell New-AzureRmHDInsightCluster command and the .NET SDK until their deprecation dates.
|HDInsight Version||HDP Version||VM OS||High Availability||Release Date||Available on Azure portal||Support Expiration Date||Deprecation Date|
|HDI 3.4||HDP 2.4||Ubuntu 14.04 LTS||Yes||03/29/2016||Yes|
|HDI 3.3||HDP 2.3||Ubuntu 14.04 LTS or Windows Server 2012 R2||Yes||12/02/2015||Yes||06/27/2016||07/31/2017|
|HDI 3.2||HDP 2.2||Ubuntu 12.04 LTS or Windows Server 2012 R2||Yes||2/18/2015||Yes||3/1/2016||04/01/2017|
|HDI 3.1||HDP 2.1||Windows Server 2012 R2||Yes||6/24/2014||No||05/18/2015||06/30/2016|
|HDI 3.0||HDP 2.0||Windows Server 2012 R2||Yes||02/11/2014||No||09/17/2014||06/30/2015|
|HDI 2.1||HDP 1.3||Windows Server 2012 R2||Yes||10/28/2013||No||05/12/2014||05/31/2015|
|HDI 1.6||HDP 1.1||No||10/28/2013||No||04/26/2014||05/31/2015|
Deployment of non-default clusters
The SLA is defined in terms of a "Support Window". A Support Window refers to the period of time that an HDInsight cluster version is supported by Microsoft Customer Service and Support. An HDInsight cluster is outside the Support Window if the Support Expiration Date for its version has already passed. A list of supported HDInsight cluster versions can be found in the table above. The support expiration date for a given HDInsight version X (once a newer X+1 version is available) is calculated as the later of:
- Formula 1: Add 180 days to the date HDInsight cluster version X was released.
- Formula 2: Add 90 days to the date HDInsight cluster version X+1 (the subsequent version after X) is made available in the Portal.
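The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not an official calculator: the function name is hypothetical, and the example assumes that the release date listed for HDI 3.4 in the version table is also the date it became available in the Portal.

```python
from datetime import date, timedelta


def support_expiration(release_x, available_x_plus_1):
    """Return the later of the two support expiration formulas."""
    formula_1 = release_x + timedelta(days=180)          # 180 days after X was released
    formula_2 = available_x_plus_1 + timedelta(days=90)  # 90 days after X+1 became available
    return max(formula_1, formula_2)


# HDI 3.3 was released on 12/02/2015; HDI 3.4 is assumed to have reached
# the Portal on its listed release date, 03/29/2016.
hdi_33_released = date(2015, 12, 2)
hdi_34_available = date(2016, 3, 29)
print(support_expiration(hdi_33_released, hdi_34_available))  # 2016-06-27
```

Here Formula 1 gives 05/30/2016 and Formula 2 gives 06/27/2016, so the later date applies, which agrees with the Support Expiration Date listed for HDI 3.3 in the table above.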
The Deprecation Date is the date after which the cluster version cannot be created on HDInsight.
Windows-based HDInsight clusters (including versions 2.1, 3.0, 3.1, 3.2, and 3.3) run on Azure Guest OS Family 4, which uses the 64-bit version of Windows Server 2012 R2 and supports .NET Framework 4.0, 4.5, 4.5.1, and 4.5.2.
HDInsight cluster version 3.4 uses a Hadoop distribution that is based on Hortonworks Data Platform 2.4. This is the default Hadoop cluster created when using the portal.
HDInsight cluster version 3.3 uses a Hadoop distribution that is based on Hortonworks Data Platform 2.3.
HDInsight cluster version 3.2 uses a Hadoop distribution that is based on Hortonworks Data Platform 2.2.
HDInsight cluster version 3.1 uses a Hadoop distribution that is based on Hortonworks Data Platform 2.1.7. HDInsight 3.1 clusters created before 11/7/2014 were based on Hortonworks Data Platform 2.1.1.
HDInsight cluster version 3.0 uses a Hadoop distribution that is based on Hortonworks Data Platform 2.0.
HDInsight cluster version 2.1 uses a Hadoop distribution that is based on Hortonworks Data Platform 1.3.
HDInsight cluster version 1.6 uses a Hadoop distribution that is based on Hortonworks Data Platform 1.1.