Microsoft HDInsight release notes
Notes for 6/24/2014 release
This release contains several enhancements to the HDInsight service:
- HDP 2.1 Availability: HDInsight 3.1 which contains HDP 2.1 is now generally available and is the default version for new clusters.
- HBase – Azure Management Portal Improvements: We are making HBase clusters available in Preview. You can now create HBase clusters from the portal with 3 clicks.
With HBase, you can build a variety of real-time workloads on HDInsight, from interactive websites that work with large datasets to services that store sensor and telemetry data from millions of endpoints. The next step is to analyze the data in these workloads with Hadoop jobs, which is immediately possible in HDInsight through tools such as PowerShell and the Hive cluster dashboard.
Apache™ Mahout Now Pre-Installed on HDInsight 3.1
Mahout is preinstalled on HDInsight 3.1 Hadoop clusters, so you can run Mahout jobs without any additional cluster configuration. For example, you can connect to a Hadoop cluster using the Remote Desktop Protocol (RDP) and, without additional steps, run the following "Hello world" Mahout commands:
mahout org.apache.mahout.classifier.df.tools.Describe -p /user/hdp/glass.data -f /user/hdp/glass.info -d I 9 N L
mahout org.apache.mahout.classifier.df.BreimanExample -d /user/hdp/glass.data -ds /user/hdp/glass.info -i 10 -t 100
For a more complete explanation of this procedure, see the documentation of the Breiman Example on the Apache Mahout website.
Hive Queries Can Use Tez in HDInsight 3.1
Hive 0.13 is now available in HDInsight 3.1 and can run queries using Tez, which can deliver substantial performance improvements. Tez is not enabled by default for Hive queries. To use it, you must opt in. You can enable Tez by running the following code snippet:
set hive.execution.engine=tez;
select sc_status, count(*), histogram_numeric(sc_bytes,5) from website_logs_orc_local group by sc_status;
Hortonworks has published a detailed breakdown of Hive query performance enhancements with Tez as delivered in standard benchmarks. For details, see Benchmarking Apache Hive 13 for Enterprise Hadoop.
For more details on using Hive with Tez, check out the Hive on Tez wiki page.
With the release of Azure HDInsight on Hadoop 2.2, Microsoft has made HDInsight available in all major Azure geographies. Specifically, the West Europe and Southeast Asia data centers have been brought online. This enables customers to locate clusters in a data center that is close to them and potentially in a zone with similar compliance requirements.
Prefix syntax: Only the "wasb://" syntax is supported in HDInsight 3.0 and 3.1 clusters. The older "asv://" syntax is supported in HDInsight 2.1 and 1.6 clusters, but it is not supported in HDInsight 3.0 clusters or later versions. This means that any job submitted to an HDInsight 3.0 or 3.1 cluster that explicitly uses the "asv://" syntax will fail. Use the "wasb://" syntax instead. Jobs will also fail if they are submitted to an HDInsight 3.0 or 3.1 cluster that was created with an existing metastore containing explicit references to resources that use the "asv://" syntax. Such metastores must be recreated using the "wasb://" syntax to address resources.
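As a rough illustration of the prefix change, the following Python sketch finds and rewrites "asv://" prefixes in job scripts before they are submitted to a 3.0 or 3.1 cluster. The function names `to_wasb` and `find_legacy_lines` and the sample storage path are hypothetical and are not part of any HDInsight tool. The sketch only rewrites text; it does not recreate a metastore.

```python
import re
from typing import List

# Match the legacy scheme only where it starts a URI, so that a longer
# scheme name that merely ends in "asv" is left untouched.
_LEGACY_SCHEME = re.compile(r"(?<![A-Za-z0-9+.\-])asv://")


def to_wasb(text: str) -> str:
    """Return text with every asv:// URI prefix rewritten to wasb://."""
    return _LEGACY_SCHEME.sub("wasb://", text)


def find_legacy_lines(script: str) -> List[int]:
    """Return the 1-based numbers of the lines that still reference asv://."""
    return [
        number
        for number, line in enumerate(script.splitlines(), start=1)
        if _LEGACY_SCHEME.search(line)
    ]


if __name__ == "__main__":
    # Hypothetical Hive script; the container and account names are made up.
    script = (
        "CREATE EXTERNAL TABLE logs (line STRING)\n"
        "LOCATION 'asv://data@example.blob.core.windows.net/logs';\n"
    )
    print(find_legacy_lines(script))
    print(to_wasb(script))
```

Running `find_legacy_lines` over a script first gives a list of lines to review, which may be preferable to rewriting blindly when a path is built from several string fragments.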
Ports: The ports used by the HDInsight service have been changed. The port numbers that were previously used fell within the Windows OS ephemeral port range. Ephemeral ports are allocated automatically from a predefined range for short-lived Internet Protocol-based communications. The new set of allowed Hortonworks Data Platform (HDP) service port numbers is outside this range, which avoids conflicts with the ports used by services running on the headnode. The new port numbers should not cause any breaking changes. The numbers used now are as follows:
HDInsight 1.6 (HDP 1.1)
HDInsight 3.0 and 3.1 (HDP 2.0 and 2.1)
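The reason for the port change can be sketched in a few lines of Python. This is an illustration only: the range 49152 to 65535 is assumed here as the default Windows dynamic port range (Windows Vista and Windows Server 2008 onward) and is not stated in these notes, and the service names and port values in the usage example are hypothetical rather than the actual HDInsight assignments.

```python
from typing import Dict

# Assumption: default dynamic (ephemeral) port range on Windows Vista /
# Windows Server 2008 and later. The release notes do not state the range.
EPHEMERAL_LOW = 49152
EPHEMERAL_HIGH = 65535


def in_ephemeral_range(port: int,
                       low: int = EPHEMERAL_LOW,
                       high: int = EPHEMERAL_HIGH) -> bool:
    """Return True if port lies inside the inclusive range [low, high]."""
    if not 0 < port <= 65535:
        raise ValueError(f"not a valid TCP port: {port}")
    return low <= port <= high


def conflicting_ports(service_ports: Dict[str, int],
                      low: int = EPHEMERAL_LOW,
                      high: int = EPHEMERAL_HIGH) -> Dict[str, int]:
    """Return the services whose fixed port could collide with an
    automatically allocated ephemeral port."""
    return {
        name: port
        for name, port in service_ports.items()
        if in_ephemeral_range(port, low, high)
    }


if __name__ == "__main__":
    # Hypothetical service names and port numbers, for illustration only.
    ports = {"example-service-a": 50070, "example-service-b": 30070}
    print(conflicting_ports(ports))
```

A fixed service port inside the ephemeral range can already be held by an unrelated outbound connection when the service starts, which is the kind of conflict the new port numbers are meant to avoid.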
The SQL Server JDBC Driver is used internally by HDInsight and is not used for external operations. If you wish to connect to HDInsight using ODBC, please use the Microsoft Hive ODBC driver. For more information on using Hive ODBC, see [Connect Excel to HDInsight with the Microsoft Hive ODBC Driver][connect-excel-with-hive-ODBC].
With this release, we have refreshed the following HDInsight (Hortonworks Data Platform - HDP) versions with several bug fixes:
- HDInsight 2.1 (HDP 1.3)
- HDInsight 3.0 (HDP 2.0)
- HDInsight 3.1 (HDP 2.1)
Hortonworks Release Notes
Release notes for the HDP versions used by HDInsight clusters are available at the following locations.