
Exciting new capabilities on Azure HDInsight

Posted on September 6, 2018

Principal Program Manager, Azure HDInsight

Friends of Azure HDInsight, it's been a busy summer. I wanted to summarize several noteworthy enhancements we’ve recently brought to HDInsight. We have even more exciting releases coming up at Ignite, so please stay tuned!

Product updates

Apache Phoenix and Zeppelin integration

You can now query data in Apache Phoenix from Zeppelin.

Apache Phoenix is an open source, massively parallel relational database layer built on HBase. Phoenix allows you to run SQL-like queries over HBase. Phoenix uses Java Database Connectivity (JDBC) drivers underneath to enable users to create, delete, and alter SQL tables, indexes, views, and sequences, and to upsert rows individually and in bulk. Phoenix uses NoSQL native compilation rather than MapReduce to compile queries, enabling the creation of low-latency applications on top of HBase.

Apache Phoenix enables online transaction processing (OLTP) and operational analytics in Hadoop for low-latency applications by combining the best of both worlds. In Azure HDInsight, Apache Phoenix is delivered as a first-class open source framework.
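To make the JDBC and upsert description above a bit more concrete, here is a minimal Python sketch that only builds the strings a Phoenix client would need: a JDBC URL in the `jdbc:phoenix:<quorum>:<port>:<znode>` form and a parameterized `UPSERT` statement. The helper names, the ZooKeeper host names, and the `/hbase-unsecure` znode default are assumptions for illustration; check the values for your own cluster. It does not connect to anything, and the Zeppelin interpreter setup is not shown.

```python
def phoenix_jdbc_url(zookeeper_hosts, port=2181, znode="/hbase-unsecure"):
    """Build a Phoenix JDBC URL: jdbc:phoenix:<quorum>:<port>:<znode>.

    The port and znode defaults are assumptions; use your cluster's values.
    """
    if not zookeeper_hosts:
        raise ValueError("at least one ZooKeeper host is required")
    quorum = ",".join(zookeeper_hosts)
    return f"jdbc:phoenix:{quorum}:{port}:{znode}"


def upsert_statement(table, columns):
    """Build a parameterized UPSERT (Phoenix uses UPSERT rather than INSERT)."""
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    return f"UPSERT INTO {table} ({column_list}) VALUES ({placeholders})"


if __name__ == "__main__":
    # Hypothetical host names and table, for illustration only.
    print(phoenix_jdbc_url(["zk0-example", "zk1-example"]))
    print(upsert_statement("web_stat", ["host", "visits"]))
```

The same `UPSERT` text could be pasted into a Zeppelin paragraph bound to a Phoenix JDBC interpreter, with literal values in place of the `?` placeholders.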

Read More: Azure #HDInsight Apache Phoenix now supports Zeppelin 

Oozie support in HDInsight enterprise security package

Oozie is a workflow scheduler system for managing Apache Hadoop jobs. You can now use Oozie in domain-joined Hadoop clusters to build secure Oozie workflows in Azure HDInsight.

Read More: Build secure Oozie workflows in Azure HDInsight with Enterprise Security Package

Azure Data Lake Storage Gen2 integration

Microsoft announced a preview of Azure Data Lake Storage Gen2, a globally available HDFS-compatible file system to store and analyze petabyte-size files and trillions of objects. HDInsight clusters can work with Azure Data Lake Storage Gen2.
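As a rough illustration of how a cluster job would point at Gen2 storage, the sketch below assembles a path using the `abfs`/`abfss` URI scheme of the Hadoop Azure Blob File System driver. The function name and the file system, account, and path values are hypothetical; the code only formats a string and does not access storage.

```python
def abfs_uri(filesystem, account, path="", secure=True):
    """Format an ADLS Gen2 URI: abfs[s]://<filesystem>@<account>.dfs.core.windows.net/<path>."""
    if not filesystem or not account:
        raise ValueError("filesystem and account are required")
    scheme = "abfss" if secure else "abfs"  # abfss uses TLS
    return f"{scheme}://{filesystem}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(abfs_uri("data", "mystorage", "/raw/events.csv"))
```

A URI in this form can typically be used wherever a job on the cluster expects a Hadoop-compatible path.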

Read More: HDInsight with Azure Data Lake Storage Gen2

ML Services 9.3 and open-source R capabilities on HDInsight

ML Services on HDInsight provides the latest capabilities for R-based analytics on datasets of virtually any size, loaded to either Azure Blob or Data Lake storage. Because an ML Services cluster is built on open-source R, the R-based applications you build can leverage any of the 8,000+ open-source R packages. The routines in ScaleR, Microsoft’s big data analytics package, are also available.

Learn More: Introducing ML Services 9.3 in Azure HDInsight

Virtual Network Service Endpoints support

We announced support for Virtual Network Service Endpoints, which allow customers to securely connect to Azure Blob Storage, Azure Data Lake Storage Gen2, Cosmos DB, and SQL databases. By enabling a Service Endpoint for HDInsight, traffic flows through a secured route from within the Azure data center.

Read More: How to enhance HDInsight security with service endpoints

Kafka 1.0 and 1.1 support

Kafka on HDInsight provides a high-throughput, low-latency ingestion platform for your real-time data pipeline. We announced support for Kafka 1.0 and 1.1.

Read More: Kafka 1.0 on HDInsight lights up real-time analytics scenarios and Kafka 1.1 support on Azure HDInsight

Support for Spark 2.3

We made Apache Spark 2.3.0 available for production use on HDInsight. Ranging from bug fixes (more than 1,400 tickets were fixed in this release) to new experimental features, Apache Spark 2.3.0 brings advancements and polish to all areas of its unified data platform.

Read More: Azure HDInsight now supports Apache Spark 2.3

Move large data sets to Azure using WANdisco on Azure HDInsight

With WANdisco Fusion, you can move data that you have in other large-scale analytics platforms to Azure Blob Storage, ADLS Gen1, and ADLS Gen2 without downtime or disruption to your existing environment. Customers can also replicate the data and metadata (Hive database schema, authorization policies using Apache Ranger, Sentry, and more) across different regions to make the data lake available globally for analytics.

Read more: Globally replicated data lakes with LiveData using WANdisco on Azure

More blogs:

Azure #HDInsight Interactive Query: simplifying big data analytics architecture

How Microsoft drives exabyte analytics on the world’s largest YARN cluster

Top 8 reasons to choose Azure HDInsight

Avoid Big Data pitfalls with Azure HDInsight and these partner solutions 

Enterprises get deeper insights with Hadoop and Spark updates on Azure HDInsight

Microsoft deepens its commitment to Apache Hadoop and open source analytics

Siphon: Streaming data ingestion with Apache Kafka

Azure HDInsight Interactive Query: Ten tools to analyze big data faster

How to use DBeaver with Azure #HDInsight

HDInsight HBase: Migrating to new HDInsight version

Customer stories

Chinese vending machine innovator automates shopping with cloud technology

Argentinian company harnesses data to give e-commerce sellers valuable business insights

Data analytics firm signs two major deals after introductions to potential customers

We are excited to see the customer momentum across different industry verticals, and will continue to bring new capabilities to HDInsight to solve new big data challenges.

About HDInsight

Azure HDInsight is Microsoft’s premium managed offering for running open source workloads on Azure. Azure HDInsight powers mission-critical applications across a wide variety of sectors and addresses a broad range of use cases including ETL, streaming, and interactive querying.

Additional resources