Friends of Azure HDInsight, it's been a busy summer. I want to summarize several noteworthy enhancements we’ve recently brought to HDInsight. We have even more exciting releases coming up at Ignite, so please stay tuned!
Apache Phoenix and Zeppelin integration
You can now query data in Apache Phoenix from Zeppelin.
Apache Phoenix is an open source, massively parallel relational database layer built on HBase. Phoenix lets you run SQL-like queries over HBase. It uses Java Database Connectivity (JDBC) drivers underneath so that users can create, delete, and alter SQL tables, indexes, views, and sequences, and upsert rows individually or in bulk. Phoenix compiles queries into native HBase scans rather than MapReduce jobs, enabling the creation of low-latency applications on top of HBase.
Apache Phoenix enables online transaction processing (OLTP) and operational analytics in Hadoop for low-latency applications by combining the power of standard SQL and JDBC APIs with the flexibility of the NoSQL world. In Azure HDInsight, Apache Phoenix is delivered as a first-class open source framework.
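To make the create-and-upsert model above concrete, here is a minimal sketch that only builds Phoenix-style SQL strings; it does not connect to a cluster. The helper names, the `sensor_readings` table, and its columns are hypothetical and chosen for illustration. In a Zeppelin notebook, statements like these would typically be run in a paragraph bound to a Phoenix JDBC interpreter, and the exact interpreter prefix depends on how your cluster is configured.

```python
# Hypothetical helpers that assemble Phoenix SQL text.
# Table and column names are illustrative, not from any real schema.

def create_table_sql(table, columns, primary_key):
    """Build a CREATE TABLE statement with a single-column primary key."""
    parts = []
    for name, sql_type in columns:
        suffix = " NOT NULL PRIMARY KEY" if name == primary_key else ""
        parts.append(f"{name} {sql_type}{suffix}")
    return f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(parts)})"


def upsert_sql(table, column_names):
    """Build a parameterized UPSERT (Phoenix uses UPSERT, not INSERT)."""
    placeholders = ", ".join("?" for _ in column_names)
    return (
        f"UPSERT INTO {table} ({', '.join(column_names)}) "
        f"VALUES ({placeholders})"
    )


columns = [("id", "BIGINT"), ("device", "VARCHAR"), ("temp", "DOUBLE")]
ddl = create_table_sql("sensor_readings", columns, primary_key="id")
dml = upsert_sql("sensor_readings", [name for name, _ in columns])
print(ddl)
print(dml)
```

The `?` placeholders are meant for a JDBC prepared statement, so the same UPSERT can be reused for single rows or for batches.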
Oozie support in HDInsight enterprise security package
Oozie is a workflow scheduler system for managing Apache Hadoop jobs. You can now use Oozie in domain-joined Hadoop clusters to build secure Oozie workflows in Azure HDInsight.
Azure Data Lake Storage Gen2 integration
Microsoft announced a preview of Azure Data Lake Storage Gen2, a globally available HDFS-compatible file system for storing and analyzing petabyte-size files and trillions of objects. HDInsight clusters can work with Azure Data Lake Storage Gen2.
Read More: HDInsight with Azure Data Lake Storage Gen2
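As a rough illustration of how cluster jobs address data in Azure Data Lake Storage Gen2, the sketch below builds a path in the `abfs`/`abfss` URI scheme used by the Hadoop ABFS driver. The helper name, the `data` file system, and the `contosolake` account are hypothetical placeholders; substitute your own values.

```python
# Hypothetical helper that formats an ADLS Gen2 path for Hadoop-style tools.
# Account, file system, and path values below are illustrative only.

def abfs_uri(file_system, account, path="", secure=True):
    """Return abfs[s]://<file_system>@<account>.dfs.core.windows.net/<path>."""
    scheme = "abfss" if secure else "abfs"  # abfss uses TLS
    host = f"{account}.dfs.core.windows.net"
    return f"{scheme}://{file_system}@{host}/{path.lstrip('/')}"


uri = abfs_uri("data", "contosolake", "/raw/events.csv")
print(uri)
```

A string in this form can be passed wherever a cluster component expects a Hadoop file system path, assuming the cluster has been granted access to the storage account.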
ML Services 9.3 and open-source R capabilities on HDInsight
ML Services on HDInsight provides the latest capabilities for R-based analytics on datasets of virtually any size, loaded to either Azure Blob or Data Lake storage. Because an ML Services cluster is built on open-source R, the R-based applications you build can leverage any of the 8,000+ open-source R packages. The routines in ScaleR, Microsoft’s big data analytics package, are also available.
Learn More: Introducing ML Services 9.3 in Azure HDInsight
Virtual Network Service Endpoints support
We announced support for Virtual Network Service Endpoints, which allow customers to securely connect to Azure Blob Storage, Azure Data Lake Storage Gen2, Cosmos DB, and SQL databases. When a Service Endpoint is enabled for HDInsight, traffic flows through a secured route from within the Azure data center.
Kafka 1.0 and 1.1 support
Kafka on HDInsight provides a high-throughput, low-latency ingestion platform for your real-time data pipeline. We announced support for Kafka 1.0 and 1.1.
Support for Spark 2.3
We made Apache Spark 2.3.0 available for production use on HDInsight. Ranging from bug fixes (more than 1,400 tickets were fixed in this release) to new experimental features, Apache Spark 2.3.0 brings advancements and polish to all areas of its unified data platform.
Move large data sets to Azure using WANdisco on Azure HDInsight
With WANdisco Fusion, you can move data that you have in other large-scale analytics platforms to Azure Blob Storage, ADLS Gen1, and ADLS Gen2 without downtime or disruption to your existing environment. Customers can also replicate the data and metadata (Hive database schema, authorization policies using Apache Ranger, Sentry, and more) across different regions to make the data lake available globally for analytics.
We are excited to see the customer momentum across different industry verticals, and will continue to bring new capabilities to HDInsight to solve new big data challenges.
Azure HDInsight is Microsoft’s premium managed offering for running open source workloads on Azure. Azure HDInsight powers mission-critical applications across a wide variety of sectors and addresses a broad range of use cases including ETL, streaming, and interactive querying.
- Learn more about Azure HDInsight
- Open Source component guide on HDInsight
- HDInsight release notes
- Ask HDInsight questions on MSDN forums
- Ask HDInsight questions on Stack Overflow