Friends of Azure HDInsight, it's been a busy summer. I wanted to summarize several noteworthy enhancements we’ve recently brought to HDInsight. We have even more exciting releases coming up at Ignite so please stay tuned!
Product updates
Apache Phoenix and Zeppelin integration
You can now query data in Apache Phoenix from Zeppelin.
Apache Phoenix is an open source, massively parallel relational database layer built on HBase. Phoenix lets you run SQL-like queries over HBase. Underneath, it uses Java Database Connectivity (JDBC) drivers so that users can create, delete, and alter SQL tables, indexes, views, and sequences, and upsert rows individually and in bulk. Rather than using MapReduce, Phoenix compiles queries into native HBase calls, which enables the creation of low-latency applications on top of HBase.
Apache Phoenix enables online transaction processing (OLTP) and operational analytics in Hadoop for low-latency applications by combining the best of both worlds: standard SQL and JDBC APIs, and the flexibility of a NoSQL store. In Azure HDInsight, Apache Phoenix is delivered as a first-class open source framework.
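As a rough illustration of the create, upsert, and query operations described above, here is a minimal sketch of what Phoenix statements might look like when pasted into Zeppelin paragraphs. The table and column names (web_stat, host, hits) are hypothetical, and the `%jdbc(phoenix)` interpreter prefix is an assumption about how the Zeppelin JDBC interpreter is configured on your cluster. The statements are kept as plain strings so the sketch runs without a cluster.

```python
# Sketch only: Phoenix SQL held as strings, so nothing here contacts a cluster.
# Table and column names are hypothetical examples.
PHOENIX_STATEMENTS = [
    # Phoenix tables need a primary key, which maps to the HBase row key.
    "CREATE TABLE IF NOT EXISTS web_stat "
    "(host VARCHAR NOT NULL PRIMARY KEY, hits BIGINT)",
    # Phoenix uses UPSERT (insert-or-update) rather than INSERT.
    "UPSERT INTO web_stat (host, hits) VALUES ('example.com', 42)",
    "SELECT host, hits FROM web_stat WHERE hits > 10",
]


def as_zeppelin_paragraph(statement: str, interpreter: str = "%jdbc(phoenix)") -> str:
    """Prefix a statement with an interpreter directive.

    The default directive is an assumption; adjust it to match your notebook setup.
    """
    return f"{interpreter}\n{statement}"


if __name__ == "__main__":
    for stmt in PHOENIX_STATEMENTS:
        print(as_zeppelin_paragraph(stmt))
        print()
```

Each printed paragraph could then be run one at a time in a Zeppelin note, assuming the interpreter is bound to the cluster's Phoenix JDBC endpoint.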
Read More: Azure #HDInsight Apache Phoenix now supports Zeppelin
Oozie support in HDInsight enterprise security package
Oozie is a workflow scheduler system for managing Apache Hadoop jobs. You can now use Oozie in domain-joined Hadoop clusters to build secure Oozie workflows in Azure HDInsight.
Read More: Build secure Oozie workflows in Azure HDInsight with Enterprise Security Package
Azure Data Lake Storage Gen2 integration
Microsoft announced a preview of Azure Data Lake Storage Gen2, a globally available, HDFS-compatible filesystem for storing and analyzing petabyte-size files and trillions of objects. HDInsight clusters can work with Azure Data Lake Storage Gen2.
Read More: HDInsight with Azure Data Lake Storage Gen2
ML Services 9.3 and open-source R capabilities on HDInsight
ML Services on HDInsight provides the latest capabilities for R-based analytics on datasets of virtually any size, loaded to either Azure Blob or Data Lake storage. Because the ML Services cluster is built on open-source R, the R-based applications you build can leverage any of the 8,000+ open-source R packages. The routines in ScaleR, Microsoft’s big data analytics package, are also available.
Learn More: Introducing ML Services 9.3 in Azure HDInsight
Virtual Network Service Endpoints support
We announced support for Virtual Network Service Endpoints, which allow customers to securely connect to Azure Blob Storage, Azure Data Lake Storage Gen2, Cosmos DB, and SQL databases. When a Service Endpoint is enabled for HDInsight, traffic flows through a secured route from within the Azure data center.
Read More: How to enhance HDInsight security with service endpoints
Kafka 1.0 and 1.1 support
Kafka on HDInsight provides a high-throughput, low-latency ingestion platform for your real-time data pipeline. We announced support for Kafka 1.0 and 1.1.
Read More: Kafka 1.0 on HDInsight lights up real-time analytics scenarios and Kafka 1.1 support on Azure HDInsight
Support for Spark 2.3
We made Apache Spark 2.3.0 available for production use on HDInsight. Ranging from bug fixes (more than 1,400 tickets were fixed in this release) to new experimental features, Apache Spark 2.3.0 brings advancements and polish to all areas of its unified data platform.
Read More: Azure HDInsight now supports Apache Spark 2.3
Move large data sets to Azure using WANdisco on Azure HDInsight
With WANdisco Fusion, you can move data that you have in other large-scale analytics platforms to Azure Blob Storage, ADLS Gen1, and ADLS Gen2 without downtime or disruption to your existing environment. Customers can also replicate the data and metadata (Hive database schema, authorization policies using Apache Ranger, Sentry, and more) across different regions to make the data lake available globally for analytics.
Read More: Globally replicated data lakes with LiveData using WANdisco on Azure
More blogs:
Azure #HDInsight Interactive Query: simplifying big data analytics architecture
How Microsoft drives exabyte analytics on the world’s largest YARN cluster
Top 8 reasons to choose Azure HDInsight
Avoid Big Data pitfalls with Azure HDInsight and these partner solutions
Enterprises get deeper insights with Hadoop and Spark updates on Azure HDInsight
Microsoft deepens its commitment to Apache Hadoop and open source analytics
Siphon: Streaming data ingestion with Apache Kafka
Azure HDInsight Interactive Query: Ten tools to analyze big data faster
How to use DBeaver with Azure #HDInsight
HDInsight HBase: Migrating to new HDInsight version
Customer stories
Chinese vending machine innovator automates shopping with cloud technology
Argentinian company harnesses data to give e-commerce sellers valuable business insights
Data analytics firm signs two major deals after introductions to potential customers
We are excited to see the customer momentum across different industry verticals, and will continue to bring new capabilities to HDInsight to solve new big data challenges.
About HDInsight
Azure HDInsight is Microsoft’s premium managed offering for running open source workloads on Azure. Azure HDInsight powers mission-critical applications across a wide variety of sectors and addresses a broad range of use cases, including ETL, streaming, and interactive querying.
Additional resources
- Learn more about Azure HDInsight
- Open Source component guide on HDInsight
- HDInsight release notes
- Ask HDInsight questions on MSDN forums
- Ask HDInsight questions on Stack Overflow