General availability of HDInsight Interactive Query – blazing fast queries on hyper-scale data

Posted on September 27, 2017

Principal Program Manager, Azure HDInsight

It’s 2017, and big data challenges are as real as they get. Our customers have petabytes of data living in elastic and scalable commodity storage systems such as Azure Data Lake Store and Azure Blob storage.

One of the central challenges today is finding insights from the data in these storage systems interactively, at a fraction of the cost.

Interactive Query leverages Hive on LLAP in Apache Hive 2.1 and brings interactivity to complex data warehouse style queries on large datasets stored on commodity cloud storage.

Today, we announce the general availability of the Interactive Query cluster type in Azure HDInsight (formerly known as Interactive Hive). With this offering, we are bringing the following benefits to our customers:

Fast data warehouse style SQL queries on petabyte-scale data

Intelligent caching and optimizations in Interactive Query produce blazing-fast query results on remote cloud storage, such as Azure Blob storage and Azure Data Lake Store.

Interactive Query enables data analysts to query data interactively in the same storage where the data is prepared, eliminating the need to move data from storage to another analytical engine for data warehousing. With zero data migration, you gain faster insights, operational resiliency, reduced effort, and a simplified architecture.

Modern scalable query concurrency architecture

With much improved fine-grained resource management and preemption, Interactive Query (Hive on LLAP) handles concurrent users better. In addition, HDInsight supports creating multiple clusters that share the same Azure storage and Hive metastore, which helps achieve a high degree of concurrency.

Rich connectivity with the most popular authoring tools

Interactive Query enables end users to consume data from rich business intelligence tools, such as Power BI, Tableau, Excel, Hive View 2.0, Beeline, Hive CLI, and Visual Studio, as well as the built-in Zeppelin notebook.

Today, we are happy to announce the preview of Interactive Query tools for Visual Studio Code. Rich connectivity options flatten the learning curve, so users become productive sooner.
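As a rough illustration of the Beeline option, here is a minimal Python sketch that assembles a Beeline command line for a cluster. The JDBC URL shape (HTTPS on port 443 with `transportMode=http` and `httpPath=/hive2`) is an assumption based on the pattern commonly documented for HDInsight, so check it against the current documentation. The cluster name `mycluster`, the user `admin`, and the helper names are hypothetical.

```python
def hive_jdbc_url(cluster_name: str, database: str = "default") -> str:
    """Build a HiveServer2 JDBC URL for a cluster's public HTTPS endpoint.

    Assumption: the endpoint follows the <cluster>.azurehdinsight.net:443
    pattern with HTTP transport; verify against your cluster's settings.
    """
    host = f"{cluster_name}.azurehdinsight.net"
    return (
        f"jdbc:hive2://{host}:443/{database};"
        "ssl=true;transportMode=http;httpPath=/hive2"
    )


def beeline_command(cluster_name: str, user: str, query: str) -> list[str]:
    """Return the argument list for running one query through Beeline.

    -u is the JDBC URL, -n the user name, -e the query to execute.
    The password is deliberately left out of this sketch.
    """
    return ["beeline", "-u", hive_jdbc_url(cluster_name), "-n", user, "-e", query]


# Hypothetical cluster and user names, for illustration only.
command = beeline_command("mycluster", "admin", "SHOW TABLES;")
```

The list in `command` can be handed to `subprocess.run` on a machine where Beeline is installed. Building it as a list rather than a single string avoids shell quoting problems in the query text.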

Leverage your existing investments in HDInsight by sharing the data and Hive metastore

If you already run your batch and ETL workloads in HDInsight, adding an Interactive Query cluster for fast querying is straightforward. Customers can attach an Interactive Query cluster to an existing metastore and data storage, and start querying the data right away.

Achieve low latency with SSD caching without the cost of SSDs

Interactive Query SSD Cache enables you to combine RAM and SSD into a giant pool of memory with all of the other benefits the LLAP cache brings. By using the LLAP SSD cache, a typical daemon can cache four times more data, letting you process larger datasets or support more users. In HDInsight, cluster nodes have built-in SSDs at no extra cost.
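To make the idea of pooling RAM and SSD more concrete, here is a toy two-tier cache in Python. It is only a conceptual sketch, not how LLAP is implemented: the class name, the least-recently-used eviction policy, and the slot counts are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class TwoTierCache:
    """Toy LRU cache with a small fast tier ("RAM") and a larger slow tier ("SSD")."""

    def __init__(self, ram_slots: int, ssd_slots: int):
        self.ram_slots = ram_slots
        self.ssd_slots = ssd_slots
        self.ram = OrderedDict()  # most recently used entries live here
        self.ssd = OrderedDict()  # entries evicted from RAM spill here

    def put(self, key, value):
        """Insert into the RAM tier, spilling the oldest RAM entry to SSD if needed."""
        self.ssd.pop(key, None)  # a key lives in at most one tier
        self.ram[key] = value
        self.ram.move_to_end(key)
        if len(self.ram) > self.ram_slots:
            old_key, old_value = self.ram.popitem(last=False)
            self.ssd[old_key] = old_value
            if len(self.ssd) > self.ssd_slots:
                self.ssd.popitem(last=False)  # pool is full: drop the oldest entry

    def get(self, key):
        """Return the cached value, promoting SSD hits back into RAM; None on a miss."""
        if key in self.ram:
            self.ram.move_to_end(key)
            return self.ram[key]
        if key in self.ssd:
            value = self.ssd.pop(key)
            self.put(key, value)
            return value
        return None

    def size(self) -> int:
        """Total number of entries held across both tiers."""
        return len(self.ram) + len(self.ssd)


# One RAM slot plus three SSD slots: the pool holds four entries in total.
cache = TwoTierCache(ram_slots=1, ssd_slots=3)
for name in ["a", "b", "c", "d", "e"]:
    cache.put(name, name.upper())
```

After these five inserts the pool holds the four most recent entries, and only the oldest one, `"a"`, has been dropped. A RAM-only cache of the same RAM size would have kept just one entry, which is the effect the SSD tier is meant to provide.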

Say no to data format conversion in order to get faster results

Fast analytics on Hadoop has always come with one big catch: it requires up-front conversion to an optimized format like ORCFile, Parquet, or Avro, which is time-consuming and complex, and limits your agility. Interactive Query Dynamic Text Cache converts CSV or JSON data into an optimized in-memory format on the fly. Caching is dynamic, so the queries determine what data is cached. After text data is cached, analytics run just as fast as if you had converted it to one of those file formats.
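The query-driven behavior can be pictured with a small Python sketch. This is a toy model, not the LLAP text cache itself: the class, its methods, and the sample data are hypothetical, and real engines cache at a much finer granularity. The point it shows is that a text column is converted to an in-memory form only when a query first touches it, and later queries reuse the converted copy.

```python
import csv
import io


class TextColumnCache:
    """Toy query-driven cache over CSV text with numeric columns."""

    def __init__(self, csv_text: str):
        self._csv_text = csv_text
        self._columns = {}      # column name -> list of floats, filled lazily
        self.parse_count = 0    # how many times the raw text was parsed

    def _load(self, name: str):
        """Convert one column from text to floats the first time it is needed."""
        if name not in self._columns:
            self.parse_count += 1
            reader = csv.DictReader(io.StringIO(self._csv_text))
            self._columns[name] = [float(row[name]) for row in reader]
        return self._columns[name]

    def column_sum(self, name: str) -> float:
        """A stand-in for a query: sum one numeric column."""
        return sum(self._load(name))

    def cached_columns(self):
        """Names of the columns that queries have pulled into the cache so far."""
        return sorted(self._columns)


# Hypothetical sample data, for illustration only.
data = "region,sales,cost\nwest,10,4\neast,20,5\n"
table = TextColumnCache(data)
first = table.column_sum("sales")   # parses the text and caches "sales"
second = table.column_sum("sales")  # served from the cached column
```

Here only `sales` ends up in the cache, because no query has asked for `cost`, and the raw text is parsed once even though the query ran twice.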

Enterprise Grade Security and Monitoring (preview)

Interactive Query is built on top of the highly secure Azure and HDInsight platform. With features such as domain-joined HDInsight clusters, you can create an Interactive Query cluster joined to an Active Directory domain and configure a list of employees from the enterprise who can authenticate through Azure Active Directory to log on to the HDInsight cluster.

You can monitor Interactive Query clusters with built-in tools such as Grafana and Ambari, as well as the integration we have built with Azure Log Analytics to monitor all of your resources with a single pane of glass.

Additional resources

Summary

This week at Ignite, we are pleased to announce the general availability of Azure HDInsight Interactive Query. Backed by our enterprise-grade SLA, HDInsight Interactive Query brings sub-second speed to data warehouse style SQL queries on hyper-scale data stored in commodity cloud storage.