Bring Interactive Analytics to Azure HDInsight: Kyligence Analytics Platform enables sub-second queries

Posted on August 7, 2017

Program Manager II, OSS and Analytics

SQL on Hadoop is improving continuously, but it is still common to wait minutes or even a couple of hours for a single query to return, especially when the dataset is huge. Most of these systems are resource-intensive: queries compete for runtime resources, and performance declines when the workload is high.

To solve this problem, Kyligence Analytics Platform (KAP), a leading big data intelligence platform powered by Apache Kylin, enables interactive analytics with sub-second query latency, even on massive datasets. It is widely adopted by enterprises such as Lenovo, China Mobile, and many more. We are happy to announce that the Kyligence and Azure HDInsight teams have worked closely together to bring OLAP capabilities to HDInsight, and KAP is now available on Azure HDInsight as an HDInsight application.

HDInsight Application Platform

Azure HDInsight is the only fully managed cloud Hadoop offering that provides optimized open source analytical clusters for Spark, Hive, MapReduce, HBase, Storm, Kafka, and R Server, backed by a 99.9% SLA. Each of these big data technologies and ISV applications is easily deployable as a managed cluster with enterprise-level security and monitoring.

The open source ecosystem of applications has grown with the goal of making it easier for customers to build their big data and analytical solutions. Today, customers find it challenging to discover these productivity applications and struggle to install and configure them. To address this gap, the HDInsight Application Platform provides an experience unique to Microsoft in which ISVs can offer their applications directly to customers, and customers can easily discover, install, and use ISV applications built for the big data ecosystem.

As part of this integration, KAP can be deployed on HDInsight with a single click.

Interactive Analytics over Trillions of Records on HDInsight

Hadoop is designed for large-scale data processing, but it is not efficient enough for interactive analytics. KAP brings interactive analytics to HDInsight through the following capabilities:

  • Native SQL support on Hadoop and HDInsight: Many existing big data analytics technologies have their own query language or a proprietary storage engine optimized for analytics scenarios. It is difficult for analysts to learn a new query language or to move data out of HDFS/BLOB storage to other platforms. With KAP's native SQL support and ODBC drivers, customers can use the standard SQL interface and their favorite BI tools on large amounts of data.
  • Sub-second query response: Query performance is the bottleneck for most big data use cases. Performance declines if cluster resources cannot scale out when the original data grows 10x. Keeping query response consistently sub-second is the key to interactive analytics, and KAP on HDInsight solves this problem by providing pre-calculated cubes.
  • Elastic architecture: Datasets normally range from gigabytes to terabytes and beyond. Hadoop provides the elastic infrastructure for batch processing, and KAP, as an interactive analytics technology, also leverages the elastic capability of Hadoop to enable scale-out solutions.
  • Native integration with HDInsight: The cloud is an effortless way to adopt new technology without worrying about deployment or monitoring. As a fully managed cloud solution, KAP + HDInsight helps users reduce operating costs and achieve high availability. KAP works with all the supported Azure storage services (Azure BLOB storage and Azure Data Lake Store) and can also work with HDInsight Kafka clusters to ingest data from Kafka.
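The standard SQL interface mentioned in the list above can also be reached from a script. The sketch below is an illustration only, not the KAP client or the ODBC driver: it assumes the KAP instance exposes the Apache Kylin REST query endpoint (`POST /kylin/api/query` with HTTP Basic authentication). The host name, project, table, and credentials are hypothetical placeholders. The code only builds the request; actually sending it with `urllib.request.urlopen` would need a reachable server.

```python
import base64
import json
import urllib.request


def build_query_request(base_url, project, sql, user, password, limit=100):
    """Build (but do not send) a SQL query request for a Kylin-style REST API.

    Assumption: the server accepts POST /kylin/api/query with a JSON body
    and HTTP Basic authentication, as Apache Kylin does.
    """
    payload = {
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/kylin/api/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical host, project, table, and credentials, for illustration only.
request = build_query_request(
    base_url="http://kap.example.com:7070",
    project="demo_project",
    sql="SELECT region, SUM(sales) FROM sales_fact GROUP BY region",
    user="analyst",
    password="change-me",
)
print(request.get_method(), request.full_url)
```

BI tools would normally go through the ODBC or JDBC driver instead; the REST form is used here only because it needs nothing beyond the Python standard library.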

KAP - Enterprise-ready data warehouse powered by Apache Kylin

KAP, an enterprise OLAP on Hadoop solution powered by Apache Kylin, enables sub-second SQL query latency on petabyte-scale datasets, provides high concurrency at internet scale, and empowers analysts to architect BI on Hadoop with industry-standard data warehouse and business intelligence methodology. KAP is a unified analytics platform that simplifies big data analytics for business users, analysts, and engineers: it is self-service, integrates seamlessly with BI tools, and requires no programming. KAP is a native OLAP-on-Hadoop solution that interacts with the cluster only via standard APIs and supports the main Hadoop distributions, from on-premises environments to the cloud.

On Azure, most data is stored in Azure BLOB storage or Azure Data Lake Store and then loaded into Hive as external tables. Before analysis, KAP builds the cube (index) using MapReduce or Spark, according to the data model designed by the modeler. At query time, all queries can access the pre-aggregated cube data, and results are returned in under a second. By leveraging this unique pre-calculation technology, KAP provides consistent query latency regardless of how much the data grows, even with limited resources. KAP also provides native integration with various Azure storage services, such as Azure BLOB storage and Azure Data Lake Store, and can connect to HDInsight Kafka clusters to ingest data from Kafka.
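To make the pre-calculation idea concrete, here is a minimal in-memory sketch. It is not KAP's or Apache Kylin's implementation; the rows, dimension names, and helper functions are hypothetical. The build step aggregates a measure for every combination of dimensions once, so that each later query is a dictionary lookup instead of a scan over the raw rows.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (region, product, year, sales)
ROWS = [
    ("east", "laptop", 2016, 100),
    ("east", "phone", 2016, 50),
    ("west", "laptop", 2017, 70),
    ("east", "laptop", 2017, 30),
]
DIMENSIONS = ("region", "product", "year")


def build_cube(rows, dimensions):
    """Pre-aggregate SUM(sales) for every subset of the dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dim_indexes in combinations(range(len(dimensions)), size):
            aggregates = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in dim_indexes)
                aggregates[key] += row[-1]  # last column is the measure
            cube[tuple(dimensions[i] for i in dim_indexes)] = dict(aggregates)
    return cube


def query(cube, dimensions, **filters):
    """Answer SUM(sales) for equality filters with a single lookup."""
    # Order the filtered columns the same way the cube keys are ordered.
    dims = tuple(d for d in dimensions if d in filters)
    key = tuple(filters[d] for d in dims)
    return cube[dims].get(key, 0)


CUBE = build_cube(ROWS, DIMENSIONS)
print(query(CUBE, DIMENSIONS, region="east"))                # 180
print(query(CUBE, DIMENSIONS, product="laptop", year=2017))  # 100
print(query(CUBE, DIMENSIONS))                               # 250
```

The trade-off is visible even at this scale: the build step does work proportional to the number of dimension combinations, but query cost no longer depends on how many raw rows were loaded.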

The screenshot below shows the KAP modeling GUI:

[Screenshot: KAP modeling GUI]

Compared with Hive queries, KAP is 100x faster, and queries do not need to be rewritten in the HiveQL dialect. ANSI SQL and JDBC/ODBC drivers are also supported, so users can do interactive analytics with familiar BI tools such as Power BI or Tableau. Below is the performance comparison between Apache Kylin and Apache Hive on the SSB dataset:

Installing KAP on Azure HDInsight

With the KAP on Azure HDInsight solution, users can install KAP with a single click, either on their existing HDInsight cluster or on a standalone cluster optimized for KAP. Currently, KAP works as an application on an HDInsight HBase cluster.


After the one-click installation, you will get the following components:

  • KAP: The enterprise version of Apache Kylin, which provides the core OLAP analysis on HDInsight by building pre-calculated cubes.
  • KyAnalyzer: The built-in agile OLAP BI tool for quick BI analysis, which connects to KAP.

KAP is installed on the edge node of the HBase cluster. To learn more about using KAP on HDInsight, see the Kyligence blog post.

Summary

KAP on Azure HDInsight brings quick insight into massive datasets with sub-second latency and empowers interactive analytics on Hadoop over trillions of records. It offers web-scale OLAP solutions for various industries to build their online and offline analytics platforms. With cloud-based technologies, computing resources can expand and shrink to handle bursts of data under a more efficient deployment model, helping customers reduce cost and improve productivity.

For more resources to get started, see the "More resources" section below. If you have any feedback or questions, feel free to drop us an email at hdiask@microsoft.com. We'd love to hear from you!

More resources