
Introducing H2O.ai on Azure HDInsight

We are excited to announce that H2O’s AI platform is now available on the Azure HDInsight Application Platform. Customers can now use H2O.ai’s open source solutions, including Sparkling Water, on an HDInsight cluster together with Azure’s collection of cloud services, and benefit from reliable open source analytics backed by an industry-leading SLA. Together, the H2O and Azure HDInsight teams will integrate their technologies to deliver the best business solutions to enterprises and to build relationships within the open analytics community.

To learn more about the H2O integration with HDInsight, register for the webinar hosted by the H2O and Azure HDInsight teams.

HDInsight and H2O to make data science on big data easier

Azure HDInsight is the only fully managed cloud Hadoop offering that provides optimized open source analytical clusters for Spark, Hive, MapReduce, HBase, Storm, Kafka, and R Server, backed by a 99.9% SLA. Each of these big data technologies, as well as ISV applications such as H2O, can be easily deployed as a managed cluster with enterprise-level security and monitoring.

The data science ecosystem has grown rapidly in the last few years, and H2O’s AI platform provides an open source machine learning framework that works with Spark through sparklyr and PySpark. H2O’s Sparkling Water lets users combine H2O’s fast, scalable machine learning algorithms with the capabilities of Spark. With Sparkling Water, users can drive computation from Scala, R, or Python and use the H2O Flow UI, making it an ideal machine learning platform for application developers.
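As a rough illustration of driving Sparkling Water from Python, the sketch below uses PySparkling's `H2OContext` to publish a Spark DataFrame as an H2O frame and convert it back. It assumes a recent Sparkling Water 3.x release, where these methods are named `asH2OFrame` and `asSparkFrame` and `getOrCreate()` takes no Spark session argument; older 2.x releases used `as_h2o_frame`, `as_spark_frame`, and `H2OContext.getOrCreate(spark)` instead. The helper name `spark_to_h2o_and_back` is hypothetical.

```python
from typing import Any, Tuple


def spark_to_h2o_and_back(spark_df: Any) -> Tuple[Any, Any]:
    """Publish a Spark DataFrame to H2O, then convert the H2OFrame back to Spark.

    Hypothetical helper; assumes PySparkling (Sparkling Water 3.x) is installed
    next to PySpark, as on an HDInsight Spark cluster with H2O deployed.
    """
    # Imported lazily so this file can be loaded where PySparkling is absent.
    from pysparkling import H2OContext

    # Start the H2O cluster inside the Spark application, or reuse the running one.
    hc = H2OContext.getOrCreate()

    h2o_frame = hc.asH2OFrame(spark_df)       # Spark DataFrame -> H2OFrame
    spark_copy = hc.asSparkFrame(h2o_frame)   # H2OFrame -> Spark DataFrame
    return h2o_frame, spark_copy
```

In a notebook on the cluster, calling `spark_to_h2o_and_back(spark.range(10))` would return the H2OFrame together with a Spark copy of it.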

Setting up an environment to perform advanced analytics on top of big data is hard, but with H2O Artificial Intelligence for HDInsight, customers can get started with just a few clicks. This solution installs Sparkling Water on an HDInsight Spark cluster so you can take advantage of both Spark and H2O. It can access data in Azure Blob storage and/or Azure Data Lake Store, in addition to all the standard data sources that H2O supports. It also provides Jupyter notebooks with built-in examples for a quick start, and the user-friendly H2O Flow UI to monitor and debug applications.
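Spark jobs on an HDInsight cluster address Azure storage through URI schemes: `wasb://` or `wasbs://` for Azure Blob storage, and `adl://` for Azure Data Lake Store. The sketch below is a hypothetical helper, not part of H2O or HDInsight, that builds such paths; the account, container, and file names are placeholders.

```python
from urllib.parse import quote


def blob_storage_uri(account: str, container: str, path: str, secure: bool = True) -> str:
    """Build a WASB(S) URI for a file in Azure Blob storage (hypothetical helper)."""
    scheme = "wasbs" if secure else "wasb"
    # Paths are relative to the container root, so drop any leading slash.
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{quote(path.lstrip('/'))}"


def data_lake_store_uri(account: str, path: str) -> str:
    """Build an adl:// URI for a file in Azure Data Lake Store (hypothetical helper)."""
    return f"adl://{account}.azuredatalakestore.net/{quote(path.lstrip('/'))}"


if __name__ == "__main__":
    # Placeholder names; replace them with your own storage account and paths.
    print(blob_storage_uri("mystorageaccount", "mycontainer", "data/train.csv"))
    # wasbs://mycontainer@mystorageaccount.blob.core.windows.net/data/train.csv
    print(data_lake_store_uri("mydatalake", "/data/train.csv"))
    # adl://mydatalake.azuredatalakestore.net/data/train.csv
```

On a cluster whose storage account is configured, such a path can be passed to `spark.read.csv(...)`, and the resulting DataFrame can then be handed to Sparkling Water.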

Getting started

On the industry-leading Azure cloud platform, getting started with H2O on HDInsight takes just a few clicks. Customers can install H2O while creating a new HDInsight cluster by choosing the applications option during cluster creation, selecting “H2O Artificial Intelligence for HDInsight”, and agreeing to the license terms.


Customers can also deploy H2O on an existing HDInsight Spark cluster by clicking the “Application” link:


Sparkling Water integrates H2O's fast, scalable machine learning engine with Spark. It provides utilities to publish Spark data structures (RDDs and DataFrames) as H2O frames and vice versa, along with a Python interface (PySparkling) that enables use of Sparkling Water directly from PySpark, among other features. The architecture of H2O on HDInsight is shown below:


After installing H2O on HDInsight, you can use the Jupyter notebooks built into Spark clusters to write your first H2O on HDInsight application. Open Jupyter Notebook and you will see a folder named “H2O-PySparkling-Examples” containing a few getting-started examples.
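The bundled examples are the best starting point. For a rough idea of what a first application can look like, the sketch below trains an H2O gradient boosting model on a Spark DataFrame from a PySparkling notebook. It assumes Sparkling Water 3.x, that the `h2o` Python package is importable on the cluster, and a classification target column; the function names, the 80/20 split, and `ntrees=50` are illustrative choices, not taken from the bundled examples.

```python
from typing import Any, Iterable, List


def feature_columns(columns: Iterable[str], target: str, ignore: Iterable[str] = ()) -> List[str]:
    """Return the predictor columns: every column except the target and ignored ones."""
    excluded = {target, *ignore}
    return [c for c in columns if c not in excluded]


def train_gbm(spark_df: Any, target: str, ignore: Iterable[str] = ()) -> Any:
    """Train an H2O GBM classifier on a Spark DataFrame (hypothetical helper)."""
    # Lazy imports so this file loads even where PySparkling and h2o are absent.
    from pysparkling import H2OContext
    from h2o.estimators.gbm import H2OGradientBoostingEstimator

    hc = H2OContext.getOrCreate()
    frame = hc.asH2OFrame(spark_df)

    # H2O treats a numeric response as regression; make it categorical for classification.
    frame[target] = frame[target].asfactor()
    train, valid = frame.split_frame(ratios=[0.8], seed=42)

    model = H2OGradientBoostingEstimator(ntrees=50, seed=42)
    model.train(
        x=feature_columns(frame.columns, target, ignore),
        y=target,
        training_frame=train,
        validation_frame=valid,
    )
    return model
```

After training, `model.auc(valid=True)` reports the validation AUC when the target has two classes, and the model can also be inspected in H2O Flow.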


H2O Flow is an interactive, web-based computational user interface where you can combine code execution, text, mathematics, plots, and rich media in a single document. It provides a richer visualization experience for machine learning models, with native support for hyperparameter tuning, ROC curves, and more.

H2O Flow

With this combined offering of H2O on HDInsight, customers can easily build data science solutions and run them at enterprise grade and scale. Azure HDInsight provides the tools to create a data science environment on top of big data frameworks like Hadoop and Spark, while H2O’s technology brings a set of sophisticated, fully distributed algorithms to rapidly build and deploy highly accurate models at scale.

H2O.ai is now available in the Microsoft Azure Marketplace and as an HDInsight application. For more technical details, please refer to the H2O documentation and this technical post on the HDInsight blog.


