Run Hortonworks clusters and easily access Azure Data Lake

Published on September 5, 2017

Principal Program Manager

Enterprise customers love Hortonworks for running Apache Hive, Apache Spark, and other Apache Hadoop workloads. They also love the value that Azure Data Lake Store (ADLS) provides: high-throughput access to cloud data of any size, easy and secure sharing through its true hierarchical file system, POSIX ACLs, role-based access control (RBAC), and encryption at rest.

Azure HDInsight managed workloads – which offer built-in integration with and access to ADLS – vastly simplify the management of enterprise clusters. Customers have a choice, however, and some Hortonworks customers choose to customize and manage their own clusters deployed directly on Azure cloud infrastructure. Those deployments need direct access to ADLS.

With the recent announcement of Hortonworks Data Platform (HDP®) 2.6.1 with Azure Data Lake Store support, customers can now do just that. Customers can deploy HDP clusters that directly access and interoperate with data in ADLS. With HDP 2.6.1 and its access to ADLS, we bring another way for our customers to realize the business value of their data. Here’s how some customers are enriching key scenarios:

  1. One or more Hortonworks clusters can access data in the same Azure Data Lake.
  2. On-premises clusters can directly access data in ADLS facilitating access to data in the cloud using standard Hadoop utilities, like DistCp.
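The second scenario above can be sketched with DistCp. This is a minimal, hedged example: the account name `myadls`, the NameNode host, and the paths are placeholders, not values from this post.

```shell
# Copy a directory from on-premises HDFS into ADLS using standard DistCp.
# Assumes the cluster is configured with ADLS OAuth2 credentials and that
# "myadls", "onprem-namenode", and the paths are replaced with your own.
hadoop distcp \
  hdfs://onprem-namenode:8020/data/logs \
  adl://myadls.azuredatalakestore.net/landing/logs
```

Because ADLS is exposed through the `adl://` file-system scheme, the same command shape works in either direction, cloud-to-cluster or cluster-to-cloud.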

In addition to using HDP directly, Hortonworks is also making Cloudbreak for Hortonworks Data Platform available via the Azure Marketplace. Cloudbreak for Hortonworks Data Platform simplifies the provisioning, management, and monitoring of HDP clusters in cloud environments. This is a great way to get started with HDP and with HDP + ADLS.

Ready to get started with HDP and ADLS?

You can start deploying HDP clusters with ADLS today, directly from the Azure Marketplace using Cloudbreak for Hortonworks Data Platform. After you set up your ADLS account, follow the instructions to launch Cloudbreak on Azure and create a cluster, adding your ADLS as a file system. Once the cluster is deployed, you can give the cluster access to all, or part of, your Azure data lake. Then try it out using some simple commands.
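The "simple commands" can be ordinary `hdfs dfs` operations against `adl://` paths. A sketch, assuming a hypothetical ADLS account named `myadls` and a local file `sample.csv`:

```shell
# Create a directory in the data lake, upload a file, and list it back.
# "myadls" and the paths are placeholders; substitute your own account.
hdfs dfs -mkdir -p adl://myadls.azuredatalakestore.net/tmp/demo
hdfs dfs -put sample.csv adl://myadls.azuredatalakestore.net/tmp/demo/
hdfs dfs -ls adl://myadls.azuredatalakestore.net/tmp/demo
```

Any Hadoop tool that accepts a file-system URI (Hive, Spark, DistCp) can address the same `adl://` paths.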

Alternatively, visit the documentation for information on how to custom-deploy your Hortonworks cluster with ADLS. See also the recent Azure blog post, "Hortonworks extends IaaS offering on Azure with Cloudbreak," to find out more about Cloudbreak.
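For a custom deployment, the cluster needs ADLS OAuth2 credentials in `core-site.xml`. A minimal sketch using the standard `hadoop-azure-datalake` properties; the tenant ID, client ID, and key shown are placeholders for your own Azure AD service principal:

```xml
<!-- core-site.xml fragment: OAuth2 client-credential access to ADLS.
     Replace TENANT-ID, CLIENT-ID, and CLIENT-SECRET with your own values. -->
<property>
  <name>fs.adl.oauth2.access.token.provider.type</name>
  <value>ClientCredential</value>
</property>
<property>
  <name>fs.adl.oauth2.refresh.url</name>
  <value>https://login.microsoftonline.com/TENANT-ID/oauth2/token</value>
</property>
<property>
  <name>fs.adl.oauth2.client.id</name>
  <value>CLIENT-ID</value>
</property>
<property>
  <name>fs.adl.oauth2.credential</name>
  <value>CLIENT-SECRET</value>
</property>
```

With these set, `adl://` URIs resolve cluster-wide without per-job credential handling.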
