Yesterday, we unveiled our full vision of Microsoft’s Azure Data Lake. Our goal is to make big data technology simpler and more accessible to the greatest number of people possible. Simply put, we want to bring big data to everybody.
As part of this vision, we announced that Azure Data Lake makes HDInsight, our Apache Hadoop-based service, a key part of Microsoft’s data lake solution. We also announced the general availability of HDInsight on Linux. This is a strategic step by Microsoft to meet customers where they are, and it reflects our commitment to openness, which has been underscored by open sourcing .NET Core, our contributions to Apache Hadoop, and support for Docker containers. This approach to openness made it a natural decision to give customers the choice of running their Hadoop workloads on Linux in addition to Windows. Being committed to openness at Microsoft means that we will continue to collaborate with others in the industry and be open in how we listen to our customers. We recently announced our support for Apache Spark, one of the popular open source big data projects, and our contributions to YARN as part of Azure Data Lake, and we contributed the complete Power BI visualization framework on GitHub.
Since we first offered Azure HDInsight on Ubuntu Linux, we’ve seen adoption accelerate among customers and partners. Many Hadoop ecosystem partners with applications that run on-premises on Linux are now offering solutions in Azure with HDInsight. These include applications that provide end-to-end big data analytics like Datameer, technologies that address big data security and governance like Dataguise and BlueTalon, unified stream and batch processing with DataTorrent, and tools that give business users the ability to visualize and analyze data in compelling ways like AtScale and Zoomdata. Support from our partners ensures that you have the best applications available as you get started with Azure Data Lake.
As ISVs like these grow their business on Azure with their Hadoop-based solutions, we are also creating more choice for our customers. Investments like these enable our customers to optimize their business with data, which leads to transformative changes, faster insights, and increased value to their stakeholders.
Getting Started With Azure HDInsight on Linux
Customers logging into the Azure Portal will now be able to select either Windows or Linux when deploying HDInsight. Both options are first-class citizens, offering simple deployment, a 99.9% SLA, and technical support for the entire stack, from Hadoop to the operating system.
Customers with prior experience deploying Hadoop on Linux on-premises will be able to leverage their existing skill sets and tooling (documentation, samples, and templates) to work with HDInsight. Familiar tools like SSH and Ambari are now available to use. Ambari provides a single view of the performance and state of your Hadoop cluster and lets you customize configuration settings. It also provides monitoring and alerting within the HDInsight cluster.
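As a rough illustration of using Ambari to check cluster state from a script, here is a minimal Python sketch. It assumes the cluster's Ambari REST API is reachable at `https://<cluster>.azurehdinsight.net/api/v1/...` and that the service listing returns items with a `ServiceInfo` object. The cluster name `mycluster` and the sample response are hypothetical, written by hand for illustration, and the sketch makes no network call.

```python
import json
from urllib.parse import quote


def ambari_services_url(cluster_name: str) -> str:
    """Build the Ambari REST URL that lists services and their state.

    Assumption: Ambari is exposed at https://<cluster>.azurehdinsight.net
    and the Ambari cluster name matches the HDInsight cluster name.
    """
    name = quote(cluster_name, safe="")
    return (
        f"https://{name}.azurehdinsight.net/api/v1/clusters/{name}"
        "/services?fields=ServiceInfo/state"
    )


def service_states(payload: str) -> dict:
    """Map each service name to its reported state from a response body."""
    items = json.loads(payload).get("items", [])
    return {
        item["ServiceInfo"]["service_name"]: item["ServiceInfo"]["state"]
        for item in items
    }


def not_started(states: dict) -> list:
    """Return the sorted names of services that are not in the STARTED state."""
    return sorted(name for name, state in states.items() if state != "STARTED")


# Hand-written sample response (assumed shape, not real cluster output).
SAMPLE = json.dumps({
    "items": [
        {"ServiceInfo": {"service_name": "HDFS", "state": "STARTED"}},
        {"ServiceInfo": {"service_name": "YARN", "state": "STARTED"}},
        {"ServiceInfo": {"service_name": "HIVE", "state": "INSTALLED"}},
    ]
})

if __name__ == "__main__":
    print(ambari_services_url("mycluster"))
    print(not_started(service_states(SAMPLE)))
```

In practice you would fetch the URL with your cluster's HTTP credentials and pass the response body to `service_states`; that step is left out here so the sketch stays self-contained.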
Additionally, Azure HDInsight on Linux includes new features that weren’t available in Public Preview, such as cluster scaling, virtual network integration, and script action support. You can also create HBase and Storm clusters on Linux for your NoSQL and real-time processing needs, for example when building an IoT application.
Script action support, similar to that in HDInsight on Windows, lets you customize your Linux cluster by installing additional applications or Hadoop components that are not part of the default HDInsight deployment. You do this by supplying Bash scripts through the script action capability. For example, you can now install Hue on an HDInsight Linux cluster using a script action.
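To make the idea of a script action more concrete, the following Python sketch assembles a description of one: a name, the location of a Bash script, and the node roles it should run on. The field names (`scriptActions`, `uri`, `roles`, `parameters`, `persistOnSuccess`) follow the request body of the Azure Resource Manager operation for running script actions as we understand it; treat them as an assumption and check the current reference before relying on them. The script URI below is a placeholder, not a real install script.

```python
import json


def script_action(name, uri, roles, parameters=""):
    """Describe one script action: a Bash script to run on the given node roles."""
    if not roles:
        raise ValueError("a script action needs at least one node role")
    return {
        "name": name,
        "uri": uri,                # location of the Bash script to download and run
        "parameters": parameters,  # arguments passed to the script, if any
        "roles": list(roles),      # e.g. "headnode", "workernode"
    }


def script_action_request(actions, persist=True):
    """Serialize a list of script actions into a JSON request body (assumed shape)."""
    body = {"scriptActions": list(actions), "persistOnSuccess": persist}
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    # Hypothetical example: the URI is a placeholder, not a published script.
    hue = script_action(
        name="install-hue",
        uri="https://example.com/install-hue.sh",
        roles=["headnode"],
    )
    print(script_action_request([hue]))
```

Keeping the description as plain data makes it easy to review which script runs on which nodes before anything is submitted to the cluster.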
For more information about Azure HDInsight on Linux:
- Watch the Channel 9 Video on Data Exposed
- Head over to T.K. “Ranga” Rengarajan’s announcement blog to get more details
- Read the Hortonworks blog about Hortonworks HDP powering Azure HDInsight on Linux
- Go to the Canonical blog about Ubuntu powering Azure HDInsight on Linux
How-to information:
- Get started using Hadoop with Hive for HDInsight on Linux
- Develop Python streaming programs for HDInsight
- Provision Hadoop Linux clusters in HDInsight using custom options
- Manage HDInsight clusters by using Ambari
- Use SSH with Linux-based Hadoop on HDInsight from Linux, Unix, or OS X
- Information about using HDInsight on Linux
- Overall HDInsight documentation page