Introducing Azure Data Lake – Microsoft’s expanded vision for making big data easy

As we get ready for AzureCon and prepare to join the thousands of big data enthusiasts at Strata + Hadoop World in NYC this week, we’re excited to share a new and expanded Azure Data Lake that makes big data processing and analytics simpler and more accessible.

Previously at the Build conference, we announced the Azure Data Lake Store. The Azure Data Lake Store provides a single repository where you can easily capture data of any size, type and speed without forcing changes to your application as data scales. In the store, data can be securely shared for collaboration and is accessible for processing and analytics from HDFS applications and tools like HDInsight, Hortonworks, Cloudera, or MapR.

Today, we unveiled our full vision, adding a new analytics service and our existing Azure HDInsight solution to Azure Data Lake. Our goal is to make big data technology simpler and more accessible to the greatest number of people possible. Simply put, we want to bring big data to everybody. Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape, and speed, and to do all types of processing and analytics across platforms and languages. It removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics. Azure Data Lake works with existing IT investments for identity, management, and security for simplified data management and governance. It also integrates seamlessly with operational stores and data warehouses so you can extend current data applications.


What’s new in this expanded vision?

In addition to the Azure Data Lake Store, we are announcing a new analytics service. Azure Data Lake Analytics lets you focus on the logic of your application, not the distributed infrastructure running it. Instead of deploying, configuring, and tuning hardware, you write queries to transform your data and extract valuable insight. Built on Apache YARN and designed for the cloud, the analytics service lets you handle jobs of any scale instantly by simply setting the dial for how much power you need. The service is cost-efficient because you pay for a job only while it is running. Support for Azure Active Directory lets you manage access and roles simply and integrates with your on-premises identity system.


We know that many developers and data scientists struggle to be successful with big data using existing technologies and tools. Azure Data Lake Analytics includes U-SQL, a language that unifies the benefits of SQL with the expressive power of your own code. The U-SQL language is built on the same distributed runtime that powers the big data systems inside Microsoft. Millions of SQL and .NET developers can now process and analyze all of their data with the skills they already have.
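To make the idea of combining a declarative query with your own code more concrete, here is a minimal sketch. It is not U-SQL and does not use any Azure Data Lake API. It is only an analogy built on Python's standard sqlite3 module, where a custom function is registered and then called from inside a SQL query. The table name, the function name domain_of, and the sample data are all hypothetical.

```python
import sqlite3


def domain_of(email: str) -> str:
    """Custom logic that is awkward in plain SQL: the lowercased domain of an email."""
    return email.rsplit("@", 1)[-1].lower()


def count_by_domain(emails):
    """Count addresses per domain using a SQL query that calls our own Python code."""
    conn = sqlite3.connect(":memory:")
    # Register the Python function so the SQL engine can call it by name.
    conn.create_function("domain_of", 1, domain_of)
    conn.execute("CREATE TABLE users (email TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [(e,) for e in emails])
    # The query stays declarative; the custom step is just another expression.
    rows = conn.execute(
        "SELECT domain_of(email) AS domain, COUNT(*) AS n "
        "FROM users GROUP BY domain_of(email) ORDER BY n DESC, domain"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(count_by_domain(["a@Example.com", "b@example.com", "c@contoso.com"]))
```

The grouping, counting, and ordering are expressed in SQL, while the piece that needs custom logic lives in ordinary code. U-SQL applies the same general idea at a much larger scale, with .NET code in place of the Python function shown here.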


The U-SQL support in Azure Data Lake Tools for Visual Studio includes state-of-the-art authoring, debugging, and advanced performance analysis features that increase productivity when optimizing jobs running across thousands of nodes. Visualizations of your U-SQL code let you see how it runs at scale and identify performance bottlenecks and cost optimizations, making it easier to tune your queries.


Azure HDInsight becomes a key part of Azure Data Lake

HDInsight, our Apache Hadoop-based service, is now a key part of Azure Data Lake. As one of the fastest growing services in Azure, HDInsight gives you the breadth of the Hadoop ecosystem in a managed service that’s monitored and supported by Microsoft. Furthering our commitment to productivity, we’ve also updated our Visual Studio Tools for authoring, advanced debugging, and tuning of Hive queries and Storm topologies running in HDInsight.

Today, we are announcing the general availability of HDInsight on Linux. We work closely with Hortonworks and Canonical to provide the HDP™ distribution on the Ubuntu operating system that powers the Linux version of HDInsight in the Data Lake. This is another strategic step by Microsoft to meet customers where they are and make big data easier for them. It also shows our commitment to openness, which has been underscored by open sourcing .NET Core, our contributions to Apache Hadoop, and our support for Docker containers. This approach to openness made it a natural decision to give customers the choice of running their Hadoop workloads on Linux in addition to Windows. Being committed to openness at Microsoft means we will continue to collaborate with others in the industry and be open in how we listen to our customers. We recently announced our support for Apache Spark, one of the popular open source big data projects, and contributed the complete Power BI visualization framework to GitHub.


HDInsight on Linux was first announced as a public preview at Strata + Hadoop World, and we’ve been amazed at the reception and adoption from both partners and customers. Many Hadoop ecosystem partners that have integrated their applications with Hadoop on-premises on Linux are now building integration to the cloud with HDInsight. This includes applications that provide end-to-end big data analytics like Datameer, technologies that address big data security and governance like Dataguise and BlueTalon, unified stream and batch processing like DataTorrent, and tools that give business users the ability to visualize and analyze data in compelling ways like AtScale and Zoomdata. Support from our partners ensures that you have the best applications available as you get started with Azure Data Lake.

With general availability of Azure HDInsight on Linux, Microsoft will offer all customers a service level agreement guarantee of 99.9% uptime and full technical support for the entire stack, ranging from Hadoop to the underlying Linux operating system.
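As a rough illustration of what a 99.9% uptime figure means in practice, the short sketch below converts an uptime percentage into the downtime it would permit over a period. The 30-day window is an assumption made for the arithmetic only. The actual SLA terms define the measurement period and any exclusions, and the function name is hypothetical.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: float = 30.0) -> float:
    """Minutes of downtime permitted by an uptime percentage over a period.

    Assumes a simple calendar window (30 days by default); real SLA terms
    may measure availability differently.
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


if __name__ == "__main__":
    minutes = allowed_downtime_minutes(99.9)
    print(f"99.9% uptime over 30 days allows about {minutes:.1f} minutes of downtime")
```

Under that assumption, 99.9% works out to roughly 43 minutes of permitted downtime in a 30-day month.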

We are excited about Azure Data Lake and about making big data and analytics simpler and more accessible. Join us at AzureCon and Strata + Hadoop World in NYC this week to learn more.

Additional Information: