Today, we are pleased to announce that Azure Data Lake Analytics is generally available. Since we announced the public preview, Azure Data Lake has become one of the fastest-growing Azure services, now with thousands of customers. With the GA announcement, we are revealing improvements to the service that make it more productive for end users, along with security and availability improvements that make it ready for production deployments.
What is Azure Data Lake?
Today’s big data solutions have been driving some organizations from “rear-view mirror” thinking to forward-looking, predictive analytics. However, there have been adoption challenges, and widespread usage of big data has not yet occurred. Azure Data Lake was introduced to drive big data adoption by making it easy for developers, data scientists, and analysts to store data of any size, shape, and speed, and to do all types of processing and analytics across platforms and languages. It removes the complexities of ingesting and storing all your data while making it faster to get up and running with big data. Azure Data Lake includes three services:
- Azure Data Lake Store, a no-limits data lake that powers big data analytics
- Azure Data Lake Analytics, a massively parallel on-demand job service
- Azure HDInsight, a fully managed cloud Hadoop and Spark offering
What is Azure Data Lake Analytics?
The Azure Data Lake Analytics service is a new distributed analytics job service that dynamically scales so you can focus on your business goals, not on distributed infrastructure. Instead of deploying, configuring, and tuning hardware, you write queries to transform your data and extract valuable insights. The analytics service can handle jobs of any scale instantly: you simply set the dial for how much power you need. You pay for your job only while it is running, which makes it cost-effective.
Azure Data Lake Analytics also provides a unified big data developer platform that integrates language, runtime, tooling, development environments, resource management, extensibility, and security, making developers and ISVs far more productive. It supports the entire end-to-end big data development lifecycle, from authoring to debugging, monitoring, and optimization.
Start in seconds, Scale instantly, Pay per job:
Our on-demand service will have you processing big data jobs within 30 seconds. There is no infrastructure to worry about because there are no servers, VMs, or clusters to wait for, manage, or tune. You can instantly scale the analytics units (AUs, the processing power) from one to thousands for each job with a single slider. You pay only for the processing used per job. This model dramatically simplifies the lives of developers who want to start working with big data.
- Tutorial: get started with Azure Data Lake Analytics using Azure portal
- Demo: Getting Started with Azure Data Lake
Develop massively parallel programs with simplicity:
U-SQL is a simple, expressive, and extensible language that allows you to write code once and have it automatically parallelized for the scale you need. U-SQL blends the declarative nature of SQL with the expressive power of C#. In other declarative SQL-based languages used for big data, the extensibility model is “bolted on” and much harder to use. U-SQL allows developers to easily define and use user-defined types and user-defined functions written in any .NET language.
Big data developers need to accommodate any type of data: images, audio, video, documents. Many libraries already exist to handle those kinds of data, but not all of them are readily accessible to big data languages. U-SQL can seamlessly reuse any .NET library, whether it is developed locally or published in a repository such as NuGet, to handle any type of data. Developers can also use code written in R or Python in their U-SQL scripts. After the code is written, you can deploy it as a massively parallel program, letting you easily scale out diverse workload categories such as ETL, machine learning, cognitive science, machine translation, image processing, and sentiment analysis by using U-SQL and leveraging existing libraries.
- Tutorial: Get started with Azure Data Lake Analytics U-SQL language
- Develop U-SQL User defined operators for Azure Data Lake Analytics jobs
- U-SQL Language Reference
- Video: Introducing U-SQL – A new language for Massive Data Processing
- Video: U-SQL Query Execution
- Video: U-SQL Extensibility
Debug and Optimize your Big Data programs with ease:
With the tools that exist today, developers face serious challenges as their data workloads increase. Understanding bottlenecks in performance and scale is challenging and requires experts in distributed computing and infrastructure. For example, developers must carefully account for the time and cost of data movement across a cluster and rewrite their queries or repartition their data to improve performance. With Azure Data Lake Analytics, optimizing code and debugging failures in distributed cloud programs are now as easy as debugging a program in your personal environment. Our execution environment actively analyzes your programs as they run and offers recommendations to improve performance and reduce cost. For example, if you requested 1000 AUs for your program and only 50 AUs were needed, the system would recommend that you use only 50 AUs, resulting in a 20x cost savings.
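The arithmetic behind the 20x figure can be sketched in a few lines of Python. This is only an illustration, not the service's billing logic: it assumes cost is proportional to allocated AUs multiplied by job duration, that the job runs equally long at 50 AUs because the extra AUs sat idle, and it uses a made-up rate and duration. The names `job_cost`, `savings_factor`, and `HYPOTHETICAL_RATE_PER_AU_HOUR` are hypothetical.

```python
# Hypothetical billing sketch: the rate and the job duration below are
# made-up placeholders, not published Azure prices.
HYPOTHETICAL_RATE_PER_AU_HOUR = 2.0  # assumed currency units per AU-hour


def job_cost(allocated_aus: int, hours: float,
             rate_per_au_hour: float = HYPOTHETICAL_RATE_PER_AU_HOUR) -> float:
    """Cost of one job, assuming billing is allocated AUs x duration x rate."""
    if allocated_aus < 1 or hours < 0:
        raise ValueError("allocated_aus must be >= 1 and hours must be >= 0")
    return allocated_aus * hours * rate_per_au_hour


def savings_factor(requested_aus: int, needed_aus: int) -> float:
    """How many times cheaper the job is at needed_aus, assuming equal runtime."""
    if not 1 <= needed_aus <= requested_aus:
        raise ValueError("needed_aus must be between 1 and requested_aus")
    return requested_aus / needed_aus


# The example from the text: 1000 AUs requested, only 50 actually needed.
over_provisioned = job_cost(1000, hours=0.5)
right_sized = job_cost(50, hours=0.5)
print(over_provisioned, right_sized, savings_factor(1000, 50))
# prints: 1000.0 50.0 20.0
```

Under these assumptions the savings factor depends only on the ratio of requested to needed AUs, so the rate and duration cancel out.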
Today, we are also announcing the availability of this big data productivity environment in Visual Studio Code, giving users the same productivity in a free, cross-platform code editor that is available on Windows, Mac OS X, and Linux.
- Tutorial: develop U-SQL scripts using Data Lake Tools for Visual Studio
- Video: Data Lake Developer Tools
- Video: Getting Started with Debugging U-SQL
Virtualize your analytics:
Act on all your data with optimized data virtualization of your relational sources, such as Azure SQL Database and Azure SQL Data Warehouse. Queries are automatically optimized by moving processing close to the source data rather than moving the data, thereby maximizing performance and minimizing latency.
Enterprise-grade Security, Auditing and Support:
Extend your on-premises security and governance controls to the cloud to meet your security and regulatory compliance needs. Capabilities such as single sign-on (SSO), multi-factor authentication, and seamless management of millions of identities are built in through Azure Active Directory. Role-based access control and the ability to audit all processing and management operations are on by default. We guarantee a 99.9% enterprise-grade SLA and 24/7 support for your big data solution.
How do I get started?
To get started, you will need an Azure subscription or a free trial of Azure. With this in hand, you should be able to get an Azure Data Lake Analytics account up and running in seconds by going through this getting started guide. Also, visit our free Microsoft Virtual Academy course on Data Lake.
- Free course: Microsoft Virtual Academy on Azure Data Lake
- Overview of Azure Data Lake Analytics
- Get started using Microsoft Azure Portal
- Get started using Azure PowerShell
- Get started using .NET SDK
- Develop U-SQL Scripts using Data Lake Tools for Visual Studio
- Use Data Lake Analytics interactive tutorial
- Analyze weblogs using Data Lake Analytics
- Get started with U-SQL
- U-SQL reference
- .NET SDK reference