Data Lake

Batch, real-time, and interactive analytics made easy

Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape and speed, and do all types of processing and analytics across platforms and languages. It removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics. Azure Data Lake works with existing IT investments for identity, management, and security for simplified data management and governance. It also integrates seamlessly with operational stores and data warehouses so you can extend current data applications. We’ve drawn on the experience of working with enterprise customers and running some of the largest scale processing and analytics in the world for Microsoft businesses like Office 365, Xbox Live, Azure, Windows, Bing and Skype. Azure Data Lake solves many of the productivity and scalability challenges that prevent you from maximizing the value of your data assets with a service that’s ready to meet your current and future business needs.

Introducing the preview of our new distributed analytics service

The Data Lake analytics service is a new distributed analytics service built on Apache YARN that dynamically scales so you can focus on your business goals, not on distributed infrastructure. Instead of deploying, configuring, and tuning hardware, you write queries to transform your data and extract valuable insights. The analytics service can handle jobs of any scale instantly: you simply set the dial for how much power you need. You pay for a job only while it is running, which makes the service cost-effective. The analytics service supports Azure Active Directory, letting you manage access and roles in a way that integrates with your on-premises identity system. It also includes U-SQL, a language that unifies the benefits of SQL with the expressive power of user code. U-SQL’s scalable distributed runtime enables you to efficiently analyze data in the store and across SQL Servers in Azure, Azure SQL Database, and Azure SQL Data Warehouse.

HDInsight-managed Apache Hadoop®, Spark, HBase, and Storm clusters

Azure Data Lake offers fully managed and supported 100% Apache Hadoop®, Spark, HBase, and Storm clusters. You can get up and running on any of these workloads with a few clicks and within a few minutes, without buying hardware or hiring the specialized operations teams typically associated with big data infrastructure. You have the choice of running Linux or Windows with the Hortonworks Data Platform, making it easy to move code and projects to the cloud. Finally, the rich ecosystem of Apache Hadoop-based applications provides security, governance, data preparation, and advanced analytics, letting you get the most out of your data faster. With Data Lake, Hadoop is made easy.

Introducing the preview of our store

The Data Lake store provides a single repository where you can capture data of any size, type, and speed without forcing changes to your application as the data scales. In the store, data can be shared for collaboration with enterprise-grade security. It is also designed for high-performance processing and analytics from HDFS applications and tools, including support for low-latency workloads. For example, data can be ingested into the store in real time from sensors and devices for IoT solutions, or from online shopping websites, without the fixed limits on account or file size that current offerings in the market impose.

Use the skills you have to develop faster and optimize your code smarter

Finding the right tools to design and tune your big data queries can be difficult. Data Lake makes it easy through deep integration with Visual Studio, so that you can use familiar tools to run, debug, and tune your code. Visualizations of your U-SQL, Apache Hive, and Apache Storm jobs let you see how your code runs at scale and identify performance bottlenecks and cost optimizations, making it easier to tune your queries. Data engineers, DBAs, and data architects can use existing skills, like SQL, Apache Hadoop, Apache Spark, and .NET, to become productive on day one. Data scientists and analysts can use a rich notebook experience powered by Apache Spark or your preferred visualization tool, such as Power BI, Tableau, or Qlik, to do interactive analytics over all of your data.

Integrates seamlessly with your existing IT investments

One of the top challenges of big data is integration with existing IT investments. We address this by making sure that Data Lake can use your existing IT investments for identity, management, security, and data warehousing, simplifying data governance and making it easy to extend your data applications. Out of the box, Data Lake is integrated with Active Directory for user management and permissions and includes web-based and programmatic tools for management and monitoring. Data Lake is also a key part of the Cortana Analytics Suite, meaning that it works with Azure SQL Data Warehouse, Power BI, and Data Factory for a complete cloud big data and advanced analytics platform that helps you with everything from data preparation to doing interactive analytics on large-scale datasets. Finally, because Data Lake is in Azure, you can connect to any data generated by applications or ingested by devices in Internet of Things (IoT) scenarios.

Store and analyze data of any size

Data Lake was architected from the ground up for cloud scale and performance. With a few clicks, you can provision any amount of resources to do analytics on terabytes or even exabytes of data. The store in Data Lake has no fixed limits on account or file size and provides massive throughput to increase analytic performance. This means that you don’t have to rewrite code as you increase or decrease the size of the data stored or the amount of compute being spun up. This lets you focus on your business logic only and not on how you process and store large datasets. Data Lake also takes away the complexities normally associated with big data in the cloud, ensuring that it can meet your current and future business needs. Inside Microsoft, the same technology is being used by more than 10,000 developers to run analytics on exabytes of data for Microsoft Bing, Office, and Xbox Live.

Affordable and cost-effective

Data Lake is a cost-effective solution for running big data workloads. You can choose between on-demand clusters and a pay-per-job model in which you are charged when data is processed. In both cases, no hardware, licenses, or service-specific support agreements are required. The system scales up or down with your business needs, meaning that you never pay for more than you need. It also lets you scale storage and compute independently, enabling more economic flexibility than traditional big data solutions. Finally, it minimizes the need to hire the specialized operations teams typically associated with running a big data infrastructure. Data Lake minimizes your costs while maximizing the return on your data investment.

Enterprise grade

Data Lake is fully managed and supported by Microsoft, backed by an enterprise-grade SLA and support at general availability. With 24/7 customer support, you can contact us to address any challenges that you face with your entire big data solution. Our team monitors your deployment so that you don’t have to, guaranteeing that it will run continuously. This ensures that you’re ready to meet the demands of your mission-critical deployments.

Apache Hadoop® and associated open source project names are trademarks of the Apache Software Foundation.