Data Lake

Batch, real-time and interactive analytics made easy

Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists and analysts to store data of any size, shape and speed, and do all types of processing and analytics across platforms and languages. It removes the complexities of ingesting and storing all your data while making it faster to get up and running with batch, streaming and interactive analytics. Azure Data Lake works with existing IT investments for identity, management and security for simplified data management and governance. It also integrates seamlessly with operational stores and data warehouses so that you can extend current data applications. We’ve drawn on the experience of working with enterprise customers and running some of the largest-scale processing and analytics in the world for Microsoft businesses such as Office 365, Xbox Live, Azure, Windows, Bing and Skype. Azure Data Lake solves many of the productivity and scalability challenges that prevent you from maximising the value of your data assets with a service that’s ready to meet your current and future business needs.

Introducing the preview of our new distributed analytics service

The Data Lake analytics service is a new distributed analytics service built on Apache YARN that dynamically scales so that you can focus on your business goals, not on distributed infrastructure. Instead of deploying, configuring and tuning hardware, you write queries to transform your data and extract valuable insights. The analytics service can handle jobs of any scale: you simply set the dial for how much power you need. You pay for a job only while it is running, making the service cost-effective. The analytics service supports Azure Active Directory, letting you manage access and roles easily and integrate with your on-premises identity system. It also includes U-SQL, a language that unifies the benefits of SQL with the expressive power of user code. U-SQL’s scalable distributed runtime enables you to efficiently analyse data in the store and across SQL Servers in Azure, Azure SQL Database and Azure SQL Data Warehouse.

HDInsight-managed Apache Hadoop®, Spark, HBase and Storm clusters

Azure Data Lake offers fully managed and supported 100% Apache Hadoop®, Spark, HBase and Storm clusters. You can be up and running on any of these workloads with a few clicks and within a few minutes, without buying hardware or hiring the specialised operations teams typically associated with big data infrastructure. You have the choice of running Linux or Windows with the Hortonworks Data Platform, making it easy to move code and projects to the cloud. Finally, the rich ecosystem of Apache Hadoop-based applications provides security, governance, data preparation and advanced analytics, enabling you to get the most out of your data faster. With Data Lake, Hadoop is made easy.

Introducing the preview of our store

The Data Lake store provides a single repository where you can capture data of any size, type and speed without forcing changes to your application as the data scales. In the store, data can be shared for collaboration with enterprise-grade security. The store is also designed for high-performance processing and analytics from HDFS applications and tools, including support for low-latency workloads. For example, data can be ingested into the store in real time from sensors and devices for IoT solutions, or from online shopping websites. Unlike current offerings in the market, the store places no fixed limits on account or file size.

Use the skills you have to develop faster and optimise your code smarter

Finding the right tools to design and tune your big data queries can be difficult. Data Lake makes it easy through deep integration with Visual Studio, so that you can use familiar tools to run, debug and tune your code. Visualisations of your U-SQL, Apache Hive and Apache Storm jobs let you see how your code runs at scale and identify performance bottlenecks and cost optimisations, making it easier to tune your queries. Data engineers, DBAs and data architects can use existing skills, such as SQL, Apache Hadoop, Apache Spark and .NET, to become productive on day one. Data scientists and analysts can use a rich notebook experience powered by Apache Spark or your preferred visualisation tool, such as Power BI, Tableau or Qlik, to do interactive analytics over all of your data.

Integrates seamlessly with your existing IT investments

One of the top challenges of big data is integration with existing IT investments. We address this by making sure that Data Lake can use your existing IT investments for identity, management, security and data warehousing, simplifying data governance and making it easy to extend your data applications. Out of the box, Data Lake is integrated with Active Directory for user management and permissions and includes web-based and programmatic tools for management and monitoring. Data Lake is also a key part of the Cortana Analytics Suite, meaning that it works with Azure SQL Data Warehouse, Power BI and Data Factory for a complete cloud big data and advanced analytics platform that helps you with everything from data preparation to doing interactive analytics on large-scale datasets. Finally, because Data Lake is in Azure, you can connect to any data generated by applications or ingested by devices in Internet of Things (IoT) scenarios.

Store and analyse data of any size

Data Lake was architected from the ground up for cloud scale and performance. With a few clicks, you can provision any amount of resources to do analytics on terabytes or even exabytes of data. The store in Data Lake has no fixed limits on account or file size and provides massive throughput to increase analytic performance. As a result, you don’t have to rewrite code as you increase or decrease the size of the data stored or the amount of compute being spun up, so you can focus only on your business logic and not on how you process and store large datasets. Data Lake also takes away the complexities normally associated with big data in the cloud, ensuring that it can meet your current and future business needs. Inside Microsoft, the same technology is being used by more than 10,000 developers to run analytics on exabytes of data for Microsoft Bing, Office and Xbox Live.

Affordable and cost-effective

Data Lake is a cost-effective solution to run big data workloads. You can choose between on-demand clusters or a pay-per-job model when data is processed. In both cases, no hardware, licences or service-specific support agreements are required. The system scales up or down with your business needs, meaning that you never pay for more than you need. It also lets you independently scale storage and compute, enabling more economic flexibility than traditional big data solutions. Finally, it minimises the need to hire specialised operations teams typically associated with running a big data infrastructure. Data Lake minimises your costs while maximising the return on your data investment.
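
To make the difference between the two billing models concrete, here is a minimal sketch of the arithmetic in Python. The rates, function names and workload figures are hypothetical and chosen only for illustration; they are not Azure prices. The only assumption carried over from the text is that an on-demand cluster is billed while it is provisioned, whereas a pay-per-job model bills only while the job runs.

```python
# Hypothetical rates for illustration only; these are NOT actual Azure prices.
CLUSTER_RATE_PER_NODE_HOUR = 1.0  # assumed cost of one cluster node for one hour
JOB_RATE_PER_UNIT_HOUR = 2.0      # assumed cost of one unit of job compute for one hour


def on_demand_cluster_cost(nodes: int, hours_provisioned: float) -> float:
    """Cost of a cluster billed for every hour it stays provisioned."""
    return nodes * hours_provisioned * CLUSTER_RATE_PER_NODE_HOUR


def pay_per_job_cost(compute_units: int, job_hours: float) -> float:
    """Cost of a job billed only while it is running."""
    return compute_units * job_hours * JOB_RATE_PER_UNIT_HOUR


if __name__ == "__main__":
    # A 4-node cluster kept up for an 8-hour working day.
    cluster = on_demand_cluster_cost(nodes=4, hours_provisioned=8)
    # A single job using 4 units of compute for 30 minutes.
    job = pay_per_job_cost(compute_units=4, job_hours=0.5)
    print(f"on-demand cluster: {cluster:.2f}, pay-per-job: {job:.2f}")
```

Under these assumed figures, a short, occasional job is cheaper on the pay-per-job model even at a higher unit rate, while a cluster that is busy for most of the time it is provisioned may favour the on-demand model. Substitute your own rates and workload to compare.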

Enterprise grade

Data Lake is fully managed and supported by Microsoft, backed by an enterprise-grade SLA and support at general availability. With 24/7 customer support, you can contact us to address any challenges that you face across your entire big data solution. Our team monitors your deployment so that you don’t have to, helping to keep it running continuously. This ensures that you’re ready to meet the demands of your mission-critical deployments.

Apache Hadoop® and associated open-source project names are trademarks of the Apache Software Foundation.