Introducing Azure Data Lake


Posted on April 29, 2015

In the industry, the concept of a data lake is relatively new. It is an enterprise-wide repository of every type of data, collected in a single place before any formal definition of requirements or schema. This allows every type of data to be kept without discrimination, regardless of its size, structure, or how fast it is ingested. Organizations can then use Hadoop or advanced analytics to find patterns in the data. Data lakes can also serve as a repository for lower-cost data preparation before curated data is moved into a data warehouse.
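
To make the "before any formal definition of schema" idea concrete, here is a minimal, generic sketch of schema-on-read in Python. It is not an Azure Data Lake API: the sample records and the `read_with_schema` function are hypothetical, and in-memory strings stand in for raw files in the lake. Records are stored exactly as they arrive, and a schema is applied only when a particular query reads them.

```python
import json
from typing import Iterable, List

# Hypothetical raw records, kept exactly as ingested: no schema is enforced
# at write time, so differently shaped events can sit side by side.
RAW_LINES = [
    '{"device": "sensor-1", "temp_c": 21.5, "ts": "2015-04-29T10:00:00Z"}',
    '{"device": "sensor-2", "temp_c": "22.1"}',
    '{"page": "/home", "user": "u42"}',
]


def read_with_schema(lines: Iterable[str]) -> List[dict]:
    """Apply a schema only at read time (schema-on-read).

    This hypothetical query wants (device, temp_c as float). Records that do
    not fit are skipped by this query but remain untouched in the raw data.
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        if "device" not in record or "temp_c" not in record:
            continue  # irrelevant to this query, not rejected at ingest
        try:
            temp = float(record["temp_c"])  # coerce types at read time
        except (TypeError, ValueError):
            continue
        rows.append({"device": record["device"], "temp_c": temp})
    return rows


if __name__ == "__main__":
    for row in read_with_schema(RAW_LINES):
        print(row)
```

The design point is that a different query could read the same raw lines with a different schema (for example, page views) without re-ingesting anything.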

While the potential of the data lake can be profound, it has yet to be fully realized. Limits on storage capacity, hardware acquisition, scalability, performance, and cost are all potential reasons why customers haven't been able to implement a data lake. Today at Build, we announced Azure Data Lake, Microsoft's hyperscale repository for big data analytic workloads in the cloud. This offering is built for the cloud, is compatible with HDFS, and offers unbounded scale with massive throughput and enterprise-grade capabilities.

HDFS for the Cloud: Azure Data Lake is a Hadoop file system compatible with HDFS, enabling Microsoft offerings such as Azure HDInsight and Revolution R Enterprise, as well as industry Hadoop distributions like Hortonworks and Cloudera, to connect to it.
Petabyte files, massive throughput: The goal of the data lake is to run Hadoop and advanced analytics on all your data to draw conclusions from the data itself. To do this, the data lake must be built to support massively parallel queries so that discoveries can be returned in a timely fashion. Azure Data Lake meets this requirement with no fixed limits on how much data can be stored in a single account. It can also store very large files with no fixed limit on size. Finally, it is built to handle high volumes of small writes at low latency, making it optimized for near real-time scenarios like website analytics, Internet of Things (IoT), analytics from sensors, and others.
Enterprise ready: Being "enterprise ready" means that you can run this solution as an important part of your existing data platform. Azure Data Lake achieves this by integrating with Azure Active Directory and by replicating data to ensure high durability and availability.

Microsoft has been on a journey toward broad big data adoption with a suite of big data and advanced analytics solutions like Azure HDInsight, Azure Data Factory, Revolution R Enterprise, and Azure Machine Learning. We are excited about what Azure Data Lake will bring to this ecosystem, and about the day when our customers can run all of their analysis on exabytes of data. To learn more about this solution or sign up to be notified of the public preview, go to https://azure.com/datalake.
