Today, we are pleased to announce that Azure Data Lake Store is generally available. Since we announced the public preview, Azure Data Lake has become one of the fastest-growing Azure services, now with thousands of customers. With the GA announcement, we are sharing the improvements we’ve made to the service, including making it more secure and highly available, to make it ready for production deployments.
What is Azure Data Lake?
Today’s big data solutions have been driving some organizations from “rear-view mirror” thinking to forward-looking and predictive analytics. However, there have been adoption challenges, and widespread usage of big data has not yet occurred. Azure Data Lake was introduced to drive big data adoption by making it easy for developers, data scientists, and analysts to store data of any size, shape, and speed, and to do all types of processing and analytics across platforms and languages. It removes the complexities of ingesting and storing all your data while making it faster to get up and running with big data. Azure Data Lake includes three services:
- Azure Data Lake Store, a no limits data lake that powers big data analytics
- Azure Data Lake Analytics, a massively parallel on-demand job service
- Azure HDInsight, a fully managed cloud Hadoop and Spark offering
What is Azure Data Lake Store?
The value of a data lake resides in the ability to develop solutions across data of all types – unstructured, semi-structured and structured. This begins with the Azure Data Lake Store, the first cloud Data Lake for enterprises that is secure, massively scalable and built to the open HDFS standard. With no limits to the size of data and the ability to run massively parallel analytics, you can now unlock value from all your analytics data. For example, data can be ingested in real-time from sensors and devices for IoT solutions, or from online shopping websites into the store.
Petabyte-size files and trillions of objects:
Prior to Azure Data Lake Store, storing large datasets in the cloud was a major challenge. Artificial limits placed by object stores make them unsuitable for large files that can be hundreds of terabytes in size, such as high-resolution video, genomic and seismic datasets, medical data, and data from a wide variety of industries. Azure Data Lake Store has revolutionary technology for storing and analyzing massive datasets: a single account can store trillions of files, and a single file can be greater than a petabyte in size, which is 200x larger than other cloud stores. This makes Data Lake Store ideal for storing any type of data, including these massive industry datasets.
Scalable throughput for massively parallel analytics:
Data Lake Store is built for running large analytic systems that require massive throughput to process and analyze petabytes of data. As your workloads grow, Data Lake Store scales throughput to support any size of analytic workload without requiring you to redesign your application or repartition your data. It provides massive throughput to run analytic jobs with thousands of concurrent executors that read and write hundreds of terabytes of data efficiently. You need only focus on your application logic; we automatically optimize the store for any throughput level.
HDFS for the Cloud:
Microsoft Azure Data Lake Store supports any application that uses the open Apache Hadoop Distributed File System (HDFS) standard. By supporting HDFS, Data Lake Store makes it easy to migrate your existing Hadoop and Spark data to the cloud without recreating your applications:
- Use with Hadoop clusters
- Use with Data Lake Analytics
- Use with Stream Analytics
- Use with Data Catalog
- Use with Power BI
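Because the store speaks HDFS, Hadoop and Spark jobs address data in it through the adl:// URI scheme. As a small illustrative sketch (the account name and folder below are placeholders, not real resources), this helper builds such a URI the way an HDFS-compatible job would reference it:

```python
def adl_uri(account: str, path: str) -> str:
    """Build an adl:// URI for a path in an Azure Data Lake Store account.

    HDFS-compatible engines such as Hadoop and Spark can read and write
    these URIs once the store is configured as a file system. The account
    and path used below are illustrative placeholders.
    """
    return "adl://{0}.azuredatalakestore.net/{1}".format(account, path.lstrip("/"))

# Point a Hadoop or Spark job at a folder in the store:
print(adl_uri("contosostore", "/clickstream/2016/11"))
# adl://contosostore.azuredatalakestore.net/clickstream/2016/11
```

The same URI works across the engines listed above, which is what lets existing HDFS-based jobs move to the cloud largely unchanged.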
Always encrypted, Role-based security & Auditing:
Data Lake Store protects your data assets and easily extends your on-premises security and governance controls to the cloud. Data is always encrypted: in motion using SSL, and at rest using service-managed or user-managed HSM-backed keys in Azure Key Vault. Capabilities such as single sign-on (SSO), multi-factor authentication, and seamless management of millions of identities are built in through Azure Active Directory. You can authorize users and groups with fine-grained POSIX-based ACLs for all data in the store, enabling role-based access controls. You can also meet security and regulatory compliance needs by auditing every access or configuration change to the system. Finally, we guarantee a 99.9% enterprise-grade SLA and 24/7 support for your big data solution.
- Security overview
- Access control lists
- Secure massive datasets
- Active Directory authentication
- Video: Overview of Security in Azure Data Lake
- Video: Developing with OAuth in Azure Data Lake
- Video: Authorization in Azure Data Lake
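To make the POSIX-based ACL model concrete, here is a simplified sketch (not the service’s actual implementation; in the real service, named entries resolve to Azure Active Directory identities) that parses ACL entries in the familiar `scope:qualifier:permissions` form and checks whether an entry grants a given permission:

```python
def parse_acl(acl_spec: str) -> dict:
    """Parse a POSIX-style ACL spec such as 'user::rwx,user:jane:r-x,other::---'
    into a mapping of (entry type, qualifier) -> permission string.
    Simplified sketch; Data Lake Store resolves qualifiers to Azure AD identities."""
    entries = {}
    for entry in acl_spec.split(","):
        etype, qualifier, perms = entry.split(":")
        entries[(etype, qualifier)] = perms
    return entries

def has_permission(entries: dict, etype: str, qualifier: str, flag: str) -> bool:
    """Check whether an ACL entry grants a permission flag ('r', 'w', or 'x')."""
    perms = entries.get((etype, qualifier), "---")
    return flag in perms

acl = parse_acl("user::rwx,user:jane:r-x,group::r--,other::---")
print(has_permission(acl, "user", "jane", "r"))  # True
print(has_permission(acl, "user", "jane", "w"))  # False
```

Fine-grained entries like these, applied per file and folder, are what enable role-based access control over all data in the store.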
How do I get started?
To get started, you will need an Azure subscription or a free trial of Azure. With this in hand, you can get an Azure Data Lake Store account up and running in seconds by going through this getting started guide.
Also, visit our free Microsoft Virtual Academy course on Data Lake.
- Free course: Microsoft Virtual Academy on Azure Data Lake
- Video: Introduction to Azure Data Lake Store
- What is Data Lake Store
- Create account and upload data
- Self-guided learning
- Copy to and from Azure Blob Storage
- Start with the REST API
- Start with the .NET SDK
- Start with the Java SDK
- Security overview
- Active Directory authentication
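If you start with the REST API, Data Lake Store exposes a WebHDFS-compatible endpoint. As a hedged sketch (the account name and token below are placeholders; a real call requires an OAuth 2.0 bearer token obtained from Azure Active Directory), this snippet assembles the URL and headers for a LISTSTATUS request:

```python
def liststatus_request(account: str, path: str, token: str):
    """Build the URL and headers for a WebHDFS-compatible LISTSTATUS call
    against a Data Lake Store account. The account name and token are
    placeholders; authentication uses an Azure AD OAuth 2.0 bearer token."""
    url = "https://{0}.azuredatalakestore.net/webhdfs/v1/{1}?op=LISTSTATUS".format(
        account, path.lstrip("/"))
    headers = {"Authorization": "Bearer " + token}
    return url, headers

url, headers = liststatus_request("contosostore", "/clickstream", "<access-token>")
print(url)
# https://contosostore.azuredatalakestore.net/webhdfs/v1/clickstream?op=LISTSTATUS
```

An HTTP client (or the .NET, Java, or Python SDKs listed above, which wrap this same surface) would then issue the GET request with those headers.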