Silo busting 2.0—Multi-protocol access for Azure Data Lake Storage

Published on July 18, 2019

Senior Program Manager, Azure Storage

Cloud data lakes solve a foundational problem for big data analytics: providing secure, scalable storage for data that traditionally lives in separate data silos. Data lakes were designed from the start to break down data barriers and jump-start big data analytics efforts. However, one final “silo busting” frontier remained: enabling multiple data access methods for all the data (structured, semi-structured, and unstructured) that lives in the data lake.

Providing multiple data access points to shared data sets allows tools and data applications to interact with the data in their most natural way. Additionally, this allows your data lake to benefit from the tools and frameworks built for a wide variety of ecosystems. For example, you may ingest your data via an object storage API, process the data using the Hadoop Distributed File System (HDFS) API, and then use an object storage API to ingest the transformed data into a data warehouse.

Single storage solution for every scenario

We are very excited to announce the preview of multi-protocol access for Azure Data Lake Storage! Azure Data Lake Storage is a unique cloud storage solution for analytics that offers multi-protocol access to the same data. Multi-protocol access to the same data, via Azure Blob storage API and Azure Data Lake Storage API, allows you to leverage existing object storage capabilities on Data Lake Storage accounts, which are hierarchical namespace-enabled storage accounts built on top of Blob storage. This gives you the flexibility to put all your different types of data in your cloud data lake knowing that you can make the best use of your data as your use case evolves.

Single storage solution

Expanded feature set, ecosystem, and applications

Existing blob features such as access tiers and lifecycle management policies are now unlocked for your Data Lake Storage accounts. This is paradigm-shifting because your blob data can now be used for analytics. Additionally, services such as Azure Stream Analytics, IoT Hub, Azure Event Hubs capture, Azure Data Box, Azure Search, and many others integrate seamlessly with Data Lake Storage. Important scenarios like on-premises migration to the cloud can now easily move PB-sized datasets to Data Lake Storage using Data Box.

Multi-protocol access for Data Lake Storage also enables the partner ecosystem to use their existing Blob storage connector with Data Lake Storage. Here is what our ecosystem partners are saying:

“Multi-protocol access for Azure Data Lake Storage is a game changer for our customers. Informatica is committed to Azure Data Lake Storage native support, and Multi-protocol access will help customers accelerate their analytics and data lake modernization initiatives with a minimum of disruption.”

- Ronen Schwartz, Senior Vice President and General Manager of Data Integration, Big Data, and Cloud, Informatica

You will not need to update existing applications to gain access to your data stored in Data Lake Storage. Furthermore, you can leverage the power of both your analytics and object storage applications to use your data most effectively.

Graph displaying multi-protocol access that enables storage features, Azure ecosystem, partner ecosystem, and custom applications.

Multi-protocol access enables features and ecosystem

Multiple API endpoints—Same data, shared features

This capability is unprecedented for cloud analytics services because it supports not only multiple protocols but also multiple storage paradigms. We now bring this powerful capability to your storage in the cloud. Existing tools and applications that use the Blob storage API gain these benefits without any modification. Directory and file-level access control lists (ACLs) are consistently enforced regardless of whether the Azure Data Lake Storage API or the Blob storage API is used to access the data.

Both the Blob storage API and Azure Data Lake Storage API go through the Hierarchical Namespace, which is built on top of Blob storage.

Multi-protocol access on Azure Data Lake Storage
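
To make the "same data, two endpoints" idea concrete, here is a minimal sketch of how a single object can be addressed through either API. It assumes the public-cloud endpoint naming, where the Blob storage API is served from `<account>.blob.core.windows.net` and the Azure Data Lake Storage API from `<account>.dfs.core.windows.net`, with `abfss://` as the URI scheme used by Hadoop-compatible tools. The function names and the sample account, container, and path are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlparse


def blob_url_to_abfss(blob_url: str) -> str:
    """Translate a Blob storage URL into the abfss:// URI for the same object.

    Assumes a host of the form <account>.blob.<suffix>.
    """
    parsed = urlparse(blob_url)
    account, _, suffix = parsed.netloc.partition(".blob.")
    if not suffix:
        raise ValueError(f"not a Blob storage endpoint: {parsed.netloc!r}")
    # The first path segment is the container; the rest is the object path.
    container, _, path = parsed.path.lstrip("/").partition("/")
    if not container:
        raise ValueError("URL does not name a container")
    return f"abfss://{container}@{account}.dfs.{suffix}/{path}"


def abfss_to_blob_url(abfss_uri: str) -> str:
    """Translate an abfss:// URI back into the Blob storage URL."""
    parsed = urlparse(abfss_uri)
    if parsed.scheme != "abfss":
        raise ValueError(f"expected an abfss:// URI, got {parsed.scheme!r}")
    # The authority has the form <container>@<account>.dfs.<suffix>.
    container, _, host = parsed.netloc.partition("@")
    account, _, suffix = host.partition(".dfs.")
    if not (container and account and suffix):
        raise ValueError(f"unexpected authority: {parsed.netloc!r}")
    return f"https://{account}.blob.{suffix}/{container}{parsed.path}"


if __name__ == "__main__":
    # Hypothetical account ("myaccount"), container ("raw"), and path.
    url = "https://myaccount.blob.core.windows.net/raw/2019/07/events.json"
    print(blob_url_to_abfss(url))
```

An ingestion tool could keep writing to the Blob URL while an analytics job reads the translated `abfss://` URI. Both names refer to one stored object, so no copy step sits between them.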

Features and expanded ecosystem now available on Data Lake Storage

Multi-protocol access for Data Lake Storage brings together the best features of Data Lake Storage and Blob storage into one holistic package. It enables many Blob storage features and ecosystem integrations for your data lake.

Feature | More information
Access tiers | Cool and Archive tiers are now available for Data Lake Storage. To learn more, see the documentation “Azure Blob storage: hot, cool, and archive access tiers.”
Lifecycle management policies | You can now set policies to tier or delete data in Data Lake Storage. To learn more, see the documentation “Manage the Azure Blob storage lifecycle.”
Diagnostics logs | Logs for the Blob storage API and Azure Data Lake Storage API are now available in v1.0 and v2.0 formats. To learn more, see the documentation “Azure Storage analytics logging.”
SDKs | Existing blob SDKs can now be used with Data Lake Storage. To learn more, see the below documentation:
PowerShell | PowerShell for data plane operations is now available for Data Lake Storage. To learn more, see the Azure PowerShell quickstart.
CLI | Azure CLI for data plane operations is now available for Data Lake Storage. To learn more, see the Azure CLI quickstart.
Notifications via Azure Event Grid | You can now get Blob notifications through Event Grid. To learn more, see the documentation “Reacting to Blob storage events.” Azure Data Lake Storage Gen2 notifications are currently available.
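
As an illustration of the lifecycle management entry above, the following sketch builds a policy document that tiers data to Cool, then to Archive, and finally deletes it. The JSON shape follows the Blob storage lifecycle management policy format as we understand it, so please check it against the “Manage the Azure Blob storage lifecycle” documentation before relying on it. The helper name, rule name, prefix, and day thresholds are hypothetical, and applying the policy to an account (through the portal, CLI, or PowerShell) is not shown.

```python
import json


def build_lifecycle_policy(prefix: str, cool_after_days: int,
                           archive_after_days: int,
                           delete_after_days: int) -> dict:
    """Build a lifecycle policy with one rule for block blobs under `prefix`.

    `prefix` starts with the container name, for example "raw/logs".
    """
    # Data should move to colder tiers before it is deleted.
    if not 0 <= cool_after_days < archive_after_days < delete_after_days:
        raise ValueError("expected cool < archive < delete thresholds")

    def after(days: int) -> dict:
        return {"daysAfterModificationGreaterThan": days}

    return {
        "rules": [
            {
                "name": "tier-then-delete",  # hypothetical rule name
                "enabled": True,
                "type": "Lifecycle",
                "definition": {
                    "filters": {
                        "blobTypes": ["blockBlob"],
                        "prefixMatch": [prefix],
                    },
                    "actions": {
                        "baseBlob": {
                            "tierToCool": after(cool_after_days),
                            "tierToArchive": after(archive_after_days),
                            "delete": after(delete_after_days),
                        }
                    },
                },
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical prefix and thresholds, for illustration only.
    policy = build_lifecycle_policy("raw/logs", 30, 90, 365)
    print(json.dumps(policy, indent=2))
```

Generating the policy from a small function keeps the thresholds in one place and rejects orderings that make no sense, such as deleting data before it would have been archived.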

 

Ecosystem partner | More information
Azure Stream Analytics | Azure Stream Analytics now writes to, as well as reads from, Data Lake Storage.
Azure Event Hubs capture | The capture feature within Azure Event Hubs now lets you pick Data Lake Storage as one of its destinations.
IoT Hub | IoT Hub message routing now allows routing to Azure Data Lake Storage Gen2.
Azure Search | You can now index and apply machine learning models to your Data Lake Storage content using Azure Search.
Azure Data Box | You can now ingest huge amounts of data from on-premises to Data Lake Storage using Data Box.

Please stay tuned as we enable more Blob storage features using this amazing capability.

Next steps

All these new capabilities are available today in these documented regions. Sign up for the preview today. For more information, please see our documentation on multi-protocol access for Azure Data Lake Storage.