Filesystem SDKs for Azure Data Lake Storage Gen2 now generally available

Published on March 19, 2020

Senior Program Manager, Azure Storage

Since the general availability of Azure Data Lake Storage (ADLS) Gen2 in February 2019, customers have been getting insights for their big data analytics workloads at cloud scale. Integration with analytics engines is critical for these workloads, and equally important is the ability to programmatically ingest, manage, and analyze data. This ability is critical for key areas of enterprise data lakes such as data ingestion, event-driven big data platforms, machine learning (ML), and advanced analytics. Programmatic access is possible today using ADLS Gen2 REST APIs, Blob REST APIs, or capabilities via Multi-Protocol Access. As part of our developer ecosystem journey, our goal is to make customer application development for programmatic access easier than ever before.

Towards this goal, we're announcing the general availability of Python, .NET, Java, and JS filesystem SDKs for Azure Data Lake Storage (ADLS) Gen2 in all Azure regions. This includes support for CRUD operations for filesystem, directories, files, and permissions with filesystem semantics for ADLS Gen2. Customers can now use this familiar filesystem programming model to simplify application development for ADLS Gen2. These filesystem SDKs streamline our customers’ ability to ingest, manage, and analyze data for ADLS Gen2 and help them gain insights at cloud scale faster than ever before.
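To give a feel for the filesystem programming model, here is a minimal sketch in Python using the `azure-storage-file-datalake` package. The helper names (`connect`, `crud_walkthrough`), the shared-key credential, and the permission string are illustrative assumptions rather than part of this announcement; treat it as a starting point, not a reference implementation.

```python
def connect(account_name: str, account_key: str):
    """Build a service client; needs `pip install azure-storage-file-datalake`."""
    # Imported lazily so the walkthrough below can be read (and stubbed)
    # without the package installed.
    from azure.storage.filedatalake import DataLakeServiceClient

    return DataLakeServiceClient(
        account_url=f"https://{account_name}.dfs.core.windows.net",
        credential=account_key,  # assumption: shared-key auth, for brevity
    )


def crud_walkthrough(service_client, file_system, directory, file_name, payload):
    """Hypothetical helper: create, write, secure, list, read, and delete."""
    # Create: filesystem -> directory -> file.
    fs_client = service_client.create_file_system(file_system=file_system)
    dir_client = fs_client.create_directory(directory)
    file_client = dir_client.create_file(file_name)

    # Update: stage bytes at offset 0, then commit by flushing at the total length.
    file_client.append_data(payload, offset=0, length=len(payload))
    file_client.flush_data(len(payload))

    # Permissions: POSIX-style symbolic permissions on the directory.
    dir_client.set_access_control(permissions="rwxr-x---")

    # Read: list paths under the directory and download the file contents.
    names = [p.name for p in fs_client.get_paths(path=directory)]
    content = file_client.download_file().readall()

    # Delete: remove the directory, then the filesystem itself.
    dir_client.delete_directory()
    fs_client.delete_file_system()
    return names, content
```

A caller would typically run `crud_walkthrough(connect(name, key), "my-filesystem", "raw", "a.csv", b"...")` against a storage account with hierarchical namespace enabled. In production code, an Azure Active Directory credential is usually preferable to an account key.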

Preview feedback

Many of our customers have tried out the ADLS Gen2 SDK preview builds for their scenarios successfully. Here are some common themes based on preview feedback:

  • The SDK is working seamlessly with new filesystem semantics and has successfully moved key data domains to ADLS Gen2. The SDK expedited the transfer of 450 GB of data from ADLS Gen1 to ADLS Gen2 within a few hours. The permissions set up at the root-level directory are working well with hierarchical namespace enabled, and all the permissions are propagating perfectly to the child items through the folder hierarchy.
  • The SDK is critical to the way customers orchestrate their deployments.
  • The SDK has helped ingest large amounts of IoT data to be used by data scientists for their analytics workloads. This has been instrumental in providing self-service environments for the researchers with access to their own set of directories.
  • Data ingestion pipelines have used the SDK to integrate drone image data, satellite image data, ground sensor data, and weather data into ADLS Gen2. This helps build custom ML models which generate additional business insights for customers. Customers can use these ML models or aggregate raw data based on their needs and store processed results back into ADLS Gen2.
  • Customers appreciate that the SDK preview feedback has been addressed as part of the preview builds and are eagerly awaiting general availability.
  • Customers have successfully executed various tests including creating and appending files using the ADLS Gen2 SDK and testing reads using the Blob REST API. 
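The create-and-append pattern from the last item can be sketched as follows in Python. It assumes `file_client` is a `DataLakeFileClient` from the `azure-storage-file-datalake` package pointing at a file that already exists (for example, one returned by `create_file`). The function names and the 4 MiB chunk size are hypothetical choices for illustration.

```python
def append_in_chunks(file_client, stream, chunk_size=4 * 1024 * 1024):
    """Append a stream chunk by chunk, then commit once; returns bytes written."""
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Each append stages data at the current end of the file.
        file_client.append_data(chunk, offset=offset, length=len(chunk))
        offset += len(chunk)
    if offset:
        # Staged data is not committed to the file until it is flushed.
        file_client.flush_data(offset)
    return offset


def bulk_upload(file_client, data: bytes):
    """Single-call alternative: the SDK handles chunking and the final flush."""
    file_client.upload_data(data, overwrite=True)
    return len(data)
```

The key detail in the manual path is offset bookkeeping: every `append_data` call must start where the previous one ended, and `flush_data` takes the final file length. The single-call `upload_data` path hides that bookkeeping, which is usually the simpler choice when the whole payload is available up front.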

Based on your preview feedback, we have also introduced new APIs for bulk upload that simplify the experience for larger data writes/appends for ADLS Gen2. Detailed documentation is available in the links below:

PowerShell and CLI will continue to be available in preview globally in all Azure regions. We will announce general availability for PowerShell and CLI as soon as we have addressed preview feedback.

Next steps 

We welcome your feedback as we continue to enrich the ADLS Gen2 developer experience, and we thank everyone for their collaboration in achieving this high-value release. We look forward to these strong partnerships in future investments in our developer ecosystem journey as well.