On February 7, 2019, Azure Data Lake Gen 2 became generally available. For more information, please refer to the blog post, “Individually great, collectively unmatched: Announcing updates to 3 great Azure Data Services.”

The modern business landscape is ruled by data, with analytics and AI now essential for driving transformation. Customers have benefited tremendously from the performance, flexibility, and low cost offered by Azure for analytics and AI workloads. Today, we are introducing new capabilities in Azure that make it easier to deliver, build, and manage powerful analytics and AI solutions.

First, we are excited to announce the preview of Azure Data Lake Storage Gen2, the only cloud scale data lake designed specifically for mission critical analytics and AI workloads. Azure Data Lake Storage Gen2 combines the scalability and cost benefits of object storage with the reliability and performance offered by the Hadoop file system capabilities.

We are also pleased to announce the general availability of new capabilities in Azure Data Factory. Now, integrating data from multiple sources to validate, enrich, and transform data for insights is dramatically simplified.

This evolution of the Microsoft analytics portfolio makes it easier for customers to integrate disparate data sources, then store and process large amounts of data economically to accelerate their digital transformation.

Taking Azure Data Lake Storage to the next level

Analytics solutions such as Hadoop were designed on the assumption that they run on scale-out file systems. Other cloud providers shoehorn these solutions onto their platforms using a combination of client-side file system emulation and feature-deficient object stores, resulting in poor performance and inconsistent reliability and ultimately forcing compromise.

Azure Data Lake Storage Gen2 offers a no-compromise data lake. It unifies the core capabilities from the first generation of Azure Data Lake with a Hadoop compatible file system endpoint now directly integrated into Azure Blob Storage. This enhancement combines the scale and cost benefits of object storage with the reliability and performance typically associated only with on-premises file systems. This new file system includes a full hierarchical namespace that makes files and folders first class citizens, translating to faster, more reliable analytic job execution.
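
To illustrate why a hierarchical namespace can speed up analytic jobs, here is a toy sketch in Python. It is not the service's implementation: both stores are modeled as in-memory dictionaries, and the function names `rename_flat` and `rename_hierarchical` are hypothetical. The point is only that renaming a "folder" in a flat key space touches every object under the prefix, while renaming a directory in a tree re-points a single entry.

```python
from typing import Dict


def rename_flat(store: Dict[str, bytes], src: str, dst: str) -> int:
    """Rename a 'folder' in a flat key space (toy model of an object store).

    Every key under the source prefix must be rewritten, so the cost grows
    with the number of objects. Returns the number of keys touched.
    """
    src_prefix = src.rstrip("/") + "/"
    dst_prefix = dst.rstrip("/") + "/"
    keys = [k for k in store if k.startswith(src_prefix)]
    for key in keys:
        store[dst_prefix + key[len(src_prefix):]] = store.pop(key)
    return len(keys)


def rename_hierarchical(root: dict, src: str, dst: str) -> int:
    """Rename a directory in a tree (toy model of a hierarchical namespace).

    Only the parent's entry is re-pointed, regardless of how many files the
    directory contains. Returns the number of entries touched (always 1).
    """
    *src_parents, src_name = src.strip("/").split("/")
    *dst_parents, dst_name = dst.strip("/").split("/")
    parent = root
    for part in src_parents:
        parent = parent[part]
    subtree = parent.pop(src_name)
    target = root
    for part in dst_parents:
        target = target[part]
    target[dst_name] = subtree
    return 1
```

Analytics engines often write job output to a temporary directory and rename it on commit, so a single-step directory rename is one place where this difference would show up.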

Azure Data Lake Storage Gen2 also includes limitless storage, ensuring capacity to meet the needs of even the largest, most complex workloads. In addition, Azure Data Lake Storage Gen2 will deliver native integration with Azure Active Directory and support POSIX-compliant ACLs to enable granular permission assignments on files and folders.
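
As a rough illustration of what granular, POSIX-style permissions look like, the following Python sketch parses a short-form ACL string and applies the standard POSIX access check order (owner, named user, groups, other, with the mask limiting named users and groups). It is a simplified model, not the service's code: the function names and the principals `alice`, `bob`, `carol`, and `analysts` are hypothetical, and a real deployment would identify principals through Azure Active Directory rather than by plain names.

```python
from typing import Dict, Set, Tuple

Acl = Dict[Tuple[str, str], str]


def parse_acl(text: str) -> Acl:
    """Parse entries such as 'user::rwx,user:alice:rw-,mask::r--,other::---'."""
    acl: Acl = {}
    for entry in text.split(","):
        scope, qualifier, perms = entry.strip().split(":")
        acl[(scope, qualifier)] = perms
    return acl


def is_allowed(acl: Acl, user: str, groups: Set[str],
               owner: str, owning_group: str, wanted: str) -> bool:
    """Return True if `user` holds every permission in `wanted` (e.g. 'rw')."""
    # Without an explicit mask entry, nothing is filtered out.
    mask = acl.get(("mask", ""), "rwx")

    def grants(perms: str, use_mask: bool) -> bool:
        return all(w in perms and (not use_mask or w in mask) for w in wanted)

    if user == owner:                      # owner entry, never masked
        return grants(acl[("user", "")], False)
    if ("user", user) in acl:              # named user entry, masked
        return grants(acl[("user", user)], True)
    matching = [acl[("group", g)] for g in groups if ("group", g) in acl]
    if owning_group in groups:             # owning group entry, masked
        matching.append(acl[("group", "")])
    if matching:
        return any(grants(p, True) for p in matching)
    return grants(acl[("other", "")], False)
```

For example, with the ACL `user::rwx,user:alice:rw-,group::r-x,mask::r--,other::---`, the named user `alice` can read but not write, because the mask removes the write bit from her entry.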

As Azure Data Lake Storage Gen2 is fully integrated with Blob storage, customers can access data through the new file system-oriented APIs or the object store APIs from Blob Storage. Customers also have all the benefits of Azure Blob Storage including encryption at rest, object level tiering, and lifecycle policies as well as HA/DR capabilities such as ZRS and GRS. All of this will come at a lower cost and lower overall TCO for customers’ analytics projects! Azure Data Lake Storage Gen2 is the most comprehensive data lake available anywhere. At general availability, Azure Data Lake Storage Gen2 will be available in all Azure regions.
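
The two access paths can be pictured as two addresses for the same data. The Python sketch below builds both forms for one file. The endpoint patterns (`blob.core.windows.net` for the object store APIs, and an `abfss://` URI against `dfs.core.windows.net` for the Hadoop-compatible file system driver) are not stated in this post and are included as an assumption based on the service's public naming conventions; the account, container, and path values are hypothetical.

```python
def blob_url(account: str, container: str, path: str) -> str:
    """Address of an object through the Blob Storage (object store) endpoint."""
    return f"https://{account}.blob.core.windows.net/{container}/{path.lstrip('/')}"


def abfs_uri(account: str, filesystem: str, path: str) -> str:
    """Address of the same data through the file system-oriented endpoint,
    in the form used by the Hadoop ABFS driver (assumed naming convention)."""
    return f"abfss://{filesystem}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


# Hypothetical account and container names, for illustration only.
print(blob_url("contosolake", "raw", "/sales/2018/06/data.parquet"))
print(abfs_uri("contosolake", "raw", "/sales/2018/06/data.parquet"))
```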

To enable a seamless experience with leading Open Source providers of Hadoop and Spark analytics engines, we are working closely with our partners to make Azure Data Lake Storage Gen2 the most optimized data lake solution for customers.

“As a key partner, Cloudera has been working very closely with Microsoft since our integration of CDH with the first generation of Azure Data Lake. We are confident that Azure Data Lake Storage Gen2 will provide a superior experience for our CDH customers, specifically from a performance and stability perspective. We are very excited to announce our commitment in providing comprehensive platform support for Azure Data Lake Storage Gen2.”

– Vikram Makhija, General Manager for Cloud, Cloudera

Data integration simplified with Azure Data Factory

With the proliferation of big data, organizations no longer wish to be weighed down by the complexity of integrating their data to drive the analytical insights their business requires. Now generally available, the new capabilities in Azure Data Factory, Azure’s cloud-based data ingestion and integration service, make it easier than ever before to turn raw data into actionable insights.

With a drag-and-drop graphical user interface, data engineers and developers can quickly and easily create, schedule, and manage data integration at scale. Azure Data Factory now supports code-free data ingestion from over 70 data source connectors to accelerate data movement across on-premises, cloud, and applications. We have also made a native Azure Data Factory connector for Azure Data Lake Storage Gen2 available in preview, so customers can take advantage of Azure Data Lake Storage Gen2 and easily migrate data from other data sources, including the first generation of Azure Data Lake.

Data engineers and developers can also easily lift SQL Server Integration Services (SSIS) packages to Azure and let Azure Data Factory manage their resources for them, achieving high scalability and availability while reducing operational costs. London-based data analytics consultancy Concentra Analytics has seen an 80 percent reduction in automated data warehouse development time by moving their SSIS packages to Azure.

“We have no problem working with customers that have data distributed between on-premises and cloud sources, even those with large datasets. With Azure Data Factory, our customers use the DataPlus auto-generated SSIS packages published in Azure to achieve scalability.”

– Weelin Lim, Director of Business Intelligence, Concentra Analytics

Azure is the best place for analytics

We are committed to making Azure the best place for organizations to unlock the insights hidden in their data to accelerate innovation. Customers can benefit from tight integration with other Azure services to build powerful, end-to-end, cloud-scale analytics solutions that support modern data warehousing, advanced analytics, and real-time analytics more easily and more economically.

Big Data and advanced analytics

To find out more about Azure Data Lake Storage you can:

To find out more about Azure Data Factory you can:
