SQL Data Warehouse now supports seamless integration with Azure Data Lake Store

Inlägg på 6 februari, 2017

Program Manager II, SQL Server and Azure SQL Database

Azure SQL Data Warehouse is a SQL-based fully managed, petabyte-scale cloud solution for data warehousing. SQL Data Warehouse is highly elastic, enabling you to provision in minutes and scale capacity in seconds. You can scale compute and storage independently, allowing you to burst compute for complex analytical workloads or scale down your warehouse for archival scenarios, and pay based off what you're using instead of being locked into predefined cluster configurations.

We are pleased to announce that you can now directly import or export your data from Azure Data Lake Store (ADLS) into Azure SQL Data Warehouse (SQL DW) using External Tables.

ADLS is a purpose-built, no-limits store and is optimized for massively parallel processing. With SQL DW PolyBase support for ADLS, you can now load data directly into your SQL DW instance. Because SQL DW can now ingest data directly from Windows Azure Storage Blob and ADLS, you can now load data from any storage service in Azure. This provides you with the flexibility to choose what is right for your application. 

A common use case for ADLS and SQL DW is the following. Raw data is ingested into ADLS from a variety of sources. Then ADL Analytics is used to clean and process the data into a loading ready format. From there, the high value data can be imported into Azure SQL DW via PolyBase.

PolyBase Pipeline Support

ADLS has a variety of built-in security features that PolyBase uses to ensure your data remains secure, such as always-on encryption, ACL-based authorization, and Azure Active Directory (AAD) integration. To load data from ADLS via PolyBase, you need to create an AAD application. Read and write privileges are managed for the AAD application on either a per directory, subdirectory, or file basis. This allows you to provide fine-grained access control of what data can be loaded into SQL DW from ADLS resulting in an easy to manage security model.

You can import data stored in ORC, RC, Parquet, or Delimited Text file formats directly into SQL DW using the Create Table As Select (CTAS) statement over an external table.

How to Set Up the Connection to Azure Data Lake Store

When you connect to your SQL DW from your favorite client (SSMS or SSDT), you can use the script below to get started. You will need to know your AAD Application’s client ID, OAuth2.0TokenEndpoint, and Key to create a Database Scoped Credential in SQL DW. This key is encrypted with your Database Master Key and is stored within the SQL DW. This is the credential used to authenticate against ADLS.

Image

It’s just that simple to load data into Azure SQL Data Warehouse from ADLS.

Best Practices for loading data into SQL DW from Azure Data Lake Store

For the best experience, please look at the following guidelines:

  • Co-locate the services in the same data center for better performance and no data egress charges.
  • Split large compressed files into at least 60 smaller compressed files.
  • Use a large resource class in SQL DW to load the data.
  • Ensure that your AAD Application has read access from your chosen ADLS Directory.
  • Scale up your DW SLO when importing a large data set.
  • Use a medium resource class for loading data into SQL DW.

Learn more about best practices for loading data into SQL DW from Azure Data Lake Store.

Next steps

If you already have an Azure Data Lake Store, you can try loading your data into SQL Data Warehouse.

Additionally, there are great tutorials specific to ADLS to get you up and running.

Learn more

What is Azure SQL Data Warehouse?

What is Azure Data Lake Store?

SQL Data Warehouse best practices

Load Data into SQL Data Warehouse

MSDN forum

Stack Overflow forum