Azure SQL Data Warehouse is a SQL-based fully managed, petabyte-scale cloud solution for data warehousing. SQL Data Warehouse is highly elastic, enabling you to provision in minutes and scale capacity in seconds. You can scale compute and storage independently, allowing you to burst compute for complex analytical workloads or scale down your warehouse for archival scenarios, and pay based off what you're using instead of being locked into predefined cluster configurations.
Since announcing general availability in July 2016, we have continued to work on helping customers get data faster into their Data Warehouse to generate insights faster and grow their businesses further. Azure SQL Data Warehouse solves the data loading scenario via PolyBase, which is a feature built into the SQL Engine. It effectively leverages the entire Massively Parallel Processing (MPP) architecture of Azure SQL Data Warehouse to provide the fastest loading mechanism from Azure Blob Storage into the Data Warehouse. We recently shared how you can use Azure Data Factory Copy Wizard to load 1TB data in under 15 mins into Azure SQL Data Warehouse, at over 1.2 GB per second throughput.
To understand just how this works, let’s take a high-level look at the SQL Data Warehouse architecture. A SQL Data Warehouse is composed of a Control Node, which is where users connect and submit queries, and compute nodes, where processing occurs. Traditional loading tools load individual rows through the control node. The rows are then routed to the appropriate compute node depending on how the data is to be distributed. This can cause slower performance because the control node must read each record as they are received. PolyBase uses the compute nodes to load the data in parallel allowing for faster performance, resulting in quicker insights from your data.
UTF-16 support for delimited text files
To make it easier to load data into Azure SQL Data Warehouse using PolyBase, we have expanded our delimited text file format to support UTF-16 encoded files.
Support for UTF-16 encoded files is important because this is the default file encoding for BCP.exe. We’ve often seen that customers export their data from their on-premises data Warehouse to Azure Blob Storage in UTF-16 format. In the past, it was necessary to then have a script to reencode the data into UTF-8 format, resulting in time consuming processing and a duplication of data. Now with UTF-16 supported, files can go directly from Azure Blob storage into SQL Data Warehouse without encoding conversion.
How to import a UTF 16 text file format
To import UTF-16 files into SQL DW with PolyBase, all you have to do is create a new file format with the encoding option set to ‘UTF16’. All of the additional format options like, field terminator, date format, and rejection values are supported with both UTF-16 and UTF-8 encoding.
Below is an example of a pipe delimited text file format that would read UTF16 files.
In this blog post we discussed a bit about PolyBase and why it is the optimal data loading tool for SQL Data Warehouse and our expanded support for UTF-16 encoded file formats. This is now available in all SQL Data Warehouse Azure regions worldwide. We encourage you to try it out if you are interested in moving your on-prem Data Warehouse into the cloud.
If you have any feature requests for Azure SQL Data Warehouse, I like to suggest connecting with the team via User Voice.