Azure SQL Data Warehouse (SQL DW) is a SQL-based, fully managed, petabyte-scale cloud solution for data warehousing. SQL DW is highly elastic: you can provision in minutes and scale capacity in seconds. You can scale compute and storage independently, allowing you to burst compute for complex analytical workloads or scale down your warehouse for archival scenarios, and pay only for what you use instead of being locked into predefined cluster configurations.
In the latest release of PolyBase in SQL DW, we have increased the row width limit from 32KB to 1MB. This allows you to ingest rows with wide columns directly from Windows Azure Storage Blob or Azure Data Lake Store into SQL DW.
When loading data into SQL DW via PolyBase, there are a couple of key points to consider regarding the data size of strings.
- For character types (char, varchar, nchar, nvarchar), the 1MB data size is based on memory consumption of data in UTF-16 format. This means that each character is represented by 2 bytes.
- When importing variable-length columns ((n)varchar, varbinary), the loading tool pads the buffer to the full width declared in the external table definition, regardless of the actual data size. This means that a varchar(8000) column reserves 8000 bytes for every row, even when the values are much shorter.
To help improve performance, define your external table with the minimum padding needed on schema data types, which maximizes the amount of data transferred per internal buffer.
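As a rough sketch of this advice, the external table below sizes each column to the longest expected value rather than defaulting to varchar(8000). The object names (ext_sales, WasbData, TextFileFormat) and the column widths are illustrative assumptions, not part of the original post:

```sql
-- Hypothetical example: right-size declared widths so PolyBase's internal
-- buffer is not padded out to the maximum possible column size.
CREATE EXTERNAL TABLE ext_sales
(
    sale_id     INT,
    customer_id INT,
    region      VARCHAR(20),    -- sized to the longest expected value
    notes       NVARCHAR(500)   -- counted as UTF-16: 2 bytes per character
)
WITH
(
    LOCATION    = '/sales/2017/',
    DATA_SOURCE = WasbData,       -- external data source, assumed already created
    FILE_FORMAT = TextFileFormat, -- external file format, assumed already created
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 0
);
```

With region declared as VARCHAR(20) instead of VARCHAR(8000), each buffered row reserves 20 bytes for that column rather than 8000, so far more rows fit into each internal buffer during the load.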
Additionally, it is a best practice to use a medium or large resource class, and to scale up to a larger DWU instance, to take advantage of the additional memory needed for importing data, especially into clustered columnstore index (CCI) tables. More information can be found in our documentation for Memory allocation by DWU and Resource Class.
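One way to apply a larger resource class is to add the loading user to the corresponding database role. The snippet below is a sketch under the assumption that a dedicated loading login exists; the names loaduser, ext_stage, and dbo.Sales are hypothetical:

```sql
-- Grant the loading user the large resource class so its load queries
-- receive a larger memory grant (helpful for CCI compression).
EXEC sp_addrolemember 'largerc', 'loaduser';

-- Then run the load as that user, for example with CTAS into a
-- clustered columnstore table from an external table assumed defined:
CREATE TABLE dbo.Sales
WITH (DISTRIBUTION = HASH(sale_id), CLUSTERED COLUMNSTORE INDEX)
AS SELECT * FROM ext_stage;
```

The resource class applies per user, so only sessions running under loaduser get the larger memory grant; other workloads keep their default allocation.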
Give loading with External Tables into SQL DW a try with our loading tutorial.