We’re excited to announce another round of improvements to SQL Data Warehouse for the month of November. Please note that, in addition to these improvements, we’ve also added Data Auditing and Transparent Data Encryption since our last update.
PolyBase
We’ve added functionality to PolyBase, our mechanism for working with non-relational data:
- PolyBase can now read from, write to, and query Parquet files in Azure Storage Blobs (a short end-to-end sketch follows this list).
- PolyBase now provides more information about rejected rows. For example, in the case of a load that succeeds but has rejected rows, you may see a message like this:
Query completed. Rows were rejected while reading from external source(s). 10 rows rejected from FactSalesExternal in step 1 of query execution: Location: '/example/example.txt' Column ordinal: 0, Expected data type: INT, Offending Value: 2147483648 (Column Conversion Error), Error: Arithmetic overflow error converting NVARCHAR to data type SMALLINT. and 9 more.
When a load is aborted because of too many rejected rows, you will see a message like this:
Rows were rejected while reading from external source(s). 500 rows rejected from FactSalesExternal in step 1 of query execution: Location: '/example/example.txt' Column ordinal: 0, Expected data type: INT, Offending Value: 2147483648 (Column Conversion Error), Error: Arithmetic overflow error converting NVARCHAR to data type SMALLINT. and 499 more. ERROR: Query aborted -- number of rejected rows exceeded the maximum row rejection threshold while reading from an external source: 500 rows rejected out of total 10000 rows processed.
- We have fixed an issue where dropping and recreating a CREDENTIAL object associated with existing external tables would not always refresh the cached credential for those tables.
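To tie these pieces together, here is a minimal sketch of a Parquet load through PolyBase. The storage account, container, secret, credential, table, and column names are all illustrative placeholders, and the reject threshold of 500 simply mirrors the aborted-load message above.

```sql
-- A database master key is required once per database before creating a credential.
CREATE MASTER KEY;

-- Credential PolyBase uses to reach the storage account (placeholder secret).
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH IDENTITY = 'user',
     SECRET = '<storage-account-access-key>';

-- External data source pointing at the blob container (placeholder names).
CREATE EXTERNAL DATA SOURCE AzureBlobStore
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://datacontainer@mystorageaccount.blob.core.windows.net',
    CREDENTIAL = AzureStorageCredential
);

-- File format declaring that the files are Parquet.
CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (FORMAT_TYPE = PARQUET);

-- External table over the Parquet files, with reject settings that determine
-- when a load is aborted because of bad rows.
CREATE EXTERNAL TABLE dbo.FactSalesExternal
(
    SaleKey   INT,
    StoreKey  INT,
    Quantity  SMALLINT,
    Amount    DECIMAL(18, 2)
)
WITH (
    LOCATION = '/example/',
    DATA_SOURCE = AzureBlobStore,
    FILE_FORMAT = ParquetFormat,
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 500
);

-- Load the external data into an internal table with CTAS.
CREATE TABLE dbo.FactSales
WITH (DISTRIBUTION = HASH(SaleKey))
AS SELECT * FROM dbo.FactSalesExternal;
```

The REJECT_TYPE and REJECT_VALUE settings on the external table are what determine when a load with bad rows is allowed to finish and when it is aborted, as in the two messages above.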
SQL features
We’ve also expanded the SQL Server features that are supported as part of SQL Data Warehouse, including:
- Clustered Columnstore will now be applied by default to tables created in SQL Data Warehouse. This allows tables to take advantage of the additional compression and query performance that clustered columnstore provides. You can read more about this work in our Clustered Columnstore blog post; a short example follows this list.
- Support for the RENAME OBJECT statement.
- Support for a new set of dynamic management views (DMVs) that provide additional visibility into your SQL Data Warehouse. For more specifics, see the list of DMVs; all DMVs containing “pdw_nodes” are new. For example, if you wanted to see all of the performance data from your compute nodes during a query, you would look at sys.dm_pdw_nodes_os_performance_counters. Alternatively, you could see all of the steps in your query plan by using the sys.dm_pdw_request_steps view. Sample queries against both views also follow this list.
- Support for Threat Detection within Data Auditing, enabling you to detect anomalous activities and suspicious events. For example, you can set up alerts within the portal to identify events that seem suspicious and ensure that you are alerted by e-mail. After receiving word of a suspicious event, you can analyze your events using the Data Auditing feature in the portal, or open your logs in Excel for more advanced analysis options.
- We addressed an issue with BCP where permissions were not always honored during loads.
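As a quick illustration of the first two items above, here is a sketch with made-up table and column names. With this update, a plain CREATE TABLE gets a clustered columnstore index without any index clause, the earlier heap behavior is still available explicitly, and RENAME OBJECT can be used, for example, to retire an old copy of a table after a load.

```sql
-- No index clause needed: this table now defaults to CLUSTERED COLUMNSTORE INDEX.
CREATE TABLE dbo.FactSales
(
    SaleKey   INT NOT NULL,
    StoreKey  INT NOT NULL,
    Amount    DECIMAL(18, 2)
)
WITH (DISTRIBUTION = HASH(SaleKey));

-- The previous behavior is still available by asking for a heap explicitly.
CREATE TABLE dbo.FactSales_Staging
(
    SaleKey   INT NOT NULL,
    StoreKey  INT NOT NULL,
    Amount    DECIMAL(18, 2)
)
WITH (HEAP, DISTRIBUTION = ROUND_ROBIN);

-- RENAME OBJECT, for example to move a table aside once it has been replaced.
RENAME OBJECT dbo.FactSales_Staging TO FactSales_History;
```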
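And here are sample queries against the two DMVs mentioned above; the selected columns are just a starting point.

```sql
-- Performance counters collected on each compute node.
SELECT pdw_node_id, counter_name, cntr_value
FROM sys.dm_pdw_nodes_os_performance_counters
ORDER BY pdw_node_id, counter_name;

-- Steps of the distributed plan for each request.
SELECT request_id, step_index, operation_type, status, row_count
FROM sys.dm_pdw_request_steps
ORDER BY request_id, step_index;
```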
Back-ups
We’ve improved the design of our automatic, scheduled back-ups in SQL Data Warehouse to increase reliability and lower the impact on user workloads. We now leverage the same Azure Storage snapshot mechanism that was newly added in SQL Server 2016.
Performance and stability
Finally, we’ve improved the performance and stability of SQL Data Warehouse. Specifically, we’ve:
- Improved our mechanism for loading into Round Robin tables to ensure a more even distribution of data, even for small or incremental loads; a quick way to check the result follows this list.
- Addressed a common intermittent connection issue that caused some long-running queries to fail with the message “Encountered exception when enlisting connection. Transaction will be rolled back”.
- Improved stability of pause, resume, and scaling operations.
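If you want to sanity-check how evenly rows landed across distributions after a load into a Round Robin table, one quick check is DBCC PDW_SHOWSPACEUSED; the table name below is a placeholder.

```sql
-- Shows rows and space used per distribution for the named table.
DBCC PDW_SHOWSPACEUSED("dbo.FactSales");
```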