Azure SQL Data Warehouse (SQL DW) is a SQL-based petabyte-scale, massively parallel, cloud solution for data warehousing. It is fully managed and highly elastic, enabling you to provision and scale capacity in minutes. You can scale compute and storage independently, allowing you to range from burst to archival scenarios, and pay based off what you're using instead of being locked into a cluster configuration.
The engine underneath Azure SQL Data Warehouse that runs the queries on each individual node is the industry leading SQL Server Database from Microsoft. With general availability in 2016, Azure SQL DW received an upgrade to SQL Server 2016 that transparently provided 40% performance increase to user workloads comprising of analytic queries.
The two performance pillars of SQL DW are its column store and the batch mode execution engine, also known as vectorized query execution. In this blog, we highlight the improvements in SQL Server 16 that took SQL Data Warehouse performance to a new level. These are all in addition to existing features such as columnar compression and segment elimination. We already had batch mode execution that can process multiple rows at a time, instead of one value at a time, and take advantage of SIMD hardware innovations. SQL Server 16 further extended batch mode execution to more operators and scenarios.
The following are the key SQL Server 16 performance innovations for columnstore and batch mode. Each links to a detailed blog providing examples and observed performance gain.
Aggregate Pushdown
Aggregates are very common in analytic queries. With columnstore tables, SQL Server processes aggregates in batch mode delivering an order of magnitude better performance. SQL Server 16 further dials up aggregate computation performance by pushing the aggregation to the SCAN node. This allows the aggregate to be computed on the compressed data during the scan itself.
String Predicate Pushdown
Columnstore in SQL Server 16 allows string predicates to be pushed down the SCAN node, resulting in a significant improvement in query performance. String predicate pushdown leverages dictionaries to minimize the number of string comparisons.
Multiple Aggregates
SQL Server 16 now processes multiple aggregates on a table scan more efficiently in a single batch mode aggregation operator. Previously multiple aggregation paths and operators would be instantiated resulting in slower performance.
Batch Mode Window Aggregates
SQL Server 16 introduces batch mode execution for window aggregates. Batch mode has the potential to speed up certain queries by even 300 times as measured in some of our internal tests.
Batch Mode in Serial Execution
High concurrent activity and/or low number of cores can force queries to run in serial. Previously serial queries would get forced to run in row mode, resulting in a double beating from lack of parallelism and lack of batch mode. SQL Server 16 can run batch mode even when degree of parallelism (DOP) for a query is 1 (DOP 1 means the query is run serial). SQL Data Warehouse at lower SLOs (less than DWU1000) runs each distribution query in serial as there is less than one core per distribution. With this improvement, these queries now run in batch mode.
The above is quite an extensive list of performance boosts that SQL Data Warehouse now benefits from. Best of all, no change is required to SQL Data Warehouse user queries to get the above performance benefits – it is all automatic under the hood!
Next steps
In this blog we described how SQL Server 2016 innovations in columnstore and batch mode technologies give a huge performance boost to Azure SQL Data Warehouse queries. We encourage you to try it out by moving your on-premise data warehouse into the cloud.
Learn more
Check out the many resources to learn more about SQL Data Warehouse.
What is Azure SQL Data Warehouse?