Clustered Columnstore Tables are default in Azure SQL Data Warehouse
Friday, December 4, 2015
Clustered columnstore indexes are the most efficient way you can store your data in Azure SQL Data Warehouse. Storing your data in tables that have a clustered columnstore index are the fastest way to query your data. It will give you the greatest data compression and lower your storage costs. To ensure your data is using this latest technology, clustered columnstore index is now the table default in Azure SQL Data Warehouse. With this change come some changes to CREATE TABLE and CREATE TABLE AS SELECT syntax. For example, the following code will now create a clustered columnstore table:
CREATE TABLE myClusteredColumnstoreTable ( id int NOT NULL, lastName varchar(20), zipCode varchar(6) ) WITH ( DISTRIBUTION = HASH (id) );If you want to create your table with the previous default, HEAP, you simply need to add this option to your WITH clause. For example, the following code will create a HEAP table:
CREATE TABLE myHeapTable ( id int NOT NULL, lastName varchar(20), zipCode varchar(6) ) WITH ( DISTRIBUTION = HASH (id), HEAP );If you want to see what types of tables you have in your database, you can run the following query:
SELECT a.name as schema_name, b.name as table_name, c.type_desc as table_type, c.name as index_name FROM sys.schemas a INNER JOIN sys.tables b ON a.schema_id = b.schema_id INNER JOIN sys.indexes c ON b.object_id = c.object_id ORDER BY 1, 2;To convert a table from HEAP to a clustered columnstore, you can use CREATE CLUSTERED COLUMNSTORE INDEX, for example:
CREATE CLUSTERED COLUMNSTORE INDEX cciIndex ON myHeapTable;To convert a table from clustered columnstore to HEAP, you can use DROP INDEX, for example:
DROP INDEX cciIndex ON myHeapTable;There are two subtle impacts of this change that may impact you:
- If you create staging tables for loads, where the main purpose is to load data into your database quickly and then move that data to a permanent table, you may find that HEAP tables perform better.
- If you have automation that currently leverages a secondary index on a HEAP table, you will want to add the HEAP keyword to your DDL because secondary indexes are not yet supported on clustered columnstore tables.