• <1 minute

Clustered Columnstore Tables are the New Default in Azure SQL Data Warehouse

Clustered columnstore indexes are one of the most efficient ways you can store your data in Azure SQL Data Warehouse.

Clustered columnstore indexes are the most efficient way you can store your data in Azure SQL Data Warehouse. Storing your data in tables which have a clustered columnstore index are the fastest way to query your data and will give you the greatest data compression, lowering your storage cost. To ensure your data is using this latest technology, clustered columnstore index is now the new table default in Azure SQL Data Warehouse.

With this change comes some changes to CREATE TABLE and CREATE TABLE AS SELECT syntax. The below example will now create a clustered columnstore table.

CREATE TABLE myClusteredColumnstoreTable 
  (
    id int NOT NULL,
    lastName varchar(20),
    zipCode varchar(6)
  )
WITH
  ( 
    DISTRIBUTION = HASH (id)
  );

If you wish to create your table as the previous default, HEAP, you simply need to add this option to your WITH clause. For example, the below DDL will create a HEAP table.

CREATE TABLE myHeapTable 
  (
    id int NOT NULL,
    lastName varchar(20),
    zipCode varchar(6)
  )
WITH
  ( 
    DISTRIBUTION = HASH (id), 
    HEAP
  );

If you wish to see what types of tables you have in your database, you can run the following query

SELECT
    a.name as schema_name, 
    b.name as table_name,
    c.type_desc as table_type,
    c.name as index_name
FROM
    sys.schemas a
    INNER JOIN sys.tables b
        ON a.schema_id = b.schema_id
    INNER JOIN sys.indexes c
        ON b.object_id = c.object_id
ORDER BY
    1, 2;

To convert a table from heap to clustered columnstore, you can use CREATE CLUSTERED COLUMNSTORE INDEX.

CREATE CLUSTERED COLUMNSTORE INDEX cciIndex ON myHeapTable;

To convert a table from clustered columnstore to heap, you can use DROP INDEX.

DROP INDEX cciIndex ON myHeapTable;

There are two subtle impacts of this change which may impact you.

  1. If you create staging tables for loads, where the main purpose is to load data into your database quickly and then move that data to a permanent table, you may find that heap tables perform better in that scenario.
  2. If you have automation which currently leverages a secondary index on a heap table, you will want to add the HEAP keyword to your DDL as secondary indexes are not yet supported on clustered columnstore tables.