4 min read
Columnstore index is the preferred technology to run analytics queries in Azure SQL Databases. We recently announced general availability if In-Memory technologies for all Premium databases. Similar to In-Memory OLTP, the columnstore index technology is available in premium databases.
The columnstore technology is available in two flavors; clustered columnstore index (CCI) for DataMart analytics workloads and nonclustered columnstore index (NCCI) to run analytics queries on operational (i.e. OLTP) workload. Please refer to NCCI vs CCI for the differences between these two flavors of columnstore indexes. The columnstore index can speed up the performance of analytics queries up to 100x while significantly reducing the storage footprint. The data compression achieved depends on the schema and the data, but we see around 10x data compression on average when compared to rowstore with no compression. This blog will focus on Analytic workloads using CCI but cover NCCI in a future blog.
Clustered Columnstore index is available in Azure SQL Databases across all premium editions. However, it is not yet available on the Standard and Basic pricing tiers. Using this technology in Azure SQL Databases, you can lower the storage cost and getting a similar or better query performance on lower premium tiers.
The tables below show a typical analytics query with multi-table join running on P1 and P15 both with/without clustered columnstore index and storage savings achieved
Query Performance: Key point to note below is that with clustered columnstore index, the example query runs 5x faster on P1 compared to the same query running on P15 with rowstore with no tuning. This can significantly lower the cost you need to pay to meet your workload requirements.
|Pricing Tier||With Rowstore||With Columnstore||Performance Gains|
|P1||30.6 secs||4.2 secs||14x|
|P15||19.5 secs||0.319 secs||60x|
Storage Size: The storage savings with columnstore compared to PAGE or NONE compressed tables shown below. While the cost of storage is already included with AzureDB, but lower storage can enable you to choose a lower tier. Note, this is generated test data so the compression is lower than what one would get for customer workloads.
|Number of Rows||Size Rowstore (MB)||Size columnstore (MB)||Savings|
|3626191||212 (PAGE compression)||120||1.8x|
|3626191||756 (NONE compression)||120MB||6.2x|
The best part of columnstore index technology is that it does not require any changes to your application. All you need to do is to either create or replace an existing index with columnstore index on your table(s).
How does Columnstore Index work?
As described earlier, the columnstore is just an index that stores data in a table as columns as shown below. The queries can continue to access the table requiring no changes.
Columnstore index delivers significant data compression and query performance due to the following three key factors
- Reduced IO and Storage: Since data is stored as individual columns, it compresses really well as all values are drawn from the same domain (i.e. data type) and in many cases, the values repeat or are similar. The compression will depend on the data distribution but typical compression that we have seen is around 10x. This is significant because it enables you reduce the storage as the IO footprint of your database significantly.
- Only Referenced columns need to be fetched: Most analytics queries fetch/process only a small set of columns. If you consider a typical Star Schema, the FACT table is the one with most rows and it has large number of columns. With columnstore storage, SQL Server needs to fetch only the referenced columns unlike rowstore where the full row needs to be fetched regardless of number of columns referenced in the query. For example, consider a FACT table with 100 columns and an analytic query accessing this tables references only 5 columns. Now, by fetching only the referenced columns, you can potentially reduce IO by 95% with simplifying assumption that all columns take same storage. Note, this is on top of already 10x data compression provided by columnstore.
- Efficient Data Processing: SQL Server has an industry leading query engine for columnstore data to deliver up to 100x speed up in query performance. For details, please refer to Speeding up Analytics Queries.
How do I create clustered columnstore index?
Creating a clustered columnstore index is like creating any other index. For example, I can create a regular rowstore table as follows
CREATE TABLE ACCOUNT ( ACCOUNTKEY INT NOT NULL, ACCOUNTDESCRIPTION NVARCHAR (50), ACCOUNTTYPE NVARCHAR (50), ACCOUNTCODEALTERNATEKEY INT)
Any rows inserted into the table above are stored in rowstore format. Now, if you want to convert this table to store data in ‘columnstore’, all you need to do is to execute the following SQL statement
CREATE CLUSTERED COLUMNSTORE index ACCOUNT_CI on ACCOUNT
If the rowstore table had a clustered BTREE index, then you can execute the following SQL Statement
CREATE CLUSTERED COLUMNSTORE index ACCOUNT_CI on ACCOUNT WITH (DROP_EXISTING = ON)
When and where should you use clustered columnstore Index?
Clustered Columnstore index primarily targets analytics workloads. The table below shows the common scenarios that have been successfully deployed with this technology.
|CCI (clustered columnstore index)||
10x on average
|CCI/NCI (with one or more nonclustered indexes)||
10X on average + additional storage for NCIs
Resources to get started
For more details, please refer to the following