We are pleased to announce the general availability of Clustered and NonClustered Columnstore indexes for Standard databases in the S3 and above pricing tiers, 100 eDTU and larger Elastic Pools, and all vCore based databases (both General Purpose and Business Critical). These features will enable a number of new and valuable scenarios:
- Functional dev/test for columnstore based applications, without needing to pay for Premium tier databases for testing purposes. (Of course performance testing should always be done at the target performance configuration.)
- Deploying applications with columnstore-based storage which do not have the mission critical performance and availability requirements found in Premium tier database requirements.
- Application vendors can now develop an application which leverages columnstore functionality and deploy it on both Standard and Premium performance tiers.
Columnstore advantages
Columnstore indexes are designed to be extremely efficient for queries which do scans and aggregations across millions and billions of rows of data. They are fundamentally different structures, which physically group data by column, rather than by row. In OLTP-style workloads, queries typically access one, or a few rows at a time, making traditional index structures the most efficient access path. For analytic queries, organizing data by column means that we only need to read the data for those columns involved in a query, and other columns never need be brought into memory at all. Additionally, groups of about 1 million rows (“rowgroups”) are compressed, and metadata about the range of values in each column for the rowgroup is maintained. Thus, we can skip entire rowgroups if we know that there are no rows which contain data that satisfies the query. For more information, please see the columnstore indexes documentation page.
Finally, because the data in each container is all from the same column, it tends to be very similar in value and of course is identical in datatype. This makes it extremely compressible. Typically we are able to compress 90 percent of the space out of a dataset in a columnstore.
All of this together means that we can store more data in less space, and move far less data from disk to memory in order to satisfy a query. Combining this with the ability to process batches of data when performing aggregations gives us very impressive performance improvements indeed. Our customers have seen gains on the order of 100X or more in real-world, production workloads for queries that lend themselves to columnstore processing.
Questions and answers
Q. Are all Standard performance levels supported now?
A. Queries that leverage columnstore indexes have a minimum memory footprint in order to perform well. To ensure an optimal experience, memory provided in S3 or higher, or Elastic Pools with at least 100 eDTU are required. All of the vCore based options are supported.
Q. What happens if I create a columnstore in a database and then scale down below S3?
A. The scaling operation will succeed, but the columnstore will not be chosen when the database executes a query. The index will be dormant with the exception that it will be updated to keep up with data changes in the table. When the database is scaled back to S3 or above, the columnstore index will automatically start being used again. One of the implications of this is if a table has only a clustered columnstore index, and is scaled down below S3. The clustered columnstore index can’t be used to retrieve data, so the data cannot be accessed until the database is scaled back up.
Q. What do I need to do to enable this feature?
A. For any database at the S3 Performance level or higher, no action is needed. Columnstore Index creation will now succeed with no other actions.