
Identity now available in SQL Data Warehouse

Azure SQL Data Warehouse now supports IDENTITY column properties.

Azure SQL Data Warehouse (SQL DW) is a SQL-based, fully managed, petabyte-scale cloud solution for data warehousing. SQL DW is highly elastic: you can provision in minutes and scale capacity in seconds. You can scale compute and storage independently, allowing you to burst compute for complex analytical workloads or scale down your warehouse for archival scenarios, and pay for what you use instead of being locked into predefined cluster configurations.

IDENTITY has been a long-standing customer ask for SQL Data Warehouse. We’re excited to announce that Azure SQL Data Warehouse now supports the IDENTITY column property, the SET IDENTITY_INSERT syntax, and IDENTITY value generation on load. In data warehousing, IDENTITY functionality is particularly important because it makes surrogate keys easier to create.

Surrogate keys are fundamental to dimensional modelling because they often uniquely identify a row. Since they are typically integer values, they also tend to compress and compare with better performance. While UUIDs can often be used for similar purposes, they are harder to manage, carry no intrinsic temporal information, and compress and compare less efficiently. For large data warehouses, a 16-byte UUID is four times the size of a traditional 4-byte IDENTITY value, and that difference really adds up. Previously, assigning monotonically increasing surrogate keys meant left outer joining from a staging table, finding the maximum ID in the surrogate key column, and adding a ROW_NUMBER value to it. This solution was clunky and invoked a costly broadcast data move.
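
The older pattern described above can be sketched roughly as follows. The table and column names (stg.Customer, dbo.DimCustomer, CustomerKey, CustomerCode) are hypothetical, and this is only one plausible shape of the approach, not a reference implementation.

```sql
-- Hypothetical tables: stg.Customer (staging) and dbo.DimCustomer (dimension).
-- New rows get keys by offsetting ROW_NUMBER() with the current maximum key.
INSERT INTO dbo.DimCustomer (CustomerKey, CustomerCode)
SELECT  m.MaxKey + ROW_NUMBER() OVER (ORDER BY s.CustomerCode) AS CustomerKey
,       s.CustomerCode
FROM    stg.Customer AS s
LEFT OUTER JOIN dbo.DimCustomer AS d
        ON  d.CustomerCode = s.CustomerCode
CROSS JOIN
(       SELECT ISNULL(MAX(CustomerKey), 0) AS MaxKey
        FROM   dbo.DimCustomer
) AS m
WHERE   d.CustomerCode IS NULL   -- keep only staging rows not yet in the dimension
;
```

With an IDENTITY column, the key expression, the MAX lookup, and the cross join all disappear; the load only needs to supply the non-key columns.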

We hope that by adding this feature, we’ve made data management in SQL DW easier and better for our customers.

Keep in mind that the IDENTITY property is not the same as a uniqueness constraint, even though such constraints are often imposed on IDENTITY columns. IDENTITY on its own does not enforce uniqueness!
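
Because nothing enforces uniqueness, duplicate values can appear, for example after explicit values are inserted with SET IDENTITY_INSERT. A simple check along these lines can surface them. It assumes the dbo.T1 table from the CREATE TABLE example in this post.

```sql
-- Assumes dbo.T1 with IDENTITY column C1, as defined in this post's example.
-- Returns every C1 value that appears more than once.
SELECT  C1
,       COUNT(*) AS occurrences
FROM    dbo.T1
GROUP BY C1
HAVING  COUNT(*) > 1
;
```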

Next steps

Get started today by creating a table with an IDENTITY column. It’s as simple as:

CREATE TABLE dbo.T1
(	C1 INT IDENTITY(1,1) NOT NULL
,	C2 INT NULL
)
WITH
(   DISTRIBUTION = HASH(C2)
,   CLUSTERED COLUMNSTORE INDEX
)
;

Bear in mind that the IDENTITY property cannot be used in the following scenarios:
  • Where the column data type is not INT or BIGINT
  • Where the column is also the distribution key
  • Where the table is an external table
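
As a minimal sketch of the two other capabilities mentioned in this post, the statements below load rows into dbo.T1 with generated IDENTITY values and then insert one explicit value using SET IDENTITY_INSERT. The source table stg.T1_source is a hypothetical name used only for illustration.

```sql
-- Load rows and let SQL DW generate the C1 values.
-- stg.T1_source and its column C2 are hypothetical.
INSERT INTO dbo.T1 (C2)
SELECT  C2
FROM    stg.T1_source
;

-- Insert an explicit value into the IDENTITY column, e.g. a placeholder row.
SET IDENTITY_INSERT dbo.T1 ON;

INSERT INTO dbo.T1 (C1, C2)
VALUES (-1, NULL);

SET IDENTITY_INSERT dbo.T1 OFF;
```

Turning IDENTITY_INSERT back off straight after the explicit insert keeps later loads on generated values.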

Learn more about adding IDENTITY functionality to your tables today by visiting our documentation or our T-SQL syntax page.
