• <1 minute

Cross-Database Queries in Azure SQL Database

This blog announces the new cross-database querying capabilities in Azure SQL Database and other improvements to elastic database query.

NOTE: The Azure SQL Database elastic query overview (preview) documentation has the latest and most relevant information on elastic database queries.

We are happy to announce a number of significant improvements to elastic database query in Azure SQL Database. Most notably, elastic database query now supports querying across databases in Azure SQL Database. This makes possible common cross-database querying tasks like selecting from a remote table into a local table.

Cross-database queries in Azure SQL Database

It also allows for richer remote database querying topologies like the one illustrated in the following figure where a number of databases need access to each others tables.

Querying remote databases in Azure SQL Database

This new cross-database querying capability complements the existing support in elastic database query for horizontal partitioning (sharding) which is illustrated in the following figure.

HorizontalPartitioning
In contrast to SQL Server on-premises, elastic database query in Azure SQL Database now unifies both vertical and horizontal partitioning under one common concept and the same surface area.

Enhancements in the latest refresh of the elastic database query preview include:

  • Improved support for common cross-database query scenarios that do not involve sharding,
  • Elastic query is now available in both Standard and Premium performance tiers,
  • Flexible DDL now allows schema and table name aliases to represent remote database tables,
  • Performance is significantly improved for queries that involve T-SQL parameters when referencing remote tables,
  • Performance improvements for queries that retrieve large numbers of rows from remote databases,
  • Parameter support in the sp_execute_fanout procedure.

See the following paragraphs for more details on those enhancements.

Querying remote databases

Elastic database query now provides access to tables in remote Azure SQL Databases through a simple extension in the DDL for external data sources and external tables. You can define an external data source that, for instance, provides access to a remote database which stores reference data shared among all databases of your data tier. You can also easily copy the contents of tables from a remote database to another using a INSERT INTO… SELECT statement.

External data sources that refer to a single remote database are identified by using the RDBMS option in the TYPE clause of the following DDL statement:

CREATE EXTERNAL DATA SOURCE RemoteReferenceData
WITH
(
 TYPE=RDBMS,
 LOCATION='myserver.database.windows.net',
 DATABASE_NAME='ReferenceData',
 CREDENTIAL= SqlUser
);

Based on this external data source, you can now define an external table that provides remote access to a ZIP codes table located in the ReferenceData database.

CREATE EXTERNAL TABLE [dbo].[zipcode](
 [zc_id] int NOT NULL,
 [zc_cityname] nvarchar(256) NULL,
 [zc_zipcode] nvarchar(20) NOT NULL,
 [zc_country] nvarchar(5) NOT NULL
)
WITH
(
 DATA_SOURCE = RemoteReferenceData
);

After this simple one-time setup, your queries can now access the remote ZIP code table from any Azure SQL Database where the external data source and external table have been defined.

Availability across more performance tiers

Elastic database query is now also available in the Standard performance tier of Azure SQL Database. This significantly lowers the cost of entry for cross-database querying and partitioning scenarios in Azure SQL Database. Due to the smaller DTU limits in the Standard tier, it can take up to one minute to initialize elastic database query when you run your first remote database query. Initialization latency for elastic database query is an area we are actively working on. The experience will improve over the next couple of months.

More flexible naming

Several important scenarios require the ability to name your external table differently than the original table on the remote database. Any scenario where a local table already exists with the same name as your remote table are examples of that. All of these scenarios require the ability to use an alias for the remote table name.

For instance, consider a scenario where you want an external table definition to aggregate a DMV (Dynamic Management View) across a horizontally partitioned (sharded) data tier. Previously, this required complicated workarounds such as effectively renaming the DMV using a view on the remote databases and referring to the view from the external table definition. This was necessary since DMV names or catalog names already existed locally and could not be used directly as external table names.

Now you can use any name as your external table name and identify the underlying remote table using the new OBJECT_SCHEMA and OBJECT_NAME clauses on the external table DDL. This makes it easy to query across DMVs or catalog views of your scaled-out data tier, as the following example shows. The following DDL (Data Definition Language) performs the one-time setup of the external data source and external table. Note the use of the OBJECT_SCHEMA and OBJECT_NAME clauses in the external table definition:

CREATE EXTERNAL DATA SOURCE MyExtSrc
WITH
(
 TYPE=SHARD_MAP_MANAGER,
 LOCATION='myserver.database.windows.net',
 DATABASE_NAME='ShardMapDatabase',
 CREDENTIAL= SMMUser,
 SHARD_MAP_NAME='ShardMap'
);

 

CREATE EXTERNAL TABLE [dbo].[all_dm_exec_requests](
 [session_id] smallint NOT NULL,
 [request_id] int NOT NULL,
 [start_time] datetime NOT NULL, 
 [status] nvarchar(30) NOT NULL,
 [command] nvarchar(32) NOT NULL,
 [sql_handle] varbinary(64),
 [statement_start_offset] int,
 [statement_end_offset] int,
 [cpu_time] int NOT NULL
)
WITH
(
 DATA_SOURCE = MyExtSrc,
 SCHEMA_NAME = 'sys',
 OBJECT_NAME = 'dm_exec_requests',
 DISTRIBUTION=ROUND_ROBIN
);

Now you can retrieve the most expensive requests across your whole data tier with a simple elastic database query like the following:

SELECT TOP 10 
 [request_id],
 [start_time]
 [status],
 [command]
FROM all_dm_exec_requests
ORDER BY [cpu_time] DESC

New signature for sp_execute_fanout

Elastic database query provides the stored procedure sp_execute_fanout to invoke stored procedures and functions on remote databases. Our recent improvements to Azure SQL Database now align the signature of sp_execute_fanout with the familiar signature for sp_executesql. This allows passing regular SQL parameters into invocations of sp_execute_fanout and will be available early next week.

Performance improvements

Previously, elastic database query could not push parameterized operations to remote databases. As a result, sometimes large row sets had to be unnecessarily brought local to evaluate these operations. With the recent improvements, parameterized operations can now be pushed to remote databases and be evaluated remotely. For a query over an external table and a local table like the following, this can now avoid transferring millions of rows by evaluating the selective filter in the WHERE clause on the remote database:

DECLARE @low int
DECLARE @high int
SET @low = 100
SET @high = 200

SELECT c.CustomerId, c.Name, count(OrderId) 
FROM remote_customers c
JOIN local_orders o
ON c.CustomerId = o.CustomerId 
WHERE c.CustomerId > @low and c.CustomerId < @high
GROUP BY c.CustomerId, c.Name

A quick look into the query plan for the query above confirms that the range predicate on customer ID in the WHERE clause made it successfully into the remote query operator.

Improved performance for parameterized remote operations

Finally, we also made transferring large numbers of small rows with elastic database query more efficient. Our tests show performance improvements for queries over external tables by a factor of more than five when transferring 100,000 rows or more.

To learn more about all the improvements discussed above, please visit the elastic database query overview.