Multi-shard querying using elastic database tools

Applies to: Azure SQL Database

Overview

With the Elastic Database tools, you can create sharded database solutions. Multi-shard querying is used for tasks such as data collection/reporting that require running a query that stretches across several shards. (Contrast this to data-dependent routing, which performs all work on a single shard.)

  1. Get a RangeShardMap (Java, .NET) or ListShardMap (Java, .NET) using the TryGetRangeShardMap (Java, .NET), the TryGetListShardMap (Java, .NET), or the GetShardMap (Java, .NET) method. See Constructing a ShardMapManager and Get a RangeShardMap or ListShardMap.
  2. Create a MultiShardConnection (Java, .NET) object.
  3. Create a MultiShardStatement or MultiShardCommand (Java, .NET).
  4. Set the CommandText property (Java, .NET) to a T-SQL command.
  5. Execute the command by calling the ExecuteQueryAsync or ExecuteReader (Java, .NET) method.
  6. View the results using the MultiShardResultSet or MultiShardDataReader (Java, .NET) class.

Example

The following code illustrates the usage of multi-shard querying using a given ShardMap named myShardMap.

using (MultiShardConnection conn = new MultiShardConnection(myShardMap.GetShards(), myShardConnectionString))
{
    using (MultiShardCommand cmd = conn.CreateCommand())
    {
        cmd.CommandText = "SELECT c1, c2, c3 FROM ShardedTable";
        cmd.CommandType = CommandType.Text;
        cmd.ExecutionOptions = MultiShardExecutionOptions.IncludeShardNameColumn;
        cmd.ExecutionPolicy = MultiShardExecutionPolicy.PartialResults;

        using (MultiShardDataReader sdr = cmd.ExecuteReader())
        {
            while (sdr.Read())
            {
                var c1Field = sdr.GetString(0);
                var c2Field = sdr.GetFieldValue<int>(1);
                var c3Field = sdr.GetFieldValue<Int64>(2);
            }
        }
    }
}

A key difference is the construction of multi-shard connections. Where SqlConnection operates on an individual database, the MultiShardConnection takes a collection of shards as its input. Populate the collection of shards from a shard map. The query is then executed on the collection of shards using UNION ALL semantics to assemble a single overall result. Optionally, the name of the shard where the row originates from can be added to the output using the ExecutionOptions property on command.

Note the call to myShardMap.GetShards(). This method retrieves all shards from the shard map and provides an easy way to run a query across all relevant databases. The collection of shards for a multi-shard query can be refined further by performing a LINQ query over the collection returned from the call to myShardMap.GetShards(). In combination with the partial results policy, the current capability in multi-shard querying has been designed to work well for tens up to hundreds of shards.

A limitation with multi-shard querying is currently the lack of validation for shards and shardlets that are queried. While data-dependent routing verifies that a given shard is part of the shard map at the time of querying, multi-shard queries do not perform this check. This can lead to multi-shard queries running on databases that have been removed from the shard map.

Multi-shard queries and split-merge operations

Multi-shard queries do not verify whether shardlets on the queried database are participating in ongoing split-merge operations. (See Scaling using the Elastic Database split-merge tool.) This can lead to inconsistencies where rows from the same shardlet show for multiple databases in the same multi-shard query. Be aware of these limitations and consider draining ongoing split-merge operations and changes to the shard map while performing multi-shard queries.

Additional resources

Not using elastic database tools yet? Check out our Getting Started Guide. For questions, contact us on the Microsoft Q&A question page for SQL Database and for feature requests, add new ideas or vote for existing ideas in the SQL Database feedback forum.