• 3 min read

Vertical Partitioning in SQL Azure: Part 1

[This article was contributed by the SQL Azure team.]SQL Azure currently supports 1 GB and 10 GB databases. If you want to store larger amounts of data in SQL Azure you can divide your tables across…

[This article was contributed by the SQL Azure team.]

SQL Azure currently supports 1 GB and 10 GB databases. If you want to store larger amounts of data in SQL Azure you can divide your tables across multiple SQL Azure databases. This article will discuss how to use a middle layer to join two tables on different SQL Azure databases using LINQ. This technique vertically partitions your data in SQL Azure.

In this version of vertically partitioning for SQL Azure we are dividing all the tables in the schema across two or more SQL Azure databases. In choosing which tables to group together on a single database you need to understand how large each of your tables are and their potential future growth – the goal is to evenly distribute the tables so that each database is the same size.

There is also a performance gain to be obtained from partitioning your database. Since SQL Azure spreads your databases across different physical machines, you can get more CPU and RAM resources by partitioning your workload. For example, if you partition your database across 10 – 1 GB SQL Azure databases you get 10X the CPU and memory resources. There is a case study (found here) by TicketDirect, who partitioning their workload across hundreds of SQL Azure databases during peak load.

When partitioning your workload across SQL Azure databases, you lose some of the features of having all the tables in a single database. Some of the considerations when using this technique include:

  • Foreign keys across databases are not support. In other words, a primary key in a lookup table in one database cannot be referenced by a foreign key in a table on another database. This is a similar restriction to SQL Server’s cross database support for foreign keys.
  • You cannot have transactions that span databases, even if you are using Microsoft Distributed Transaction Manager on the client side. This means that you cannot rollback an insert on one database, if an insert on another database fails. This restriction can be negated through client side coding – you need to catch exceptions and execute “undo” scripts against the successfully completed statements.

SQLAzureHelper Class

In order to accomplish vertical partitioning we are introduce the SQLAzureHelper class, which:

  • Implements forward read only cursors for performance.
  • Support IEnumerable and LINQ
  • Disposes of the connection and the data reader when the result set is no longer needed.

This code has the performance advantage of using forward read only cursors, which means that that data is not fetched from SQL Azure until it is needed for the join.

The code to get the result sets from SQLAzureHelper class looks like this:

var colorDataReader = SQLAzureHelper.ExecuteReader(      ConfigurationManager.ConnectionStrings["ColorDatabase"].ConnectionString,      sqlConnection =>      {          SqlCommand sqlCommand =              new SqlCommand("SELECT ColorName, CompanyId FROM Colors",                  sqlConnection);          return (sqlCommand.ExecuteReader());      });    var companyDataReader = SQLAzureHelper.ExecuteReader(      ConfigurationManager.ConnectionStrings["CompanyDatabase"].ConnectionString,      sqlConnection =>      {          SqlCommand sqlCommand =              new SqlCommand("SELECT CompanyId, CompanyName FROM Companies",                  sqlConnection);          return (sqlCommand.ExecuteReader());      });

The result sets return from the two SQL Server databases as join by LINQ.

LINQ

LINQ is a set of extensions to the .NET Framework that encompass language-integrated query, set, and transform operations. It extends C# and Visual Basic with native language syntax for queries and provides class libraries to take advantage of these capabilities. You can learn more about LINQ here. This code is using LINQ as client-side query processor to perform the joining and querying of the two result sets.

var query =      from color in colorDataReader      join company in companyDataReader on          (Int32)color["CompanyId"] equals (Int32)company["CompanyId"]      select new      {          ColorName = (string)color["ColorName"],          CompanyName = (string)company["CompanyName"]      };    foreach (var combo in query)  {      Console.WriteLine(String.Format("{0} - {1}", combo.CompanyName, combo.ColorName));  }

This code takes the result sets and joins them based on CompanyId, then selects a new class comprised of CompanyName and ColorName.

Connections and SQL Azure

One thing to note is that the code above doesn’t take into account the retry scenario mention in our previous blog post. This has been done to simpfy the example. The retry code needs to go outside of the SQLAzureHelper class to completely re-execute the LINQ query.

In our next blog post we will demonstrate horizontal partitioning using the SQLAzureHelper class. Do you have questions, concerns, comments? Post them below and we will try to address them.