[This article was contributed by the SQL Azure team.]
When using a web enabled database like SQL Azure, it requires connections over the internet or other complex networks and because of this, you should be prepared to handle unexpected dropping of connections. Established connections consist of: connections that are returning data, open connections in the connection pool, or connections being cached in client side variables. When you are connecting to SQL Azure, connection loss is a valid scenario that you need to plan for in your code. The best way to handle connection loss it to re-establish the connection and then re-execute the failed commands or query.
Network reliability
The quality of all network components between the machine running your client code and the SQL Azure Servers is at times outside of Microsoft’s sphere of control. Any number of reasons on the internet may result in the disconnection of your session. When running applications in Windows Azure the risk of disconnect is significantly reduced as the distance between the application and the server is reduced.
In circumstances when a network problem causes a disconnection, SQL Azure does not have an opportunity to return a meaningful error to the application because the session has been terminated. However, when reusing this connection (like when you use connection pooling), you will get a 10053 (A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 – An established connection was aborted by the software in your host machine) error.
Redundancy Requires Connection Retry Considerations
If you are used to connecting against a single SQL Server within the same LAN and that server fails or goes down for an upgrade then your application will get disconnected permanently. However, if you have been coding against a redundant SQL Server environment you might already have code in place to manage reconnecting to your redundant server when you primary server is unavailable. In this situation you suffer a short disconnect instead of an extended downtime. SQL Azure behaves much like a redundant SQL Server cluster. The SQL Azure fabric manages the health of every node in the system. Should the fabric notice that a node is either in an unhealthy state or (in the event of an upgrade) a node is ready to be taken offline, the fabric will automatically reconnect your session to a replica of your database on a different node.
Currently some failover actions result in an abrupt termination of a session and as such the client receives a generic network disconnect error (A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 – An established connection was aborted by the software in your host machine.) The best course of action in these situations is to reconnect and SQL Azure will automatically connect you to a healthy replica of your database.
Resource Management by SQL Azure
Like any other database SQL Azure will at times terminate sessions due to errors, resource shortages and other transient reasons. In these situations SQL Azure will always attempt to return a specific error if the client connection has an active request. It is important to note that it may not always be possible to return an error to a client application if there are no pending requests. For example, if you are connected to your database through SQL Server Management Studio for longer than 30 minutes without having any active request your session will timeout and because there are no active requests SQL Azure can’t return an error.
In these circumstances, SQL Azure will close an already established connection:
- An idle connection was held by an application for more than 30 minutes.
- You went to lunch and left your SQL Server Management Studio connection for longer than 30 minutes
SQL Azure Errors
Before we start writing code to handle connection loss, a few other SQL Azure errors would benefit from a re-establish the connection and then re-execute the failed commands or the query. They include:
- 40197 – The service has encountered an error processing your request. Please try again.
- 40501 – The service is currently busy. Retry the request after 10 seconds.
The Code
The following code takes into account the latency associated with the internet and the potential for connections to get dropped. The code:
- Re-executes the failed commands or the query repeatedly, the number of times is configurable in the .config file without getting into an endless loop.
- Wait time between attempts is configurable in the .config file.
- Handles only exceptions that should be re-attempted, throwing the other exceptions.
- Abstracts the screening of exceptions so that other scenarios can be added in the future.
C#:
////// Generic Code that takes and input and executes /// a statement against the SQL Azure /// /// static void DoSomething(Int32 companyId) { // This is the retry loop, handling the retries session // is done in the catch for performance reasons for (Int32 attempt = 1; ; ) { // Create a SqlConnection Class, the connection isn't established // until the Open() method is called using (SqlConnection sqlConnection = new SqlConnection( ConfigurationManager.ConnectionStrings["SQLAzure"]. ConnectionString)) { try { // Open the connection sqlConnection.Open(); // Statement To Call String sql = @"SELECT Color FROM Source WHERE Id = @CompanyId"; SqlCommand sqlCommand = new SqlCommand(sql, sqlConnection); sqlCommand.Parameters.AddWithValue("@CompanyId", companyId); using (SqlDataReader sqlDataReader = sqlCommand.ExecuteReader()) { while (sqlDataReader.Read()) { /// Do Something Wtih Color } return; } } catch (SqlException sqlException) { // Increment Trys attempt++; // Find Maximum Trys Int32 maxRetryCount = Int32.Parse( ConfigurationManager.AppSettings["ConnectionRetrys"]); // Throw Error if we have reach the maximum number of retries if (attempt == maxRetryCount) throw; // Determine if we should retry or abort. if (!RetryLitmus(sqlException)) throw; else Thread.Sleep(ConnectionRetryWaitSeconds(attempt)); } } } } static Int32 ConnectionRetryWaitSeconds(Int32 attempt) { Int32 connectionRetryWaitSeconds = Int32.Parse(ConfigurationManager. AppSettings["ConnectionRetryWaitSeconds"]) * 1000; // Backoff Throttling connectionRetryWaitSeconds = connectionRetryWaitSeconds * (Int32)Math.Pow(2, attempt); return (connectionRetryWaitSeconds); } /// /// Determine from the exception if the execution /// of the connection should Be attempted again /// /// Generic Exception /// True if a a retry is needed, false if not static Boolean RetryLitmus(SqlException sqlException) { switch (sqlException.Number) { // The service has encountered an error // processing your request. Please try again. // Error code %d. case 40197: // The service is currently busy. Retry // the request after 10 seconds. Code: %d. case 40501: //A transport-level error has occurred when // receiving results from the server. (provider: // TCP Provider, error: 0 - An established connection // was aborted by the software in your host machine.) case 10053: return (true); } return (false); }
.Config File
<configuration> <connectionStrings> <add name="SQLAzure" connectionString="Server=tcp:youserver.database.windows.net; Database=Test;User ID=login@server;Password=yourPassword; Trusted_Connection=False;Encrypt=True;"/> connectionStrings> <appSettings> <add key="ConnectionRetrys" value="4"/> <add key="ConnectionRetryWaitSeconds" value="5"/> appSettings> configuration>
Do you have questions, concerns, comments? Post them below and we will try to address them.