Yellow Dashboard?

Posted on April 5, 2010

[This article was contributed by the SQL Azure team.]

SQL Azure has been showing yellow on the Windows Azure Platform dashboard for a few days, and I wanted to take a moment to brief you on why, and on what we are doing to correct it. A small number of customers are seeing intermittent connection issues, so we flipped the status on the service dashboard to reflect this. The exact status on the dashboard reads:

We are experiencing intermittent authentication issues within our service which are also causing intermittent authentication errors for some customers attempting to access their database. The length of each occurrence varies widely from seconds to several minutes. In some cases closing and retrying your connection may mitigate the issue. We are actively investigating this issue and will continue to do so until we have resolved the issue.

Even though the number of affected customers is low, we are focused on resolving this issue with the highest priority.

So what is going on? As those of you who have attended one of the many presentations on SQL Azure know, SQL Azure is composed of several tiers: front-end machines that handle the initial user request, and back-end machines that ultimately execute that request against the database. The problem is that, intermittently, some front-end machines fail to connect to the appropriate back-end machine because of a transient authentication issue. The user then receives an error message stating “Login failed for user NT AUTHORITY\ANONYMOUS LOGON”. Because the issue is intermittent, if your application has retry logic built in, you probably didn’t even notice it was occurring.
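If you want your application to treat this particular failure as transient rather than as a genuine bad-credentials error, one option is to recognize it by its message text. The sketch below is a minimal illustration, assuming your data-access layer surfaces the server's error text as a string; the function name `is_transient_login_failure` is hypothetical. Matching on message text is fragile, and production code would more typically key on the error number your driver reports.

```python
# Substrings taken from the error message quoted in this post; matched case-insensitively.
_LOGIN_FAILED = "login failed for user"
_ANONYMOUS_LOGON = r"nt authority\anonymous logon"


def is_transient_login_failure(message: str) -> bool:
    """Return True if an error message looks like the intermittent
    front-end-to-back-end authentication failure described in this post."""
    text = message.lower()
    return _LOGIN_FAILED in text and _ANONYMOUS_LOGON in text


if __name__ == "__main__":
    print(is_transient_login_failure(r"Login failed for user NT AUTHORITY\ANONYMOUS LOGON"))  # True
    print(is_transient_login_failure("Login failed for user 'alice'."))  # False
```

A login failure for a real user name does not match, so a mistyped password is still reported immediately instead of being retried.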

The SQL Azure Development Team, Operations and our Data Center staff are working around the clock on this. Once we complete our root cause analysis, we will let you know our findings. I’ll post an update on our progress in the next 24 hours.

Rest assured, your data is completely safe! We maintain multiple copies of your database, and this issue involves only intermittent connection failures. I would also like to reiterate that although this intermittent problem affects only a small subset of users, we are treating it with the utmost urgency and people are working around the clock to fix it.

What if you are one of the people seeing this? Adding retry logic to your application, or simply reconnecting, will work around the issue.
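Retry logic can be as simple as wrapping the code that opens the connection in a loop that waits a little longer after each failure. The sketch below is a minimal, driver-agnostic illustration of capped exponential backoff with random jitter; `with_retry` and its parameters are hypothetical, and the callable you pass as `operation` is assumed to open a fresh connection on every call (matching the dashboard advice to close and retry the connection). Because each occurrence can last from seconds to several minutes, you may need to tune `max_attempts` and `max_delay` upward for your workload.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(
    operation: Callable[[], T],
    is_transient: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying transient failures with capped exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Re-raise non-transient errors at once, and give up after the last attempt.
            if not is_transient(exc) or attempt == max_attempts:
                raise
            # Delay ceiling doubles each attempt (base, 2*base, 4*base, ...) up to
            # max_delay; sleeping a random fraction of it avoids clients retrying in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable: the loop always returns or raises")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_connect() -> str:
        # Hypothetical stand-in for opening a database connection: fails twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError(r"Login failed for user NT AUTHORITY\ANONYMOUS LOGON")
        return "connected"

    print(with_retry(flaky_connect, lambda e: isinstance(e, ConnectionError), sleep=lambda s: None))
```

Passing `is_transient` in as a parameter keeps the loop independent of any particular driver: you decide which errors are worth retrying, and everything else surfaces immediately.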

If you have any questions or comments, let us know.