The Right Way to Handle Azure OnStop Events

Editor’s note: This post comes from Rick Anderson who is a programmer / writer for the Windows Azure and ASP.NET MVC teams.

Restarts for Web Roles

An often neglected consideration in Windows Azure is how to handle restarts. It’s important to handle restarts correctly, so you don’t lose data or corrupt your persisted data, and so you can quickly shut down, restart, and efficiently handle new requests. Windows Azure Cloud Service applications are restarted approximately twice per month for operating system updates. (For more information on OS updates, see Role Instance Restarts Due to OS Upgrades.)

When a web application is going to be shut down, the RoleEnvironment.Stopping event is raised. The web role boilerplate created by Visual Studio does not override the OnStop method, so the application will have only a few seconds to finish processing HTTP requests before it is shut down. If your web role is busy with pending requests, some of these requests can be lost.

You can delay the restart of your web role by up to 5 minutes by overriding the OnStop method and calling Sleep, but that’s far from optimal. Once the Stopping event is raised, the load balancer (LB) stops sending requests to the web role, so delaying the restart for longer than it takes to process pending requests leaves your virtual machine spinning in Sleep, doing no useful work. The optimal approach is to wait in the OnStop method until there are no more requests, and then initiate the shutdown. The sooner you shut down, the sooner the VM can restart and begin processing requests. To implement the optimal shutdown strategy, add the following code to your WebRole class.
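The original listing is not reproduced in this text, so what follows is only a minimal sketch of the approach described here, not necessarily the exact code from the post. It assumes the built-in "Requests Current" counter in the "ASP.NET" performance counter category and a one-second polling interval; the trace messages are illustrative.

```csharp
using System.Diagnostics;
using System.Threading;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WebRole : RoleEntryPoint
{
    public override void OnStop()
    {
        Trace.TraceInformation("OnStop called from WebRole");

        // Assumption: the "ASP.NET" / "Requests Current" counter reflects the
        // requests ASP.NET is still handling on this instance.
        using (var requestsCurrent =
            new PerformanceCounter("ASP.NET", "Requests Current", ""))
        {
            float pending;
            // Keep the role alive while requests are still in flight.
            // The platform ends the wait on its own after about 5 minutes.
            while ((pending = requestsCurrent.NextValue()) > 0)
            {
                Trace.TraceInformation("ASP.NET Requests Current = " + pending);
                Thread.Sleep(1000); // polling interval is an arbitrary choice
            }
        }
        // Returning from OnStop lets the shutdown proceed.
    }
}
```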

The code above checks the ASP.NET Requests Current performance counter. As long as there are requests, the OnStop method calls Sleep to delay the shutdown. Once the Requests Current counter drops to zero, OnStop returns, which initiates shutdown. Should the web server be so busy that the pending requests cannot be completed in 5 minutes, the application is shut down anyway. Remember that once the Stopping event is raised, the LB stops sending requests to the web role, so unless your web role is massively undersized (or has too few instances), you should never need more than a few seconds to complete the current requests.

The code above writes Trace data, but unless you perform a tricky On-Demand Transfer, the trace data from the OnStop method will never appear in WADLogsTable. Later in this post I’ll show how you can use DebugView to see these trace events. I’ll also show how you can get tracing working in the web role OnStart method.

Optimal Restarts for Worker Roles

Handling the Stopping event in a worker role requires a different approach. Typically the worker role processes queue messages in the Run method. The strategy involves two global variables: one to notify the Run method that the Stopping event has been raised, and another to notify the OnStop method that it’s safe to initiate shutdown. (Shutdown is initiated by returning from OnStop.) The following code demonstrates the two-global approach.
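The original listing is not reproduced in this text; the sketch below is one way the two-flag handshake could look. Only onStopCalled is named in the post. The second flag's name (returnedFromRunMethod) and the ProcessNextQueueMessage helper are hypothetical stand-ins, and marking the flags volatile is my own choice so that writes from one thread are seen by the other.

```csharp
using System.Diagnostics;
using System.Threading;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WorkerRole : RoleEntryPoint
{
    // Set by OnStop to tell Run that the role is stopping.
    private volatile bool onStopCalled = false;
    // Set by Run to tell OnStop it is safe to return (name is an assumption).
    private volatile bool returnedFromRunMethod = false;

    public override void Run()
    {
        while (true)
        {
            // Checked only at the top of the loop, so a message that is
            // already being processed is never interrupted.
            if (onStopCalled)
            {
                Trace.TraceInformation("onStopCalled WorkerRole");
                returnedFromRunMethod = true;
                return;
            }
            ProcessNextQueueMessage();
        }
    }

    public override void OnStop()
    {
        onStopCalled = true;
        // Wait until Run has finished its current message and exited.
        while (!returnedFromRunMethod)
        {
            Thread.Sleep(1000);
        }
        // Returning from OnStop lets the shutdown proceed.
    }

    // Hypothetical placeholder for the real queue-processing work.
    private void ProcessNextQueueMessage()
    {
        Thread.Sleep(1000);
    }
}
```

If Run exits through an unhandled exception, the second flag is never set in this sketch, and OnStop simply waits until the platform's shutdown limit is reached.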

When OnStop is called, the global onStopCalled is set to true, which signals the code in the Run method to shut down at the top of the loop, when no queue message is being processed.

Viewing OnStop Trace Data

As mentioned previously, unless you perform a tricky On-Demand Transfer, the trace data from the OnStop method will never appear in WADLogsTable. We’ll use DebugView to see these trace events. In Solution Explorer, right-click on the cloud project and select Publish.

Download your publish profile.  In the Publish Windows Azure Application dialog box, select Debug and select Enable Remote Desktop for all roles.

Trace calls are compiled into the application only when the TRACE symbol is defined, so set the build configuration to Debug to make sure you see the Trace data. Once the application is published and running, in Visual Studio, select Server Explorer (Ctrl+Alt+S). Select Windows Azure Compute, and then select your cloud deployment. (In this case it’s called t6 and it’s a production deployment.) Select the web role instance, right-click, and select Connect using Remote Desktop.

Remote Desktop Connection (RDC) will use the account name you specified in the publish wizard and prompt you for the password you entered. On the left side of the taskbar, select the Server Manager icon.

In the left tab of Server Manager, select Local Server, and then select IE Enhanced Security Configuration (IE ESC). Select the off radio button in the IE ESC dialog box.

Start Internet Explorer, then download and install DebugView. Start DebugView, and in the Capture menu, select Capture Global Win32.

Select the filter icon, and then enter the following exclude filter:

For this test, I added the RoleEnvironment.RequestRecycle method to the About action method. As the name suggests, it initiates the shutdown/restart sequence. Alternatively, you can publish the application again, which will also initiate the shutdown/restart sequence.
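A test hook of this kind might look like the following sketch. It assumes the default ASP.NET MVC template, where the About action lives in a HomeController; the controller name is an assumption, and the call belongs in test deployments only.

```csharp
using System.Web.Mvc;
using Microsoft.WindowsAzure.ServiceRuntime;

// Assumed to be the HomeController from the default MVC project template.
public class HomeController : Controller
{
    public ActionResult About()
    {
        // Test only: asks Windows Azure to recycle this role instance,
        // which raises the Stopping event and calls OnStop.
        RoleEnvironment.RequestRecycle();
        return View();
    }
}
```

Browsing to the About page then triggers the shutdown/restart sequence on the instance that served the request.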

Follow the same procedure to view the trace data in the worker role VM. Select the worker role instance, right-click and select Connect using Remote Desktop.

Follow the procedure above to disable IE Enhanced Security Configuration. Install and configure DebugView using the instructions above. I use the following filter for worker roles:

For this sample, I published the Azure package, which causes the shutdown/restart procedure.

One last parting tip: To get tracing working in the web role’s OnStart method, add the following:
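The original listing is not reproduced in this text. One common way to do this, and my assumption about what was shown, is to register the Windows Azure diagnostics trace listener in code at the start of OnStart, since the listeners configured in web.config may not apply to the process that runs the role entry point. The trace message is illustrative.

```csharp
using System.Diagnostics;
using Microsoft.WindowsAzure.Diagnostics;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WebRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        // Assumption: adding the listener in code makes Trace output from
        // the role entry point available to Windows Azure Diagnostics.
        Trace.Listeners.Add(new DiagnosticMonitorTraceListener());

        Trace.TraceInformation("OnStart called from WebRole");
        return base.OnStart();
    }
}
```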

If you’d like me to blog on getting trace data from the OnStop method to appear in WADLogsTable, let me know. Most of the information in this post comes from the Azure multi-tier tutorial Tom and I published last week. Be sure to check it out for lots of other good tips.

–Rick
@RickAndMSFT