Reacting to maintenance events... before they happen

Posted on May 3, 2017

Principal Program Manager, Azure Compute

Introducing Scheduled Events (Preview)

What if you could learn about upcoming events that may affect the availability of your VM and plan accordingly? With Azure Scheduled Events, you can.

Scheduled Events is one of the subservices under Azure Metadata Service. It surfaces information about upcoming events (for example, a reboot) so that your application has time to perform preventive tasks and minimize the effect of such events. As part of the Azure Metadata Service, scheduled events are surfaced through a REST endpoint that is reachable from within the VM. The information is served from a non-routable IP address, so it is not exposed outside the VM.

What is covered with scheduled events

While we continue to invest in increasing the scope of scheduled events, the following are already covered during the preview:

  • VM preserving maintenance (also known as in-place VM migration). This class of maintenance operations is used to patch and update the hosting environment (hypervisor and agents) without rebooting the VM. With VM preserving maintenance, your VM freezes for up to 30 seconds without losing open files or network connections. While most modern applications are not affected by such a short pause, some workloads (like gaming) are sensitive enough to treat it as an outage. With scheduled events, your application can learn of such maintenance through an event type of Freeze.
  • VM restarting maintenance. While the majority of updates have little to no impact on virtual machines, there are cases where we do need to reboot your virtual machine. With scheduled events, your application can detect such cases through an event type of Reboot or Redeploy.
  • User operations. You may not reboot your production servers manually, but you can definitely try and reboot or redeploy your test VMs to test your failover logic. In both cases, a scheduled event is surfaced with event type being set to Reboot or Redeploy.
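
As a rough illustration of the list above, the three event types can be grouped by the class of maintenance they signal. This is a minimal sketch: the labels "preserving", "restarting" and "unknown", and the function name maintenance_class, are hypothetical names chosen for this example and are not Azure identifiers.

```python
# Event types named in this post, grouped by the maintenance class they signal.
# The set names and returned labels are illustrative, not Azure identifiers.
VM_PRESERVING = {"Freeze"}
VM_RESTARTING = {"Reboot", "Redeploy"}


def maintenance_class(event_type: str) -> str:
    """Return a coarse label for a scheduled event type."""
    if event_type in VM_PRESERVING:
        return "preserving"  # VM pauses briefly, keeps open files and connections
    if event_type in VM_RESTARTING:
        return "restarting"  # VM is rebooted or redeployed
    return "unknown"         # the scope of scheduled events may grow over time
```

Returning "unknown" rather than raising keeps the application running if new event types appear after the preview.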

Use cases for scheduled events

We have observed several use cases for using scheduled events:

  • Proactive failover. Instead of waiting for your application, SLB or traffic manager to sense that something went wrong, you can proactively fail over to another node. In some cases, knowing that a VM will be back soon lets the application start accumulating and logging changes rather than failing over a partition or replica.
  • Drain a node. Instead of failing running jobs, you can block the VM from accepting new jobs and let it drain those already started.
  • Log and audit. Knowing that the VM was interrupted by Azure can simplify root cause analysis of availability issues.
  • Notify and correlate. Send a notification to your admin (human) or monitoring software and correlate the scheduled event with other signals.
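
One way an application might choose among these responses is sketched below. Everything about the policy is an assumption made for illustration: the function name choose_action, the action labels, the drain_time budget and the pause_sensitive flag are hypothetical, and the caller is expected to have already converted the event's NotBefore value into a datetime.

```python
from datetime import datetime, timedelta


def choose_action(event_type: str, not_before: datetime, now: datetime,
                  drain_time: timedelta, pause_sensitive: bool = False) -> str:
    """Pick a response to a single scheduled event (hypothetical policy)."""
    if event_type == "Freeze" and not pause_sensitive:
        return "log"       # short pause: record it for later root cause analysis
    remaining = not_before - now
    if remaining >= drain_time:
        return "drain"     # enough time to stop taking jobs and finish running ones
    return "failover"      # too little time left: move work to another node now
```

A pause-sensitive workload, such as the gaming example mentioned earlier, would pass pause_sensitive=True so that a Freeze event is handled like a restart.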

Getting Started with scheduled events

You can query for scheduled events by making the following call from within a VNet-enabled VM:

curl -H Metadata:true "http://169.254.169.254/metadata/scheduledevents?api-version=2017-03-01"

The response contains an array of scheduled events. An empty array means that no events are currently scheduled. Otherwise, each entry in the array describes one event:

{
  "DocumentIncarnation": {IncarnationID},
  "Events": [
    {
      "EventId": {eventID},
      "EventType": "Reboot" | "Redeploy" | "Freeze",
      "ResourceType": "VirtualMachine",
      "Resources": [{resourceName}],
      "EventStatus": "Scheduled" | "Started",
      "NotBefore": {timeInUTC}
    }
  ]
}
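
The same query can be made from application code. The following is a minimal Python sketch using only the standard library. The function names fetch_document and pending_events are hypothetical, and the five-second timeout and UTF-8 decoding are assumptions rather than documented behavior. The endpoint, the Metadata header and the Events field come from the call and response shown above.

```python
import json
import urllib.request

ENDPOINT = ("http://169.254.169.254/metadata/scheduledevents"
            "?api-version=2017-03-01")


def fetch_document(timeout: float = 5.0) -> str:
    """GET the scheduled events document; reachable only from inside the VM."""
    request = urllib.request.Request(ENDPOINT, headers={"Metadata": "true"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")  # assumes a UTF-8 JSON body


def pending_events(document: str) -> list:
    """Return the events in a JSON document, or [] when none are scheduled."""
    return json.loads(document).get("Events", [])
```

Calling pending_events(fetch_document()) on a schedule and acting on each entry's EventType is one way to wire this into an application. Keeping the parsing separate from the network call makes the logic easy to exercise with a saved response.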

To trigger and test the logic that handles scheduled events on your VM, go to the Azure portal and either Restart or Redeploy the VM.

Next Steps