Reacting to maintenance events... before they happen

Posted on May 3, 2017

Principal Program Manager, Azure Compute

Scheduled Events now generally available 

What if you could learn about upcoming events that may affect the availability of your VM and plan accordingly? With Azure Scheduled Events you can.

Scheduled Events is a subservice of the Azure Metadata Service that surfaces information about upcoming events (for example, a reboot). It gives your application time to perform preventive tasks that minimize the effect of such events. Scheduled events are surfaced through a REST endpoint accessible from within the VM, and the information is served from a non-routable IP address so that it is not exposed outside the VM.

What is covered with Scheduled Events

  • VM-preserving maintenance (also known as in-place VM migration). This class of maintenance operations is used to patch and update the hosting environment (hypervisor and agents) without rebooting the VM. With VM-preserving maintenance, your VM freezes for up to 30 seconds without losing open files or network connections. While most modern applications are not affected by such a short pause, some workloads (like gaming) are sensitive enough to treat it as an outage. With Scheduled Events, your application can learn of such maintenance through an event type of Freeze.
  • VM-restarting maintenance. While most updates have little to no impact on virtual machines, there are cases where we do need to reboot your virtual machine. With Scheduled Events, your application can detect such cases through an event type of Reboot or Redeploy.
  • User operations. You may not reboot your production servers manually, but you can reboot or redeploy your test VMs to exercise your failover logic. In both cases, a scheduled event is surfaced with an event type of Reboot or Redeploy.
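
As a rough illustration, the three event types above fall into two kinds of impact: a brief pause that preserves the VM, or a restart. The Python sketch below encodes that mapping. The names `Impact`, `IMPACT_BY_EVENT_TYPE` and `impact_of` are hypothetical helpers invented for this example and are not part of the service.

```python
from enum import Enum


class Impact(Enum):
    """Illustrative grouping of event types; not an Azure-defined type."""
    PAUSE = "pause"      # VM freezes briefly but keeps files and connections
    RESTART = "restart"  # VM is rebooted or redeployed


# Mapping taken from the list above.
IMPACT_BY_EVENT_TYPE = {
    "Freeze": Impact.PAUSE,
    "Reboot": Impact.RESTART,
    "Redeploy": Impact.RESTART,
}


def impact_of(event_type: str) -> Impact:
    """Return the expected impact for an event type string."""
    try:
        return IMPACT_BY_EVENT_TYPE[event_type]
    except KeyError:
        # Fail loudly on values this sketch does not know about.
        raise ValueError(f"unknown event type: {event_type!r}") from None
```

Raising on unknown values is a deliberate choice in this sketch: if new event types appear, the application should notice rather than silently ignore them.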

Use cases for Scheduled Events

We have observed several use cases for using Scheduled Events:

  • Proactive failover: Instead of waiting for your application, SLB or traffic manager to discover that something went wrong, you can proactively fail over to another node. In some cases, knowing that a VM will be back soon lets the application start accumulating and logging changes rather than failing over a partition/replica.
  • Drain a node: Instead of failing running jobs, you can block the VM from accepting new jobs and let it drain those already started.
  • Log and audit: Knowing that the VM was interrupted by Azure can simplify the root cause analysis of availability issues.
  • Notify and correlate: Send a notification to a human administrator or to monitoring software, and correlate the scheduled event with other signals.
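
To make these use cases concrete, here is a minimal Python sketch of one possible policy for choosing between them. The policy itself, the function `plan_action`, its flags `pause_sensitive` and `has_running_jobs`, and the action names are all assumptions made for illustration. Your application would substitute its own rules.

```python
import logging

logger = logging.getLogger("scheduled_events")

RESTART_TYPES = {"Reboot", "Redeploy"}


def plan_action(event_type: str,
                pause_sensitive: bool = False,
                has_running_jobs: bool = False) -> str:
    """Pick a response to an upcoming event (hypothetical policy).

    Returns one of "buffer", "drain" or "failover".
    """
    # Log and audit: record every event so later analysis can correlate it.
    logger.info("scheduled event received: %s", event_type)

    if event_type == "Freeze":
        # The VM will be back shortly, so most workloads can accumulate
        # changes; pause-sensitive ones fail over instead.
        return "failover" if pause_sensitive else "buffer"

    if event_type in RESTART_TYPES:
        # Let in-flight jobs finish before the restart when there are any.
        return "drain" if has_running_jobs else "failover"

    raise ValueError(f"unknown event type: {event_type!r}")
```

For example, `plan_action("Freeze")` yields `"buffer"`, while `plan_action("Reboot", has_running_jobs=True)` yields `"drain"`.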

Getting Started with Scheduled Events

You can query for scheduled events by making the following call from within a VNET-enabled VM:

curl -H Metadata:true  

The response contains an array of scheduled events. An empty array means that no events are currently scheduled. When events are scheduled, each entry in the array carries fields such as:

            "EventType":"Reboot" | "Redeploy" | "Freeze",
            "EventStatus":"Scheduled" | "Started",

To trigger scheduled events and test the logic that handles them, go to the Azure portal and either Restart or Redeploy your VM.
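
The query and the response handling described in this section could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a reference client: the endpoint URL is passed in by the caller rather than hard-coded, the events array is assumed to sit under a top-level `Events` key, and the function names are hypothetical. Only the `Metadata:true` header and the `EventType` and `EventStatus` fields come from the text above.

```python
import json
import urllib.request
from typing import Any, Dict, List


def fetch_scheduled_events(endpoint_url: str, timeout: float = 5.0) -> str:
    """Return the raw response body from the Scheduled Events endpoint.

    endpoint_url is supplied by the caller; this sketch does not assume it.
    """
    # The Metadata:true header mirrors the curl call shown above.
    request = urllib.request.Request(endpoint_url, headers={"Metadata": "true"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_events(body: str) -> List[Dict[str, Any]]:
    """Extract the list of events from a JSON response body.

    Assumption: the array lives under a top-level "Events" key.
    """
    document = json.loads(body)
    return list(document.get("Events", []))


def events_not_yet_started(events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only events whose EventStatus is still "Scheduled"."""
    return [e for e in events if e.get("EventStatus") == "Scheduled"]
```

Keeping the parsing separate from the network call makes the handling logic easy to exercise with canned response bodies before you restart or redeploy a real test VM.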

Next Steps