
Preview: “What is about to happen to my VM?” In-VM Notification Service

Posted on September 21, 2015

Program Manager, Azure Compute

As a service owner running on a public cloud, you might have already asked the question, "What is about to happen to my VM?" This post helps answer that question by showing how service owners can find out what is going to happen to their VMs five minutes before a VM event.

One of the advantages of running virtual machines on Azure is that you can group VMs in an availability set for redundancy. This keeps the service available during planned platform updates and even when unexpected problems occur. When Azure detects a problem with a node, it proactively moves the affected VMs to new nodes so they are restored to a running and accessible state. Some updates do require a reboot of your virtual machines. We send email notifications for such events, but as a service owner you might still want to be better prepared for the upcoming occurrence.

Some services just need to know when such an event is about to happen. That advance notice lets them execute steps that minimize, or even eliminate, service interruption for their end users. In this post we present the In-VM Metadata service. The service is based on IETF RFC 3927, which allows dynamic network configuration within the 169.254/16 prefix. That prefix is valid for communication with other VMs connected to the same physical node.

How should I use it?

The In-VM Metadata service provides a standard way to poll the maintenance status of a VM by running the following command from within that VM:

curl http://169.254.169.254/metadata/v1/maintenance

The standard result set includes three main attributes: InstanceID, placement upgrade domains and placement fault domains. If a maintenance activity is about to begin (within 5 minutes), an additional maintenance event is added.

Normal Results -

{}

Results when your VM is about to reboot -

{
  "EventID": "6f0a13a3-dc0d-4bbe-ab24-df710a3917e6",
  "EventCreationTime": "9\/15\/2015 6:42:51 AM"
}
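
For scripts that need more than a string match, the response can be parsed as JSON. The following is a minimal Python sketch. It assumes the endpoint and the field names shown in the samples above (`EventID`, `EventCreationTime`). It also assumes that `EventCreationTime` always uses the `M/D/YYYY h:mm:ss AM/PM` form of the single sample. The post does not state a time zone, so the timestamp is left naive. `fetch_maintenance` and `parse_maintenance` are hypothetical helper names.

```python
import json
import urllib.request
from datetime import datetime
from typing import Optional

# Endpoint taken from the curl example in this post.
MAINTENANCE_URL = "http://169.254.169.254/metadata/v1/maintenance"

# Assumed timestamp format, inferred from the sample "9/15/2015 6:42:51 AM".
CREATION_TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def fetch_maintenance(url: str = MAINTENANCE_URL, timeout: float = 5.0) -> Optional[str]:
    """Return the raw response body, or None if the service cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except OSError:  # URLError and socket timeouts are both OSError subclasses
        return None


def parse_maintenance(body: str) -> Optional[dict]:
    """Return the pending event, or None when no maintenance is announced."""
    document = json.loads(body)  # json also decodes the escaped "\/" slashes
    if "EventID" not in document:
        return None  # "{}" means no upcoming event
    return {
        "event_id": document["EventID"],
        "created": datetime.strptime(document["EventCreationTime"], CREATION_TIME_FORMAT),
    }
```

On a VM, a typical call is `body = fetch_maintenance()`, followed by `parse_maintenance(body)` whenever `body` is not `None`.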

Why should I use it?

The service is easy to use and available on any OS you choose to run. It provides a polling-based mechanism from within the VM itself, so the DevOps team operating the service can get a near-real-time status of its VMs. These indications can help you mask availability issues from your end users and increase service availability, through basic availability logging or proactive steps before upcoming reboot events.

There are two scenarios in which you can use In-VM Metadata:

1. System event logging: service owners track their resources' availability by polling the service on a regular basis and storing the results in the EventLog (Windows) or syslog (Linux).

2. Masking reboots from end users: the service tracks upcoming reboots as soon as they are announced and drains traffic from a VM that is about to be rebooted. VMs can be excluded from the set of VMs serving traffic, based on the dynamic indication polled from the In-VM Metadata service.

Simple reboot logging

The example below shows how upcoming reboots on an Azure VM can be logged using standard logging (EventLog for Windows and syslog for Linux). IsVmInMaint.ps1 is scheduled to run every five minutes and logs an event in the EventLog whenever a VM reboot is about to happen.

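# One-time setup (elevated): New-EventLog -LogName Application -Source "IsVmInMaint"
$result = (Invoke-WebRequest -Uri http://169.254.169.254/metadata/v1/maintenance -UseBasicParsing).Content | Select-String -Pattern "EventID"
if ($result) {Write-EventLog -LogName Application -Source "IsVmInMaint" -EntryType Information -EventId 1 -Message "Incoming VM reboot"}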

IsVmInMaint.sh does the same on Linux. It is meant to be registered in crontab to run every five minutes, and it logs upcoming reboot events to syslog.

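#!/bin/bash
# IsVmInMaint.sh - run from crontab every five minutes, for example:
#   */5 * * * * /path/to/IsVmInMaint.sh

result=$(curl -s http://169.254.169.254/metadata/v1/maintenance | grep -i EventID)

# Quote $result: an unquoted empty variable makes "[ -n ]" always true.
if [ -n "$result" ]; then
    logger "Incoming VM Event"
fi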
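
Both scripts log on every run that sees an event, so an event that is still visible on the next run may be logged more than once. The following is a hypothetical Python variant of the same job that runs as a long-lived poller and remembers which `EventID` values it has already reported. The `fetch` argument stands in for whatever retrieves the response body (for example, the `curl` call above). The logger and its handlers are left to the caller.

```python
import json
import logging
import time
from typing import Callable, Optional, Set


def check_once(body: Optional[str], seen: Set[str], logger: logging.Logger) -> bool:
    """Log an upcoming event the first time its EventID appears; return True if logged."""
    if not body:
        return False  # service unreachable or empty reply: nothing to report
    event_id = json.loads(body).get("EventID")
    if event_id is None or event_id in seen:
        return False  # no event, or this event was already reported
    seen.add(event_id)
    # WARNING so the record passes logging's default threshold.
    logger.warning("Incoming VM event %s", event_id)
    return True


def poll(fetch: Callable[[], Optional[str]], logger: logging.Logger,
         interval: float = 60.0, iterations: Optional[int] = None) -> int:
    """Call fetch() every `interval` seconds; return how many events were logged."""
    seen: Set[str] = set()
    logged = 0
    count = 0
    while iterations is None or count < iterations:
        if check_once(fetch(), seen, logger):
            logged += 1
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval)
    return logged
```

On Linux, attaching `logging.handlers.SysLogHandler(address="/dev/log")` to the logger sends these records to syslog. On Windows, a different handler would be needed.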


Masking reboots from end-users

Figure 1 illustrates scenario 2. We have a simple distributed application with one tier (an availability set) behind a load balancer that maintains stickiness based on source IP. In the figure, client i landed on VM1 with its first request.

All subsequent requests from that client are routed to VM1 until VM1 becomes unavailable. Other clients are served by the available VMs, based on the load factor observed by the load balancer.

Figure 1: Basic load balancing scenario
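
To make the stickiness in Figure 1 concrete, the following sketch maps each client IP to a VM. It uses rendezvous (highest-random-weight) hashing over the VMs currently in the pool. This is a simplified stand-in, not the Azure load balancer's actual distribution algorithm, and the VM names and client addresses are made up. The sketch also shows why excluding VM1 matters. Once VM1 leaves the pool, only the clients that were pinned to it move to another VM, and every other client keeps its VM.

```python
import hashlib
from typing import Iterable


def _score(client_ip: str, vm: str) -> int:
    """Deterministic pseudo-random weight for a (client, VM) pair."""
    digest = hashlib.sha256(f"{client_ip}|{vm}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_vm(client_ip: str, pool: Iterable[str]) -> str:
    """Rendezvous hashing: the VM with the highest weight serves this client."""
    candidates = list(pool)
    if not candidates:
        raise ValueError("no VM available in the pool")
    return max(candidates, key=lambda vm: _score(client_ip, vm))


if __name__ == "__main__":
    pool = ["VM1", "VM2", "VM3"]
    drained = [vm for vm in pool if vm != "VM1"]  # VM1 excluded before its reboot
    for n in range(1, 11):
        client = f"203.0.113.{n}"
        print(client, pick_vm(client, pool), "->", pick_vm(client, drained))
```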

Figure 2 shows the case where VM1 is about to undergo maintenance. The service might then need to:
(1) proactively drain the VM1 endpoint http://myinvmmetadata1.cloudapp.net/ of new client sessions;
(2) exclude VM1 from the load balancer's available members, leaving http://myinvmmetadata2.cloudapp.net/ and http://myinvmmetadata3.cloudapp.net/ as the current endpoints.

Figure 2: Maintenance scenario, serving VM is unavailable


At that point, the VM that is about to be affected can send a command to the load balancer or traffic manager to exclude itself from future traffic. Once the VM is back, it can be added back to the available endpoint pool.
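
The drain-and-rejoin flow can be sketched as a small state machine driven by each polling result. This is a hypothetical illustration. `remove_from_pool` and `add_to_pool` are placeholder callbacks for whatever actually changes load balancer membership (for example, the cmdlets listed in the next section). The sketch also makes a simplifying assumption: it treats "no event reported any more" as "the VM is back", because the post does not define how that is detected.

```python
from typing import Callable


class PoolMembership:
    """Tracks whether this VM should currently receive load-balanced traffic."""

    def __init__(self, remove_from_pool: Callable[[], None],
                 add_to_pool: Callable[[], None], in_pool: bool = True) -> None:
        self._remove = remove_from_pool  # hypothetical hook: stop new traffic
        self._add = add_to_pool          # hypothetical hook: resume traffic
        self.in_pool = in_pool

    def on_poll(self, event_pending: bool) -> None:
        """Update membership from the latest maintenance poll."""
        if event_pending and self.in_pool:
            self._remove()       # drain: take the VM out before the reboot
            self.in_pool = False
        elif not event_pending and not self.in_pool:
            self._add()          # event no longer reported: rejoin the pool
            self.in_pool = True
```

A reboot restarts the polling process, so the in-memory state is lost. For that reason, a service might construct `PoolMembership(..., in_pool=False)` at startup so that the first clean poll explicitly re-adds the VM.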


A little more detail

Adding a VM to a load balancer pool - Add-AzureEndpoint

Validating an endpoint - Get-AzureEndpoint

Removing an endpoint from a load balancer pool - Remove-AzureEndpoint

How does it work?

The instance metadata server is an HTTP server that returns data from a host agent on the node. That agent receives commands from the main controller component (Figure 3). When the controller initiates a command on a node, the command is stored in a repository that remains valid for the duration of the activity (planned maintenance, service healing, etc.).
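
This architecture can be mimicked locally to test the scripts above without waiting for real maintenance. The following is a hypothetical stand-in, not the Azure implementation. An in-memory repository holds the current event for the duration of an activity. A small server built on Python's standard `http.server` module returns that event at the same path, `/metadata/v1/maintenance`, but listens on `127.0.0.1` rather than the link-local address.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Optional


class ActivityRepository:
    """Holds the pending event while an activity (e.g. planned maintenance) lasts."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._event: Optional[dict] = None

    def start_activity(self, event_id: str, created: str) -> None:
        with self._lock:
            self._event = {"EventID": event_id, "EventCreationTime": created}

    def end_activity(self) -> None:
        with self._lock:
            self._event = None

    def current(self) -> dict:
        with self._lock:
            return dict(self._event) if self._event else {}


def make_server(repo: ActivityRepository, port: int = 0) -> ThreadingHTTPServer:
    """Serve GET /metadata/v1/maintenance on localhost from the repository."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/metadata/v1/maintenance":
                self.send_error(404)
                return
            body = json.dumps(repo.current()).encode("utf-8")  # "{}" when idle
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt: str, *args) -> None:
            pass  # keep console output quiet

    return ThreadingHTTPServer(("127.0.0.1", port), Handler)
```

To use it, start `server.serve_forever` on a background thread. Then point a client at `http://127.0.0.1:<port>/metadata/v1/maintenance`, where the port is `server.server_address[1]`.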

Figure 3 gives a high-level overview of today's communication framework. The REST server is the only endpoint the VM can communicate with. For the instance metadata server, we use the standard link-local address range, 169.254/16, in line with RFC 3927.

Figure 3: In-VM Metadata service basic architecture
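
The 169.254/16 range can be checked with Python's standard `ipaddress` module, which classifies addresses in 169.254.0.0/16 as link-local. The short sketch below uses `is_link_local_v4`, a hypothetical helper name. It confirms that the metadata address used throughout this post falls inside that range, and that ordinary private or public addresses do not.

```python
import ipaddress

# IPv4 link-local block described by RFC 3927.
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def is_link_local_v4(address: str) -> bool:
    """True if the IPv4 address lies in 169.254.0.0/16."""
    return ipaddress.ip_address(address) in LINK_LOCAL


for addr in ["169.254.169.254", "10.0.0.4", "203.0.113.7"]:
    # The manual range check and the module's built-in flag should agree.
    print(addr, is_link_local_v4(addr), ipaddress.ip_address(addr).is_link_local)
```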