
Monitoring Azure Services and External Systems with Azure Automation


In this blog post, I will walk through how to set up Azure Automation to monitor services in Azure or in an external system so that actions can be taken when a specific event occurs. I have had quite a few discussions with customers on ways to accomplish this using Azure Automation, so I thought I would discuss an approach here that seems to work for most scenarios.

It is quite common to use Automation to monitor other systems for events that trigger processes. This approach is popular because it requires no changes to the external system, and it keeps all of your event-reaction logic in one place.

Monitors need a polling mechanism, which I describe later in this post. That mechanism uses the scheduling and workflow capabilities of Azure Automation to minimize the time the monitor runbooks are actually running, which conserves resources and reduces the cost of polling.

Basic Monitoring

To get started, let’s look at a scenario that requires monitoring an Azure service. You probably have some virtual machines deployed in Azure, and you want to monitor their event logs for a specific event ID. For example, if a certificate is about to expire, you want to renew the certificate and send an email to let everyone know the change has occurred.

Your runbook might look like the diagram below for any event you are monitoring in a VM.

 

Azure Automation Monitor Runbook

 

For a certificate expiry sample, the steps might look like the following:

  1. Monitor a VM in Azure looking for an event ID of 64 (certificate is about to expire) that has occurred in the last 24 hours.
  2. If the event ID is found, start a new runbook job to renew the certificate using your organization’s process.
  3. Send an email once the certificate has been renewed.
  4. Continue monitoring the VM for expiring certificates (event ID 64).

Let’s say you have a requirement to run this check once a day across all of your VMs. We can build a runbook that calls into a VM in a cloud service and determines whether this event ID was logged. If it was, the runbook calls a child runbook to renew the certificate.
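The detection step can be sketched in plain PowerShell. This is a minimal illustration rather than the gallery runbook: `Find-RecentEvent` is a hypothetical helper, and the events are assumed to be objects with `EventID` and `TimeGenerated` properties, like the entries `Get-EventLog` returns on the VM. It returns the newest matching event from the last 24 hours, or nothing, so the caller starts the child runbook only when something comes back.

```powershell
# Hypothetical helper: return the newest event with the given ID logged in the
# look-back window ending at $Now, or $null if there is none.
function Find-RecentEvent {
    param(
        [object[]] $Events,        # e.g. entries from Get-EventLog on the VM
        [int]      $EventId,
        [datetime] $Now,
        [int]      $LookbackHours = 24
    )
    $since = $Now.AddHours(-$LookbackHours)
    $Events |
        Where-Object { $_.EventID -eq $EventId -and $_.TimeGenerated -gt $since } |
        Sort-Object -Property TimeGenerated -Descending |
        Select-Object -First 1
}

# Sample events standing in for a VM's event log
$now = [datetime]'2024-01-02 23:55'
$sample = @(
    [pscustomobject]@{ EventID = 64; TimeGenerated = [datetime]'2024-01-01 12:00' }  # older than 24 hours
    [pscustomobject]@{ EventID = 63; TimeGenerated = [datetime]'2024-01-02 10:00' }  # different event ID
    [pscustomobject]@{ EventID = 64; TimeGenerated = [datetime]'2024-01-02 20:00' }
)
$found = Find-RecentEvent -Events $sample -EventId 64 -Now $now
if ($found) {
    # In the runbook, this is where the certificate renewal child runbook would be started
    Write-Output "Event 64 found at $($found.TimeGenerated)"
}
```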

Because the requirement is only to check for expiring certificates once a day, the runbook doesn’t need to sleep and poll continually. Instead, we can attach a daily schedule to it using the scheduling capabilities of Azure Automation.

In the schedule, set the start time to 11:55 PM so that each nightly run checks the events logged over the previous day and finishes in about five minutes, by around midnight.
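The first run of the daily schedule should be the next 11:55 PM occurrence, and each run covers the 24 hours ending at its start time. A small sketch of that calculation follows; `Get-NextDailyStartTime` is a hypothetical helper, times are assumed to be local, and the schedule itself is still created in the portal or with the Automation schedule cmdlets.

```powershell
# Hypothetical helper: the next occurrence of a daily time of day after $Now.
function Get-NextDailyStartTime {
    param(
        [datetime] $Now,
        [timespan] $TimeOfDay = [timespan]'23:55'
    )
    $candidate = $Now.Date + $TimeOfDay
    if ($candidate -le $Now) {
        # Today's slot has already passed, so start tomorrow
        $candidate = $candidate.AddDays(1)
    }
    $candidate
}

$start = Get-NextDailyStartTime -Now ([datetime]'2024-01-02 09:00')
# Each run looks back over the 24 hours ending at its start time
$windowStart = $start.AddHours(-24)
```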

 

Azure Automation Monitor Runbook

Advanced Monitoring

Now that we have a basic monitoring solution in place, let’s look at how to make this work for more frequent polls.

Azure Automation can schedule runbooks to run every hour, so the method covered in Basic Monitoring can poll as often as once an hour. To poll every 15 minutes, create four separate hourly schedules, each starting at a different 15-minute mark of the hour (:00, :15, :30, and :45), and link all four to the Watch-EventID runbook.
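The four start times can be worked out as below. This is only a sketch: `Get-QuarterHourStartTimes` and the `Watch-EventID-NN` schedule names are hypothetical, and each resulting schedule is assumed to recur hourly from its start time.

```powershell
# Hypothetical helper: start times for four hourly schedules offset by
# 15 minutes, beginning at the first whole hour after $Now.
function Get-QuarterHourStartTimes {
    param([datetime] $Now)
    $nextHour = $Now.Date.AddHours($Now.Hour + 1)
    foreach ($offset in 0, 15, 30, 45) {
        [pscustomobject]@{
            Name      = 'Watch-EventID-{0:D2}' -f $offset   # hypothetical schedule name
            StartTime = $nextHour.AddMinutes($offset)
        }
    }
}

$schedules = @(Get-QuarterHourStartTimes -Now ([datetime]'2024-01-02 10:07'))
```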

The Next Run column in the image below shows that the runbook now runs every 15 minutes.

Multiple schedules on runbook

 

Monitoring with State Management

The approach described above meets most needs for a monitoring runbook, but it has two drawbacks. First, schedules must be associated with each monitor runbook. Second, state must be maintained outside the runbook (perhaps in an Automation variable) so that duplicate events are not triggered. For example, in the certificate expiry scenario, you wouldn’t want each later job of the monitor runbook to generate yet another certificate just because it didn’t know that the job that ran 15 minutes earlier had already renewed the certificate in response to the same event.
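The duplicate-event problem comes down to remembering the time of the last event that was handled. Here is a minimal sketch with a hypothetical `Select-NewEvents` helper that takes the stored time and returns the events still to act on plus the updated value. In a scheduled runbook that value would have to be read and written through an Automation variable asset (for example with the `Get-AutomationVariable` and `Set-AutomationVariable` activities); that part is left out so the logic runs locally.

```powershell
# Hypothetical helper: keep only events newer than the last processed time
# and return the new "last processed" watermark alongside them.
function Select-NewEvents {
    param(
        [object[]] $Events,
        [datetime] $LastProcessed
    )
    $new = @($Events |
        Where-Object { $_.TimeGenerated -gt $LastProcessed } |
        Sort-Object -Property TimeGenerated)
    $watermark = if ($new.Count -gt 0) { $new[-1].TimeGenerated } else { $LastProcessed }
    [pscustomobject]@{ NewEvents = $new; LastProcessed = $watermark }
}

$log = @(
    [pscustomobject]@{ EventID = 64; TimeGenerated = [datetime]'2024-01-02 09:00' }
    [pscustomobject]@{ EventID = 64; TimeGenerated = [datetime]'2024-01-02 09:20' }
)
# The first poll acts on both events; the second poll sees the same log but nothing new
$first  = Select-NewEvents -Events $log -LastProcessed ([datetime]'2024-01-02 08:00')
$second = Select-NewEvents -Events $log -LastProcessed $first.LastProcessed
```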

One way to solve these issues is to have a separately scheduled runbook that triggers the monitor runbooks at predefined intervals. Let’s call it “Manage-MonitorRunbook”, since it kicks off the monitor runbooks that have a specific tag associated with them. This removes the need to link schedules to each runbook and makes it easy to turn monitors off when required.

To address the state management drawback, we can use the Suspend-Workflow activity of PowerShell Workflow. Instead of doing a single poll and then completing, the monitor runbook loops continuously and suspends itself after each poll of the system it is monitoring. The Manage-MonitorRunbook runbook then resumes the suspended job with the Resume-AzureAutomationJob cmdlet.

 

MonitorEvents

Besides letting you maintain state between polls within a single monitor runbook job, this approach has another advantage: the job is not running, and therefore not incurring costs, while it has no work to do. That would not be the case if the runbook used Start-Sleep to wait between polls instead of suspending itself with Suspend-Workflow. Suspended workflows are not billed, whereas sleeping workflows are still in the Running state and continue to incur charges.
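A rough calculation shows why this matters. Assume, hypothetically, a monitor that polls every 15 minutes and needs about one minute of run time per poll: a job that sleeps between polls stays in the Running state all day, while a job that suspends is only running while it polls. The helper name is illustrative, and the sketch ignores billing granularity and any free allowance.

```powershell
# Hypothetical helper: compare daily run minutes for a monitor job that sleeps
# between polls with one that suspends between polls.
function Compare-MonitorRunMinutes {
    param(
        [int]    $PollIntervalMinutes,
        [double] $MinutesPerPoll
    )
    $minutesPerDay = 24 * 60
    $pollsPerDay   = $minutesPerDay / $PollIntervalMinutes
    [pscustomobject]@{
        PollsPerDay       = $pollsPerDay
        SleepingMinutes   = $minutesPerDay                  # Start-Sleep: job is Running all day
        SuspendingMinutes = $pollsPerDay * $MinutesPerPoll  # Suspend-Workflow: Running only while polling
    }
}

$comparison = Compare-MonitorRunMinutes -PollIntervalMinutes 15 -MinutesPerPoll 1
```

Under these assumptions the sleeping job accrues 1,440 running minutes a day against 96 for the suspending one.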

Applying this approach to the Watch-EventID runbook gives the workflow below. Notice that the runbook keeps state in $StartTime, the time after which it should look for new events, and updates that value each time it finds an event so that later polls do not trigger on the same event.

<#
.SYNOPSIS 
    Sample runbook to search for a specific event id in an Azure VM

.DESCRIPTION
    This runbook looks for a specific event ID in an Azure VM so that an action
    could be taken when this event happens.
    It is designed to be used with the Manage-MonitorRunbook utility runbook so that
    it will get resumed on specific intervals defined by the schedules on the Manage-MonitorRunbook
    runbook. This runbook should have a tag to indicate that it should get resumed by
    that runbook.

 This runbook depends on the Connect-AzureVM utility runbook that is available from the gallery.

.PARAMETER ServiceName
    Name of the Azure Cloud Service where the VM is located

.PARAMETER VMName
    Name of the Azure VM

.PARAMETER AzureCredentialSetting
    A credential asset name containing an Org Id username / password with access to this Azure subscription.

.PARAMETER SubscriptionName
    The name of the Azure subscription

.PARAMETER EventID
    The specific event ID to search for. This sample looks for this event ID

.PARAMETER LogName
    The event log name, for example System

.PARAMETER Source
    The event log source, for example EventLog

.PARAMETER VMCredentialSetting
    A credential asset name that has access to the Azure VM 

.EXAMPLE
    Watch-EventID -ServiceName "Finance" -VMName 'FinanceWeb1' -AzureCredentialSetting 'FinanceOrgID' -SubscriptionName "Visual Studio Ultimate with MSDN" -EventID "63" -LogName "System" -Source "EventLog" -VMCredentialSetting "FinanceVMCredential"
#>
workflow Watch-EventID
{
  Param ( 
        [String] $ServiceName,
        [String] $VMName,
        [String] $AzureCredentialSetting,
        [String] $SubscriptionName,
        [String] $EventID,
        [String] $LogName,
        [String] $Source,
        [String] $VMCredentialSetting
    )

    # The start time is used to ensure we only look for events after this specific time.
    # This would be a common pattern in any monitor runbooks that are developed.
    $StartTime = Get-Date
    Try
    {
        While (1)
        {         
            $OrgIDCredential = Get-AutomationPSCredential -Name $AzureCredentialSetting
            if ($OrgIDCredential -eq $null)
            {
                throw "Could not retrieve '$AzureCredentialSetting' credential asset. Check that you created this first in the Automation service."
            }

            $Credential = Get-AutomationPSCredential -Name $VMCredentialSetting
            if ($Credential -eq $null)
            {
                throw "Could not retrieve '$VMCredentialSetting' credential asset. Check that you created this first in the Automation service."
            }

            # Get the uri of the Azure VM to connect to by calling the Connect-AzureVM utility runbook                 
            $Uri =  Connect-AzureVM `
                -AzureOrgIdCredential $OrgIDCredential `
                -AzureSubscriptionName $SubscriptionName `
                -ServiceName $ServiceName `
                -VMName $VMName

            # Script to run on the remote VM looking for an event ID
            $ScriptBlock = {Param($EventID, $StartTime, $LogName, $Source) Get-EventLog -LogName $LogName -Source $Source -InstanceID $EventID -After $StartTime -Newest 1}

            # Run this script block on the remote VM
            $EventResult = InlineScript {
                 Invoke-Command -ConnectionUri $Using:Uri -Credential $Using:Credential -ScriptBlock $Using:ScriptBlock -ArgumentList $Using:EventID, $Using:StartTime, $Using:LogName, $Using:Source
            }

            if ($EventResult)
            {
                # Set new start time to be after this event. This is to ensure that only new events are looked for.
                $StartTime = $EventResult.TimeGenerated

                # Take whatever action is required when this event happens...
                # You should use the Start-AzureAutomationRunbook cmdlet to trigger a new runbook asynchronously
                # so that this runbook returns immediately and can suspend itself until the next
                # resume from the Manage-MonitorRunbook runbook, for example:
                # Start-AzureAutomationRunbook -AutomationAccountName <AccountName> -Name <RunbookName> [-Parameters <Hashtable>]

                Write-Output "Event ID found... Taking action"
            }

            # Suspending workflow so Automation minutes are not used up continuously
            # This workflow will be resumed by a separate monitor runbook (Manage-MonitorRunbook) on a specific schedule
            Write-Verbose "Suspending workflow..."

            # Clearing credentials since these can't be persisted with suspend currently
            $Credential = $Null
            $OrgIDCredential = $Null
            Suspend-Workflow
        }
    }
    Catch
    {  
        # This runbook should never suspend because of an error, as the Manage-MonitorRunbook
        # runbook would then resume it when it shouldn't. Do not set $ErrorActionPreference = "Stop"
        # in this runbook, because that would make it suspend on errors.
        # Write out the error instead.
        Write-Error ($_)
    }

}

We then need the Manage-MonitorRunbook runbook to resume this runbook at a specific interval so that it polls on that interval. Below is an example of what that runbook could look like; you should be able to use it as is to manage your monitors.

<#
.SYNOPSIS 
    Utility runbook to control monitor runbooks to run at specific intervals

.DESCRIPTION
    This runbook is designed to run on scheduled intervals and resume any suspended monitor runbooks
    that have a specific tag

.PARAMETER AccountName
    Name of the Azure Automation account

.PARAMETER AzureCredentialSetting
    A credential asset name containing an Org Id username / password with access to this Azure subscription.

.PARAMETER Tag
    Value of the tag for monitor runbooks in the service that should be resumed. Set this tag only on monitor runbooks,
    so that other runbooks that happen to be suspended are not resumed.

.PARAMETER SubscriptionName
    The name of the Azure subscription. This is an optional parameter as the default subscription will be used if not supplied.

.EXAMPLE
    Manage-MonitorRunbook -AccountName "Finance" -AzureCredentialSetting 'FinanceOrgID' -Tag "Monitor" -SubscriptionName "Visual Studio Ultimate with MSDN"

#>
workflow Manage-MonitorRunbook
{
    Param ( 
        [Parameter(Mandatory=$true)]
        [String] $AccountName,

        [Parameter(Mandatory=$true)]
        [String] $AzureCredentialSetting,

        [Parameter(Mandatory=$true)]
        [String] $Tag,

        [Parameter(Mandatory=$false)]
        [String] $SubscriptionName
    )

    $AzureCred = Get-AutomationPSCredential -Name $AzureCredentialSetting
    if ($AzureCred -eq $null)
    {
        throw "Could not retrieve '$AzureCredentialSetting' credential asset. Check that you created this first in the Automation service."
    }

    # Set the Azure subscription to use
    $Null = Add-AzureAccount -Credential $AzureCred 

    # Select the specific subscription if it was passed in, otherwise the default will be used.
    # An omitted [String] parameter may be an empty string rather than $null, so test it for a value.
    if ($SubscriptionName)
    {
       $Null = Select-AzureSubscription -SubscriptionName $SubscriptionName
    }

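    # Get the list of runbooks whose tags include the specified tag exactly
    # (-contains compares whole values, so "monitor" does not also pick up "monitor15")
    $MonitorRunbooks = Get-AzureAutomationRunbook -AutomationAccountName $AccountName | where -FilterScript {$_.Tags -contains $Tag}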

    foreach ($Runbook in $MonitorRunbooks)
    {
        Write-Verbose ("Checking " + $Runbook.Name + " for suspended jobs to resume")
        # Get the suspended jobs, if there are any, for this runbook
        $SuspendedJobs = Get-AzureAutomationJob -AutomationAccountName $AccountName `
            -RunbookName $Runbook.Name | Where -FilterScript {$_.Status -eq "Suspended"}

        if ($SuspendedJobs.Count -gt 1)
        {
            Write-Error ("There are multiple suspended jobs for " + $Runbook.Name + ". This shouldn't happen for monitor runbooks")
            # Select the oldest job and resume that one
            $SuspendedJobs = $SuspendedJobs | Sort-Object -Property CreationTime | Select-Object -First 1
        }

        if ($SuspendedJobs)
        {
            Write-Verbose ("Resuming the next suspended job: " + $SuspendedJobs.Id)
            Resume-AzureAutomationJob -AutomationAccountName $AccountName -Id $SuspendedJobs.Id
        }
    }
}

The above runbook looks for runbooks that have a specific tag indicating that they are monitors, and then resumes their suspended jobs so they can perform their checks and suspend again. As the image below shows, in my environment this runbook is scheduled to run every 30 minutes.

 

30MinSchedule
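The choice of which job to resume for each runbook can be tested on its own. The sketch below mirrors that step with a hypothetical `Select-JobToResume` helper: it keeps only jobs whose `Status` is `Suspended` and returns the oldest by `CreationTime`, or nothing. The sample job objects only imitate the properties the runbook reads; they are not real `Get-AzureAutomationJob` output.

```powershell
# Hypothetical helper mirroring the selection step in Manage-MonitorRunbook:
# the oldest suspended job for a runbook, or $null if none is suspended.
function Select-JobToResume {
    param([object[]] $Jobs)
    $Jobs |
        Where-Object { $_.Status -eq 'Suspended' } |
        Sort-Object -Property CreationTime |
        Select-Object -First 1
}

$jobs = @(
    [pscustomobject]@{ Id = 'a'; Status = 'Running';   CreationTime = [datetime]'2024-01-02 08:00' }
    [pscustomobject]@{ Id = 'b'; Status = 'Suspended'; CreationTime = [datetime]'2024-01-02 10:00' }
    [pscustomobject]@{ Id = 'c'; Status = 'Suspended'; CreationTime = [datetime]'2024-01-02 09:00' }
)
# Job 'c' is selected: it is suspended and older than 'b'
$next = Select-JobToResume -Jobs $jobs
```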

 

The runbook parameter values on the schedule linked to the monitor manager runbook show that the Tag parameter is set to “monitor”, so any suspended runbook jobs with this tag will be resumed. In this scenario, you would add the “monitor” tag to the Watch-EventID runbook on its configuration page so that the monitor manager runbook resumes it every 30 minutes.

 

ScheduleValues

 

You could extend this solution to handle monitors with different polling intervals by linking the Manage-MonitorRunbook runbook to several schedules, each passing a different interval-based tag, and then tagging your monitor runbooks accordingly. For example, one schedule could run every 15 minutes and resume runbooks tagged “monitor15”, while another runs every 5 minutes with “monitor5” as the Tag parameter.
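Interval tags only work if each manager schedule selects runbooks by the whole tag value. Below, a hypothetical `Select-MonitorRunbook` helper uses `-contains`, which compares complete tag values case-insensitively, whereas a regular-expression test such as `$_.Tags -match 'monitor'` would also select runbooks tagged “monitor5” or “monitor15”. The runbook names, and the assumption that `Tags` is an array of strings, are illustrative.

```powershell
# Hypothetical helper: runbooks whose tag list contains exactly $Tag.
function Select-MonitorRunbook {
    param(
        [object[]] $Runbooks,
        [string]   $Tag
    )
    $Runbooks | Where-Object { $_.Tags -contains $Tag }
}

# Sample runbooks (hypothetical names) tagged for different polling intervals
$runbooks = @(
    [pscustomobject]@{ Name = 'Watch-EventID'; Tags = @('monitor') }
    [pscustomobject]@{ Name = 'Watch-Disk';    Tags = @('monitor15') }
    [pscustomobject]@{ Name = 'Watch-Queue';   Tags = @('monitor5', 'finance') }
)
$every30 = @(Select-MonitorRunbook -Runbooks $runbooks -Tag 'monitor')    # Watch-EventID only
$every15 = @(Select-MonitorRunbook -Runbooks $runbooks -Tag 'monitor15')  # Watch-Disk only
$regex   = @($runbooks | Where-Object { $_.Tags -match 'monitor' })       # all three runbooks
```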

The above runbooks are available in the Automation runbook gallery, so you can build your own monitor runbooks on top of the monitor manager and start integrating different systems to automate your operational tasks together.

Visit the Automation page to learn more about Automation and how to get started.