Monitoring is a critical component of any production online application. A properly implemented monitoring system lets you receive alerts for key events and react appropriately. In this post, I will go over some sample code that I wrote to monitor a Media Services account. You can use the sample code as a starting point for planning a custom monitoring system for your own application built on top of Media Services.
Sample Code
At a high level, the sample code works by executing some code against the service at regular intervals. The results of the operations are logged, and another piece of code checks the logs for failures at regular intervals. If the programmed failure condition is met, an alert is fired. Given this, the code is divided into two Visual Studio projects. The first project calls the Media Services asset create and delete APIs at a regular interval and logs the results in an Azure table. The second project reads the top 5 entries in the Azure table at a regular interval and fires an alert if all 5 entries were logged as failures.
The App.config for the first project is as follows
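A minimal sketch of such a file, assuming only the three appSettings keys that the code below reads through ConfigurationManager.AppSettings, might look like this. All values are placeholders, not real account details.

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <appSettings>
    <!-- Placeholder values: substitute your own account details. -->
    <add key="MediaServicesAccountName" value="YOUR_MEDIA_SERVICES_ACCOUNT_NAME" />
    <add key="MediaServicesAccountKey" value="YOUR_MEDIA_SERVICES_ACCOUNT_KEY" />
    <!-- Storage account that holds the MonitoringData table. -->
    <add key="StorageConnectionString"
         value="DefaultEndpointsProtocol=https;AccountName=YOUR_STORAGE_ACCOUNT;AccountKey=YOUR_STORAGE_KEY" />
  </appSettings>
</configuration>
```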
Replace the
The code that performs the monitoring is as follows. As mentioned earlier, the code runs in an infinite loop, and every minute it calls the Media Services APIs to create an asset and then delete it. If the API calls fail, it logs a failure entry in an Azure table; otherwise it logs a success entry. Descriptions of the functions follow the code.
    using System;
    using System.Linq;
    using System.Configuration;
    using System.IO;
    using System.Text;
    using System.Threading;
    using System.Threading.Tasks;
    using System.Collections.Generic;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;
    using Microsoft.WindowsAzure.Storage.Table;
    using Microsoft.WindowsAzure.MediaServices.Client;

    namespace Monitoring
    {
        /// <summary>One row in the MonitoringData table.</summary>
        public class AssetLogEntity : TableEntity
        {
            public int Status { get; set; }          // 0 = success, 1 = failure
            public string FailureData { get; set; }  // failure message when Status = 1
        }

        class Program
        {
            // Read values from the App.config file.
            private static readonly string _mediaServicesAccountName =
                ConfigurationManager.AppSettings["MediaServicesAccountName"];
            private static readonly string _mediaServicesAccountKey =
                ConfigurationManager.AppSettings["MediaServicesAccountKey"];
            private static readonly string _storageConnectionString =
                ConfigurationManager.AppSettings["StorageConnectionString"];

            private static CloudStorageAccount _cloudStorage = null;
            private static CloudTableClient _tableClient = null;
            private static CloudTable _monitoringTable = null;

            // Field for service context.
            private static CloudMediaContext _context = null;
            private static MediaServicesCredentials _cachedCredentials = null;

            /// <summary>Sets up the Media Services context and the log table, then starts monitoring.</summary>
            static void Main(string[] args)
            {
                try
                {
                    // Create and cache the Media Services credentials in a static class variable.
                    _cachedCredentials = new MediaServicesCredentials(
                        _mediaServicesAccountName,
                        _mediaServicesAccountKey);

                    // Use the cached credentials to create the CloudMediaContext.
                    _context = new CloudMediaContext(_cachedCredentials);

                    _cloudStorage = CloudStorageAccount.Parse(_storageConnectionString);
                    _tableClient = _cloudStorage.CreateCloudTableClient();
                    _monitoringTable = _tableClient.GetTableReference("MonitoringData");
                    _monitoringTable.CreateIfNotExists();

                    Monitor();
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.Message);
                }
            }

            /// <summary>Creates and deletes an asset once a minute and logs the outcome.</summary>
            static void Monitor()
            {
                try
                {
                    while (true)
                    {
                        try
                        {
                            Console.WriteLine("Starting Monitoring loop: " + DateTime.Now.ToString());
                            IAsset _asset = _context.Assets.Create("Monitoring Asset", AssetCreationOptions.None);
                            if (_asset == null)
                            {
                                LogMonitoringData(1, "Create Asset returned null");
                            }
                            else
                            {
                                _asset.Delete();
                                // Log success only when both create and delete completed.
                                LogMonitoringData(0);
                            }
                        }
                        catch (Exception x)
                        {
                            Console.WriteLine(x.Message);
                            LogMonitoringData(1, x.Message);
                        }

                        Console.WriteLine("Going to sleep for a minute");
                        Console.WriteLine("");
                        Thread.Sleep(1000 * 60);
                    }
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.Message);
                }
            }

            /// <summary>Writes one result row to the MonitoringData table.</summary>
            static void LogMonitoringData(int status, string _failureData = "")
            {
                try
                {
                    AssetLogEntity _assetLogEntity = new AssetLogEntity();
                    _assetLogEntity.PartitionKey = "Asset";
                    // Reverse-tick row key so that the newest entry sorts first.
                    // UTC avoids ordering glitches around daylight-saving changes;
                    // D19 pads to the full width of a tick count.
                    _assetLogEntity.RowKey = (DateTime.MaxValue.Ticks - DateTime.UtcNow.Ticks).ToString("D19");
                    _assetLogEntity.Status = status;
                    _assetLogEntity.FailureData = _failureData;

                    TableOperation op = TableOperation.Insert(_assetLogEntity);
                    _monitoringTable.Execute(op);
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.Message);
                }
            }
        }
    }
The code is fairly straightforward. Below is a brief description.
- AssetLogEntity – This class defines the table entities. The code above logs a Status of 0 for success and a Status of 1 for failure. The FailureData string is used to write out a failure message when Status = 1.
- Main – The Main function creates the Media Services account context. It then creates the MonitoringData Azure table if it doesn’t already exist. Lastly, it calls the Monitor function.
- Monitor – This is the main monitoring code. The function runs in an infinite while loop. Every minute it uses the Media Services account context to create an asset and then delete it. If the API calls succeed, it logs a success; otherwise it logs a failure by calling the LogMonitoringData function.
- LogMonitoringData – This function performs the actual task of writing the data to the MonitoringData table. To make sure that the latest log items are at the top, it uses a RowKey that is calculated by subtracting the current time's tick count from the tick count of the maximum DateTime value. This works because an Azure table returns the entities within a partition in ascending RowKey order, so a later timestamp produces a smaller key that sorts first.
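The row key ordering described above can be checked in isolation. The following is a minimal sketch, not part of the sample projects: ReverseTickKey is a hypothetical helper name, and the two timestamps are arbitrary values chosen for illustration.

```csharp
using System;

static class RowKeyDemo
{
    // Hypothetical helper: builds a key that sorts newest-first.
    // Later timestamps leave a smaller remainder, so they sort earlier.
    public static string ReverseTickKey(DateTime timestampUtc)
    {
        long reversed = DateTime.MaxValue.Ticks - timestampUtc.Ticks;
        // Zero-pad to 19 digits so string order matches numeric order.
        return reversed.ToString("D19");
    }

    static void Main()
    {
        var earlier = new DateTime(2014, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        var later = earlier.AddMinutes(1);

        string earlierKey = ReverseTickKey(earlier);
        string laterKey = ReverseTickKey(later);

        // Ordinal comparison mirrors the lexical ordering of string row keys.
        Console.WriteLine(string.CompareOrdinal(laterKey, earlierKey) < 0); // prints "True"
    }
}
```

Because both keys have the same fixed width, the entry logged one minute later compares as smaller and would be returned first by a table query.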
The App.config for the second project is as follows
The config file is similar to that of the first project, except that it doesn’t contain the Media Services account credentials (this project looks only at the entries in the MonitoringData table). At a high level, this code runs in an infinite loop, and every minute it checks whether the 5 most recent entries are all failures. If they are, it raises an alert. The code is below, followed by a brief description of its functions.
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Table;
    using System;
    using System.Collections.Generic;
    using System.Configuration;
    using System.Linq;
    using System.Text;
    using System.Threading;
    using System.Threading.Tasks;

    namespace Alerts
    {
        /// <summary>One row in the MonitoringData table (same shape as in the first project).</summary>
        public class AssetLogEntity : TableEntity
        {
            public int Status { get; set; }
            public string FailureData { get; set; }
        }

        class Program
        {
            private static readonly string _storageConnectionString =
                ConfigurationManager.AppSettings["StorageConnectionString"];

            private static CloudStorageAccount _cloudStorage = null;
            private static CloudTableClient _tableClient = null;
            private static CloudTable _monitoringTable = null;

            /// <summary>Gets a reference to the log table and starts the failure check loop.</summary>
            static void Main(string[] args)
            {
                try
                {
                    _cloudStorage = CloudStorageAccount.Parse(_storageConnectionString);
                    _tableClient = _cloudStorage.CreateCloudTableClient();
                    _monitoringTable = _tableClient.GetTableReference("MonitoringData");
                    _monitoringTable.CreateIfNotExists();

                    CheckforFailures();
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.Message);
                }
            }

            /// <summary>Every minute, reads the 5 newest entries and alerts if all are failures.</summary>
            static void CheckforFailures()
            {
                while (true)
                {
                    try
                    {
                        Console.WriteLine("Starting failure checking loop " + DateTime.Now.ToString());

                        // The 5 newest entries come first because of the reverse-tick row key.
                        TableQuery<AssetLogEntity> query = new TableQuery<AssetLogEntity>()
                            .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "Asset"))
                            .Take(5);

                        TableQuerySegment<AssetLogEntity> tqs = _monitoringTable.ExecuteQuerySegmented(query, null);

                        if ((tqs != null) && (tqs.Results != null))
                        {
                            if (tqs.Results.Count == 5)
                            {
                                bool _fireAlert = true;
                                for (int i = 0; i < tqs.Results.Count; i++)
                                {
                                    if (tqs.Results[i].Status == 0)
                                    {
                                        _fireAlert = false;
                                        break;
                                    }
                                }

                                if (_fireAlert)
                                {
                                    Console.WriteLine("More than 5 consecutive failures detected");
                                }
                            }
                        }
                    }
                    catch (Exception ex)
                    {
                        Console.WriteLine(ex.Message);
                    }

                    // Sleep outside the try block so that a failing query
                    // does not turn the loop into a tight retry loop.
                    Console.WriteLine("Going to sleep for a minute");
                    Console.WriteLine("");
                    Thread.Sleep(1000 * 60);
                }
            }
        }
    }
A brief explanation of the code is as follows:
- AssetLogEntity – This is the same as the first project.
- Main – The Main function simply creates a reference to the MonitoringData table and calls the CheckforFailures function.
- CheckforFailures – This function runs in an infinite loop. Every minute it gets the top 5 entries in the MonitoringData table and checks whether all 5 entries were failures. If they were, it raises an alert by printing a message on the screen.
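The alert condition can be separated from the table query so that it is easy to test and tune. The following is a minimal sketch, assuming the statuses have already been read newest-first; AlertRule and ShouldAlert are hypothetical names that do not appear in the sample projects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class AlertRule
{
    // Hypothetical helper: true when at least `window` entries exist and
    // the newest `window` of them are all failures (non-zero Status).
    public static bool ShouldAlert(IReadOnlyList<int> newestFirstStatuses, int window)
    {
        if (newestFirstStatuses.Count < window)
        {
            return false; // not enough history yet
        }
        return newestFirstStatuses.Take(window).All(status => status != 0);
    }

    static void Main()
    {
        // Five failures in a row: the alert fires.
        Console.WriteLine(ShouldAlert(new[] { 1, 1, 1, 1, 1 }, 5)); // prints "True"

        // One success among the last five: no alert.
        Console.WriteLine(ShouldAlert(new[] { 1, 1, 0, 1, 1 }, 5)); // prints "False"

        // Fewer than five entries logged so far: no alert.
        Console.WriteLine(ShouldAlert(new[] { 1, 1, 1 }, 5));       // prints "False"
    }
}
```

Keeping the window size as a parameter makes it simple to experiment with a more or less sensitive alert condition.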
To simulate a failure, I regenerated the keys of the Storage account associated with the Media Services account from the Azure Management Portal. As a result, the Media Services account could no longer access the Storage account to create the containers for an asset, which made the asset creation call in the first project fail. After waiting for more than 5 minutes, I went back to the Azure Management Portal and synchronized the storage keys by clicking the “synchronize primary key” button (see screenshot below).
This action resulted in the Media Services account being functional again after a few minutes. Below is a screenshot of the MonitoringData table entries. I have highlighted the entries corresponding to the failures.
Below are screenshots of the console windows associated with the two sample code projects above. Both screenshots have the failure duration outlined.
Considerations
A few closing remarks on this topic
- I did not use the storage account associated with the Media Services account to create the MonitoringData table. In fact, I used a storage account in a different datacenter. The sample above simulated one failure situation that was related to the Media Services account itself, but if you want to catch failures related to networking or storage service issues in the datacenter where your media application is hosted, it's best to write your monitoring data to another location. The same applies to where you run your monitoring and alerting code. I ran both samples above from outside the datacenter where the Media Services account was located.
- Select your monitoring frequency and alert conditions carefully. A highly sensitive monitoring system will cause too many false alerts.
- A well-designed monitoring system can help you automate failover and failback when building a high availability application that runs as multiple instances in multiple Azure datacenters.
- In the samples above, I am monitoring the Media Services account itself. You should consider using the same principles to monitor your application that is built on top of Media Services.
- You can also use the principles from the sample above to monitor any other Azure service that your application depends on.
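One way to reuse the pattern for other services is to wrap each check in a delegate and convert its outcome into the same Status and FailureData pair that the sample logs. The following is a minimal sketch under that assumption; ProbeResult and Probe.RunOnce are hypothetical names, and the lambdas stand in for real service calls.

```csharp
using System;

// Hypothetical result type mirroring the Status / FailureData columns.
public sealed class ProbeResult
{
    public int Status { get; }
    public string FailureData { get; }

    public ProbeResult(int status, string failureData)
    {
        Status = status;
        FailureData = failureData;
    }
}

static class Probe
{
    // Runs one check; any exception is recorded as a failure (Status = 1).
    public static ProbeResult RunOnce(Action probe)
    {
        try
        {
            probe();
            return new ProbeResult(0, "");
        }
        catch (Exception ex)
        {
            return new ProbeResult(1, ex.Message);
        }
    }

    static void Main()
    {
        // Stand-ins for real calls against a service your application depends on.
        ProbeResult ok = RunOnce(() => { });
        ProbeResult failed = RunOnce(() => { throw new InvalidOperationException("service unavailable"); });

        Console.WriteLine(ok.Status + " " + failed.Status + " " + failed.FailureData);
        // prints "0 1 service unavailable"
    }
}
```

Each result could then be written to a log table and evaluated by the same kind of failure check, with one partition key per monitored service.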