Auditing the Media Asset Lifecycle – Part 1

Media applications dealing with high-value content are typically required to comply with MPAA, CDSA, or equivalent requirements. As part of the associated audit process, you will likely be asked to produce an audit report that shows the lifecycle of Media Assets as they propagate through your applications and services. In this multi-part blog series, I will cover how you can generate an audit report for your Media Assets as they flow through Media Services. Part 1 focuses on creating an asset audit report that shows when Media Assets were created and deleted.

 

Media Assets in Media Services

When you create a Media Asset, Media Services generates a GUID and uses it to build the Media Asset Id, which takes the form “nb:cid:UUID:<GUID>”. These IDs are in URN format: “nb” stands for Nimbus, the internal code name for Media Services, and “cid” stands for Content ID. Media Services then creates a record for the asset and stores it internally. It also creates a container named “asset-<GUID>” in the specified Storage account, into which you can upload media files once the asset has been created.

When you delete a Media Asset, Media Services deletes the asset record from its internal database and also deletes the storage container. As a result, you can use the Media Services APIs to determine the creation time of an asset as long as it has not been deleted, but there is no way to determine when an asset was deleted unless your media application kept track of it.
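To make the mapping concrete, here is a minimal C# sketch that converts between an asset ID and its container name, assuming the “nb:cid:UUID:<GUID>” and “asset-<GUID>” formats described above. The AssetNaming class is hypothetical and not part of the Media Services SDK; it only manipulates strings and does not call any Azure API.

```csharp
using System;

// Hypothetical helper (not part of the Media Services SDK): converts between an asset ID
// of the form "nb:cid:UUID:<GUID>" and the matching storage container name "asset-<GUID>".
public static class AssetNaming
{
    private const string AssetIdPrefix = "nb:cid:UUID:";
    private const string ContainerPrefix = "asset-";

    // "nb:cid:UUID:<GUID>" -> "asset-<GUID>"; the GUID text is kept exactly as given.
    public static string ToContainerName(string assetId)
    {
        return ContainerPrefix + ExtractGuid(assetId, AssetIdPrefix);
    }

    // "asset-<GUID>" -> "nb:cid:UUID:<GUID>"
    public static string ToAssetId(string containerName)
    {
        return AssetIdPrefix + ExtractGuid(containerName, ContainerPrefix);
    }

    // Strips the expected prefix and checks that what remains is a valid GUID.
    private static string ExtractGuid(string value, string prefix)
    {
        if (value == null || !value.StartsWith(prefix, StringComparison.Ordinal))
            throw new ArgumentException("Expected a value starting with \"" + prefix + "\"", "value");

        string guidText = value.Substring(prefix.Length);
        Guid ignored;
        if (!Guid.TryParse(guidText, out ignored))
            throw new ArgumentException("Not a GUID: " + guidText, "value");
        return guidText;
    }
}
```

For example, AssetNaming.ToContainerName("nb:cid:UUID:0a1b2c3d-1111-2222-3333-444455556666") returns "asset-0a1b2c3d-1111-2222-3333-444455556666" (the GUID here is made up).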

 

Tracking creation and deletion of Media Assets via Storage logs

Since a media asset is represented as a container in Storage, you can use the Storage logs to determine when Media Assets were created and deleted. For this to work, Storage logging must be enabled for the Blob service of your Storage account, with write and delete requests logged (creating a container is a write request and deleting one is a delete request). See How to: Configure logging to learn how to enable it. Note that the retention policy you choose determines how far back your audit report can go. If you choose zero, your logs will not be deleted and you can go as far back as the date you enabled logging. Azure Storage saves the logs in a container called $logs in the same storage account. More details about how logs are stored and the log naming convention can be found on the Storage Analytics Logging MSDN page.
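The sample code relies on the log naming convention, so here is a small sketch of it. It assumes the documented layout <service-name>/YYYY/MM/DD/hhmm/<counter>.log, so a Blob service log inside $logs has a name such as blob/2014/07/07/1600/000000.log, where 1600 is the starting hour of the log in UTC. LogBlobName is a hypothetical helper that pulls that hour out of the name, which can be handy if you want to restrict an audit to a date range.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: extracts the starting hour (UTC) encoded in a Storage Analytics
// log blob name, assuming the layout <service-name>/YYYY/MM/DD/hhmm/<counter>.log,
// e.g. "blob/2014/07/07/1600/000000.log" inside the $logs container.
public static class LogBlobName
{
    public static bool TryGetHour(string blobName, out DateTime hourUtc)
    {
        hourUtc = DateTime.MinValue;
        if (blobName == null) return false;

        string[] parts = blobName.Split('/');
        if (parts.Length != 6) return false;

        // Rejoin "YYYY/MM/DD/hhmm" and parse it as a UTC timestamp.
        string stamp = string.Join("/", parts, 1, 4);
        return DateTime.TryParseExact(stamp, "yyyy/MM/dd/HHmm", CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal, out hourUtc);
    }
}
```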

 

Sample Code

The sample code below uses both the Media Services Assets collection and the Storage logs to populate an Azure Storage table called AssetAudit. This table can be used to generate an asset audit report showing the creation and deletion times of assets. At a high level, the logic is as follows:

  • The code enumerates through all the assets using the Media Services API.
  • For each enumerated asset, it uses the Asset.Created property to create a table entry in the AssetAudit table.
  • It then enumerates all blobs under $logs/blob.
  • For each blob, it downloads the file and parses it based on the log entry format documented on the Storage Analytics Log Format MSDN page.
  • The code looks for operations that have occurred on objects whose names start with “asset-”.
  • It picks out the CreateContainer and DeleteContainer operations and creates corresponding entries in the AssetAudit table.
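The filtering step in the last two bullets can be sketched on its own. This minimal C# version assumes a version 1.0 log line has already been split into its 30 fields with the quotes kept, and uses the same field positions as the sample (0 version, 1 request-start-time, 2 operation-type, 3 request-status, 12 requested-object-key). AssetLogFilter is hypothetical, and the check that request-status ends in “Success” is an assumption added here so that failed requests are not reported.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: classifies one already-split version 1.0 Storage Analytics log line.
// Field positions (0-based) match the sample: 0 version, 1 request-start-time,
// 2 operation-type, 3 request-status, 12 requested-object-key.
public static class AssetLogFilter
{
    // Returns "Create", "Delete", or null if the line is not a successful
    // CreateContainer/DeleteContainer on an "asset-" container of the given account.
    public static string Classify(IList<string> fields, string accountName)
    {
        if (fields == null || fields.Count != 30 || fields[0] != "1.0") return null;

        // requested-object-key is quoted, e.g. "/myaccount/asset-<GUID>"
        if (!fields[12].StartsWith("\"/" + accountName + "/asset-", StringComparison.Ordinal)) return null;

        // Assumption: successful requests report a status ending in "Success"
        // (for example "Success" or "SASSuccess"); anything else is ignored.
        if (!fields[3].EndsWith("Success", StringComparison.Ordinal)) return null;

        switch (fields[2])
        {
            case "CreateContainer": return "Create";
            case "DeleteContainer": return "Delete";
            default: return null;
        }
    }
}
```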

The App.Config file for the sample is as follows:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <startup>
    <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5" />
  </startup>
  <appSettings>
    <add key="MediaServicesAccountName" value="<MediaAccountName>" />
    <add key="MediaServicesAccountKey" value="<MediaAccountKey>" />
    <add key="StorageConnectionString" value="DefaultEndpointsProtocol=https;AccountName=<StorageAccountName>;AccountKey=<StorageAccountKey>"/>
  </appSettings>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="Microsoft.WindowsAzure.Storage" publicKeyToken="31bf3856ad364e35" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-4.1.0.0" newVersion="4.1.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>

In the above App.Config, replace <MediaAccountName> and <MediaAccountKey> with your Media Services Account Name and Key. Also replace <StorageAccountName> and <StorageAccountKey> with the name and key of the storage account associated with your Media Services account.

The code is as follows:

using System;
using System.Linq;
using System.Configuration;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using System.Collections.Generic;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.Table;
using Microsoft.WindowsAzure.MediaServices.Client;

namespace AssetAuditing
{
    /// <summary>
    /// One row of the AssetAudit table: PartitionKey is the asset ID, RowKey is the operation timestamp
    /// </summary>
    public class AssetAuditEntity : TableEntity
    {
        public string OperationType { get; set; }
    }

    /// <summary>
    /// Builds the AssetAudit table from the Media Services Assets collection and the Storage logs
    /// </summary>
    class Program
    {
        // Read values from the App.config file.
        private static readonly string _mediaServicesAccountName = ConfigurationManager.AppSettings["MediaServicesAccountName"];
        private static readonly string _mediaServicesAccountKey = ConfigurationManager.AppSettings["MediaServicesAccountKey"];
        private static readonly string _storageConnectionString = ConfigurationManager.AppSettings["StorageConnectionString"];
        private static string _lastLogFile = ConfigurationManager.AppSettings["LastLogFile"];

        // Field for service context.
        private static CloudMediaContext _context = null;
        private static MediaServicesCredentials _cachedCredentials = null;
        private static CloudStorageAccount _cloudStorage = null;

        private static CloudBlobClient _blobClient = null;
        private static CloudTableClient _tableClient = null;
        private static CloudTable _assetAuditTable = null;

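        /// <summary>
        /// Entry point: connects to Media Services and Storage, creates the AssetAudit table if needed,
        /// records asset creation times from the Assets collection, and then parses the Storage logs
        /// </summary>
        /// <param name="args">Not used</param>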
        static void Main(string[] args)
        {
            try
            {
                // Create and cache the Media Services credentials in a static class variable.
                _cachedCredentials = new MediaServicesCredentials(_mediaServicesAccountName, _mediaServicesAccountKey);

                // Use the cached credentials to create the CloudMediaContext.
                _context = new CloudMediaContext(_cachedCredentials);

                _cloudStorage = CloudStorageAccount.Parse(_storageConnectionString);

                _blobClient = _cloudStorage.CreateCloudBlobClient();
                _tableClient = _cloudStorage.CreateCloudTableClient();

                _assetAuditTable = _tableClient.GetTableReference("AssetAudit");
                _assetAuditTable.CreateIfNotExists();

                ProcessAssetData();
                ParseStorageLogs();
            }
            catch (Exception ex)
            {
                // ex.ToString() includes the message, the stack trace, and any inner exception (InnerException may be null)
                Console.WriteLine(ex.ToString());
            }
        }

        /// <summary>
        /// This function parses all the storage log files under $logs container
        /// It skips the files that have already been parsed in the last run based on the entry in app.config
        /// </summary>
        static void ParseStorageLogs()
        {
            try
            {
                // Enumerate all blobs under $logs/blob
                foreach (CloudBlockBlob _blobItem in _blobClient.ListBlobs("$logs/blob", true))
                {
                    // ListBlobs returns the blobs in ascending (lexical) order of name
                    // Since log blob names encode the date and hour, comparing each name with the last processed log file ensures it is not reprocessed
                    if (String.CompareOrdinal(_blobItem.Name, _lastLogFile) > 0)
                    {
                        try
                        {
                            Console.WriteLine("Processing " + _blobItem.Name);
                            string _logs = GetBlobData(_blobItem);  // Download the blob

                            // Get individual loglines by looking for the newline separator
                            List<string> _logLines = ParseDelimitedString(_logs, "\n");

                            for (int i = 0; i < _logLines.Count; i++)
                            {
                                // Separate out the log items by looking for the ; separator
                                List<string> _logLineItems = ParseDelimitedString(_logLines[i], ";");
                                if (_logLineItems.Count > 0)
                                {
                                    // Parse each log line
                                    ParseLogLine(_logLineItems);
                                }
                            }

                            // Store the blob name as the last log file that was processed
                            _lastLogFile = _blobItem.Name;
                            SaveLastLogFileInConfig();
                        }
                        catch (Exception x)
                        {
                            Console.WriteLine(x.ToString());
                        }
                    }
                    else
                    {
                        Console.WriteLine("Skipping " + _blobItem.Name);
                    }
                }

                SaveLastLogFileInConfig();
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.ToString());
            }
        }

        /// <summary>
        /// This function loops through all the assets (1000 at a time) in the Media Services account and logs the Asset Create time in the AssetAudit Table
        /// </summary>
        static void ProcessAssetData()
        {
            try
            {
                int skipSize = 0;
                int batchSize = 1000;
                int currentSkipSize = 0;

                while (true)
                {
                    // Enumerate through all assets (1000 at a time)
                    foreach (IAsset asset in _context.Assets.Skip(skipSize).Take(batchSize))
                    {
                        currentSkipSize++;
                        Console.WriteLine("Processing Asset " + asset.Id);

                        // Enter the Create time of the asset in the AssetAudit table
                        InsertAssetData(asset.Id, asset.Created.ToString("o"), "Create");
                    }

                    if (currentSkipSize == batchSize)
                    {
                        skipSize += batchSize;
                        currentSkipSize = 0;
                    }
                    else
                    {
                        break;
                    }
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }
        }

        /// <summary>
        /// This function saves the last log file parsed in the app config
        /// </summary>
        static void SaveLastLogFileInConfig()
        {
            var configFile = ConfigurationManager.OpenExeConfiguration(ConfigurationUserLevel.None);
            var settings = configFile.AppSettings.Settings;
            if (settings["LastLogFile"] == null)
            {
                settings.Add("LastLogFile", _lastLogFile);
            }
            else
            {
                settings["LastLogFile"].Value = _lastLogFile;
            }

            configFile.Save(ConfigurationSaveMode.Modified);
            ConfigurationManager.RefreshSection(configFile.AppSettings.SectionInformation.Name);
        }

        /// <summary>
        /// This function downloads the blob and loads the data in it as a string
        /// </summary>
        /// <param name="_blobItem">The log blob to download</param>
        /// <returns>The content of the blob, decoded as UTF-8</returns>
        static string GetBlobData(CloudBlockBlob _blobItem)
        {
            // Download the whole blob into memory; the using block disposes the stream
            using (MemoryStream ms = new MemoryStream())
            {
                _blobItem.DownloadToStream(ms);
                return Encoding.UTF8.GetString(ms.ToArray());
            }
        }

        /// <summary>
        /// This function parses a string and generates a list of substrings separated by the specified delimiter
        /// Delimiters inside a field that begins with a double quote are ignored (the quotes are kept in the field)
        /// </summary>
        /// <param name="_stringToParse">The string to split</param>
        /// <param name="strDelimiter">The delimiter, for example "\n" or ";"</param>
        /// <returns>The list of substrings; empty if the input is null or empty</returns>
        public static List<string> ParseDelimitedString(string _stringToParse, string strDelimiter)
        {
            List<string> _parsedStrings = new List<string>();
            if (!String.IsNullOrEmpty(_stringToParse))
            {
                int j = 0;
                int i = _stringToParse.IndexOf(strDelimiter, StringComparison.Ordinal);
                while (i >= 0)
                {
                    // If the substring starts with a quote, find the matching closing quote
                    // and look for the delimiter beyond that
                    if (_stringToParse[j] == '\"')
                    {
                        i = _stringToParse.IndexOf("\"", j + 1, StringComparison.Ordinal);
                        if (i > 0)
                        {
                            i = _stringToParse.IndexOf(strDelimiter, i, StringComparison.Ordinal);
                        }

                        // No closing quote, or no delimiter after it: the rest of the string is the last field
                        if (i < 0)
                        {
                            break;
                        }
                    }

                    string _str = _stringToParse.Substring(j, i - j);
                    _parsedStrings.Add(_str);

                    j = i + strDelimiter.Length;
                    i = _stringToParse.IndexOf(strDelimiter, j, StringComparison.Ordinal);
                }

                _parsedStrings.Add(_stringToParse.Substring(j, _stringToParse.Length - j));
            }

            return _parsedStrings;
        }

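        /// <summary>
        /// This function parses a line of log and records CreateContainer and DeleteContainer operations on asset containers
        /// </summary>
        /// <param name="_logLineItems">The fields of one log line, as returned by ParseDelimitedString</param>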
        static void ParseLogLine(List<string> _logLineItems)
        {
            try
            {
                // Make sure we are dealing with version 1.0 logs and that all 30 log fields were parsed out properly
                if ((_logLineItems[0] == "1.0") && (_logLineItems.Count == 30))
                {
                    // Parse out the necessary log items. We don't need all the items for this sample
                    string _requestedObjectKey = _logLineItems[12];

                    string _assetPrefix = "\"/" + _cloudStorage.Credentials.AccountName + "/asset-";
                    int _assetIdIndex = _requestedObjectKey.IndexOf(_assetPrefix);
                    if (_assetIdIndex == 0)
                    {
                        Console.WriteLine("Processing ObjectKey=" + _requestedObjectKey);

                        _assetIdIndex += _assetPrefix.Length;
                        int j = _requestedObjectKey.IndexOf("/", _assetIdIndex);
                        if (j < 0)
                        {
                            j = _requestedObjectKey.Length - 1;
                        }
                        string _assetId = _requestedObjectKey.Substring(_assetIdIndex, j - _assetIdIndex);
                        _assetId = "nb:cid:UUID:" + _assetId;

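                        string _timeStamp = _logLineItems[1];
                        string _operationType = _logLineItems[2];
                        string _requestStatus = _logLineItems[3];

                        // Not used in this sample, but useful for a richer audit report
                        string _authType = _logLineItems[7];
                        string _requesterIpAddress = _logLineItems[15];

                        Console.WriteLine("Processing Asset Id:" + _assetId + " TimeStamp:" + _timeStamp + " OperationType:" + _operationType + " Status:" + _requestStatus);

                        // Record only requests that succeeded: a failed CreateContainer or DeleteContainer did not change the asset
                        // Successful requests have a status such as "Success" or "SASSuccess"
                        if (!_requestStatus.EndsWith("Success", StringComparison.Ordinal))
                        {
                            return;
                        }

                        switch (_operationType)
                        {
                            case "CreateContainer":
                                _operationType = "Create";
                                InsertAssetData(_assetId, _timeStamp, _operationType);
                                break;

                            case "DeleteContainer":
                                _operationType = "Delete";
                                InsertAssetData(_assetId, _timeStamp, _operationType);
                                break;
                        }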
                    }
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.ToString());
            }
        }

        /// <summary>
        /// This function adds an entry in the AssetAudit table
        /// For Create operations, it checks if the entry already exists. This is to avoid duplicate entries as there are two sources of data
        /// The Assets collection and the Storage logs may have slightly different timestamps due to clock skew between different Azure role instances
        /// </summary>
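        /// <param name="_assetId">The asset ID (nb:cid:UUID:&lt;GUID&gt;), used as the PartitionKey</param>
        /// <param name="_timeStamp">The operation time in ISO 8601 format, used as the RowKey</param>
        /// <param name="_operationType">"Create" or "Delete"</param>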
        static void InsertAssetData(string _assetId, string _timeStamp, string _operationType)
        {
            try
            {
                bool _insert = true;
                if (_operationType == "Create")
                {
                    // If the operation type is Create, check whether the first (earliest) entry for the given asset id is already a Create

                    TableQuery<AssetAuditEntity> query = new TableQuery<AssetAuditEntity>().Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, _assetId));
                    query.Take(1);

                    TableQuerySegment<AssetAuditEntity> tqs = _assetAuditTable.ExecuteQuerySegmented(query, null);
                    if ((tqs != null) && (tqs.Results != null))
                    {
                        if (tqs.Results.Count > 0)
                        {
                            if (tqs.Results[0].OperationType == "Create")
                            {
                                _insert = false;
                            }
                        }
                    }
                }

                if (_insert)
                {
                    AssetAuditEntity _asset = new AssetAuditEntity();
                    _asset.PartitionKey = _assetId;
                    _asset.RowKey = _timeStamp;
                    _asset.OperationType = _operationType;

                    TableOperation op = TableOperation.Insert(_asset);
                    _assetAuditTable.Execute(op);
                }
            }
            catch (Exception ex)
            {                
                Console.WriteLine(ex.Message);                
            }
        }

    }
}

A brief description of the functions in the code above is as follows:

ProcessAssetData

This function loops through all the assets in the provided Media Services account. A single query against the Assets collection returns at most 1000 assets, so the function uses Skip and Take to page through the collection in batches of 1000 and make sure all assets are enumerated, even if your account has more than 1000 assets.
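The same Skip/Take pattern can be written as a reusable helper. This is a sketch: Paging.ReadAll is hypothetical, and the fetch delegate stands in for a query such as _context.Assets.Skip(skip).Take(take). As in ProcessAssetData, a batch shorter than the batch size ends the loop; when the total is an exact multiple of the batch size, one extra query returns an empty batch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: pages through a collection with Skip/Take-style queries.
public static class Paging
{
    // fetch(skip, take) must return at most 'take' items starting at offset 'skip'.
    public static IEnumerable<T> ReadAll<T>(Func<int, int, IEnumerable<T>> fetch, int batchSize)
    {
        int skip = 0;
        while (true)
        {
            int countInBatch = 0;
            foreach (T item in fetch(skip, batchSize))
            {
                countInBatch++;
                yield return item;
            }

            // A short (or empty) batch means the end of the collection was reached.
            if (countInBatch < batchSize) yield break;
            skip += batchSize;
        }
    }
}
```

In the sample, ProcessAssetData could then iterate over Paging.ReadAll<IAsset>((skip, take) => _context.Assets.Skip(skip).Take(take), 1000).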

ParseStorageLogs

This function enumerates all the blobs under $logs/blob, parses each log file that is newer than the last one processed, and saves the name of each processed blob as the last log file processed, so that log files are not reprocessed if the code is run repeatedly.
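The skip rule works because Storage Analytics log blob names are zero-padded (year, month, day, hour, counter), so comparing names ordinally orders them chronologically. Here is a minimal sketch of that rule in isolation; LogCheckpoint is hypothetical, and it uses String.CompareOrdinal so the result does not depend on the current culture.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns the log blob names that are newer than the checkpoint.
public static class LogCheckpoint
{
    public static List<string> NamesToProcess(IEnumerable<string> blobNames, string lastProcessed)
    {
        var result = new List<string>();
        foreach (string name in blobNames)
        {
            // A null checkpoint (first run) sorts before every name, so everything is processed.
            if (string.CompareOrdinal(name, lastProcessed) > 0)
                result.Add(name);
        }
        return result;
    }
}
```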

SaveLastLogFileInConfig

This function saves the last processed log file name in App.Config so that it can be retrieved if the program is rerun.
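If you would rather not rewrite the executable's configuration file (see the debugger caveat under Considerations), one option is to keep the checkpoint in a plain text file. This is only a sketch of that alternative: FileCheckpoint and the file path are hypothetical, and the sample as written stores the value in App.Config.

```csharp
using System.IO;

// Hypothetical alternative to SaveLastLogFileInConfig: keeps the name of the last processed
// log blob in a small text file instead of rewriting the application's configuration file.
public static class FileCheckpoint
{
    // Returns null when no checkpoint exists yet (first run), so every log blob is processed.
    public static string Load(string path)
    {
        if (!File.Exists(path)) return null;
        string value = File.ReadAllText(path).Trim();
        return value.Length == 0 ? null : value;
    }

    // Overwrites the checkpoint with the name of the last processed log blob.
    public static void Save(string path, string lastLogFile)
    {
        File.WriteAllText(path, lastLogFile ?? string.Empty);
    }
}
```

You would call _lastLogFile = FileCheckpoint.Load("LastLogFile.txt"); at startup and FileCheckpoint.Save("LastLogFile.txt", _lastLogFile); wherever the sample calls SaveLastLogFileInConfig. A relative path like this resolves against the current working directory.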

GetBlobData

This function downloads the blob from Storage and reads its content into a string.

ParseDelimitedString

This function splits a string on the provided delimiter, ignoring delimiters inside a field that begins with a double quote. The parsed data is returned as a list of strings.
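ParseDelimitedString only treats a quote as special at the start of a field. A character-by-character splitter is another way to do the same job; this sketch (QuotedSplitter is hypothetical) ignores delimiters anywhere between double quotes and keeps the quotes in the field, since ParseLogLine matches on the leading quote of requested-object-key.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: splits a line on a single-character delimiter, ignoring delimiters
// that appear between double quotes. Quote characters are kept in the output fields.
public static class QuotedSplitter
{
    public static List<string> Split(string line, char delimiter)
    {
        var fields = new List<string>();
        if (string.IsNullOrEmpty(line)) return fields;

        var current = new StringBuilder();
        bool inQuotes = false;
        foreach (char c in line)
        {
            if (c == '"') inQuotes = !inQuotes;   // toggle on every quote character

            if (c == delimiter && !inQuotes)
            {
                fields.Add(current.ToString());   // end of a field
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString());   // last field (may be empty)
        return fields;
    }
}
```

Because a log entry never spans lines, splitting the blob into lines with a plain logs.Split('\n') and then calling QuotedSplitter.Split(line, ';') on each line should give the same fields for well-formed log entries, including a quoted last field that contains a semicolon.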

ParseLogLine

This function parses each log line to extract the CreateContainer and DeleteContainer operations for containers whose names start with “asset-”.
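The asset ID extraction in ParseLogLine can be sketched as a standalone function. This assumes requested-object-key values are quoted and look like "/<account>/asset-<GUID>" for container operations or "/<account>/asset-<GUID>/<blob>" for blob operations, as the sample's prefix check implies; AssetKeyParser is hypothetical and additionally validates the GUID.

```csharp
using System;

// Hypothetical helper: turns a quoted requested-object-key from a Storage log line into a
// Media Services asset ID, or returns null if the key is not under an "asset-" container.
public static class AssetKeyParser
{
    public static string TryGetAssetId(string requestedObjectKey, string accountName)
    {
        string prefix = "\"/" + accountName + "/asset-";
        if (requestedObjectKey == null || !requestedObjectKey.StartsWith(prefix, StringComparison.Ordinal))
            return null;

        // Drop the closing quote, then cut at the next '/' (if any) to isolate the GUID.
        string rest = requestedObjectKey.Substring(prefix.Length).TrimEnd('"');
        int slash = rest.IndexOf('/');
        string guid = slash < 0 ? rest : rest.Substring(0, slash);

        Guid parsed;
        return Guid.TryParse(guid, out parsed) ? "nb:cid:UUID:" + guid : null;
    }
}
```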

InsertAssetData

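This function adds an entry to the AssetAudit table. For Create operations, it first checks whether a Create entry already exists for the asset, because creation is reported by both the Assets collection and the Storage logs.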
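To illustrate the duplicate check, here is an in-memory model of the AssetAudit table. It is a sketch, not the Table storage API: AuditModel is hypothetical, rows in a partition are kept in ordinal RowKey order to mimic how Table storage sorts rows, and, as in InsertAssetData, a Create is skipped when the earliest row for the asset is already a Create. Round-trip ("o") UTC timestamps such as 2014-07-07T16:24:01.1230000Z are fixed-width, so their ordinal order is chronological.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory model of the AssetAudit table that mirrors the rule in InsertAssetData.
public class AuditModel
{
    // PartitionKey (asset ID) -> rows ordered by RowKey (timestamp) using ordinal comparison.
    private readonly Dictionary<string, SortedDictionary<string, string>> _partitions =
        new Dictionary<string, SortedDictionary<string, string>>();

    // Returns false when a Create is skipped because the earliest row is already a Create.
    public bool Insert(string assetId, string timeStamp, string operationType)
    {
        SortedDictionary<string, string> rows;
        if (!_partitions.TryGetValue(assetId, out rows))
        {
            rows = new SortedDictionary<string, string>(StringComparer.Ordinal);
            _partitions.Add(assetId, rows);
        }

        if (operationType == "Create" && rows.Count > 0 && rows.First().Value == "Create")
            return false;

        // Add throws on a duplicate RowKey, much like an insert of an existing entity fails.
        rows.Add(timeStamp, operationType);
        return true;
    }

    public int RowCount(string assetId)
    {
        SortedDictionary<string, string> rows;
        return _partitions.TryGetValue(assetId, out rows) ? rows.Count : 0;
    }
}
```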

 

Asset Audit Data

Once you run the code above, the AssetAudit table will be created. Below is a screenshot of the contents of this table for a test account that I used. I have highlighted a matching pair of Create and Delete entries for an asset. These entries could only be captured from the Storage logs, because Media Services no longer had a record of the asset when the code above was run.

[Screenshot: contents of the AssetAudit table for a test account, with a matching Create/Delete pair highlighted]
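Once the table is populated, matching Create and Delete entries can be turned into asset lifetimes for a report. This sketch assumes the rows have already been read from the AssetAudit table into a list; AuditRow and AssetLifetime are hypothetical types, and assets that lack either entry (for example, assets that still exist) are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical row type holding the three AssetAudit columns used by the sample.
public class AuditRow
{
    public string AssetId;        // PartitionKey
    public string TimeStamp;      // RowKey, ISO 8601
    public string OperationType;  // "Create" or "Delete"
}

// Hypothetical report helper: pairs each asset's Create and Delete rows.
public static class AssetLifetime
{
    // Returns asset ID -> lifetime; assets without both a Create and a Delete are skipped.
    public static Dictionary<string, TimeSpan> Compute(IEnumerable<AuditRow> rows)
    {
        var result = new Dictionary<string, TimeSpan>();
        foreach (var rowsForAsset in rows.GroupBy(r => r.AssetId))
        {
            AuditRow created = rowsForAsset.FirstOrDefault(r => r.OperationType == "Create");
            AuditRow deleted = rowsForAsset.FirstOrDefault(r => r.OperationType == "Delete");
            if (created == null || deleted == null) continue;

            DateTime start = DateTime.Parse(created.TimeStamp, CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind);
            DateTime end = DateTime.Parse(deleted.TimeStamp, CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind);
            result[rowsForAsset.Key] = end - start;
        }
        return result;
    }
}
```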

You can also use Excel Power Query to load the table data into Excel. With Excel you can do additional filtering or load the data into a PivotTable for further analysis. If you have never used Excel Power Query, you can download it from the “Download Microsoft Power Query for Excel” web page. Once it is installed, start Excel and you will see a tab called “POWER QUERY”. Click that tab, then click the “From Other Sources” button, and you will see a menu item called “From Windows Azure Table Storage”, as shown in the screenshot below.

[Screenshot: the POWER QUERY tab with the “From Other Sources” menu open, showing “From Windows Azure Table Storage”]

 

To import the data from the AssetAudit table, select the menu item above and follow the instructions. Once the tables are loaded in the “Navigator” pane on the right-hand side, double-click the AssetAudit table and a new window will open. A screenshot of that window is as follows.

[Screenshot: the Power Query window showing the AssetAudit table and its “Content” column]

Click the button next to the column labeled “Content” and then click OK. After that, click the “Apply & Close” button at the top. This closes the current window and loads the table data into Excel, where you can analyze it however you see fit.

 

Considerations

Finally, please note the following as you consider using this sample code for your application:

  • The sample code provided in this blog is designed to work with a Media Services account that has all assets in a single storage account, but it can be easily adapted to work with multiple storage accounts.
  • The audit is limited to the retention policy associated with the Storage logs.
  • When you run the sample in the debugger, the App.Config file will not be updated with the last log blob that was processed. You will see that happen only when you run the sample outside the debugger.
  • Exceptions are only printed to the console. You can write them to an Azure table or a local file to review any errors.