Advanced Encoding Features in Azure Media Encoder

Posted on August 21, 2014

Principal Program Manager, Azure Media Services

UPDATE (10/23/16): Note that this blog post refers to the older Azure Media Encoder, not the newer "Media Encoder Standard". For documentation on the advanced presets available in Media Encoder Standard, please refer to the following page:

Azure Media Encoder offers numerous capabilities for advanced processing of video and audio beyond what is available through the default presets. In this week’s post, I want to focus on some of the more advanced things you can do with Azure Media Encoder that are “hidden” in our high-level samples and portal experience. The scenarios I chose to highlight cover just a few of the top questions that I often get by email or through the forums.

Custom Presets

One feature that is not obvious to first-time users is the ability to define and submit custom presets. Our encoder already provides a broad set of predefined system presets that are well documented on MSDN, and we also provide a page with guidance on which presets to use when delivering to specific devices. Beyond that, every capability and setting of the encoder is fully configurable through an XML preset file format. This lets you dial in the settings and performance that you need for your specific scenario. The full schema for our XML preset file format is documented under the Media Services Encoder Configuration Schema.

In addition, we provide the full XML of all of our system-defined presets, so that you have a “best practices” starting point for custom modifications. For example, here is a link to the full XML for H264 Adaptive Bitrate MP4 Set 720P.

The preset schema has a very simple high-level structure. Each XML file begins with a <Presets> element that can contain one or more <Preset> definitions. This is really handy when you need to define multiple outputs or encoding settings for a single encoding Task. For example, you may need to output multiple audio profiles in the same encoding job, such as stereo AAC plus Dolby DD+ for surround sound. The most basic structure of the preset XML looks like the following. Notice that we also put a version string on our presets, since we version them when we make major changes to the encoder processor.

<?xml version="1.0" encoding="utf-16"?>
<Presets version="5.0">

To add new outputs to your preset and modify their settings, you can add different XML elements under the <OutputFormat> element. For example, the following output formats are currently supported:

Under each of those output formats, you can set up the detailed elementary stream settings for both audio and video using the following two main elements:

A simple example of a fully defined custom preset for audio would look like the following. In this case, I just modified the bitrate on the existing audio-only preset to 96 kbps. For audio, you can add different codecs under AudioProfile; currently we support AAC, Dolby, and WMA profiles. For details, see the AudioProfile element documentation.

<?xml version="1.0" encoding="utf-16"?>
<!-- My Custom AAC Audio Preset -->
         <AudioProfile Condition="SourceContainsAudio">
                 BufferWindow="00:00:00" />

If you are a video encoding jockey and want deep access to the H264 settings of your job, you can dig into the following elements that go under your <VideoProfile> element. Here you can tweak all kinds of encoding settings that can make your encodes either really fast or really slow, so be careful and make sure you know what you are editing. The most widely used H264 settings under the VideoProfile element are the Baseline, Main, and High profile settings.

Under each of these, you specify each of the <Streams> that you want to encode. You will notice that our advanced multi-bitrate MP4 encoding presets define multiple output streams with different resolution scaling and output bitrates.

Presets can range from simple audio-only ones to really complex “Super Presets” that combine multiple output presets and MediaFiles. For a good example of one of the most complex custom presets in our documentation, take a look at this behemoth preset file for Encoding to Multiple Audio Tracks. This preset outputs 8 video files and 5 separate audio encodings (including Dolby DD+) in a single job preset.

Once you have a custom preset defined, you can submit it in your encoding Task by setting the XML as the Task's configuration. In the example code below, I read the custom preset XML into a configuration string and then pass it to my Task.

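string configuration = File.ReadAllText(pathToCustomXMLConfigFile);
// "processor" is assumed to be the Azure Media Encoder IMediaProcessor retrieved earlier
ITask task = job.Tasks.AddNew("My Custom Encoding Task", processor, configuration, TaskOptions.None);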
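
Before you hand the XML to the task, you can check locally that the file is well-formed and has the <Presets>/<Preset> shape described above. Below is a minimal Python sketch that uses only the standard library's `xml.etree.ElementTree`. The helper name `load_preset_config` and the sample skeleton are illustrative and are not part of the Media Services SDK. The check covers structure only, not the full Encoder Configuration Schema.

```python
import xml.etree.ElementTree as ET


def load_preset_config(xml_text: str) -> str:
    """Return preset XML unchanged after a basic structural check.

    Hypothetical helper: it only checks that the text parses and that the
    root is <Presets> with at least one <Preset> child. It does not validate
    against the full Media Services Encoder Configuration Schema.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "Presets":
        raise ValueError(f"root element must be <Presets>, got <{root.tag}>")
    if not root.findall("Preset"):
        raise ValueError("<Presets> must contain at least one <Preset>")
    return xml_text


# Illustrative two-output preset skeleton (contents elided).
sample = """<Presets>
  <Preset Version="5.0"><MediaFile /></Preset>
  <Preset Version="5.0"><MediaFile /></Preset>
</Presets>"""

config = load_preset_config(sample)
print(len(ET.fromstring(config).findall("Preset")))  # 2
```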

Advanced Thumbnail Generation

Thumbnails can easily be created during an encode by specifying the settings in your custom preset, or as a completely separate, standalone Thumbnail task. Azure Media Encoder has many options for generating thumbnails from your video, including percentages or specific time points and intervals. The thumbnail task generates a series of images from the video and writes them into an Asset in your Media Services account. You can then serve those thumbnails directly from Azure Storage, or through the Media Services streaming server along with your published streaming content.

To generate thumbnails, you can submit a standalone thumbnail Task in your job and point it to a video asset in your account. Alternatively, you can chain tasks together to encode first and then generate thumbnails once the encode completes. When submitting your thumbnail task, you use the Azure Media Encoder with a custom XML file that specifies all of the information needed to generate your thumbnails. The simplest form of this XML looks like the following. It generates JPEG thumbnails with a fixed width and a variable height, using a customized naming template, at specific percentages along the input timeline.

<?xml version="1.0" encoding="utf-8"?>
<Thumbnail Size="100%,*" Type="Jpeg" Filename="{OriginalFilename}_{Size}_{ThumbnailTime}_{ThumbnailIndex}_{Date}_{Time}.{DefaultExtension}">
  <Time Value="10%" Step="10%" Stop="90%"/>
</Thumbnail>

The following attributes on the Thumbnail element set the size and type of the output thumbnails:

  • Size – sets the width and height of the thumbnail. You can use percentages, exact pixel values, or an asterisk "*" to maintain the aspect ratio of the original source.
  • Type – sets the output file format. Supported values are "JPEG", "BMP", "GIF", and "PNG".

The @Filename attribute supports a set of macros that you can use when building the output naming template for your thumbnails. The following macro names are supported:

  • OriginalFilename – the name of the source input file for the Thumbnail job
  • Size – the size (Width x Height) of the thumbnail, as set in the @Size attribute
  • ThumbnailTime – the time point at which the thumbnail was extracted from the video
  • ThumbnailIndex – the absolute index number for the thumbnail
  • Date – the short date (DD-MM-YYYY) the thumbnail was extracted
  • Time – the time (HH.MM AM/PM) that the thumbnail was extracted
  • DefaultExtension – the extension for the @Type output format that was chosen. For Type="Jpeg", this is "jpg".
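
To make the macro list concrete, here is a small Python sketch that expands a naming template by substituting each {Macro} with a value. The function name `expand_template` and the macro values are made up for illustration. The encoder computes the real values itself, and this sketch does not reproduce its exact formatting.

```python
import re


def expand_template(template: str, values: dict) -> str:
    """Replace each {Macro} in a thumbnail naming template with its value.

    Macros missing from `values` are left untouched so they are easy to spot.
    """
    return re.sub(r"\{(\w+)\}",
                  lambda m: values.get(m.group(1), m.group(0)),
                  template)


template = "{OriginalFilename}_{ThumbnailIndex}.{DefaultExtension}"
# Made-up macro values, for illustration only.
values = {"OriginalFilename": "MyVideo", "ThumbnailIndex": "1", "DefaultExtension": "jpg"}
print(expand_template(template, values))  # MyVideo_1.jpg
```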

The <Time> element in the XML has three attributes that control how thumbnails are extracted from the source video. The @Value attribute sets where thumbnail extraction starts: in this case, at 10% of the video's duration. The @Step attribute controls how far to step ahead from the starting value. The encoder repeats this skip-and-grab behavior until it exceeds the @Stop boundary, which is 90% in this example. The advantage of percentage-based units is that you don’t need to set exact time codes or know the full duration of the video before creating the XML file for the job.

When creating your naming template in the @Filename attribute, note that the value cannot contain any of the following percent-encoding-reserved characters: !*'();:@&=+$,/?%#[]". Also, the template can contain only one ‘.’, which separates the file name extension. This is what the output of the above XML generates in terms of file names, based on the template settings. My sample was really short (only 5 seconds), so I didn’t get a lot of thumbnails!
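
The two rules in this paragraph can be checked before submitting a job. Below is a Python sketch that validates a naming template against the reserved characters and the single-dot rule, then computes where percentage-based thumbnails would land. The function names are hypothetical, and percentages are taken as plain integers. Because the text says grabbing continues until the position *exceeds* Stop, a point that lands exactly on Stop is treated as included. The real encoder's rounding may differ.

```python
from typing import List

# Characters the post lists as not allowed in the @Filename value.
RESERVED = set("!*'();:@&=+$,/?%#[]\"")


def check_filename_template(template: str) -> None:
    """Raise ValueError if a naming template breaks the documented rules."""
    bad = sorted(set(template) & RESERVED)
    if bad:
        raise ValueError("reserved characters in template: " + "".join(bad))
    if template.count(".") != 1:
        raise ValueError("template needs exactly one '.' for the extension")


def percent_time_points(duration_s: float, value_pct: int, step_pct: int,
                        stop_pct: int) -> List[float]:
    """Seconds at which thumbnails would be grabbed for Value/Step/Stop.

    Starts at Value and keeps stepping until the position exceeds Stop,
    so a point landing exactly on Stop is included.
    """
    if step_pct <= 0:
        raise ValueError("Step must be positive")
    return [duration_s * pct / 100 for pct in range(value_pct, stop_pct + 1, step_pct)]


check_filename_template(
    "{OriginalFilename}_{Size}_{ThumbnailTime}_{ThumbnailIndex}_{Date}_{Time}.{DefaultExtension}")
# A 5-second source with Value="10%" Step="10%" Stop="90%":
print(percent_time_points(5.0, 10, 10, 90))
# [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
```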

Alternatively, if you know the exact time points that you wish to extract from the video, you can submit a modified XML that uses timecode values in the form “Hours:Minutes:Seconds”.

<?xml version="1.0" encoding="utf-8"?>
<Thumbnail Size="300,*" Type="Jpeg" Filename="{OriginalFilename}_{ThumbnailIndex}.{DefaultExtension}">
  <Time Value="0:0:0" Step="0:0:5" />
</Thumbnail>

The sample above starts at the beginning of the video (0:0:0) and then grabs a new thumbnail every 5 seconds (Step="0:0:5") until it reaches the end of the video. With this template, the same short video used above would generate only a single thumbnail, with an index value of “1”.
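
The following Python sketch walks the timecode-based Value/Step settings. The helper names are hypothetical. One assumption is stated in the code: a point that lands exactly on the end of the video is not grabbed. That matches the single thumbnail described for the 5-second sample.

```python
from typing import List


def parse_hms(value: str) -> float:
    """Convert an 'H:M:S' timecode such as '0:0:5' into seconds."""
    hours, minutes, seconds = (float(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def timecode_points(duration_s: float, value: str, step: str) -> List[float]:
    """Start at Value and step forward until the end of the video is reached.

    Assumption: a point landing exactly on the end is not grabbed, which
    matches the single thumbnail from the 5-second sample.
    """
    start, step_s = parse_hms(value), parse_hms(step)
    if step_s <= 0:
        raise ValueError("Step must be positive")
    points: List[float] = []
    t = start
    while t < duration_s:
        points.append(t)
        t += step_s
    return points


print(timecode_points(5.0, "0:0:0", "0:0:5"))   # [0.0]
print(timecode_points(12.0, "0:0:0", "0:0:5"))  # [0.0, 5.0, 10.0]
```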

For more details and example code on how to submit your custom XML settings as a job through the Media Services API, see Creating a Thumbnail for Video.

Video and Audio Overlays

Visible watermarking in the encoder lets you apply a visible image overlay to your video, either to discourage unauthorized distribution of your content or to brand it. For example, you may be used to seeing your local TV channel overlay its logo in the corner of a broadcast to identify the content. This is also common on airlines, where you may notice the occasional watermark of the airline being displayed during a movie. Visible watermarks don’t completely prevent use of the video, of course, but they do discourage use by others and clearly identify the owner. To prevent distribution or control access to the video, we recommend Digital Rights Management features such as our upcoming PlayReady content protection services.

Note: Media Services does not currently offer what is known as ‘forensic’ watermarking. This is an invisible, individualized watermark applied to the video stream in a non-destructive manner. It can be detected later in distribution to identify the person or owner who leaked the content.

The Azure Media Encoder allows you to overlay an image (*.jpg, *.bmp, *.gif, *.tif, *.png), a video, or an audio track onto an existing video. Using the overlay settings, you can control the duration of the overlay, fades, opacity, and location on the output video. Overlay settings are controlled in the XML for your encoding job. The following attributes are available to control the video overlay:

  • OverlayFileName - The name of the file containing the video overlay. In the example below, I will use a single input Asset to the Task, which contains both the input video that is to be transcoded, and the file that is to be overlaid onto the input video.
  • OverlayRect - The x & y coordinates of the upper-left corner of the overlay rectangle, followed by its width and height, in pixels. These dimensions are relative to the source asset dimensions, not the output encoding settings.
  • OverlayOpacity - The opacity of the video overlay. Valid values: 0 – 1, where 0 is completely transparent and 1 is completely opaque.
  • OverlayFadeInDuration - How long it takes for the video overlay to fade in. Valid values: time in hh:mm:ss:fff format. Make sure this fits within the timeline of the input video.
  • OverlayFadeOutDuration - How long it takes for the video overlay to fade out. Valid values: time in hh:mm:ss:fff format.
  • OverlayLayoutMode - Specifies whether the overlay is displayed over the entire timeline or only a specific portion of it. Valid values:
    1. WholeSequence – the overlay is displayed during the entire video sequence
    2. Custom – the overlay is displayed during the time period specified by the OverlayStartTime and OverlayEndTime attributes
  • OverlayStartTime - The point on the video timeline when the video overlay begins, only used when OverlayLayoutMode is set to “Custom”. Valid values: Time in hh:mm:ss:fff format. Make sure this fits within the timeline of the input video.
  • OverlayEndTime - The point on the video timeline when the video overlay ends, only used when OverlayLayoutMode is set to “Custom”. Valid values: Time in hh:mm:ss:fff format. Make sure this fits within the timeline of the input video.

As an example, I’m going to overlay this simple icon on top of a video file that I have in my Azure Media Services account:

Media.png – download the source here.

To do so, I first need to upload this graphic into my Media Services account along with the source video that I want to overlay. Alternatively, I could have uploaded the graphic to a standalone Asset, which is handy if you need to reuse it across multiple jobs.

Azure Media Services.MP4 – download the source file here. After uploading both files into a single Asset, I can create a custom encoding job for the overlay. I’ll base my encoding XML on the “H264 Broadband” MP4 system preset. Instead of showing all of the XML for that preset, I’ll just show what needs to be edited to submit this overlay job.

On the <MediaFile> element of the preset, I added the following attributes and saved the XML file for submission. I set the @OverlayFileName attribute to point to the “Media.png” image that I uploaded with my video.

     OverlayRect="200, 100, 40, 40"

Submitting the job with the above modifications to the preset XML results in a video where the Media Services logo fades in at 5 seconds, stays on screen for 15 seconds, and then fades out at 20 seconds. The opacity is set to 90% so that you can see through it, and the @OverlayRect attribute controls the position of the image. @OverlayRect takes the x & y coordinates of the upper-left corner of the overlay rectangle, followed by its width and height, in pixels. When adjusting it, make sure the rectangle fits within the dimensions of the source video, or you will get an error message back. In addition, the start and end times must fall within the source timeline of the video (percentages are not supported for these attributes). This is what the sample looks like when composited into the video and played back. You can download and watch the output video file here.
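
The two constraints above can be checked before you submit: the rectangle must fit inside the source frame, and the start and end times must fall within the source timeline. Below is a Python sketch that performs those checks and then sets the overlay attributes on a <MediaFile> element. The attribute names come from the list above. The helper names, the 1280x720 source size, and the 30-second duration are assumptions for illustration. The fade-duration attributes are omitted because the example's values are not shown.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def hmsf_to_seconds(value: str) -> float:
    """Parse 'hh:mm:ss:fff' (fff = milliseconds) into seconds."""
    hh, mm, ss, fff = (int(part) for part in value.split(":"))
    return hh * 3600 + mm * 60 + ss + fff / 1000


def add_image_overlay(media_file: ET.Element, overlay_file: str,
                      rect: Tuple[int, int, int, int],
                      source_size: Tuple[int, int], opacity: float,
                      start: str, end: str, duration_s: float) -> None:
    """Set Custom-mode overlay attributes after basic sanity checks."""
    x, y, w, h = rect
    src_w, src_h = source_size
    if x < 0 or y < 0 or x + w > src_w or y + h > src_h:
        raise ValueError("OverlayRect must fit inside the source video")
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("OverlayOpacity must be between 0 and 1")
    if not 0 <= hmsf_to_seconds(start) < hmsf_to_seconds(end) <= duration_s:
        raise ValueError("overlay times must fall inside the source timeline")
    media_file.set("OverlayFileName", overlay_file)
    media_file.set("OverlayRect", f"{x}, {y}, {w}, {h}")
    media_file.set("OverlayOpacity", str(opacity))
    media_file.set("OverlayLayoutMode", "Custom")
    media_file.set("OverlayStartTime", start)
    media_file.set("OverlayEndTime", end)


# Source size and duration below are assumed; substitute your real values.
media_file = ET.Element("MediaFile")
add_image_overlay(media_file, "Media.png", (200, 100, 40, 40), (1280, 720),
                  0.9, "00:00:05:000", "00:00:20:000", duration_s=30.0)
print(media_file.get("OverlayRect"))  # 200, 100, 40, 40
```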

In addition to video overlays, we also support audio overlays. Audio overlays can take any supported audio file format and fade it in and out over your video, which is handy for adding voice-overs or music. You control audio overlays with the following attributes on the <MediaFile> element, in a similar fashion to the example above:

  • AudioOverlayFileName - The name of the file that contains the audio to overlay. This can be an individual file name or the %n% zero-based index that points to one of the input assets for the job.
  • AudioOverlayLoop - Specifies whether the audio overlay should loop. Valid values: True, False.
  • AudioOverlayLoopingGap - The amount of time between when the audio overlay finishes and when it starts again. Valid values: time in hh:mm:ss:fff format.
  • AudioOverlayLayoutMode - Indicates during which part of the encoded video stream the audio overlay is played. Valid values: “WholeSequence” – the audio overlay plays throughout the entire stream; “Custom” – the audio overlay plays during the period specified by the AudioOverlayStartTime and AudioOverlayEndTime attributes.
  • AudioOverlayStartTime - The point in the video timeline when the audio overlay begins. Valid values: time in hh:mm:ss:fff format. Make sure this fits within the timeline of the input video.
  • AudioOverlayEndTime - The point in the video timeline when the audio overlay ends. Valid values: time in hh:mm:ss:fff format. Make sure this fits within the timeline of the input video.
  • AudioOverlayGainLevel - The gain level for the audio overlay. Valid values: 1 – 10, in increments of 0.1.
  • AudioOverlayFadeInDuration - The length of time it takes for the audio overlay to fade in. Valid values: time in hh:mm:ss:fff format.
  • AudioOverlayFadeOutDuration - The length of time it takes for the audio overlay to fade out. Valid values: time in hh:mm:ss:fff format.
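
Here is a short Python sketch that applies a whole-sequence audio overlay to a <MediaFile> element. It checks the documented gain range (1 to 10 in 0.1 steps) before writing the attributes. The attribute names and the True/False casing come from the list above. The helper name `set_audio_overlay` is hypothetical. The "%1%" reference assumes the audio file was submitted as the second input asset.

```python
import xml.etree.ElementTree as ET


def set_audio_overlay(media_file: ET.Element, file_ref: str, gain: float,
                      loop: bool = False) -> None:
    """Play an audio overlay for the whole output (WholeSequence mode)."""
    scaled = gain * 10
    if not 1.0 <= gain <= 10.0 or abs(scaled - round(scaled)) > 1e-9:
        raise ValueError("AudioOverlayGainLevel must be 1-10 in steps of 0.1")
    media_file.set("AudioOverlayFileName", file_ref)  # a file name or "%n%"
    media_file.set("AudioOverlayLoop", "True" if loop else "False")
    media_file.set("AudioOverlayLayoutMode", "WholeSequence")
    media_file.set("AudioOverlayGainLevel", f"{gain:.1f}")


media_file = ET.Element("MediaFile")
set_audio_overlay(media_file, "%1%", 2.5, loop=True)  # second input asset
print(media_file.get("AudioOverlayGainLevel"))  # 2.5
```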

Audio overlays work the same way as video overlays when it comes to setting the file name. You can point to a specific audio file inside a single asset, or you can submit multiple assets as the source for your encoding job and set the file name with the %n% syntax. For more details on the API and how to submit encoding jobs with overlays, see Creating Overlays.

Video Cropping

If you need to cut out a few pixels from your videos (scan line removal) or just extract a portion of the video, you can easily do that with a single setting in your custom preset. To crop a video, set the @CropRect attribute on the <MediaFile> element:

  • CropRect - Specifies a rectangle, which is used to crop the input video. Valid values: The x & y coordinates of the upper left hand corner of the cropping rect, followed by the width and height, in pixels. Note that these coordinates apply to the input video, so keep the dimensions of the source in mind when setting up this rectangle.

For example, to crop a video down to the 150 x 150 pixel region in its top-left corner, you would use the following attribute setting:

   <MediaFile CropRect="0,0,150,150"/>

Just remember to set the size for your final encoding to the correct height and width to match your cropping, or you will end up with some oddly stretched videos (unless you intended that of course!)
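
Below is a Python sketch of both steps: building a CropRect value that stays inside the source frame, and picking an output size that keeps the cropped region's aspect ratio. The function names and the 1280x720 source size are assumptions for illustration. Rounding the scaled height to the nearest integer is also a simplification; your target profile may impose its own size constraints.

```python
from typing import Tuple


def crop_rect(x: int, y: int, width: int, height: int,
              source_w: int, source_h: int) -> str:
    """Return a CropRect value after checking it lies inside the source frame."""
    if x < 0 or y < 0 or width <= 0 or height <= 0:
        raise ValueError("offsets must be >= 0 and the size must be positive")
    if x + width > source_w or y + height > source_h:
        raise ValueError("crop rectangle extends past the source frame")
    return f"{x},{y},{width},{height}"


def matching_output_size(crop_w: int, crop_h: int, target_w: int) -> Tuple[int, int]:
    """Output width/height that keep the cropped region's aspect ratio."""
    return target_w, round(target_w * crop_h / crop_w)


print(crop_rect(0, 0, 150, 150, 1280, 720))   # 0,0,150,150
print(matching_output_size(150, 150, 300))    # (300, 300)
```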

Video Rotation

Mobile devices are so easy to use today that consumers capture videos in all kinds of situations and, sadly, at all kinds of angles. To deal with the “Vertical Video Syndrome (VVS)” problem, you need the ability to detect the orientation of a video and rotate it. Azure Media Encoder can detect the orientation of the video and re-orient it in 90-degree increments, so that the encoded video plays back properly on the screen (of course, you should try to avoid VVS at all costs!).

Currently, we only support rotation of MP4 source files. Most phones today generate this file format and insert the rotation metadata into a specific area inside the MP4 file that we recognize. To automatically detect and rotate your videos during encoding, enable the @Rotation attribute on the <Presets> element by setting its value to “Auto”. Make sure to set it on the top-level <Presets> element, not on the <Preset> element.

<Presets Rotation="Auto">

With this value set, the encoder will attempt to automatically detect the orientation of the incoming source video, based on metadata that most cameras and mobile phones store in the stream. If the metadata is not detected, the video will not be rotated. This works for Thumbnail jobs as well; for those, apply the @Rotation attribute to the <Thumbnail> element instead.

<Thumbnail Rotation="Auto" Size="300,*" Type="Jpeg" Filename="{OriginalFilename}_{ThumbnailIndex}.{DefaultExtension}">
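
Because the attribute belongs on the top-level element, it is easy to put it in the wrong place by hand. Below is a Python sketch that sets Rotation="Auto" on the root <Presets> or <Thumbnail> element and removes any stray copy from inner <Preset> elements. The helper name is hypothetical. Round-tripping through ElementTree drops the XML declaration and any comments, so re-add the declaration if your workflow needs it.

```python
import xml.etree.ElementTree as ET


def enable_auto_rotation(xml_text: str) -> str:
    """Set Rotation="Auto" on the root <Presets> or <Thumbnail> element.

    Any Rotation attribute found on an inner <Preset> is removed, since the
    setting belongs on the top-level element.
    """
    root = ET.fromstring(xml_text)
    if root.tag not in ("Presets", "Thumbnail"):
        raise ValueError("expected a <Presets> or <Thumbnail> root element")
    root.set("Rotation", "Auto")
    for preset in root.iter("Preset"):
        preset.attrib.pop("Rotation", None)
    return ET.tostring(root, encoding="unicode")


result = enable_auto_rotation('<Presets><Preset Version="5.0" Rotation="Auto" /></Presets>')
print(ET.fromstring(result).get("Rotation"))  # Auto
```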

Sub-clipping a Video

You will often need to cut a clip or highlight out of an existing video or audio asset. To do this, Azure Media Encoder supports sub-clipping. Simply add a <Clips> element with a list of clips to your custom preset, setting the start and end time for each clip that you want to extract from the video. Multiple clips will be stitched together in the output asset. For example, the following XML snippet can be added inside the <MediaFile> element of an existing preset to extract a 30-second clip that starts at the 4-minute mark. Keep in mind that this is just a snippet of XML and should be merged into one of the system presets or a full custom preset.

			<Clip StartTime="00:04:00" EndTime="00:04:30" />

To demonstrate how to sub-clip a video, I’ll use this timecode clip, which clearly shows which time point you are clipping out of your video. The clip starts at 00:58:00;00 and counts up past the 1-hour mark for about 11 minutes.

Timecode.wmv – Here is the link to download this clip.

In my custom preset, I added the <Source>, <Clips>, and a single <Clip> element, and set the start and end times to 00:04:00 and 00:04:30 to cut a 30-second clip out of the video. I expect the burned-in timecode in the output to start at 01:02:00;00. After I submitted my encoding job to Azure Media Encoder with this custom preset, the Asset was clipped down to 30 seconds. Download the output asset MP4 here.
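
The following Python sketch builds the nested sub-clip fragment and checks the timecode arithmetic. The source timecode starts at 00:58:00, so a clip starting 4 minutes in should show 01:02:00. The element names come from the text. The <Sources> wrapper and the StartTime attribute name are assumptions based on the post's description of start and end times. The frame field (";00") is ignored for simplicity.

```python
import xml.etree.ElementTree as ET


def hms_to_seconds(value: str) -> int:
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(total: int) -> str:
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def build_subclip_sources(start: str, end: str) -> ET.Element:
    """<Sources>/<Source>/<Clips>/<Clip> fragment for a single sub-clip."""
    sources = ET.Element("Sources")
    clips = ET.SubElement(ET.SubElement(sources, "Source"), "Clips")
    ET.SubElement(clips, "Clip", StartTime=start, EndTime=end)
    return sources


fragment = build_subclip_sources("00:04:00", "00:04:30")
clip = fragment.find("Source/Clips/Clip")
print(hms_to_seconds(clip.get("EndTime")) - hms_to_seconds(clip.get("StartTime")))  # 30
# Burned-in timecode at the clip start: source starts at 00:58:00, plus 4 minutes.
print(seconds_to_hms(hms_to_seconds("00:58:00") + hms_to_seconds("00:04:00")))  # 01:02:00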

Stitching Videos

Sometimes you need to stitch together two or more videos to create a longer video. This is often the case when creating marketing videos, or when adding advertisements or “bumpers” to a video. You can also use this feature to string a set of short videos together into one long final clip. The videos to be stitched together can have their start and end times altered, and they can all come from different assets in your Media Services account. This makes it easy to add bumpers and trailers to new videos from existing assets.

To stitch videos together in Azure Media Services, you use the <Sources> and <Source> elements inside the <MediaFile> element of your custom preset, taking advantage of the fact that you can provide a collection of Assets as input to the encoding Task. A simple example of stitching two clips together would look like the following. In this example, two <Source> elements are used. The first <Source> refers to the first input asset; the second is identified as the second input asset by setting its @MediaFile attribute to “%1%”. This syntax tells the encoder to look for the second file in the zero-based index of input assets.


Here is a more advanced example of a preset that sources from two separate input assets and stitches them into a single clip. This example XML shows how to use the <Sources> element to define multiple input <Source> files. It also shows how to use the sub-clipping capabilities of the encoder to set the start and end time of each source clip with the <Clip> element. I first uploaded each file as a separate Asset. This allows me to submit two assets as the Input of the job and then refer to each asset in my <Source> element’s @MediaFile attribute, using the %n% convention to index the Input Asset array. Note that you cannot use “%0%” for the input asset in the zero index position; just leave the @MediaFile attribute out to access the first Input Asset. Again, keep in mind that the XML snippet below is only partial and needs to be merged into a full custom XML preset. I recommend starting with one of the system presets from the custom preset section above. You can download the same source assets here:

XML snippet (add this to a full preset XML)

<MediaFile>  …
            EndTime="00:00:41" />
            EndTime="00:00:10" />
            EndTime="00:02:58" />

After submitting the job, the final encoded asset will have the three sections of video clipped together into a single asset.  To view the output from this job, download the final MP4 here. For more details on using the API see Stitching Video Segments.  
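
Below is a Python sketch that generates the <Sources> list from a plain list of segments, applying the %n% rule described above. The first input asset omits @MediaFile; later assets use "%n%". Each segment can optionally be trimmed with a <Clip>. The helper name and the segment values are hypothetical. As in the sub-clipping section, the <Clips> nesting and the StartTime attribute name are assumptions based on the element names in the post.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# (input asset index, clip start, clip end); None means "use the whole file".
Segment = Tuple[int, Optional[str], Optional[str]]


def build_stitch_sources(segments: List[Segment]) -> ET.Element:
    """Build <Sources> with one <Source> per segment, in playback order.

    Asset 0 omits @MediaFile, because "%0%" is not accepted; other assets
    use the %n% zero-based index convention.
    """
    sources = ET.Element("Sources")
    for index, start, end in segments:
        source = ET.SubElement(sources, "Source")
        if index > 0:
            source.set("MediaFile", f"%{index}%")
        if start is not None and end is not None:
            clips = ET.SubElement(source, "Clips")
            ET.SubElement(clips, "Clip", StartTime=start, EndTime=end)
    return sources


# Hypothetical segments: all of asset 0, then 10 seconds of asset 1.
stitched = build_stitch_sources([(0, None, None), (1, "00:00:05", "00:00:15")])
print([s.get("MediaFile") for s in stitched.findall("Source")])  # [None, '%1%']
```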

Customizing Output Filenames

There are many times when you need fine control over the names of your output files. Azure Media Encoder provides a set of macros and naming conventions to control the output file names. You can provide a custom naming template on the <Preset> element by setting the @DefaultMediaOutputFileName attribute. For example, the following custom preset shows how you can combine macros and custom name settings to control the naming template.

<Preset Version="5.0" DefaultMediaOutputFileName="{Original File Name}{Video Codec}{Video Bitrate}{Audio Codec}{Audio Bitrate}.{Default Extension}">
<MediaFile …>

The encoder inserts underscores between macros as it expands them. For example, the configuration above would result in a file name like MyVideo_H264_4500kbps_AAC_128kbps.mp4. Supported macros include:

  • {Original File Name}
  • {Audio Bitrate}
  • {Audio Codec}
  • {Channel Count}
  • {Default Extension}
  • {Language}
  • {StreamId}
  • {Video Codec}
  • {Video Bitrate}
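
The following Python sketch approximates the underscore-joining behavior described above, so you can preview names before running a job. The function name and the macro values are illustrative, not produced by the encoder. The sketch also assumes that the part before the single '.' consists only of {Macro} tokens, so literal text placed between macros is not handled.

```python
import re


def expand_output_name(template: str, values: dict) -> str:
    """Expand a DefaultMediaOutputFileName-style template.

    Assumes the part before the single '.' consists only of {Macro} tokens;
    their values are joined with underscores.
    """
    base, _, ext = template.rpartition(".")
    names = re.findall(r"\{([^}]+)\}", base)
    expanded_base = "_".join(values[name] for name in names)
    expanded_ext = re.sub(r"\{([^}]+)\}", lambda m: values[m.group(1)], ext)
    return f"{expanded_base}.{expanded_ext}"


template = ("{Original File Name}{Video Codec}{Video Bitrate}"
            "{Audio Codec}{Audio Bitrate}.{Default Extension}")
values = {"Original File Name": "MyVideo", "Video Codec": "H264",
          "Video Bitrate": "4500kbps", "Audio Codec": "AAC",
          "Audio Bitrate": "128kbps", "Default Extension": "mp4"}
print(expand_output_name(template, values))  # MyVideo_H264_4500kbps_AAC_128kbps.mp4
```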

For more details on the API see Controlling Media Service Encoder Output Filenames.

Audio Files with Speech

When encoding content that mostly contains speech (with little music or few sound effects), the default encoder presets may amplify background noise in a way that results in a hissing sound in your audio tracks. To avoid this unwanted side effect, you need to disable a little-known setting in our presets: the @NormalizeAudio attribute on the <MediaFile> element. You can copy any existing system preset and adjust this setting by either deleting the @NormalizeAudio attribute or setting it to False, as in the example XML snippet below (don't forget to add these example edits to a full XML preset):


For more details on how to handle audio files with speech only and submit the jobs with the API see Encoding Presentations with Mostly Speech.
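
If you are adapting several system presets for speech content, the edit can be scripted. Below is a Python sketch that sets NormalizeAudio="False" on every <MediaFile> in a preset, which is one of the two options described above (the other is deleting the attribute). The helper name and the sample preset are illustrative.

```python
import xml.etree.ElementTree as ET


def disable_audio_normalization(xml_text: str) -> str:
    """Set NormalizeAudio="False" on every <MediaFile> in a preset."""
    root = ET.fromstring(xml_text)
    media_files = list(root.iter("MediaFile"))
    if not media_files:
        raise ValueError("no <MediaFile> element found")
    for media_file in media_files:
        media_file.set("NormalizeAudio", "False")
    return ET.tostring(root, encoding="unicode")


preset = '<Presets><Preset Version="5.0"><MediaFile NormalizeAudio="True" /></Preset></Presets>'
result = ET.fromstring(disable_audio_normalization(preset))
print(result.find("Preset/MediaFile").get("NormalizeAudio"))  # False
```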

Preserving Captions in MP4 files

If your source MP4 files carry embedded captions as EIA-608 and 708 data in the SEI messages of their H.264 elementary streams, you can preserve those captions and pass them through into the newly encoded output files. To do this, set the @ClosedCaptionSEIPassThrough attribute on the <MediaFile> element of the preset to “True”. This is very useful when you are delivering captions to HLS devices, such as iOS devices, that natively decode CEA-608 caption data in the client.


Note that this feature currently supports only source files in the MP4 file format whose elementary streams are encoded with H.264. Further, the encoding task should not attempt to sub-clip or stitch together two or more videos.
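
The restrictions above can be partly enforced when you build the preset. Below is a Python sketch that enables ClosedCaptionSEIPassThrough on each <MediaFile>. It refuses presets that contain a <Clip> or more than one <Source>. The helper name is hypothetical. The sketch cannot tell whether the source file is an H.264 MP4; that still has to be checked separately.

```python
import xml.etree.ElementTree as ET


def enable_caption_passthrough(xml_text: str) -> str:
    """Set ClosedCaptionSEIPassThrough="True" on each <MediaFile>.

    Refuses presets that sub-clip or stitch, which this feature does not
    support. It cannot check that the source is an H.264 MP4 file; that
    remains the caller's responsibility.
    """
    root = ET.fromstring(xml_text)
    for media_file in root.iter("MediaFile"):
        clips = media_file.findall(".//Clip")
        sources = media_file.findall(".//Source")
        if clips or len(sources) > 1:
            raise ValueError("caption pass-through cannot be combined with clipping or stitching")
        media_file.set("ClosedCaptionSEIPassThrough", "True")
    return ET.tostring(root, encoding="unicode")


out = enable_caption_passthrough('<Presets><Preset><MediaFile /></Preset></Presets>')
print(ET.fromstring(out).find("Preset/MediaFile").get("ClosedCaptionSEIPassThrough"))  # True
```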


These are just a few of the cool advanced features that we have built into Azure Media Encoder for our customers. Hopefully I have covered a few that you were not aware of before and that enable great scenarios for your own media applications and workflows. As always, keep the great ideas coming, keep encoding, and send us your feature requests through the Azure Media Services forum on MSDN.