Media Analytics

Speech and vision services at enterprise scale with security, compliance and global reach

Azure Media Analytics is a collection of speech and vision components that organisations and enterprises use to extract actionable insights from their video files through machine learning. Media Analytics services are hosted on the Azure Media Services platform, the Azure media solution for encoding, encrypting and streaming audio or video at scale, live or on demand (VOD). Media Analytics is offered at enterprise scale and delivers the compliance, security and global reach that large organisations need.

What industries can use Media Analytics?

Public safety

  • Analyse evidence. Collect media from body cams, dash cams and other devices and analyse it to extract intelligence while observing chain-of-custody requirements.
  • Protect identity. Redact videos to protect people’s identities and comply with the requirements of the Freedom of Information Act.
  • Speed up investigations. Extract data from media and use it to build intelligent search indexes that can help speed up investigations.

Surveillance

  • Investigate crime. Process video and events collected from surveillance cameras at scale.
  • Reduce false positives. Conduct deep analysis of the video snippets associated with motion events from surveillance cameras to reduce false positives.
  • Summarise surveillance footage. Generate an intelligent summary of surveillance footage by using Hyperlapse to smooth out time-lapse videos.


  • Analyse customer calls. Use Media Indexer to convert speech to text on audio data from customer support calls and find patterns.
  • Analyse customer patterns. Correlate customer movements through a store with sales data to make decisions about product placement.

Other industries

  • Speech-to-text. Important for any business that provides customer support through a call centre. Use the text extracted from customer support calls to build a search index or analyse the tone of the customer and the customer representative.
  • Optical character recognition (OCR). For any business that has video with text content in it, such as videos with PowerPoint presentations or videos of people with name tags.
  • Face emotion recognition. For any business that has videos with customers in them. Correlate facial expressions with extracted text using Indexer to make decisions on future interactions with the customer.

Available components

Indexer

  • Automatically generate standard caption files for your videos
  • Choose from a growing selection of languages
  • Extract spoken keywords to help in search and recommendation
  • Use custom vocabulary adaptation to recognise domain-specific speech content

Hyperlapse (Preview)

  • Technology built on more than 20 years of research in computational photography
  • Create smooth and stabilised time lapses from first-person videos
  • Support for different speed-up factors from 1x to 25x

Motion detection (Preview)

  • Detect when motion has occurred in videos with stationary backgrounds
  • Eliminate false positives caused by lighting changes, shadows, small insects and other issues

Face detection (Preview)

  • Detect faces that appear in videos
  • Track movement of faces over multiple frames
  • Analyse the output metadata that provides information about timestamps and face locations

Face emotion detection (Preview)

  • Recognise the emotion of a person or crowd over time based on the facial expressions in the video
  • Identify emotions based on expressions that psychological research has identified as universal
  • Recognise specific emotions such as happiness, sadness, surprise, anger, contempt, fear, disgust and neutral

Video summarisation (Preview)

  • Create summaries of long videos to enable consumers to get a quick preview of the video
  • Choose between short previews, which are a few seconds long, and longer previews, which are a few minutes long
  • Choose whether fade transitions should be applied between shots in the summarised videos
  • Ideal for building a web page similar to the Bing Videos search page

Video optical character recognition (Preview)

  • Extract typeset words from video content
  • Select your own sampling rate to balance performance and quality
  • Specify where in the video to look for captions

Content moderation (Preview)

  • Detect pornography, racism, profanity, violence and other content that you want to moderate in a video
  • Save money and reduce errors by avoiding the need to hire human content moderators to screen for offensive, illicit, and inappropriate content

Create a media solution today