Azure Media Analytics is a collection of speech and vision components that help organizations and enterprises derive actionable insights from their video files through advanced machine learning technology. Azure Media Analytics services are hosted on the Azure Media Services platform, the Azure cloud media solution for encoding, encrypting, and streaming audio or video at scale, whether live or on demand (VOD). Media Analytics is offered at enterprise scale, delivering the compliance, security, and global reach that large organizations need.

What industries can use Media Analytics?

Public safety

  • Analyze evidence. Collect media from bodycams, dashcams, and other devices, and analyze it to extract intelligence while observing chain of custody requirements.
  • Protect identity. Redact videos to protect people’s identities and comply with the requirements of the Freedom of Information Act.
  • Speed up investigations. Extract data from media, and use it to build intelligent search indexes that can help speed up investigations.


  • Investigate crime. Process video and events collected from surveillance cameras at scale.
  • Reduce false positives. Conduct deep analysis of the video snippets associated with motion events from surveillance cameras to reduce false positives.
  • Summarize surveillance footage. Generate an intelligent summary of surveillance footage by using Hyperlapse to smooth out time-lapsed videos.


  • Analyze customer calls. Use Media Indexer to convert the speech in customer support calls to text, then look for patterns in the transcripts.
  • Analyze customer patterns. Correlate customer movements through a store with sales data to make decisions on product placement.

Other industries

  • Speech-to-text is important to any business that provides customer support through a call center. Use the text extracted from customer support calls to build a search index or to analyze the tone of both the customer and the customer representative.
  • Optical character recognition (OCR) is useful to any business whose videos contain text, for instance, videos of PowerPoint presentations or of people wearing name tags.
  • Face emotion recognition is useful to any business whose videos show customers. Correlate facial expressions with the text extracted by Indexer to make decisions on future interactions with the customer.

Available components

Indexer


  • Automatically generate standard caption files for your videos
  • Choose from a growing selection of languages
  • Extract spoken keywords to aid in search and recommendation
  • Use custom vocabulary adaptation to recognize domain-specific speech content

Hyperlapse (Preview)

  • Technology built on more than 20 years of research in computational photography
  • Create smooth and stabilized time lapses from first-person videos
  • Support for different speed-up factors from 1x to 25x
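
To make the speed-up factor concrete, the sketch below computes how long a sped-up video would be and which source frames a plain every-Nth-frame time lapse would keep. The function names are hypothetical, and uniform frame dropping is an assumption made purely for illustration; it is not how Hyperlapse selects or stabilizes frames.

```python
def _check_speed_up(speed_up: int) -> None:
    # The 1x to 25x range comes from the component description above.
    if not 1 <= speed_up <= 25:
        raise ValueError("speed-up factor must be between 1 and 25")


def output_duration_seconds(input_seconds: float, speed_up: int) -> float:
    """Length of the sped-up video for a given speed-up factor."""
    _check_speed_up(speed_up)
    return input_seconds / speed_up


def naive_frame_indices(total_frames: int, speed_up: int) -> list[int]:
    """Source frame indices kept by a plain every-Nth-frame time lapse.

    Assumption: uniform sampling with no stabilization, for illustration only.
    """
    _check_speed_up(speed_up)
    return list(range(0, total_frames, speed_up))
```

Under these assumptions, a 10-minute clip at a speed-up factor of 10 would play back in one minute.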

Motion Detection (Preview)

  • Detect when motion has occurred in videos with stationary backgrounds
  • Eliminate false positives caused by lighting changes, shadows, small insects, and similar disturbances
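
One simple way to see how such false positives can be suppressed is frame differencing with two thresholds: one on how much a pixel must change, and one on how much of the frame must change. The sketch below is a generic illustration, not the algorithm the Motion Detection component uses; the function names and threshold values are assumptions, and frames are represented as plain lists of grayscale rows.

```python
def changed_fraction(prev, curr, pixel_threshold=25):
    """Fraction of pixels whose brightness changed by more than pixel_threshold."""
    total = 0
    changed = 0
    for prev_row, curr_row in zip(prev, curr):
        for p, c in zip(prev_row, curr_row):
            total += 1
            if abs(p - c) > pixel_threshold:
                changed += 1
    return changed / total if total else 0.0


def has_motion(prev, curr, pixel_threshold=25, min_fraction=0.05):
    """Report motion only when enough of the frame changed strongly enough.

    pixel_threshold ignores mild, frame-wide brightness shifts (lighting);
    min_fraction ignores tiny moving objects (for example, an insect).
    Both default values are illustrative assumptions.
    """
    return changed_fraction(prev, curr, pixel_threshold) >= min_fraction
```

With these defaults, a single bright pixel in a 10 x 10 frame or a uniform brightness shift of 10 levels is ignored, while a change covering 30 percent of the frame is reported as motion.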

Face Detection (Preview)

  • Detect faces that appear in videos
  • Track movement of faces over multiple frames
  • Analyze the output metadata that provides information about timestamps and face locations
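
As an illustration of what analyzing such metadata might look like, the sketch below works out when each tracked face first and last appears. The record layout (`face_id`, `time`, `x`, `y`, `width`, `height`) is a hypothetical simplification chosen for this example, not the service's actual output schema.

```python
from collections import defaultdict


def face_time_ranges(detections):
    """Map each face id to the (first, last) timestamp at which it was seen.

    `detections` is an iterable of dicts in a hypothetical, simplified format:
    {"face_id": int, "time": seconds, "x": ..., "y": ..., "width": ..., "height": ...}
    """
    times = defaultdict(list)
    for detection in detections:
        times[detection["face_id"]].append(detection["time"])
    return {face_id: (min(ts), max(ts)) for face_id, ts in times.items()}


# Hypothetical sample data: face 0 is tracked across three frames, face 1 across two.
sample = [
    {"face_id": 0, "time": 0.0, "x": 0.10, "y": 0.20, "width": 0.1, "height": 0.1},
    {"face_id": 0, "time": 0.5, "x": 0.12, "y": 0.20, "width": 0.1, "height": 0.1},
    {"face_id": 1, "time": 0.5, "x": 0.60, "y": 0.30, "width": 0.1, "height": 0.1},
    {"face_id": 0, "time": 1.0, "x": 0.15, "y": 0.21, "width": 0.1, "height": 0.1},
    {"face_id": 1, "time": 1.5, "x": 0.58, "y": 0.30, "width": 0.1, "height": 0.1},
]
ranges = face_time_ranges(sample)
```

The same grouping step could feed other analyses, such as how far each face moved between its first and last location.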

Face Emotion Detection (Preview)

  • Recognize the emotion of a person or crowd over time based on the facial expressions in the video
  • Identify emotions based on expressions that psychological research has identified as universal
  • Recognize specific emotions: happiness, sadness, surprise, anger, contempt, fear, disgust, neutral

Video Summarization (Preview)

  • Create summaries of long videos to enable consumers to get a quick preview of the video
  • Choose between short previews that are a few seconds long and longer previews that are a few minutes long
  • Choose whether fade transitions should be applied between shots in the summarized videos
  • Ideal for building a web page similar to the Bing video search page

Video Optical Character Recognition (Preview)

  • Extract typeset words from video content
  • Select your own sampling rate to balance performance and quality
  • Specify where in the video to look for captions
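
The trade-off behind the sampling rate can be sketched with simple arithmetic: a higher rate examines more frames (better coverage of briefly shown text) at the cost of more processing. The function below is a hypothetical helper for illustration and does not reflect how the component is actually configured.

```python
def sample_timestamps(duration_seconds: float, samples_per_second: float) -> list[float]:
    """Timestamps (in seconds) of the frames that would be examined for text.

    Assumption: frames are sampled at a fixed interval starting at 0.
    """
    if samples_per_second <= 0:
        raise ValueError("samples_per_second must be positive")
    step = 1.0 / samples_per_second
    count = int(duration_seconds * samples_per_second)
    return [round(i * step, 3) for i in range(count)]
```

For a one-minute video, sampling one frame every two seconds would examine 30 frames, while sampling two frames per second would examine 120.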

Content Moderation (Preview)

  • Detect pornography, racism, profanity, violence, and other content that you want to moderate in video
  • Save money and reduce errors by avoiding the need to hire human content moderators to screen for offensive, illicit, and inappropriate content

