
Media Analytics

Speech and vision services at enterprise scale with security, compliance, and global reach

Azure Media Analytics is a collection of speech and vision components that organizations use to extract actionable insights from their video files with machine learning. Media Analytics services are hosted on Azure Media Services, the Azure platform for encoding, encrypting, and streaming audio and video at scale, both live and on demand (VOD). Media Analytics is offered at enterprise scale, with the compliance, security, and global reach that large organizations need.

What industries can use Media Analytics?

Public safety

  • Analyze evidence. Collect media from body cams, dash cams, and other devices, and analyze it to extract intelligence while observing chain of custody requirements.
  • Protect identity. Redact videos to protect people’s identity and comply with the requirements of the Freedom of Information Act.
  • Speed up investigations. Extract data from media and use it to build intelligent search indexes that can help speed up investigations.

Surveillance

  • Investigate crime. Process video and events collected from surveillance cameras at scale.
  • Reduce false positives. Conduct deep analysis of the video snippets associated with motion events from surveillance cameras to reduce false positives.
  • Summarize surveillance footage. Use Hyperlapse to condense long recordings into smooth, stabilized time-lapse videos.

Retail

  • Analyze customer calls. Use Media Indexer to transcribe audio from customer support calls and find patterns in the text.
  • Analyze customer patterns. Correlate customer movements through a store with sales data to make decisions about product placement.

Other industries

  • Speech-to-text. Essential for any business that provides customer support through a call center. Use the text extracted from support calls to build a search index or to analyze the tone of both the customer and the customer representative.
  • Optical character recognition (OCR). Useful for any business whose videos contain text, such as recorded PowerPoint presentations or footage of people wearing name tags.
  • Face emotion recognition. Useful for any business whose videos feature customers. Correlate facial expressions with text extracted by Media Indexer to inform future interactions with the customer.
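
A search index over extracted text can start as a simple inverted index that maps each word to the places where it was spoken. The sketch below assumes transcripts are already available as (start time in seconds, text) pairs per video; that input shape and the `build_index` and `search` helpers are hypothetical illustrations, not part of any Media Analytics API.

```python
import re
from collections import defaultdict

# Hypothetical input shape: video ID -> list of (start_seconds, transcript_text).
Transcripts = dict[str, list[tuple[float, str]]]
Posting = tuple[str, float]  # (video ID, segment start time in seconds)

WORD = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return WORD.findall(text.lower())


def build_index(transcripts: Transcripts) -> dict[str, list[Posting]]:
    """Map each word to every (video, segment start) where it was spoken."""
    index = defaultdict(list)
    for video_id, segments in transcripts.items():
        for start, text in segments:
            for word in set(tokenize(text)):  # one posting per word per segment
                index[word].append((video_id, start))
    return dict(index)


def search(index: dict[str, list[Posting]], query: str) -> list[Posting]:
    """Return the segments that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(set(index.get(w, [])) for w in words))
    return sorted(matches)


if __name__ == "__main__":
    calls = {
        "call-001": [(0.0, "My order arrived damaged."), (4.2, "I'd like a refund.")],
        "call-002": [(0.0, "Where is my refund?")],
    }
    idx = build_index(calls)
    print(search(idx, "refund"))     # [('call-001', 4.2), ('call-002', 0.0)]
    print(search(idx, "my refund"))  # [('call-002', 0.0)]
```

Matching at the segment level keeps each hit tied to a timestamp, so a search result can jump straight to the relevant moment in the recording.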

Available components

Media Indexer

  • Automatically generate standard caption files for your videos
  • Choose from a growing selection of languages
  • Extract spoken keywords to help in search and recommendation
  • Use custom vocabulary adaptation to recognize domain-specific speech content
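
WebVTT is one standard caption file format. As a minimal sketch, assuming timed transcript segments are already available (the `Segment` type here is hypothetical and is not the Indexer's output schema), turning them into a WebVTT caption file looks like this:

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the media
    end: float    # seconds from the start of the media
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(segments: list[Segment]) -> str:
    """Build a WebVTT document: a header, then one timed cue per segment."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}")
        lines.append(seg.text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [Segment(0.0, 2.5, "Thank you for calling."),
            Segment(2.5, 5.25, "How can I help you today?")]
    print(to_webvtt(demo))
```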

Hyperlapse (Preview)

  • Technology built on more than 20 years of research in computational photography
  • Create smooth and stabilized time lapses from first-person videos
  • Support for different speed-up factors from 1x to 25x
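
The speed-up factor controls how much a video is condensed. The baseline below simply keeps every Nth frame. Hyperlapse improves on this by choosing frames that minimize camera shake, which this sketch does not attempt. The helper names are hypothetical, and the 1x to 25x bound simply mirrors the range listed above.

```python
def naive_timelapse_indices(num_frames: int, speedup: int) -> list[int]:
    """Keep every `speedup`-th frame: the naive time-lapse that Hyperlapse improves on."""
    if not 1 <= speedup <= 25:
        raise ValueError("speed-up factor must be between 1x and 25x")
    return list(range(0, num_frames, speedup))


def output_duration_seconds(num_frames: int, speedup: int, fps: float) -> float:
    """Duration of the naive time-lapse when played back at the original frame rate."""
    return len(naive_timelapse_indices(num_frames, speedup)) / fps


if __name__ == "__main__":
    # A 5-minute clip at 30 fps (9,000 frames) sped up 10x plays for 30 seconds.
    print(output_duration_seconds(9000, 10, 30.0))  # 30.0
```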

Motion detection (Preview)

  • Detect when motion occurs in videos with stationary backgrounds
  • Eliminate false positives caused by lighting changes, shadows, small insects, and other noise
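
A common starting point for motion detection is frame differencing. The plain-Python sketch below illustrates that idea; it is not the algorithm Media Analytics uses. It cancels uniform lighting changes by comparing each pixel relative to its frame's mean brightness, and it ignores tiny movers such as insects by requiring a minimum fraction of changed pixels. It does not handle shadows. The thresholds and function names are hypothetical.

```python
# A grayscale frame: rows of pixel intensities in the range 0-255.
Frame = list[list[float]]


def mean_brightness(frame: Frame) -> float:
    """Average intensity over every pixel in the frame."""
    return sum(sum(row) for row in frame) / sum(len(row) for row in frame)


def motion_detected(prev: Frame, curr: Frame,
                    pixel_threshold: float = 25.0,
                    min_changed_fraction: float = 0.01) -> bool:
    """Report motion if enough pixels changed, ignoring uniform lighting shifts."""
    prev_mean, curr_mean = mean_brightness(prev), mean_brightness(curr)
    changed = total = 0
    for prev_row, curr_row in zip(prev, curr):
        for p, c in zip(prev_row, curr_row):
            total += 1
            # Compare each pixel relative to its own frame's mean, so a global
            # brightening or dimming of the whole scene cancels out.
            if abs((c - curr_mean) - (p - prev_mean)) > pixel_threshold:
                changed += 1
    # Require a minimum share of changed pixels so tiny movers are ignored.
    return changed / total >= min_changed_fraction


if __name__ == "__main__":
    still = [[50.0] * 20 for _ in range(20)]
    brighter = [[100.0] * 20 for _ in range(20)]  # lights switched on
    insect = [row[:] for row in still]
    insect[3][3] = 255.0                           # one bright speck
    person = [row[:] for row in still]
    for r in range(5, 10):
        for c in range(5, 10):
            person[r][c] = 200.0                   # a 5x5 moving object
    print(motion_detected(still, brighter))  # False
    print(motion_detected(still, insect))    # False
    print(motion_detected(still, person))    # True
```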

Face detection (Preview)

  • Detect faces that appear in videos
  • Track movement of faces over multiple frames
  • Analyze the output metadata, which records timestamps and face locations
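
Working with face detection metadata usually starts by turning detections into timestamps. The sketch below assumes a JSON layout with a `timescale` (ticks per second) and a list of `fragments`, each with a `start`, an `interval`, and `events` holding the faces found at each sample, with coordinates normalized to the frame size. That layout and the `face_observations` helper are assumptions for illustration; check the field names against the actual output before relying on them.

```python
import json
from typing import NamedTuple


class FaceObservation(NamedTuple):
    time_s: float  # seconds from the start of the video
    face_id: int   # tracking ID, assumed constant while a face is followed
    x: float       # assumed normalized (0-1) left edge
    y: float       # assumed normalized (0-1) top edge
    width: float   # assumed normalized (0-1) width
    height: float  # assumed normalized (0-1) height


def face_observations(output_json: str) -> list[FaceObservation]:
    """Flatten the assumed fragment/event layout into one record per face per sample."""
    data = json.loads(output_json)
    timescale = data["timescale"]  # ticks per second
    observations = []
    for fragment in data.get("fragments", []):
        events = fragment.get("events", [])
        if not events:
            continue  # assume fragments without faces carry no events
        for i, faces in enumerate(events):
            tick = fragment["start"] + i * fragment["interval"]
            for face in faces:
                observations.append(FaceObservation(
                    tick / timescale, face["id"], face["x"], face["y"],
                    face["width"], face["height"]))
    return observations


if __name__ == "__main__":
    sample = json.dumps({
        "timescale": 30000,
        "fragments": [
            {"start": 0, "duration": 2002, "interval": 1001, "events": [
                [{"id": 0, "x": 0.1, "y": 0.2, "width": 0.1, "height": 0.15}],
                [{"id": 0, "x": 0.11, "y": 0.2, "width": 0.1, "height": 0.15},
                 {"id": 1, "x": 0.6, "y": 0.3, "width": 0.08, "height": 0.12}],
            ]},
            {"start": 2002, "duration": 1001},
        ],
    })
    for obs in face_observations(sample):
        print(obs)
```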

Face emotion detection (Preview)

  • Recognize the emotion of a person or crowd over time based on the facial expressions in the video
  • Identify emotions based on expressions that psychological research has identified as universal
  • Recognize specific emotions such as happiness, sadness, surprise, anger, contempt, fear, disgust, and neutral
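
Recognizing emotion over time implies aggregating per-frame results. A minimal sketch, assuming each frame yields a score per emotion (a hypothetical input shape; for a crowd, the scores could be averaged over all detected faces), picks the highest-scoring emotion in each consecutive window of frames:

```python
EMOTIONS = ("happiness", "sadness", "surprise", "anger",
            "contempt", "fear", "disgust", "neutral")


def dominant_emotions(frame_scores: list[dict[str, float]], window: int) -> list[str]:
    """Return the highest-scoring emotion for each consecutive window of frames."""
    if window < 1:
        raise ValueError("window must be at least 1 frame")
    result = []
    for start in range(0, len(frame_scores), window):
        totals = dict.fromkeys(EMOTIONS, 0.0)
        for scores in frame_scores[start:start + window]:
            for emotion in EMOTIONS:
                totals[emotion] += scores.get(emotion, 0.0)
        # max() keeps the first emotion in EMOTIONS order if two totals tie.
        result.append(max(EMOTIONS, key=totals.__getitem__))
    return result


if __name__ == "__main__":
    frames = [
        {"happiness": 0.9, "neutral": 0.1},
        {"happiness": 0.7, "neutral": 0.3},
        {"sadness": 0.8, "neutral": 0.2},
        {"sadness": 0.6, "anger": 0.4},
    ]
    print(dominant_emotions(frames, window=2))  # ['happiness', 'sadness']
```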

Video summarization (Preview)

  • Create summaries of long videos so that viewers can quickly preview them
  • Choose between short previews a few seconds long and longer previews a few minutes long
  • Choose whether to apply fade transitions between shots in the summarized video
  • Ideal for building a web page similar to the Bing Videos search page
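
One simple way to build a summary of a target length is to rank shots by an importance score, keep the best ones that fit the time budget, and then restore chronological order. This greedy sketch is an illustration only, not the summarization method Media Analytics uses; the `Shot` type and its `score` field are hypothetical, and fade transitions are not modeled.

```python
from typing import NamedTuple


class Shot(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    score: float  # hypothetical importance score; higher is better


def summarize(shots: list[Shot], target_seconds: float) -> list[Shot]:
    """Greedily keep the highest-scoring shots that fit, in playback order."""
    chosen = []
    used = 0.0
    for shot in sorted(shots, key=lambda s: s.score, reverse=True):
        length = shot.end - shot.start
        if used + length <= target_seconds:
            chosen.append(shot)
            used += length
    # Play the selected shots in their original order.
    return sorted(chosen, key=lambda s: s.start)


if __name__ == "__main__":
    shots = [Shot(0, 4, 0.2), Shot(4, 7, 0.9), Shot(7, 12, 0.5), Shot(12, 14, 0.8)]
    print(summarize(shots, target_seconds=6))
```

Setting `target_seconds` to a few seconds or a few minutes corresponds to the short and longer preview options listed above.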

Video optical character recognition (Preview)

  • Extract typeset words from video content
  • Select your own sampling rate to balance performance and quality
  • Specify where in the video to look for captions
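
The sampling rate trades cost for coverage: analyzing more frames per second costs more but is less likely to miss briefly displayed text. The sketch below shows both knobs under stated assumptions: `sample_times` lists which timestamps would be analyzed at a given rate, and `inside` keeps only text boxes within a rectangular region of interest, assuming "where in the video" means a region of the frame in pixel coordinates. Both helpers and the `Box` type are hypothetical, not Video OCR settings.

```python
import math
from typing import NamedTuple


class Box(NamedTuple):
    left: int    # pixels from the left edge of the frame
    top: int     # pixels from the top edge of the frame
    width: int
    height: int


def sample_times(duration_s: float, samples_per_second: float) -> list[float]:
    """Timestamps (in seconds) to analyze; a higher rate costs more but misses less text."""
    if samples_per_second <= 0:
        raise ValueError("samples_per_second must be positive")
    count = math.floor(duration_s * samples_per_second) + 1
    return [i / samples_per_second for i in range(count)]


def inside(box: Box, region: Box) -> bool:
    """True if a detected text box lies entirely within the region of interest."""
    return (box.left >= region.left and box.top >= region.top
            and box.left + box.width <= region.left + region.width
            and box.top + box.height <= region.top + region.height)


if __name__ == "__main__":
    print(sample_times(2.0, 2))  # [0.0, 0.5, 1.0, 1.5, 2.0]
    caption_band = Box(0, 600, 1280, 120)  # bottom strip of a 1280x720 frame
    print(inside(Box(100, 620, 200, 40), caption_band))  # True
    print(inside(Box(100, 100, 50, 20), caption_band))   # False
```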

Content moderation (Preview)

  • Detect pornography, racism, profanity, violence, and other content that you want to moderate in a video
  • Save money and reduce errors by avoiding the need to hire human content moderators to screen for offensive, illicit, and inappropriate content
