Media Analytics

Speech and vision services at enterprise scale, with security, compliance and global reach

Azure Media Analytics is a collection of speech and vision components that help organisations derive actionable insights from their video files using advanced machine learning. Media Analytics services are hosted on Azure Media Services, the Azure cloud platform for encoding, encrypting and streaming audio and video at scale, both live and video on demand (VOD). Media Analytics is offered at enterprise scale, with the compliance, security and global reach that large organisations need.

Which industries can use Media Analytics?

Public safety

  • Analyse evidence. Collect media from bodycams, dashcams and other devices, and analyse it to extract intelligence while observing chain of custody requirements.
  • Protect identity. Redact videos to protect people’s identity and comply with the requirements of the Freedom of Information Act.
  • Speed up investigations. Extract data from media, and use it to build intelligent search indexes that can help speed up investigations.

Security and surveillance

  • Investigate crime. Process video and events collected from surveillance cameras at scale.
  • Reduce false positives. Conduct deep analysis of the video snippets associated with motion events from surveillance cameras to reduce false positives.
  • Summarise surveillance footage. Generate an intelligent summary of surveillance footage by using Hyperlapse to create smooth, stabilised time-lapse videos.

Retail

  • Analyse customer calls. Use Media Indexer to transcribe audio from customer support calls, and find patterns in the resulting text.
  • Analyse customer patterns. Correlate customer movements through a shop with sales data to make decisions on product placement.

Other industries

  • Speech-to-text is important to any business that provides customer support through a call centre. Use the text extracted from customer support calls to build a search index, or analyse the tone of both the customer and the customer representative.
  • Optical character recognition (OCR) is useful to any business with video that contains text, for instance videos of PowerPoint presentations or of people wearing name tags.
  • Face emotion recognition is useful to any business with videos of its customers. Correlate facial expressions with text extracted by Media Indexer to inform future interactions with the customer.

Available components

Media Indexer

  • Automatically generate standard caption files for your videos
  • Choose from a growing selection of languages
  • Extract spoken keywords to aid with search and recommendation
  • Use custom vocabulary adaptation to recognise domain-specific speech content

Hyperlapse (Preview)

  • Technology built on more than 20 years of research in computational photography
  • Create smooth, stabilised time-lapses from first-person videos
  • Choose a speed-up factor from 1x to 25x

Motion Detection (Preview)

  • Detect when motion occurs in videos with stationary backgrounds
  • Filter out false positives caused by lighting changes, shadows, small insects and more

Face Detection (Preview)

  • Detect faces that appear in videos
  • Track movement of faces over multiple frames
  • Analyse output metadata that records timestamps and face locations

Face Emotion Detection (Preview)

  • Recognise the emotion of a person or crowd over time based on the facial expressions in the video
  • Identify emotions based on expressions that psychological research has identified as universal
  • Recognise specific expressions: happiness, sadness, surprise, anger, contempt, fear, disgust and neutral

Video Summarisation (Preview)

  • Create summaries of long videos so that viewers can quickly preview their content
  • Choose between short previews a few seconds long and longer previews a few minutes long
  • Choose whether to apply fade transitions between shots in the summary
  • Ideal for building a web page similar to the Bing video search page

Video Optical Character Recognition (Preview)

  • Extract typeset words from video content
  • Select your own sampling rate to balance performance and quality
  • Specify where in the video to look for captions

Content Moderation (Preview)

  • Detect pornography, racism, profanity, violence and other content that you want to moderate in video
  • Reduce the cost and error rate of manual review by automatically screening for offensive, illicit and inappropriate content