Keyword search and speech-to-text

Azure Content Delivery Network
Azure AI Search
Azure Media Player
Azure AI Video Indexer
Azure App Service

Solution ideas

This article is a solution idea. If you'd like us to expand the content with more information, such as potential use cases, alternative services, implementation considerations, or pricing guidance, let us know by providing GitHub feedback.

This solution idea shows how to identify speech in static video files so that you can manage the speech as standard, searchable content.

Architecture

Architecture diagram that shows the flow from the source through Azure Blob Storage and the live encoder to the streaming endpoint.


Dataflow

  • Azure Blob Storage stores large amounts of unstructured data that can be accessed from anywhere in the world via HTTP or HTTPS. You can use Blob Storage to expose data publicly to the world, or to store application data privately.
  • Azure Encoding converts media files from one encoding to another.
  • Azure streaming endpoint represents a streaming service that can deliver content directly to a client player application, or to a content delivery network (CDN) for further distribution.
  • Content Delivery Network provides secure, reliable content delivery with broad global reach and a rich feature set.
  • Azure Media Player uses industry standards, such as HTML5 (MSE/EME), to provide an enriched adaptive streaming experience. Regardless of the playback technology used, you have a unified JavaScript interface to access APIs.
  • Azure Cognitive Search provides a ready-to-use service that is populated with data and then used to add search functionality to a web or mobile application.
  • Web Apps hosts the website or web application.
  • Azure Media Indexer makes the content of your media files searchable and generates a full-text transcript for closed-captioning and keywords. Media files are processed individually or in batches.
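
The hand-off from the indexer to the search service can be sketched as follows. This is a minimal illustration, not the solution's actual implementation: it assumes the transcript is available as WebVTT-style captions, and the names `Segment` and `parse_captions` are hypothetical. Each caption cue becomes one record with a start time, which is the unit a search index would need in order to point back to a moment in the video.

```python
import re
from dataclasses import dataclass

# Matches a WebVTT cue timing line such as "00:00:01.000 --> 00:00:04.500".
# The hours field is optional in WebVTT, so "00:01.000" is also accepted.
CUE_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


@dataclass
class Segment:
    """One caption cue, shaped as a record for a search index (hypothetical schema)."""
    video_id: str
    start_seconds: float
    end_seconds: float
    text: str


def _to_seconds(hours, minutes, seconds, millis):
    """Convert the captured timestamp fields to seconds."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_captions(video_id, vtt_text):
    """Split WebVTT-style text into Segment records, one per cue."""
    segments = []
    # Cues are separated by blank lines; blocks without a timing line
    # (the "WEBVTT" header, NOTE blocks) are skipped.
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = CUE_TIMING.search(line)
            if match:
                fields = match.groups()
                # Everything after the timing line is the spoken text of the cue.
                text = " ".join(part.strip() for part in lines[i + 1:] if part.strip())
                segments.append(
                    Segment(video_id, _to_seconds(*fields[:4]), _to_seconds(*fields[4:]), text)
                )
                break
    return segments
```

In this sketch, each `Segment` would be uploaded to the search service as one document, so that a match on `text` can be resolved to `video_id` and `start_seconds` for playback.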

Components

  • Blob Storage is a service that's part of Azure Storage. Blob Storage offers optimized cloud object storage for large amounts of unstructured data.
  • Azure Media Services is a cloud-based platform that you can use to stream video, enhance accessibility and distribution, and analyze video content.
  • Live and on-demand streaming is a feature of Azure Media Services that delivers content to various devices at scale.
  • Azure Encoding provides a way to convert files that contain digital video or audio from one standard format to another.
  • Azure Media Player plays videos that are in various formats.
  • Azure Content Delivery Network offers a global solution for rapidly delivering content. This service provides your users with fast, reliable, and secure access to your apps' static and dynamic web content.
  • Azure Cognitive Search is a cloud search service that supplies infrastructure, APIs, and tools for searching. You can use Azure Cognitive Search to build search experiences over private, heterogeneous content in web, mobile, and enterprise applications.
  • App Service provides a framework for building, deploying, and scaling web apps. The Web Apps feature is a service for hosting web applications, REST APIs, and mobile back ends.
  • Azure Media Indexer provides a way to make the content of your media files searchable. It can also generate a full-text transcript for closed captioning and keywords.

Scenario details

A speech-to-text solution provides a way to identify speech in static video files so you can manage it as standard content. For instance, employees can use this technology to search within training videos for spoken words or phrases. Then they can navigate to the specific moment in the video that contains the word or phrase.

When you use this solution, you can upload static videos to an Azure website. Azure Media Indexer uses the Speech API to index the speech within the videos and stores the indexed speech in an Azure database. You can search for words or phrases by using the Web Apps feature of Azure App Service, which returns a list of results. When you select a result, you can see the place in the video that mentions the word or phrase.
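
The search step described above can be illustrated with a small in-memory stand-in. This is only a sketch under stated assumptions: in the solution the managed search service does this work, and the class `TranscriptIndex`, its methods, and the segment shape (video ID, start time in seconds, spoken text) are hypothetical. It shows how a word or phrase query can resolve to the moments in the videos where the words are spoken.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TranscriptIndex:
    """In-memory inverted index over timed transcript segments (illustrative only)."""

    def __init__(self):
        self._segments = []                 # (video_id, start_seconds, tokens)
        self._postings = defaultdict(set)   # word -> positions in self._segments

    def add(self, video_id, start_seconds, text):
        """Index one transcript segment under every word it contains."""
        tokens = tokenize(text)
        position = len(self._segments)
        self._segments.append((video_id, start_seconds, tokens))
        for token in tokens:
            self._postings[token].add(position)

    def search(self, query):
        """Return (video_id, start_seconds) for segments that contain the query
        as a contiguous phrase. A single-word query matches any occurrence."""
        words = tokenize(query)
        if not words:
            return []
        # Narrow to segments that contain every query word.
        candidates = set.intersection(*(self._postings.get(w, set()) for w in words))
        hits = []
        for position in sorted(candidates):
            video_id, start_seconds, tokens = self._segments[position]
            n = len(words)
            # Keep the segment only if the words appear next to each other, in order.
            if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
                hits.append((video_id, start_seconds))
        return hits
```

A web app could pass each returned start time to the player to seek to the matching moment. One limitation of this sketch is that a phrase split across two adjacent segments is not found.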

This solution is built on the Azure managed services Content Delivery Network and Azure Cognitive Search.

Potential use cases

This solution applies to scenarios that can benefit from the ability to search recorded speech. Examples include:

  • Training and educational videos.
  • Crime investigations.
  • Customer service analysis.

Next steps