Conceptual architecture: Keyword search/speech-to-text/OCR digital media

A speech-to-text solution identifies speech in static video files so that you can manage it as standard content. For example, employees can search training videos for spoken words or phrases and then jump straight to the moment in the video where they occur. In this solution, users upload static videos to an Azure Website. The Azure Media Indexer uses the Speech API to index the speech within the videos and stores the resulting index in SQL Azure. Users can then search for words or phrases via the Azure Website and retrieve a list of results. Selecting a result plays the specific portion of the video where the word or phrase was mentioned.
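The search-and-seek step can be illustrated with a minimal sketch. It assumes the indexer's transcript is available as WebVTT text (one of the caption formats shown in the architecture diagram); the `Cue` type, the function names, and the sample transcript are hypothetical and are not part of any Azure SDK. The sketch parses cue timings and returns the cues whose text contains a phrase, so a player could seek to the matching start time.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # cue start, in seconds
    end: float    # cue end, in seconds
    text: str


# Matches a WebVTT timing line such as "00:01:12.500 --> 00:01:16.000";
# the hours component is optional.
_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _seconds(hours, minutes, seconds, millis):
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_webvtt(vtt: str) -> list[Cue]:
    """Extract cues from WebVTT text, skipping blocks without a timing line."""
    cues = []
    for block in re.split(r"\n\s*\n", vtt.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = _TIMING.match(line.strip())
            if match:
                groups = match.groups()
                text = " ".join(part.strip() for part in lines[i + 1:])
                cues.append(Cue(_seconds(*groups[:4]), _seconds(*groups[4:]), text))
                break
    return cues


def find_phrase(cues: list[Cue], phrase: str) -> list[Cue]:
    """Return the cues whose text contains the phrase, ignoring case."""
    needle = phrase.casefold()
    return [cue for cue in cues if needle in cue.text.casefold()]


# Hypothetical transcript, for illustration only.
SAMPLE = """WEBVTT

00:00:01.000 --> 00:00:04.000
Welcome to the safety training video.

00:01:12.500 --> 00:01:16.000
Always wear protective goggles in the lab.
"""

hits = find_phrase(parse_webvtt(SAMPLE), "protective goggles")
for cue in hits:
    print(f"seek to {cue.start:.1f}s: {cue.text}")
```

This simple substring match does not find phrases that span two cues; a production solution would rely on the search service rather than scanning transcripts in application code.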

This solution is built on the following Azure managed services: CDN and Search. These services run in a high-availability environment that is patched and supported, allowing you to focus on your solution instead of the environment they run in.

[Architecture diagram: Speech to Text Digital Media Solution Architecture. Components shown: Source A/V Files, Azure Blob Storage, Azure Encoder (Standard or Premium), Azure Media Indexer/OCR Media Processor, TTML, WebVTT, Keywords, Azure Search, Streaming Endpoint, Multi-Protocol Dynamic Packaging/Multi-DRM, Azure CDN, Web Apps, Azure Media Player]

Implementation guidance

Products Documentation

Blob Storage

Stores large amounts of unstructured data, such as text or binary data, that can be accessed from anywhere in the world via HTTP or HTTPS. You can use blob storage to expose data publicly to the world, or to store application data privately.

Azure Encoder

Encoding jobs are one of the most common processing operations in Media Services. You create encoding jobs to convert media files from one encoding to another.

Azure Streaming Endpoint

Represents a streaming service that can deliver content directly to a client player application, or to a content delivery network (CDN) for further distribution.


Azure CDN

Provides secure, reliable content delivery with a broad global reach and rich feature set.

Azure Media Player

Uses industry standards, such as HTML5 (MSE/EME), to provide an enriched adaptive streaming experience. Regardless of the playback technology used, developers have a unified JavaScript interface to access APIs.


Azure Search

Delegates search-as-a-service server and infrastructure management to Microsoft, leaving you with a ready-to-use service that you can populate with your data and use to add search to your web or mobile application.
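As a rough illustration of what "populating the service with your data" might look like for this solution, the sketch below turns transcript cues into one JSON document per cue. The field names (`id`, `videoId`, `startSeconds`, `endSeconds`, `text`), the function name, and the sample values are assumptions made for this example, not a schema defined by the service, and the upload step itself is deliberately left out.

```python
import json


def cues_to_documents(video_id, cues):
    """Build one search document per (start, end, text) transcript cue.

    The field names are hypothetical; a real index defines its own schema.
    """
    documents = []
    for number, (start, end, text) in enumerate(cues):
        documents.append(
            {
                "id": f"{video_id}-{number}",  # unique key per cue
                "videoId": video_id,           # lets results link back to the video
                "startSeconds": start,         # where the player should seek
                "endSeconds": end,
                "text": text,                  # the searchable transcript text
            }
        )
    return documents


# Hypothetical cues, for illustration only.
docs = cues_to_documents(
    "safety-101",
    [
        (1.0, 4.0, "Welcome to the safety training video."),
        (72.5, 76.0, "Always wear protective goggles in the lab."),
    ],
)
print(json.dumps(docs[1], indent=2))
```

Indexing each cue separately, rather than the whole transcript as one document, means every search hit already carries the time offset needed to jump to the right moment in the video.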

Web Apps

Hosts the website or web application.

Azure Media Indexer

Enables you to make the content of your media files searchable and to generate a full-text transcript for closed-captioning and keywords. You can process one media file or multiple media files in a batch.

Related solution architectures

Video-on-demand digital media

A basic video-on-demand solution streams recorded video content, such as films, news clips, sports segments, training videos, and customer support tutorials, to any video-capable endpoint device, mobile application, or desktop browser. Video files are uploaded to Azure Blob storage, encoded to a multi-bitrate standard format, and then distributed via all major adaptive bitrate streaming protocols (HLS, MPEG-DASH, Smooth Streaming) to the Azure Media Player client.

Live-streaming digital media

A live-streaming solution captures video in real time and broadcasts it to consumers, making events such as interviews, conferences, or sporting events available online as they happen. In this solution, video is captured by a video camera and sent to a channel input endpoint. The channel receives the live input stream and makes it available for streaming through a streaming endpoint to a web browser or mobile app. The channel also provides a preview monitoring endpoint that you use to preview and validate your stream before further processing and delivery. In addition, the channel can record and store the ingested content so that it can be streamed later (video on demand).
