Full text search of audio and video files

By Liam Cavanagh, Principal Program Manager, Azure Search

Posted on March 22, 2016
Over the past few months, we have talked about how you can use Azure Search to perform full text search over images (using OCR), Office, PDF, and HTML documents, and more. Today I want to expand on this and show you how to use Azure Media Services with Azure Search to perform full text search over the spoken words within your audio and video files.

Being able to search through audio and video content is useful because it helps your users find relevant content. This is especially important when you have vast amounts of content. Let’s say you have a company that offers training and you have a set of videos that users watch. Normally within Azure Search, you would index metadata about the training videos, such as title, speaker, and description, which would then be searchable by your users.

This is a good start, but what if the speaker in the video talks about a topic that is not included in this metadata? By indexing the spoken text, the user can be presented with results they previously would not have been able to find. This is important because nothing will turn a user away from your site faster than a search response of “0 Results Found.” In addition, by indexing the spoken words, users can find results that are most relevant to them.

Handling transcription errors

Unfortunately, audio transcription is not perfect. For example, the speaker in the video might say the word “genes” but it is transcribed as “jeans.” Luckily, this does not have to be a problem, because Azure Search supports phonetic searching, meaning you can search for a word and Azure Search can optionally return words that sound similar.

This works well for names too. For example, my last name is Cavanagh, but people often spell it with a K and a U, as in Kavanaugh. Since the two sound similar, Azure Search can still return matches for misspellings like this one.
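To make the phonetic option more concrete, here is a minimal sketch of an index definition that pairs an exact transcript field with a phonetically analyzed copy. It is written against the Azure Search REST index schema as I understand it (a custom analyzer plus a `PhoneticTokenFilter`). The index name, the field names, and the `phonetic_analyzer` and `phonetic_filter` names are assumptions made for illustration and are not taken from the sample.

```python
def build_index_definition(index_name: str) -> dict:
    """Build an index definition with an exact and a phonetic transcript field.

    All field and analyzer names here are hypothetical.
    """
    return {
        "name": index_name,
        "fields": [
            {"name": "id", "type": "Edm.String", "key": True, "searchable": False},
            {"name": "title", "type": "Edm.String", "searchable": True},
            # Exact (default analyzer) copy of the spoken text.
            {"name": "transcript", "type": "Edm.String", "searchable": True},
            # Same text again, indexed through the phonetic analyzer below.
            {
                "name": "transcript_phonetic",
                "type": "Edm.String",
                "searchable": True,
                "analyzer": "phonetic_analyzer",
            },
        ],
        "analyzers": [
            {
                "name": "phonetic_analyzer",
                "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
                "tokenizer": "standard_v2",
                "tokenFilters": ["lowercase", "asciifolding", "phonetic_filter"],
            }
        ],
        "tokenFilters": [
            {
                "name": "phonetic_filter",
                "@odata.type": "#Microsoft.Azure.Search.PhoneticTokenFilter",
                "encoder": "doubleMetaphone",
                # Replace each token with its phonetic code; the exact text
                # stays available in the separate "transcript" field.
                "replace": True,
            }
        ],
    }


def build_search_request(query: str, phonetic: bool = False) -> dict:
    """Build a search request body; phonetic matching is opt-in."""
    fields = ["title", "transcript"]
    if phonetic:
        fields.append("transcript_phonetic")
    return {"search": query, "searchFields": ",".join(fields)}
```

Keeping the phonetic text in its own field is one way to make sound-alike matching optional: a query only reaches `transcript_phonetic` when the caller asks for it, so a search for “jeans” can be widened to “genes” on demand without affecting exact searches.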

How does it work?

To show you how this all works, I created a sample found in the following GitHub repository. For this sample, I chose to index video recordings from the 2015 Build Conference. In this sample, I show how to:

Upload audio or video files to Azure Media Services and have the service transcribe the text from the videos
Upload the transcribed text to Azure Search along with metadata relating to the video
Perform some searches against this Azure Search index to show how the transcribed text lets additional relevant content be returned
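The second step above can be sketched roughly as follows. This is not the code from the sample. It assumes the transcription arrives as a WebVTT caption file, flattens it to plain text, and combines it with the video metadata into an upload batch in the shape the Azure Search REST API accepts for indexing documents. The helper names and the document fields (`id`, `title`, `speaker`, `description`, `transcript`) are hypothetical.

```python
import re

# Matches cue timing lines such as "00:00:01.000 --> 00:00:04.000"
# (the hours component is optional in WebVTT).
_CUE_TIMING = re.compile(
    r"^(\d+:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(\d+:)?\d{2}:\d{2}\.\d{3}"
)


def vtt_to_text(vtt: str) -> str:
    """Flatten a WebVTT caption file into a single string of spoken text.

    Simplified on purpose: it drops the header, blank lines, numeric cue
    identifiers, and timing lines, and does not handle NOTE or STYLE blocks.
    """
    spoken = []
    for raw in vtt.splitlines():
        line = raw.strip()
        if not line or line.startswith("WEBVTT"):
            continue
        if line.isdigit() or _CUE_TIMING.match(line):
            continue
        spoken.append(line)
    return " ".join(spoken)


def build_upload_batch(videos: list) -> dict:
    """Combine metadata and transcript text into one indexing batch."""
    return {
        "value": [
            {
                "@search.action": "upload",
                "id": video["id"],
                "title": video["title"],
                "speaker": video["speaker"],
                "description": video["description"],
                "transcript": vtt_to_text(video["vtt"]),
            }
            for video in videos
        ]
    }
```

The resulting dictionary would be serialized to JSON and posted to the index's document endpoint. Storing the transcript next to the title, speaker, and description means a single query can match either the metadata or the spoken words.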

If you have any questions, please let us know in the comments below. If you would like to see this become part of our Azure Search Indexer, please cast your vote on our UserVoice page.
