Full text search of audio and video files

By Keiko Harada, Senior Program Manager, Azure Application Platform Container Compute


Posted on March 22, 2016

Over the past few months, we have talked about how you can use Azure Search to perform full text search over images (using OCR), Office, PDF and HTML documents, and more. Today I want to expand on this and show you how to use Azure Media Services with Azure Search to perform full text search over the spoken words within your audio and video files.

Being able to search through audio and video content is useful because it helps your users find relevant content. This is especially important in cases where you have vast amounts of content. Let’s say you have a company that offers training and you have a set of videos users watch. Normally within Azure Search, you would index metadata about the training videos such as title, speaker and description, which would then be searchable by your users.

This is a good start, but what if the speaker in the video talks about a topic that is not included in this metadata? By indexing the spoken text, the user can be presented with results they previously would not have been able to find. This is important because nothing will turn a user away from your site faster than a search response of “0 Results Found.” In addition, by indexing the spoken words, users can find results that are most relevant to them.

Handling transcription errors

Unfortunately, audio transcription is not perfect. For example, the speaker in the video might say the word “genes” but have it transcribed as “jeans.” Fortunately, Azure Search supports phonetic search, meaning you can search for a word and optionally have it return words that sound similar.

This works great for names as well. For example, my last name is Cavanagh, but people often spell it with a K and U such as Kavanaugh. Since these sound similar, even misspellings such as this can be returned in Azure Search’s results.
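As a rough illustration of how phonetic matching might be configured, here is a minimal sketch of an index definition in the JSON shape used by the Azure Search REST API, built as a Python dictionary. The index name, the field names (`id`, `title`, `speaker`, `transcript`, `transcriptPhonetic`) and the analyzer and filter names are assumptions made for this example and are not taken from the sample. The property names for the custom analyzer and phonetic token filter reflect my understanding of the REST API, so please check them against the API version you use.

```python
import json


def build_index_definition(index_name="videos"):
    """Return a hypothetical index definition as a plain dict.

    The transcript is stored twice: once with the default analyzer and
    once with a phonetic analyzer, so that sound-alike matching stays
    optional and can be chosen per query.
    """
    return {
        "name": index_name,
        "fields": [
            {"name": "id", "type": "Edm.String", "key": True},
            {"name": "title", "type": "Edm.String", "searchable": True},
            {"name": "speaker", "type": "Edm.String", "searchable": True},
            # Exact-word matching on the spoken text.
            {"name": "transcript", "type": "Edm.String", "searchable": True},
            # Sound-alike matching on the same spoken text.
            {
                "name": "transcriptPhonetic",
                "type": "Edm.String",
                "searchable": True,
                "analyzer": "phonetic_analyzer",
            },
        ],
        "analyzers": [
            {
                "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
                "name": "phonetic_analyzer",
                "tokenizer": "standard_v2",
                "tokenFilters": ["lowercase", "asciifolding", "phonetic_filter"],
            }
        ],
        "tokenFilters": [
            {
                "@odata.type": "#Microsoft.Azure.Search.PhoneticTokenFilter",
                "name": "phonetic_filter",
                # Assumed encoder choice; other encoders are available.
                "encoder": "doubleMetaphone",
                # Keep the original tokens alongside the phonetic codes.
                "replace": False,
            }
        ],
    }


# Serialize the definition as it would appear in a request body.
print(json.dumps(build_index_definition(), indent=2))
```

With a layout like this, a query restricted to `transcript` would match only exact words, while a query that also includes `transcriptPhonetic` in its search fields could pick up similar-sounding terms such as “genes” and “jeans.”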

How does it work?

To show you how this all works, I created a sample found in the following GitHub repository. For this sample, I chose to index video recordings from the 2015 Build Conference. In this sample, I show how to:

Upload audio or video files to Azure Media Services and have the service transcribe the text from the videos
Upload the transcribed text to Azure Search along with metadata relating to the video
Perform some searches against this Azure Search index to show how additional relevant content can be returned using this additional transcribed text
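The second step above can be sketched roughly as follows. This is not the code from the sample; it is a minimal illustration that assumes the transcript comes back from Azure Media Services as WebVTT captions and that the index has fields named `id`, `title`, `speaker` and `transcript` (all hypothetical names). It strips the caption timing information and builds the JSON body that a document upload request would carry, without making any network calls.

```python
import json
import re

# Matches a WebVTT cue timing line such as "00:00:01.000 --> 00:00:04.000".
# The hours component is optional in WebVTT.
TIMESTAMP_LINE = re.compile(
    r"^(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
)


def webvtt_to_text(vtt):
    """Collapse WebVTT captions into one plain-text string for indexing.

    Simplified: skips the header, blank lines, numeric cue identifiers and
    timing lines. NOTE blocks and non-numeric cue identifiers are not handled.
    """
    spoken = []
    for raw in vtt.splitlines():
        line = raw.strip()
        if not line or line.startswith("WEBVTT"):
            continue
        if line.isdigit() or TIMESTAMP_LINE.match(line):
            continue
        spoken.append(line)
    return " ".join(spoken)


def build_upload_batch(videos):
    """Build the body of a document upload request from video records.

    Each record is a dict with the hypothetical keys "id", "title",
    "speaker" and "captions" (the raw WebVTT text).
    """
    return {
        "value": [
            {
                "@search.action": "upload",
                "id": video["id"],
                "title": video["title"],
                "speaker": video["speaker"],
                "transcript": webvtt_to_text(video["captions"]),
            }
            for video in videos
        ]
    }


# Made-up example data, not taken from the Build 2015 recordings.
example_captions = """WEBVTT

1
00:00:01.000 --> 00:00:04.000
Welcome to the session.

2
00:00:04.500 --> 00:00:08.000
Today we will talk about search.
"""

batch = build_upload_batch([
    {"id": "1", "title": "Example session", "speaker": "Example speaker",
     "captions": example_captions},
])
print(json.dumps(batch, indent=2))
```

The resulting body would be sent with an HTTP POST to the index's `docs/index` endpoint on your search service, authenticated with an admin API key. Keeping the transcript in its own field next to the metadata means a single query can match either the title and speaker or the spoken words.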

If you have any questions, please let us know in the comments below. If you would like to see this become part of our Azure Search Indexer, please cast your vote on our UserVoice page.
