
Full text search of audio and video files


Over the past few months, we have talked about how you can use Azure Search to perform full text search over Office, PDF, and HTML documents and more, including images processed with OCR. Today I want to expand on this and show you how to use Azure Media Services with Azure Search to perform full text search over the spoken words within your audio and video files.

Being able to search through audio and video content is useful because it helps your users find relevant content. This is especially important in cases where you do not have vast amounts of content, because with a small catalog a query has fewer chances to match anything. Let’s say you have a company that offers training and you have a set of videos that users watch. Normally within Azure Search, you would index metadata about the training videos such as title, speaker and description, which would then be searchable by your users.

This is a good start, but what if the speaker in the video talks about a topic that is not included in this metadata? When you index the spoken text, users can be presented with results they previously would not have been able to find. This is important because nothing will turn a user away from your site faster than a search response of “0 Results Found.” In addition, indexing the spoken words helps users find the results that are most relevant to them.

Handling transcription errors

Unfortunately, audio transcription is not perfect. For example, the speaker in the video might say the word “genes” but it is interpreted as “jeans.” Luckily this is not a problem, because Azure Search supports phonetic searching, meaning you can search for a word and we will optionally return words that sound similar.

This works great for names as well. For example, my last name is Cavanagh, but people often spell it with a K and a U, as in Kavanaugh. Since the two spellings sound similar, Azure Search can still return matches for a misspelling like this one.
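To make the phonetic matching more concrete, here is a minimal sketch of an index definition that routes the transcript through a phonetic token filter. It assumes the custom analyzer schema of the Azure Search REST API. The index name, field names, analyzer name, and the choice of the double metaphone encoder are all hypothetical choices for illustration, not taken from the sample.

```python
import json


def build_index_definition(index_name: str) -> dict:
    """Build an index definition (as a plain dict) with a phonetic analyzer.

    Field, analyzer, and filter names are hypothetical; adjust them to your data.
    """
    return {
        "name": index_name,
        "fields": [
            {"name": "id", "type": "Edm.String", "key": True, "searchable": False},
            {"name": "title", "type": "Edm.String", "searchable": True},
            # Speaker names and transcripts benefit most from sounds-like matching.
            {"name": "speaker", "type": "Edm.String", "searchable": True,
             "analyzer": "phonetic_analyzer"},
            {"name": "transcript", "type": "Edm.String", "searchable": True,
             "analyzer": "phonetic_analyzer"},
        ],
        "analyzers": [
            {
                "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
                "name": "phonetic_analyzer",
                "tokenizer": "standard_v2",
                "tokenFilters": ["lowercase", "asciifolding", "phonetic_filter"],
            }
        ],
        "tokenFilters": [
            {
                "@odata.type": "#Microsoft.Azure.Search.PhoneticTokenFilter",
                "name": "phonetic_filter",
                # Assumption: double metaphone; other encoders are available.
                "encoder": "doubleMetaphone",
                # Keep the original tokens next to the phonetic codes.
                "replace": False,
            }
        ],
    }


index_definition = build_index_definition("build-videos")
index_json = json.dumps(index_definition, indent=2)
```

The resulting JSON would be sent as the body of a create-index request. With `replace` set to `False`, exact matches on the original words remain possible alongside the sounds-like matches.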

How does it work?

To show you how this all works, I created a sample found in the following GitHub repository. For this sample, I chose to index video recordings from the 2015 Build Conference. In this sample, I show how to:

  1. Upload audio or video files to Azure Media Services and have the service transcribe the text from the videos
  2. Upload the transcribed text to Azure Search along with metadata relating to the video
  3. Perform some searches against this Azure Search index to show how the transcribed text lets additional relevant content be returned
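As a rough illustration of steps 2 and 3, the sketch below builds the JSON body for uploading transcribed videos and the URL for a simple search query. It is not the code from the sample: the service name, index name, field names, sample record, and API version are all assumptions, and the actual HTTP calls (which also need an `api-key` header) are left out so the snippet runs without a service.

```python
import json
from urllib.parse import quote, urlencode

# Assumption: any API version supported by your search service will do.
API_VERSION = "2020-06-30"


def build_upload_batch(videos: list) -> dict:
    """Wrap video records (metadata plus transcript) in an indexing batch body."""
    return {
        "value": [
            {
                "@search.action": "mergeOrUpload",  # insert, or update if the key exists
                "id": video["id"],
                "title": video["title"],
                "speaker": video["speaker"],
                "transcript": video["transcript"],
            }
            for video in videos
        ]
    }


def build_search_url(service: str, index: str, query: str) -> str:
    """Build the GET URL for a simple full text query against the index."""
    params = urlencode({"api-version": API_VERSION, "search": query})
    return f"https://{service}.search.windows.net/indexes/{quote(index)}/docs?{params}"


# Hypothetical record; in practice the transcript comes from Azure Media Services.
sample_videos = [
    {
        "id": "session-001",
        "title": "Example session title",
        "speaker": "Example Speaker",
        "transcript": "today we will talk about machine learning in the cloud",
    }
]

batch_body = json.dumps(build_upload_batch(sample_videos))
search_url = build_search_url("my-service", "build-videos", "machine learning")
```

The batch body would be posted to the index's `docs/index` endpoint. A query such as "machine learning" can then match this video through its transcript even though neither the title nor the speaker field mentions the phrase.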

If you have any questions, please let us know in the comments below. If you would like to see this become part of our Azure Search Indexer, please cast your vote on our UserVoice page.