Note: The following blog post describes a component of Azure Media Analytics. To learn more and get started, please read the documentation.
It has been a few months since the last language model update to our speech-to-text product, Azure Media Indexer. Instead of adding languages one at a time, as we have in the past, we have been overhauling the media processor from the core. As a result, today we are announcing the first major upgrade to the Indexer since its inception.
We are proud to announce the preview for Azure Media Indexer v2.
The preview is limited to ~10 minutes of processing, but is free to all customers in public Azure datacenters.
Azure Media Indexer 2 brings two of the features most requested by our customers: faster indexing and broader language support.
Performance improvements
Traditionally, Azure Media Indexer has taken 2-3x the duration of the source file to finish processing an Indexing Task. With Azure Media Indexer 2, indexing speed is doubling! Expect processing to take 1-1.5x the duration of the source file.
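To make those multipliers concrete, here is a minimal sketch that turns a source duration into a best-case and worst-case processing estimate. The function name and the 10-minute sample duration are illustrative assumptions; only the 2-3x and 1-1.5x factors come from this post, and real processing times will vary.

```python
def estimated_processing_minutes(source_minutes, low_factor, high_factor):
    """Return (best case, worst case) processing time in minutes."""
    return source_minutes * low_factor, source_minutes * high_factor


# Duration multipliers quoted in this post.
INDEXER_V1_FACTORS = (2.0, 3.0)
INDEXER_V2_FACTORS = (1.0, 1.5)

# Hypothetical 10-minute source file.
source_minutes = 10.0
v1_estimate = estimated_processing_minutes(source_minutes, *INDEXER_V1_FACTORS)
v2_estimate = estimated_processing_minutes(source_minutes, *INDEXER_V2_FACTORS)

print("Azure Media Indexer:   %.1f to %.1f minutes" % v1_estimate)
print("Azure Media Indexer 2: %.1f to %.1f minutes" % v2_estimate)
```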
New languages
Azure Media Indexer supports speech-to-text for the English and Spanish languages. With Azure Media Indexer 2, we will have support for the following languages:
- English [en-us]
- Spanish [es-es]
- Chinese (Mandarin, Simplified) [zh-cn]
- French [fr-fr]
- German [de-de]
- Italian [it-it]
- Portuguese [pt-br]
- Arabic (Egyptian) [ar-eg]
Additionally, our new internal architecture for the media processor lets us incorporate language models for new languages more quickly, so we will be able to release languages at a much faster cadence than before!
Output files
Captions
The primary output files consumed by users of Azure Media Indexer were our “caption” formats. As such, we have decided to release this preview version of Azure Media Indexer 2 with only the following output formats:
- WebVTT
- TTML
- SAMI
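As an illustration of how a caption file can be consumed downstream, for example to make spoken words searchable by timestamp, here is a minimal sketch that reads cues from WebVTT text. The sample captions are made up for this example and are not actual Indexer output, and the parser handles only simple cues (timing line plus text), not the full WebVTT specification.

```python
import re

# WebVTT timestamps are "hh:mm:ss.ttt" or "mm:ss.ttt".
TIMESTAMP = re.compile(r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})")


def to_seconds(timestamp):
    """Convert a WebVTT timestamp string to seconds."""
    match = TIMESTAMP.fullmatch(timestamp.strip())
    if match is None:
        raise ValueError("Not a WebVTT timestamp: %r" % timestamp)
    hours = int(match.group(1) or 0)
    minutes = int(match.group(2))
    seconds = int(match.group(3))
    millis = int(match.group(4))
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_cues(vtt_text):
    """Return a list of (start_seconds, end_seconds, text) tuples."""
    cues = []
    normalized = vtt_text.replace("\r\n", "\n").strip()
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.splitlines()
        for index, line in enumerate(lines):
            if "-->" in line:
                start, rest = line.split("-->", 1)
                end = rest.split()[0]  # ignore any cue settings after the end time
                text = " ".join(lines[index + 1:]).strip()
                cues.append((to_seconds(start), to_seconds(end), text))
                break
    return cues


# Hypothetical caption content, for illustration only.
SAMPLE_VTT = """WEBVTT

00:00:01.000 --> 00:00:03.500
Hello and welcome.

00:00:03.500 --> 00:00:06.000
This is a sample caption.
"""

cues = parse_cues(SAMPLE_VTT)
for start, end, text in cues:
    print("%.1fs-%.1fs: %s" % (start, end, text))
```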
Lattice
Experienced users of Azure Media Indexer and MAVIS will notice the lack of AIB output. This is a design decision based on our experiences over the product’s lifetime.
For background, the AIB file was a powerful binary file format which could be used in conjunction with a custom SQL Server IFilter, allowing SQL full-text search on the lattice data structure. This reduced the incidence of false negatives in search experiences, but due to architectural constraints it could not be used with Azure SQL or any other full-text search engine. Instead, it relied on an IaaS installation of SQL Server followed by a manual installation of the custom IFilter. This was a big customer pain point, so we decided to work towards a better approach to support lattice search scenarios.
We are currently designing a new output format. Comprising the important details from the AIB file, this new format should be compatible with all industry-standard full-text search engines while also being easier to parse and understand. Stay tuned for the release of this new data structure after our private preview of Azure Media Indexer 2.
Getting started
In order to get started, use the Media Analytics sample project from the Media Analytics introductory blog post with the following two pieces:
Task configuration:
{
  "Version": 1.0,
  "Features": [
    {
      "Options": {
        "Formats": ["Webvtt", "Sami", "Ttml"],
        "Language": "EnUs",
        "Type": "RecoOptions"
      },
      "Type": "SpReco"
    }
  ]
}
Formats supported: Webvtt, Sami, Ttml
Languages supported: use the 4-character form of the bracketed code from the list under "New languages" (for example, EnUs for [en-us]).
Media Processor name: “Azure Media Indexer Preview 2”
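If you generate task configurations for several languages, a small helper can keep them consistent. The sketch below is illustrative only: the helper names are hypothetical, and it assumes every 4-character language code follows the same pattern as the "EnUs" example (each half of the locale code capitalized, with the hyphen removed). Only the JSON shape and the format names come from the example above.

```python
import json

# Format names as they appear in the example task configuration.
SUPPORTED_FORMATS = ("Webvtt", "Sami", "Ttml")


def to_language_code(locale):
    """Turn a locale such as "en-us" into a code such as "EnUs".

    Assumption: all language codes follow the "EnUs" pattern.
    """
    return "".join(part.capitalize() for part in locale.split("-"))


def build_task_configuration(locale, formats=SUPPORTED_FORMATS):
    """Return the task configuration as a JSON string."""
    unknown = [name for name in formats if name not in SUPPORTED_FORMATS]
    if unknown:
        raise ValueError("Unsupported formats: %s" % ", ".join(unknown))
    configuration = {
        "Version": 1.0,
        "Features": [
            {
                "Options": {
                    "Formats": list(formats),
                    "Language": to_language_code(locale),
                    "Type": "RecoOptions",
                },
                "Type": "SpReco",
            }
        ],
    }
    return json.dumps(configuration, indent=2)


# Example: a Mandarin configuration that requests only WebVTT output.
print(build_task_configuration("zh-cn", formats=["Webvtt"]))
```

Serializing with `json.dumps` also avoids hand-editing mistakes such as trailing commas, which strict JSON parsers reject.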
Not sure what Azure Media Indexer is? Learn more here.
Want to learn more about Azure Media Analytics? Check out the introductory blog post.
If you have used AIB files and have feedback on the approach outlined above, reach out at indexer@microsoft.com or in the comments section below.