Skip to main content
  • 3 min read

Combine the Power of Video Indexer and Computer Vision

We are pleased to introduce the ability to export high-resolution keyframes from Azure Media Service’s Video Indexer.

We are pleased to introduce the ability to export high-resolution keyframes from Azure Media Service’s Video Indexer. Whereas keyframes were previously exported in reduced resolution compared to the source video, high resolution keyframes extraction gives you original quality images and allows you to make use of the image-based artificial intelligence models provided by the Microsoft Computer Vision and Custom Vision services to gain even more insights from your video. This unlocks a wealth of pre-trained and custom model capabilities. You can use the keyframes extracted from Video Indexer, for example, to identify logos for monetization and brand safety needs, to add scene description for accessibility needs or to accurately identify very specific objects relevant for your organization, like identifying a type of car or a place.

Let’s look at some of the use cases we can enable with this new introduction.

Using keyframes to get image description automatically

You can automate the process of “captioning” different visual shots of your video through the image description model within Computer Vision, in order to make the content more accessible to people with visual impairments. This model provides multiple description suggestions along with confidence values for an image. You can take the descriptions of each high-resolution keyframe and stitch them together to create an audio description track for your video.

Image description within Computer Vision

Using Keyframes to get logo detection

While Video Indexer detects brands in speech and visual text, it does not support brands detection from logos yet. Instead, you can run your keyframes through Computer Vision’s logo-based brands detection model to detect instances of logos in your content.

This can also help you with brand safety as you now know and can control the brands showing up in your content. For example, you might not want to showcase the logo of a company directly competing with yours. Also, you can now monetize on the brands showing up in your content through sponsorship agreements or contextual ads.

Furthermore, you can cross-reference the results of this model for you keyframe with the timestamp of your keyframe to determine when exactly a logo is shown in your video and for how long. For example, if you have a sponsorship agreement with a content creator to show your logo for a certain period of time in their video, this can help determine if the terms of the agreement have been upheld.

Computer Vision’s logo detection model can detect and recognize thousands of different brands out of the box. However, if you are working with logos that are specific to your use case or otherwise might not be a part of the out of the box logos database, you can also use Custom Vision to build a custom object detector and essentially train your own database of logos by uploading and correctly labeling instances of the logos relevant to you.

Computer Vision's logo detector, detecting the Microsoft logo.

Using keyframes with other Computer Vision and Custom Vision offerings

The Computer Vision APIs provide different insights in addition to image description and logo detection, such as object detection, image categorization, and more. The possibilities are endless when you use high-resolution keyframes in conjunction with these offerings.

For example, the object detection model in Computer Vision gives bounding boxes for common out of the box objects that are already detected as part of Video Indexer today. You can use these bounding boxes to blur out certain objects that don’t meet your standards.

Object detection model

High-resolution keyframes in conjunction with Custom Vision can be leveraged to achieve many different custom use cases. For example, you can train a model to determine what type of car (or even what breed of cat) is showing in a shot. Maybe you want to identify the location or the set where a scene was filmed for editing purposes. If you have objects of interest that may be unique to your use case, use Custom Vision to build a custom classifier to tag visuals or a custom object detector to tag and provide bounding boxes for visual objects.

Try it for yourself

These are just a few of the new opportunities enabled by the availability of high-resolution keyframes in Video Indexer. Now, it is up to you to get additional insights from your video by taking the keyframes from Video Indexer and running additional image processing using any of the Vision models we have just discussed. You can start doing this by first uploading your video to Video Indexer and taking the high-resolution keyframes after the indexing job is complete and second creating an account and getting started with the Computer Vision API and Custom Vision.

Have questions or feedback? We would love to hear from you. Use our UserVoice page to help us prioritize features, leave a comment below or email for any questions.