Note: The following blog post describes a component of Azure Media Analytics. To learn more and find out how to get started, read the introductory blog post on Azure Media Analytics.
Video OCR is now in Public Preview. Read the follow-up blog to learn more here.
Video OCR
OCR (Optical Character Recognition) is the conversion of visual text from video into editable, searchable digital text. Video OCR detects text content in video files and generates text files for your use. This allows you to automate the extraction of meaningful metadata from the video signal of your media.
When used with a search engine, Video OCR lets you index your media by its on-screen text and make your content easier to discover. This is especially useful for text-heavy video, such as a recording or screen capture of a slideshow presentation. The Azure Media OCR media processor is optimized for digital text.
Today we are proud to announce the private preview of the Azure Media OCR media processor as part of Azure Media Analytics.
Sample video from Oceaneering
The engineering company Oceaneering routinely surveys and maps subsea oil and gas operations. As a result, they have large volumes of video with text “burned in” over the video stream, such as GPS coordinates, depth, and other meaningful tags. To efficiently index and organize their video library, Oceaneering engineers want to extract these text overlays, and Azure Media OCR lets them process their video content in the cloud at enterprise scale.
Let’s take a look at a real-world sample video from Oceaneering:
Note, in particular, the text overlays along the top of the video frame. This region contains valuable data that tags and labels the video and its contents. Using the Azure Media OCR media processor, we can extract this text from the video file. You can download the full XML output, or see the truncated results below, generated with default settings for the first frame (timestamp 0:00) of the sample video above:
Truncated output
MD-13 Mad Dog cp -148 MD-13 MD-13 Mad Dog Mad Dog cp -148 cp -148 Drilling 33 Drilling Drilling 33 33 Riser Riser Riser 8/12/2012 59 8/12/2012 8/12/2012 59 59 E 2528185. 79 N 9875573.57 E 2528185. E 2528185. 79 79 N 9875573.57 N 9875573.57 H 2.13 D 2245.90 H 2.13 H 2.13 D 2245.90 D 2245.90
With this XML output, Oceaneering can automate the tagging of their media without re-architecting their post-processing pipeline to carry the overlay data through. They can easily index videos (or video segments) that show views at a certain depth or within a certain geographic area, based on the position data overlaid on their camera feeds.
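As a rough illustration of indexing by depth or position, here is a minimal sketch that pulls numeric fields out of overlay text like the truncated output above. It assumes the text has already been taken out of the XML (the schema is not described here), and it assumes from the sample alone that "D" is depth and "N"/"E" are northing/easting values; the names `parse_overlay`, `OverlayPosition`, and `in_depth_range` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical field pattern inferred from the sample overlay above:
# a single-letter label (D, N, or E) followed by a number. OCR may split a
# decimal across a space ("2528185. 79"), so an optional space is allowed
# after the decimal point.
FIELD_PATTERN = re.compile(r"\b([DNE])\s+(\d+(?:\.\s?\d+)?)")


@dataclass
class OverlayPosition:
    depth: Optional[float]     # assumed meaning of the "D" field
    northing: Optional[float]  # assumed meaning of the "N" field
    easting: Optional[float]   # assumed meaning of the "E" field


def parse_overlay(text: str) -> OverlayPosition:
    """Pull the first D/N/E values out of OCR'd overlay text."""
    values = {}
    for label, raw in FIELD_PATTERN.findall(text):
        # Rejoin decimals that OCR split with a space, then keep only the
        # first occurrence (the sample output repeats many fields).
        values.setdefault(label, float(raw.replace(" ", "")))
    return OverlayPosition(values.get("D"), values.get("N"), values.get("E"))


def in_depth_range(pos: OverlayPosition, low: float, high: float) -> bool:
    """True if a depth was parsed and it falls within [low, high]."""
    return pos.depth is not None and low <= pos.depth <= high


sample = ("MD-13 Mad Dog cp -148 Drilling 33 Riser 8/12/2012 59 "
          "E 2528185. 79 N 9875573.57 H 2.13 D 2245.90")
position = parse_overlay(sample)
print(position.depth, position.northing, position.easting)
print(in_depth_range(position, 2000, 2500))
```

Storing one parsed position per sampled frame would then let a search engine filter segments by depth range or by a bounding box on the position values.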
Sample enterprise video
Oceaneering, however, represents a niche scenario for Video OCR. A more common use case would be the extraction of text data from PowerPoint slides in a recorded lecture.
Check out the following clip of an Azure Media Services presentation at //Build.
From this, we were able to extract all of the text (except for the //Build logo):
Digital media landscape is always changing Huge capital investment required Delivering video is hard, expensive, time - consuming, with a need for high scale and high availability especially hard and costly as both audience sizes and content libraries grow and shrink and grow again. Delivering video is hard, expensive, time - consuming, with a need for high scale and high availability especially hard and costly as both audience sizes and content libraries grow and shrink and grow Video is the new currency Audiences of all kinds are changing and demanding content on their own devices, wherever they are.That isn 't easy: So many different device profiles and different delivery technologies. Audiences of all kinds are changing and demanding content on their own devices, wherever they are.That isn 't easy: So many different device profiles and different delivery Azure Media Services Microsoft's cloud platform enables on demand and live streaming video solutions for consumer and enterprise scenarios. Microsoft's cloud platform enables on demand and live streaming video solutions for consumer and enterprise scenarios.
This example shows how well Video OCR handles slideshow presentations. Ideally, every enterprise or educational institution should be able to index the valuable content in its presentation recordings, making that content easy to search and discover.
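To make "search and discovery" concrete, the following is a minimal sketch of an inverted index that maps words to the timestamps where they appear on screen. It assumes you have already extracted (timestamp, text) pairs from the OCR output; `build_index` and `search` are hypothetical helpers, and the timestamps in the sample are made up for illustration. A production system would more likely hand this text to a search engine.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def build_index(frames):
    """Map each lowercased word to the sorted timestamps where it appears.

    `frames` is an iterable of (timestamp_seconds, text) pairs, assumed to
    have been extracted from the OCR output already.
    """
    index = defaultdict(set)
    for timestamp, text in frames:
        for word in WORD.findall(text.lower()):
            index[word].add(timestamp)
    return {word: sorted(times) for word, times in index.items()}


def search(index, query):
    """Return the timestamps at which every query word appears."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)


# Text fragments from the //Build output above; the timestamps are made up.
frames = [
    (0.0, "Digital media landscape is always changing"),
    (12.0, "Video is the new currency"),
    (24.0, "Azure Media Services"),
]
index = build_index(frames)
print(search(index, "media"))
print(search(index, "video currency"))
```

Requiring every query word to match (AND semantics) keeps results tight for multi-word queries; a ranked, fuzzy search is the natural next step for noisier OCR text.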
The configuration preset for this private preview version of Video OCR includes the following:
Attribute | Description
TimeInterval | A number greater than or equal to 0 that specifies the OCR sampling interval in seconds. For example, a value of 1.5 samples one frame every 1.5 seconds. The default is 0, which samples every frame.
Language | The language of the text to detect. One of: Arabic, Chinese Simplified, Chinese Traditional, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian Cyrillic, Serbian Latin, Slovak, Spanish, Swedish, or Turkish.
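To show what the TimeInterval and Language attributes mean in practice, here is a small sketch that checks candidate values against the ranges in the table and estimates how many frames OCR would examine. It is not the preset's actual file format, and it assumes sampling starts at t = 0 and repeats every TimeInterval seconds (the table does not specify alignment); `validate_preset` and `frames_sampled` are hypothetical names.

```python
import math

# Languages listed for the Language attribute in the preset table above.
SUPPORTED_LANGUAGES = frozenset({
    "Arabic", "Chinese Simplified", "Chinese Traditional", "Czech", "Danish",
    "Dutch", "English", "Finnish", "French", "German", "Greek", "Hungarian",
    "Italian", "Japanese", "Korean", "Norwegian", "Polish", "Portuguese",
    "Romanian", "Russian", "Serbian Cyrillic", "Serbian Latin", "Slovak",
    "Spanish", "Swedish", "Turkish",
})


def validate_preset(time_interval: float, language: str) -> None:
    """Raise ValueError if the values fall outside the documented ranges."""
    if time_interval < 0:
        raise ValueError("TimeInterval must be greater than or equal to 0")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported Language: {language!r}")


def frames_sampled(duration_s: float, time_interval: float, fps: float) -> int:
    """Estimate how many frames OCR examines in a clip.

    Assumes sampling starts at t = 0 and repeats every `time_interval`
    seconds; an interval of 0 means every frame is sampled.
    """
    if time_interval == 0:
        return math.floor(duration_s * fps)
    return math.ceil(duration_s / time_interval)


validate_preset(1.5, "English")
print(frames_sampled(60, 1.5, 30))  # one frame every 1.5 s
print(frames_sampled(60, 0, 30))    # every frame at 30 fps
```

Under these assumptions, a one-minute clip at 30 frames per second yields 40 sampled frames with TimeInterval 1.5 versus 1,800 with the default of 0, a 45-fold reduction in frames processed. For slides or slowly changing overlays, a nonzero interval is usually a reasonable trade-off.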
Getting started
To request access to the Video OCR private preview, sign up on the Azure website.
Once you have been granted access to the private preview, you can submit jobs with the following task configuration and media processor name:
Note: This blog post is out of date. For the most accurate information, see the documentation page for Azure Media OCR or a more recent documentation page.
Task configuration |
Media processor name | “Azure Media OCR”
To learn more about Azure Media Analytics, check out the introductory blog post.
If you have questions about any of the Media Analytics products, send an email to amsanalytics@microsoft.com.