Note: The following blog post describes a component of Azure Media Analytics. To learn more and get started, read the introductory blog post on Azure Media Analytics.

Video OCR is now in Public Preview.  Read the follow-up blog to learn more here.

Video OCR

OCR (Optical Character Recognition) is the conversion of visual text from video into editable, searchable digital text. Video OCR detects text content in video files and generates text files for your use. This allows you to automate the extraction of meaningful metadata from the video signal of your media. 

When used in conjunction with a search engine, you can easily index your media by text and enhance the discoverability of your content. This is extremely useful for highly textual video, such as a recording or screen capture of a slideshow presentation. The Azure OCR Media Processor is optimized for digital text.

Today we are proud to announce a private preview of the Azure OCR Media Processor as part of Azure Media Analytics.

Sample video from Oceaneering

The engineering company Oceaneering routinely surveys and maps submarine oil and gas operations. As a result, they often have large volumes of video data with “burned-in” metadata overlaid on the video stream, such as GPS coordinates, depth, and other meaningful tags. To efficiently index and organize their massive video library, Oceaneering engineers need to extract these text overlays, and Azure Media OCR gives them the ability to process their video content in the cloud at enterprise scale.

Let’s take a look at a real-world sample video from Oceaneering:

Note, in particular, the text overlays in the uppermost portion of the video frame. This section contains valuable data tagging and labeling the video and its contents. Using the Azure Media OCR Media Processor, we are able to extract this text-based information from the video file. You can download the full XML output, or see the truncated results below, generated with default settings for the first frame (timestamp 0:00) of the sample video above:

Truncated output (the XML groups detected text into regions, lines, and words):

    Region: MD-13  Mad Dog  cp -148
        Line: MD-13 (words: MD-13)
        Line: Mad Dog (words: Mad, Dog)
        Line: cp -148 (words: cp, -148)
    Region: Drilling  33
        Line: Drilling (words: Drilling)
        Line: 33 (words: 33)
    Region: Riser
        Line: Riser (words: Riser)
    Region: 8/12/2012  59
        Line: 8/12/2012 (words: 8/12/2012)
        Line: 59 (words: 59)
    Region: E 2528185.  79  N 9875573.57
        Line: E 2528185. (words: E, 2528185.)
        Line: 79 (words: 79)
        Line: N 9875573.57 (words: N, 9875573.57)
    Region: H 2.13  D 2245.90
        Line: H 2.13 (words: H, 2.13)
        Line: D 2245.90 (words: D, 2245.90)

With this XML file, Oceaneering is able to automate the tagging of their multimedia without investing in the rearchitecture required to pass this overlay data through their post-processing pipeline. They can easily index videos (or segments of videos) that represent views at a certain depth or in a certain geographical vicinity based on the overlaid position data on their camera feeds.
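As a sketch of how such output can be consumed downstream, the following Python snippet flattens a region/line/word hierarchy into one searchable string per region. The element names (`region`, `line`, `word`) and the inline sample are illustrative assumptions, not the Media Processor's documented schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical OCR output: regions contain lines, lines contain words.
# Element and attribute names are illustrative, not the documented schema.
sample = """
<ocr>
  <region text="MD-13  Mad Dog  cp -148">
    <line text="MD-13"><word text="MD-13"/></line>
    <line text="Mad Dog"><word text="Mad"/><word text="Dog"/></line>
    <line text="cp -148"><word text="cp"/><word text="-148"/></line>
  </region>
  <region text="Drilling  33">
    <line text="Drilling"><word text="Drilling"/></line>
    <line text="33"><word text="33"/></line>
  </region>
</ocr>
"""

def extract_regions(xml_text):
    """Return one whitespace-normalized string per detected region."""
    root = ET.fromstring(xml_text)
    regions = []
    for region in root.iter("region"):
        words = [w.get("text") for w in region.iter("word")]
        regions.append(" ".join(words))
    return regions

print(extract_regions(sample))
# ['MD-13 Mad Dog cp -148', 'Drilling 33']
```

Feeding strings like these into a search index is what lets a video library be queried by depth, coordinates, or any other burned-in tag.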

Sample enterprise video

Oceaneering, however, represents a niche scenario for Video OCR. A more common use case would be the extraction of text data from PowerPoint slides in a recorded lecture.

Check out the following clip of an Azure Media Services presentation at //Build.

From this, we were able to extract all of the text (except for the //Build logo): 

Digital media landscape is always changing
Huge capital investment required Delivering video is hard, expensive, time - consuming, with a need for high scale and high availability especially hard and costly as both audience sizes and content libraries grow and shrink and grow again.
Delivering video is hard, expensive, time - consuming, with a need for
high scale and high availability especially hard and costly as both
audience sizes and content libraries grow and shrink and grow
Video is the new currency Audiences of all kinds are changing and demanding content on their own devices, wherever they are.That isn 't easy: So many different device profiles and different delivery technologies.
Audiences of all kinds are changing and demanding content
on their own devices, wherever they are.That isn 't easy: So
many different device profiles and different delivery
Azure Media Services Microsoft's cloud platform enables on demand and live streaming video solutions for consumer and enterprise scenarios.
Microsoft's cloud platform enables on
demand and live streaming video solutions
for consumer and enterprise scenarios.

This shows the power of Video OCR in processing slideshow presentations. Ideally every enterprise or educational institution should be able to index the valuable data contained in their presentation recordings, easily enabling search and discovery.
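One simple way to make such recordings searchable, once per-frame OCR text has been extracted, is an inverted index from words to timestamps. This is a minimal sketch with made-up frame data, not part of the product:

```python
from collections import defaultdict

def build_index(frames):
    """Map each lower-cased word to the set of timestamps (seconds) where it appears."""
    index = defaultdict(set)
    for timestamp, text in frames:
        for word in text.lower().split():
            index[word].add(timestamp)
    return index

# Hypothetical per-frame OCR text, as might be sampled from a recorded deck.
frames = [
    (0.0, "Digital media landscape is always changing"),
    (30.0, "Video is the new currency"),
    (60.0, "Azure Media Services"),
]

index = build_index(frames)
print(sorted(index["media"]))  # timestamps where "media" appears: [0.0, 60.0]
```

A query for a word then jumps straight to the matching segments of the recording.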

The configuration preset for this private preview version of Video OCR includes the following settings:

TimeInterval: A value greater than or equal to 0. Specifies the sampling frequency for OCR; for example, a value of 1.5 samples one frame every 1.5 seconds. Default is 0 (samples every frame).

Language: One of the following strings: Arabic, Chinese Simplified, Chinese Traditional, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian Cyrillic, Serbian Latin, Slovak, Spanish, Swedish, and Turkish.
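TimeInterval directly controls how many frames are analyzed, and therefore processing cost. As a rough illustration (a hypothetical helper, not part of any API), the sampling it describes can be computed like this:

```python
def sample_times(duration_s, interval_s, fps=30.0):
    """Return the timestamps (in seconds) that OCR would sample.

    interval_s == 0 means every frame, matching the default above.
    """
    step = (1.0 / fps) if interval_s == 0 else interval_s
    times, t = [], 0.0
    while t < duration_s:
        times.append(round(t, 3))
        t += step
    return times

print(len(sample_times(10, 1.5)))  # 7 samples in a 10-second clip
```

Raising TimeInterval from 0 to even one second cuts the number of analyzed frames by roughly the frame rate, which matters for long recordings.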

Getting started

To gain access to our Private Preview of OCR, sign up on the Azure website here.

Once you are granted access to the private preview, you can submit jobs with the following task configuration and Media Processor name:

Note: This blog post is out of date. For the most accurate information, check out the documentation page for Azure Media OCR or a more recent documentation page.

Task configuration:
Media Processor name: “Azure Media OCR”
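A task configuration exposing the two settings described above might look like the following sketch. The element and key names here are assumptions, since the preview's exact preset format is not reproduced in this post; see the documentation pages linked above for the authoritative format:

```xml
<!-- Illustrative sketch only; not the documented preset schema. -->
<configuration version="1.0">
  <settings>
    <add key="Language" value="English" />
    <add key="TimeInterval" value="1.5" />
  </settings>
</configuration>
```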

 


To learn more about Azure Media Analytics, check out the introductory blog post.

If you have any questions about any of the Media Analytics products, send an email to amsanalytics@microsoft.com.
