
Azure Media OCR Simplified Output

Today we are releasing our new output format, a simpler schema that will cover most end-user scenarios with less headache. In case you were one of the advanced users who found value in the additional data from our previous output format, you can simply use the “AdvancedOutput” flag.

Not sure what Azure Media OCR is? Check out the introductory blog post on Azure Media OCR.

Thanks to all of the customers and partners who have been part of the Azure Media OCR public and private previews. We have continued to iterate on valuable first-hand feedback over the past five months, and today we are tackling one of the most common pain points we have heard: the complexity of the output format.

It turned out that, for most customers, our default output format simply provided too much detail, which led to a lot of frustration.

Most customers use Azure Media OCR to index videos by the text displayed within them at various times. In conjunction with Azure Media Indexer, this creates an excellent alternative to manual video tagging for making your video content more discoverable in a search engine.

By providing positional data for every single word (in addition to the phrases and regions), we were needlessly inflating the output with little to no additional value. 


Advanced Output

The advanced output format is a JSON object made of time fragments, each of which contains separate events composed of regions, lines, and words, all tagged with positional and language data.

The new “simple” output format contains only time fragments, each of which holds the text detected in it.

Here’s an example of one fragment in the simple output format:

New output

"fragments": [
    {
      "start": 0,
      "duration": 1435434,
      "interval": 1435434,
      "events": [
        [
          {
            "language": "English",
            "text": "Notes File WOF MYM Edit Format View oo Window Help index-html - MvWebSite Q Search Visual Studio Code for Mac Developers Today June 1, 2016, 1:07 PM Visual Studio Code for Mac Developers 211. Google Chrome extensions Sergii Baidachnyi Principal Technical Evangelist Microsoft Canada sbaydach@microsoft.com @sbaidachni"
          }
        ]
      ]
    },
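
For the indexing scenario described earlier, a short Python sketch shows how little code the simple format needs. It assumes the "fragments" array sits inside a top-level JSON object (the sample above shows only the array), shortens the sample text for brevity, and uses a hypothetical helper name, index_entries. The start and duration values are passed through unchanged, since their units are not covered here.

```python
import json

# Sample mirroring the fragment above; the text is shortened for brevity and
# the enclosing top-level object is an assumption.
SAMPLE = """
{
  "fragments": [
    {
      "start": 0,
      "duration": 1435434,
      "interval": 1435434,
      "events": [
        [
          {"language": "English", "text": "Notes File WOF MYM Edit Format View"}
        ]
      ]
    }
  ]
}
"""


def index_entries(ocr_output):
    """Yield (start, duration, text) for every text entry in the simple output."""
    for fragment in ocr_output.get("fragments", []):
        # "events" is a list of lists; each inner list holds the text entries.
        for event in fragment.get("events", []):
            for entry in event:
                text = entry.get("text", "").strip()
                if text:
                    yield fragment["start"], fragment["duration"], text


entries = list(index_entries(json.loads(SAMPLE)))
for start, duration, text in entries:
    print(start, duration, text)
```

Each tuple can then be handed to whatever search index you use, keyed by the fragment's start time.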

 

The following is a heavily truncated “advanced output” equivalent of the same fragment.

Note: the actual advanced output that corresponds to the above fragment is over 600 lines of JSON!

Old output

"fragments": [
    {
      "start": 0,
      "duration": 120000,
      "interval": 120000,
      "events": [
        [
          {
            "region": {
              "language": "English",
              "orientation": "Up",
              "lines": [
                {
                  "text": "Notes File",
                  "left": 74,
                  "top": 7,
                  "width": 109,
                  "height": 15,
                  "word": [
                    {
                      "text": "Notes",
                      "left": 74,
                      "top": 8,
                      "width": 54,
                      "height": 14,
                      "confidence": 974
                    },
                    {
                      "text": "File",
                      "left": 154,
                      "top": 7,
                      "width": 29,
                      "height": 15,
                      "confidence": 848
                    }
                  ]
                },
                {
                  "text": "WOF",
                  "left": 155,
                  "top": 117,
                  "width": 33,
                  "height": 12,
                  "word": [
                    {
                      "text": "WOF",
                      "left": 155,
                      "top": 117,
                      "width": 33,
                      "height": 12,
                      "confidence": 397
                    }
                  ]
                },
                {
                  "text": "MYM",
                  "left": 156,
                  "top": 206,
                  "width": 32,
                  "height": 12,
                  "word": [
                    {
                      "text": "MYM",
                      "left": 156,
                      "top": 206,
                      "width": 32,
                      "height": 12,
                      "confidence": 309
                    }
                  ]
                }
              ]
            }
          },
          {
            "region": {
...

As you can see, unless you need all of the detail, a lot of the advanced output features may be redundant for your scenario.
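
To make the relationship between the two formats concrete, here is a rough Python sketch that collapses an advanced-output fragment into the simple shape by joining word text per region. This is only an illustration built on the field names in the sample above, not how the service itself produces the simple output. The flatten_fragment helper and its min_confidence parameter are hypothetical, and the threshold value is arbitrary because the confidence scale is not described here.

```python
# Advanced-output fragment trimmed from the example above (positional fields
# omitted; only the keys this sketch reads are kept).
ADVANCED_FRAGMENT = {
    "start": 0,
    "duration": 120000,
    "interval": 120000,
    "events": [
        [
            {
                "region": {
                    "language": "English",
                    "orientation": "Up",
                    "lines": [
                        {"text": "Notes File", "word": [
                            {"text": "Notes", "confidence": 974},
                            {"text": "File", "confidence": 848},
                        ]},
                        {"text": "WOF", "word": [{"text": "WOF", "confidence": 397}]},
                        {"text": "MYM", "word": [{"text": "MYM", "confidence": 309}]},
                    ],
                }
            }
        ]
    ],
}


def flatten_fragment(fragment, min_confidence=0):
    """Collapse regions, lines, and words into one text string per region."""
    simple_events = []
    for event in fragment.get("events", []):
        simple_event = []
        for item in event:
            region = item["region"]
            # Keep only words at or above the (hypothetical) confidence cutoff.
            words = [
                word["text"]
                for line in region.get("lines", [])
                for word in line.get("word", [])
                if word.get("confidence", 0) >= min_confidence
            ]
            simple_event.append(
                {"language": region["language"], "text": " ".join(words)}
            )
        simple_events.append(simple_event)
    return {
        "start": fragment["start"],
        "duration": fragment["duration"],
        "interval": fragment["interval"],
        "events": simple_events,
    }


flattened = flatten_fragment(ADVANCED_FRAGMENT)
filtered = flatten_fragment(ADVANCED_FRAGMENT, min_confidence=500)
print(flattened["events"][0][0]["text"])
print(filtered["events"][0][0]["text"])
```

Per-word confidence filtering of this kind is one example of a scenario where the advanced output is still worth requesting.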

How do I use this?

Minimal preset for Old Output

{
  "Version": "1.0",
  "Options": {
    "AdvancedOutput": "true"
  }
}

Minimal preset for New Output

{
  "Version": "1.0"
}
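
The two presets differ only in the Options block, so if you generate them in code, a small helper is enough. The sketch below is in Python; build_preset is a hypothetical name, and how you submit the resulting string to a job depends on your SDK and is not shown. Note that json.dumps emits standard double-quoted JSON.

```python
import json


def build_preset(advanced_output=False):
    """Return the minimal OCR preset as a JSON string."""
    preset = {"Version": "1.0"}
    if advanced_output:
        # The flag is passed as the string "true", matching the preset above.
        preset["Options"] = {"AdvancedOutput": "true"}
    return json.dumps(preset, indent=2)


print(build_preset())                      # preset for the new, simple output
print(build_preset(advanced_output=True))  # preset for the old, advanced output
```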

Want to learn more about the input configuration? Check out our previous blog post introducing the configuration.

Love the new output? Hate it? Share your feedback with us!

If you want to learn more about this product, and the scenarios that it enables, read the introductory blog post on Azure Media OCR.

To learn more about Azure Media Analytics, check out the introductory blog post.

If you have any questions about any of the Media Analytics products, send an email to amsanalytics@microsoft.com.