Microsoft Cognitive Services – General availability for Face API, Computer Vision API and Content Moderator

Published April 19, 2017

This post was authored by the Cognitive Services Team.

Microsoft Cognitive Services enables developers to create the next generation of applications that can see, hear, speak, understand, and interpret needs using natural methods of communication. We have made adding intelligent features to your platforms easier.

Today, at the first-ever Microsoft Data Amp online event, we’re excited to announce the general availability of Face API, Computer Vision API and Content Moderator API from Microsoft Cognitive Services.

  • Face API detects human faces and compares similar ones, organizes people into groups according to visual similarity, and identifies previously tagged people and their emotions in images.
  • Computer Vision API gives you the tools to understand the contents of any image. It creates tags that identify objects, living beings such as celebrities, or actions in an image, and crafts coherent sentences to describe it. You can now recognize landmarks and handwriting in images; handwriting recognition remains in preview.
  • Content Moderator provides machine-assisted moderation of text and images, augmented with human review tools. Video moderation is available in preview as part of Azure Media Services.

Let’s take a closer look at what these APIs can do for you.

Anna presents the latest updates to Cognitive Services.

Bring vision to your app

Previously, users of Face API could obtain attributes such as age, gender, facial points, and head pose. Now, emotions can also be returned in the same Face API call. This addresses scenarios in which users need both age and emotion for the same face. Learn more about Face API in our guides.
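As an illustration of requesting emotion together with age, gender and head pose in one call, here is a minimal sketch. It assumes the Face API v1.0 `detect` endpoint with a `returnFaceAttributes` query parameter, and assumes the JSON response is an array of faces in which each face has a `faceAttributes.emotion` object mapping emotion names to scores. The class `FaceAttributesSketch`, its method names and the `westus` region are hypothetical; check the Face API reference for the exact attribute names available to you. Parsing uses `System.Text.Json` (.NET Core 3.0 or later).

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper class; names are illustrative, not part of any SDK.
public static class FaceAttributesSketch
{
    // Builds a detect URI that asks for several face attributes in one call.
    // The endpoint path and the returnFaceAttributes parameter are assumptions
    // based on Face API v1.0.
    public static string BuildDetectUri(string region, IEnumerable<string> attributes)
    {
        return $"https://{region}.api.cognitive.microsoft.com/face/v1.0/detect" +
               "?returnFaceAttributes=" + string.Join(",", attributes);
    }

    // Returns the highest-scoring emotion of the first face in a detect
    // response (assumed to be a JSON array of faces), or null if no face
    // was returned.
    public static string DominantEmotion(string detectResponseJson)
    {
        using (JsonDocument doc = JsonDocument.Parse(detectResponseJson))
        {
            JsonElement faces = doc.RootElement;
            if (faces.GetArrayLength() == 0)
            {
                return null;
            }

            JsonElement emotion = faces[0].GetProperty("faceAttributes").GetProperty("emotion");
            string best = null;
            double bestScore = double.MinValue;
            foreach (JsonProperty score in emotion.EnumerateObject())
            {
                // Each property maps an emotion name to its score.
                double value = score.Value.GetDouble();
                if (value > bestScore)
                {
                    bestScore = value;
                    best = score.Name;
                }
            }
            return best;
        }
    }
}
```

For example, `BuildDetectUri("westus", new[] { "age", "gender", "headPose", "emotion" })` yields a URI ending in `returnFaceAttributes=age,gender,headPose,emotion`; the body returned by POSTing an image to it could then be passed to `DominantEmotion`.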

Recognizing landmarks

We’ve enriched Computer Vision API with landmark recognition. The landmark model, like celebrity recognition, is an example of a Domain Specific Model. Our landmark recognition model recognizes 9,000 natural and man-made landmarks from around the world. Domain Specific Models are a continuously evolving feature of Computer Vision API.
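Because the landmark and celebrity models share the same request shape, switching between them only changes the model name. The hypothetical helper below builds the same URI shape used in the landmark sample in this post (model name in the path and in a `model=` query parameter) and accepts only the two model names mentioned here; `DomainModelUri` and the `westus` region are illustrative assumptions.

```csharp
using System;

// Hypothetical helper for the two Domain Specific Models named in this post.
public static class DomainModelUri
{
    private static readonly string[] KnownModels = { "landmarks", "celebrities" };

    // Builds the analyze URI with the model name in both the path and the
    // "model" query parameter, mirroring the landmark sample.
    public static string Build(string region, string model)
    {
        if (Array.IndexOf(KnownModels, model) < 0)
        {
            throw new ArgumentException($"Unsupported model '{model}'.", nameof(model));
        }
        return $"https://{region}.api.cognitive.microsoft.com/vision/v1.0/models/{model}/analyze?model={model}";
    }
}
```

Keeping the model name in one place avoids the easy mistake of changing it in the path but not in the query string.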

Let’s say I want my app to recognize this picture I took while traveling:

Landmark image

You may already have an idea where this photo was taken, but how would a machine know?

In C#, you can use these capabilities with a simple REST API call like the following. Samples in other languages are listed at the end of this post.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

namespace CSHttpClientSample
{
    static class Program
    {
        static async Task Main()
        {
            Console.Write("Enter image file path: ");
            string imageFilePath = Console.ReadLine();

            // Await the request so the program does not move on before the response arrives.
            await MakeAnalysisRequest(imageFilePath);

            Console.WriteLine("\n\nHit ENTER to exit...\n");
            Console.ReadLine();
        }

        static async Task MakeAnalysisRequest(string imageFilePath)
        {
            using (var client = new HttpClient())
            {
                // Request headers. Replace the second parameter with a valid subscription key.
                client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "putyourkeyhere");

                // Request parameters. To use the Celebrities model, change "landmarks" to
                // "celebrities" in both requestParameters and uri.
                string requestParameters = "model=landmarks";
                string uri = "https://westus.api.cognitive.microsoft.com/vision/v1.0/models/landmarks/analyze?" + requestParameters;
                Console.WriteLine(uri);

                // Request body. Try this sample with a locally stored JPEG image.
                byte[] byteData = File.ReadAllBytes(imageFilePath);

                using (var content = new ByteArrayContent(byteData))
                {
                    // This example uses content type "application/octet-stream".
                    // The other content types you can use are "application/json" and "multipart/form-data".
                    content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
                    HttpResponseMessage response = await client.PostAsync(uri, content);
                    string contentString = await response.Content.ReadAsStringAsync();
                    Console.WriteLine("Response:\n");
                    Console.WriteLine(contentString);
                }
            }
        }
    }
}
```

A successful response, returned as JSON, looks like this:

```json
{
  "requestId": "b15f13a4-77d9-4fab-a701-7ad65bcdcaed",
  "metadata": {
    "width": 1024,
    "height": 680,
    "format": "Jpeg"
  },
  "result": {
    "landmarks": [
      {
        "name": "Colosseum",
        "confidence": 0.9448209
      }
    ]
  }
}
```
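To use the result in an app rather than just print it, you can read `result.landmarks` from that JSON. A minimal sketch, assuming .NET Core 3.0 or later for `System.Text.Json`; the class `LandmarkResult` and its `Parse` method are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper; the names are illustrative, not part of any SDK.
public static class LandmarkResult
{
    // Reads result.landmarks from an analyze response and returns each
    // landmark's name and confidence. Returns an empty list if the
    // response has no result.landmarks array.
    public static List<(string Name, double Confidence)> Parse(string json)
    {
        var landmarks = new List<(string Name, double Confidence)>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            if (doc.RootElement.TryGetProperty("result", out JsonElement result) &&
                result.TryGetProperty("landmarks", out JsonElement items))
            {
                foreach (JsonElement item in items.EnumerateArray())
                {
                    landmarks.Add((item.GetProperty("name").GetString(),
                                   item.GetProperty("confidence").GetDouble()));
                }
            }
        }
        return landmarks;
    }
}
```

With the response above, `Parse` returns a single entry, Colosseum with confidence 0.9448209. An app might then keep only entries above a confidence threshold of its own choosing.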

Recognizing handwriting

Handwriting OCR is also available in Computer Vision API, in preview. This feature detects text in a handwritten image and extracts the recognized characters into a machine-usable character stream. It detects and extracts handwritten text from notes, letters, essays, whiteboards, forms, and more, and works with different surfaces and backgrounds such as white paper, sticky notes, and whiteboards. There’s no need to transcribe handwritten notes anymore: snap an image instead and use Handwriting OCR to digitize them, saving time, effort, and paper clutter. Once digitized, your notes are also easy to search when you want to pull them up again.

You can try this out yourself by uploading a sample image to the interactive demo.

Let’s say that I want to recognize the handwriting on this whiteboard:

Whiteboard image

An inspirational quote I’d like to keep.

In C#, I would use the following:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

namespace CSHttpClientSample
{
    static class Program
    {
        static async Task Main()
        {
            Console.Write("Enter image file path: ");
            string imageFilePath = Console.ReadLine();

            // Await the call so the program does not move on before the operation completes.
            await ReadHandwrittenText(imageFilePath);

            Console.WriteLine("\n\n\nHit ENTER to exit...");
            Console.ReadLine();
        }

        static async Task ReadHandwrittenText(string imageFilePath)
        {
            using (var client = new HttpClient())
            {
                // Request headers - replace this example key with your valid subscription key.
                client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "putyourkeyhere");

                // Request parameters and URI. Set "handwriting" to false for printed text.
                string requestParameter = "handwriting=true";
                string uri = "https://westus.api.cognitive.microsoft.com/vision/v1.0/recognizeText?" + requestParameter;

                string operationLocation = null;

                // Request body. Try this sample with a locally stored JPEG image.
                byte[] byteData = File.ReadAllBytes(imageFilePath);

                using (var content = new ByteArrayContent(byteData))
                {
                    // This example uses content type "application/octet-stream".
                    // You can also use "application/json" and specify an image URL.
                    content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");

                    try
                    {
                        HttpResponseMessage response = await client.PostAsync(uri, content);

                        // On success, the Operation-Location header holds the URI where you can get
                        // the text recognition result. If it is missing, print the error body instead.
                        if (response.Headers.TryGetValues("Operation-Location", out IEnumerable<string> values))
                        {
                            operationLocation = values.First();
                            Console.WriteLine(operationLocation);
                        }
                        else
                        {
                            Console.WriteLine(await response.Content.ReadAsStringAsync());
                        }
                    }
                    catch (Exception e)
                    {
                        Console.WriteLine(e.Message);
                    }
                }

                // Nothing to query if the POST did not return an operation URI.
                if (operationLocation == null)
                {
                    return;
                }

                try
                {
                    // Note: The response may not be immediately available. Handwriting recognition is an
                    // async operation that can take a variable amount of time depending on the length
                    // of the text you want to recognize. You may need to wait or retry this operation.
                    HttpResponseMessage result = await client.GetAsync(operationLocation);

                    // And now you can see the response in JSON:
                    Console.WriteLine(await result.Content.ReadAsStringAsync());
                }
                catch (Exception e)
                {
                    Console.WriteLine(e.Message);
                }
            }
        }
    }
}
```
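As the comment in the sample notes, the result may not be ready on the first GET. One way to handle this is to poll the Operation-Location URI until its `status` field reaches a final value. The sketch below assumes `"Succeeded"` (as in the response shown next) and `"Failed"` are the terminal states and treats any other value as still in progress; `OperationPoller` and the `fetchStatusJson` delegate are hypothetical names. In the sample above, the delegate could wrap `client.GetStringAsync(operationLocation)`.

```csharp
using System;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical polling helper; not part of any Cognitive Services SDK.
public static class OperationPoller
{
    // Calls fetchStatusJson until the returned JSON has a terminal "status"
    // ("Succeeded" or "Failed", an assumption), waiting `delay` between
    // attempts. Throws TimeoutException once maxAttempts is exhausted.
    public static async Task<string> WaitForResultAsync(
        Func<Task<string>> fetchStatusJson, int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            string body = await fetchStatusJson();
            string status = null;
            using (JsonDocument doc = JsonDocument.Parse(body))
            {
                if (doc.RootElement.TryGetProperty("status", out JsonElement s))
                {
                    status = s.GetString();
                }
            }

            if (status == "Succeeded" || status == "Failed")
            {
                return body;
            }

            // Not finished yet: wait before the next attempt (skip after the last one).
            if (attempt < maxAttempts)
            {
                await Task.Delay(delay);
            }
        }

        throw new TimeoutException($"Operation still not finished after {maxAttempts} attempts.");
    }
}
```

For example, `await OperationPoller.WaitForResultAsync(() => client.GetStringAsync(operationLocation), 10, TimeSpan.FromSeconds(1))` would try up to ten times, one second apart. Bounding the attempts keeps a stuck operation from hanging the app.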

On success, the returned OCR result contains the recognized text and a bounding box for each line and word, as in the following JSON:

```json
{
  "status": "Succeeded",
  "recognitionResult": {
    "lines": [
      {
        "boundingBox": [542, 724, 1404, 722, 1406, 819, 544, 820],
        "text": "You must be the change",
        "words": [
          { "boundingBox": [535, 725, 678, 721, 698, 841, 555, 845], "text": "You" },
          { "boundingBox": [713, 720, 886, 715, 906, 835, 734, 840], "text": "must" },
          { "boundingBox": [891, 715, 982, 713, 1002, 833, 911, 835], "text": "be" },
          { "boundingBox": [1002, 712, 1129, 708, 1149, 829, 1022, 832], "text": "the" },
          { "boundingBox": [1159, 708, 1427, 700, 1448, 820, 1179, 828], "text": "change" }
        ]
      },
      {
        "boundingBox": [667, 905, 1766, 868, 1771, 976, 672, 1015],
        "text": "you want to see in the world !",
        "words": [
          { "boundingBox": [665, 901, 758, 899, 768, 1015, 675, 1017], "text": "you" },
          { "boundingBox": [752, 900, 941, 896, 951, 1012, 762, 1015], "text": "want" },
          { "boundingBox": [960, 896, 1058, 895, 1068, 1010, 970, 1012], "text": "to" },
          { "boundingBox": [1077, 894, 1227, 892, 1237, 1007, 1087, 1010], "text": "see" },
          { "boundingBox": [1253, 891, 1338, 890, 1348, 1006, 1263, 1007], "text": "in" },
          { "boundingBox": [1344, 890, 1488, 887, 1498, 1003, 1354, 1005], "text": "the" },
          { "boundingBox": [1494, 887, 1755, 883, 1765, 999, 1504, 1003], "text": "world" },
          { "boundingBox": [1735, 883, 1813, 882, 1823, 998, 1745, 999], "text": "!" }
        ]
      }
    ]
  }
}
```
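To turn that result into something an app can display or search, you can walk `recognitionResult.lines`. The minimal sketch below returns each line’s text with an axis-aligned rectangle; it assumes each `boundingBox` holds four x,y corner points in the order x1, y1, x2, y2, x3, y3, x4, y4, which matches the numbers above. `HandwritingResult` and `Lines` are hypothetical names, and `System.Text.Json` requires .NET Core 3.0 or later.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical helper; names are illustrative only.
public static class HandwritingResult
{
    // Returns each recognized line's text together with an axis-aligned
    // rectangle (left, top, width, height) computed from its boundingBox.
    public static List<(string Text, int Left, int Top, int Width, int Height)> Lines(string json)
    {
        var lines = new List<(string Text, int Left, int Top, int Width, int Height)>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            JsonElement lineArray = doc.RootElement
                .GetProperty("recognitionResult")
                .GetProperty("lines");

            foreach (JsonElement line in lineArray.EnumerateArray())
            {
                int[] box = line.GetProperty("boundingBox")
                    .EnumerateArray()
                    .Select(v => v.GetInt32())
                    .ToArray();

                // Even indices are x coordinates, odd indices are y coordinates.
                int[] xs = box.Where((_, i) => i % 2 == 0).ToArray();
                int[] ys = box.Where((_, i) => i % 2 == 1).ToArray();

                int left = xs.Min();
                int top = ys.Min();
                lines.Add((line.GetProperty("text").GetString(), left, top, xs.Max() - left, ys.Max() - top));
            }
        }
        return lines;
    }
}
```

Joining the `Text` values of the response above with spaces gives the full quote, "You must be the change you want to see in the world !", which is the kind of string you could index to search your notes later.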

To easily get started in your preferred language, please refer to the following:

For more on how customers use these APIs, take a look at our customer stories, including a great example of our Vision APIs in use at GrayMeta.

Happy coding!