Computer Vision API

Extract rich information from images to categorise and process visual data, and use machine-assisted moderation of images to help curate your services.

Analyse an image

This feature returns information about visual content found in an image. Use tagging, descriptions and domain-specific models to identify content and label it with confidence. Apply the adult/racy settings to enable automated restriction of adult content. Identify image types and colour schemes in pictures.

See it in action

Feature name: Value
Description: { "tags": [ "water", "swimming", "sport", "pool", "person", "man", "frisbee", "ocean", "blue", "bird", "riding", "top", "standing", "wave", "young", "body", "large", "game", "glass", "pond", "playing", "board", "catch", "clear", "boat", "white" ], "captions": [ { "text": "a man swimming in a pool of water", "confidence": 0.8909298 } ] }
Tags: [ { "name": "water", "confidence": 0.9997857 }, { "name": "swimming", "confidence": 0.955619633 }, { "name": "sport", "confidence": 0.953807831 }, { "name": "pool", "confidence": 0.9515978 }, { "name": "person", "confidence": 0.889862537 }, { "name": "water sport", "confidence": 0.664259 } ]
Image format: "Jpeg"
Image dimensions: 462 x 600
Clip art type: 0
Line drawing type: 0
Black and white: false
Adult content: false
Adult score: 0.07518345
Racy: false
Racy score: 0.1814024
Categories: [ { "name": "people_swimming", "score": 0.98046875 } ]
Faces: [ { "age": 36, "gender": "Male", "faceRectangle": { "top": 133, "left": 298, "width": 121, "height": 121 } } ]
Dominant color background: "White"
Dominant color foreground: "Grey"
Accent color: #19A4B2
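
One way to act on output like the demo result above is to keep only the tags the service is most sure of, and to compare the adult and racy scores against limits of your own. The sketch below is illustrative only: the function names and the 0.9 and 0.5 thresholds are assumptions chosen for this example, not values defined by the API.

```python
# Tag list and scores copied from the demo output above.
TAGS = [
    {"name": "water", "confidence": 0.9997857},
    {"name": "swimming", "confidence": 0.955619633},
    {"name": "sport", "confidence": 0.953807831},
    {"name": "pool", "confidence": 0.9515978},
    {"name": "person", "confidence": 0.889862537},
    {"name": "water sport", "confidence": 0.664259},
]
ADULT_SCORE = 0.07518345
RACY_SCORE = 0.1814024


def confident_tags(tags, min_confidence=0.9):
    """Return tag names whose confidence meets the (hypothetical) cut-off."""
    return [tag["name"] for tag in tags if tag["confidence"] >= min_confidence]


def should_restrict(adult_score, racy_score, adult_limit=0.5, racy_limit=0.5):
    """Flag an image when either score reaches its (hypothetical) limit."""
    return adult_score >= adult_limit or racy_score >= racy_limit


print(confident_tags(TAGS))                      # ['water', 'swimming', 'sport', 'pool']
print(should_restrict(ADULT_SCORE, RACY_SCORE))  # False
```

Lowering `min_confidence` admits weaker tags such as "water sport"; where to set the limits depends on how strict your service needs to be.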

Read text in images

Optical character recognition (OCR) detects text in an image and extracts the recognised words into a machine-readable character stream. Analyse images to detect embedded text, generate character streams and enable searching. Take photos of text instead of copying to save time and effort.

See it in action

Preview:

IF WE DID
ALL
THE THINGS
WE ARE
CAPABLÉ•
OF DOING,
WE WOULD
LITERALLY
ASTOUND
QURSELV*S.

JSON:
{
  "textAngle": 0.0,
  "orientation": "NotDetected",
  "language": "en",
  "regions": [
    {
      "boundingBox": "316,47,284,340",
      "lines": [
        {
          "boundingBox": "319,47,182,24",
          "words": [
            {
              "boundingBox": "319,47,42,24",
              "text": "IF"
            },
            {
              "boundingBox": "375,47,44,24",
              "text": "WE"
            },
            {
              "boundingBox": "435,47,66,23",
              "text": "DID"
            }
          ]
        },
        {
          "boundingBox": "316,74,204,69",
          "words": [
            {
              "boundingBox": "316,74,204,69",
              "text": "ALL"
            }
          ]
        },
        {
          "boundingBox": "318,147,207,24",
          "words": [
            {
              "boundingBox": "318,147,63,24",
              "text": "THE"
            },
            {
              "boundingBox": "397,147,128,24",
              "text": "THINGS"
            }
          ]
        },
        {
          "boundingBox": "316,176,125,23",
          "words": [
            {
              "boundingBox": "316,176,44,23",
              "text": "WE"
            },
            {
              "boundingBox": "375,176,66,23",
              "text": "ARE"
            }
          ]
        },
        {
          "boundingBox": "319,194,281,44",
          "words": [
            {
              "boundingBox": "319,194,281,44",
              "text": "CAPABLÉ•"
            }
          ]
        },
        {
          "boundingBox": "318,243,181,29",
          "words": [
            {
              "boundingBox": "318,243,43,23",
              "text": "OF"
            },
            {
              "boundingBox": "376,243,123,29",
              "text": "DOING,"
            }
          ]
        },
        {
          "boundingBox": "316,271,170,24",
          "words": [
            {
              "boundingBox": "316,272,44,23",
              "text": "WE"
            },
            {
              "boundingBox": "375,271,111,24",
              "text": "WOULD"
            }
          ]
        },
        {
          "boundingBox": "317,300,200,24",
          "words": [
            {
              "boundingBox": "317,300,200,24",
              "text": "LITERALLY"
            }
          ]
        },
        {
          "boundingBox": "316,328,157,24",
          "words": [
            {
              "boundingBox": "316,328,157,24",
              "text": "ASTOUND"
            }
          ]
        },
        {
          "boundingBox": "318,357,214,30",
          "words": [
            {
              "boundingBox": "318,357,214,30",
              "text": "QURSELV*S."
            }
          ]
        }
      ]
    }
  ]
}
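
The response above nests words inside lines inside regions, so turning it back into readable text is a matter of walking that structure. The following is a minimal sketch rather than official client code: the helper names are hypothetical, the sample is trimmed to the first two lines of the response, and reading each `boundingBox` string as left, top, width, height is an assumption that happens to fit the numbers shown.

```python
# Trimmed copy of the OCR response above (first two lines only).
SAMPLE = {
    "language": "en",
    "regions": [
        {
            "boundingBox": "316,47,284,340",
            "lines": [
                {
                    "boundingBox": "319,47,182,24",
                    "words": [
                        {"boundingBox": "319,47,42,24", "text": "IF"},
                        {"boundingBox": "375,47,44,24", "text": "WE"},
                        {"boundingBox": "435,47,66,23", "text": "DID"},
                    ],
                },
                {
                    "boundingBox": "316,74,204,69",
                    "words": [{"boundingBox": "316,74,204,69", "text": "ALL"}],
                },
            ],
        }
    ],
}


def parse_box(box):
    """Split a "a,b,c,d" string into four integers.

    Assumption: the order is left, top, width, height.
    """
    left, top, width, height = (int(value) for value in box.split(","))
    return left, top, width, height


def lines_of_text(result):
    """Join the words of every line, region by region, in response order."""
    lines = []
    for region in result.get("regions", []):
        for line in region.get("lines", []):
            lines.append(" ".join(word["text"] for word in line.get("words", [])))
    return lines


print(lines_of_text(SAMPLE))      # ['IF WE DID', 'ALL']
print(parse_box("319,47,42,24"))  # (319, 47, 42, 24)
```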

By uploading data for this demo, you agree that Microsoft may store it and use it to improve Microsoft services, including this API. To help protect your privacy, we take steps to de-identify your data and keep it secure. We shall not publish your data or let other people use it.

Preview: Read handwritten text from images

This technology (handwritten OCR) allows you to detect and extract handwritten text from notes, letters, essays, whiteboards, forms and similar sources. It works with different surfaces and backgrounds, such as white paper, yellow sticky notes and whiteboards.

Handwritten text recognition saves time and effort and can make you more productive by allowing you to take images of text, rather than having to transcribe it. It makes it possible to digitise notes, which then allows you to implement quick and easy search. It also reduces paper clutter.

Note: this technology is currently in preview and is only available for English text.

To try this optical character recognition demo, upload a locally stored image or provide an image URL. We do not store the images you supply for this demo unless you give us permission.

See it in action

Preview:

OUR greatest glory is not
i never failing ,
but in rising every
time we fall

JSON:
{
  "status": "Succeeded",
  "succeeded": true,
  "failed": false,
  "finished": true,
  "recognitionResult": {
    "lines": [
      {
        "boundingBox": [
          67,
          204,
          668,
          210,
          667,
          272,
          66,
          267
        ],
        "text": "OUR greatest glory is not",
        "words": [
          {
            "boundingBox": [
              69,
              206,
              159,
              205,
              155,
              274,
              65,
              275
            ],
            "text": "OUR"
          },
          {
            "boundingBox": [
              192,
              205,
              350,
              204,
              346,
              273,
              188,
              274
            ],
            "text": "greatest"
          },
          {
            "boundingBox": [
              393,
              204,
              509,
              203,
              505,
              272,
              389,
              273
            ],
            "text": "glory"
          },
          {
            "boundingBox": [
              539,
              203,
              588,
              203,
              584,
              272,
              534,
              272
            ],
            "text": "is"
          },
          {
            "boundingBox": [
              601,
              202,
              680,
              202,
              676,
              271,
              597,
              271
            ],
            "text": "not"
          }
        ]
      },
      {
        "boundingBox": [
          540,
          289,
          900,
          302,
          897,
          374,
          538,
          360
        ],
        "text": "i never failing ,",
        "words": [
          {
            "boundingBox": [
              534,
              300,
              558,
              300,
              568,
              376,
              545,
              376
            ],
            "text": "i"
          },
          {
            "boundingBox": [
              589,
              300,
              694,
              300,
              705,
              376,
              600,
              376
            ],
            "text": "never"
          },
          {
            "boundingBox": [
              720,
              300,
              874,
              300,
              885,
              376,
              731,
              376
            ],
            "text": "failing"
          },
          {
            "boundingBox": [
              877,
              300,
              905,
              300,
              916,
              376,
              888,
              376
            ],
            "text": ","
          }
        ]
      },
      {
        "boundingBox": [
          139,
          416,
          572,
          433,
          570,
          491,
          136,
          474
        ],
        "text": "but in rising every",
        "words": [
          {
            "boundingBox": [
              145,
              418,
              215,
              418,
              202,
              491,
              132,
              491
            ],
            "text": "but"
          },
          {
            "boundingBox": [
              227,
              418,
              275,
              418,
              262,
              491,
              214,
              491
            ],
            "text": "in"
          },
          {
            "boundingBox": [
              308,
              418,
              428,
              419,
              415,
              492,
              295,
              491
            ],
            "text": "rising"
          },
          {
            "boundingBox": [
              476,
              419,
              581,
              419,
              568,
              492,
              463,
              492
            ],
            "text": "every"
          }
        ]
      },
      {
        "boundingBox": [
          622,
          413,
          967,
          410,
          968,
          470,
          623,
          472
        ],
        "text": "time we fall",
        "words": [
          {
            "boundingBox": [
              627,
              408,
              722,
              409,
              713,
              470,
              618,
              468
            ],
            "text": "time"
          },
          {
            "boundingBox": [
              765,
              409,
              828,
              410,
              818,
              471,
              756,
              470
            ],
            "text": "we"
          },
          {
            "boundingBox": [
              873,
              410,
              976,
              412,
              967,
              472,
              864,
              471
            ],
            "text": "fall"
          }
        ]
      }
    ]
  }
}
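
Unlike the printed-text response, the handwriting response above reports a status and gives each bounding box as eight numbers. The sketch below shows one way to read it, under two assumptions that the page does not state: the eight numbers are four x, y corner pairs, and a result should only be used once `status` is "Succeeded". The function names are hypothetical and the sample is trimmed to two lines without their word lists.

```python
# Trimmed copy of the handwriting response above (two lines, no word lists).
SAMPLE = {
    "status": "Succeeded",
    "recognitionResult": {
        "lines": [
            {
                "boundingBox": [67, 204, 668, 210, 667, 272, 66, 267],
                "text": "OUR greatest glory is not",
            },
            {
                "boundingBox": [622, 413, 967, 410, 968, 470, 623, 472],
                "text": "time we fall",
            },
        ]
    },
}


def polygon_to_rect(bounding_box):
    """Return the axis-aligned (left, top, width, height) around a polygon.

    Assumption: the list alternates x and y values for four corners.
    """
    xs = bounding_box[0::2]
    ys = bounding_box[1::2]
    left, top = min(xs), min(ys)
    return left, top, max(xs) - left, max(ys) - top


def recognised_lines(response):
    """Return the text of each line, or an empty list if not yet succeeded."""
    if response.get("status") != "Succeeded":
        return []
    return [line["text"] for line in response["recognitionResult"]["lines"]]


print(recognised_lines(SAMPLE))
# ['OUR greatest glory is not', 'time we fall']

first_line = SAMPLE["recognitionResult"]["lines"][0]
print(polygon_to_rect(first_line["boundingBox"]))  # (66, 204, 602, 68)
```

Because handwriting is rarely level, the enclosing rectangle is slightly larger than the slanted polygon; keep the original corner points if you need to draw exact outlines.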

Recognise celebrities and landmarks

The celebrity and landmark models are examples of domain-specific models. Our celebrity recognition model recognises 200K celebrities from business, politics, sports and entertainment. Our landmark recognition model recognises 9000 natural and man-made landmarks from around the world. Domain-specific models are a continuously evolving feature of the Computer Vision API.

See it in action

{
  "categories": [
    {
      "name": "people_",
      "score": 0.86328125,
      "detail": {
        "celebrities": [
          {
            "name": "Satya Nadella",
            "faceRectangle": {
              "left": 239,
              "top": 293,
              "width": 138,
              "height": 138
            },
            "confidence": 0.9999974
          }
        ],
        "landmarks": null
      }
    }
  ],
  "adult": null,
  "tags": [
    {
      "name": "person",
      "confidence": 0.99956613779067993
    },
    {
      "name": "suit",
      "confidence": 0.98934584856033325
    },
    {
      "name": "man",
      "confidence": 0.98844343423843384
    },
    {
      "name": "outdoor",
      "confidence": 0.860062301158905
    }
  ],
  "description": {
    "tags": [
      "person",
      "suit",
      "man",
      "necktie",
      "outdoor",
      "building",
      "clothing",
      "standing",
      "wearing",
      "business",
      "looking",
      "holding",
      "black",
      "front",
      "hand",
      "dressed",
      "phone",
      "field"
    ],
    "captions": [
      {
        "text": "Satya Nadella wearing a suit and tie",
        "confidence": 0.99033389849736619
      }
    ]
  },
  "requestId": "7497e066-2fc9-4e45-8e3f-ab5f1a3183fe",
  "metadata": {
    "width": 600,
    "height": 900,
    "format": "Jpeg"
  },
  "faces": [
    {
      "age": 49,
      "gender": "Male",
      "faceRectangle": {
        "left": 239,
        "top": 293,
        "width": 138,
        "height": 138
      }
    }
  ],
  "color": {
    "dominantColorForeground": "Black",
    "dominantColorBackground": "Black",
    "dominantColors": [
      "Black",
      "Grey"
    ],
    "accentColor": "7B5E50",
    "isBWImg": false
  },
  "imageType": {
    "clipArtType": 0,
    "lineDrawingType": 0
  }
}
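
In the response above, celebrity matches sit inside the `detail` object of each category, and a model with nothing to report comes back as `null` (here, `landmarks`). A small sketch of pulling the matches out while tolerating those nulls is shown below; the function name and the 0.5 confidence cut-off are assumptions made for the example, and the sample is trimmed to the fields the function reads.

```python
# Trimmed copy of the response above, keeping only the category details.
SAMPLE = {
    "categories": [
        {
            "name": "people_",
            "score": 0.86328125,
            "detail": {
                "celebrities": [
                    {
                        "name": "Satya Nadella",
                        "faceRectangle": {
                            "left": 239, "top": 293, "width": 138, "height": 138
                        },
                        "confidence": 0.9999974,
                    }
                ],
                "landmarks": None,  # JSON null in the original response
            },
        }
    ],
}


def celebrities(analysis, min_confidence=0.5):
    """Collect (name, confidence) pairs from every category's detail block.

    `or` fallbacks guard against keys that are missing or set to null.
    """
    found = []
    for category in analysis.get("categories") or []:
        detail = category.get("detail") or {}
        for match in detail.get("celebrities") or []:
            if match["confidence"] >= min_confidence:
                found.append((match["name"], match["confidence"]))
    return found


print(celebrities(SAMPLE))  # [('Satya Nadella', 0.9999974)]
```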

Analyse video in near real-time

Use any of the Computer Vision APIs with your video files by extracting frames of the video on your device and then sending those frames to the API calls of your choice. Get results from your videos faster.

Use our sample on GitHub to get started and build your own app.
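
Sending every frame is rarely necessary, so a client usually picks a subset of frames to analyse. The sketch below is not taken from the GitHub sample; it is a hypothetical helper that only decides which frame indices to send, given the video's frame rate and how many analyses per second you want. Decoding the frames and calling the API are left out.

```python
def frames_to_send(total_frames, video_fps, analyses_per_second):
    """Return evenly spaced frame indices to submit for analysis.

    All three parameters are assumptions of this sketch: the frame count and
    frame rate come from your video reader, and the analysis rate is whatever
    your quota and latency budget allow.
    """
    if video_fps <= 0 or analyses_per_second <= 0:
        raise ValueError("frame rate and analysis rate must be positive")
    # Never skip by less than one frame, even if the requested rate
    # is higher than the video's own frame rate.
    step = max(1, round(video_fps / analyses_per_second))
    return list(range(0, total_frames, step))


# Three seconds of 30 fps video, analysed twice per second.
print(frames_to_send(90, 30, 2))  # [0, 15, 30, 45, 60, 75]
```

A lower analysis rate reduces the number of API calls at the cost of coarser results over time.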

Generate a thumbnail

Generate a high-quality, storage-efficient thumbnail based on any input image. Use thumbnail generation to modify images to best suit your needs for size, shape and style. Apply smart cropping to generate thumbnails that differ from the aspect ratio of your original image, yet preserve the region of interest.
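
To make the idea of cropping around a region of interest concrete, the sketch below computes a crop rectangle with the thumbnail's aspect ratio, centred on a given region and kept inside the image. It illustrates the geometry only and is not the service's smart-cropping algorithm, which this page does not describe; the function name, its parameters and the example numbers are all assumptions.

```python
def crop_for_aspect(image_w, image_h, roi, target_w, target_h):
    """Return (left, top, width, height) of the largest crop that has the
    target aspect ratio, centred on the region of interest where possible.

    `roi` is a hypothetical (left, top, width, height) rectangle, for
    example a detected face.
    """
    target_ratio = target_w / target_h
    if image_w / image_h > target_ratio:
        # Image is wider than the target: use full height, trim the sides.
        crop_h = image_h
        crop_w = round(image_h * target_ratio)
    else:
        # Image is taller than the target: use full width, trim top/bottom.
        crop_w = image_w
        crop_h = round(image_w / target_ratio)

    roi_left, roi_top, roi_w, roi_h = roi
    centre_x = roi_left + roi_w / 2
    centre_y = roi_top + roi_h / 2

    # Centre the crop on the region, then clamp it to the image bounds.
    left = min(max(round(centre_x - crop_w / 2), 0), image_w - crop_w)
    top = min(max(round(centre_y - crop_h / 2), 0), image_h - crop_h)
    return left, top, crop_w, crop_h


# A 600 x 900 portrait with a region of interest near the upper middle,
# cropped for a square thumbnail.
print(crop_for_aspect(600, 900, (239, 293, 138, 138), 100, 100))
# (0, 62, 600, 600)
```

The resulting rectangle would then be scaled down to the requested thumbnail size by whatever imaging library you use.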

Explore the Cognitive Services APIs

Computer Vision API

Distill actionable information from images

Face API

Detect, identify, analyse, organise and tag faces in photos

Content Moderator

Automated image, text and video moderation

Emotion API PREVIEW

Personalise user experiences with emotion recognition

Video API PREVIEW

Intelligent video processing

Custom Vision Service PREVIEW

Easily customise your own state-of-the-art computer vision models for your unique use case

Video Indexer PREVIEW

Unlock video insights

Language Understanding Intelligent Service PREVIEW

Teach your apps to understand commands from your users

Text Analytics API

Easily evaluate sentiment and topics to understand what users want

Bing Spell Check API

Detect and correct spelling mistakes in your app

Translator Text API

Easily conduct machine translation with a simple REST API call

Web Language Model API PREVIEW

Use the power of predictive language models trained on web-scale data

Linguistic Analysis API PREVIEW

Simplify complex language concepts and parse text with the Linguistic Analysis API

Translator Speech API

Easily conduct real-time speech translation with a simple REST API call

Speaker Recognition API PREVIEW

Use speech to identify and authenticate individual speakers

Bing Speech API

Convert speech to text and back again to understand user intent

Custom Speech Service PREVIEW

Overcome speech recognition barriers like speaking style, background noise and vocabulary

Recommendations API PREVIEW

Predict and recommend items your customers want

Academic Knowledge API PREVIEW

Tap into the wealth of academic content in the Microsoft Academic Graph

Knowledge Exploration Service PREVIEW

Enable interactive search experiences over structured data via natural language inputs

QnA Maker API PREVIEW

Distill information into conversational, easy-to-navigate answers

Entity Linking Intelligence Service API PREVIEW

Power your app's data links with named entity recognition and disambiguation

Custom Decision Service PREVIEW

A cloud-based, contextual decision-making API that sharpens with experience

Project Prague

Gesture-based controls

Project Cuzco

Events associated with Wikipedia entries

Project Nanjing

Isochrone calculations

Project Abu Dhabi

Distance Matrix

Project Johannesburg

Route logistics

Project Wollongong

Location insights