Computer Vision

An AI service that analyzes content in images and video.

Extract rich information from images and video

Boost content discoverability, automate text extraction, analyze video in real time, and create products that more people can use by embedding vision capabilities in your apps. Use visual data processing to label content with objects and concepts, extract text, generate image descriptions, moderate content, and understand people’s movement in physical spaces. No machine learning expertise is required.

Text extraction (OCR)

Extract printed and handwritten text from images and documents with mixed languages and writing styles.

Image understanding

Pull from a rich ontology of more than 10,000 concepts and objects to generate value from your visual assets.

Spatial analysis

Analyze how people move in a space in real time.

Flexible deployment

Run Computer Vision in the cloud or on the edge, in containers.

Easily apply breakthrough computer vision

Add leading-edge computer vision technology to your own apps with a simple API call.
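
As a rough illustration of what such a call might look like, the Python sketch below builds (but does not send) an HTTP request for image analysis using only the standard library. The `/vision/v3.2/analyze` path, the `visualFeatures` query parameter, and the `Ocp-Apim-Subscription-Key` header are assumptions based on the v3.2 REST API and may differ for your API version. The endpoint, key, and image URL are placeholders, and `build_analyze_request` is a hypothetical helper name.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_analyze_request(endpoint, key, image_url, features):
    """Build a POST request asking the service to analyze an image URL.

    Assumed (not stated on this page): the v3.2 path, the visualFeatures
    query parameter, and the subscription-key header name.
    """
    query = urlencode({"visualFeatures": ",".join(features)})
    url = f"{endpoint.rstrip('/')}/vision/v3.2/analyze?{query}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return Request(url, data=body, headers=headers, method="POST")


# Placeholder values; replace with your own resource endpoint and key.
request = build_analyze_request(
    "https://example-resource.cognitiveservices.azure.com/",
    "<your-key>",
    "https://example.com/subway.jpg",
    ["Objects", "Tags", "Description"],
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. The response body is a JSON document that can be read with `json.load`.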

See it in action

Feature name: Value
Objects: [ { "rectangle": { "x": 93, "y": 178, "w": 115, "h": 237 }, "object": "person", "confidence": 0.764 }, { "rectangle": { "x": 0, "y": 229, "w": 101, "h": 206 }, "object": "person", "confidence": 0.624 }, { "rectangle": { "x": 161, "y": 31, "w": 439, "h": 423 }, "object": "subway train", "parent": { "object": "train", "parent": { "object": "Land vehicle", "parent": { "object": "Vehicle", "confidence": 0.926 }, "confidence": 0.923 }, "confidence": 0.917 }, "confidence": 0.801 } ]
Tags: [ { "name": "train", "confidence": 0.9974923 }, { "name": "platform", "confidence": 0.9955777 }, { "name": "station", "confidence": 0.979665935 }, { "name": "indoor", "confidence": 0.9272351 }, { "name": "subway", "confidence": 0.838868737 }, { "name": "clothing", "confidence": 0.5561282 }, { "name": "person", "confidence": 0.505803 }, { "name": "pulling", "confidence": 0.431911945 } ]
Description: { "tags": [ "train", "platform", "station", "building", "indoor", "subway", "track", "walking", "waiting", "pulling", "board", "people", "man", "luggage", "standing", "holding", "large", "woman", "suitcase" ], "captions": [ { "text": "people waiting at a train station", "confidence": 0.833144546 } ] }
Image format: "Jpeg"
Image dimensions: 462 x 600
Black and white: false
Adult content: false
Adult score: 0.009112834
Gory: false
Gore score: 0.046150554
Racy: false
Racy score: 0.0143244695
Categories: [ { "name": "trans_trainstation", "score": 0.98828125 } ]
Faces: []
Dominant color background: "Black"
Dominant color foreground: "Black"
Accent color: #484C83
{
  "categories": [
    {
      "name": "trans_trainstation",
      "score": 0.98828125
    }
  ],
  "adult": {
    "adultScore": 0.009112834,
    "goreScore": 0.046150554,
    "racyScore": 0.0143244695
  },
  "tags": [
    {
      "name": "train",
      "confidence": 0.9974923
    },
    {
      "name": "platform",
      "confidence": 0.9955777
    },
    {
      "name": "station",
      "confidence": 0.979665935
    },
    {
      "name": "indoor",
      "confidence": 0.9272351
    },
    {
      "name": "subway",
      "confidence": 0.838868737
    },
    {
      "name": "clothing",
      "confidence": 0.5561282
    },
    {
      "name": "person",
      "confidence": 0.505803
    },
    {
      "name": "pulling",
      "confidence": 0.431911945
    }
  ],
  "description": {
    "tags": [
      "train",
      "platform",
      "station",
      "building",
      "indoor",
      "subway",
      "track",
      "walking",
      "waiting",
      "pulling",
      "board",
      "people",
      "man",
      "luggage",
      "standing",
      "holding",
      "large",
      "woman",
      "suitcase"
    ],
    "captions": [
      {
        "text": "people waiting at a train station",
        "confidence": 0.833144546
      }
    ]
  },
  "requestId": "d37ba051-3c54-409d-8eff-95eccb2ba71d",
  "metadata": {
    "width": 600,
    "height": 462,
    "format": "Jpeg"
  },
  "faces": [],
  "color": {
    "dominantColorForeground": "Black",
    "dominantColorBackground": "Black",
    "accentColor": "484C83"
  },
  "objects": [
    {
      "rectangle": {
        "x": 93,
        "y": 178,
        "w": 115,
        "h": 237
      },
      "object": "person",
      "confidence": 0.764
    },
    {
      "rectangle": {
        "x": 0,
        "y": 229,
        "w": 101,
        "h": 206
      },
      "object": "person",
      "confidence": 0.624
    },
    {
      "rectangle": {
        "x": 161,
        "y": 31,
        "w": 439,
        "h": 423
      },
      "object": "subway train",
      "parent": {
        "object": "train",
        "parent": {
          "object": "Land vehicle",
          "parent": {
            "object": "Vehicle",
            "confidence": 0.926
          },
          "confidence": 0.923
        },
        "confidence": 0.917
      },
      "confidence": 0.801
    }
  ]
}
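
One way to consume a response like the one above, sketched in Python. The field names come from the sample; the helper names (`confident_tags`, `lineage`, `corners`) and the 0.8 threshold are illustrative choices, not part of the service. Only a subset of the sample response is embedded to keep the example short, and `corners` assumes that `x` and `y` mark the top-left corner of each rectangle.

```python
import json

# A subset of the sample response shown above.
SAMPLE = json.loads("""
{
  "tags": [
    {"name": "train", "confidence": 0.9974923},
    {"name": "platform", "confidence": 0.9955777},
    {"name": "clothing", "confidence": 0.5561282},
    {"name": "pulling", "confidence": 0.431911945}
  ],
  "objects": [
    {
      "rectangle": {"x": 93, "y": 178, "w": 115, "h": 237},
      "object": "person",
      "confidence": 0.764
    },
    {
      "rectangle": {"x": 161, "y": 31, "w": 439, "h": 423},
      "object": "subway train",
      "parent": {
        "object": "train",
        "parent": {
          "object": "Land vehicle",
          "parent": {"object": "Vehicle", "confidence": 0.926},
          "confidence": 0.923
        },
        "confidence": 0.917
      },
      "confidence": 0.801
    }
  ]
}
""")


def confident_tags(response, threshold):
    """Return tag names whose confidence is at least `threshold`."""
    return [
        tag["name"]
        for tag in response.get("tags", [])
        if tag["confidence"] >= threshold
    ]


def lineage(detected):
    """Walk the nested "parent" chain from most to least specific label."""
    labels = []
    node = detected
    while node is not None:
        labels.append(node["object"])
        node = node.get("parent")
    return labels


def corners(rectangle):
    """Convert x/y/w/h into (left, top, right, bottom), assuming a top-left origin."""
    left, top = rectangle["x"], rectangle["y"]
    return (left, top, left + rectangle["w"], top + rectangle["h"])


print(confident_tags(SAMPLE, 0.8))
for obj in SAMPLE["objects"]:
    print(" > ".join(lineage(obj)), corners(obj["rectangle"]))
```

The threshold is a trade-off you would tune per application: a higher value yields fewer, more reliable tags.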

Transform your processes

Automatically identify more than 10,000 objects and concepts in your images. Extract printed and handwritten text from multiple image and document types, leveraging support for multiple languages and mixed writing styles. Apply these Computer Vision features to streamline processes, such as robotic process automation and digital asset management.

Maximize the value of your organization’s physical space

Understand how people move in a physical space, whether it’s an office or a store. Create apps that can count people in a room, trace paths, understand dwell times in front of a retail display, and determine wait times in queues. Use these features to build solutions that enable occupancy management and social distancing, optimize in-store and office layouts, and accelerate the checkout process. Run the service across multiple cameras and sites.

Learn more about this capability

Deploy anywhere, from the cloud to the edge

Run Computer Vision in the cloud or on-premises with containers. Apply it to diverse scenarios where data security and low latency are paramount, such as examining images of healthcare records, extracting text from secure documents, or analyzing how people move through a store.

Learn about Computer Vision in containers

Build on industry-leading Azure security

  • Microsoft invests more than USD 1 billion annually in cybersecurity research and development.

  • We employ more than 3,500 security experts completely dedicated to your data security and privacy.

  • Azure has more certifications than any other cloud provider. View the comprehensive list.

World-class computer vision at competitive prices

Pay only for what you use, with no upfront costs. With Computer Vision, you pay as you go based on the number of transactions.

Get started with Computer Vision in 3 steps

  1. Get instant access and a $200 credit by signing up for your Azure free account.
  2. Sign in to the Azure portal and add Computer Vision.
  3. Learn how to embed Computer Vision with quickstarts and documentation.

Documentation and resources

Get started

Read our documentation

Take the Microsoft Learn courses

Explore our code samples

Check out our sample code

Frequently asked questions about Computer Vision

  • Computer Vision and other Cognitive Services offerings guarantee 99.9 percent availability. No SLA is provided for the Free pricing tier. See SLA details.
  • Your images and videos are automatically deleted after processing. Microsoft does not train on your data to enhance the underlying models. Video data does not leave your premises and is not stored on the edge gateway where the container runs. Learn more about privacy and terms of usage.
  • Yes, you can extract one-off images from video content. With “spatial analysis” you can analyze video streams at a high frame rate using cameras connected via Real Time Streaming Protocol (RTSP).
  • “Spatial analysis” only detects and locates human presence in video footage, and outputs a bounding box around each human body. The AI models do not detect faces or determine the identities or demographics of individuals.
  • The AI models detect and track movements in the video feed based on algorithms that identify the presence of one or more humans by a body bounding box. For each bounding box movement detected in a zone in the camera field of view, the AI models output event data including: the bounding box coordinates of the person’s body, the event type (e.g. zone entry or exit, directional line crossing), a pseudonymous identifier to track the bounding box, and a detection confidence score. This event data is sent to your own instance of Azure IoT Hub.
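
To illustrate how event data of this kind might be used, here is a minimal Python sketch that derives dwell times from zone entry and exit events. The event dictionaries and their field names (`person_id`, `type`, `timestamp`) are hypothetical stand-ins, not the service's actual schema, and the sample events are made up for the example. Real events would arrive through your Azure IoT Hub instance.

```python
from datetime import datetime


def dwell_times(events):
    """Return seconds spent in the zone per pseudonymous identifier.

    Each event is a dict with hypothetical keys: "person_id",
    "type" ("zone_entry" or "zone_exit"), and an ISO 8601 "timestamp".
    """
    entered = {}  # person_id -> time of the most recent unmatched entry
    totals = {}   # person_id -> accumulated seconds inside the zone
    parsed = [(datetime.fromisoformat(e["timestamp"]), e) for e in events]
    for when, event in sorted(parsed, key=lambda pair: pair[0]):
        pid = event["person_id"]
        if event["type"] == "zone_entry":
            entered[pid] = when
        elif event["type"] == "zone_exit" and pid in entered:
            seconds = (when - entered.pop(pid)).total_seconds()
            totals[pid] = totals.get(pid, 0.0) + seconds
    return totals


# Made-up events for two pseudonymous identifiers.
EVENTS = [
    {"person_id": "p1", "type": "zone_entry", "timestamp": "2024-01-01T09:00:00"},
    {"person_id": "p2", "type": "zone_entry", "timestamp": "2024-01-01T09:00:30"},
    {"person_id": "p1", "type": "zone_exit", "timestamp": "2024-01-01T09:02:00"},
    {"person_id": "p2", "type": "zone_exit", "timestamp": "2024-01-01T09:05:30"},
    {"person_id": "p1", "type": "zone_entry", "timestamp": "2024-01-01T09:10:00"},
    {"person_id": "p1", "type": "zone_exit", "timestamp": "2024-01-01T09:10:45"},
]

print(dwell_times(EVENTS))
```

Because the identifier is pseudonymous, the totals describe how long a tracked bounding box stayed in the zone, not who the person was. Exit events without a matching entry are ignored in this sketch.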

Ready when you are. Let’s set up your Azure free account.