Computer Vision

An AI service that analyzes content in images and video.

Extract rich information from images and video

Boost content discoverability, automate text extraction, analyze video in real time, and create products that more people can use by embedding cloud vision capabilities in your apps with Computer Vision, part of Azure Cognitive Services. Use visual data processing to label content with objects and concepts, extract text, generate image descriptions, moderate content, and understand people's movement in physical spaces. No machine learning expertise is required.

Text extraction (OCR)

Extract printed and handwritten text from images and documents with mixed languages and writing styles.

Image understanding

Pull from a rich ontology of more than 10,000 concepts and objects to generate value from your visual assets.

Spatial analysis

Analyze how people move in a space in real time to support occupancy counting, social distancing, and face mask detection.

Flexible deployment

Run Computer Vision in the cloud or on the edge, in containers.

Transform your processes

Automatically identify more than 10,000 objects and concepts in your images. Extract printed and handwritten text from multiple image and document types, leveraging support for multiple languages and mixed writing styles. Apply these Computer Vision features to streamline processes, such as robotic process automation and digital asset management.

Maximize the value of your organization’s physical space

Understand how people move in a physical space, whether it's an office or a store. Use the spatial analysis feature to create apps that can count people in a room, trace paths, understand dwell times in front of a retail display, and determine wait times in queues. Build solutions that support occupancy management, social distancing, and face mask compliance, optimize in-store and office layouts, and accelerate the checkout process. Run the service across multiple cameras and sites.
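
As a rough illustration of how an app might turn zone entry and exit events into dwell times, here is a minimal Python sketch. The event tuples, the "enter" and "exit" labels, and the `dwell_times` function are all hypothetical and chosen for this example; they do not reflect the service's actual output schema.

```python
from collections import defaultdict


def dwell_times(events):
    """Return the total seconds each tracked id spent inside a zone.

    `events` is an iterable of (timestamp_seconds, track_id, kind) tuples,
    where kind is "enter" or "exit". This shape is an assumption made for
    illustration only.
    """
    entered_at = {}              # track_id -> timestamp of the open visit
    totals = defaultdict(float)  # track_id -> accumulated seconds
    for ts, track_id, kind in sorted(events):
        if kind == "enter":
            entered_at[track_id] = ts
        elif kind == "exit" and track_id in entered_at:
            # Close the open visit and add its duration to the total.
            totals[track_id] += ts - entered_at.pop(track_id)
    return dict(totals)
```

An exit with no matching entry is ignored, and a visit that never closes contributes nothing, which keeps the totals conservative when events are dropped.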

Deploy anywhere, from the cloud to the edge

Run Computer Vision in the cloud or on-premises with containers. Apply it to diverse scenarios, like healthcare record image examination, text extraction of secure documents, or analysis of how people move through a store, where data security and low latency are paramount.

Fuel App Innovation with Cloud AI Services

Learn 5 key ways your organization can get started with AI to realize value quickly.

Comprehensive security and compliance, built in

  • Microsoft invests more than USD$1 billion annually in cybersecurity research and development.

  • We employ more than 3,500 security experts who are dedicated to data security and privacy.

  • Azure has more certifications than any other cloud provider. View the comprehensive list.

World-class computer vision at competitive prices

Pay only for what you use with no upfront costs. With Computer Vision, you pay as you go based on the number of transactions.

Get started with an Azure free account


Start free. Get USD$200 credit to use within 30 days. While you have your credit, get free amounts of many of our most popular services, plus free amounts of 55+ other services that are always free.


After your credit, move to pay as you go to keep building with the same free services. Pay only if you use more than your free monthly amounts.


After 12 months, you'll keep getting 55+ always-free services—and still pay only for what you use beyond your free monthly amounts.

Documentation and resources

Get started

See code samples

Explore a sample app

Frequently asked questions about Computer Vision

  • Computer Vision and other Azure Cognitive Services offerings guarantee 99.9 percent availability. No SLA is provided for the Free pricing tier. See SLA details.

  • No. Microsoft automatically deletes your images and videos after processing, and doesn’t train on your data to enhance the underlying models. Video data doesn’t leave your premises, and video data isn’t stored on the edge where the container runs. Learn more about privacy and terms of usage.

  • After using Computer Vision to extract text from images and video, you can use Text Analytics to analyze sentiment, Translator to translate text into your desired language, or Immersive Reader to read the text aloud, making it more accessible. Additional Computer Vision–related capabilities include Form Recognizer to extract key-value pairs and tables from documents, Face to detect and recognize faces in images, Custom Vision to easily build your own computer-vision model from scratch, and Content Moderator to detect unwanted text or images.

  • No. Spatial analysis detects and locates human presence in video footage and outputs a bounding box around each human body. The AI models don’t detect faces or determine individuals’ identities or demographics.

  • The spatial analysis AI models detect and track movements in the video feed based on algorithms that identify the presence of one or more humans by a body bounding box. For each bounding box movement detected in a zone in the camera field of view, the AI models output event data including bounding box coordinates of a person’s body, event type (for example, zone entry or exit, or directional line crossing), pseudonymous identifiers to track the bounding box, and a detection confidence score. This event data is sent to your own instance of Azure IoT Hub.
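
To make the event data described above more concrete, here is a minimal Python sketch of how a consuming app might model such events and derive a simple occupancy count. The `ZoneEvent` class, its field names, the "zone_entry" and "zone_exit" labels, and the 0.5 confidence threshold are assumptions for illustration only; the real event schema is defined in the service documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZoneEvent:
    """Hypothetical in-app representation of one spatial analysis event."""
    track_id: str      # pseudonymous identifier for the bounding box
    event_type: str    # assumed labels: "zone_entry" or "zone_exit"
    box: tuple         # assumed layout: (x, y, width, height)
    confidence: float  # detection confidence score


def occupancy(events, min_confidence=0.5):
    """Count ids currently inside the zone, skipping low-confidence events."""
    inside = set()
    for event in events:
        if event.confidence < min_confidence:
            continue
        if event.event_type == "zone_entry":
            inside.add(event.track_id)
        elif event.event_type == "zone_exit":
            inside.discard(event.track_id)
    return len(inside)
```

Events are assumed to arrive in time order. Using a set of pseudonymous ids means a repeated entry event for the same id is not counted twice.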

Start building with Cognitive Services

Try Computer Vision free