Azure Vision in Foundry Tools | Microsoft Azure

Overview

Enhance your apps with Azure Vision

Azure Vision delivers innovative computer vision capabilities. Empower your apps to analyze images, read text, and detect faces using prebuilt image tagging, optical character recognition (OCR), and responsible facial recognition. Easily integrate vision features into your projects with no machine learning expertise required.

Learn more

Elevate your computer vision projects

Boost content discoverability with image analysis

Automatically caption images with natural language, use smart crop, and classify images (in preview).

Try it with Foundry

Stream video in real time with spatial analysis

Track movement and analyze environments in real time using computer vision with image analysis and object detection.

Learn more

Read text from images with optical character recognition (OCR)

Extract printed and handwritten text from images with mixed languages and writing styles using OCR technology.

Learn more

Apply AI responsibly

Get clear guidance on how to use Azure Vision responsibly to meet your goals and achieve accurate results.

Review Microsoft responsible AI principles and documentation

Features

Analyze visual content in different ways with Azure Vision

Image analysis

Image analysis that pulls from more than 10,000 concepts and objects to detect, classify, caption, and generate insights.

Spatial analysis

Spatial analysis to understand people's presence and movements within physical areas in real time.

Optical character recognition (OCR)

Optical character recognition (OCR) to extract printed and handwritten text from images with varied languages and writing styles.

Facial recognition and liveness detection

Facial recognition and liveness detection to create intelligent applications that recognize and verify human identity.

Pricing

Azure Vision pricing

Pay for only what you use with no upfront costs. Azure Vision uses a pay-as-you-go consumption model based on the number of transactions. Learn more about pricing for Azure Vision and Face API.

See Azure Vision pricing

See Face API pricing

Azure products work better together

Azure Speech

Build generative AI apps faster using prebuilt or customizable speech AI models.

Learn more

Azure Content Understanding

Accelerate the transformation of multimodal data into insights.

Learn more

Azure Language

Build conversational interfaces, summarize documents, and analyze text using prebuilt AI-powered features.

Learn more

Foundry Tools

Discover Foundry Tools for fast, secure, and scalable AI integration across apps and agents

Learn more

Azure Document Intelligence

Accelerate information extraction from documents.

Learn more

Azure Vision

Extract text, detect objects, and analyze images with advanced vision AI.

Learn more

CUSTOMER STORIES

Trusted across industries, by companies of all sizes

From OCR to facial recognition, Azure Vision helps Prague Airport turn images into insights that drive action.

CATRION used Azure Vision to automate invoice validation by extracting and verifying data from PDFs and scans, cutting review time by two-thirds, reducing errors, and improving workflow accuracy.

Goodwill used Azure Vision to extract item details from photos, streamlining listings and boosting clothing sales by over 35%.

“Coaches look at these elements. They look at the compression of the body. They look at various dynamic factors. These machine learning models, by measuring angles between the joints of the body while performing surf maneuvers, can actually help coaches to provide feedback.”

Kevin Schulz, Aerial Phenom and Surfer, Team USA

See more customer stories

Resources

Documentation and resources

Azure Vision documentation

Learn how to analyze visual content in different ways with quickstarts, tutorials, and samples.

Explore the documentation

Microsoft Learn courses

Build your skills with step-by-step guidance.

Start learning

Quickstart: Image analysis

Get started with the Image Analysis REST API or client libraries to set up a basic image tagging script.

Get started
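
As a rough illustration of what a basic tagging script involves, here is a minimal sketch that uses only the Python standard library. It is not the quickstart's own code. It assumes the Image Analysis 4.0 REST route `computervision/imageanalysis:analyze`, API version `2024-02-01`, the `Ocp-Apim-Subscription-Key` header, and a `tagsResult.values` response shape; check each of these against the current documentation. The endpoint, key, and image URL are placeholders, and the helper names `build_tag_request` and `extract_tags` are hypothetical.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

# Assumed API version; confirm the current one in the documentation.
API_VERSION = "2024-02-01"


def build_tag_request(endpoint: str, key: str, image_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for image tags."""
    query = urllib.parse.urlencode({"api-version": API_VERSION, "features": "tags"})
    url = f"{endpoint.rstrip('/')}/computervision/imageanalysis:analyze?{query}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # assumed key-based auth header
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def extract_tags(payload: dict, min_confidence: float = 0.5) -> list[tuple[str, float]]:
    """Return (name, confidence) pairs at or above the threshold.

    Assumes the response nests tags under tagsResult.values.
    """
    values = payload.get("tagsResult", {}).get("values", [])
    return [(t["name"], t["confidence"]) for t in values if t["confidence"] >= min_confidence]


if __name__ == "__main__":
    request = build_tag_request(
        "https://example-resource.cognitiveservices.azure.com",  # placeholder endpoint
        "<your-key>",                                            # placeholder key
        "https://example.com/photo.jpg",                         # placeholder image
    )
    print(request.get_method(), request.full_url)
    # To call the service for real, send the request and parse the JSON reply:
    #   with urllib.request.urlopen(request) as response:
    #       print(extract_tags(json.load(response)))
```

Keeping request construction separate from response parsing lets you test both pieces offline before spending any transactions.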

Code samples

Explore what’s possible with Azure Vision.

Browse code samples

Transparency Note

Explore use cases for Azure AI Face service.

Learn more

FAQ

What is Azure Vision in Foundry Tools?

Azure Vision is a powerful tool within Foundry Tools (formerly Azure AI Services). Azure Vision provides a set of prebuilt APIs that enable applications and agents to visually interpret the world. Vision provides capabilities such as image analysis, object detection, spatial understanding, and optical character recognition (OCR). These tools allow developers to build intelligent solutions that can "see" and understand visual content in real time.

Vision is integrated into the unified Foundry platform to deliver advanced agentic AI experiences.

Azure Vision (formerly Azure AI Vision) is designed to accelerate the development of intelligent agents and applications without requiring deep machine learning expertise.

Azure AI Vision is now called Azure Vision in Foundry Tools. How does that change the service?

Yes, Azure Vision (formerly Azure AI Vision) is now part of the Foundry Tools suite. This rebranding is part of a broader platform unification under Foundry, designed to reflect how developers are increasingly using these services as modular tools to build intelligent, agentic applications.

Azure Vision in Foundry Tools continues to offer the same capabilities, such as image analysis, object detection, OCR, and spatial understanding, but is now positioned within a cohesive toolkit that supports agent workflows and multimodal AI scenarios. The rebrand helps clarify how Vision fits into the Foundry ecosystem, making it easier to discover, orchestrate, and integrate with other agents and tools.

This shift is about empowering developers with a unified experience for building AI agents that see, understand, and act.

Does Azure Vision in Foundry Tools store my images or videos or use them for product improvements?

No. Microsoft automatically deletes your images and videos after processing and does not train on your data to enhance the underlying models. Video data does not leave your premises, and video data is not stored on the edge where the container runs.

Learn more about privacy and terms

Does spatial analysis detect faces or a person’s identity?

No. Spatial analysis detects and locates human presence in video footage and outputs a bounding box around each person detected. The AI models do not detect faces or determine individuals’ identities or demographics.

How does Azure Vision in Foundry Tools analyze people in a physical space?

The spatial analysis AI models detect and track movements in the video feed based on algorithms that identify the presence of one or more humans by a body bounding box. For each person and bounding box detected in a zone in the camera field of view, the AI models output event data including bounding box coordinates of a person’s body, event type (for example, zone entry or exit, or directional line crossing), pseudonymous identifiers to track the bounding box, and a detection confidence score. This event data is sent to your own instance of Azure IoT Hub.
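
To make the shape of that event data concrete, here is a minimal sketch of how a consumer might parse and filter such events once they have been read from IoT Hub. The JSON layout and every field name (`events`, `type`, `trackingId`, `confidence`, `box`) are hypothetical stand-ins for the items listed above, not the service's actual schema, and the sample message is invented for illustration.

```python
from __future__ import annotations

import json
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonEvent:
    event_type: str                         # e.g. zone entry, zone exit, line crossing
    tracking_id: str                        # pseudonymous identifier for the bounding box
    confidence: float                       # detection confidence score
    box: tuple[float, float, float, float]  # x, y, width, height of the body bounding box


def parse_events(message: str, min_confidence: float = 0.5) -> list[PersonEvent]:
    """Parse one JSON message and keep events at or above the confidence threshold."""
    payload = json.loads(message)
    events = []
    for item in payload.get("events", []):
        if item["confidence"] < min_confidence:
            continue
        box = item["box"]
        events.append(
            PersonEvent(
                event_type=item["type"],
                tracking_id=item["trackingId"],
                confidence=item["confidence"],
                box=(box["x"], box["y"], box["width"], box["height"]),
            )
        )
    return events


def count_by_type(events: list[PersonEvent]) -> dict[str, int]:
    """Count the retained events per event type."""
    return dict(Counter(event.event_type for event in events))


# Invented sample message using the hypothetical field names above.
SAMPLE_MESSAGE = json.dumps({
    "events": [
        {"type": "zoneEntry", "trackingId": "a1", "confidence": 0.91,
         "box": {"x": 0.10, "y": 0.20, "width": 0.15, "height": 0.40}},
        {"type": "zoneExit", "trackingId": "b2", "confidence": 0.35,
         "box": {"x": 0.55, "y": 0.25, "width": 0.12, "height": 0.38}},
        {"type": "zoneEntry", "trackingId": "c3", "confidence": 0.78,
         "box": {"x": 0.70, "y": 0.30, "width": 0.14, "height": 0.42}},
    ]
})

if __name__ == "__main__":
    print(count_by_type(parse_events(SAMPLE_MESSAGE)))
```

Note that nothing in the record identifies a person: the tracking identifier only links bounding boxes across events.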

Do I need to use my own data for training my custom model on Azure Vision in Foundry Tools?

Yes. Because model customization is designed to be fine-tuned for your scenario, you need to provide labeled data to train your model.

How much data does Azure Vision in Foundry Tools need?

The model customization feature of the service is optimized to quickly recognize major differences between images, so you can start prototyping your model with a small amount of data. You may start with as few as one image per label and add more labeled images as you collect them. Depending on the complexity of the problem and the degree of accuracy required, you can continue adding images per label to improve your model.
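
Before a training run, it can help to confirm that every label has at least the minimum number of images. The following is a small sketch of such a check. It is independent of the service, the function names are hypothetical, and the default threshold of one image per label simply mirrors the starting point described above.

```python
from __future__ import annotations

from collections import Counter


def images_per_label(samples: list[tuple[str, str]]) -> Counter:
    """Count images per label from (image_path, label) pairs."""
    return Counter(label for _path, label in samples)


def labels_needing_images(
    samples: list[tuple[str, str]],
    expected_labels: set[str],
    minimum: int = 1,
) -> list[str]:
    """Return the expected labels that have fewer than `minimum` images, sorted."""
    counts = images_per_label(samples)
    return sorted(label for label in expected_labels if counts[label] < minimum)


if __name__ == "__main__":
    # Invented example data: two labeled classes, one class with no images yet.
    samples = [("img/001.jpg", "cat"), ("img/002.jpg", "cat"), ("img/003.jpg", "dog")]
    print(labels_needing_images(samples, {"cat", "dog", "bird"}))
```

Raising `minimum` later gives a quick list of the labels to prioritize when you add images to improve accuracy.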

How is the model customization feature different from Custom Vision?

The model customization feature for Azure Vision is the next generation of Custom Vision, with improved accuracy and few-shot learning capabilities. It is recommended that you migrate your training data to retrain your model with model customization in Azure Vision.

Choose the Azure account that’s right for you

Pay as you go or try Azure free for up to 30 days.

Get started with Azure

AI development tools

Design and manage AI applications

Create, customize, and scale AI apps and agents efficiently.

Explore Foundry

Business Solution Hub

Drive results with innovative cloud solutions

Browse the Microsoft Business Solutions Hub to find the products and solutions that can help your organization reach its goals.

Explore Microsoft solutions
