What is computer vision?

Learn what computer vision is, how computer vision works, and what computer vision is used for.

Computer vision

Computer vision is a field of computer science that focuses on enabling computers to identify and understand objects and people in images and videos. Like other types of AI, computer vision seeks to perform and automate tasks that replicate human capabilities. In this case, computer vision seeks to replicate both the way humans see and the way humans make sense of what they see.

The range of practical applications for computer vision technology makes it a central component of many modern innovations and solutions. Computer vision can be run in the cloud or on premises.

How computer vision works

Computer vision applications use input from sensing devices, artificial intelligence, machine learning, and deep learning to replicate the way the human vision system works. Computer vision applications run on algorithms that are trained on massive amounts of visual data or images in the cloud. They recognize patterns in this visual data and use those patterns to determine the content of other images.

How an image is analyzed with computer vision

  • A sensing device captures an image. The sensing device is often just a camera, but could be a video camera, medical imaging device, or any other type of device that captures an image for analysis.
  • The image is then sent to an interpreting device. The interpreting device uses pattern recognition to break the image down, compare the patterns in the image against its library of known patterns, and determine if any of the content in the image is a match. The pattern could be something general, like the appearance of a certain type of object, or it could be based on unique identifiers such as facial features.
  • A user requests specific information about an image, and the interpreting device provides the information requested based on its analysis of the image.
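The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than how any particular product works: the image is a tiny hard-coded binary grid standing in for a camera frame, and `KNOWN_PATTERNS`, `find_pattern`, and `analyze` are hypothetical names introduced here. Real systems match learned features instead of exact pixel templates.

```python
from typing import Dict, List, Optional, Tuple

Image = List[List[int]]  # binary pixels: 1 = foreground, 0 = background

# Hypothetical "library of known patterns" held by the interpreting device.
KNOWN_PATTERNS: Dict[str, Image] = {
    "cross": [[0, 1, 0],
              [1, 1, 1],
              [0, 1, 0]],
    "square": [[1, 1],
               [1, 1]],
}


def match_score(image: Image, pattern: Image, top: int, left: int) -> float:
    """Fraction of pattern pixels that equal the image pixels at this offset."""
    rows, cols = len(pattern), len(pattern[0])
    hits = sum(
        image[top + r][left + c] == pattern[r][c]
        for r in range(rows)
        for c in range(cols)
    )
    return hits / (rows * cols)


def find_pattern(image: Image, pattern: Image,
                 threshold: float = 1.0) -> Optional[Tuple[int, int]]:
    """Slide the pattern over the image; return the first (row, col) that matches."""
    rows, cols = len(pattern), len(pattern[0])
    for top in range(len(image) - rows + 1):
        for left in range(len(image[0]) - cols + 1):
            if match_score(image, pattern, top, left) >= threshold:
                return (top, left)
    return None


def analyze(image: Image) -> Dict[str, Tuple[int, int]]:
    """Step 2: compare the image against every known pattern."""
    found = {}
    for name, pattern in KNOWN_PATTERNS.items():
        location = find_pattern(image, pattern)
        if location is not None:
            found[name] = location
    return found


# Step 1: a "captured" image (hard-coded stand-in for a sensing device).
captured = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]

analysis = analyze(captured)

# Step 3: a user asks a specific question about the image.
print("Contains a cross:", "cross" in analysis)
print("Contains a square:", "square" in analysis)
```

Lowering `threshold` below 1.0 would accept partial matches, which is one simple way to tolerate noise in the captured image.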

Deep learning and computer vision

Modern computer vision applications are shifting away from statistical methods for analyzing images and increasingly relying on what is known as deep learning. With deep learning, a computer vision application runs on a type of algorithm called a neural network, which allows it to deliver more accurate analyses of images. In addition, a deep learning model learns from the images it is trained on, so its accuracy tends to improve as it is given more data.
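To make the idea of a neural network for images more concrete, here is a rough sketch of one building block: a single convolution filter followed by a ReLU activation. It assumes a convolutional network, a common choice for image data that the text above does not specify. The filter weights in `VERTICAL_EDGE` are hand-picked for illustration; in a real network they would be learned from training images.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map: Matrix) -> Matrix:
    """Keep positive responses and zero out the rest."""
    return [[max(0.0, value) for value in row] for row in feature_map]


# Illustrative, hand-set weights that respond to a dark-to-bright vertical edge.
VERTICAL_EDGE = [[-1.0, 1.0],
                 [-1.0, 1.0]]

# A 3x4 grayscale image: dark on the left, bright on the right.
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

features = relu(conv2d(image, VERTICAL_EDGE))
for row in features:
    print(row)
```

The feature map is nonzero only in the column where the image changes from dark to bright. A deep network stacks many such filters in layers, so later layers can respond to progressively more complex patterns.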

Computer vision capabilities

There are four main functions for how computer vision programs process images and return information:

The system classifies the objects in an image according to a defined category. For example, with object classification, a computer could distinguish people from objects in a photo and determine how many people appear in the photo.

The system identifies a particular object in a photo, video, or image. For example, with object identification, the system would be able to not only distinguish people in a photo, but also analyze their appearance to determine the identity or traits of those people.

The system analyzes a video to process the location of a moving object over time. For example, with object tracking, a parking lot surveillance camera could identify cars in a parking lot and provide information about the location and movements of those cars over time.
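Object tracking can be approximated with a simple idea: link each detection in a new frame to the nearest track from the previous frame. The sketch below assumes a detector has already produced the center point of every car in each frame. The function `track`, the `max_jump` parameter, and the sample coordinates are all hypothetical, and production trackers use far more robust matching.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def track(frames: List[List[Point]],
          max_jump: float = 5.0) -> Dict[int, List[Point]]:
    """Link per-frame detections into tracks by greedy nearest-neighbor matching."""
    tracks: Dict[int, List[Point]] = {}
    next_id = 0
    for detections in frames:
        unmatched = list(detections)
        # Try to extend each existing track with its closest new detection.
        for path in tracks.values():
            if not unmatched:
                break
            last = path[-1]
            nearest = min(unmatched, key=lambda p: math.dist(last, p))
            if math.dist(last, nearest) <= max_jump:
                path.append(nearest)
                unmatched.remove(nearest)
        # Any detection left over starts a new track.
        for point in unmatched:
            tracks[next_id] = [point]
            next_id += 1
    return tracks


# Hypothetical detections of two cars over three video frames.
frames = [
    [(0.0, 0.0), (10.0, 10.0)],
    [(1.0, 0.0), (10.0, 9.0)],
    [(2.0, 0.0), (10.0, 8.0)],
]

for car_id, path in track(frames).items():
    print(f"car {car_id}: {path}")
```

Each track is the list of positions one car occupied over time, which is the kind of location and movement information the parking lot example describes.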

With optical character recognition, the system identifies letters and numbers in images and converts that text into machine-encoded text that can be read by other computer applications or edited by users.

What computer vision is used for

Computer vision is a powerful capability and it can be combined with many types of applications and sensing devices to support a number of practical use cases. Here are just a few different types of computer vision applications:

Content organization

Computer vision can be used to identify people or objects in photos and organize them based on that identification. Photo recognition applications like this are commonly used in photo storage and social media applications.

Text extraction

Optical character recognition can be used to boost content discoverability for information contained in large amounts of text and to enable document processing for robotic process automation scenarios.

Augmented reality

Physical objects are detected and tracked in real time with computer vision. This information is then used to realistically place virtual objects in a physical environment.

Agriculture

Images of crops taken from satellites, drones, or planes can be analyzed to monitor harvests, detect weed emergence, or identify crop nutrient deficiency.

Autonomous vehicles

Self-driving cars use real-time object identification and tracking to gather information about what's happening around a car and route the car accordingly.

Healthcare

Photos or images captured by other medical devices can be analyzed to help doctors identify problems and make diagnoses more quickly and accurately.

Sports

Object detection and tracking are used for play and strategy analysis.

Manufacturing

Computer vision can monitor manufacturing machinery for maintenance purposes. It can also be used to monitor product quality and packaging on a production line.

Spatial analysis

The system identifies people or objects, such as cars, in a space and tracks their movement within that space.

Face recognition

Computer vision can be applied to identify individuals.

Browse Azure computer vision solutions

Discover Azure Cognitive Services—a comprehensive family of AI services and cognitive APIs that make it easier to build intelligent apps with computer vision capabilities.

Explore computer vision in Azure

Boost content discoverability, accelerate text extraction, and create products that more people can use by embedding vision capabilities in your apps.
