Making apps more human: how the Computer Vision API brings intelligence to image analysis

Posted on June 1, 2016

Senior Program Manager, Cognitive Services

When your goal is to democratize access to artificial intelligence, how do you give developers the ability to implement intelligent image analysis in their apps? You give them an API that can take an image apart, identify and tag more than 300 elements, and continue to learn and evolve as it's trained. That's the Computer Vision API: a machine-learning image analyzer that any developer can drop into an app with just a few lines of code. And it is the next step on a long journey to make apps behave, and respond, in a more human way.
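To make "a few lines of code" concrete, here is a minimal Python sketch of what preparing an analysis request might look like. It only builds the request and does not send it. The endpoint URL is a placeholder, the helper name `build_analyze_request` is hypothetical, and the `visualFeatures` parameter and `Ocp-Apim-Subscription-Key` header are written from memory of the service's REST conventions, so check them against the API documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request


def build_analyze_request(endpoint, api_key, image_url,
                          features=("Tags", "Description")):
    """Build (but do not send) a POST request asking for tags and a caption.

    `endpoint` is whatever analyze URL your subscription provides; the value
    used in examples here is a placeholder, not a real service address.
    """
    # Requested features go in the query string as a comma-separated list.
    query = urllib.parse.urlencode({"visualFeatures": ",".join(features)})
    # The image is referenced by URL in a small JSON body.
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{endpoint}?{query}",
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumed header name for the subscription key.
            "Ocp-Apim-Subscription-Key": api_key,
        },
        method="POST",
    )
```

With a real endpoint and key, passing the returned object to `urllib.request.urlopen` would send the request and return the JSON analysis.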

> Get a Computer Vision API key

Lions aren't dogs

The Computer Vision API was not Microsoft's first use of AI to analyze images. Its precursor was part of an earlier API suite and was more limited in scope. Microsoft developers, including one of Computer Vision's architects, Cornelia Carapcea, listened to the developers who used the original suite and used the community's feedback to decide how Computer Vision should evolve. For example, the original API didn't recognize lions; it thought they were dogs. So improvements needed to be made.

Carapcea and her team have spent the last year engineering the Computer Vision API to meet that demand. But her work on the API goes beyond tweaking what the machine can identify. Apps are becoming more human, she explains, and developers want ways to make their apps behave more like people do. That requires AI and machine learning as the backbone. "The only way you can try to match the many layers (labels and classifications) that the human brain can accomplish is with AI," Carapcea says. "It's based on deep learning and neural networks, and large volumes of data."

> Read the research behind Computer Vision

An API that learns

This, she continues, allows the API to classify elements of an image in much the same way a human brain would. For example, a four-year-old might see a pony and know it's a pony. But would that four-year-old know to call it a type of horse? Would that four-year-old know that both of those creatures are called animals? That's the level of depth and intelligence Carapcea and her team strive to provide with Computer Vision: an API that recognizes the horse, the pony, and that both are types of animals. Humans seem to learn by seeing the same pattern (or image) multiple times; the machine learning behind Computer Vision does the same.
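The pony, horse, animal idea can be pictured as a chain of increasingly general labels. The toy Python sketch below illustrates that layering only: the `PARENT` table and the `label_chain` helper are made up for this example and say nothing about how Computer Vision stores or learns its categories.

```python
# Hypothetical parent links for illustration; not the API's real taxonomy.
PARENT = {
    "pony": "horse",
    "horse": "animal",
    "lion": "animal",
    "dog": "animal",
}


def label_chain(label):
    """Return the label followed by each broader category above it."""
    chain = [label]
    # Walk upward until the current label has no recorded parent.
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


print(label_chain("pony"))  # ['pony', 'horse', 'animal']
```

In this picture a lion and a dog share the broad label "animal" while keeping distinct specific labels, which is the distinction the earlier API missed.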

A matter of language

But Computer Vision goes beyond even that: it constructs sentences based on the tags it identifies. The CaptionBot app is a basic example of what can be done with Computer Vision. "Its applications for accessibility, the visually impaired, or bots that understand images, are endless," Carapcea says. "This is cutting-edge stuff. No one else out there gives you image captioning as an API right now. If you want thousands of objects recognized, this is the API to use."
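An app that consumes tags and captions might handle the result along these lines. This is a sketch under assumptions: the sample JSON is invented for illustration (it is not real service output), its field names (`tags`, `description`, `captions`, `confidence`) are our recollection of the response shape, and the helpers `best_caption` and `confident_tags` are hypothetical.

```python
import json

# Invented sample shaped like an assumed analysis response; not real output.
SAMPLE_RESPONSE = json.loads("""
{
  "tags": [
    {"name": "lion", "confidence": 0.98},
    {"name": "animal", "confidence": 0.97},
    {"name": "grass", "confidence": 0.62},
    {"name": "dog", "confidence": 0.08}
  ],
  "description": {
    "captions": [
      {"text": "a lion lying in the grass", "confidence": 0.91}
    ]
  }
}
""")


def best_caption(analysis):
    """Return the highest-confidence caption text, or None if there is none."""
    captions = analysis.get("description", {}).get("captions", [])
    if not captions:
        return None
    return max(captions, key=lambda c: c["confidence"])["text"]


def confident_tags(analysis, threshold=0.5):
    """Return tag names whose confidence meets the threshold, in given order."""
    return [t["name"] for t in analysis.get("tags", [])
            if t["confidence"] >= threshold]
```

Filtering on confidence is one simple way for an app to keep a low-scoring guess, such as "dog" for a lion, out of what it shows the user; the 0.5 threshold here is arbitrary.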

> Read API documentation

> Leave a comment on the User Voice forum