
Announcing a renaissance in computer vision AI with Microsoft’s Florence foundation model

Extract robust insights from image and video content with Azure Cognitive Service for Vision

We are pleased to announce the public preview of Microsoft’s Florence foundation model, trained with billions of text-image pairs and integrated as cost-effective, production-ready computer vision services in Azure Cognitive Service for Vision. The improved Vision Services enable developers to create cutting-edge, market-ready, responsible computer vision applications across various industries. Customers can now seamlessly digitize, analyze, and connect their data to natural language interactions, unlocking powerful insights from their image and video content to support accessibility, drive acquisition through SEO, protect users from harmful content, enhance security, and improve incident response times.

Microsoft was recently named a Leader in the IDC MarketScape: Worldwide General-Purpose Computer Vision AI Software Platforms 2022 Vendor Assessment (doc #US49776422, November 2022). The new Vision Services improve content discoverability with automatic captioning, smart cropping, classification, background removal, and image search. Furthermore, users can track movements, analyze environments, and receive real-time alerts with responsible AI controls.

Reddit will be using Vision Services to generate captions for hundreds of millions of images on its platform. Tiffany Ong, Reddit Product Manager of Consumer Product, has said:

“With Microsoft’s Vision technology, we are making it easier for users to discover and understand our content. The newly created image captions make Reddit more accessible for everyone and give redditors more opportunities to explore our images, engage in conversations, and ultimately build connections and a sense of community.”

Microsoft is harnessing the power of the new Vision Services in Microsoft 365 apps such as Teams, PowerPoint, Outlook, Word, Designer, and OneDrive, as well as in Microsoft Datacenters. Microsoft Teams uses segmentation capabilities to take virtual meetings to the next level. PowerPoint, Outlook, and Word leverage image captioning for automatic alt-text to improve accessibility. Microsoft Designer and OneDrive are using improved image tagging, image search, and background generation to simplify image discoverability and editing. Microsoft Datacenters are leveraging Vision Services to enhance security and infrastructure reliability.

At this week’s Microsoft Ability Summit, companies will learn how they can improve the accessibility of their visual content. We’ll share the future of our Seeing AI app, and LinkedIn will share the benefits of using Vision Services to deliver automatic alt-text descriptions for images. As a preview, Jennison Asuncion, LinkedIn’s Head of Accessibility Engineering Evangelism, has said:

“More than 40 percent of LinkedIn’s feed posts include at least one image. We want every member to have equal access to opportunity and are committed to ensuring that we make images accessible to our members who are blind or who have low vision so they can be a part of the online conversation. With Azure Cognitive Service for Vision, we can provide auto-captioning to edit and support alt. text descriptions. I’m excited about this new experience because now, not only will I know my colleague shared a picture from an event they attended, but that my CEO Ryan Roslansky is also in the picture.”

Try out the new out-of-the-box features our customers are using in Vision Studio:

  • Dense captions: Automatically deliver rich captions, design suggestions, accessible alt-text, SEO optimization, and intelligent photo curation to support digital content.
  • Image retrieval: Improve search recommendations and advertisements with natural language queries that seamlessly measure the similarity between images and text.
  • Background removal: Transform the look and feel of images by easily segmenting people and objects from their original background and replacing that background with a preferred scene.
  • Model customization: Lower the cost and time needed to deliver custom models that match unique business demands at high precision, with just a handful of images.
  • Video summarization (Video TL;DR): Search and interact with video content in the same intuitive way you think and write. Locate relevant content without the need for additional metadata.
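
To make the image retrieval idea above a little more concrete, here is a minimal sketch of how similarity between a text query and a set of images can be measured once both are represented as vectors in a shared space. It is an illustration only, not the Vision Services API: the function names, the file names, and the three-dimensional toy vectors are all assumptions made for this example, and in a real application the vectors would come from an embedding model rather than being written by hand.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as not similar to anything.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(query_vector, image_vectors):
    """Sort (name, score) pairs from most to least similar to the query."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in image_vectors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical, hand-written vectors standing in for real image embeddings.
image_vectors = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "office.jpg": [0.1, 0.8, 0.3],
    "forest.jpg": [0.2, 0.1, 0.9],
}

# Hypothetical vector standing in for the embedding of a text query.
query_vector = [0.8, 0.2, 0.1]

ranking = rank_images(query_vector, image_vectors)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy values, the image whose vector points in nearly the same direction as the query vector is ranked first. Cosine similarity is used here because it compares direction rather than magnitude, which is a common choice for comparing embeddings; the service itself may use a different measure.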

Innovate responsibly

Review the responsible AI principles to learn how we are committed to developing AI systems that help make the world more accessible. We are focused on helping organizations take full advantage of AI, and we are investing heavily in programs that provide technology, resources, and expertise to empower those working to create a more sustainable, safe, and accessible world.

Get started today with Azure Cognitive Service for Vision

Revolutionize your computer vision applications with improved efficiency, accuracy, and accessibility in image and video processing, at the same low price. Visit Vision Studio to try out our latest demos.

Learn more about Azure Cognitive Service for Vision: