Visual assistant

Azure App Service
Azure AI Bot Service
Azure AI services

Solution ideas

This article is a solution idea. If you'd like us to expand the content with more information, such as potential use cases, alternative services, implementation considerations, or pricing guidance, let us know by providing GitHub feedback.

This solution presents a visual assistant that provides rich information that's based on the content of an image.

Architecture

Architecture diagram that shows the flow of data between a browser and a bot, and between the bot and search services.

Dataflow

  1. Users interact with a bot through a mobile app or a web app.
  2. The bot uses Language Understanding Intelligent Service (LUIS), which is built into the application, to identify the user's intent and the conversational context.
  3. The bot passes visual context, such as an image, to the Bing Visual Search API.
  4. The bot retrieves information from the Bing Entity Search API about people, places, artwork, monuments, and objects that are related to the image.
  5. The bot retrieves information from barcodes.
  6. Optionally, the bot uses the Bing Custom Search API to get more information about barcodes or queries, with results limited to the user's domain.
  7. The visual assistant presents the user with information about related products, destinations, celebrities, places, monuments, and artwork.
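
The steps above can be sketched as a single orchestration function. This is a minimal illustration, not the solution's actual implementation: `handle_message`, `AssistantReply`, and every callable parameter are hypothetical names, and each service (LUIS, Bing Visual Search, Bing Entity Search, barcode reading, Bing Custom Search) is passed in as a plain function so that the flow can be followed without network access. Using the visual search tags as entity search queries is also an assumption. The sketch requires Python 3.9 or later.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AssistantReply:
    """Hypothetical container for what the assistant shows the user (step 7)."""
    intent: str
    visual_tags: list[str] = field(default_factory=list)
    entities: list[str] = field(default_factory=list)
    barcode: Optional[str] = None
    custom_results: list[str] = field(default_factory=list)


def handle_message(
    text: str,
    image: bytes,
    *,
    detect_intent: Callable[[str], str],
    visual_search: Callable[[bytes], list[str]],
    entity_search: Callable[[str], list[str]],
    read_barcode: Callable[[bytes], Optional[str]],
    custom_search: Optional[Callable[[str], list[str]]] = None,
) -> AssistantReply:
    # Step 2: identify the user's intent (LUIS in the article).
    reply = AssistantReply(intent=detect_intent(text))

    # Step 3: pass the image to visual search and keep the returned tags.
    reply.visual_tags = visual_search(image)

    # Step 4: look up entities related to the image.
    # Assumption: each visual tag is used as an entity search query.
    for tag in reply.visual_tags:
        reply.entities.extend(entity_search(tag))

    # Step 5: read a barcode from the image, if one is present.
    reply.barcode = read_barcode(image)

    # Step 6 (optional): domain-limited search on the barcode or the user's text.
    if custom_search is not None:
        reply.custom_results = custom_search(reply.barcode or text)

    # Step 7: return everything for presentation to the user.
    return reply
```

Because the services are injected, the same function can be exercised with stub functions in tests and with real API clients in production.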

Components

  • Azure App Service is a fully managed HTTP-based service for hosting web apps, REST APIs, and mobile backends.
  • Azure Bot Service offers an environment for developing intelligent, enterprise-grade bots that enrich customer experiences. The integrated environment also provides a way to maintain control of your data.
  • The Bing Custom Search API provides a way to create customized search experiences with Bing's powerful ranking and global-scale search index.
  • The Bing Entity Search API offers search capabilities that identify relevant entities, such as well-known people, places, movies, TV shows, video games, books, and businesses.
  • The Bing Visual Search API returns data that's related to a given image, such as similar images, shopping sources for purchasing the item in the image, and webpages that include the image.
  • The Bing Web Search API provides search results after you issue a single API call. The results compile relevant information from billions of webpages, images, videos, and news.
  • Azure Cognitive Service for Language is part of Azure Cognitive Services and offers many natural language processing capabilities.
  • Conversational language understanding is a feature of Cognitive Service for Language. This cloud-based API service offers machine-learning capabilities for building conversational apps. You can use it, as the bot in this solution uses LUIS, to predict the meaning of a conversation and extract relevant, detailed information.
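
To make the component list more concrete, the following sketch shows how a bot might prepare a Bing Entity Search request. It only builds the URL and headers and sends nothing. The endpoint, the `q` and `mkt` query parameters, and the `Ocp-Apim-Subscription-Key` header reflect the v7 REST interface as an assumption here and should be checked against the current Bing documentation. The function name `build_entity_search_request` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint and header name; verify both against the current Bing documentation.
ENTITY_SEARCH_ENDPOINT = "https://api.bing.microsoft.com/v7.0/entities"
SUBSCRIPTION_HEADER = "Ocp-Apim-Subscription-Key"


def build_entity_search_request(
    query: str, subscription_key: str, market: str = "en-US"
) -> tuple[str, dict[str, str]]:
    """Return the URL and headers for an entity lookup without sending anything."""
    if not query.strip():
        raise ValueError("query must not be empty")
    # urlencode escapes the query text, for example spaces become '+'.
    url = f"{ENTITY_SEARCH_ENDPOINT}?{urlencode({'q': query, 'mkt': market})}"
    headers = {SUBSCRIPTION_HEADER: subscription_key}
    return url, headers
```

The returned pair can be handed to any HTTP client. Keeping request construction separate from sending makes it straightforward to test the request without a subscription key.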

Scenario details

This solution presents a visual assistant that provides rich information that's based on the content of an image. The assistant's capabilities include reading business cards, deciphering barcodes, and recognizing well-known people, places, objects, artwork, and monuments.

Potential use cases

Organizations can use this solution to provide:

  • Appointment scheduling.
  • Order and delivery tracking in manufacturing, automotive, and transportation applications.
  • Barcode purchases in retail.
  • Payment processing in finance and retail.
  • Subscription renewals in retail.
  • The identification of well-known people, places, objects, art, and monuments in the education, media, and entertainment industries.

Next steps