Today we are announcing Cognitive Search, an AI-first approach to content understanding. Cognitive Search is powered by Azure Search with built-in Cognitive Services. It pulls data from a variety of Azure data sources and applies a set of composable cognitive skills which extract knowledge. This knowledge is then organized and stored in a search index enabling new experiences for exploring the data.
Finding latent knowledge in all data
Real-world data is messy. It often spans media types (e.g. text documents, PDF files, images, databases), changes constantly, and carries valuable knowledge in ways that are not readily usable. On our team we see the challenges that emerge from this on a daily basis: our customers combine information retrieval solutions, such as Azure Search, with AI models (either pre-built ones such as Cognitive Services or custom ones) to extract latent knowledge from their vast data stores.
The typical solution pattern for this is a data ingestion, enrichment and exploration model. Each of these brings its own challenges to the table—from large scale change tracking to file format support, and even composition of multiple AI models. Developers can do this today, but it takes a huge amount of effort, requires branching into multiple unrelated domains (from cracking PDFs to handling AI model composition), and distracts from the primary goal. This is where Cognitive Search comes in.
The new Cognitive Search capability in Azure Search is a concrete implementation of the ingest-enrich-explore pattern.
When you use Azure Search, you get direct support for each aspect of the process:
- Ingest: pull data from Azure Blob Storage, SQL DB, CosmosDB, MySQL, and Table Storage. For unstructured data in Blob Storage, the service not only reads the raw data but also supports extracting contents from several popular file formats such as PDFs and Office documents. The service supports change detection to keep up with changes through incremental processing, without having to go over the entire data set after initial ingestion.
- Enrich: use cognitive skills to augment data as it’s ingested using a powerful composition system. The service integrates with Cognitive Services to offer built-in support for OCR (for print and handwritten text), named entity recognition, key phrase extraction, language detection, image analysis with scene description/tagging capabilities, and more. We also included a webhook-based extensibility mechanism that allows you to add your own cognitive skills while fitting into the rest of the picture for ingestion and composition; custom skills are easy to build as Azure Functions or other tools that can expose webhooks. Knowledge created during enrichment often not only augments individual data items, but also connects entities and facts across different items, different stores and even different media types.
- Explore: the outcome of enrichment is additional knowledge derived from the ingested data combined with the outcome of applying various AI models to it. Different applications will want to surface different exploration experiences over this data. The original data and all annotations produced during enrichment are put in an Azure Search index—a powerful data store that supports keyword search, understands 56 languages, handles structured queries in addition to unstructured search, and offers faceted navigation. Best of all, the index processes all of this at lightning-fast speeds.
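To make the ingest-enrich-explore pattern above concrete, here is a minimal, self-contained sketch in plain Python. It is a toy model of the idea, not the Azure Search API: the `Pipeline` class, the two skill functions, and the hash-based change detection are all hypothetical stand-ins chosen for illustration.

```python
import hashlib
import re
from collections import defaultdict


# Hypothetical stand-ins for cognitive skills. Each skill takes a document
# dict and returns new annotations to merge into it.
def detect_language(doc):
    # Toy heuristic, not a real language detector.
    padded = f" {doc['text'].lower()} "
    return {"language": "es" if " el " in padded else "en"}


def extract_key_phrases(doc):
    # Toy heuristic: treat capitalized words as key phrases.
    words = re.findall(r"\b[A-Z][a-z]+\b", doc["text"])
    return {"key_phrases": sorted(set(words))}


class Pipeline:
    """Toy ingest-enrich-explore pipeline (illustrative only)."""

    def __init__(self, skills):
        self.skills = skills               # composable enrichment steps
        self.hashes = {}                   # doc id -> content hash
        self.docs = {}                     # doc id -> enriched document
        self.inverted = defaultdict(set)   # term -> ids of matching docs

    def ingest(self, source):
        """Process only new or changed documents; return the ids processed."""
        processed = []
        for doc_id, text in source.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if self.hashes.get(doc_id) == digest:
                continue  # unchanged since the last run, skip it
            self.hashes[doc_id] = digest
            doc = {"id": doc_id, "text": text}
            for skill in self.skills:      # enrich: apply each skill in order
                doc.update(skill(doc))
            self._index(doc)
            processed.append(doc_id)
        return processed

    def _index(self, doc):
        # Drop stale postings for this document, then index the new text.
        for ids in self.inverted.values():
            ids.discard(doc["id"])
        self.docs[doc["id"]] = doc
        for term in re.findall(r"\w+", doc["text"].lower()):
            self.inverted[term].add(doc["id"])

    def search(self, term):
        """Keyword search: ids of documents containing the term."""
        return sorted(self.inverted.get(term.lower(), set()))

    def facet(self, field):
        """Count documents per value of a single-valued annotation."""
        counts = defaultdict(int)
        for doc in self.docs.values():
            counts[doc[field]] += 1
        return dict(counts)


pipeline = Pipeline([detect_language, extract_key_phrases])
source = {"a": "Seismic survey near Houston", "b": "Notas sobre el pozo"}
print(pipeline.ingest(source))        # first run processes every document
source["b"] = "Notas sobre el pozo en Texas"
print(pipeline.ingest(source))        # second run processes only what changed
print(pipeline.search("texas"), pipeline.facet("language"))
```

Skills here are ordinary functions applied in sequence, so a later skill can read annotations written by an earlier one. That is a simplified picture of composition; the real service handles this, along with file format extraction and large scale change tracking, for you.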
We were lucky to have multiple customers work with us during the early stages of product development for Cognitive Search. This helped us understand real-world requirements and iterate on the product.
Here are a few examples of how we’ve seen customers apply Cognitive Search:
Our Healthcare customers face this challenge with clinical data. Large volumes of text include references to general entities (e.g. people’s names) and domain-specific ones (e.g. drug and disease names) that need to be connected and related. Sometimes they also need to combine this with imagery that’s analyzed in well-known ways (e.g. OCR) as well as with leading-edge methods (e.g. AI-assisted diagnostics).
In the Financial Services space, customers need to handle extensive regulation described in large volumes of documents, forms produced by their customers, contracts they handle with customers and providers, and more. Generally applicable natural language processing techniques combined with specialized content understanding models enable them to provide their employees and customers with a global view of their information assets.
Oil & Gas companies have teams of geologists and other specialists who need to understand seismic and geologic data. They often have decades of PDFs with pictures of samples over sample sheets full of handwritten field notes. They need to connect places, people (domain experts), and events, and navigate all this information to make key decisions.
Check out Scott Guthrie’s announcement in his //BUILD 2018 keynote, where you can see one of our early adopter customers, the NBA, talk about their scenario and how Cognitive Search brought together Azure Search, Cognitive Services and custom models built with Azure ML to deliver a rich, AI-powered data exploration experience.
If you’re ready to try this out on your own data, head to the Azure portal, create an Azure Search service and you’ll see the “Cognitive Search” step in the Import Data flow.
When you’re ready to go past what the UX can do, check out the documentation to learn how to use more cognitive skills and how to extend the enrichment process with your own data.
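As a taste of what a custom skill looks like, here is a minimal sketch of the logic behind a webhook. It assumes the request and response envelope described in the custom skill documentation as we understand it: a `values` array of records, each with a `recordId` and a `data` object, echoed back with `errors` and `warnings`. Check the documentation for the exact contract. The `text` input, the `wordCount` output and the `handle_skill_request` function are hypothetical names for illustration, and the HTTP hosting (for example, an Azure Function) is left out.

```python
import json


def handle_skill_request(body: str) -> str:
    """Hypothetical custom skill: count the words in each record's text.

    Takes the raw JSON request body and returns the raw JSON response body.
    """
    request = json.loads(body)
    results = []
    for record in request.get("values", []):
        record_id = record.get("recordId")
        text = record.get("data", {}).get("text")  # "text" is an assumed input name
        if isinstance(text, str):
            results.append({
                "recordId": record_id,
                "data": {"wordCount": len(text.split())},  # assumed output name
                "errors": [],
                "warnings": [],
            })
        else:
            # Report the problem per record instead of failing the whole batch.
            results.append({
                "recordId": record_id,
                "data": {},
                "errors": [{"message": "Missing 'text' input."}],
                "warnings": [],
            })
    return json.dumps({"values": results})


sample = json.dumps({"values": [
    {"recordId": "1", "data": {"text": "handwritten field notes"}},
    {"recordId": "2", "data": {}},
]})
print(handle_skill_request(sample))
```

Each output record carries the same `recordId` as its input, which is how the results get matched back to the right document.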
To explore a scenario where we applied Cognitive Search to a public dataset, check out the JFK Files in AI.lab. We published a live version of that app and posted the code on GitHub in case you want to use it as a starting point for your own project.
We look forward to seeing what you’ll build with Cognitive Search! If you want to get in touch with the team, tweet with the #azuresearch hashtag or email us at firstname.lastname@example.org.
Pablo Castro - on behalf of the entire Content Search and Intelligence team