Leveraging complex data to build advanced search applications with Azure Search

已于 六月 27, 2019 发布

Principal Program Manager, Azure Search

Data is rarely simple. Not every piece of data we have can fit nicely into a single Excel worksheet of rows and columns. Data has many diverse relationships such as the multiple locations and phone numbers for a single customer or multiple authors and genres of a single book. Of course, relationships typically are even more complex than this, and as we start to leverage AI to understand our data the additional learnings we get only add to the complexity of relationships. For that reason, expecting customers to have to flatten the data so it can be searched and explored is often unrealistic. We heard this often and it quickly became our number one most requested Azure Search feature. Because of this we were excited to announce the general availability of complex types support in Azure Search. In this post, I want to take some time to explain what complex types adds to Azure Search and the kinds of things you can build using this capability. 

Azure Search is a platform as a service that helps developers create their own cloud search solutions.

What is complex data?

Complex data consists of data that includes hierarchical or nested substructures that do not break down neatly into a tabular rowset. For example a book with multiple authors, where each author can have multiple attributes, can’t be represented as a single row of data unless there is a way to model the authors as a collection of objects. Complex types provide this capability, and they can be used when the data cannot be modeled in simple field structures such as strings or integers.

Complex types applicability

At Microsoft Build 2019,  we demonstrated how complex types could be leveraged to build out an effective search application. In the session we looked at the Travel Stack Exchange site, one of the many online communities supported by StackExchange.

The StackExchange data was modeled in a JSON structure to allow easy ingestion it into Azure Search. If we look at the first post made to this site and focus on the first few fields, we see that all of them can be modeled using simple datatypes, including tags which can be modeled as a collection, or array of strings.

{
   "id": "1",
    "CreationDate": "2011-06-21T20:19:34.73",
    "Score": 8,
    "ViewCount": 462,
    "BodyHTML": "<p>My fiancée and I are looking for a good Caribbean cruise in October and were wondering which
    "Body": "my fiancée and i are looking for a good caribbean cruise in october and were wondering which islands
    "OwnerUserId": 9,
    "LastEditorUserId": 101,
    "LastEditDate": "2011-12-28T21:36:43.91",
    "LastActivityDate": "2012-05-24T14:52:14.76",
    "Title": "What are some Caribbean cruises for October?",
    "Tags": [
        "caribbean",
        "cruising",
        "vacations"
    ],
    "AnswerCount": 4,
    "CommentCount": 4,
    "CloseDate": "0001-01-01T00:00:00",​

However, as we look further down this dataset we see that the data quickly gets more complex and cannot be mapped into a flat structure. For example, there can be numerous comments and answers associated with a single document.  Even votes is defined here as a complex type (although technically it could have been flattened, but that would add work to transform the data).

"CloseDate": "0001-01-01T00:00:00",
    "Comments": [
        {
            "Score": 0,
            "Text": "To help with the cruise line question: Where are you located? My wife and I live in New Orlea
            "CreationDate": "2011-06-21T20:25:14.257",
           "UserId": 12
        },
        {
            "Score": 0,
            "Text": "Toronto, Ontario. We can fly out of anywhere though.",
            "CreationDate": "2011-06-21T20:27:35.3",
            "UserId": 9
        },
        {
            "Score": 3,
            "Text": "\"Best\" for what?  Please read [this page](
            "UserId": 20
        },
        {
            "Score": 2,
            "Text": "What do you want out of a cruise? To relax on a boat? To visit islands? Culture? Adventure?
            "CreationDate": "2011-06-24T05:07:16.643",
            "UserId": 65
        }
    ],
    "Votes": {
        "UpVotes": 10,
        "DownVotes": 2
    },
    "Answers": [
        {
            "IsAcceptedAnswer": "True",
            "Body": "This is less than an answer, but more than a comment…\n\nA large percentage of your travel b
            "Score": 7,
            "CreationDate": "2011-06-24T05:12:01.133",
            "OwnerUserId": 74

All of this data is important to the search experience. For example, you might want to:

In fact, we could even improve on the existing StackExchange search interface by leveraging Cognitive Search to extract key phrases from the answers to supply potential phrases for autocomplete as the user types in the search box.

All of this is now possible because not only can you map this data to a complex structure, but the search queries can support this enhanced structure to help build out a better search experience.

Next Steps

If you would like to learn more about Azure Search complex types, please visit the documentation, or check out the video and associated code I made which digs into this Travel StackExchange data in more detail.