
Lucene query language in Azure Search

Posted on December 7, 2015

Software Engineer, Azure Search

In the context of search, different applications need different amounts of control over exactly how searches are executed. In many cases you want straightforward keyword search, with separate optional control for relevance. That’s what Azure Search has traditionally offered and it works really well when you just want to take the user input and send it to the search service directly. However, we’ve seen many cases where the app needs more control over exactly what is searched, in what fields, with what term-level weights, etc. In order to enable these more advanced scenarios, we are excited to introduce support for a new search mode that allows developers to use the full Lucene query language for cases where you need fine-grained control.

The Lucene query language was developed as part of Apache Lucene. It has been widely adopted in the search domain for its expressiveness and flexibility. With support for this well-known query language added to our service, you can now formulate a broader class of questions against Azure Search using syntax you are already familiar with. We also hope it will ease the burden for customers migrating from existing Lucene-based on-premises solutions to Azure Search.

How to enable the Lucene query syntax in the search query

You can use the queryType parameter in a Search request to switch between the two search modes:

  1. simple, to use the simple query language
  2. full, to use the full Lucene query language

The queryType parameter defaults to the simple search mode, so you need not do anything if you choose not to use the new feature. When the parameter is set to full, search text is interpreted using the Lucene query parser.
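
To make the switch concrete, here is a minimal Python sketch that assembles a GET request URL with the queryType parameter. The service name, index name and api-version value are placeholders we assume for illustration (check the REST API documentation for the version that applies to your service), build_search_url is a hypothetical helper rather than part of any SDK, and authentication via the api-key request header is left out.

```python
from urllib.parse import quote, urlencode


def build_search_url(service, index, search_text, api_version, query_type="simple"):
    """Build a GET URL for a search request (illustrative helper, not an SDK call)."""
    if query_type not in ("simple", "full"):
        raise ValueError("queryType must be 'simple' or 'full'")
    params = {
        "search": search_text,
        "queryType": query_type,
        "api-version": api_version,
    }
    # quote_via=quote percent-encodes characters such as ':', '^', '+' and spaces,
    # which would otherwise be misread in a URL (a raw '+' means a space).
    query_string = urlencode(params, quote_via=quote)
    return f"https://{service}.search.windows.net/indexes/{index}/docs?{query_string}"


# "myservice", "hotels" and the api-version string are assumed placeholder values.
url = build_search_url(
    "myservice", "hotels", "city:Leavenworth",
    api_version="2015-02-28-Preview", query_type="full",
)
print(url)
```

Leaving query_type at its default produces a simple-mode request, which mirrors the service default described above.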

The complete set of features and syntax in the Lucene query language can be found on our MSDN page. Below are some highlights on how some of its features address popular customer requests on Azure Search User Voice.

Field scoped searches

With the simple query language, the search terms provided in the query are always searched in all the searchable fields unless the query is scoped to specific searchable fields with the searchFields parameter. With the Lucene query language, you can scope your search to a specific field by placing a field name in front of a search clause. For example, the following will search for the word Leavenworth only within the city field.

search=city:Leavenworth&queryType=full

This feature is especially useful when you want to limit the search space for a term or a phrase so that matches from irrelevant fields do not affect or pollute the search results. You can often expect better performance and quality of the response by limiting the search space. Please keep in mind that the fields in the fielded terms must be configured as searchable.
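
If you build queries in application code, a small helper can keep field-scoped clauses consistent. The sketch below is a hypothetical Python helper (field_scoped is our own name, not a service API). It assumes the standard Lucene convention of wrapping a multi-word phrase in double quotes so that the whole phrase stays scoped to the field.

```python
def field_scoped(field, text):
    """Return a Lucene clause that searches `text` only within `field`."""
    if any(ch.isspace() for ch in text):
        # A multi-word value is wrapped in double quotes so the field prefix
        # applies to the whole phrase; embedded quotes are backslash-escaped.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return f'{field}:"{escaped}"'
    return f"{field}:{text}"


single_term = field_scoped("city", "Leavenworth")
phrase = field_scoped("city", "New York")
print(single_term)
print(phrase)
```

The first call reproduces the clause from the example above. Remember that the resulting text still has to be URL-encoded before it goes into the search parameter.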

Term boosting

Sometimes terms in a search query carry different degrees of importance and relevance. For example, when searching for a hotel, certain aspects may be more important to you as a customer and you want to reflect that in the search query. With the full Lucene query language, you can optionally assign a boost factor, a positive number, to a search term or phrase to control its relevance relative to other terms in the search query. A term without a boost value is automatically assigned a neutral boost value of 1. In this example we give the word clean a boost value of 3 and the word quiet a boost value of 2, implying they are three times and twice as important, respectively, as the neutral terms in the search query.

search=quiet^2 clean^3 +hotels city:Leavenworth&queryType=full
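
The boosted query above can also be assembled programmatically. This is a minimal Python sketch with a hypothetical boost helper of our own. It also shows the percent-encoded form of the search text, since characters like ^ and + are not safe to send raw in a URL.

```python
from urllib.parse import quote


def boost(term, factor=1):
    """Append a Lucene boost suffix to a term; a factor of 1 is the neutral default."""
    if factor <= 0:
        raise ValueError("boost factor must be a positive number")
    return term if factor == 1 else f"{term}^{factor}"


# quiet is boosted by 2, clean by 3; the remaining clauses keep the neutral boost.
search_text = " ".join([boost("quiet", 2), boost("clean", 3), "+hotels", "city:Leavenworth"])
encoded = quote(search_text, safe="")

print(search_text)
print(encoded)
```

The encoded string is what would appear after search= in the request URL.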

Term boosting and field boosting in scoring profiles

The term boosting in the Lucene query language differs from and extends the field boosting offered in scoring profiles. The field weight in a scoring profile is only applied to matches from the specific field and is configurable only by the app designer. The term boosting in a query, on the other hand, is applied to all searchable fields and available to the end-user. The two types of control are applied independently of each other and combined when a boosted term in a search query has a match in a boosted field.

Searches with regular expressions

The Lucene query language supports regular expressions within single terms. This enables a scenario that has been highly requested on Azure Search User Voice: support for infix and suffix queries. To search with a regex pattern, place the pattern between forward slashes "/". The pattern is matched against the entire vocabulary of the index, so it is often a good idea to keep the regex pattern as restrictive as possible for performance and precision. In this example we will look for “hotel” and “motel” in cities that start with the letters Leaven.

search=/[hm]otel/ city:/Leaven.*/&queryType=full
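
To build some intuition for how a pattern is matched against the vocabulary, here is a toy Python simulation. It is only a sketch: it uses Python's re module, whose dialect is not identical to Lucene's regular expression syntax, it assumes the pattern has to match a whole term rather than a substring, and the vocabularies are made up for illustration.

```python
import re


def matching_terms(pattern, vocabulary):
    """Return the terms that the pattern matches in full (toy model of a regex query)."""
    compiled = re.compile(pattern)
    # fullmatch requires the pattern to cover the entire term, not just part of it.
    return [term for term in vocabulary if compiled.fullmatch(term)]


# Hypothetical term lists standing in for the index vocabulary.
description_terms = ["hotel", "motel", "hotels", "hostel"]
city_terms = ["Leavenworth", "Seattle", "Leaven"]

lodging = matching_terms("[hm]otel", description_terms)
cities = matching_terms("Leaven.*", city_terms)
print(lodging)
print(cities)
```

Every term in the vocabulary is a candidate, which is why a loose pattern such as .*a.* gets expensive on a large index.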

Fuzzy searches

The Lucene query language supports fuzzy searches on single terms based on the Levenshtein Distance algorithm. To enable fuzzy search, place a tilde "~" symbol at the end of a term, optionally followed by a number between 0 and 2 that specifies the maximum edit distance allowed for a match. For example, the search query below uses an edit distance of 1, so the misspelled term Levenworth still matches documents whose city field contains Leavenworth, which is one inserted letter away.

search=+hotel city:Levenworth~1&queryType=full
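
If edit distance is new to you, the following Python sketch computes the classic Levenshtein distance (insertions, deletions and substitutions) and uses it to filter a made-up vocabulary. It illustrates the idea only. It is not how the service evaluates fuzzy queries internally, and the fuzzy_matches helper is hypothetical.

```python
def levenshtein(a, b):
    """Classic Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(term, vocabulary, max_edits=2):
    """Return vocabulary terms within max_edits of term (max_edits must be 0, 1 or 2)."""
    if max_edits not in (0, 1, 2):
        raise ValueError("max_edits must be between 0 and 2")
    return [word for word in vocabulary if levenshtein(term, word) <= max_edits]


distance = levenshtein("Levenworth", "Leavenworth")
matches = fuzzy_matches("Levenworth", ["Leavenworth", "Seattle", "Wentworth"], max_edits=1)
print(distance)
print(matches)
```

The misspelled Levenworth is a single insertion away from Leavenworth, which is why the ~1 query above still finds it.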

Most features of the Lucene query syntax are implemented in Azure Search, with range search being the only exception. Range queries can instead be expressed with OData $filter expressions.
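
As a rough illustration of expressing a range with $filter, the Python sketch below builds an inclusive range expression from the OData comparison operators ge and le. The range_filter helper and the rating field are hypothetical, and the field would need to be marked filterable in your index.

```python
def range_filter(field, low=None, high=None):
    """Build an OData $filter expression for an inclusive numeric range."""
    clauses = []
    if low is not None:
        clauses.append(f"{field} ge {low}")   # ge: greater than or equal
    if high is not None:
        clauses.append(f"{field} le {high}")  # le: less than or equal
    if not clauses:
        raise ValueError("at least one bound is required")
    return " and ".join(clauses)


# "rating" is an assumed field name used only for this example.
closed_range = range_filter("rating", 3, 5)
open_ended = range_filter("rating", low=4)
print(closed_range)
print(open_ended)
```

The resulting expression goes in the $filter parameter of the same request that carries the search text and queryType=full.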

Please note, the new search mode is currently only available through our REST API. Support for it in the .NET SDK is coming soon, so stay tuned.

We hope you will find the new feature useful and we look forward to hearing about how you use the new supported syntax. For more details on the Lucene query language, please visit the MSDN documentation page. If you have questions, please feel free to leave your feedback on Azure Search User Voice.