Package org.apache.lucene.queryparser.flexible.standard

Lucene Flexible Query Parser Implementation

The old Lucene query parser had a single class that performed all the parsing operations. In the new query parser structure, parsing is divided into three steps: parsing (syntax), processing (semantics) and building.

Flexible query parser is a modular, extensible framework for implementing Lucene query parsers. In the flexible query parser model, query parsing takes three steps: syntax parsing, processing (query semantics) and building (conversion to a Lucene Query).
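The three steps compose naturally as functions applied one after another. The Java sketch below illustrates only that decomposition and is not Lucene's API: the `ThreeStepParser` record, the `ThreeStepDemo` class and the toy stages are hypothetical. The "tree" is just a list of terms split on ` AND `, the default field name `body` is an arbitrary assumption, and the built "query" is a string rather than a Lucene Query.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical illustration of parse -> process -> build; not Lucene's API.
// T stands in for the node tree type, Q for the built query type.
record ThreeStepParser<T, Q>(
        Function<String, T> syntaxParser,    // step 1: query text -> tree
        List<UnaryOperator<T>> processors,   // step 2: tree -> tree, in order
        Function<T, Q> builder) {            // step 3: tree -> query

    Q parse(String queryText) {
        T tree = syntaxParser.apply(queryText);
        for (UnaryOperator<T> processor : processors) {
            tree = processor.apply(tree);
        }
        return builder.apply(tree);
    }
}

public class ThreeStepDemo {
    // Toy stages: the "tree" is a flat list of terms split on " AND ",
    // the only processor lowercases the terms, and the builder renders
    // each term as a required clause on an assumed default field "body".
    static String demo(String queryText) {
        ThreeStepParser<List<String>, String> parser =
                new ThreeStepParser<List<String>, String>(
                        text -> List.of(text.split(" AND ")),
                        List.of(terms -> terms.stream().map(String::toLowerCase).toList()),
                        terms -> String.join(" ",
                                terms.stream().map(t -> "+body:" + t).toList()));
        return parser.parse(queryText);
    }

    public static void main(String[] args) {
        System.out.println(demo("Apache AND Lucene")); // +body:apache +body:lucene
    }
}
```

Keeping the stages behind separate interfaces is what allows each one to be swapped independently, for example replacing the syntax parser while reusing the same processors and builder.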

The flexible query parser module provides not just the framework but also the StandardQueryParser, the default implementation of a fully fledged query parser. It supports most of the classic query parser's syntax and also adds support for interval functions, a min-should-match operator on Boolean groups, and many hooks for customizing how the parser behaves at runtime.

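The flexible query parser is divided into two packages:

  1. org.apache.lucene.queryparser.flexible.core - the query parser API classes, which custom query parser implementations extend.
  2. org.apache.lucene.queryparser.flexible.standard - this package, containing the StandardQueryParser implementation built on top of the core API.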

Features

  1. full support for Boolean expressions, including groups
  2. syntax parsers - support for arbitrary syntax parsers that produce QueryNode trees.
  3. query node processors - optimize, validate and rewrite QueryNode trees.
  4. processor pipelines - select your favorite query processors and build a pipeline to implement the features you need.
  5. query configuration handlers
  6. query builders - convert QueryNode trees into Lucene Query instances.
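A processor pipeline applies processors in a fixed order, each receiving the tree returned by the previous one. The sketch below assumes a hypothetical two-class node model (`Term`, `And`) and two made-up processors, one rewriting terms and one simplifying structure. In Lucene the corresponding roles are played by QueryNode implementations and QueryNodeProcessorPipeline, whose real API differs from this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical node model for illustration; not Lucene's QueryNode hierarchy.
interface Node {}
record Term(String text) implements Node {}
record And(List<Node> children) implements Node {}

// A pipeline is itself a processor: it runs its members in insertion order,
// feeding each one the tree returned by the previous one.
final class ProcessorPipeline implements UnaryOperator<Node> {
    private final List<UnaryOperator<Node>> processors = new ArrayList<>();

    ProcessorPipeline add(UnaryOperator<Node> processor) {
        processors.add(processor);
        return this;
    }

    @Override
    public Node apply(Node tree) {
        for (UnaryOperator<Node> processor : processors) {
            tree = processor.apply(tree);
        }
        return tree;
    }
}

public class PipelineDemo {
    // Hypothetical processor 1 (rewrite): lowercase every term in the tree.
    static Node lowercase(Node node) {
        if (node instanceof Term t) {
            return new Term(t.text().toLowerCase());
        }
        List<Node> kids = ((And) node).children().stream()
                .map(PipelineDemo::lowercase).toList();
        return new And(kids);
    }

    // Hypothetical processor 2 (structural): replace an AND that has a
    // single child with that child, recursively from the leaves up.
    static Node collapseSingleChildAnd(Node node) {
        if (node instanceof Term) {
            return node;
        }
        List<Node> kids = ((And) node).children().stream()
                .map(PipelineDemo::collapseSingleChildAnd).toList();
        return kids.size() == 1 ? kids.get(0) : new And(kids);
    }

    public static void main(String[] args) {
        ProcessorPipeline pipeline = new ProcessorPipeline()
                .add(PipelineDemo::lowercase)
                .add(PipelineDemo::collapseSingleChildAnd);
        // Redundant single-child groups wrapped around one term.
        Node input = new And(List.of(new And(List.of(new Term("Lucene")))));
        System.out.println(pipeline.apply(input)); // Term[text=lucene]
    }
}
```

Selecting features then amounts to choosing which processors to add and in what order; an empty pipeline returns its input unchanged.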

Design

The flexible query parser has a deliberately generic architecture, so that it can easily be adapted to products with different query syntax needs.

The query parser has three layers, and at its core is what we call the query node tree: a tree of objects that represents the syntax of the original query. For example, for 'a AND b' the tree could look like this:

       AND
      /   \
     a     b

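The tree in the diagram can be modeled with plain immutable objects. In the sketch below, `QNode`, `FieldTerm`, `AndNode` and `NodeTreeDemo` are hypothetical stand-ins rather than Lucene's query node classes, and the field name `body` is an arbitrary assumption. The tree is built by hand here, which is the job a syntax parser would normally do.

```java
import java.util.List;

// Hypothetical stand-ins for query node classes.
interface QNode {
    // Render this node and its subtree, one node per line, indented by depth.
    void render(StringBuilder out, int depth);
}

record FieldTerm(String field, String text) implements QNode {
    public void render(StringBuilder out, int depth) {
        out.append("  ".repeat(depth)).append(field).append(':').append(text).append('\n');
    }
}

record AndNode(List<QNode> children) implements QNode {
    public void render(StringBuilder out, int depth) {
        out.append("  ".repeat(depth)).append("AND\n");
        for (QNode child : children) {
            child.render(out, depth + 1);
        }
    }
}

public class NodeTreeDemo {
    // The tree a syntax parser might produce for "a AND b" on field "body".
    static QNode aAndB() {
        return new AndNode(List.of(new FieldTerm("body", "a"), new FieldTerm("body", "b")));
    }

    static String render(QNode root) {
        StringBuilder sb = new StringBuilder();
        root.render(sb, 0);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(aAndB()));
        // AND
        //   body:a
        //   body:b
    }
}
```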
The three flexible query parser layers are:

SyntaxParser
This is the text parsing layer, which transforms the query string into a QueryNode tree. Every text parser must implement the SyntaxParser interface; the default implementation is StandardSyntaxParser.
QueryNodeProcessor
This layer does most of the work: it contains a chain of query node processors. Each processor can walk the tree and modify nodes or even the tree's structure, which allows the query to be optimized before the node tree is converted into an actual Query.
QueryBuilder
The third layer is a configurable map of builders that associates each query node type with a builder that converts nodes of that type into a Query.
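The builder layer can be sketched as a map from node class to a builder function, applied recursively from the root of the tree. This is a hypothetical, dependency-free illustration: all type names are made up, and it renders a Lucene-like query string instead of constructing a Query object. In Lucene itself this role belongs to QueryTreeBuilder, whose API differs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical node model for illustration; not Lucene's QueryNode classes.
interface BNode {}
record TermNode(String field, String text) implements BNode {}
record AndGroup(List<BNode> children) implements BNode {}
record OrGroup(List<BNode> children) implements BNode {}

// Converts one node; calls back into the TreeBuilder to convert children.
interface NodeBuilder {
    String build(BNode node, TreeBuilder tree);
}

// A configurable map from node class to the builder responsible for it.
final class TreeBuilder {
    private final Map<Class<? extends BNode>, NodeBuilder> builders = new HashMap<>();

    TreeBuilder register(Class<? extends BNode> type, NodeBuilder builder) {
        builders.put(type, builder);
        return this;
    }

    String build(BNode node) {
        // Exact-class lookup; this sketch ignores superclasses and interfaces.
        NodeBuilder builder = builders.get(node.getClass());
        if (builder == null) {
            throw new IllegalArgumentException(
                    "no builder registered for " + node.getClass().getSimpleName());
        }
        return builder.build(node, this);
    }
}

public class BuilderDemo {
    // Renders a Lucene-like query string instead of a real Query object.
    // OrGroup is deliberately left unregistered.
    static TreeBuilder defaultBuilders() {
        return new TreeBuilder()
                .register(TermNode.class, (node, tree) -> {
                    TermNode term = (TermNode) node;
                    return term.field() + ":" + term.text();
                })
                .register(AndGroup.class, (node, tree) -> ((AndGroup) node).children().stream()
                        .map(child -> "+" + tree.build(child))
                        .collect(Collectors.joining(" ", "(", ")")));
    }

    public static void main(String[] args) {
        TreeBuilder builders = defaultBuilders();
        BNode query = new AndGroup(List.of(new TermNode("body", "a"), new TermNode("body", "b")));
        System.out.println(builders.build(query)); // (+body:a +body:b)
        try {
            builders.build(new OrGroup(List.of()));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // no builder registered for OrGroup
        }
    }
}
```

Because builders are looked up by node class, supporting a new node type (for example one introduced by a custom processor) only requires registering one more builder, and an unregistered type fails fast, as the OrGroup call in `main` shows.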