Class CommonTermsQuery

java.lang.Object
org.apache.lucene.search.Query
org.apache.lucene.queries.CommonTermsQuery

public class CommonTermsQuery extends Query
A query that executes high-frequency terms in an optional sub-query to prevent slow queries caused by "common" terms such as stopwords. This query builds two queries from the added terms: low-frequency terms are added to a required boolean clause, and high-frequency terms are added to an optional boolean clause. The optional clause is executed only if the required "low-frequency" clause matches. In most cases, high-frequency terms are unlikely to contribute significantly to the document score unless at least one of the low-frequency terms is matched. Where applicable, this query can improve query execution times significantly.

CommonTermsQuery has several advantages over stopword filtering at index or query time: a term is "classified" based on its actual document frequency in the index, so slow queries can be prevented even across domains without specialized stopword files.

Note: if the query contains only high-frequency terms, it is rewritten into a plain conjunction query, i.e. all high-frequency terms must match for a document to match.
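
The low/high split described above can be illustrated without Lucene. The following is a minimal, self-contained sketch, not the class's actual implementation: `FrequencySplitSketch`, `split`, and the `docFreq` map are hypothetical names, and the sketch assumes that a threshold below 1 is read as a fraction of `maxDoc` while a threshold of 1 or more is read as an absolute document count.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FrequencySplitSketch {

    /** Holds the two groups of terms produced by the split. */
    static final class Split {
        final List<String> lowFreq = new ArrayList<>();
        final List<String> highFreq = new ArrayList<>();
    }

    /**
     * Classifies each term by its document frequency.
     * Assumption: maxTermFrequency < 1 is a fraction of maxDoc,
     * maxTermFrequency >= 1 is an absolute document count.
     */
    static Split split(Map<String, Integer> docFreq, int maxDoc, float maxTermFrequency) {
        float cutoff = maxTermFrequency >= 1f
                ? maxTermFrequency
                : (float) Math.ceil(maxTermFrequency * (float) maxDoc);
        Split result = new Split();
        for (Map.Entry<String, Integer> entry : docFreq.entrySet()) {
            if (entry.getValue() > cutoff) {
                result.highFreq.add(entry.getKey()); // would go to the optional clause
            } else {
                result.lowFreq.add(entry.getKey());  // would go to the required clause
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up document frequencies for a 1000-document index.
        Map<String, Integer> docFreq = new LinkedHashMap<>();
        docFreq.put("the", 950);
        docFreq.put("lucene", 12);
        docFreq.put("of", 900);

        Split split = split(docFreq, 1000, 0.25f);
        System.out.println("low=" + split.lowFreq + " high=" + split.highFreq);
    }
}
```

With these made-up frequencies and a threshold of 0.25, only "lucene" falls into the low-frequency group; "the" and "of" would be evaluated only for documents that already match it.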

  • Field Details

    • terms

      protected final List<Term> terms
    • maxTermFrequency

      protected final float maxTermFrequency
    • lowFreqOccur

      protected final BooleanClause.Occur lowFreqOccur
    • highFreqOccur

      protected final BooleanClause.Occur highFreqOccur
    • lowFreqBoost

      protected float lowFreqBoost
    • highFreqBoost

      protected float highFreqBoost
    • lowFreqMinNrShouldMatch

      protected float lowFreqMinNrShouldMatch
    • highFreqMinNrShouldMatch

      protected float highFreqMinNrShouldMatch
  • Constructor Details

  • Method Details

    • add

      public void add(Term term)
      Adds a term to the CommonTermsQuery.
      Parameters:
      term - the term to add
    • rewrite

      public Query rewrite(IndexReader reader) throws IOException
      Description copied from class: Query
      Expert: called to re-write queries into primitive queries. For example, a PrefixQuery will be rewritten into a BooleanQuery that consists of TermQuerys.

      Callers are expected to call rewrite multiple times if necessary, until the rewritten query is the same as the original query.
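
      The repeated-rewrite contract above can be sketched as a fixed-point loop. This is an illustration only: `RewriteLoopSketch` and `rewriteFully` are hypothetical names, the query is modeled as a plain generic value rather than a Lucene Query, and "the same" is assumed to mean the same instance.

```java
import java.util.function.UnaryOperator;

final class RewriteLoopSketch {

    /**
     * Applies rewriteOnce repeatedly until it returns the instance it was given.
     * Assumption: an unchanged query is signalled by returning the same object.
     */
    static <Q> Q rewriteFully(Q query, UnaryOperator<Q> rewriteOnce) {
        Q current = query;
        while (true) {
            Q next = rewriteOnce.apply(current);
            if (next == current) {
                return current; // nothing left to rewrite
            }
            current = next;
        }
    }

    public static void main(String[] args) {
        // Toy "rewrite" step: strip one trailing '*' per pass, or return the input unchanged.
        UnaryOperator<String> stripOneStar =
                s -> s.endsWith("*") ? s.substring(0, s.length() - 1) : s;
        System.out.println(rewriteFully("foo**", stripOneStar));
    }
}
```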

      Overrides:
      rewrite in class Query
      Throws:
      IOException
    • visit

      public void visit(QueryVisitor visitor)
      Description copied from class: Query
      Recurse through the query tree, visiting any child queries
      Specified by:
      visit in class Query
      Parameters:
      visitor - a QueryVisitor to be called by each query in the tree
    • calcLowFreqMinimumNumberShouldMatch

      protected int calcLowFreqMinimumNumberShouldMatch(int numOptional)
    • calcHighFreqMinimumNumberShouldMatch

      protected int calcHighFreqMinimumNumberShouldMatch(int numOptional)
    • minNrShouldMatch

      private final int minNrShouldMatch(float minNrShouldMatch, int numOptional)
    • buildQuery

      protected Query buildQuery(int maxDoc, TermStates[] contextArray, Term[] queryTerms)
    • collectTermStates

      public void collectTermStates(IndexReader reader, List<LeafReaderContext> leaves, TermStates[] contextArray, Term[] queryTerms) throws IOException
      Throws:
      IOException
    • setLowFreqMinimumNumberShouldMatch

      public void setLowFreqMinimumNumberShouldMatch(float min)
      Specifies the minimum number of optional low-frequency BooleanClauses that must be satisfied in order to produce a match on the low-frequency terms query part. This method accepts a float value in the range [0..1) as a fraction of the actual query terms in the low-frequency clause, or a number >=1 as an absolute number of clauses that need to match.

      By default no optional clauses are necessary for a match (unless there are no required clauses). If this method is used, then the specified number of clauses is required.
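
      One plausible reading of the fraction-versus-absolute contract above is sketched below. The names `MinShouldMatchSketch` and `resolve` are hypothetical, and rounding the fractional case to the nearest integer is an assumption of this sketch; the description above does not specify the rounding, and values outside the documented range (such as negatives) are not handled.

```java
final class MinShouldMatchSketch {

    /**
     * Converts the configured value into a clause count.
     * Assumptions: 0 means "no minimum", values >= 1 are absolute counts
     * (truncated to int), and values in (0, 1) are a fraction of numOptional
     * rounded to the nearest integer.
     */
    static int resolve(float min, int numOptional) {
        if (min >= 1.0f || min == 0.0f) {
            return (int) min;
        }
        return Math.round(min * numOptional);
    }

    public static void main(String[] args) {
        System.out.println(resolve(0.5f, 4)); // fraction: half of 4 optional clauses -> 2
        System.out.println(resolve(3f, 4));   // absolute: 3 clauses
        System.out.println(resolve(0f, 4));   // default: no minimum -> 0
    }
}
```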

      Parameters:
      min - the number of optional clauses that must match
    • getLowFreqMinimumNumberShouldMatch

      public float getLowFreqMinimumNumberShouldMatch()
      Gets the minimum number of optional low-frequency BooleanClauses that must be satisfied.
    • setHighFreqMinimumNumberShouldMatch

      public void setHighFreqMinimumNumberShouldMatch(float min)
      Specifies the minimum number of optional high-frequency BooleanClauses that must be satisfied in order to produce a match on the high-frequency terms query part. This method accepts a float value in the range [0..1) as a fraction of the actual query terms in the high-frequency clause, or a number >=1 as an absolute number of clauses that need to match.

      By default no optional clauses are necessary for a match (unless there are no required clauses). If this method is used, then the specified number of clauses is required.

      Parameters:
      min - the number of optional clauses that must match
    • getHighFreqMinimumNumberShouldMatch

      public float getHighFreqMinimumNumberShouldMatch()
      Gets the minimum number of optional high-frequency BooleanClauses that must be satisfied.
    • getTerms

      public List<Term> getTerms()
      Gets the list of terms.
    • getMaxTermFrequency

      public float getMaxTermFrequency()
      Gets the maximum threshold of a term's document frequency for it to be considered a low-frequency term.
    • getLowFreqOccur

      public BooleanClause.Occur getLowFreqOccur()
      Gets the BooleanClause.Occur used for low frequency terms.
    • getHighFreqOccur

      public BooleanClause.Occur getHighFreqOccur()
      Gets the BooleanClause.Occur used for high frequency terms.
    • getLowFreqBoost

      public float getLowFreqBoost()
      Gets the boost used for low frequency terms.
    • getHighFreqBoost

      public float getHighFreqBoost()
      Gets the boost used for high frequency terms.
    • toString

      public String toString(String field)
      Description copied from class: Query
      Prints a query to a string, with field assumed to be the default field and omitted.
      Specified by:
      toString in class Query
    • hashCode

      public int hashCode()
      Description copied from class: Query
      Override and implement query hash code properly in a subclass. This is required so that QueryCache works properly.
      Specified by:
      hashCode in class Query
    • equals

      public boolean equals(Object other)
      Description copied from class: Query
      Override and implement query instance equivalence properly in a subclass. This is required so that QueryCache works properly.

      Typically a query will be equal to another only if it is an instance of the same class and its document-filtering properties are identical to those of the other instance. Utility methods are provided for certain repetitive code.

      Specified by:
      equals in class Query
    • equalsTo

      private boolean equalsTo(CommonTermsQuery other)
    • newTermQuery

      protected Query newTermQuery(Term term, TermStates termStates)
      Builds a new TermQuery instance.

      This is intended for subclasses that wish to customize the generated queries.

      Parameters:
      term - term
      termStates - the TermStates to be used to create the low level term query. Can be null.
      Returns:
      new TermQuery instance