Package org.apache.lucene.misc.search
Class DiversifiedTopDocsCollector
java.lang.Object
org.apache.lucene.search.TopDocsCollector<DiversifiedTopDocsCollector.ScoreDocKey>
org.apache.lucene.misc.search.DiversifiedTopDocsCollector
- All Implemented Interfaces:
Collector
public abstract class DiversifiedTopDocsCollector
extends TopDocsCollector<DiversifiedTopDocsCollector.ScoreDocKey>
A TopDocsCollector that controls diversity in results by ensuring no more than maxHitsPerKey results from a common source are collected in the final results.
An example application might be a product search in a marketplace where no more than 3 results per retailer are permitted in search results.
To compare behaviour with other forms of collector, a useful analogy might be the problem of making a compilation album of 1967's top hit records:
- A vanilla query's results might look like a "Best of the Beatles" album - high quality but not much diversity
- A GroupingSearch would produce the equivalent of "The 10 top-selling artists of 1967 - some killer and quite a lot of filler"
- A "diversified" query would be the top 20 hit records of that year - with a max of 3 Beatles hits in order to maintain diversity
This collector improves on GroupingSearch-style queries by:
- Working in one pass over the data
- Not requiring the client to guess how many groups are required
- Removing low-scoring "filler" which sits at the end of each group's hits
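The per-key cap described above can be illustrated with a small, self-contained sketch. It is plain Java with no Lucene types, and every name in it (DiversifySketch, Hit, diversify) is hypothetical. It sorts all hits first and then applies the cap, so it only shows the kind of result a diversified query aims for; it is not the collector's one-pass implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: plain Java, no Lucene types. All names here are hypothetical.
class DiversifySketch {

    /** Hypothetical stand-in for a scored hit that carries a grouping key. */
    static final class Hit {
        final int doc;
        final float score;
        final long key; // e.g. a retailer id

        Hit(int doc, float score, long key) {
            this.doc = doc;
            this.score = score;
            this.key = key;
        }
    }

    /** Returns up to numHits hits, best score first, with at most maxHitsPerKey per key. */
    static List<Hit> diversify(List<Hit> hits, int numHits, int maxHitsPerKey) {
        List<Hit> sorted = new ArrayList<>(hits);
        // Highest score first.
        sorted.sort((a, b) -> Float.compare(b.score, a.score));

        Map<Long, Integer> perKeyCount = new HashMap<>();
        List<Hit> result = new ArrayList<>();
        for (Hit hit : sorted) {
            if (result.size() >= numHits) {
                break; // the result list is full
            }
            int seen = perKeyCount.getOrDefault(hit.key, 0);
            if (seen < maxHitsPerKey) {
                // This key has not reached its cap yet, so keep the hit.
                result.add(hit);
                perKeyCount.put(hit.key, seen + 1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(0, 9.0f, 1L));
        hits.add(new Hit(1, 8.0f, 1L));
        hits.add(new Hit(2, 7.0f, 1L)); // third hit for key 1, dropped when the cap is 2
        hits.add(new Hit(3, 6.0f, 2L));
        hits.add(new Hit(4, 5.0f, 3L));
        for (Hit h : diversify(hits, 3, 2)) {
            System.out.println("doc=" + h.doc + " key=" + h.key + " score=" + h.score);
        }
    }
}
```

With numHits = 3 and maxHitsPerKey = 2, the sketch keeps docs 0, 1 and 3: the third hit for key 1 is skipped even though it outscores the hit for key 2.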
Nested Class Summary
- static class DiversifiedTopDocsCollector.ScoreDocKey: An extension to ScoreDoc that includes a key used for grouping purposes
- (package private) static class
Field Summary
- protected int maxNumPerKey
- private int numHits
- (package private) DiversifiedTopDocsCollector.ScoreDocKey spare
Fields inherited from class org.apache.lucene.search.TopDocsCollector
EMPTY_TOPDOCS, pq, totalHits, totalHitsRelation
Constructor Summary
- DiversifiedTopDocsCollector(int numHits, int maxHitsPerKey)
Method Summary
- protected abstract NumericDocValues getKeys(LeafReaderContext context): Get a source of values used for grouping keys
- LeafCollector getLeafCollector(LeafReaderContext context): Create a new collector to collect the given context.
- protected DiversifiedTopDocsCollector.ScoreDocKey insert(DiversifiedTopDocsCollector.ScoreDocKey addition, int docBase, NumericDocValues keys)
- protected TopDocs newTopDocs(ScoreDoc[] results, int start): Returns a TopDocs instance containing the given results.
- private void perKeyGroupRemove(DiversifiedTopDocsCollector.ScoreDocKey globalOverflow)
- ScoreMode scoreMode(): Indicates what features are required from the scorer.
Methods inherited from class org.apache.lucene.search.TopDocsCollector
getTotalHits, populateResults, topDocs, topDocs, topDocs, topDocsSize
Field Details
- spare
- globalQueue
- numHits: private int numHits
- perKeyQueues
- maxNumPerKey: protected int maxNumPerKey
- sparePerKeyQueues
Constructor Details
DiversifiedTopDocsCollector
public DiversifiedTopDocsCollector(int numHits, int maxHitsPerKey)
Method Details
getKeys
Get a source of values used for grouping keys
scoreMode
Description copied from interface: Collector
Indicates what features are required from the scorer.
newTopDocs
Description copied from class: TopDocsCollector
Returns a TopDocs instance containing the given results. If results is null it means there are no results to return, either because there were 0 calls to collect() or because the arguments to topDocs were invalid.
- Overrides:
newTopDocs in class TopDocsCollector<DiversifiedTopDocsCollector.ScoreDocKey>
insert
protected DiversifiedTopDocsCollector.ScoreDocKey insert(DiversifiedTopDocsCollector.ScoreDocKey addition, int docBase, NumericDocValues keys) throws IOException
- Throws:
IOException
perKeyGroupRemove
getLeafCollector
Description copied from interface: Collector
Create a new collector to collect the given context.
- Parameters:
context - next atomic reader context
- Throws:
IOException