public class TrecContentSource extends ContentSource
A ContentSource over the TREC collection.
Supports the following configuration parameters (on top of those of ContentSource):
TrecDocParser class to use for parsing the content of the TREC documents (default=TrecGov2Parser).
HTMLParser class to use for parsing the HTML parts of the TREC documents (default=DemoHTMLParser).
Modifier and Type | Field and Description |
---|---|
static java.lang.String | DOC |
static java.lang.String | DOCNO |
static java.lang.String | NEW_LINE: separator between lines in the buffer |
static java.lang.String | TERMINATING_DOC |
static java.lang.String | TERMINATING_DOCNO |

Inherited fields: encoding, forever, logStep, verbose
Constructor and Description |
---|
TrecContentSource() |
Modifier and Type | Method and Description |
---|---|
void | close(): Called when reading from this content source is no longer required. |
DocData | getNextDocData(DocData docData): Returns the next DocData from the content source. |
java.util.Date | parseDate(java.lang.String dateStr) |
void | resetInputs(): Resets the input for this content source, so that the test behaves as if it had just started, input-wise. |
void | setConfig(Config config): Sets the Config for this content source. |

Inherited methods: addBytes, addItem, collectFiles, getBytesCount, getConfig, getItemsCount, getTotalBytesCount, getTotalItemsCount, printStatistics, shouldLog
public static final java.lang.String DOCNO
public static final java.lang.String TERMINATING_DOCNO
public static final java.lang.String DOC
public static final java.lang.String TERMINATING_DOC
public static final java.lang.String NEW_LINE
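A minimal sketch of how tag constants like these might be used to pull a document number out of buffered TREC text. This page does not list the constants' values, so the values below (`<DOC>`, `<DOCNO>`, and their closing forms, plus the platform line separator for `NEW_LINE`) are assumptions, and the class `TrecTagSketch` and its method `extractDocNo` are hypothetical names, not part of the library.

```java
// Hypothetical illustration; this is not the TrecContentSource implementation.
final class TrecTagSketch {
    // Assumed values: the reference above does not state what the constants contain.
    static final String DOC = "<DOC>";
    static final String TERMINATING_DOC = "</DOC>";
    static final String DOCNO = "<DOCNO>";
    static final String TERMINATING_DOCNO = "</DOCNO>";
    static final String NEW_LINE = System.lineSeparator();

    /**
     * Returns the trimmed text between DOCNO and TERMINATING_DOCNO,
     * or null if either tag is missing from the buffer.
     */
    static String extractDocNo(String buffer) {
        int start = buffer.indexOf(DOCNO);
        if (start < 0) {
            return null;
        }
        start += DOCNO.length();
        int end = buffer.indexOf(TERMINATING_DOCNO, start);
        if (end < 0) {
            return null;
        }
        return buffer.substring(start, end).trim();
    }

    public static void main(String[] args) {
        // Build a tiny record, one tag per line, joined with NEW_LINE.
        String record = String.join(NEW_LINE,
                DOC,
                DOCNO + " SAMPLE-0001 " + TERMINATING_DOCNO,
                "body text",
                TERMINATING_DOC);
        System.out.println(extractDocNo(record));
    }
}
```

The helper returns null instead of throwing when a tag is absent, which keeps the sketch short; a real parser would likely need to distinguish a truncated record from the end of input.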
public java.util.Date parseDate(java.lang.String dateStr)
public void close() throws java.io.IOException
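Description copied from class: ContentItemsSource
Called when reading from this content source is no longer required.
close in interface java.io.Closeable
close in interface java.lang.AutoCloseable
close in class ContentItemsSource
Throws: java.io.IOException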
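Because close() is declared through java.io.Closeable and java.lang.AutoCloseable, a source of this kind can be managed with try-with-resources. The sketch below shows the pattern on a hypothetical stand-in class, `ClosableSourceSketch`; it models only the close contract and, unlike the method documented here, its close() declares no checked exception so that the example stays compact.

```java
import java.io.Closeable;

// Hypothetical stand-in for a content source; only the Closeable contract is modeled.
final class ClosableSourceSketch implements Closeable {
    private boolean closed = false;

    boolean isClosed() {
        return closed;
    }

    // Narrowed signature: this sketch's close() throws nothing.
    // The documented close() declares java.io.IOException.
    @Override
    public void close() {
        closed = true; // a real source would release its input streams here
    }

    /** Creates a source, uses it inside try-with-resources, and returns it once closed. */
    static ClosableSourceSketch useAndClose() {
        ClosableSourceSketch source = new ClosableSourceSketch();
        try (ClosableSourceSketch s = source) {
            // ... read documents from s here ...
        }
        return source; // close() has already run at this point
    }
}
```

With the real class, the try block would also need to handle or propagate java.io.IOException.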
public DocData getNextDocData(DocData docData) throws NoMoreDataException, java.io.IOException
Description copied from class: ContentSource
Returns the next DocData from the content source.
Implementations must account for multi-threading, as multiple threads can call this method simultaneously.
getNextDocData in class ContentSource
Throws: NoMoreDataException
Throws: java.io.IOException
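One simple way to meet the multi-threading requirement is to synchronize the method that hands out documents, so that each document is returned exactly once. The sketch below is an illustration under assumptions, not the library's code: `ThreadSafeSourceSketch`, `getNext`, and `drainConcurrently` are hypothetical names, documents are plain strings, and exhaustion is signalled with null rather than NoMoreDataException.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical source over an in-memory array; not the TrecContentSource implementation.
final class ThreadSafeSourceSketch {
    private final String[] docs;
    private int next = 0; // guarded by the monitor of this object

    ThreadSafeSourceSketch(String[] docs) {
        this.docs = docs;
    }

    /** Returns the next document, or null when the source is exhausted. */
    synchronized String getNext() {
        return next < docs.length ? docs[next++] : null;
    }

    /** Drains the source from several threads and returns everything that was read. */
    static List<String> drainConcurrently(ThreadSafeSourceSketch source, int threadCount) {
        List<String> seen = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> {
                String doc;
                while ((doc = source.getNext()) != null) {
                    seen.add(doc);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for readers", e);
            }
        }
        return seen;
    }
}
```

Without the synchronized keyword, two threads could read the same index before either increments it, so one document would be returned twice and another skipped.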
public void resetInputs() throws java.io.IOException
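Description copied from class: ContentItemsSource
Resets the input for this content source, so that the test behaves as if it had just started, input-wise.
NOTE: the default implementation resets the number of bytes and items generated since the last reset, so it is important to call super.resetInputs if you override this method.
resetInputs in class ContentItemsSource
Throws: java.io.IOException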
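The NOTE about calling super.resetInputs can be illustrated with a small sketch. Both classes below are hypothetical: `CountingSourceSketch` models only the since-last-reset counters, with method names borrowed from the inherited-method list but with assumed signatures, and `RewindableSourceSketch` is a made-up subclass that adds its own read position.

```java
// Hypothetical base class modeling only the counters that the default reset clears.
class CountingSourceSketch {
    private long bytesSinceReset = 0;
    private int itemsSinceReset = 0;

    void addItem(int byteCount) {
        itemsSinceReset++;
        bytesSinceReset += byteCount;
    }

    long getBytesCount() {
        return bytesSinceReset;
    }

    int getItemsCount() {
        return itemsSinceReset;
    }

    /** Default behavior: clear the counters accumulated since the last reset. */
    void resetInputs() {
        bytesSinceReset = 0;
        itemsSinceReset = 0;
    }
}

// Hypothetical subclass with extra state that must also be rewound.
class RewindableSourceSketch extends CountingSourceSketch {
    private final String[] docs;
    private int position = 0;

    RewindableSourceSketch(String... docs) {
        this.docs = docs;
    }

    /** Returns the next document and records its size, or null when exhausted. */
    String read() {
        if (position >= docs.length) {
            return null;
        }
        String doc = docs[position++];
        addItem(doc.length()); // character count used as a stand-in for bytes
        return doc;
    }

    @Override
    void resetInputs() {
        super.resetInputs(); // keeps the inherited counters consistent
        position = 0;        // then rewind this subclass's own state
    }
}
```

If the override omitted the super call, the read position would rewind but the counters would keep growing across runs.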
public void setConfig(Config config)
Description copied from class: ContentItemsSource
Sets the Config for this content source. If you override this method, you must call super.setConfig.
setConfig in class ContentItemsSource
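A sketch of the override rule for setConfig, under stated assumptions: java.util.Properties stands in for the Config class, both classes are hypothetical, and the property key `sketch.doc.parser` is invented for illustration. Only the fallback value mirrors the TrecGov2Parser default documented for this class.

```java
import java.util.Properties;

// Hypothetical base class: stores the configuration so getConfig() can return it.
class ConfigurableSourceSketch {
    private Properties config;

    void setConfig(Properties config) {
        this.config = config;
    }

    Properties getConfig() {
        return config;
    }
}

// Hypothetical subclass that reads one extra setting of its own.
class ParserAwareSourceSketch extends ConfigurableSourceSketch {
    private String parserName;

    @Override
    void setConfig(Properties config) {
        super.setConfig(config); // without this call, getConfig() would return null
        // "sketch.doc.parser" is a made-up key, not a documented parameter name.
        parserName = config.getProperty("sketch.doc.parser", "TrecGov2Parser");
    }

    String getParserName() {
        return parserName;
    }
}
```

Calling the superclass first means the inherited state is already in place by the time the subclass reads its own settings.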
Copyright © 2000–2019 The Apache Software Foundation. All rights reserved.