Class BlockReader
java.lang.Object
  org.apache.lucene.index.TermsEnum
    org.apache.lucene.index.BaseTermsEnum
      org.apache.lucene.codecs.uniformsplit.BlockReader
All Implemented Interfaces:
Accountable, BytesRefIterator
Direct Known Subclasses:
IntersectBlockReader, STBlockReader
public class BlockReader extends BaseTermsEnum implements Accountable
Seeks the block corresponding to a given term, reads the block bytes, and scans the block terms.
Reads the whole block into blockReadBuffer, then scans the block terms in memory. The details region is decoded lazily with termStatesReadBuffer, which shares the same byte array as blockReadBuffer. See BlockWriter and BlockLine for the block format.
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.lucene.index.TermsEnum
TermsEnum.SeekStatus
Field Summary
private static long BASE_RAM_USAGE
protected BlockDecoder blockDecoder
protected int blockFirstLineStart: Offset of the start of the first line of the current block (just after the header), relative to the block start.
protected BlockHeader blockHeader: Current block header.
protected IndexInput blockInput: IndexInput on the block file.
protected BlockLine blockLine: Current block line.
protected BlockLine.Serializer blockLineReader
protected ByteArrayDataInput blockReadBuffer: In-memory read buffer for the current block.
protected long blockStartFP: Current block start file pointer, absolute in the block file.
protected IndexDictionary.Browser dictionaryBrowser: Holds the IndexDictionary.Browser once loaded.
protected java.util.function.Supplier<IndexDictionary.Browser> dictionaryBrowserSupplier: IndexDictionary.Browser supplier for lazy loading.
protected FieldMetadata fieldMetadata
protected BytesRefBuilder forcedTerm: Set when seekExact(BytesRef, TermState) is called.
protected int lineIndexInBlock: Current line index in the block.
protected PostingsReaderBase postingsReader
protected BytesRef scratchBlockBytes
protected BlockTermState scratchTermState
protected BlockTermState termState: Current block line details.
protected boolean termStateForced: Whether the current TermState has been forced with a call to seekExact(BytesRef, TermState).
protected DeltaBaseTermStateSerializer termStateSerializer
protected ByteArrayDataInput termStatesReadBuffer: In-memory read buffer for the details region of the current block.
-
Constructor Summary
protected BlockReader(java.util.function.Supplier<IndexDictionary.Browser> dictionaryBrowserSupplier, IndexInput blockInput, PostingsReaderBase postingsReader, FieldMetadata fieldMetadata, BlockDecoder blockDecoder)
-
Method Summary
protected void clearTermState()
protected int compareToMiddleAndJump(BytesRef searchedTerm): Compares the searched term to the middle term of the block.
protected BytesRef decodeBlockBytesIfNeeded(int numBlockBytes)
int docFreq(): Returns the number of documents containing the current term.
protected IndexDictionary.Browser getOrCreateDictionaryBrowser()
ImpactsEnum impacts(int flags): Returns an ImpactsEnum.
protected void initializeBlockReadLazily()
protected void initializeHeader(BytesRef searchedTerm, long targetBlockStartFP): Reads and sets blockHeader.
protected boolean isBeyondLastTerm(BytesRef searchedTerm, long blockStartFP): Indicates whether the searched term is beyond the last term of the field.
protected boolean isCurrentTerm(BytesRef searchedTerm)
BytesRef next(): Increments the iteration to the next BytesRef in the iterator.
protected BytesRef nextTerm(): Moves to the next term line and reads it; it may be in the next block.
long ord(): Returns the ordinal position of the current term.
PostingsEnum postings(PostingsEnum reuse, int flags): Gets a PostingsEnum for the current term, with control over whether freqs, positions, offsets or payloads are required.
long ramBytesUsed(): Returns the memory usage of this object in bytes.
protected BlockHeader readHeader(): Reads the block header.
protected BlockLine readLineInBlock(): Reads the current block line.
protected BlockTermState readTermState(): Reads the BlockTermState on the current line.
protected BlockTermState readTermStateIfNotRead(): Reads the BlockTermState if it is not already set.
TermsEnum.SeekStatus seekCeil(BytesRef searchedTerm): Seeks to the specified term, if it exists, or to the next (ceiling) term.
void seekExact(long ord): Not supported.
boolean seekExact(BytesRef searchedTerm): Attempts to seek to the exact term, returning true if the term is found.
void seekExact(BytesRef term, TermState state): Positions this BlockReader without re-seeking the term dictionary.
protected TermsEnum.SeekStatus seekInBlock(BytesRef searchedTerm): Seeks to the provided term in this block.
protected TermsEnum.SeekStatus seekInBlock(BytesRef searchedTerm, long blockStartFP): Seeks to the provided term in the block starting at the provided file pointer.
BytesRef term(): Returns the current term.
TermState termState(): Expert: Returns the TermsEnum's internal state to position the TermsEnum without re-seeking the term dictionary.
long totalTermFreq(): Returns the total number of occurrences of this term across all documents (the sum of the freq() for each doc that has this term).
-
Methods inherited from class org.apache.lucene.index.BaseTermsEnum
attributes
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.lucene.util.Accountable
getChildResources
Field Detail
-
BASE_RAM_USAGE
private static final long BASE_RAM_USAGE
-
blockInput
protected IndexInput blockInput
IndexInput on the block file.
-
postingsReader
protected final PostingsReaderBase postingsReader
-
fieldMetadata
protected final FieldMetadata fieldMetadata
-
blockDecoder
protected final BlockDecoder blockDecoder
-
blockLineReader
protected BlockLine.Serializer blockLineReader
-
blockReadBuffer
protected ByteArrayDataInput blockReadBuffer
In-memory read buffer for the current block.
-
termStatesReadBuffer
protected ByteArrayDataInput termStatesReadBuffer
In-memory read buffer for the details region of the current block. It shares the same byte array as blockReadBuffer, with a different position.
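The idea of two read buffers over one byte array, each with its own position, can be sketched with `java.nio.ByteBuffer`, whose `duplicate()` shares the content but keeps an independent position. This is a hypothetical stand-in for the two `ByteArrayDataInput` fields; the layout (terms region first, then a details region starting at `detailsStart`) is an assumption for illustration only.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: ByteBuffer stands in for ByteArrayDataInput.
// Assumed layout for illustration: [terms region][details region starting at detailsStart].
final class SharedBufferSketch {
    static int[] readBoth(byte[] block, int detailsStart) {
        ByteBuffer blockRead = ByteBuffer.wrap(block);        // cursor over the terms region
        ByteBuffer termStatesRead = blockRead.duplicate();    // same bytes, independent position
        termStatesRead.position(detailsStart);
        int firstTermByte = blockRead.get();                  // advances only blockRead
        int firstDetailByte = termStatesRead.get();           // advances only termStatesRead
        return new int[] {firstTermByte, firstDetailByte, blockRead.position(), termStatesRead.position()};
    }
}
```

Reading the block once and positioning a second cursor on the same array avoids copying the details region into a separate buffer.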
-
termStateSerializer
protected DeltaBaseTermStateSerializer termStateSerializer
-
dictionaryBrowserSupplier
protected final java.util.function.Supplier<IndexDictionary.Browser> dictionaryBrowserSupplier
IndexDictionary.Browser supplier for lazy loading.
-
dictionaryBrowser
protected IndexDictionary.Browser dictionaryBrowser
Holds the IndexDictionary.Browser once loaded.
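The pairing of dictionaryBrowserSupplier and dictionaryBrowser follows a get-or-create pattern, which is presumably what getOrCreateDictionaryBrowser() does. A minimal sketch, assuming a generic `T` in place of `IndexDictionary.Browser` (`LazyHolder` is a hypothetical name):

```java
import java.util.function.Supplier;

// Hypothetical sketch of lazy loading; T stands in for IndexDictionary.Browser.
final class LazyHolder<T> {
    private final Supplier<T> supplier;
    private T value; // null until first requested

    LazyHolder(Supplier<T> supplier) {
        this.supplier = supplier;
    }

    T getOrCreate() {
        if (value == null) {
            value = supplier.get(); // the (possibly costly) load happens on first use only
        }
        return value;
    }
}
```

The sketch does no synchronization, on the assumption that, like other term enums, a single instance is used by one thread at a time.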
-
blockStartFP
protected long blockStartFP
Current block start file pointer, absolute in the block file.
-
blockHeader
protected BlockHeader blockHeader
Current block header.
-
blockLine
protected BlockLine blockLine
Current block line.
-
termState
protected BlockTermState termState
Current block line details.
-
blockFirstLineStart
protected int blockFirstLineStart
Offset of the start of the first line of the current block (just after the header), relative to the block start.
-
lineIndexInBlock
protected int lineIndexInBlock
Current line index in the block.
-
termStateForced
protected boolean termStateForced
Whether the current TermState has been forced with a call to seekExact(BytesRef, TermState).
- See Also:
forcedTerm
-
forcedTerm
protected BytesRefBuilder forcedTerm
Set when seekExact(BytesRef, TermState) is called. This optimizes the use case where the caller first calls seekExact(BytesRef, TermState) and then postings(PostingsEnum, int). In this case the terms block file is not accessed (no seek); the postings file is accessed directly, because the TermState already holds the file pointers to the postings file.
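The forced-state optimization can be sketched by counting simulated block reads. This is a hypothetical model, not Lucene code: a `long` stands in for the TermState's postings file pointer, and the field names only echo termStateForced. It shows that postings() after a forced seek touches no block, while next() triggers a lazy block read.

```java
// Hypothetical sketch; a long stands in for the TermState's postings file pointer.
final class ForcedStateSketch {
    int blockReads = 0;            // number of simulated block-file reads
    private long postingsFP = -1;  // file pointer into the (simulated) postings file
    private boolean termStateForced;

    // Like seekExact(BytesRef, TermState): trust the caller's state, do not read the block.
    void seekExact(String term, long statePostingsFP) {
        postingsFP = statePostingsFP;
        termStateForced = true;
    }

    // Like postings(...): the forced state already has the pointer, so no block read.
    long postings() {
        return postingsFP;
    }

    // Like next(): iterating needs the block contents, so the block is read lazily here.
    void next() {
        if (termStateForced) {
            blockReads++;
            termStateForced = false;
        }
        // moving to the following line is omitted in this sketch
    }
}
```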
-
scratchBlockBytes
protected BytesRef scratchBlockBytes
-
scratchTermState
protected final BlockTermState scratchTermState
Constructor Detail
-
BlockReader
protected BlockReader(java.util.function.Supplier<IndexDictionary.Browser> dictionaryBrowserSupplier, IndexInput blockInput, PostingsReaderBase postingsReader, FieldMetadata fieldMetadata, BlockDecoder blockDecoder) throws java.io.IOException
- Parameters:
dictionaryBrowserSupplier - to load the IndexDictionary.Browser lazily in seekCeil(BytesRef).
blockDecoder - Optional block decoder; may be null if none. It can be used for decompression or decryption.
- Throws:
java.io.IOException
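An optional decoder means the raw block bytes are used as-is when no decoder is given, and are decoded first otherwise; this is presumably the role of decodeBlockBytesIfNeeded. A minimal sketch, assuming `UnaryOperator<byte[]>` as a stand-in for BlockDecoder and `java.util.zip` Deflater/Inflater as an example of compression a decoder might undo:

```java
import java.io.ByteArrayOutputStream;
import java.util.function.UnaryOperator;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch: UnaryOperator<byte[]> stands in for BlockDecoder.
final class OptionalDecoderSketch {
    // No decoder: use the block bytes as they are; otherwise decode them first.
    static byte[] decodeIfNeeded(byte[] blockBytes, UnaryOperator<byte[]> decoder) {
        return decoder == null ? blockBytes : decoder.apply(blockBytes);
    }

    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && inflater.needsInput()) {
                    throw new IllegalArgumentException("truncated compressed block");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed block", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```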
Method Detail
-
seekCeil
public TermsEnum.SeekStatus seekCeil(BytesRef searchedTerm) throws java.io.IOException
Description copied from class: TermsEnum
Seeks to the specified term, if it exists, or to the next (ceiling) term. Returns a SeekStatus to indicate whether the exact term was found, a different term was found, or EOF was hit. The target term may be before or after the current term. If this returns SeekStatus.END, the enum is unpositioned.
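The three seek outcomes and the "unpositioned" rule can be sketched over a sorted array. The enum mirrors the constant names of TermsEnum.SeekStatus, but the class is hypothetical; seekExact here simply reuses seekCeil, whereas a codec may implement it faster.

```java
import java.util.Arrays;

// Hypothetical sketch of seekCeil/seekExact semantics over sorted, unique terms.
final class SeekCeilSketch {
    enum SeekStatus { FOUND, NOT_FOUND, END } // same constant names as TermsEnum.SeekStatus

    private final String[] terms;
    int position = -1; // -1 means unpositioned

    SeekCeilSketch(String... sortedTerms) {
        this.terms = sortedTerms;
    }

    SeekStatus seekCeil(String target) {
        int i = Arrays.binarySearch(terms, target);
        if (i >= 0) {
            position = i;
            return SeekStatus.FOUND;
        }
        int ceiling = -i - 1; // index of the first term greater than target
        if (ceiling == terms.length) {
            position = -1; // END leaves the enum unpositioned
            return SeekStatus.END;
        }
        position = ceiling;
        return SeekStatus.NOT_FOUND;
    }

    boolean seekExact(String target) {
        boolean found = seekCeil(target) == SeekStatus.FOUND;
        if (!found) {
            position = -1; // false leaves the enum unpositioned
        }
        return found;
    }

    String term() {
        return terms[position]; // must not be called when unpositioned
    }
}
```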
-
seekExact
public boolean seekExact(BytesRef searchedTerm) throws java.io.IOException
Description copied from class: TermsEnum
Attempts to seek to the exact term, returning true if the term is found. If this returns false, the enum is unpositioned. For some codecs, seekExact may be substantially faster than TermsEnum.seekCeil(org.apache.lucene.util.BytesRef).
- Overrides:
seekExact in class BaseTermsEnum
- Returns:
- true if the term is found; false otherwise, in which case the enum is unpositioned.
- Throws:
java.io.IOException
-
isCurrentTerm
protected boolean isCurrentTerm(BytesRef searchedTerm)
-
isBeyondLastTerm
protected boolean isBeyondLastTerm(BytesRef searchedTerm, long blockStartFP)
Indicates whether the searched term is beyond the last term of the field.
- Parameters:
blockStartFP - The current block start file pointer.
-
seekInBlock
protected TermsEnum.SeekStatus seekInBlock(BytesRef searchedTerm, long blockStartFP) throws java.io.IOException
Seeks to the provided term in the block starting at the provided file pointer. Does not exceed the block.
- Throws:
java.io.IOException
-
seekInBlock
protected TermsEnum.SeekStatus seekInBlock(BytesRef searchedTerm) throws java.io.IOException
Seeks to the provided term in this block. Does not exceed this block; TermsEnum.SeekStatus.END is returned if the term follows the block.
Compares the line terms with the searchedTerm, taking advantage of the incremental encoding properties. Scans the terms linearly and updates the current block line with the current term.
- Throws:
java.io.IOException
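One way a linear scan can exploit incremental encoding is to track how many leading characters of the previous term match the searched term, so many lines can be accepted or rejected from their shared-prefix length alone. The sketch below assumes a simple front-coding format (shared-prefix length plus suffix per line); that format and all names are hypothetical illustrations, not the actual BlockLine encoding or Lucene's algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: front-coded lines (shared prefix length + suffix), not the real BlockLine format.
final class IncrementalSeekSketch {
    static final class Line {
        final int sharedPrefixLen; // chars shared with the previous term
        final String suffix;
        Line(int sharedPrefixLen, String suffix) {
            this.sharedPrefixLen = sharedPrefixLen;
            this.suffix = suffix;
        }
    }

    static List<Line> encode(List<String> sortedTerms) {
        List<Line> lines = new ArrayList<>();
        String previous = "";
        for (String term : sortedTerms) {
            int p = commonPrefix(previous, term, 0);
            lines.add(new Line(p, term.substring(p)));
            previous = term;
        }
        return lines;
    }

    // Common prefix length of a and b, given that the first 'from' chars already match.
    static int commonPrefix(String a, String b, int from) {
        int i = from;
        int max = Math.min(a.length(), b.length());
        while (i < max && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }

    // Index of the first term >= target, or -1 if the target follows the block (END).
    static int seekCeilInBlock(List<Line> lines, String target) {
        StringBuilder current = new StringBuilder();
        int matched = 0; // common prefix length of the previous term and target (previous < target)
        for (int i = 0; i < lines.size(); i++) {
            Line line = lines.get(i);
            current.setLength(line.sharedPrefixLen); // rebuild the term from the previous one
            current.append(line.suffix);
            if (line.sharedPrefixLen > matched) {
                continue; // same char as the previous term at 'matched', so still < target
            }
            if (line.sharedPrefixLen < matched) {
                return i; // greater than the previous term at a position where it equals target
            }
            String term = current.toString();
            int p = commonPrefix(term, target, matched); // first 'matched' chars are known equal
            if (p == target.length()) {
                return i; // exact match, or target is a proper prefix of term
            }
            if (p < term.length() && term.charAt(p) > target.charAt(p)) {
                return i; // term > target
            }
            matched = p; // term < target
        }
        return -1;
    }
}
```

For the terms apple, apply, banana, band, bandana, can, seeking `"bandit"` skips `apply` without any character comparison and stops at `can` because its shared prefix (0) is shorter than the 4 characters already matched.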
-
compareToMiddleAndJump
protected int compareToMiddleAndJump(BytesRef searchedTerm) throws java.io.IOException
Compares the searched term to the middle term of the block. If the searched term is lexicographically equal to or after the middle term, jumps directly to the second half of the block.
- Returns:
- The comparison between the searched term and the middle term.
- Throws:
java.io.IOException
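The middle-term jump halves the worst-case linear scan for terms in the second half of a block. A hypothetical sketch over a sorted array; unlike compareToMiddleAndJump it returns the ceiling index rather than the comparison, and it assumes the middle line can be read without decoding the lines before it.

```java
// Hypothetical sketch: returns the index of the first term >= target, or -1 if none.
final class MiddleJumpSketch {
    static int seekInBlock(String[] block, String target) {
        int start = 0;
        int middle = block.length / 2; // assumption: the middle line is decodable on its own
        if (block.length > 0 && target.compareTo(block[middle]) >= 0) {
            start = middle; // equal to or after the middle term: skip the first half
        }
        for (int i = start; i < block.length; i++) { // linear scan from the chosen start
            if (block[i].compareTo(target) >= 0) {
                return i;
            }
        }
        return -1;
    }
}
```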
-
readLineInBlock
protected BlockLine readLineInBlock() throws java.io.IOException
Reads the current block line. Sets blockLine and increments lineIndexInBlock.
- Returns:
- The BlockLine; or null if there are no more lines in the block.
- Throws:
java.io.IOException
-
seekExact
public void seekExact(BytesRef term, TermState state)
Positions this BlockReader without re-seeking the term dictionary.
The block containing the term is not read by this method. It will be read lazily only if needed, for example if next() is called. Calling postings(org.apache.lucene.index.PostingsEnum, int) after this method does not require the block to be read.
- Overrides:
seekExact in class BaseTermsEnum
- Parameters:
term - the term the TermState corresponds to
state - the TermState
-
seekExact
public void seekExact(long ord)
Not supported.
-
next
public BytesRef next() throws java.io.IOException
Description copied from interface: BytesRefIterator
Increments the iteration to the next BytesRef in the iterator. Returns the resulting BytesRef or null if the end of the iterator is reached. The returned BytesRef may be re-used across calls to next. After this method returns null, do not call it again: the results are undefined.
- Specified by:
next in interface BytesRefIterator
- Returns:
- the next BytesRef in the iterator or null if the end of the iterator is reached.
- Throws:
java.io.IOException - If there is a low-level I/O error.
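Because the returned BytesRef may be re-used across calls, a caller that keeps terms must copy them (with Lucene's BytesRef this is typically BytesRef.deepCopyOf). The sketch below models the contract with a hypothetical iterator that reuses one StringBuilder, and shows the aliasing problem when values are not copied.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the BytesRefIterator contract: next() returns null at the end,
// and the returned object is reused across calls.
final class ReusingIteratorSketch {
    static final class TermIterator {
        private final String[] terms;
        private final StringBuilder reused = new StringBuilder();
        private int index = 0;

        TermIterator(String... terms) {
            this.terms = terms;
        }

        CharSequence next() {
            if (index == terms.length) return null;
            reused.setLength(0);
            reused.append(terms[index++]); // overwrite the shared buffer
            return reused;
        }
    }

    // Correct: copy each value before the next call overwrites it.
    static List<String> collect(TermIterator it) {
        List<String> out = new ArrayList<>();
        for (CharSequence term = it.next(); term != null; term = it.next()) {
            out.add(term.toString());
        }
        return out;
    }

    // Anti-pattern: every element aliases the reused buffer and shows the last term.
    static List<CharSequence> collectWithoutCopy(TermIterator it) {
        List<CharSequence> out = new ArrayList<>();
        for (CharSequence term = it.next(); term != null; term = it.next()) {
            out.add(term);
        }
        return out;
    }
}
```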
-
nextTerm
protected BytesRef nextTerm() throws java.io.IOException
Moves to the next term line and reads it; it may be in the next block. The term details are not read yet; they will be read only when needed, with readTermStateIfNotRead().
- Returns:
- The read term bytes; or null if there are no more terms for the field.
- Throws:
java.io.IOException
-
initializeHeader
protected void initializeHeader(BytesRef searchedTerm, long targetBlockStartFP) throws java.io.IOException
Reads and sets blockHeader. Sets it to null if there is no block for the field anymore.
- Parameters:
searchedTerm - The searched term; or null if none.
targetBlockStartFP - The file pointer of the block to read.
- Throws:
java.io.IOException
-
initializeBlockReadLazily
protected void initializeBlockReadLazily()
-
readHeader
protected BlockHeader readHeader() throws java.io.IOException
Reads the block header. Sets blockHeader.
- Returns:
- The block header; or null if there is no block for the field anymore.
- Throws:
java.io.IOException
-
decodeBlockBytesIfNeeded
protected BytesRef decodeBlockBytesIfNeeded(int numBlockBytes) throws java.io.IOException
- Throws:
java.io.IOException
-
readTermStateIfNotRead
protected BlockTermState readTermStateIfNotRead() throws java.io.IOException
Reads the BlockTermState if it is not already set. Sets termState.
- Throws:
java.io.IOException
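The split between nextTerm() (term decoded eagerly) and readTermStateIfNotRead() (details decoded on demand, at most once per line) can be sketched as below. The class is hypothetical: an `int[]` of document frequencies stands in for the encoded details region, and a null `Integer` plays the role of an unset termState.

```java
// Hypothetical sketch: int[] docFreqs stands in for the encoded details region.
final class LazyDetailsSketch {
    private final String[] terms;
    private final int[] docFreqs;
    private int line = -1;
    private Integer termState; // null until decoded for the current line
    int detailsDecodes = 0;    // how many times details were decoded

    LazyDetailsSketch(String[] terms, int[] docFreqs) {
        this.terms = terms;
        this.docFreqs = docFreqs;
    }

    // Like nextTerm(): moves to the next line and returns its term, without decoding details.
    String nextTerm() {
        if (line + 1 >= terms.length) return null;
        line++;
        termState = null; // details of the new line are not read yet
        return terms[line];
    }

    // Like readTermStateIfNotRead(): decodes the details only on first request for this line.
    private int readTermStateIfNotRead() {
        if (termState == null) {
            detailsDecodes++;
            termState = docFreqs[line];
        }
        return termState;
    }

    int docFreq() {
        return readTermStateIfNotRead();
    }
}
```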
-
readTermState
protected BlockTermState readTermState() throws java.io.IOException
Reads the BlockTermState on the current line. Sets termState.
An overriding method may return null if there is no BlockTermState (in this case the extending class must support a null termState).
- Returns:
- The BlockTermState; or null if none.
- Throws:
java.io.IOException
-
term
public BytesRef term()
Description copied from class: TermsEnum
Returns current term. Do not call this when the enum is unpositioned.
-
ord
public long ord()
Description copied from class: TermsEnum
Returns the ordinal position of the current term. This is an optional method (the codec may throw UnsupportedOperationException). Do not call this when the enum is unpositioned.
-
docFreq
public int docFreq() throws java.io.IOException
Description copied from class: TermsEnum
Returns the number of documents containing the current term. Do not call this when the enum is unpositioned (for example, after a seek returned TermsEnum.SeekStatus.END).
-
totalTermFreq
public long totalTermFreq() throws java.io.IOException
Description copied from class: TermsEnum
Returns the total number of occurrences of this term across all documents (the sum of the freq() for each doc that has this term). Note that, like other term measures, this measure does not take deleted documents into account.
- Specified by:
totalTermFreq in class TermsEnum
- Throws:
java.io.IOException
-
termState
public TermState termState() throws java.io.IOException
Description copied from class: TermsEnum
Expert: Returns the TermsEnum's internal state to position the TermsEnum without re-seeking the term dictionary.
NOTE: A seek by TermState might not capture the AttributeSource's state. Callers must maintain the AttributeSource states separately.
- Overrides:
termState in class BaseTermsEnum
- Throws:
java.io.IOException
- See Also:
TermState, TermsEnum.seekExact(BytesRef, TermState)
-
postings
public PostingsEnum postings(PostingsEnum reuse, int flags) throws java.io.IOException
Description copied from class: TermsEnum
Gets a PostingsEnum for the current term, with control over whether freqs, positions, offsets or payloads are required. Do not call this when the enum is unpositioned. This method will not return null.
NOTE: the returned iterator may return deleted documents, so deleted documents have to be checked on top of the PostingsEnum.
- Specified by:
postings in class TermsEnum
- Parameters:
reuse - pass a prior PostingsEnum for possible reuse
flags - specifies which optional per-document values you require; see PostingsEnum.FREQS
- Throws:
java.io.IOException
-
impacts
public ImpactsEnum impacts(int flags) throws java.io.IOException
Description copied from class: TermsEnum
Returns an ImpactsEnum.
- Specified by:
impacts in class TermsEnum
- Throws:
java.io.IOException
- See Also:
TermsEnum.postings(PostingsEnum, int)
-
ramBytesUsed
public long ramBytesUsed()
Description copied from interface: Accountable
Returns the memory usage of this object in bytes. Negative values are illegal.
- Specified by:
ramBytesUsed in interface Accountable
-
getOrCreateDictionaryBrowser
protected IndexDictionary.Browser getOrCreateDictionaryBrowser()
-
clearTermState
protected void clearTermState()