Class TermsHashPerField

java.lang.Object
org.apache.lucene.index.TermsHashPerField
All Implemented Interfaces:
Comparable<TermsHashPerField>
Direct Known Subclasses:
FreqProxTermsWriterPerField, TermVectorsConsumerPerField

abstract class TermsHashPerField extends Object implements Comparable<TermsHashPerField>
This class stores streams of information per term without knowing the size of each stream ahead of time. Each stream typically encodes one level of information, such as term frequency per document or term proximity. Internally, this class allocates linked lists of slices, which a ByteSliceReader can read back for each term. Terms are first deduplicated in a BytesRefHash; once this is done, internal data structures point to the current offset of each stream that can be written to.
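The slice scheme described above can be modeled in a few dozen lines. The sketch below is a hypothetical, simplified illustration: the class SliceStreams, its slice sizes, and the 4-byte forwarding address are assumptions, not the actual layout of Lucene's ByteBlockPool or ByteSliceReader. Each stream starts in a small slice carved from one shared byte array; the last four bytes of every slice are reserved for the offset of the next slice, so a writer can keep appending and a reader can follow the chain without knowing the stream length in advance.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical, simplified model of per-term streams kept as linked slices in one shared
// byte pool. Slice sizes and the 4-byte forwarding address are illustrative; this is not
// the code of Lucene's ByteBlockPool or ByteSliceReader.
class SliceStreams {
  static final int[] LEVEL_SIZES = {8, 16, 32}; // the last size repeats for deeper levels
  static final int ADDR = 4;                     // tail bytes holding the next slice's offset

  private byte[] pool = new byte[16];
  private int poolUsed = 0;

  static int sizeOf(int level) {
    return LEVEL_SIZES[Math.min(level, LEVEL_SIZES.length - 1)];
  }

  // Carves a new slice off the end of the shared pool and returns its start offset.
  private int newSlice(int level) {
    int size = sizeOf(level);
    if (poolUsed + size > pool.length) {
      pool = Arrays.copyOf(pool, Math.max(2 * pool.length, poolUsed + size));
    }
    int start = poolUsed;
    poolUsed += size;
    return start;
  }

  private void writeAddr(int at, int value) {
    for (int k = 0; k < ADDR; k++) pool[at + k] = (byte) (value >>> (24 - 8 * k));
  }

  private int readAddr(int at) {
    int v = 0;
    for (int k = 0; k < ADDR; k++) v = (v << 8) | (pool[at + k] & 0xFF);
    return v;
  }

  Stream newStream() {
    return new Stream();
  }

  // One append-only stream whose final length is unknown when it is created.
  final class Stream {
    final int start; // offset of the first slice; reading begins here
    int pos;         // next byte to write
    int limit;       // where the current slice's forwarding address begins
    int level = 0;

    private Stream() {
      start = newSlice(0);
      pos = start;
      limit = start + sizeOf(0) - ADDR;
    }

    void writeByte(byte b) {
      if (pos == limit) {       // payload area full: chain the next, usually larger, slice
        level++;
        int next = newSlice(level);
        writeAddr(limit, next); // the old slice's tail now points at the new one
        pos = next;
        limit = next + sizeOf(level) - ADDR;
      }
      pool[pos++] = b;
    }
  }

  // Follows the slice chain from the stream's start to its current write position.
  byte[] read(Stream s) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int p = s.start, level = 0, limit = s.start + sizeOf(0) - ADDR;
    while (p != s.pos) {
      if (p == limit) {         // end of this slice's payload: jump to the next slice
        p = readAddr(limit);
        level++;
        limit = p + sizeOf(level) - ADDR;
      } else {
        out.write(pool[p++]);
      }
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    SliceStreams streams = new SliceStreams();
    Stream a = streams.newStream(), b = streams.newStream();
    for (int i = 0; i < 40; i++) { // interleaved writes make a's and b's slices alternate
      a.writeByte((byte) i);
      if (i % 2 == 0) b.writeByte((byte) (100 + i));
    }
    System.out.println(Arrays.toString(streams.read(a)));
    System.out.println(Arrays.toString(streams.read(b)));
  }
}
```

Because all slices come from one shared array, many short streams cost little memory, while the growing slice sizes keep the number of hops low for long streams.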
  • Field Details

    • HASH_INIT_SIZE

      private static final int HASH_INIT_SIZE
    • nextPerField

      private final TermsHashPerField nextPerField
    • intPool

      private final IntBlockPool intPool
    • bytePool

      final ByteBlockPool bytePool
    • termStreamAddressBuffer

      private int[] termStreamAddressBuffer
    • streamAddressOffset

      private int streamAddressOffset
    • streamCount

      private final int streamCount
    • fieldName

      private final String fieldName
    • indexOptions

      final IndexOptions indexOptions
    • bytesHash

      private final BytesRefHash bytesHash
    • postingsArray

      ParallelPostingsArray postingsArray
    • lastDocID

      private int lastDocID
    • sortedTermIDs

      private int[] sortedTermIDs
    • doNextCall

      private boolean doNextCall
  • Constructor Details

  • Method Details

    • reset

      void reset()
    • initReader

      final void initReader(ByteSliceReader reader, int termID, int stream)
    • sortTerms

      final void sortTerms()
      Collapse the hash table and sort in-place; also sets this.sortedTermIDs to the result. This method must not be called twice unless reset() or reinitHash() was called.
    • getSortedTermIDs

      final int[] getSortedTermIDs()
      Returns the sorted term IDs. sortTerms() must be called before this method.
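A hypothetical sketch of what "collapse the hash table and sort" amounts to, assuming an open-addressing table whose slots hold term IDs and use -1 for empty slots (that representation and the CollapseAndSort name are assumptions, not BytesRefHash internals). Terms are compared as unsigned bytes, the order BytesRef uses. The sketch also suggests why a second call is not allowed without a reset: once collapsed, the slots no longer form a valid hash table.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: slots is an open-addressing table of term IDs (-1 = empty),
// termBytes[id] holds the bytes of term id. Not Lucene's BytesRefHash implementation.
class CollapseAndSort {
  static int[] sortTerms(int[] slots, byte[][] termBytes) {
    // 1. Collapse: pack every used slot into the front of the array.
    int count = 0;
    for (int i = 0; i < slots.length; i++) {
      if (slots[i] != -1) slots[count++] = slots[i];
    }
    // 2. Sort that prefix by term bytes compared as unsigned values.
    //    (Boxing keeps the sketch short; a real implementation sorts the int[] directly.)
    Integer[] ids = new Integer[count];
    for (int i = 0; i < count; i++) ids[i] = slots[i];
    Arrays.sort(ids, (x, y) -> Arrays.compareUnsigned(termBytes[x], termBytes[y]));
    for (int i = 0; i < count; i++) slots[i] = ids[i];
    // The slots no longer form a valid hash table, so no lookups until it is rebuilt.
    return Arrays.copyOf(slots, count);
  }

  public static void main(String[] args) {
    byte[][] terms = {
      "pear".getBytes(StandardCharsets.UTF_8),  // termID 0
      "apple".getBytes(StandardCharsets.UTF_8), // termID 1
      "fig".getBytes(StandardCharsets.UTF_8)    // termID 2
    };
    int[] slots = {-1, 0, -1, 2, 1, -1};
    System.out.println(Arrays.toString(sortTerms(slots, terms))); // [1, 2, 0]
  }
}
```

Unsigned comparison matters for non-ASCII terms: the UTF-8 bytes of "é" start with 0xC3, which sorts after "z" as an unsigned value but would sort before it as a signed Java byte.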
    • reinitHash

      final void reinitHash()
    • add

      private void add(int textStart, int docID) throws IOException
      Throws:
      IOException
    • initStreamSlices

      private void initStreamSlices(int termID, int docID) throws IOException
      Called when we first encounter a new term. We must allocate slices to store the postings (vInt compressed doc/freq/prox), and also the int pointers to where (in our ByteBlockPool storage) the postings for this term begin.
      Throws:
      IOException
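For a new term, initStreamSlices has to record where each of the term's streamCount streams begins; the fields termStreamAddressBuffer and streamAddressOffset hint at that bookkeeping. The following is a hypothetical sketch of it, assuming a fixed first-slice size, a plain int[] in place of IntBlockPool, and a simple cursor in place of ByteBlockPool; StreamAddressSketch and its members are illustrative names only.

```java
import java.util.Arrays;

// Hypothetical sketch of first-occurrence bookkeeping: reserve one starting slice per stream
// and record, in an int buffer, where each stream begins. Not Lucene's internals.
class StreamAddressSketch {
  static final int FIRST_SLICE_SIZE = 8; // illustrative size of a stream's first slice

  final int streamCount;
  int[] addresses = new int[8];          // stream start offsets, streamCount entries per term
  int addressesUsed = 0;
  int[] addressOffset = new int[4];      // per termID: index of its first entry in addresses
  int bytesUsed = 0;                     // allocation cursor of a notional byte pool

  StreamAddressSketch(int streamCount) {
    this.streamCount = streamCount;
  }

  void initStreamSlices(int termID) {
    if (termID >= addressOffset.length) {
      addressOffset = Arrays.copyOf(addressOffset, 2 * (termID + 1));
    }
    if (addressesUsed + streamCount > addresses.length) {
      addresses = Arrays.copyOf(addresses, 2 * (addressesUsed + streamCount));
    }
    addressOffset[termID] = addressesUsed;
    for (int s = 0; s < streamCount; s++) {
      addresses[addressesUsed + s] = bytesUsed; // stream s of this term starts in a fresh slice
      bytesUsed += FIRST_SLICE_SIZE;
    }
    addressesUsed += streamCount;
  }

  // Where stream `stream` of term `termID` begins in the notional byte pool.
  int streamStart(int termID, int stream) {
    return addresses[addressOffset[termID] + stream];
  }
}
```

Keeping the per-term pointers in one flat int buffer means a term needs only a single offset (addressOffset[termID]) to locate all of its streams.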
    • assertDocId

      private boolean assertDocId(int docId)
    • add

      void add(BytesRef termBytes, int docID) throws IOException
      Called once per inverted token. This is the primary entry point (for the first TermsHash); postings use this API.
      Throws:
      IOException
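The contract of add(BytesRef, int), together with the abstract newTerm and addTerm callbacks documented further down, can be sketched as follows. TermDispatch and FreqCounter are hypothetical; a HashMap keyed by ByteBuffer stands in for BytesRefHash, and a real subclass would write postings into its byte streams rather than count occurrences in a list.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the add(term, docID) contract: deduplicate the term, then call
// newTerm for a first occurrence or addTerm for a repeat. A HashMap stands in for BytesRefHash.
abstract class TermDispatch {
  private final Map<ByteBuffer, Integer> termIDs = new HashMap<>();

  final void add(byte[] term, int docID) {
    ByteBuffer key = ByteBuffer.wrap(term.clone()); // copy: callers often reuse their buffer
    Integer termID = termIDs.get(key);
    if (termID == null) {
      termID = termIDs.size();                      // dense IDs 0, 1, 2, ... in arrival order
      termIDs.put(key, termID);
      newTerm(termID, docID);
    } else {
      addTerm(termID, docID);
    }
  }

  final int getNumTerms() {
    return termIDs.size();
  }

  abstract void newTerm(int termID, int docID);

  abstract void addTerm(int termID, int docID);
}

// Example consumer: counts occurrences per term, as a frequency-tracking subclass might.
class FreqCounter extends TermDispatch {
  final List<Integer> freqs = new ArrayList<>();

  @Override
  void newTerm(int termID, int docID) {
    freqs.add(1); // termID == freqs.size() before this call
  }

  @Override
  void addTerm(int termID, int docID) {
    freqs.set(termID, freqs.get(termID) + 1);
  }

  public static void main(String[] args) {
    FreqCounter c = new FreqCounter();
    for (String tok : "to be or not to be".split(" ")) {
      c.add(tok.getBytes(StandardCharsets.UTF_8), 0);
    }
    System.out.println(c.getNumTerms() + " " + c.freqs); // 4 [2, 2, 1, 1]
  }
}
```

Because term IDs are dense and assigned in arrival order, per-term state can live in arrays indexed directly by termID.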
    • positionStreamSlice

      private int positionStreamSlice(int termID, int docID) throws IOException
      Throws:
      IOException
    • writeByte

      final void writeByte(int stream, byte b)
    • writeBytes

      final void writeBytes(int stream, byte[] b, int offset, int len)
    • writeVInt

      final void writeVInt(int stream, int i)
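writeVInt(stream, i) appends a variable-length int to one of the current term's streams. Assuming it uses the same layout as Lucene's DataOutput.writeVInt (7 data bits per byte, least-significant group first, high bit set on every byte except the last), the encoding looks like the sketch below; VIntSketch is a hypothetical name, and a ByteArrayOutputStream stands in for the stream.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Sketch of variable-length int encoding: 7 payload bits per byte, low-order group first,
// high bit set on every byte except the last. Class and method names are illustrative.
class VIntSketch {
  static byte[] encode(int i) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((i & ~0x7F) != 0) {      // more than 7 significant bits remain
      out.write((i & 0x7F) | 0x80); // emit the low 7 bits with the continuation flag
      i >>>= 7;                     // unsigned shift, so negative ints also terminate
    }
    out.write(i);                   // last byte: continuation flag clear
    return out.toByteArray();
  }

  static int decode(byte[] bytes) {
    int result = 0;
    for (int p = 0, shift = 0; ; p++, shift += 7) {
      byte b = bytes[p];
      result |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) return result;
    }
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(encode(5)));   // [5]
    System.out.println(Arrays.toString(encode(300))); // [-84, 2]
    System.out.println(decode(encode(300)));          // 300
  }
}
```

Values below 128 take a single byte, while negative ints always take five, so the format pays off for the small non-negative numbers typical of postings.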
    • getNextPerField

      final TermsHashPerField getNextPerField()
    • getFieldName

      final String getFieldName()
    • compareTo

      public final int compareTo(TermsHashPerField other)
      Specified by:
      compareTo in interface Comparable<TermsHashPerField>
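This page does not say what compareTo compares. Given the getFieldName() accessor, one plausible reading is ordering by field name; the hypothetical PerFieldSketch class below shows that ordering and should be checked against the Lucene source before being relied on.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a per-field writer, ordered by field name (an assumption).
final class PerFieldSketch implements Comparable<PerFieldSketch> {
  private final String fieldName;

  PerFieldSketch(String fieldName) {
    this.fieldName = fieldName;
  }

  String getFieldName() {
    return fieldName;
  }

  @Override
  public int compareTo(PerFieldSketch other) {
    return fieldName.compareTo(other.fieldName); // String order (UTF-16 code units)
  }

  public static void main(String[] args) {
    List<PerFieldSketch> fields = new ArrayList<>();
    for (String name : new String[] {"title", "body", "id"}) fields.add(new PerFieldSketch(name));
    Collections.sort(fields);
    for (PerFieldSketch f : fields) System.out.println(f.getFieldName()); // body, id, title
  }
}
```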
    • finish

      void finish() throws IOException
      Finish adding all instances of this field to the current document.
      Throws:
      IOException
    • getNumTerms

      final int getNumTerms()
    • start

      boolean start(IndexableField field, boolean first)
      Start adding a new field instance; first is true if this is the first time this field name was seen in the document.
    • newTerm

      abstract void newTerm(int termID, int docID) throws IOException
      Called when a term is seen for the first time.
      Throws:
      IOException
    • addTerm

      abstract void addTerm(int termID, int docID) throws IOException
      Called when a previously seen term is seen again.
      Throws:
      IOException
    • newPostingsArray

      abstract void newPostingsArray()
      Called when the postings array is initialized or resized.
    • createPostingsArray

      abstract ParallelPostingsArray createPostingsArray(int size)
      Creates a new postings array of the specified size.
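The postingsArray field and the newPostingsArray/createPostingsArray hooks point to per-term data kept in parallel arrays indexed by termID, replaced by a larger copy when a new termID no longer fits. A hypothetical sketch of that pattern follows; PostingsArraySketch, its lastDocIDs and termFreqs attributes, and the roughly 1.5x growth factor are assumptions for illustration, not ParallelPostingsArray's actual fields or policy.

```java
import java.util.Arrays;

// Hypothetical sketch of a parallel postings array: one int[] per attribute, all indexed
// by termID, replaced by a larger copy when a new termID no longer fits.
class PostingsArraySketch {
  final int[] textStarts; // where each term's bytes start in the term pool
  final int[] lastDocIDs; // example attribute a subclass might add
  final int[] termFreqs;  // another example attribute

  PostingsArraySketch(int size) {
    textStarts = new int[size];
    lastDocIDs = new int[size];
    termFreqs = new int[size];
  }

  int size() {
    return textStarts.length;
  }

  // Factory-plus-copy step: allocate a ~1.5x larger array set and keep existing values.
  PostingsArraySketch grow() {
    int newSize = size() + (size() >> 1) + 1;
    PostingsArraySketch bigger = new PostingsArraySketch(newSize);
    System.arraycopy(textStarts, 0, bigger.textStarts, 0, size());
    System.arraycopy(lastDocIDs, 0, bigger.lastDocIDs, 0, size());
    System.arraycopy(termFreqs, 0, bigger.termFreqs, 0, size());
    return bigger;
  }

  public static void main(String[] args) {
    PostingsArraySketch postings = new PostingsArraySketch(2);
    for (int termID = 0; termID < 5; termID++) {
      if (termID >= postings.size()) postings = postings.grow(); // resize before writing
      postings.termFreqs[termID] = termID + 1;
    }
    System.out.println(postings.size() + " " + Arrays.toString(postings.termFreqs));
    // prints: 7 [1, 2, 3, 4, 5, 0, 0]
  }
}
```

Parallel int arrays avoid allocating one object per term, which keeps memory use and garbage-collection pressure low when a field has many unique terms.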