Class IDVersionPostingsFormat

java.lang.Object
org.apache.lucene.codecs.PostingsFormat
org.apache.lucene.sandbox.codecs.idversion.IDVersionPostingsFormat
All Implemented Interfaces:
NamedSPILoader.NamedSPI

public class IDVersionPostingsFormat extends PostingsFormat
A PostingsFormat optimized for primary-key (ID) fields that also record a version (long) for each ID, delivered as a payload created by longToBytes(long, org.apache.lucene.util.BytesRef) during indexing. At search time, the TermsEnum implementation IDVersionSegmentTermsEnum enables fast lookup (using only the terms index when possible) of whether a given ID was previously indexed with version > N (see IDVersionSegmentTermsEnum.seekExact(BytesRef,long)).

This is most effective if the app assigns a monotonically increasing global version to each indexed doc. Then, during indexing, use IDVersionSegmentTermsEnum.seekExact(BytesRef,long) (along with LiveFieldValues) to decide whether the document you are about to index was already indexed with a higher version, and skip it if so.
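
The skip-if-newer decision described above can be sketched without any Lucene dependency. In this minimal illustration a HashMap stands in for the ID-to-version lookup that IDVersionSegmentTermsEnum.seekExact(BytesRef,long) and LiveFieldValues would provide; the class name VersionGateSketch and the method offer are hypothetical and are not part of this API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in: a HashMap plays the role of the ID -> version lookup. */
final class VersionGateSketch {
    // Highest version accepted so far for each ID.
    private final Map<String, Long> latest = new HashMap<>();

    /**
     * Returns true if the document should be indexed, or false if the same ID
     * was already accepted with a strictly higher version and should be skipped.
     */
    public boolean offer(String id, long version) {
        Long existing = latest.get(id);
        if (existing != null && existing > version) {
            return false; // a newer version is already present: skip
        }
        latest.put(id, version); // remember the version we are about to index
        return true;
    }

    public static void main(String[] args) {
        VersionGateSketch gate = new VersionGateSketch();
        System.out.println(gate.offer("doc1", 5L)); // first sighting of doc1
        System.out.println(gate.offer("doc1", 3L)); // stale update
        System.out.println(gate.offer("doc1", 7L)); // newer update
    }
}
```

In a real indexing pipeline the map lookup would be replaced by the terms-dictionary seek, which is what lets the check fail fast without visiting postings.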

The field is effectively indexed as DOCS_ONLY and the docID is pulsed into the terms dictionary, but the user must feed in the version as a payload on the first token.

NOTE: term vectors cannot be indexed with this field (not that you should really ever want to do this).

  • Field Details

    • MIN_VERSION

      public static final long MIN_VERSION
      version must be >= this.
    • MAX_VERSION

      public static final long MAX_VERSION
      version must be <= this, because we encode with ZigZag.
    • minTermsInBlock

      private final int minTermsInBlock
    • maxTermsInBlock

      private final int maxTermsInBlock
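
The MAX_VERSION note above refers to ZigZag encoding, which spends one bit on the sign. The sketch below illustrates the general technique only, not this class's internal code: non-negative inputs up to 2^62 - 1 encode to non-negative longs, while the next value wraps to a negative encoded value. That would be consistent with an upper bound of 2^62 - 1, but the constant's actual value is not listed on this page, so ASSUMED_MAX is an assumption, and the class name ZigZagSketch is hypothetical.

```java
/** Illustration of 64-bit ZigZag encoding; not the library's implementation. */
final class ZigZagSketch {
    // Assumed bound: largest input whose ZigZag encoding is still a non-negative long.
    static final long ASSUMED_MAX = (1L << 62) - 1;

    /** Maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ... */
    public static long encode(long n) {
        return (n << 1) ^ (n >> 63);
    }

    /** Inverse of encode. */
    public static long decode(long z) {
        return (z >>> 1) ^ -(z & 1);
    }

    public static void main(String[] args) {
        // Doubling ASSUMED_MAX still fits below Long.MAX_VALUE.
        System.out.println(encode(ASSUMED_MAX) >= 0);
        // One past the bound shifts a set bit into the sign position.
        System.out.println(encode(ASSUMED_MAX + 1) < 0);
    }
}
```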
  • Constructor Details

    • IDVersionPostingsFormat

      public IDVersionPostingsFormat()
    • IDVersionPostingsFormat

      public IDVersionPostingsFormat(int minTermsInBlock, int maxTermsInBlock)
  • Method Details

    • fieldsConsumer

      public FieldsConsumer fieldsConsumer(SegmentWriteState state) throws IOException
      Description copied from class: PostingsFormat
      Writes a new segment.
      Specified by:
      fieldsConsumer in class PostingsFormat
      Throws:
      IOException
    • fieldsProducer

      public FieldsProducer fieldsProducer(SegmentReadState state) throws IOException
      Description copied from class: PostingsFormat
      Reads a segment. NOTE: by the time this call returns, it must hold open any files it will need to use; else, those files may be deleted. Additionally, required files may be deleted during the execution of this call before there is a chance to open them. Under these circumstances an IOException should be thrown by the implementation. IOExceptions are expected and will automatically cause a retry of the segment opening logic with the newly revised segments.
      Specified by:
      fieldsProducer in class PostingsFormat
      Throws:
      IOException
    • bytesToLong

      public static long bytesToLong(BytesRef bytes)
    • longToBytes

      public static void longToBytes(long v, BytesRef bytes)
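
Neither conversion method is described on this page. As a rough illustration of the round trip they imply, the sketch below packs a long into a byte array and reads it back. It makes two assumptions that the page does not confirm: the payload is a fixed 8 bytes, and the byte order is big-endian. It also uses a plain byte[] in place of BytesRef and returns the array rather than filling a caller-supplied buffer, so the signatures differ from the real methods; the class name VersionBytesSketch is hypothetical.

```java
/** Hypothetical round trip between a long and an 8-byte array (big-endian assumed). */
final class VersionBytesSketch {
    /** Packs v into 8 bytes, most significant byte first. */
    public static byte[] longToBytes(long v) {
        byte[] out = new byte[8];
        for (int i = 0; i < 8; i++) {
            out[i] = (byte) (v >>> (56 - 8 * i));
        }
        return out;
    }

    /** Rebuilds the long from 8 bytes, most significant byte first. */
    public static long bytesToLong(byte[] b) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (b[i] & 0xFFL); // mask to avoid sign extension
        }
        return v;
    }

    public static void main(String[] args) {
        long version = 42L;
        System.out.println(bytesToLong(longToBytes(version)) == version);
    }
}
```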