Class CompressingStoredFieldsIndexWriter
- java.lang.Object
  - org.apache.lucene.codecs.compressing.CompressingStoredFieldsIndexWriter
All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable
public final class CompressingStoredFieldsIndexWriter extends java.lang.Object implements java.io.Closeable
Efficient index format for block-based Codecs.
This writer generates a file which can be loaded into memory using memory-efficient data structures to quickly locate the block that contains any document.
In order to have a compact in-memory representation, for every block of 1024 chunks, this index computes the average number of bytes per chunk and for every chunk, only stores the difference between
- ${chunk number} * ${average length of a chunk}
- and the actual start offset of the chunk
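The delta-from-average idea described above can be sketched in plain Java. This is only an illustration under stated assumptions: the class and method names are hypothetical, start offsets are taken relative to the first chunk of the block, and the average is computed with integer division. None of this is claimed to match the writer's internal code.

```java
import java.util.Arrays;

/** Sketch only: all names here are hypothetical, not Lucene's internals. */
final class DeltaFromAverageSketch {

    /** Average chunk length over one block, using integer division (an assumption). */
    static long averageChunkLength(long[] startOffsets) {
        int n = startOffsets.length;
        if (n <= 1) {
            return 0L; // a single chunk has no successor to measure against
        }
        return (startOffsets[n - 1] - startOffsets[0]) / (n - 1);
    }

    /** delta[i] = (offset of chunk i relative to the first chunk) - i * average. */
    static long[] deltasFromAverage(long[] startOffsets) {
        long avg = averageChunkLength(startOffsets);
        long[] deltas = new long[startOffsets.length];
        for (int i = 0; i < startOffsets.length; i++) {
            long relative = startOffsets[i] - startOffsets[0];
            deltas[i] = relative - avg * i;
        }
        return deltas;
    }

    public static void main(String[] args) {
        long[] offsets = {100, 210, 290, 400};
        System.out.println(averageChunkLength(offsets));                 // 100
        System.out.println(Arrays.toString(deltasFromAverage(offsets))); // [0, 10, -10, 0]
    }
}
```

Because chunks tend to have similar sizes, the deltas stay small in magnitude and can be stored with far fewer bits than the absolute offsets.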
Data is written as follows:
- PackedIntsVersion, <Block>^BlockCount, BlocksEndMarker
- PackedIntsVersion --> PackedInts.VERSION_CURRENT as a VInt
- BlocksEndMarker --> 0 as a VInt; this marks the end of blocks, since blocks are not allowed to start with 0
- Block --> BlockChunks, <DocBases>, <StartPointers>
- BlockChunks --> a VInt which is the number of chunks encoded in the block
- DocBases --> DocBase, AvgChunkDocs, BitsPerDocBaseDelta, DocBaseDeltas
- DocBase --> first document ID of the block of chunks, as a VInt
- AvgChunkDocs --> average number of documents in a single chunk, as a VInt
- BitsPerDocBaseDelta --> number of bits required to represent a delta from the average using ZigZag encoding
- DocBaseDeltas --> packed array of BlockChunks elements of BitsPerDocBaseDelta bits each, representing the deltas from the average doc base using ZigZag encoding
- StartPointers --> StartPointerBase, AvgChunkSize, BitsPerStartPointerDelta, StartPointerDeltas
- StartPointerBase --> the first start pointer of the block, as a VLong
- AvgChunkSize --> the average size of a chunk of compressed documents, as a VLong
- BitsPerStartPointerDelta --> number of bits required to represent a delta from the average using ZigZag encoding
- StartPointerDeltas --> packed array of BlockChunks elements of BitsPerStartPointerDelta bits each, representing the deltas from the average start pointer using ZigZag encoding
- Footer --> CodecFooter
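The format relies on ZigZag encoding so that signed deltas become small non-negative values before they are bit-packed. The following is a minimal sketch of that step using only the JDK. The class and method names are hypothetical, and the minimum of one bit per value is an assumption made for the illustration, not a statement about the writer's implementation.

```java
/** Sketch only: hypothetical helper names, standard ZigZag arithmetic. */
final class ZigZagSketch {

    /** Maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ... */
    static long zigZagEncode(long value) {
        return (value << 1) ^ (value >> 63);
    }

    /** Inverse of zigZagEncode. */
    static long zigZagDecode(long encoded) {
        return (encoded >>> 1) ^ -(encoded & 1);
    }

    /**
     * Number of bits needed to store every ZigZag-encoded delta.
     * OR-ing the encoded values keeps the highest set bit of the largest one.
     * Assumption: at least 1 bit is used even if all deltas are 0.
     */
    static int bitsPerDelta(long[] deltas) {
        long combined = 0L;
        for (long delta : deltas) {
            combined |= zigZagEncode(delta);
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(combined));
    }

    public static void main(String[] args) {
        long[] deltas = {0, 10, -10, 0};
        System.out.println(zigZagEncode(-10));    // 19
        System.out.println(zigZagDecode(19));     // -10
        System.out.println(bitsPerDelta(deltas)); // 5
    }
}
```

With deltas of 0, 10, -10 and 0, five bits per element are enough, compared with the 64 bits of an unpacked long.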
Notes
- For any block, the doc base of the n-th chunk can be restored with DocBase + AvgChunkDocs * n + DocBaseDeltas[n].
- For any block, the start pointer of the n-th chunk can be restored with StartPointerBase + AvgChunkSize * n + StartPointerDeltas[n].
- Once data is loaded into memory, you can look up the start pointer of any document chunk by performing two binary searches: a first one based on the values of DocBase in order to find the right block, and then a second one inside the block based on DocBaseDeltas (by reconstructing the doc bases for every chunk).
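The reconstruction formulas and the two-step binary search from the notes above can be sketched as follows. This is an illustration only: the class, its fields, and the plain-array layout are hypothetical stand-ins for the packed in-memory structures, and the deltas are assumed to be already ZigZag-decoded.

```java
/** Sketch only: plain arrays stand in for the packed in-memory structures. */
final class ChunkIndexLookupSketch {
    private final int[] docBases;              // DocBase of each block
    private final long[] startPointerBases;    // StartPointerBase of each block
    private final int[] avgChunkDocs;          // AvgChunkDocs of each block
    private final long[] avgChunkSizes;        // AvgChunkSize of each block
    private final long[][] docBaseDeltas;      // decoded DocBaseDeltas per block
    private final long[][] startPointerDeltas; // decoded StartPointerDeltas per block

    ChunkIndexLookupSketch(int[] docBases, long[] startPointerBases, int[] avgChunkDocs,
                           long[] avgChunkSizes, long[][] docBaseDeltas, long[][] startPointerDeltas) {
        this.docBases = docBases;
        this.startPointerBases = startPointerBases;
        this.avgChunkDocs = avgChunkDocs;
        this.avgChunkSizes = avgChunkSizes;
        this.docBaseDeltas = docBaseDeltas;
        this.startPointerDeltas = startPointerDeltas;
    }

    /** DocBase + AvgChunkDocs * n + DocBaseDeltas[n] */
    long docBase(int block, int n) {
        return docBases[block] + (long) avgChunkDocs[block] * n + docBaseDeltas[block][n];
    }

    /** StartPointerBase + AvgChunkSize * n + StartPointerDeltas[n] */
    long startPointer(int block, int n) {
        return startPointerBases[block] + avgChunkSizes[block] * n + startPointerDeltas[block][n];
    }

    /** First binary search: last block whose DocBase is <= docID, or -1. */
    int findBlock(int docID) {
        int lo = 0, hi = docBases.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (docBases[mid] <= docID) lo = mid + 1; else hi = mid - 1;
        }
        return hi;
    }

    /** Second binary search: last chunk in the block whose doc base is <= docID. */
    int findChunk(int block, int docID) {
        int lo = 0, hi = docBaseDeltas[block].length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (docBase(block, mid) <= docID) lo = mid + 1; else hi = mid - 1;
        }
        return hi;
    }

    /** Start pointer of the chunk that contains docID. */
    long startPointerOf(int docID) {
        int block = findBlock(docID);
        if (block < 0) {
            throw new IllegalArgumentException("docID precedes the first block: " + docID);
        }
        return startPointer(block, findChunk(block, docID));
    }

    public static void main(String[] args) {
        ChunkIndexLookupSketch index = new ChunkIndexLookupSketch(
            new int[] {0, 40}, new long[] {100, 520}, new int[] {10, 8}, new long[] {100, 90},
            new long[][] {{0, 0, 5, 0}, {0, 2}}, new long[][] {{0, 10, -10, 0}, {0, 0}});
        System.out.println(index.startPointerOf(27)); // 290
    }
}
```

In this made-up example the first block holds chunks with doc bases 0, 10, 25 and 30, so document 27 falls in the third chunk, whose reconstructed start pointer is 100 + 100 * 2 - 10 = 290.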
Field Summary
Modifier and Type               Field                 Description
(package private) int           blockChunks
(package private) int           blockDocs
(package private) int           blockSize
(package private) int[]         docBaseDeltas
(package private) IndexOutput   fieldsIndexOut
(package private) long          firstStartPointer
(package private) long          maxStartPointer
(package private) long[]        startPointerDeltas
(package private) int           totalDocs
Constructor Summary
Constructor                                                                  Description
CompressingStoredFieldsIndexWriter(IndexOutput indexOutput, int blockSize)
Method Summary
Modifier and Type        Method                                       Description
void                     close()
(package private) void   finish(int numDocs, long maxPointer)
private void             reset()
private void             writeBlock()
(package private) void   writeIndex(int numDocs, long startPointer)
Field Detail
-
fieldsIndexOut
final IndexOutput fieldsIndexOut
-
blockSize
final int blockSize
-
totalDocs
int totalDocs
-
blockDocs
int blockDocs
-
blockChunks
int blockChunks
-
firstStartPointer
long firstStartPointer
-
maxStartPointer
long maxStartPointer
-
docBaseDeltas
final int[] docBaseDeltas
-
startPointerDeltas
final long[] startPointerDeltas
-
-
Constructor Detail
-
CompressingStoredFieldsIndexWriter
CompressingStoredFieldsIndexWriter(IndexOutput indexOutput, int blockSize) throws java.io.IOException
- Throws:
java.io.IOException
-
-
Method Detail
-
reset
private void reset()
-
writeBlock
private void writeBlock() throws java.io.IOException
- Throws:
java.io.IOException
-
writeIndex
void writeIndex(int numDocs, long startPointer) throws java.io.IOException
- Throws:
java.io.IOException
-
finish
void finish(int numDocs, long maxPointer) throws java.io.IOException
- Throws:
java.io.IOException
-
close
public void close() throws java.io.IOException
- Specified by:
close in interface java.lang.AutoCloseable
- Specified by:
close in interface java.io.Closeable
- Throws:
java.io.IOException
-
-