Package org.apache.lucene.index
Class DocumentsWriter
- java.lang.Object
-
- org.apache.lucene.index.DocumentsWriter
-
- All Implemented Interfaces:
java.io.Closeable
,java.lang.AutoCloseable
,Accountable
final class DocumentsWriter extends java.lang.Object implements java.io.Closeable, Accountable
This class accepts multiple added documents and directly writes segment files. Each added document is passed to the indexing chain, which in turn processes the document into the different codec formats. Some formats write bytes to files immediately, e.g. stored fields and term vectors, while others are buffered by the indexing chain and written only on flush. Once we have used our allowed RAM buffer, or the number of added docs is large enough (in the case we are flushing by doc count instead of RAM usage), we create a real segment and flush it to the Directory. Threads: Multiple threads are allowed into addDocument at once. There is an initial synchronized call to getThreadState which allocates a ThreadState for this thread. The same thread will get the same ThreadState over time (thread affinity) so that if there are consistent patterns (for example each thread is indexing a different content source) then we make better use of RAM. Then processDocument is called on that ThreadState without synchronization (most of the "heavy lifting" is in this call). Finally the synchronized "finishDocument" is called to flush changes to the directory. When flush is called by IndexWriter we forcefully idle all threads and flush only once they are all idle. This means you can call flush with a given thread even while other threads are actively adding/deleting documents. Exceptions: Because this class directly updates in-memory posting lists, and flushes stored fields and term vectors directly to files in the directory, there are certain limited times when an exception can corrupt this state. For example, a disk full while flushing stored fields leaves this file in a corrupt state. Or, an OOM exception while appending to the in-memory posting lists can corrupt that posting list. We call such exceptions "aborting exceptions". In these cases we must call abort() to discard all docs added since the last flush. All other exceptions ("non-aborting exceptions") can still partially update the index structures. These updates are consistent, but, they represent only a part of the document seen up until the exception was hit. When this happens, we immediately mark the document as deleted so that the document is always atomically ("all or none") added to the index.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description (package private) static interface
DocumentsWriter.FlushNotifications
-
Field Summary
Fields Modifier and Type Field Description private boolean
closed
private LiveIndexWriterConfig
config
private DocumentsWriterDeleteQueue
currentFullFlushDelQueue
(package private) DocumentsWriterDeleteQueue
deleteQueue
private Directory
directory
private Directory
directoryOrig
private boolean
enableTestPoints
(package private) DocumentsWriterFlushControl
flushControl
private DocumentsWriter.FlushNotifications
flushNotifications
(package private) FlushPolicy
flushPolicy
private FieldInfos.FieldNumbers
globalFieldNumberMap
private int
indexCreatedVersionMajor
private InfoStream
infoStream
private long
lastSeqNo
private java.util.concurrent.atomic.AtomicInteger
numDocsInRAM
private boolean
pendingChangesInCurrentFullFlush
private java.util.concurrent.atomic.AtomicLong
pendingNumDocs
(package private) DocumentsWriterPerThreadPool
perThreadPool
private java.util.function.Supplier<java.lang.String>
segmentNameSupplier
private DocumentsWriterFlushQueue
ticketQueue
-
Constructor Summary
Constructors Constructor Description DocumentsWriter(DocumentsWriter.FlushNotifications flushNotifications, int indexCreatedVersionMajor, java.util.concurrent.atomic.AtomicLong pendingNumDocs, boolean enableTestPoints, java.util.function.Supplier<java.lang.String> segmentNameSupplier, LiveIndexWriterConfig config, Directory directoryOrig, Directory directory, FieldInfos.FieldNumbers globalFieldNumberMap)
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description (package private) void
abort()
Called if we hit an exception at a bad time (when updating the index files) and must discard all currently buffered docs.private int
abortThreadState(DocumentsWriterPerThreadPool.ThreadState perThread)
Returns how many documents were aborted.(package private) boolean
anyChanges()
boolean
anyDeletions()
private boolean
applyAllDeletes()
If buffered deletes are using too much heap, resolve them and write disk and return true.private long
applyDeleteOrUpdate(java.util.function.ToLongFunction<DocumentsWriterDeleteQueue> function)
private boolean
assertTicketQueueModification(DocumentsWriterDeleteQueue deleteQueue)
void
close()
(package private) long
deleteQueries(Query... queries)
(package private) long
deleteTerms(Term... terms)
private boolean
doFlush(DocumentsWriterPerThread flushingDWPT)
private void
ensureInitialized(DocumentsWriterPerThreadPool.ThreadState state)
private void
ensureOpen()
(package private) void
finishFullFlush(boolean success)
(package private) long
flushAllThreads()
(package private) boolean
flushOneDWPT()
int
getBufferedDeleteTermsSize()
long
getFlushingBytes()
Returns the number of bytes currently being flushed This is a subset of the value returned byramBytesUsed()
long
getMaxCompletedSequenceNumber()
returns the maximum sequence number for all previously completed operationsint
getNumBufferedDeleteTerms()
(package private) int
getNumDocs()
Returns how many docs are currently buffered in RAM.(package private) java.io.Closeable
lockAndAbortAll()
Locks all currently active DWPT and aborts them.private boolean
postUpdate(DocumentsWriterPerThread flushingDWPT, boolean hasEvents)
private boolean
preUpdate()
(package private) void
purgeFlushTickets(boolean forced, IOUtils.IOConsumer<DocumentsWriterFlushQueue.FlushTicket> consumer)
long
ramBytesUsed()
Return the memory usage of this object in bytes.private boolean
setFlushingDeleteQueue(DocumentsWriterDeleteQueue session)
(package private) void
setLastSeqNo(long seqNo)
(package private) void
subtractFlushedNumDocs(int numFlushed)
(package private) long
updateDocument(java.lang.Iterable<? extends IndexableField> doc, Analyzer analyzer, DocumentsWriterDeleteQueue.Node<?> delNode)
(package private) long
updateDocuments(java.lang.Iterable<? extends java.lang.Iterable<? extends IndexableField>> docs, Analyzer analyzer, DocumentsWriterDeleteQueue.Node<?> delNode)
(package private) long
updateDocValues(DocValuesUpdate... updates)
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.lucene.util.Accountable
getChildResources
-
-
-
-
Field Detail
-
directoryOrig
private final Directory directoryOrig
-
directory
private final Directory directory
-
globalFieldNumberMap
private final FieldInfos.FieldNumbers globalFieldNumberMap
-
indexCreatedVersionMajor
private final int indexCreatedVersionMajor
-
pendingNumDocs
private final java.util.concurrent.atomic.AtomicLong pendingNumDocs
-
enableTestPoints
private final boolean enableTestPoints
-
segmentNameSupplier
private final java.util.function.Supplier<java.lang.String> segmentNameSupplier
-
flushNotifications
private final DocumentsWriter.FlushNotifications flushNotifications
-
closed
private volatile boolean closed
-
infoStream
private final InfoStream infoStream
-
config
private final LiveIndexWriterConfig config
-
numDocsInRAM
private final java.util.concurrent.atomic.AtomicInteger numDocsInRAM
-
deleteQueue
volatile DocumentsWriterDeleteQueue deleteQueue
-
ticketQueue
private final DocumentsWriterFlushQueue ticketQueue
-
pendingChangesInCurrentFullFlush
private volatile boolean pendingChangesInCurrentFullFlush
-
perThreadPool
final DocumentsWriterPerThreadPool perThreadPool
-
flushPolicy
final FlushPolicy flushPolicy
-
flushControl
final DocumentsWriterFlushControl flushControl
-
lastSeqNo
private long lastSeqNo
-
currentFullFlushDelQueue
private volatile DocumentsWriterDeleteQueue currentFullFlushDelQueue
-
-
Constructor Detail
-
DocumentsWriter
DocumentsWriter(DocumentsWriter.FlushNotifications flushNotifications, int indexCreatedVersionMajor, java.util.concurrent.atomic.AtomicLong pendingNumDocs, boolean enableTestPoints, java.util.function.Supplier<java.lang.String> segmentNameSupplier, LiveIndexWriterConfig config, Directory directoryOrig, Directory directory, FieldInfos.FieldNumbers globalFieldNumberMap)
-
-
Method Detail
-
deleteQueries
long deleteQueries(Query... queries) throws java.io.IOException
- Throws:
java.io.IOException
-
setLastSeqNo
void setLastSeqNo(long seqNo)
-
deleteTerms
long deleteTerms(Term... terms) throws java.io.IOException
- Throws:
java.io.IOException
-
updateDocValues
long updateDocValues(DocValuesUpdate... updates) throws java.io.IOException
- Throws:
java.io.IOException
-
applyDeleteOrUpdate
private long applyDeleteOrUpdate(java.util.function.ToLongFunction<DocumentsWriterDeleteQueue> function) throws java.io.IOException
- Throws:
java.io.IOException
-
applyAllDeletes
private boolean applyAllDeletes() throws java.io.IOException
If buffered deletes are using too much heap, resolve them and write disk and return true.- Throws:
java.io.IOException
-
purgeFlushTickets
void purgeFlushTickets(boolean forced, IOUtils.IOConsumer<DocumentsWriterFlushQueue.FlushTicket> consumer) throws java.io.IOException
- Throws:
java.io.IOException
-
getNumDocs
int getNumDocs()
Returns how many docs are currently buffered in RAM.
-
ensureOpen
private void ensureOpen() throws AlreadyClosedException
- Throws:
AlreadyClosedException
-
abort
void abort() throws java.io.IOException
Called if we hit an exception at a bad time (when updating the index files) and must discard all currently buffered docs. This resets our state, discarding any docs added since last flush.- Throws:
java.io.IOException
-
flushOneDWPT
final boolean flushOneDWPT() throws java.io.IOException
- Throws:
java.io.IOException
-
lockAndAbortAll
java.io.Closeable lockAndAbortAll() throws java.io.IOException
Locks all currently active DWPT and aborts them. The returned Closeable should be closed once the locks for the aborted DWPTs can be released.- Throws:
java.io.IOException
-
abortThreadState
private int abortThreadState(DocumentsWriterPerThreadPool.ThreadState perThread) throws java.io.IOException
Returns how many documents were aborted.- Throws:
java.io.IOException
-
getMaxCompletedSequenceNumber
public long getMaxCompletedSequenceNumber()
returns the maximum sequence number for all previously completed operations
-
anyChanges
boolean anyChanges()
-
getBufferedDeleteTermsSize
public int getBufferedDeleteTermsSize()
-
getNumBufferedDeleteTerms
public int getNumBufferedDeleteTerms()
-
anyDeletions
public boolean anyDeletions()
-
close
public void close()
- Specified by:
close
in interfacejava.lang.AutoCloseable
- Specified by:
close
in interfacejava.io.Closeable
-
preUpdate
private boolean preUpdate() throws java.io.IOException
- Throws:
java.io.IOException
-
postUpdate
private boolean postUpdate(DocumentsWriterPerThread flushingDWPT, boolean hasEvents) throws java.io.IOException
- Throws:
java.io.IOException
-
ensureInitialized
private void ensureInitialized(DocumentsWriterPerThreadPool.ThreadState state) throws java.io.IOException
- Throws:
java.io.IOException
-
updateDocuments
long updateDocuments(java.lang.Iterable<? extends java.lang.Iterable<? extends IndexableField>> docs, Analyzer analyzer, DocumentsWriterDeleteQueue.Node<?> delNode) throws java.io.IOException
- Throws:
java.io.IOException
-
updateDocument
long updateDocument(java.lang.Iterable<? extends IndexableField> doc, Analyzer analyzer, DocumentsWriterDeleteQueue.Node<?> delNode) throws java.io.IOException
- Throws:
java.io.IOException
-
doFlush
private boolean doFlush(DocumentsWriterPerThread flushingDWPT) throws java.io.IOException
- Throws:
java.io.IOException
-
subtractFlushedNumDocs
void subtractFlushedNumDocs(int numFlushed)
-
setFlushingDeleteQueue
private boolean setFlushingDeleteQueue(DocumentsWriterDeleteQueue session)
-
assertTicketQueueModification
private boolean assertTicketQueueModification(DocumentsWriterDeleteQueue deleteQueue)
-
flushAllThreads
long flushAllThreads() throws java.io.IOException
- Throws:
java.io.IOException
-
finishFullFlush
void finishFullFlush(boolean success) throws java.io.IOException
- Throws:
java.io.IOException
-
ramBytesUsed
public long ramBytesUsed()
Description copied from interface:Accountable
Return the memory usage of this object in bytes. Negative values are illegal.- Specified by:
ramBytesUsed
in interfaceAccountable
-
getFlushingBytes
public long getFlushingBytes()
Returns the number of bytes currently being flushed This is a subset of the value returned byramBytesUsed()
-
-