Package org.apache.lucene.analysis.icu
Class ICUNormalizer2CharFilter
java.lang.Object
java.io.Reader
org.apache.lucene.analysis.CharFilter
org.apache.lucene.analysis.charfilter.BaseCharFilter
org.apache.lucene.analysis.icu.ICUNormalizer2CharFilter
- All Implemented Interfaces:
Closeable, AutoCloseable, Readable
Normalize token text with ICU's Normalizer2.
-
Field Summary
Fields
private boolean afterQuickCheckYes
private int charCount
private int checkedInputBoundary
private final StringBuilder inputBuffer
private boolean inputFinished
private final com.ibm.icu.text.Normalizer2 normalizer
private final StringBuilder resultBuffer
private final CharacterUtils.CharacterBuffer tmpBuffer
Fields inherited from class org.apache.lucene.analysis.CharFilter
input
-
Constructor Summary
Constructors
ICUNormalizer2CharFilter(Reader in)
    Create a new Normalizer2CharFilter that combines NFKC normalization, Case Folding, and removes Default Ignorables (NFKC_Casefold)
ICUNormalizer2CharFilter(Reader in, com.ibm.icu.text.Normalizer2 normalizer)
    Create a new Normalizer2CharFilter with the specified Normalizer2
ICUNormalizer2CharFilter(Reader in, com.ibm.icu.text.Normalizer2 normalizer, int bufferSize)
-
Method Summary
private int normalizeInputUpto(int length)
private int outputFromResultBuffer(char[] cbuf, int begin, int len)
int read(char[] cbuf, int off, int len)
private int readAndNormalizeFromInput()
private int readFromInputWhileSpanQuickCheckYes()
private int readFromIoNormalizeUptoBoundary()
private void readInputToBuffer()
private void recordOffsetDiff(int inputLength, int outputLength)
Methods inherited from class org.apache.lucene.analysis.charfilter.BaseCharFilter
addOffCorrectMap, correct, getLastCumulativeDiff
Methods inherited from class org.apache.lucene.analysis.CharFilter
close, correctOffset
Methods inherited from class java.io.Reader
mark, markSupported, nullReader, read, read, read, ready, reset, skip, transferTo
-
Field Details
-
normalizer
private final com.ibm.icu.text.Normalizer2 normalizer
-
inputBuffer
private final StringBuilder inputBuffer
-
resultBuffer
private final StringBuilder resultBuffer
-
inputFinished
private boolean inputFinished
-
afterQuickCheckYes
private boolean afterQuickCheckYes
-
checkedInputBoundary
private int checkedInputBoundary
-
charCount
private int charCount
-
tmpBuffer
private final CharacterUtils.CharacterBuffer tmpBuffer
-
-
Constructor Details
-
ICUNormalizer2CharFilter
public ICUNormalizer2CharFilter(Reader in)
Create a new Normalizer2CharFilter that combines NFKC normalization, Case Folding, and removes Default Ignorables (NFKC_Casefold)
-
ICUNormalizer2CharFilter
public ICUNormalizer2CharFilter(Reader in, com.ibm.icu.text.Normalizer2 normalizer)
Create a new Normalizer2CharFilter with the specified Normalizer2
- Parameters:
in - text
normalizer - normalizer to use
-
ICUNormalizer2CharFilter
ICUNormalizer2CharFilter(Reader in, com.ibm.icu.text.Normalizer2 normalizer, int bufferSize)
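The NFKC_Casefold behaviour named in the single-argument constructor's description can be roughly illustrated with the JDK alone. The sketch below is an approximation and not what this class runs: it uses java.text.Normalizer as a stand-in for ICU's Normalizer2, substitutes lowercasing for full Unicode case folding, and does not remove Default Ignorables. The class name NfkcSketch and the method approxNfkcCasefold are hypothetical. With ICU4J on the classpath, the real mode is available through Normalizer2.getNFKCCasefoldInstance().

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical illustration only; not part of Lucene or ICU. */
public class NfkcSketch {

    // Approximates NFKC_Casefold: compatibility-normalize, then lowercase.
    // Assumption: lowercasing stands in for case folding, and Default
    // Ignorables (for example U+00AD SOFT HYPHEN) are left in place.
    static String approxNfkcCasefold(String text) {
        String nfkc = Normalizer.normalize(text, Normalizer.Form.NFKC);
        return nfkc.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Fullwidth "ABC" (U+FF21..U+FF23) becomes ASCII "abc".
        System.out.println(approxNfkcCasefold("\uFF21\uFF22\uFF23"));
        // The "fi" ligature (U+FB01) expands to the two letters "fi".
        System.out.println(approxNfkcCasefold("\uFB01"));
    }
}
```

The second case matters for a char filter: one input char becomes two output chars, so offsets in the filtered text no longer line up with offsets in the original text.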
-
-
Method Details
-
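read
public int read(char[] cbuf, int off, int len) throws IOException
- Specified by:
read in class Reader
- Throws:
IOException
-
readInputToBuffer
private void readInputToBuffer() throws IOException
- Throws:
IOException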
-
readAndNormalizeFromInput
private int readAndNormalizeFromInput()
-
readFromInputWhileSpanQuickCheckYes
private int readFromInputWhileSpanQuickCheckYes()
-
readFromIoNormalizeUptoBoundary
private int readFromIoNormalizeUptoBoundary()
-
normalizeInputUpto
private int normalizeInputUpto(int length)
-
recordOffsetDiff
private void recordOffsetDiff(int inputLength, int outputLength)
-
outputFromResultBuffer
private int outputFromResultBuffer(char[] cbuf, int begin, int len)
-
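The page gives no description for recordOffsetDiff(int inputLength, int outputLength), but its signature, together with the inherited addOffCorrectMap, correct and getLastCumulativeDiff, suggests that it records how much each normalized chunk grew or shrank so that offsets in the output can be mapped back to the input. The following is a minimal sketch of that idea under stated assumptions. The class OffsetMapSketch and its methods record and correct are hypothetical, the TreeMap is a simplification, and the actual Lucene implementation may differ in its details.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of offset correction; not the Lucene implementation. */
public class OffsetMapSketch {

    // Output offset -> cumulative (input length - output length) that applies
    // from that output offset onward.
    private final TreeMap<Integer, Integer> corrections = new TreeMap<>();

    // Number of output chars produced so far.
    private int charCount = 0;

    private int lastCumulativeDiff() {
        return corrections.isEmpty() ? 0 : corrections.lastEntry().getValue();
    }

    // Record that inputLength chars were normalized into outputLength chars.
    void record(int inputLength, int outputLength) {
        int diff = inputLength - outputLength;
        int cumulative = lastCumulativeDiff();
        if (diff > 0) {
            // Output shrank: text after this chunk sits further right in the input.
            corrections.put(charCount + outputLength, cumulative + diff);
        } else if (diff < 0) {
            // Output grew: point each extra output char back at the input.
            for (int i = 1; i <= -diff; i++) {
                corrections.put(charCount + i, cumulative - i);
            }
        }
        charCount += outputLength;
    }

    // Map an offset in the normalized output back to the original input.
    int correct(int outputOffset) {
        Map.Entry<Integer, Integer> entry = corrections.floorEntry(outputOffset);
        return entry == null ? outputOffset : outputOffset + entry.getValue();
    }

    public static void main(String[] args) {
        OffsetMapSketch map = new OffsetMapSketch();
        map.record(2, 1); // "e" + U+0301 composed into one char
        map.record(1, 2); // U+FB01 ligature expanded into "fi"
        map.record(1, 1); // "x" unchanged
        for (int off = 0; off < 4; off++) {
            System.out.println(off + " -> " + map.correct(off));
        }
    }
}
```

In this run the four output chars map back to input offsets 0, 2, 2 and 3: the composed character starts at input offset 0, both letters of the expanded ligature point at the single ligature char at offset 2, and the trailing "x" is at offset 3.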