public class KoreanAnalyzer extends Analyzer
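See Also: KoreanTokenizer
Nested classes/interfaces inherited from class Analyzer: Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
Fields inherited from class Analyzer: GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY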
Constructor and Description |
---|
KoreanAnalyzer() Creates a new KoreanAnalyzer. |
KoreanAnalyzer(UserDictionary userDict, KoreanTokenizer.DecompoundMode mode, java.util.Set<POS.Tag> stopTags, boolean outputUnknownUnigrams) Creates a new KoreanAnalyzer. |
Modifier and Type | Method and Description |
---|---|
protected Analyzer.TokenStreamComponents | createComponents(java.lang.String fieldName) Creates a new Analyzer.TokenStreamComponents instance for this analyzer. |
protected TokenStream | normalize(java.lang.String fieldName, TokenStream in) Wrap the given TokenStream in order to apply normalization filters. |
Methods inherited from class Analyzer: attributeFactory, close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, getVersion, initReader, initReaderForNormalization, normalize, setVersion, tokenStream, tokenStream
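The following is a minimal usage sketch, not an official example. It assumes the Lucene core and nori analysis modules are on the classpath, and it builds the analyzer with the four-argument constructor before consuming the inherited `tokenStream` method. The class name `KoreanAnalyzerExample`, the helper `analyze`, the field name `"body"`, the sample text, and the particular stop-tag set are all illustrative choices.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ko.KoreanAnalyzer;
import org.apache.lucene.analysis.ko.KoreanTokenizer;
import org.apache.lucene.analysis.ko.POS;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class KoreanAnalyzerExample {

    /** Runs the analyzer over the text and collects the emitted terms. */
    static List<String> analyze(Analyzer analyzer, String text) {
        List<String> terms = new ArrayList<>();
        // "body" is an arbitrary field name chosen for this sketch.
        try (TokenStream stream = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();                  // required before the first incrementToken()
            while (stream.incrementToken()) {
                terms.add(term.toString());
            }
            stream.end();                    // finalize end-of-stream state
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        // Illustrative stop-tag set: particles (J) and verbal endings (E).
        Set<POS.Tag> stopTags = EnumSet.of(POS.Tag.J, POS.Tag.E);

        try (Analyzer analyzer = new KoreanAnalyzer(
                null,                                  // userDict: no user dictionary
                KoreanTokenizer.DecompoundMode.MIXED,  // mode: decompound mode
                stopTags,                              // stopTags: tags to filter out
                false)) {                              // outputUnknownUnigrams
            System.out.println(analyze(analyzer, "한국어 형태소 분석기"));
        }
    }
}
```

The exact terms printed depend on the bundled dictionary and the chosen decompound mode, so no expected output is shown here.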
public KoreanAnalyzer()
public KoreanAnalyzer(UserDictionary userDict, KoreanTokenizer.DecompoundMode mode, java.util.Set<POS.Tag> stopTags, boolean outputUnknownUnigrams)
Parameters:
userDict - Optional: if non-null, the user dictionary.
mode - Decompound mode.
stopTags - The set of part-of-speech tags that should be filtered.
outputUnknownUnigrams - If true, outputs unigrams for unknown words.
protected Analyzer.TokenStreamComponents createComponents(java.lang.String fieldName)
Description copied from class: Analyzer
Creates a new Analyzer.TokenStreamComponents instance for this analyzer.
Specified by: createComponents in class Analyzer
Parameters:
fieldName - the name of the field whose content is passed to the Analyzer.TokenStreamComponents sink as a reader
Returns:
the Analyzer.TokenStreamComponents for this analyzer.
protected TokenStream normalize(java.lang.String fieldName, TokenStream in)
Description copied from class: Analyzer
Wrap the given TokenStream in order to apply normalization filters. The default implementation returns the TokenStream as-is. This is used by Analyzer.normalize(String, String).
Copyright © 2000–2019 The Apache Software Foundation. All rights reserved.
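As a hedged illustration of the normalization path documented for `normalize`, the sketch below calls the public `Analyzer.normalize(String, String)` method, which routes a single untokenized term through the analyzer's normalization filters. It assumes the Lucene core and nori analysis modules are on the classpath. The class name `KoreanNormalizeExample`, the helper `normalizeTerm`, the field name `"body"`, and the sample term are illustrative.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.ko.KoreanAnalyzer;
import org.apache.lucene.util.BytesRef;

public class KoreanNormalizeExample {

    /** Normalizes a single term (for example, a query term) without tokenizing it. */
    static String normalizeTerm(Analyzer analyzer, String field, String text) {
        // The public normalize(String, String) returns the normalized term as bytes.
        BytesRef normalized = analyzer.normalize(field, text);
        return normalized.utf8ToString();
    }

    public static void main(String[] args) {
        try (Analyzer analyzer = new KoreanAnalyzer()) {
            // "body" is an arbitrary field name chosen for this sketch.
            System.out.println(normalizeTerm(analyzer, "body", "Lucene"));
        }
    }
}
```

Which filters are applied is determined by the analyzer's own `normalize(String, TokenStream)` override, so the printed form of the term may differ between Lucene versions.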