Package org.apache.lucene.analysis.ko
Class KoreanTokenizerFactory
- java.lang.Object
  - org.apache.lucene.analysis.util.AbstractAnalysisFactory
    - org.apache.lucene.analysis.util.TokenizerFactory
      - org.apache.lucene.analysis.ko.KoreanTokenizerFactory
- All Implemented Interfaces:
ResourceLoaderAware
public class KoreanTokenizerFactory extends TokenizerFactory implements ResourceLoaderAware
Factory for KoreanTokenizer.
<fieldType name="text_ko" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KoreanTokenizerFactory"
      decompoundMode="discard"
      userDictionary="user.txt"
      userDictionaryEncoding="UTF-8"
      outputUnknownUnigrams="false"
      discardPunctuation="true"
    />
  </analyzer>
</fieldType>
Supports the following attributes:
- userDictionary: User dictionary path.
- userDictionaryEncoding: User dictionary encoding.
- decompoundMode: Decompound mode. One of 'none', 'discard', or 'mixed'. Default is 'discard'. See KoreanTokenizer.DecompoundMode.
- outputUnknownUnigrams: If true, outputs unigrams for unknown words.
- discardPunctuation: If true, punctuation tokens are dropped from the output.
- Since:
- 7.4.0
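The attributes listed above reach the factory as string key/value pairs. The following is a minimal sketch of assembling such a map with only the JDK. `KoreanTokenizerArgs`, its `Mode` enum, and both methods are hypothetical helpers introduced for illustration and are not part of Lucene. The case-insensitive parsing and the choice to return a mutable `HashMap` are assumptions of this sketch, not documented behavior of the factory.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not part of Lucene) that assembles factory arguments. */
final class KoreanTokenizerArgs {

    /** Mirrors the three documented decompoundMode values. */
    enum Mode { NONE, DISCARD, MIXED }

    private KoreanTokenizerArgs() {}

    /** Parses a decompoundMode string; null falls back to the documented default, discard. */
    static Mode parseMode(String value) {
        if (value == null) {
            return Mode.DISCARD;
        }
        // Assumption of this sketch: matching is case-insensitive.
        // Mode.valueOf throws IllegalArgumentException for any other value.
        return Mode.valueOf(value.toUpperCase(Locale.ROOT));
    }

    /** Builds a mutable argument map keyed by the documented attribute names. */
    static Map<String, String> build(Mode mode,
                                     String userDictionary,
                                     String userDictionaryEncoding,
                                     boolean outputUnknownUnigrams,
                                     boolean discardPunctuation) {
        Map<String, String> args = new HashMap<>();
        args.put("decompoundMode", mode.name().toLowerCase(Locale.ROOT));
        if (userDictionary != null) {
            args.put("userDictionary", userDictionary);
            // The encoding is only meaningful when a dictionary path is given.
            if (userDictionaryEncoding != null) {
                args.put("userDictionaryEncoding", userDictionaryEncoding);
            }
        }
        args.put("outputUnknownUnigrams", Boolean.toString(outputUnknownUnigrams));
        args.put("discardPunctuation", Boolean.toString(discardPunctuation));
        return args;
    }
}
```

A map built this way has the `java.util.Map<java.lang.String,java.lang.String>` shape that the constructor in the Constructor Summary accepts. The call `KoreanTokenizerArgs.build(KoreanTokenizerArgs.parseMode(null), "user.txt", "UTF-8", false, true)` yields the same five settings as the Solr configuration shown earlier.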
Field Summary
Fields (Modifier and Type, Field, Description):
- private static java.lang.String DECOMPOUND_MODE
- private static java.lang.String DISCARD_PUNCTUATION
- private boolean discardPunctuation
- private KoreanTokenizer.DecompoundMode mode
- static java.lang.String NAME: SPI name
- private static java.lang.String OUTPUT_UNKNOWN_UNIGRAMS
- private boolean outputUnknownUnigrams
- private static java.lang.String USER_DICT_ENCODING
- private static java.lang.String USER_DICT_PATH
- private UserDictionary userDictionary
- private java.lang.String userDictionaryEncoding
- private java.lang.String userDictionaryPath
-
Fields inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
Constructor Summary
Constructors (Constructor, Description):
- KoreanTokenizerFactory(java.util.Map<java.lang.String,java.lang.String> args): Creates a new KoreanTokenizerFactory.
-
Method Summary
Methods (Modifier and Type, Method, Description):
- KoreanTokenizer create(AttributeFactory factory): Creates a TokenStream of the specified input using the given AttributeFactory.
- void inform(ResourceLoader loader): Initializes this component with the provided ResourceLoader (used for loading classes, files, etc).
-
Methods inherited from class org.apache.lucene.analysis.util.TokenizerFactory
availableTokenizers, create, findSPIName, forName, lookupClass, reloadTokenizers
-
Methods inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
Field Detail
-
NAME
public static final java.lang.String NAME
SPI name
- See Also:
- Constant Field Values
-
USER_DICT_PATH
private static final java.lang.String USER_DICT_PATH
- See Also:
- Constant Field Values
-
USER_DICT_ENCODING
private static final java.lang.String USER_DICT_ENCODING
- See Also:
- Constant Field Values
-
DECOMPOUND_MODE
private static final java.lang.String DECOMPOUND_MODE
- See Also:
- Constant Field Values
-
OUTPUT_UNKNOWN_UNIGRAMS
private static final java.lang.String OUTPUT_UNKNOWN_UNIGRAMS
- See Also:
- Constant Field Values
-
DISCARD_PUNCTUATION
private static final java.lang.String DISCARD_PUNCTUATION
- See Also:
- Constant Field Values
-
userDictionaryPath
private final java.lang.String userDictionaryPath
-
userDictionaryEncoding
private final java.lang.String userDictionaryEncoding
-
userDictionary
private UserDictionary userDictionary
-
mode
private final KoreanTokenizer.DecompoundMode mode
-
outputUnknownUnigrams
private final boolean outputUnknownUnigrams
-
discardPunctuation
private final boolean discardPunctuation
Method Detail
-
inform
public void inform(ResourceLoader loader) throws java.io.IOException
Description copied from interface: ResourceLoaderAware
Initializes this component with the provided ResourceLoader (used for loading classes, files, etc).
- Specified by:
- inform in interface ResourceLoaderAware
- Throws:
- java.io.IOException
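The field list suggests a two-phase setup: the path and encoding are final and fixed at construction, while `userDictionary` is not final and is presumably filled in by `inform`. The sketch below illustrates that general pattern with JDK types only. `SketchResourceLoader` and `TwoPhaseFactorySketch` are hypothetical stand-ins, not Lucene's `ResourceLoader` or this factory. The fallback to UTF-8 when no encoding is given and the line-by-line loading are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a resource loader; not Lucene's ResourceLoader. */
interface SketchResourceLoader {
    InputStream openResource(String name) throws IOException;
}

/** Hypothetical factory: settings are fixed in the constructor, resources load in inform(). */
final class TwoPhaseFactorySketch {
    private final String dictionaryPath;      // null means no user dictionary
    private final String dictionaryEncoding;  // null means fall back to UTF-8 (assumption)
    private List<String> dictionaryLines;     // set by inform(), hence not final

    TwoPhaseFactorySketch(String dictionaryPath, String dictionaryEncoding) {
        this.dictionaryPath = dictionaryPath;
        this.dictionaryEncoding = dictionaryEncoding;
    }

    /** Loads the configured dictionary, if any, through the supplied loader. */
    void inform(SketchResourceLoader loader) throws IOException {
        if (dictionaryPath == null) {
            dictionaryLines = null;
            return;
        }
        Charset charset = dictionaryEncoding == null
                ? StandardCharsets.UTF_8
                : Charset.forName(dictionaryEncoding);
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(loader.openResource(dictionaryPath), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    lines.add(line); // keep non-empty lines only
                }
            }
        }
        dictionaryLines = lines;
    }

    /** Returns the loaded lines, or null if no dictionary was configured or inform() has not run. */
    List<String> dictionaryLines() {
        return dictionaryLines;
    }
}
```

Keeping I/O out of the constructor means the object can be configured before any loader is available, which is why `inform` is the method that declares `java.io.IOException`.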
-
create
public KoreanTokenizer create(AttributeFactory factory)
Description copied from class: TokenizerFactory
Creates a TokenStream of the specified input using the given AttributeFactory.
- Specified by:
- create in class TokenizerFactory