Class UserDictionary
- java.lang.Object
  - org.apache.lucene.analysis.ja.dict.UserDictionary
- All Implemented Interfaces:
Dictionary
public final class UserDictionary extends java.lang.Object implements Dictionary
Class for building a User Dictionary. This class allows for custom segmentation of phrases.
-
Field Summary
Fields
Modifier and Type             Field
private static int            CUSTOM_DICTIONARY_WORD_ID_OFFSET
private java.lang.String[]    data
private static int[][]        EMPTY_RESULT
private TokenInfoFST          fst
static int                    LEFT_ID
static int                    RIGHT_ID
private int[][]               segmentations
static int                    WORD_COST
-
Fields inherited from interface org.apache.lucene.analysis.ja.dict.Dictionary
INTERNAL_SEPARATOR
-
-
Constructor Summary
Constructors
Modifier    Constructor
private     UserDictionary(java.util.List<java.lang.String[]> featureEntries)
-
Method Summary
Modifier and Type    Method
    Description
private java.lang.String[]    getAllFeaturesArray(int wordId)
java.lang.String    getBaseForm(int wordId, char[] surface, int off, int len)
    Get base form of word
private java.lang.String    getFeature(int wordId, int... fields)
TokenInfoFST    getFST()
java.lang.String    getInflectionForm(int wordId)
    Get inflection form of tokens
java.lang.String    getInflectionType(int wordId)
    Get inflection type of tokens
int    getLeftId(int wordId)
    Get left id of specified word
java.lang.String    getPartOfSpeech(int wordId)
    Get Part-Of-Speech of tokens
java.lang.String    getPronunciation(int wordId, char[] surface, int off, int len)
    Get pronunciation of tokens
java.lang.String    getReading(int wordId, char[] surface, int off, int len)
    Get reading of tokens
int    getRightId(int wordId)
    Get right id of specified word
int    getWordCost(int wordId)
    Get word cost of specified word
int[][]    lookup(char[] chars, int off, int len)
    Lookup words in text
int[]    lookupSegmentation(int phraseID)
static UserDictionary    open(java.io.Reader reader)
private int[][]    toIndexArray(java.util.Map<java.lang.Integer,int[]> input)
    Convert Map of index and wordIdAndLength to array of {wordId, index, length}
-
Field Detail
-
fst
private final TokenInfoFST fst
-
segmentations
private final int[][] segmentations
-
data
private final java.lang.String[] data
-
CUSTOM_DICTIONARY_WORD_ID_OFFSET
private static final int CUSTOM_DICTIONARY_WORD_ID_OFFSET
- See Also:
- Constant Field Values
-
WORD_COST
public static final int WORD_COST
- See Also:
- Constant Field Values
-
LEFT_ID
public static final int LEFT_ID
- See Also:
- Constant Field Values
-
RIGHT_ID
public static final int RIGHT_ID
- See Also:
- Constant Field Values
-
EMPTY_RESULT
private static final int[][] EMPTY_RESULT
-
Method Detail
-
open
public static UserDictionary open(java.io.Reader reader) throws java.io.IOException
- Throws:
java.io.IOException
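The Reader passed to open supplies the user dictionary entries. As a rough illustration only, the sketch below assumes the comma-separated entry layout shown in Lucene's Kuromoji user dictionary examples (surface form, space-separated segments, space-separated readings, part-of-speech tag) and checks such entries in plain Java. The class UserEntrySketch and its parse helper are hypothetical and not part of Lucene, and the real parsing rules (comments, quoting, whitespace handling) may differ.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: reads entries in the assumed
// "surface,segments,readings,partOfSpeech" layout from a Reader.
class UserEntrySketch {
    final String surface;
    final String[] segments;
    final String[] readings;
    final String partOfSpeech;

    UserEntrySketch(String surface, String[] segments, String[] readings, String partOfSpeech) {
        this.surface = surface;
        this.segments = segments;
        this.readings = readings;
        this.partOfSpeech = partOfSpeech;
    }

    static List<UserEntrySketch> parse(Reader reader) {
        List<UserEntrySketch> entries = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                line = line.trim();
                // Assumption: blank lines and lines starting with '#' are skipped.
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;
                }
                String[] fields = line.split(",", -1);
                if (fields.length != 4) {
                    throw new IllegalArgumentException("Expected 4 fields: " + line);
                }
                String[] segments = fields[1].trim().split(" +");
                String[] readings = fields[2].trim().split(" +");
                // Assumption: each segment is paired with exactly one reading.
                if (segments.length != readings.length) {
                    throw new IllegalArgumentException("Segment/reading count mismatch: " + line);
                }
                entries.add(new UserEntrySketch(fields[0].trim(), segments, readings, fields[3].trim()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        String csv = "# custom segmentation for a compound\n"
                + "関西国際空港,関西 国際 空港,カンサイ コクサイ クウコウ,カスタム名詞\n";
        for (UserEntrySketch e : parse(new StringReader(csv))) {
            System.out.println(e.surface + " -> " + String.join(" | ", e.segments));
        }
        // With the Lucene kuromoji module on the classpath, a Reader over the
        // same text is what would be handed to UserDictionary.open(reader).
    }
}
```

Validating entries up front like this is a design choice for the sketch; it makes malformed lines fail early instead of surfacing later during tokenization.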
-
lookup
public int[][] lookup(char[] chars, int off, int len) throws java.io.IOException
Lookup words in text
- Parameters:
chars - text
off - offset into text
len - length of text
- Returns:
- array of {wordId, position, length}
- Throws:
java.io.IOException
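Each row returned by lookup is described as {wordId, position, length}. The sketch below shows one way such rows could be turned back into surface strings. MatchSketch.surfaces is a hypothetical helper, the sample rows are made up rather than produced by a real dictionary, and the code assumes that position is counted from off, which should be checked against the Lucene source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
class MatchSketch {
    // Extracts the matched text for each {wordId, position, length} row.
    // Assumption: position is relative to the off argument given to lookup.
    static List<String> surfaces(char[] chars, int off, int[][] matches) {
        List<String> out = new ArrayList<>();
        for (int[] m : matches) {
            int position = m[1];
            int length = m[2];
            out.add(new String(chars, off + position, length));
        }
        return out;
    }

    public static void main(String[] args) {
        char[] text = "xxabcdef".toCharArray();
        // Made-up rows standing in for a lookup(text, 2, 6) result.
        int[][] matches = { { 100, 0, 2 }, { 101, 2, 4 } };
        System.out.println(surfaces(text, 2, matches)); // prints [ab, cdef]
    }
}
```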
-
getFST
public TokenInfoFST getFST()
-
toIndexArray
private int[][] toIndexArray(java.util.Map<java.lang.Integer,int[]> input)
Convert Map of index and wordIdAndLength to array of {wordId, index, length}
- Returns:
- array of {wordId, index, length}
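The conversion can be pictured as follows. This is a sketch rather than the Lucene implementation: it assumes each map key is the start index of a matched phrase and each value holds the first word ID followed by the lengths of the phrase's segments, with consecutive segments receiving consecutive word IDs. IndexArraySketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the private method, under the assumptions above.
class IndexArraySketch {
    static int[][] toIndexArray(Map<Integer, int[]> input) {
        List<int[]> result = new ArrayList<>();
        for (Map.Entry<Integer, int[]> e : input.entrySet()) {
            int[] wordIdAndLength = e.getValue();
            int wordId = wordIdAndLength[0]; // assumed: first slot is the base word ID
            int current = e.getKey();        // start index of the first segment
            for (int j = 1; j < wordIdAndLength.length; j++) {
                // One {wordId, index, length} row per segment.
                result.add(new int[] { wordId + j - 1, current, wordIdAndLength[j] });
                current += wordIdAndLength[j]; // next segment starts where this one ends
            }
        }
        return result.toArray(new int[0][]);
    }

    public static void main(String[] args) {
        Map<Integer, int[]> input = new TreeMap<>();
        // Phrase starting at index 3, base word ID 10, three segments of length 2.
        input.put(3, new int[] { 10, 2, 2, 2 });
        for (int[] row : toIndexArray(input)) {
            System.out.println(Arrays.toString(row));
        }
        // prints [10, 3, 2], [11, 5, 2] and [12, 7, 2] on separate lines
    }
}
```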
-
lookupSegmentation
public int[] lookupSegmentation(int phraseID)
-
getLeftId
public int getLeftId(int wordId)
Description copied from interface: Dictionary
Get left id of specified word
- Specified by:
getLeftId in interface Dictionary
- Returns:
- left id
-
getRightId
public int getRightId(int wordId)
Description copied from interface: Dictionary
Get right id of specified word
- Specified by:
getRightId in interface Dictionary
- Returns:
- right id
-
getWordCost
public int getWordCost(int wordId)
Description copied from interface: Dictionary
Get word cost of specified word
- Specified by:
getWordCost in interface Dictionary
- Returns:
- word's cost
-
getReading
public java.lang.String getReading(int wordId, char[] surface, int off, int len)
Description copied from interface: Dictionary
Get reading of tokens
- Specified by:
getReading in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
- Reading of the token
-
getPartOfSpeech
public java.lang.String getPartOfSpeech(int wordId)
Description copied from interface: Dictionary
Get Part-Of-Speech of tokens
- Specified by:
getPartOfSpeech in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
- Part-Of-Speech of the token
-
getBaseForm
public java.lang.String getBaseForm(int wordId, char[] surface, int off, int len)
Description copied from interface: Dictionary
Get base form of word
- Specified by:
getBaseForm in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
- Base form (only different for inflected words, otherwise null)
-
getPronunciation
public java.lang.String getPronunciation(int wordId, char[] surface, int off, int len)
Description copied from interface: Dictionary
Get pronunciation of tokens
- Specified by:
getPronunciation in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
- Pronunciation of the token
-
getInflectionType
public java.lang.String getInflectionType(int wordId)
Description copied from interface: Dictionary
Get inflection type of tokens
- Specified by:
getInflectionType in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
- inflection type, or null
-
getInflectionForm
public java.lang.String getInflectionForm(int wordId)
Description copied from interface: Dictionary
Get inflection form of tokens
- Specified by:
getInflectionForm in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
- inflection form, or null
-
getAllFeaturesArray
private java.lang.String[] getAllFeaturesArray(int wordId)
-
getFeature
private java.lang.String getFeature(int wordId, int... fields)
-