Package | Description |
---|---|
org.apache.lucene.analysis | Text analysis. |
org.apache.lucene.analysis.ar | Analyzer for Arabic. |
org.apache.lucene.analysis.bg | Analyzer for Bulgarian. |
org.apache.lucene.analysis.bn | Analyzer for Bengali. |
org.apache.lucene.analysis.br | Analyzer for Brazilian Portuguese. |
org.apache.lucene.analysis.cjk | Analyzer for Chinese, Japanese, and Korean, which indexes bigrams. |
org.apache.lucene.analysis.ckb | Analyzer for Sorani Kurdish. |
org.apache.lucene.analysis.cn.smart | Analyzer for Simplified Chinese, which indexes words. |
org.apache.lucene.analysis.commongrams | Constructs n-grams for frequently occurring terms and phrases. |
org.apache.lucene.analysis.compound | A filter that decomposes compound words found in many Germanic languages into their word parts. |
org.apache.lucene.analysis.core | Basic, general-purpose analysis components. |
org.apache.lucene.analysis.cz | Analyzer for Czech. |
org.apache.lucene.analysis.de | Analyzer for German. |
org.apache.lucene.analysis.el | Analyzer for Greek. |
org.apache.lucene.analysis.en | Analyzer for English. |
org.apache.lucene.analysis.es | Analyzer for Spanish. |
org.apache.lucene.analysis.fa | Analyzer for Persian. |
org.apache.lucene.analysis.fi | Analyzer for Finnish. |
org.apache.lucene.analysis.fr | Analyzer for French. |
org.apache.lucene.analysis.ga | Analyzer for Irish. |
org.apache.lucene.analysis.gl | Analyzer for Galician. |
org.apache.lucene.analysis.hi | Analyzer for Hindi. |
org.apache.lucene.analysis.hu | Analyzer for Hungarian. |
org.apache.lucene.analysis.hunspell | Stemming TokenFilter using a Java implementation of the Hunspell stemming algorithm. |
org.apache.lucene.analysis.icu | Analysis components based on ICU. |
org.apache.lucene.analysis.icu.segmentation | Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm. |
org.apache.lucene.analysis.id | Analyzer for Indonesian. |
org.apache.lucene.analysis.in | Analyzer for Indian languages. |
org.apache.lucene.analysis.it | Analyzer for Italian. |
org.apache.lucene.analysis.ja | Analyzer for Japanese. |
org.apache.lucene.analysis.ko | Analyzer for Korean. |
org.apache.lucene.analysis.lv | Analyzer for Latvian. |
org.apache.lucene.analysis.minhash | MinHash filtering (for LSH). |
org.apache.lucene.analysis.miscellaneous | Miscellaneous TokenStreams. |
org.apache.lucene.analysis.morfologik | Dictionary-driven lemmatization ("accurate stemming") filter and analyzer for Polish, driven by the Morfologik library developed by Dawid Weiss and Marcin Miłkowski. |
org.apache.lucene.analysis.ngram | Character n-gram tokenizers and filters. |
org.apache.lucene.analysis.no | Analyzer for Norwegian. |
org.apache.lucene.analysis.path | Analysis components for path-like strings such as filenames. |
org.apache.lucene.analysis.pattern | Set of components for pattern-based (regex) analysis. |
org.apache.lucene.analysis.payloads | Convenience classes for creating payloads on Tokens. |
org.apache.lucene.analysis.phonetic | Analysis components for phonetic search. |
org.apache.lucene.analysis.pt | Analyzer for Portuguese. |
org.apache.lucene.analysis.reverse | Filter to reverse token text. |
org.apache.lucene.analysis.ru | Analyzer for Russian. |
org.apache.lucene.analysis.shingle | Word n-gram filters. |
org.apache.lucene.analysis.sinks | Tee/sink TokenStream components (see TeeSinkTokenFilter below). |
org.apache.lucene.analysis.snowball | TokenFilter and Analyzer implementations that use Snowball stemmers. |
org.apache.lucene.analysis.sr | Analyzer for Serbian. |
org.apache.lucene.analysis.standard | Fast, general-purpose grammar-based tokenizer StandardTokenizer, implementing the Word Break rules from the Unicode Text Segmentation algorithm (Unicode Standard Annex #29). |
org.apache.lucene.analysis.stempel | Stempel: algorithmic stemmer. |
org.apache.lucene.analysis.sv | Analyzer for Swedish. |
org.apache.lucene.analysis.synonym | Analysis components for synonyms. |
org.apache.lucene.analysis.th | Analyzer for Thai. |
org.apache.lucene.analysis.tr | Analyzer for Turkish. |
org.apache.lucene.analysis.util | Utility functions for text analysis. |
org.apache.lucene.analysis.wikipedia | Tokenizer that is aware of Wikipedia syntax. |
org.apache.lucene.index | Code to maintain and access indices. |
org.apache.lucene.search | Code to search indices. |
org.apache.lucene.search.highlight | Highlighting search terms. |
org.apache.lucene.search.suggest.analyzing | Analyzer-based autosuggest. |
org.apache.lucene.search.suggest.document | Support for document suggestion. |
org.apache.lucene.util | Some utility classes. |
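
Every analysis package above ultimately produces or consumes the TokenStream API described in the tables below. As a point of reference, here is a minimal sketch of the standard consumer workflow (reset, then an incrementToken loop, then end and close), assuming Lucene 7.x/8.x on the classpath; StandardAnalyzer and the field name "body" are illustrative choices only:

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

public class ConsumeTokenStream {
  public static void main(String[] args) throws IOException {
    try (Analyzer analyzer = new StandardAnalyzer()) {
      // tokenStream() returns a (reused) TokenStream for the given field and text.
      try (TokenStream ts = analyzer.tokenStream("body", "The Quick Brown Fox")) {
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        OffsetAttribute offsets = ts.addAttribute(OffsetAttribute.class);
        ts.reset();                       // mandatory before the first incrementToken()
        while (ts.incrementToken()) {     // advance to the next token
          System.out.println(term + " [" + offsets.startOffset()
              + "," + offsets.endOffset() + ")");
        }
        ts.end();                         // record final offset state
      }                                   // try-with-resources calls close()
    }
  }
}
```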
In org.apache.lucene.analysis:

Modifier and Type | Class and Description |
---|---|
class | CachingTokenFilter: This class can be used if the token attributes of a TokenStream are intended to be consumed more than once. |
class | CannedBinaryTokenStream: TokenStream from a canned list of binary (BytesRef-based) tokens. |
class | CannedTokenStream: TokenStream from a canned list of Tokens. |
class | CrankyTokenFilter: Throws IOException from random TokenStream methods. |
class | FilteringTokenFilter: Abstract base class for TokenFilters that may remove tokens. |
class | LookaheadTokenFilter<T extends LookaheadTokenFilter.Position>: An abstract TokenFilter to make it easier to build graph token filters requiring some lookahead. |
class | LowerCaseFilter: Normalizes token text to lower case. |
class | MockFixedLengthPayloadFilter: TokenFilter that adds random fixed-length payloads. |
class | MockGraphTokenFilter: Randomly inserts overlapped (posInc=0) tokens with posLength sometimes > 1. |
class | MockHoleInjectingTokenFilter: Randomly injects holes (similar to what a stop filter would do). |
class | MockLowerCaseFilter: A lowercasing TokenFilter. |
class | MockRandomLookaheadTokenFilter: Uses LookaheadTokenFilter to randomly peek at future tokens. |
class | MockSynonymFilter: Adds a synonym of "dog" for "dogs", and a synonym of "cavy" for "guinea pig". |
class | MockTokenFilter: A TokenFilter for testing that removes terms accepted by a DFA. |
class | MockTokenizer: Tokenizer for testing. |
class | MockVariableLengthPayloadFilter: TokenFilter that adds random variable-length payloads. |
class | SimplePayloadFilter: Simple payload filter that sets the payload as pos: XXXX. |
class | StopFilter: Removes stop words from a token stream. |
class | TokenFilter: A TokenFilter is a TokenStream whose input is another TokenStream. |
class | Tokenizer: A Tokenizer is a TokenStream whose input is a Reader. |
class | TokenStream: A TokenStream enumerates the sequence of tokens, either from Fields of a Document or from query text. |
class | ValidatingTokenFilter: A TokenFilter that checks consistency of the tokens (e.g. offsets are consistent with one another). |
Constructors in org.apache.lucene.analysis:

Constructor and Description |
---|
TokenStream(AttributeSource input): A TokenStream that uses the same attributes as the supplied one. |
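
The TokenFilter and Tokenizer rows above are the two extension points of this API. Below is a minimal sketch of a custom TokenFilter under the contract described above (a filter shares its attributes with its input); ReverseTermFilter is a hypothetical name, and the real ReverseStringFilter listed later on this page does this job properly, including supplementary characters:

```java
import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/** Hypothetical filter that reverses each term's text in place. */
public final class ReverseTermFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

  public ReverseTermFilter(TokenStream input) {
    super(input); // a TokenFilter is a TokenStream whose input is another TokenStream
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;                // no more tokens upstream
    }
    char[] buf = termAtt.buffer(); // mutate the shared term attribute in place
    for (int i = 0, j = termAtt.length() - 1; i < j; i++, j--) {
      char tmp = buf[i];
      buf[i] = buf[j];
      buf[j] = tmp;
    }
    return true;
  }
}
```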
In org.apache.lucene.analysis.ar:

Modifier and Type | Class and Description |
---|---|
class | ArabicNormalizationFilter: A TokenFilter that applies ArabicNormalizer to normalize the orthography. |
class | ArabicStemFilter: A TokenFilter that applies ArabicStemmer to stem Arabic words. |

In org.apache.lucene.analysis.bg:

Modifier and Type | Class and Description |
---|---|
class | BulgarianStemFilter: A TokenFilter that applies BulgarianStemmer to stem Bulgarian words. |

In org.apache.lucene.analysis.bn:

Modifier and Type | Class and Description |
---|---|
class | BengaliNormalizationFilter: A TokenFilter that applies BengaliNormalizer to normalize the orthography. |
class | BengaliStemFilter: A TokenFilter that applies BengaliStemmer to stem Bengali words. |

In org.apache.lucene.analysis.br:

Modifier and Type | Class and Description |
---|---|
class | BrazilianStemFilter: A TokenFilter that applies BrazilianStemmer. |

In org.apache.lucene.analysis.cjk:

Modifier and Type | Class and Description |
---|---|
class | CJKBigramFilter: Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer. |
class | CJKWidthFilter: A TokenFilter that normalizes CJK width differences: folds fullwidth ASCII variants into the equivalent Basic Latin and halfwidth Katakana variants into the equivalent kana. |
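
As a usage sketch for the CJK rows above: width normalization goes before bigram formation, and StandardTokenizer supplies the single-character CJK terms the bigram filter consumes. The chain wiring is an illustrative assumption, not the only valid setup:

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.cjk.CJKBigramFilter;
import org.apache.lucene.analysis.cjk.CJKWidthFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class CjkBigramDemo {
  public static void main(String[] args) throws IOException {
    Tokenizer source = new StandardTokenizer();
    source.setReader(new StringReader("日本語のテキスト"));
    // Normalize width variants first, then form bigrams over CJK runs.
    try (TokenStream ts = new CJKBigramFilter(new CJKWidthFilter(source))) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        System.out.println(term);
      }
      ts.end();
    }
  }
}
```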
In org.apache.lucene.analysis.ckb:

Modifier and Type | Class and Description |
---|---|
class | SoraniNormalizationFilter: A TokenFilter that applies SoraniNormalizer to normalize the orthography. |
class | SoraniStemFilter: A TokenFilter that applies SoraniStemmer to stem Sorani words. |

In org.apache.lucene.analysis.cn.smart:

Modifier and Type | Class and Description |
---|---|
class | HMMChineseTokenizer: Tokenizer for Chinese or mixed Chinese-English text. |

In org.apache.lucene.analysis.commongrams:

Modifier and Type | Class and Description |
---|---|
class | CommonGramsFilter: Constructs bigrams for frequently occurring terms while indexing. |
class | CommonGramsQueryFilter: Wraps a CommonGramsFilter, optimizing phrase queries by only returning single words when they are not a member of a bigram. |

In org.apache.lucene.analysis.compound:

Modifier and Type | Class and Description |
---|---|
class | CompoundWordTokenFilterBase: Base class for decomposition token filters. |
class | DictionaryCompoundWordTokenFilter: A TokenFilter that decomposes compound words found in many Germanic languages. |
class | HyphenationCompoundWordTokenFilter: A TokenFilter that decomposes compound words found in many Germanic languages. |

In org.apache.lucene.analysis.core:

Modifier and Type | Class and Description |
---|---|
class | DecimalDigitFilter: Folds all Unicode digits in [:General_Category=Decimal_Number:] to Basic Latin digits (0-9). |
class | FlattenGraphFilter: Converts an incoming graph token stream, such as one from SynonymGraphFilter, into a flat form so that all nodes form a single linear chain with no side paths. |
class | KeywordTokenizer: Emits the entire input as a single token. |
class | LetterTokenizer: A tokenizer that divides text at non-letters. |
class | LowerCaseTokenizer: Deprecated. Use LetterTokenizer followed by LowerCaseFilter. |
class | TypeTokenFilter: Removes tokens whose types appear in a set of blocked types from a token stream. |
class | UnicodeWhitespaceTokenizer: A tokenizer that divides text at Unicode whitespace. |
class | UpperCaseFilter: Normalizes token text to UPPER CASE. |
class | WhitespaceTokenizer: A tokenizer that divides text at whitespace characters, as defined by Character.isWhitespace(int). |
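
The core components above are typically wired together inside Analyzer.createComponents. A minimal sketch, assuming Lucene 7+ where LowerCaseFilter lives in org.apache.lucene.analysis; the analyzer name is hypothetical:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;

/** Hypothetical example: whitespace tokenization followed by lowercasing. */
public class LowercaseWhitespaceAnalyzer extends Analyzer {
  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    Tokenizer source = new WhitespaceTokenizer();      // divides text at whitespace
    TokenStream result = new LowerCaseFilter(source);  // normalizes token text to lower case
    return new TokenStreamComponents(source, result);  // tokenizer + outermost filter
  }
}
```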
In org.apache.lucene.analysis.cz:

Modifier and Type | Class and Description |
---|---|
class | CzechStemFilter: A TokenFilter that applies CzechStemmer to stem Czech words. |

In org.apache.lucene.analysis.de:

Modifier and Type | Class and Description |
---|---|
class | GermanLightStemFilter: A TokenFilter that applies GermanLightStemmer to stem German words. |
class | GermanMinimalStemFilter: A TokenFilter that applies GermanMinimalStemmer to stem German words. |
class | GermanNormalizationFilter: Normalizes German characters according to the heuristics of the German2 snowball algorithm. |
class | GermanStemFilter: A TokenFilter that stems German words. |

In org.apache.lucene.analysis.el:

Modifier and Type | Class and Description |
---|---|
class | GreekLowerCaseFilter: Normalizes token text to lower case, removes some Greek diacritics, and standardizes final sigma to sigma. |
class | GreekStemFilter: A TokenFilter that applies GreekStemmer to stem Greek words. |

In org.apache.lucene.analysis.en:

Modifier and Type | Class and Description |
---|---|
class | EnglishMinimalStemFilter: A TokenFilter that applies EnglishMinimalStemmer to stem English words. |
class | EnglishPossessiveFilter: TokenFilter that removes possessives (trailing 's) from words. |
class | KStemFilter: A high-performance kstem filter for English. |
class | PorterStemFilter: Transforms the token stream as per the Porter stemming algorithm. |

In org.apache.lucene.analysis.es:

Modifier and Type | Class and Description |
---|---|
class | SpanishLightStemFilter: A TokenFilter that applies SpanishLightStemmer to stem Spanish words. |

In org.apache.lucene.analysis.fa:

Modifier and Type | Class and Description |
---|---|
class | PersianNormalizationFilter: A TokenFilter that applies PersianNormalizer to normalize the orthography. |

In org.apache.lucene.analysis.fi:

Modifier and Type | Class and Description |
---|---|
class | FinnishLightStemFilter: A TokenFilter that applies FinnishLightStemmer to stem Finnish words. |

In org.apache.lucene.analysis.fr:

Modifier and Type | Class and Description |
---|---|
class | FrenchLightStemFilter: A TokenFilter that applies FrenchLightStemmer to stem French words. |
class | FrenchMinimalStemFilter: A TokenFilter that applies FrenchMinimalStemmer to stem French words. |

In org.apache.lucene.analysis.ga:

Modifier and Type | Class and Description |
---|---|
class | IrishLowerCaseFilter: Normalizes token text to lower case, handling t-prothesis and n-eclipsis (i.e., 'nAthair' becomes 'n-athair'). |

In org.apache.lucene.analysis.gl:

Modifier and Type | Class and Description |
---|---|
class | GalicianMinimalStemFilter: A TokenFilter that applies GalicianMinimalStemmer to stem Galician words. |
class | GalicianStemFilter: A TokenFilter that applies GalicianStemmer to stem Galician words. |

In org.apache.lucene.analysis.hi:

Modifier and Type | Class and Description |
---|---|
class | HindiNormalizationFilter: A TokenFilter that applies HindiNormalizer to normalize the orthography. |
class | HindiStemFilter: A TokenFilter that applies HindiStemmer to stem Hindi words. |

In org.apache.lucene.analysis.hu:

Modifier and Type | Class and Description |
---|---|
class | HungarianLightStemFilter: A TokenFilter that applies HungarianLightStemmer to stem Hungarian words. |

In org.apache.lucene.analysis.hunspell:

Modifier and Type | Class and Description |
---|---|
class | HunspellStemFilter: TokenFilter that uses hunspell affix rules and words to stem tokens. |

In org.apache.lucene.analysis.icu:

Modifier and Type | Class and Description |
---|---|
class | ICUFoldingFilter: A TokenFilter that applies search term folding to Unicode text, applying foldings from UTR#30 Character Foldings. |
class | ICUNormalizer2Filter: Normalizes token text with ICU's Normalizer2. |
class | ICUTransformFilter: A TokenFilter that transforms text with ICU. |

In org.apache.lucene.analysis.icu.segmentation:

Modifier and Type | Class and Description |
---|---|
class | ICUTokenizer: Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/). |

In org.apache.lucene.analysis.id:

Modifier and Type | Class and Description |
---|---|
class | IndonesianStemFilter: A TokenFilter that applies IndonesianStemmer to stem Indonesian words. |

In org.apache.lucene.analysis.in:

Modifier and Type | Class and Description |
---|---|
class | IndicNormalizationFilter: A TokenFilter that applies IndicNormalizer to normalize text in Indian languages. |

In org.apache.lucene.analysis.it:

Modifier and Type | Class and Description |
---|---|
class | ItalianLightStemFilter: A TokenFilter that applies ItalianLightStemmer to stem Italian words. |

In org.apache.lucene.analysis.ja:

Modifier and Type | Class and Description |
---|---|
class | JapaneseBaseFormFilter: Replaces term text with the BaseFormAttribute. |
class | JapaneseKatakanaStemFilter: A TokenFilter that normalizes common katakana spelling variations ending in a long sound character by removing that character (U+30FC). |
class | JapaneseNumberFilter: A TokenFilter that normalizes Japanese numbers (kansūji) to regular Arabic decimal numbers in half-width characters. |
class | JapanesePartOfSpeechStopFilter: Removes tokens that match a set of part-of-speech tags. |
class | JapaneseReadingFormFilter: A TokenFilter that replaces the term attribute with the reading of a token, in either katakana or romaji form. |
class | JapaneseTokenizer: Tokenizer for Japanese that uses morphological analysis. |

In org.apache.lucene.analysis.ko:

Modifier and Type | Class and Description |
---|---|
class | KoreanPartOfSpeechStopFilter: Removes tokens that match a set of part-of-speech tags. |
class | KoreanReadingFormFilter: Replaces term text with the ReadingAttribute, which is the Hangul transcription of Hanja characters. |
class | KoreanTokenizer: Tokenizer for Korean that uses morphological analysis. |

In org.apache.lucene.analysis.lv:

Modifier and Type | Class and Description |
---|---|
class | LatvianStemFilter: A TokenFilter that applies LatvianStemmer to stem Latvian words. |

In org.apache.lucene.analysis.minhash:

Modifier and Type | Class and Description |
---|---|
class | MinHashFilter: Generates min hash tokens from an incoming stream of tokens. |

In org.apache.lucene.analysis.miscellaneous:

Modifier and Type | Class and Description |
---|---|
class | ASCIIFoldingFilter: Converts alphabetic, numeric, and symbolic Unicode characters that are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists. |
class | CapitalizationFilter: A filter to apply normal capitalization rules to Tokens. |
class | CodepointCountFilter: Removes words that are too long or too short from the stream. |
class | ConcatenateGraphFilter: Concatenates/joins every incoming token with a separator into one output token, for every path through the token stream (which is a graph). |
class | ConcatenatingTokenStream: A TokenStream that takes an array of input TokenStreams as sources and concatenates them together. |
class | ConditionalTokenFilter: Allows skipping TokenFilters based on the current set of attributes. |
class | DateRecognizerFilter: Filters out all tokens that cannot be parsed to a date, using the provided DateFormat. |
class | DelimitedTermFrequencyTokenFilter: Characters before the delimiter are the "token"; the textual integer after it is the term frequency. |
class | EmptyTokenStream: An always exhausted token stream. |
class | FingerprintFilter: Outputs a single token which is a concatenation of the sorted and de-duplicated set of input tokens. |
class | FixBrokenOffsetsFilter: Deprecated. Fix the token filters that create broken offsets in the first place. |
class | HyphenatedWordsFilter: When plain text is extracted from documents, we will often have many words hyphenated and broken into two lines. |
class | KeepWordFilter: A TokenFilter that only keeps tokens whose text is contained in the required words. |
class | KeywordMarkerFilter: Marks terms as keywords via the KeywordAttribute. |
class | KeywordRepeatFilter: Emits each incoming token twice, once as a keyword and once as a non-keyword; in other words, once with KeywordAttribute.setKeyword(boolean) set to true and once set to false. |
class | LengthFilter: Removes words that are too long or too short from the stream. |
class | LimitTokenCountFilter: Limits the number of tokens while indexing. |
class | LimitTokenOffsetFilter: Lets all tokens pass through until it sees one with a start offset <= a configured limit, which won't pass and ends the stream. |
class | LimitTokenPositionFilter: Limits its emitted tokens to those with positions that are not greater than the configured limit. |
class | PatternKeywordMarkerFilter: Marks terms as keywords via the KeywordAttribute. |
class | ProtectedTermFilter: A ConditionalTokenFilter that only applies its wrapped filters to tokens that are not contained in a protected set. |
class | RemoveDuplicatesTokenFilter: A TokenFilter which filters out Tokens at the same position with the same term text as the previous token in the stream. |
class | ScandinavianFoldingFilter: Folds Scandinavian characters åÅäæÄÆ->a and öÖøØ->o. |
class | ScandinavianNormalizationFilter: Normalizes the use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded variants (aa, ao, ae, oe and oo) by transforming them to åÅæÆøØ. |
class | SetKeywordMarkerFilter: Marks terms as keywords via the KeywordAttribute. |
class | StemmerOverrideFilter: Provides the ability to override any KeywordAttribute-aware stemmer with custom dictionary-based stemming. |
class | TrimFilter: Trims leading and trailing whitespace from Tokens in the stream. |
class | TruncateTokenFilter: A token filter for truncating terms to a specific length. |
class | TypeAsSynonymFilter: Adds the TypeAttribute.type() as a synonym. |
class | WordDelimiterFilter: Deprecated. Use WordDelimiterGraphFilter instead: it produces a correct token graph so that e.g. PhraseQuery works correctly when it's used in the search-time analyzer. |
class | WordDelimiterGraphFilter: Splits words into subwords and performs optional transformations on subword groups, producing a correct token graph so that e.g. PhraseQuery works correctly when it's used in the search-time analyzer. |
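
One common idiom built from the miscellaneous filters above: wrapping a stemmer with KeywordRepeatFilter and RemoveDuplicatesTokenFilter indexes both the original and the stemmed form of each term. A sketch under that assumption (PorterStemFilter stands in for any KeywordAttribute-aware stemmer):

```java
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.miscellaneous.KeywordRepeatFilter;
import org.apache.lucene.analysis.miscellaneous.RemoveDuplicatesTokenFilter;

public final class StemWithOriginals {
  /** Wraps an upstream TokenStream so both original and stemmed terms are emitted. */
  public static TokenStream wrap(TokenStream in) {
    TokenStream ts = new KeywordRepeatFilter(in); // each token twice: keyword + non-keyword
    ts = new PorterStemFilter(ts);                // stems only the non-keyword copy
    return new RemoveDuplicatesTokenFilter(ts);   // drops copies the stemmer left unchanged
  }
}
```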
In org.apache.lucene.analysis.morfologik:

Modifier and Type | Class and Description |
---|---|
class | MorfologikFilter: TokenFilter using the Morfologik library to transform input tokens into lemma and morphosyntactic (POS) tokens. |

In org.apache.lucene.analysis.ngram:

Modifier and Type | Class and Description |
---|---|
class | EdgeNGramTokenFilter: Tokenizes the given token into n-grams of given size(s). |
class | EdgeNGramTokenizer: Tokenizes the input from an edge into n-grams of given size(s). |
class | NGramTokenFilter: Tokenizes the input into n-grams of the given size(s). |
class | NGramTokenizer: Tokenizes the input into n-grams of the given size(s). |
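
A quick sketch of the n-gram tokenizer above; the min/max gram sizes are constructor parameters, and with both set to 2 the input "fox" yields "fo" and "ox":

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.ngram.NGramTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class BigramDemo {
  public static void main(String[] args) throws IOException {
    try (NGramTokenizer ngrams = new NGramTokenizer(2, 2)) { // min gram 2, max gram 2
      ngrams.setReader(new StringReader("fox"));
      CharTermAttribute term = ngrams.addAttribute(CharTermAttribute.class);
      ngrams.reset();
      while (ngrams.incrementToken()) {
        System.out.println(term);                            // prints: fo, ox
      }
      ngrams.end();
    }
  }
}
```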
In org.apache.lucene.analysis.no:

Modifier and Type | Class and Description |
---|---|
class | NorwegianLightStemFilter: A TokenFilter that applies NorwegianLightStemmer to stem Norwegian words. |
class | NorwegianMinimalStemFilter: A TokenFilter that applies NorwegianMinimalStemmer to stem Norwegian words. |

In org.apache.lucene.analysis.path:

Modifier and Type | Class and Description |
---|---|
class | PathHierarchyTokenizer: Tokenizer for path-like hierarchies. |
class | ReversePathHierarchyTokenizer: Tokenizer for domain-like hierarchies. |

In org.apache.lucene.analysis.pattern:

Modifier and Type | Class and Description |
---|---|
class | PatternCaptureGroupTokenFilter: Uses Java regexes to emit multiple tokens, one for each capture group in one or more patterns. |
class | PatternReplaceFilter: A TokenFilter which applies a Pattern to each token in the stream, replacing match occurrences with the specified replacement string. |
class | PatternTokenizer: Uses regex pattern matching to construct distinct tokens for the input stream. |
class | SimplePatternSplitTokenizer |
class | SimplePatternTokenizer |

In org.apache.lucene.analysis.payloads:

Modifier and Type | Class and Description |
---|---|
class | DelimitedPayloadTokenFilter: Characters before the delimiter are the "token"; those after are the payload. |
class | NumericPayloadTokenFilter: Assigns a payload to a token based on the TypeAttribute. |
class | TokenOffsetPayloadTokenFilter: Adds the OffsetAttribute.startOffset() and OffsetAttribute.endOffset() as the payload; the first 4 bytes are the start offset. |
class | TypeAsPayloadTokenFilter: Makes the TypeAttribute a payload. |

In org.apache.lucene.analysis.phonetic:

Modifier and Type | Class and Description |
---|---|
class | BeiderMorseFilter: TokenFilter for Beider-Morse phonetic encoding. |
class | DaitchMokotoffSoundexFilter: Creates tokens for phonetic matches based on Daitch–Mokotoff Soundex. |
class | DoubleMetaphoneFilter: Filter for DoubleMetaphone (supporting secondary codes). |
class | PhoneticFilter: Creates tokens for phonetic matches. |

In org.apache.lucene.analysis.pt:

Modifier and Type | Class and Description |
---|---|
class | PortugueseLightStemFilter: A TokenFilter that applies PortugueseLightStemmer to stem Portuguese words. |
class | PortugueseMinimalStemFilter: A TokenFilter that applies PortugueseMinimalStemmer to stem Portuguese words. |
class | PortugueseStemFilter: A TokenFilter that applies PortugueseStemmer to stem Portuguese words. |

In org.apache.lucene.analysis.reverse:

Modifier and Type | Class and Description |
---|---|
class | ReverseStringFilter: Reverses the token string, for example "country" => "yrtnuoc". |

In org.apache.lucene.analysis.ru:

Modifier and Type | Class and Description |
---|---|
class | RussianLightStemFilter: A TokenFilter that applies RussianLightStemmer to stem Russian words. |

In org.apache.lucene.analysis.shingle:

Modifier and Type | Class and Description |
---|---|
class | FixedShingleFilter: Constructs shingles (token n-grams) of a fixed size from a token stream. |
class | ShingleFilter: Constructs shingles (token n-grams) from a token stream. |
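
A sketch of ShingleFilter over a whitespace-tokenized stream; by default the filter emits the unigrams as well as the shingles, and the (2, 2) arguments request bigram shingles only. The sample text is illustrative:

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.shingle.ShingleFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ShingleDemo {
  public static void main(String[] args) throws IOException {
    Tokenizer source = new WhitespaceTokenizer();
    source.setReader(new StringReader("please divide this sentence"));
    try (TokenStream ts = new ShingleFilter(source, 2, 2)) { // min/max shingle size 2
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        System.out.println(term); // "please", "please divide", "divide", "divide this", ...
      }
      ts.end();
    }
  }
}
```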
In org.apache.lucene.analysis.sinks:

Modifier and Type | Class and Description |
---|---|
class | TeeSinkTokenFilter: This TokenFilter provides the ability to set aside attribute states that have already been analyzed. |
static class | TeeSinkTokenFilter.SinkTokenStream: TokenStream output from a tee. |

In org.apache.lucene.analysis.snowball:

Modifier and Type | Class and Description |
---|---|
class | SnowballFilter: A filter that stems words using a Snowball-generated stemmer. |
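
Hooking SnowballFilter into a chain is one line; the stemmer is selected by its Snowball program name. A sketch, assuming "English" as the stemmer name and an already-lowercased upstream stream:

```java
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.snowball.SnowballFilter;

public final class SnowballStemming {
  /** Wraps an upstream (already lowercased) stream with an English Snowball stemmer. */
  public static TokenStream english(TokenStream in) {
    return new SnowballFilter(in, "English"); // stemmer selected by Snowball program name
  }
}
```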
In org.apache.lucene.analysis.sr:

Modifier and Type | Class and Description |
---|---|
class | SerbianNormalizationFilter: Normalizes Serbian Cyrillic and Latin characters to "bald" Latin. |
class | SerbianNormalizationRegularFilter: Normalizes Serbian Cyrillic to Latin. |

In org.apache.lucene.analysis.standard:

Modifier and Type | Class and Description |
---|---|
class | ClassicFilter: Normalizes tokens extracted with ClassicTokenizer. |
class | ClassicTokenizer: A grammar-based tokenizer constructed with JFlex. |
class | StandardFilter: Deprecated. StandardFilter is a no-op and can be removed from code. |
class | StandardTokenizer: A grammar-based tokenizer constructed with JFlex. |
class | UAX29URLEmailTokenizer: Implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29; URLs and email addresses are also tokenized according to the relevant RFCs. |

In org.apache.lucene.analysis.stempel:

Modifier and Type | Class and Description |
---|---|
class | StempelFilter: Transforms the token stream as per the stemming algorithm. |

In org.apache.lucene.analysis.sv:

Modifier and Type | Class and Description |
---|---|
class | SwedishLightStemFilter: A TokenFilter that applies SwedishLightStemmer to stem Swedish words. |

In org.apache.lucene.analysis.synonym:

Modifier and Type | Class and Description |
---|---|
class | SynonymFilter: Deprecated. Use SynonymGraphFilter instead, but be sure to also use FlattenGraphFilter at index time (not at search time). |
class | SynonymGraphFilter: Applies single- or multi-token synonyms from a SynonymMap to an incoming TokenStream, producing a fully correct graph output. |
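
Putting the synonym rows together with the FlattenGraphFilter listed under org.apache.lucene.analysis.core, here is a sketch of an index-time chain. It uses a single-word synonym for brevity; multi-word entries also go through SynonymMap.Builder, and the sample tokens are illustrative:

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.FlattenGraphFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.synonym.SynonymGraphFilter;
import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.util.CharsRef;

public class SynonymDemo {
  public static void main(String[] args) throws IOException {
    SynonymMap.Builder builder = new SynonymMap.Builder(true);    // true = dedup entries
    builder.add(new CharsRef("dogs"), new CharsRef("dog"), true); // keep the original token
    SynonymMap map = builder.build();

    Tokenizer source = new WhitespaceTokenizer();
    source.setReader(new StringReader("dogs bark"));
    TokenStream ts = new SynonymGraphFilter(source, map, true);   // true = ignore case
    ts = new FlattenGraphFilter(ts); // index time only, per the deprecation note above
    // ... consume ts as usual: reset(), incrementToken() loop, end(), close()
  }
}
```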
In org.apache.lucene.analysis.th:

Modifier and Type | Class and Description |
---|---|
class | ThaiTokenizer: Tokenizer that uses BreakIterator to tokenize Thai text. |

In org.apache.lucene.analysis.tr:

Modifier and Type | Class and Description |
---|---|
class | ApostropheFilter: Strips all characters after an apostrophe (including the apostrophe itself). |
class | TurkishLowerCaseFilter: Normalizes Turkish token text to lower case. |

In org.apache.lucene.analysis.util:

Modifier and Type | Class and Description |
---|---|
class | CharTokenizer: An abstract base class for simple, character-oriented tokenizers. |
class | ElisionFilter: Removes elisions from a TokenStream. |
class | SegmentingTokenizerBase: Breaks text into sentences with a BreakIterator and allows subclasses to decompose these sentences into words. |

In org.apache.lucene.analysis.wikipedia:

Modifier and Type | Class and Description |
---|---|
class | WikipediaTokenizer: Extension of StandardTokenizer that is aware of Wikipedia syntax. |

In org.apache.lucene.index:

Modifier and Type | Class and Description |
---|---|
static class | BaseTermVectorsFormatTestCase.RandomTokenStream: Produces a random TokenStream based off of provided terms. |

Methods in org.apache.lucene.index that return AttributeSource:

Modifier and Type | Method and Description |
---|---|
AttributeSource | PostingsEnum.attributes(): Deprecated. This method is unused and will be removed in 7.0. |
AttributeSource | FilterLeafReader.FilterTermsEnum.attributes() |
AttributeSource | FilterLeafReader.FilterPostingsEnum.attributes() |
AttributeSource | FilteredTermsEnum.attributes(): Returns the related attributes; the returned AttributeSource is shared with the delegate TermsEnum. |
AttributeSource | TermsEnum.attributes(): Returns the related attributes. |
AttributeSource | FieldInvertState.getAttributeSource(): Returns the AttributeSource from the TokenStream that provided the indexed tokens for this field. |

Methods in org.apache.lucene.search with parameters of type AttributeSource:

Modifier and Type | Method and Description |
---|---|
protected TermsEnum | MultiTermQuery.RewriteMethod.getTermsEnum(MultiTermQuery query, Terms terms, AttributeSource atts): Returns the MultiTermQuery's TermsEnum. |
protected TermsEnum | FuzzyQuery.getTermsEnum(Terms terms, AttributeSource atts) |
protected TermsEnum | AutomatonQuery.getTermsEnum(Terms terms, AttributeSource atts) |
protected abstract TermsEnum | MultiTermQuery.getTermsEnum(Terms terms, AttributeSource atts): Constructs the enumeration to be used, expanding the pattern term. |

Constructors in org.apache.lucene.search with parameters of type AttributeSource:

Constructor and Description |
---|
FuzzyTermsEnum(Terms terms, AttributeSource atts, Term term, int maxEdits, int prefixLength, boolean transpositions): Constructor for enumeration of all terms from the specified reader which share a prefix of length prefixLength with term and which have at most maxEdits edits. |

In org.apache.lucene.search.highlight:

Modifier and Type | Class and Description |
---|---|
class | OffsetLimitTokenFilter: This TokenFilter limits the number of tokens while indexing by adding up the current offset. |
class | TokenStreamFromTermVector: TokenStream created from a term vector field. |

In org.apache.lucene.search.suggest.analyzing:

Modifier and Type | Class and Description |
---|---|
class | SuggestStopFilter: Like StopFilter, except it will not remove the last token if that token was not followed by some token separator. |

In org.apache.lucene.search.suggest.document:

Modifier and Type | Class and Description |
---|---|
class | CompletionTokenStream: A ConcatenateGraphFilter, but with the ability to set the payload and access config options. |

Methods in org.apache.lucene.util that return AttributeSource:

Modifier and Type | Method and Description |
---|---|
AttributeSource | AttributeSource.cloneAttributes(): Performs a clone of all AttributeImpl instances returned in a new AttributeSource instance. |

Methods in org.apache.lucene.util with parameters of type AttributeSource:

Modifier and Type | Method and Description |
---|---|
void | AttributeSource.copyTo(AttributeSource target): Copies the contents of this AttributeSource to the given target AttributeSource. |

Constructors in org.apache.lucene.util:

Constructor and Description |
---|
AttributeSource(AttributeSource input): An AttributeSource that uses the same attributes as the supplied one. |
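
The AttributeSource methods above (cloneAttributes, copyTo, and the attribute-sharing constructor) exist because tokens are not objects but snapshots of shared attribute state. A sketch of the related captureState/restoreState pattern, which is how filters such as LookaheadTokenFilter and KeywordRepeatFilter replay a token; RepeatTokenFilter is a hypothetical example, not a class on this page:

```java
import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.util.AttributeSource;

/** Hypothetical filter that emits every token twice at the same position. */
public final class RepeatTokenFilter extends TokenFilter {
  private final PositionIncrementAttribute posIncAtt =
      addAttribute(PositionIncrementAttribute.class);
  private AttributeSource.State pending; // snapshot of the token to replay

  public RepeatTokenFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (pending != null) {
      restoreState(pending);             // replay all captured attribute values
      pending = null;
      posIncAtt.setPositionIncrement(0); // stack the copy on the same position
      return true;
    }
    if (!input.incrementToken()) {
      return false;                      // upstream exhausted
    }
    pending = captureState();            // remember this token for the next call
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    pending = null;
  }
}
```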
Copyright © 2000–2019 The Apache Software Foundation. All rights reserved.