Package com.ibm.icu.text
Class StringMatcher
java.lang.Object
com.ibm.icu.text.StringMatcher
- All Implemented Interfaces: UnicodeMatcher, UnicodeReplacer
An object that matches a fixed input string, implementing the
UnicodeMatcher API. This object also implements the
UnicodeReplacer API, allowing it to emit the matched text as
output. Since the match text may contain flexible match elements,
such as UnicodeSets, the emitted text is not the match pattern, but
instead a substring of the actual matched text. Following
convention, the output text is the leftmost match seen up to this
point.
A StringMatcher may represent a segment, in which case it has a
positive segment number. This affects how the matcher converts
itself to a pattern but does not otherwise affect its function.
A StringMatcher that is not a segment should not be used as a
UnicodeReplacer.
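The behavior described above, where a segment emits the text it actually matched rather than its pattern, can be sketched roughly as follows. This is a simplified model, not the ICU implementation: StringBuilder stands in for Replaceable, the class SegmentSketch and its recordMatch method are hypothetical, and using -1 to mean "no match recorded" is an assumption of the sketch.

```java
// Simplified model of a segment that remembers what it matched; not the ICU class.
final class SegmentSketch {
    // -1 marks "no match recorded"; this sentinel is an assumption of the sketch.
    private int matchStart = -1;
    private int matchLimit = -1;

    /** Records the range [start, limit) of the text that this segment matched. */
    void recordMatch(int start, int limit) {
        matchStart = start;
        matchLimit = limit;
    }

    /** Forgets any recorded match. */
    void resetMatch() {
        matchStart = -1;
        matchLimit = -1;
    }

    /**
     * Replaces text[start, limit) with the recorded matched text and returns
     * the number of 16-bit code units written. With no recorded match the
     * range is simply deleted and 0 is returned.
     */
    int replace(StringBuilder text, int start, int limit) {
        String matched = "";
        if (matchStart >= 0) {
            matched = text.substring(matchStart, matchLimit);
        }
        text.replace(start, limit, matched);
        return matched.length();
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder("xabcy--");
        SegmentSketch segment = new SegmentSketch();
        segment.recordMatch(1, 4);                 // the segment matched "abc"
        int written = segment.replace(text, 5, 7); // emit the matched text over "--"
        System.out.println(text + " (" + written + " code units written)");
    }
}
```

The emitted text comes from the matched range of the input, so a segment whose pattern contains a set would still output the concrete characters that were matched.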
-
Field Summary
Fields
Modifier and Type, Field - Description
private final RuleBasedTransliterator.Data data - Context object that maps stand-ins to matcher and replacer objects.
private int matchLimit - Limit offset, in the match text, of the rightmost match.
private int matchStart - Start offset, in the match text, of the rightmost match.
private String pattern - The text to be matched.
private int segmentNumber - The segment number, 1-based, or 0 if not a segment.
Fields inherited from interface com.ibm.icu.text.UnicodeMatcher
ETHER, U_MATCH, U_MISMATCH, U_PARTIAL_MATCH
-
Constructor Summary
Constructors
Constructor - Description
StringMatcher(String theString, int start, int limit, int segmentNum, RuleBasedTransliterator.Data theData) - Construct a matcher that matches a substring of the given pattern string.
StringMatcher(String theString, int segmentNum, RuleBasedTransliterator.Data theData) - Construct a matcher that matches the given pattern string.
-
Method Summary
Modifier and Type, Method - Description
void addMatchSetTo(UnicodeSet toUnionTo) - Implementation of UnicodeMatcher API.
void addReplacementSetTo(UnicodeSet toUnionTo) - Union the set of all characters that may be output by this object into the given set.
int matches(Replaceable text, int[] offset, int limit, boolean incremental) - Implement UnicodeMatcher
boolean matchesIndexValue(int v) - Implement UnicodeMatcher
int replace(Replaceable text, int start, int limit, int[] cursor) - UnicodeReplacer API
void resetMatch() - Remove any match data.
String toPattern(boolean escapeUnprintable) - Implement UnicodeMatcher
String toReplacerPattern(boolean escapeUnprintable) - UnicodeReplacer API
-
Field Details
-
pattern
private String pattern
The text to be matched.
-
matchStart
private int matchStart
Start offset, in the match text, of the rightmost match.
-
matchLimit
private int matchLimit
Limit offset, in the match text, of the rightmost match.
-
segmentNumber
private int segmentNumber
The segment number, 1-based, or 0 if not a segment.
-
data
private final RuleBasedTransliterator.Data data
Context object that maps stand-ins to matcher and replacer objects.
-
-
Constructor Details
-
StringMatcher
public StringMatcher(String theString, int segmentNum, RuleBasedTransliterator.Data theData)
Construct a matcher that matches the given pattern string.
- Parameters:
theString - the pattern to be matched, possibly containing stand-ins that represent nested UnicodeMatcher objects.
segmentNum - the segment number from 1..n, or 0 if this is not a segment.
theData - context object mapping stand-ins to UnicodeMatcher objects.
-
StringMatcher
public StringMatcher(String theString, int start, int limit, int segmentNum, RuleBasedTransliterator.Data theData)
Construct a matcher that matches a substring of the given pattern string.
- Parameters:
theString - the pattern to be matched, possibly containing stand-ins that represent nested UnicodeMatcher objects.
start - index of the first character of theString to be matched.
limit - index after the last character of theString to be matched.
segmentNum - the segment number from 1..n, or 0 if this is not a segment.
theData - context object mapping stand-ins to UnicodeMatcher objects.
-
-
Method Details
-
matches
public int matches(Replaceable text, int[] offset, int limit, boolean incremental)
Implement UnicodeMatcher
- Specified by: matches in interface UnicodeMatcher
- Parameters:
text - the text to be matched
offset - on input, the index into text at which to begin matching. On output, the limit of the matched text. The number of matched characters is the output value of offset minus the input value. Offset should always point to the HIGH SURROGATE (leading code unit) of a pair of surrogates, both on entry and upon return.
limit - the limit index of text to be matched. Greater than offset for a forward direction match, less than offset for a backward direction match. The last character to be considered for matching will be text.charAt(limit-1) in the forward direction or text.charAt(limit+1) in the backward direction.
incremental - if true, then assume further characters may be inserted at limit and check for partial matching. Otherwise assume the text as given is complete.
- Returns: a match degree value indicating a full match, a partial match, or a mismatch. If incremental is false then U_PARTIAL_MATCH should never be returned.
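The offset, limit, and incremental contract above can be illustrated with a minimal sketch for a purely literal pattern. It is a simplified model under stated assumptions, not the ICU implementation: CharSequence stands in for Replaceable, the class LiteralMatchSketch and its constants are hypothetical local stand-ins for the match degree values, stand-ins for nested matchers and surrogate pairs are ignored, and offset is only updated on a full match.

```java
// Simplified model of the matches() contract for a literal pattern; not the ICU class.
final class LiteralMatchSketch {
    // Local stand-ins for the match degree values; the numbers are arbitrary here.
    static final int MISMATCH = 0;
    static final int PARTIAL_MATCH = 1;
    static final int MATCH = 2;

    /**
     * Matches pattern against text starting at offset[0].
     * Forward when limit >= offset[0], backward when limit < offset[0].
     * On a full match, offset[0] is moved to the limit of the matched text.
     */
    static int matches(String pattern, CharSequence text, int[] offset,
                       int limit, boolean incremental) {
        int cursor = offset[0];
        if (limit < cursor) {
            // Backward: compare from the last pattern character, moving left.
            // The last index considered is limit + 1.
            for (int i = pattern.length() - 1; i >= 0; i--) {
                if (cursor > limit && pattern.charAt(i) == text.charAt(cursor)) {
                    cursor--;
                } else {
                    return MISMATCH;
                }
            }
        } else {
            // Forward: the last index considered is limit - 1.
            for (int i = 0; i < pattern.length(); i++) {
                if (incremental && cursor == limit) {
                    // Text ran out without a mismatch; more input could complete it.
                    return PARTIAL_MATCH;
                }
                if (cursor < limit && pattern.charAt(i) == text.charAt(cursor)) {
                    cursor++;
                } else {
                    return MISMATCH;
                }
            }
        }
        offset[0] = cursor;
        return MATCH;
    }

    public static void main(String[] args) {
        int[] offset = {1};
        int degree = matches("abc", "xabcx", offset, 5, false);
        System.out.println("degree=" + degree + ", offset=" + offset[0]);
    }
}
```

With incremental set to false, running out of text is reported as a mismatch, which is why a partial match can never be returned in that mode.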
-
toPattern
public String toPattern(boolean escapeUnprintable)
Implement UnicodeMatcher
- Specified by: toPattern in interface UnicodeMatcher
- Parameters:
escapeUnprintable - if true then convert unprintable characters to their hex escape representations, \uxxxx or \Uxxxxxxxx. Unprintable characters are those other than U+000A, U+0020..U+007E.
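The escaping rule stated for escapeUnprintable can be sketched as below. This is only an illustration of the rule as written in the parameter description: the class EscapeSketch and its methods are hypothetical, and the choice of uppercase hex digits is an assumption of the sketch.

```java
// Illustration of the escaping rule from the parameter description; not the ICU code.
final class EscapeSketch {
    /** Unprintable means anything other than U+000A and U+0020..U+007E. */
    static boolean isUnprintable(int codePoint) {
        boolean printable = codePoint == 0x000A
                || (codePoint >= 0x0020 && codePoint <= 0x007E);
        return !printable;
    }

    /**
     * Copies printable code points unchanged and writes every other code point
     * as a backslash, 'u', and 4 hex digits, or as a backslash, 'U', and
     * 8 hex digits when it lies outside the Basic Multilingual Plane.
     */
    static String escapeUnprintable(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            int codePoint = input.codePointAt(i);
            i += Character.charCount(codePoint);
            if (!isUnprintable(codePoint)) {
                out.appendCodePoint(codePoint);
            } else if (codePoint <= 0xFFFF) {
                out.append(String.format("\\u%04X", codePoint));
            } else {
                out.append(String.format("\\U%08X", codePoint));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A tab is unprintable under this rule, so it is written as an escape.
        System.out.println(escapeUnprintable("tab\there"));
    }
}
```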
-
matchesIndexValue
public boolean matchesIndexValue(int v)
Implement UnicodeMatcher
- Specified by: matchesIndexValue in interface UnicodeMatcher
-
addMatchSetTo
public void addMatchSetTo(UnicodeSet toUnionTo)
Implementation of UnicodeMatcher API. Union the set of all characters that may be matched by this object into the given set.
- Specified by: addMatchSetTo in interface UnicodeMatcher
- Parameters:
toUnionTo - the set into which to union the source characters
-
replace
public int replace(Replaceable text, int start, int limit, int[] cursor)
UnicodeReplacer API
- Specified by: replace in interface UnicodeReplacer
- Parameters:
text - the text to be matched
start - inclusive start index of text to be replaced
limit - exclusive end index of text to be replaced; must be greater than or equal to start
cursor - output parameter for the cursor position. Not all replacer objects will update this, but in a complete tree of replacer objects, representing the entire output side of a transliteration rule, at least one must update it.
- Returns: the number of 16-bit code units in the text replacing the characters at offsets start..(limit-1) in text
-
toReplacerPattern
public String toReplacerPattern(boolean escapeUnprintable)
UnicodeReplacer API
- Specified by: toReplacerPattern in interface UnicodeReplacer
- Parameters:
escapeUnprintable - if true then convert unprintable characters to their hex escape representations, \uxxxx or \Uxxxxxxxx. Unprintable characters are defined by Utility.isUnprintable().
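The class description notes that the segment number affects how the matcher converts itself to a pattern. One way this could look is sketched below, assuming the transliterator rule syntax in which a segment is written in parentheses on the match side and referenced as $n on the output side. Whether StringMatcher produces exactly these strings is an assumption; the class SegmentPatternSketch and its methods are hypothetical and ignore quoting and escaping.

```java
// Hypothetical illustration of segment-aware pattern conversion; not the ICU code.
final class SegmentPatternSketch {
    /** Match-side form: a segment is wrapped in parentheses, a non-segment is left bare. */
    static String toPattern(String patternText, int segmentNumber) {
        if (segmentNumber > 0) {
            return "(" + patternText + ")";
        }
        return patternText;
    }

    /** Output-side form: a reference to the segment by its 1-based number. */
    static String toReplacerPattern(int segmentNumber) {
        if (segmentNumber <= 0) {
            // Mirrors the note that a non-segment should not be used as a replacer.
            throw new IllegalStateException("not a segment");
        }
        return "$" + segmentNumber;
    }

    public static void main(String[] args) {
        System.out.println(toPattern("abc", 1) + " > " + toReplacerPattern(1));
    }
}
```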
-
resetMatch
public void resetMatch()
Remove any match data. This must be called before performing a set of matches with this segment.
-
addReplacementSetTo
public void addReplacementSetTo(UnicodeSet toUnionTo)
Union the set of all characters that may be output by this object into the given set.
- Specified by: addReplacementSetTo in interface UnicodeReplacer
- Parameters:
toUnionTo - the set into which to union the output characters
-