Package com.ibm.icu.text
Class TransliteratorParser
java.lang.Object
com.ibm.icu.text.TransliteratorParser
-
Nested Class Summary
Nested Classes
Modifier and Type   Class   Description
private class
    This class implements the SymbolTable interface.
private static class
    RuleBody subclass for a String[] array.
private static class
    A private abstract class representing the interface to rule source code that is broken up into lines.
private static class
    A class representing one side of a rule.
-
Field Summary
Fields
Modifier and Type   Field   Description
private static final char ALT_FORWARD_RULE_OP
private static final char ALT_FUNCTION
private static final char ALT_FWDREV_RULE_OP
private static final char ALT_REVERSE_RULE_OP
private static final char ANCHOR_START
compoundFilter
    PUBLIC data member containing the parsed compound filter, if any.
private static final char CONTEXT_ANTE
private static final char CONTEXT_POST
private RuleBasedTransliterator.Data curData
    The current data object for which we are parsing rules
private static final char CURSOR_OFFSET
private static final char CURSOR_POS
dataVector
    PUBLIC data member.
private int direction
private static final char DOT
private static final String DOT_SET
private int dotStandIn
    The stand-in character for the 'dot' set, represented by '.' in patterns.
private static final char END_OF_RULE
private static final char ESCAPE
private static final char FORWARD_RULE_OP
private static final char FUNCTION
private static final char FWDREV_RULE_OP
private static final String HALF_ENDERS
private static final String ID_TOKEN
private static final int ID_TOKEN_LEN
idBlockVector
    PUBLIC data member.
private static UnicodeSet ILLEGAL_FUNC
private static UnicodeSet ILLEGAL_SEG
private static UnicodeSet ILLEGAL_TOP
private static final char KLEENE_STAR
private static final char ONE_OR_MORE
private static final String OPERATORS
private TransliteratorParser.ParseData parseData
    Temporary symbol table used during parsing.
private static final char QUOTE
private static final char REVERSE_RULE_OP
private static final char RULE_COMMENT_CHAR
private static final char SEGMENT_CLOSE
private static final char SEGMENT_OPEN
private List<StringMatcher> segmentObjects
    Vector of StringMatcher objects for segments.
private StringBuffer segmentStandins
    String of standins for segments.
private String undefinedVariableName
    When we encounter an undefined variable, we do not immediately signal an error, in case we are defining this variable, e.g., "$a = [a-z];".
private static final char VARIABLE_DEF_OP
private char variableLimit
    The last available stand-in for variables.
variableNames
    Temporary table of variable names.
private char variableNext
    The next available stand-in for variables.
variablesVector
    Temporary vector of set variables.
private static final char ZERO_OR_ONE
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type   Method   Description
private void appendVariableDef(String name, StringBuffer buf)
    Append the value of the given variable name to the given StringBuffer.
private void checkVariableRange(int ch, String rule, int start)
    Assert that the given character is NOT within the variable range.
(package private) char generateStandInFor(Object obj)
    Generate and return a stand-in for a new UnicodeMatcher or UnicodeReplacer.
(package private) char getDotStandIn()
    Return the stand-in for the dot set.
char getSegmentStandin(int seg)
    Return the standin for segment seg (1-based).
void parse
    Parse a set of rules.
private int parsePragma(String rule, int pos, int limit)
    Parse a pragma.
private int parseRule
    MAIN PARSER.
(package private) void parseRules(TransliteratorParser.RuleBody ruleArray, int dir)
    Parse an array of zero or more rules.
private final char parseSet(String rule, ParsePosition pos)
    Parse a UnicodeSet out, store it, and return the stand-in character used to represent it.
private void pragmaMaximumBackup(int backup)
    Set the maximum backup to 'backup', in response to a pragma statement.
private void pragmaNormalizeRules
    Begin normalizing all rules using the given mode, in response to a pragma statement.
(package private) static boolean resemblesPragma(String rule, int pos, int limit)
    Return true if the given rule looks like a pragma.
(package private) static final int ruleEnd
void setSegmentObject(int seg, StringMatcher obj)
    Set the object for segment seg (1-based).
private void setVariableRange(int start, int end)
    Set the variable range to [start, end] (inclusive).
(package private) static final void syntaxError(String msg, String rule, int start)
    Throw an exception indicating a syntax error.
-
Field Details
-
dataVector
PUBLIC data member. A Vector of RuleBasedTransliterator.Data objects, one for each discrete group of rules in the rule set -
idBlockVector
PUBLIC data member. A Vector of Strings containing all of the ID blocks in the rule set -
curData
The current data object for which we are parsing rules -
compoundFilter
PUBLIC data member containing the parsed compound filter, if any. -
direction
private int direction -
parseData
Temporary symbol table used during parsing. -
variablesVector
Temporary vector of set variables. When parsing is complete, this is copied into the array data.variables. As with data.variables, element 0 corresponds to character data.variablesBase. -
variableNames
Temporary table of variable names. When parsing is complete, this is copied into data.variableNames. -
segmentStandins
String of standins for segments. Used during the parsing of a single rule. segmentStandins.charAt(0) is the standin for "$1" and corresponds to StringMatcher object segmentObjects.elementAt(0), etc. -
segmentObjects
Vector of StringMatcher objects for segments. Used during the parsing of a single rule. segmentStandins.charAt(0) is the standin for "$1" and corresponds to StringMatcher object segmentObjects.elementAt(0), etc. -
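The parallel layout of segmentStandins and segmentObjects can be sketched as follows. This is a minimal illustration, not the ICU implementation: the class SegmentTable, the getSegmentObject accessor, the use of String in place of StringMatcher, and the starting stand-in U+E000 are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a stand-in string and an object list, both indexed by (segment number - 1). */
final class SegmentTable {
    private final StringBuilder segmentStandins = new StringBuilder();
    // String is a placeholder for StringMatcher in this sketch.
    private final List<String> segmentObjects = new ArrayList<>();
    private char nextStandIn = '\uE000'; // assumed start of the stand-in range

    /** Return the stand-in for segment seg (1-based), allocating one on first use. */
    char getSegmentStandin(int seg) {
        if (segmentStandins.length() < seg) {
            segmentStandins.setLength(seg); // new slots are filled with '\u0000'
        }
        char c = segmentStandins.charAt(seg - 1);
        if (c == 0) {
            c = nextStandIn++;
            segmentStandins.setCharAt(seg - 1, c);
        }
        return c;
    }

    /** Record the object for segment seg (1-based), growing the list as needed. */
    void setSegmentObject(int seg, String obj) {
        while (segmentObjects.size() < seg) {
            segmentObjects.add(null);
        }
        segmentObjects.set(seg - 1, obj);
    }

    /** Hypothetical accessor used only to inspect the sketch. */
    String getSegmentObject(int seg) {
        return segmentObjects.get(seg - 1);
    }
}
```

In this sketch, position 0 of both containers belongs to "$1", position 1 to "$2", and so on, which mirrors the correspondence described above.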
variableNext
private char variableNext
The next available stand-in for variables. This starts at some point in the private use area (discovered dynamically) and increments up toward variableLimit. At any point during parsing, available variables are variableNext..variableLimit-1.
-
variableLimit
private char variableLimit
The last available stand-in for variables. This is discovered dynamically. At any point during parsing, available variables are variableNext..variableLimit-1. During variable definition we use the special value variableLimit-1 as a placeholder.
-
undefinedVariableName
When we encounter an undefined variable, we do not immediately signal an error, in case we are defining this variable, e.g., "$a = [a-z];". Instead, we save the name of the undefined variable, and substitute in the placeholder char variableLimit - 1, and decrement variableLimit. -
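The interplay of variableNext, variableLimit, and undefinedVariableName can be sketched as a small allocator over a half-open range. The class StandInAllocator and its method names are hypothetical, and the exception type is an assumption; only the field names and the "take variableLimit - 1, then decrement" behavior come from the descriptions above.

```java
/** Hypothetical sketch: stand-ins are handed out from [variableNext, variableLimit). */
final class StandInAllocator {
    private char variableNext;
    private char variableLimit;
    private String undefinedVariableName;

    /** The range is given inclusively; endInclusive must be below U+FFFF in this sketch. */
    StandInAllocator(char start, char endInclusive) {
        variableNext = start;
        variableLimit = (char) (endInclusive + 1);
    }

    /** Hand out the next free stand-in from the bottom of the range. */
    char allocate() {
        if (variableNext >= variableLimit) {
            throw new IllegalStateException("Variable range exhausted");
        }
        return variableNext++;
    }

    /** Remember an undefined name and reserve the top of the range as its placeholder. */
    char placeholderFor(String name) {
        if (variableNext >= variableLimit) {
            throw new IllegalStateException("Variable range exhausted");
        }
        undefinedVariableName = name;
        return --variableLimit; // the placeholder is the old variableLimit - 1
    }

    int available() {
        return variableLimit - variableNext;
    }

    String undefinedVariableName() {
        return undefinedVariableName;
    }
}
```

Allocating from the bottom and reserving placeholders from the top keeps the two uses from colliding until the range is genuinely full.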
dotStandIn
private int dotStandIn
The stand-in character for the 'dot' set, represented by '.' in patterns. This is allocated the first time it is needed, and reused thereafter.
-
ID_TOKEN
-
ID_TOKEN_LEN
private static final int ID_TOKEN_LEN
-
VARIABLE_DEF_OP
private static final char VARIABLE_DEF_OP
-
FORWARD_RULE_OP
private static final char FORWARD_RULE_OP
-
REVERSE_RULE_OP
private static final char REVERSE_RULE_OP
-
FWDREV_RULE_OP
private static final char FWDREV_RULE_OP
-
OPERATORS
-
HALF_ENDERS
-
QUOTE
private static final char QUOTE
-
ESCAPE
private static final char ESCAPE
-
END_OF_RULE
private static final char END_OF_RULE
-
RULE_COMMENT_CHAR
private static final char RULE_COMMENT_CHAR
-
CONTEXT_ANTE
private static final char CONTEXT_ANTE
-
CONTEXT_POST
private static final char CONTEXT_POST
-
CURSOR_POS
private static final char CURSOR_POS
-
CURSOR_OFFSET
private static final char CURSOR_OFFSET
-
ANCHOR_START
private static final char ANCHOR_START
-
KLEENE_STAR
private static final char KLEENE_STAR
-
ONE_OR_MORE
private static final char ONE_OR_MORE
-
ZERO_OR_ONE
private static final char ZERO_OR_ONE
-
DOT
private static final char DOT
-
DOT_SET
-
SEGMENT_OPEN
private static final char SEGMENT_OPEN
-
SEGMENT_CLOSE
private static final char SEGMENT_CLOSE
-
FUNCTION
private static final char FUNCTION
-
ALT_REVERSE_RULE_OP
private static final char ALT_REVERSE_RULE_OP
-
ALT_FORWARD_RULE_OP
private static final char ALT_FORWARD_RULE_OP
-
ALT_FWDREV_RULE_OP
private static final char ALT_FWDREV_RULE_OP
-
ALT_FUNCTION
private static final char ALT_FUNCTION
-
ILLEGAL_TOP
-
ILLEGAL_SEG
-
ILLEGAL_FUNC
-
-
Constructor Details
-
TransliteratorParser
public TransliteratorParser()
Constructor.
-
-
Method Details
-
parse
Parse a set of rules. After the parse completes, examine the public data members for results. -
parseRules
Parse an array of zero or more rules. The strings in the array are treated as if they were concatenated together, with rule terminators inserted between array elements if not present already. Any previous rules are discarded. Typically this method is called exactly once, during construction. The member this.data will be set to null if there are no rules.
Throws:
IllegalIcuArgumentException - if there is a syntax error in the rules
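The "as if concatenated" behavior can be pictured with a small helper that joins array elements and inserts a terminator only where one is missing. This is a sketch under assumptions: the class RuleJoiner is hypothetical, the terminator is assumed to be ';', and the parser itself need not build such a string.

```java
/** Hypothetical sketch: view a String[] of rules as one terminated rule string. */
final class RuleJoiner {
    static final char END_OF_RULE = ';'; // assumed terminator value

    static String join(String[] ruleArray) {
        StringBuilder sb = new StringBuilder();
        for (String element : ruleArray) {
            String text = element.trim();
            if (text.isEmpty()) {
                continue; // blank elements contribute nothing
            }
            // Insert a terminator between elements only if the previous one lacks it.
            if (sb.length() > 0 && sb.charAt(sb.length() - 1) != END_OF_RULE) {
                sb.append(END_OF_RULE);
            }
            sb.append(text);
        }
        return sb.toString();
    }
}
```

For example, the elements "a > b", "c > d;" and "e > f" join to "a > b;c > d;e > f". The sketch ignores trailing comments, which a real rule source may contain.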
-
parseRule
MAIN PARSER. Parse the next rule in the given rule string, starting at pos. Return the index after the last character parsed. Do not parse characters at or after limit. Important: The character at pos must be a non-whitespace character that is not the comment character. This method handles quoting, escaping, and whitespace removal. It parses the end-of-rule character. It recognizes context and cursor indicators. Once it does a lexical breakdown of the rule at pos, it creates a rule object and adds it to our rule list. This method is tightly coupled to the inner class RuleHalf. -
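The quoting, escaping, and whitespace handling described for parseRule can be illustrated with a much smaller scanner. This is only a sketch: the class RuleScanSketch is hypothetical, the values assumed for QUOTE ('), ESCAPE (\) and END_OF_RULE (;) are not stated on this page, and context, cursor, segment, and operator handling are left out entirely.

```java
/** Hypothetical sketch: scan one rule, resolving quotes and escapes and dropping whitespace. */
final class RuleScanSketch {
    static final char QUOTE = '\'';      // assumed value
    static final char ESCAPE = '\\';     // assumed value
    static final char END_OF_RULE = ';'; // assumed value

    /**
     * Scan from pos up to (not including) limit, appending literal text to out.
     * Returns the index after the last character parsed, including the terminator.
     */
    static int scanRule(String rule, int pos, int limit, StringBuilder out) {
        boolean inQuote = false;
        while (pos < limit) {
            char c = rule.charAt(pos++);
            if (inQuote) {
                if (c != QUOTE) {
                    out.append(c); // everything inside quotes is literal
                } else if (pos < limit && rule.charAt(pos) == QUOTE) {
                    out.append(QUOTE); // doubled quote inside quotes is one literal quote
                    pos++;
                } else {
                    inQuote = false;
                }
            } else if (c == QUOTE) {
                inQuote = true;
            } else if (c == ESCAPE) {
                if (pos == limit) {
                    throw new IllegalArgumentException("Trailing backslash");
                }
                out.append(rule.charAt(pos++)); // escaped character is taken literally
            } else if (c == END_OF_RULE) {
                return pos; // terminator consumed, rule complete
            } else if (!Character.isWhitespace(c)) {
                out.append(c);
            }
        }
        if (inQuote) {
            throw new IllegalArgumentException("Unterminated quote");
        }
        return pos;
    }
}
```

Returning the index after the terminator lets a caller loop over a multi-rule string by feeding each result back in as the next pos.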
setVariableRange
private void setVariableRange(int start, int end)
Set the variable range to [start, end] (inclusive).
-
checkVariableRange
Assert that the given character is NOT within the variable range. If it is, signal an error. This is necessary to ensure that the variable range does not overlap characters used in a rule. -
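The range check can be sketched as below. The class VariableRange, the validation in its constructor, and the use of IllegalArgumentException are assumptions for illustration; the page only states that the range is inclusive and that a rule character falling inside it is an error.

```java
/** Hypothetical sketch: an inclusive variable range with an overlap check. */
final class VariableRange {
    private final int start;
    private final int end;

    VariableRange(int start, int end) {
        // Assumed validation: the range must be ordered and fit in a char.
        if (start > end || start < 0 || end > 0xFFFF) {
            throw new IllegalArgumentException("Invalid variable range");
        }
        this.start = start;
        this.end = end;
    }

    /** Signal an error if ch, found in the rule beginning at ruleStart, lies in [start, end]. */
    void check(int ch, String rule, int ruleStart) {
        if (ch >= start && ch <= end) {
            throw new IllegalArgumentException(
                "Variable range character in rule: " + rule.substring(ruleStart));
        }
    }
}
```

Rejecting such characters up front matters because a literal character in the stand-in range would be indistinguishable from a stand-in once the rule is compiled.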
pragmaMaximumBackup
private void pragmaMaximumBackup(int backup)
Set the maximum backup to 'backup', in response to a pragma statement.
-
pragmaNormalizeRules
Begin normalizing all rules using the given mode, in response to a pragma statement. -
resemblesPragma
Return true if the given rule looks like a pragma.
Parameters:
pos - offset to the first non-whitespace character of the rule.
limit - pointer past the last character of the rule.
-
parsePragma
Parse a pragma. This method assumes resemblesPragma() has already returned true.
Parameters:
pos - offset to the first non-whitespace character of the rule.
limit - pointer past the last character of the rule.
Returns:
the position index after the final ';' of the pragma, or -1 on failure.
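The two-step contract (a cheap resemblesPragma test, then parsePragma returning the index after the final ';' or -1) can be sketched as follows. The pragma spelling "use maximum backup N;" and the leading "use" keyword are assumptions here, as are the class PragmaSketch and its regular-expression approach; this page does not define the pragma syntax.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the resemblesPragma / parsePragma contract. */
final class PragmaSketch {
    // Assumed pragma form: "use maximum backup <digits>;"
    private static final Pattern MAX_BACKUP = Pattern.compile(
        "use\\s+maximum\\s+backup\\s+(\\d{1,9})\\s*;", Pattern.CASE_INSENSITIVE);

    int maximumBackup = 0;

    /** Cheap test: does the text at pos start with the keyword "use" plus whitespace? */
    static boolean resemblesPragma(String rule, int pos, int limit) {
        int afterKeyword = pos + 3;
        return afterKeyword < limit
            && rule.regionMatches(true, pos, "use", 0, 3)
            && Character.isWhitespace(rule.charAt(afterKeyword));
    }

    /** Returns the index after the final ';' of the pragma, or -1 on failure. */
    int parsePragma(String rule, int pos, int limit) {
        Matcher m = MAX_BACKUP.matcher(rule).region(pos, limit);
        if (!m.lookingAt()) {
            return -1;
        }
        maximumBackup = Integer.parseInt(m.group(1));
        return m.end();
    }
}
```

A caller would typically test resemblesPragma first and treat a -1 from parsePragma as a syntax error in the rule.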
-
syntaxError
Throw an exception indicating a syntax error. Search the rule string for the probable end of the rule. Of course, if the error is that the end of rule marker is missing, then the rule end will not be found. In any case the rule start will be correctly reported.
Parameters:
msg - error description
rule - pattern string
start - position of first character of current rule
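The "probable end of the rule" search can be sketched like this. It is a simplified illustration: the class SyntaxErrorSketch is hypothetical, ';' is assumed as the end-of-rule marker, quoted terminators are not skipped, and IllegalArgumentException stands in for IllegalIcuArgumentException so that the example needs no ICU classes.

```java
/** Hypothetical sketch: report a syntax error together with the text of the offending rule. */
final class SyntaxErrorSketch {
    /** Index of the first ';' at or after start and before limit, or limit if none is found. */
    static int ruleEnd(String rule, int start, int limit) {
        int end = rule.indexOf(';', start);
        return (end < 0 || end >= limit) ? limit : end;
    }

    /** Always throws; the message quotes the rule from start to its probable end. */
    static void syntaxError(String msg, String rule, int start) {
        int end = ruleEnd(rule, start, rule.length());
        throw new IllegalArgumentException(
            msg + " in \"" + rule.substring(start, end) + '"');
    }
}
```

When the terminator is missing, ruleEnd falls back to limit, so the message still begins at the correct rule start even though it may run to the end of the input.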
-
ruleEnd
-
parseSet
Parse a UnicodeSet out, store it, and return the stand-in character used to represent it. -
generateStandInFor
Generate and return a stand-in for a new UnicodeMatcher or UnicodeReplacer. Store the object. -
getSegmentStandin
public char getSegmentStandin(int seg)
Return the standin for segment seg (1-based).
-
setSegmentObject
Set the object for segment seg (1-based). -
getDotStandIn
char getDotStandIn()
Return the stand-in for the dot set. It is allocated the first time and reused thereafter.
-
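The "allocate once, reuse thereafter" behavior of the dot stand-in, built on a generateStandInFor-style allocator, can be sketched as follows. The class StandInStore, the lookup helper, the -1 sentinel, and the placeholder object stored for the dot set are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: store objects and hand out one stand-in character per object. */
final class StandInStore {
    private final char variablesBase;
    private final char variableLimit;
    private char variableNext;
    private final List<Object> variablesVector = new ArrayList<>();
    private int dotStandIn = -1; // -1 means "not allocated yet"

    StandInStore(char base, char limit) {
        variablesBase = base;
        variableNext = base;
        variableLimit = limit;
    }

    /** Store obj and return the stand-in character that now represents it. */
    char generateStandInFor(Object obj) {
        if (variableNext >= variableLimit) {
            throw new IllegalStateException("Private use variables exhausted");
        }
        variablesVector.add(obj); // element 0 corresponds to variablesBase
        return variableNext++;
    }

    /** Allocate the dot stand-in on first use and return the same character afterwards. */
    char getDotStandIn() {
        if (dotStandIn == -1) {
            dotStandIn = generateStandInFor("dot set placeholder");
        }
        return (char) dotStandIn;
    }

    /** Hypothetical helper: map a stand-in back to its stored object. */
    Object lookup(char standIn) {
        return variablesVector.get(standIn - variablesBase);
    }
}
```

Storing dotStandIn as an int leaves room for a sentinel outside the char range, which may be why the field is declared as int rather than char.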
appendVariableDef
Append the value of the given variable name to the given StringBuffer.
Throws:
IllegalIcuArgumentException - if the name is unknown.
-