Package com.ibm.icu.impl.coll
Class CollationFCD
java.lang.Object
com.ibm.icu.impl.coll.CollationFCD
Data and functions for the FCD check fast path.
The fast path looks at a pair of 16-bit code units and checks
whether there is an FCD boundary between them;
there is if the first unit has a trailing ccc=0 (!hasTccc(first))
or the second unit has a leading ccc=0 (!hasLccc(second)),
or both.
When the fast path finds a possible non-boundary,
then the FCD check slow path looks at the actual sequence of FCD values.
This is a pure optimization.
The fast path must at least find all possible non-boundaries.
If the fast path is too pessimistic, it costs performance.
For a pair of BMP characters, the fast path tests are precise (1 bit per character).
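The pairwise test described above can be sketched as follows. This is only an illustration: the class name, the helper names isFcdBoundary and firstPossibleNonBoundary, and the toy character ranges inside hasLccc and hasTccc are assumptions made for this example, not the real lookup tables of this class.

```java
public class FcdFastPathSketch {
    // Toy stand-in for hasLccc: treats only U+0300..U+036F
    // (combining diacritical marks) as having a non-zero leading ccc.
    static boolean hasLccc(int c) {
        return 0x0300 <= c && c <= 0x036F;
    }

    // Toy stand-in for hasTccc: the same combining marks plus, pessimistically,
    // all of U+00C0..U+00FF (many of which decompose to letter + mark).
    static boolean hasTccc(int c) {
        return (0x0300 <= c && c <= 0x036F) || (0x00C0 <= c && c <= 0x00FF);
    }

    // There is an FCD boundary between two units if the first has tccc=0
    // or the second has lccc=0 (or both).
    static boolean isFcdBoundary(char first, char second) {
        return !hasTccc(first) || !hasLccc(second);
    }

    // Returns the index of the first unit that follows a possible non-boundary,
    // or -1 if every adjacent pair has a boundary. A caller would hand the
    // text over to a slow path at the returned index.
    static int firstPossibleNonBoundary(CharSequence s) {
        for (int i = 1; i < s.length(); i++) {
            if (!isFcdBoundary(s.charAt(i - 1), s.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstPossibleNonBoundary("abc"));           // -1
        System.out.println(firstPossibleNonBoundary("a\u0300\u0301")); // 2
    }
}
```

In this sketch the over-broad U+00C0..U+00FF range only produces extra slow-path visits, which matches the statement that a pessimistic fast path costs performance but not correctness.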
For a supplementary code point, the two units are its lead and trail surrogates.
We set hasTccc(lead)=true if any of its 1024 associated supplementary code points
has lccc!=0 or tccc!=0.
We set hasLccc(trail)=true for all trail surrogates.
As a result, we leave the fast path if the lead surrogate might start a
supplementary code point that is not FCD-inert.
(So the fast path need not detect that there is a surrogate pair,
nor look ahead to the next full code point.)
hasLccc(lead)=true if any of its 1024 associated supplementary code points
has lccc!=0, for fast boundary checking between BMP & supplementary.
hasTccc(trail)=false:
It should only be tested for unpaired trail surrogates which are FCD-inert.
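The lead-surrogate rules above can be sketched as a scan over the 1024 supplementary code points that share one lead surrogate. The class and method names here are hypothetical, the FCD lookup is passed in as a function, the sketch assumes an FCD value packs lccc in the high byte and tccc in the low byte, and the TOY_FCD16 values are illustrative rather than read from real Unicode data.

```java
import java.util.function.IntUnaryOperator;

public class LeadSurrogateFlagsSketch {
    // Toy FCD lookup (assumed packing: lccc << 8 | tccc).
    // One code point with lccc=tccc=216, one with lccc=0 and tccc=7.
    static final IntUnaryOperator TOY_FCD16 = c ->
            c == 0x1D165 ? 0xD8D8 : (c == 0x1109A ? 0x0007 : 0);

    // First of the 1024 supplementary code points that start with this lead.
    static int firstCodePointOf(char lead) {
        return Character.toCodePoint(lead, Character.MIN_LOW_SURROGATE);
    }

    // hasTccc(lead): true if any associated code point has lccc!=0 or tccc!=0.
    static boolean leadHasTccc(char lead, IntUnaryOperator fcd16Of) {
        int start = firstCodePointOf(lead);
        for (int c = start; c < start + 0x400; c++) {
            if (fcd16Of.applyAsInt(c) != 0) {
                return true;
            }
        }
        return false;
    }

    // hasLccc(lead): true if any associated code point has lccc!=0.
    static boolean leadHasLccc(char lead, IntUnaryOperator fcd16Of) {
        int start = firstCodePointOf(lead);
        for (int c = start; c < start + 0x400; c++) {
            if ((fcd16Of.applyAsInt(c) >> 8) != 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        char lead = Character.highSurrogate(0x1D165);
        System.out.println(leadHasTccc(lead, TOY_FCD16));
        System.out.println(leadHasLccc(lead, TOY_FCD16));
    }
}
```

Because hasLccc(trail) is always true, a lead surrogate flagged by leadHasTccc always forces a possible non-boundary inside its own pair, which is why the fast path does not have to recognize surrogate pairs itself.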
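Field Summary
private static final int[] lcccBits
private static final byte[] lcccIndex
private static final int[] tcccBits
private static final byte[] tcccIndex
Constructor Summary
CollationFCD()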
Method Summary
static boolean hasLccc(int c)
static boolean hasTccc(int c)
(package private) static boolean isFCD16OfTibetanCompositeVowel(int fcd16)
    Tibetan composite vowel signs (U+0F73, U+0F75, U+0F81) must be decomposed before reaching the core collation code, or else some sequences including them, even ones passing the FCD check, do not yield canonically equivalent results.
(package private) static boolean maybeTibetanCompositeVowel(int c)
    Tibetan composite vowel signs (U+0F73, U+0F75, U+0F81) must be decomposed before reaching the core collation code, or else some sequences including them, even ones passing the FCD check, do not yield canonically equivalent results.
(package private) static boolean mayHaveLccc(int c)
Field Details
lcccIndex
private static final byte[] lcccIndex
lcccBits
private static final int[] lcccBits
tcccIndex
private static final byte[] tcccIndex
tcccBits
private static final int[] tcccBits
Constructor Details
CollationFCD
public CollationFCD()
Method Details
hasLccc
public static boolean hasLccc(int c)
hasTccc
public static boolean hasTccc(int c)
mayHaveLccc
static boolean mayHaveLccc(int c)
maybeTibetanCompositeVowel
static boolean maybeTibetanCompositeVowel(int c)
Tibetan composite vowel signs (U+0F73, U+0F75, U+0F81) must be decomposed before reaching the core collation code, or else some sequences including them, even ones passing the FCD check, do not yield canonically equivalent results. This is a fast and imprecise test.
Parameters:
c - a code point
Returns:
true if c is U+0F73, U+0F75 or U+0F81 or one of several other Tibetan characters
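A fast, imprecise test of this kind could be written as a single mask-and-compare. The sketch below is an assumption for illustration, not something this page confirms about the implementation: the class name, method name, and mask are hypothetical. It accepts every odd code point in U+0F01..U+0FFF, which includes the three composite vowel signs along with many false positives.

```java
public class TibetanVowelPrefilterSketch {
    // Keeps bits 8..20 and bit 0 of the code point, then requires that the
    // upper part is 0x0F (the U+0Fxx block) and that the code point is odd.
    static boolean maybeCompositeVowel(int c) {
        return (c & 0x1fff01) == 0xf01;
    }

    public static void main(String[] args) {
        System.out.println(maybeCompositeVowel(0x0F73)); // true
        System.out.println(maybeCompositeVowel(0x0F72)); // false (even)
        System.out.println(maybeCompositeVowel(0x0F01)); // true (false positive)
    }
}
```

A caller that gets true from such a prefilter would still need a precise check, for example on the FCD value of the code point, before treating it as one of the three vowel signs.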
isFCD16OfTibetanCompositeVowel
static boolean isFCD16OfTibetanCompositeVowel(int fcd16)
Tibetan composite vowel signs (U+0F73, U+0F75, U+0F81) must be decomposed before reaching the core collation code, or else some sequences including them, even ones passing the FCD check, do not yield canonically equivalent results. They have distinct lccc/tccc combinations: 129/130 or 129/132.
Parameters:
fcd16 - the FCD value (lccc/tccc combination) of a code point
Returns:
true if fcd16 is from U+0F73, U+0F75 or U+0F81
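The lccc/tccc combinations named above translate into a direct comparison on the FCD value. This is a minimal sketch, assuming the 16-bit FCD value stores lccc in the high byte and tccc in the low byte; the class name and the helper pack are hypothetical.

```java
public class TibetanFcd16Sketch {
    // Hypothetical helper: packs lccc and tccc into one FCD value
    // (assumed layout: lccc << 8 | tccc).
    static int pack(int lccc, int tccc) {
        return (lccc << 8) | tccc;
    }

    // True for the combinations 129/130 and 129/132.
    static boolean isCompositeVowelFcd16(int fcd16) {
        int lccc = fcd16 >> 8;
        int tccc = fcd16 & 0xff;
        return lccc == 129 && (tccc == 130 || tccc == 132);
    }

    public static void main(String[] args) {
        System.out.println(isCompositeVowelFcd16(pack(129, 130))); // true
        System.out.println(isCompositeVowelFcd16(pack(129, 132))); // true
        System.out.println(isCompositeVowelFcd16(pack(129, 129))); // false
    }
}
```

Under that packing the two accepted values are 0x8182 and 0x8184, so the test could equally be written as two integer comparisons.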