Package com.ibm.icu.impl
Class BMPSet
java.lang.Object
com.ibm.icu.impl.BMPSet
Helper class for frozen UnicodeSets, implements contains() and span() optimized for BMP code points.
Latin-1: Look up bytes.
2-byte characters: Bits organized vertically.
3-byte characters: Use zero/one/mixed data per 64-block in U+0000..U+FFFF, with mixed for illegal ranges.
Supplementary characters: Binary search over the supplementary part of the parent set's inversion list.
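As a rough illustration of which of the four strategies above applies to a given code point, the following sketch classifies code points by range. The enum `Tier` and the method `tierOf` are hypothetical names for illustration only and are not part of BMPSet. Each comment names the field that, according to this page, serves that tier.

```java
final class BmpTierSketch {
    // Hypothetical classification mirroring the four lookup strategies.
    enum Tier { LATIN1, TWO_BYTE, THREE_BYTE, SUPPLEMENTARY }

    static Tier tierOf(int c) {
        if (c < 0x100) {
            return Tier.LATIN1;        // latin1Contains[c]
        } else if (c < 0x800) {
            return Tier.TWO_BYTE;      // table7FF, vertical bits
        } else if (c < 0x10000) {
            return Tier.THREE_BYTE;    // bmpBlockBits, inversion list for mixed blocks
        } else {
            return Tier.SUPPLEMENTARY; // binary search in the inversion list
        }
    }
}
```

For example, U+00E9 falls in the Latin-1 tier, U+03A9 in the 2-byte tier, U+4E2D in the 3-byte tier, and U+1F600 in the supplementary tier.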
-
Field Summary
Fields
Modifier and Type | Field | Description
private int[] | bmpBlockBits | One bit per 64 BMP code points.
private boolean[] | latin1Contains | One boolean ('true' or 'false') per Latin-1 character.
private final int[] | list | The inversion list of the parent set, for the slower contains() implementation for mixed BMP blocks and for supplementary code points.
private int[] | list4kStarts | Inversion list indexes for restricted binary searches in findCodePoint(), from findCodePoint(U+0800, U+1000, U+2000, .., U+F000, U+10000).
private final int | listLength |
private int[] | table7FF | One bit per code point from U+0000..U+07FF.
static int | U16_SURROGATE_OFFSET |
-
Constructor Summary
Constructors
BMPSet(int[] parentList, int parentListLength)
Method Summary
Modifier and Type | Method | Description
boolean | contains(int c) |
private final boolean | containsSlow(int c, int lo, int hi) |
private int | findCodePoint(int c, int lo, int hi) | Same as UnicodeSet.findCodePoint(int c) except that the binary search is restricted for finding code points in a certain range.
private void | initBits() |
private static void | set32x64Bits(int[] table, int start, int limit) | Set bits in a bit rectangle in "vertical" bit organization.
final int | span(CharSequence s, int start, UnicodeSet.SpanCondition spanCondition, OutputInt outCount) | Span the initial substring for which each character c has spanCondition==contains(c).
final int | spanBack(CharSequence s, int limit, UnicodeSet.SpanCondition spanCondition) | Symmetrical with span().
-
Field Details
-
U16_SURROGATE_OFFSET
public static int U16_SURROGATE_OFFSET
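This constant is not documented on this page. Assuming it has the same value as ICU4C's U16_SURROGATE_OFFSET macro, ((0xd800<<10)+0xdc00-0x10000), it lets a lead and a trail surrogate be combined into a supplementary code point with one shift, one addition and one subtraction. The class and method names below are hypothetical.

```java
final class SurrogateOffsetSketch {
    // Assumed value, mirroring ICU4C's U16_SURROGATE_OFFSET macro.
    static final int U16_SURROGATE_OFFSET = (0xd800 << 10) + 0xdc00 - 0x10000;

    // Combines a lead (high) and trail (low) surrogate into a supplementary code point.
    // Equivalent to ((lead - 0xD800) << 10) + (trail - 0xDC00) + 0x10000.
    static int getSupplementary(char lead, char trail) {
        return (lead << 10) + trail - U16_SURROGATE_OFFSET;
    }
}
```

The result agrees with `Character.toCodePoint(lead, trail)`; precomputing the offset folds the three corrections into a single constant.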
latin1Contains
private boolean[] latin1Contains
One boolean ('true' or 'false') per Latin-1 character.
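A minimal sketch of how such a per-character table could be filled from an inversion list, assuming the list layout described under the list field below (range starts at even indexes, range limits at odd indexes, terminated by 0x110000). The class and method names are hypothetical; once the table is built, a Latin-1 lookup is simply `latin1[c]`.

```java
final class Latin1Sketch {
    // Builds one boolean per Latin-1 character (U+0000..U+00FF) from an inversion list.
    static boolean[] buildLatin1(int[] list) {
        boolean[] latin1 = new boolean[0x100];
        // Stop at the first range that starts at or above U+0100.
        for (int i = 0; i + 1 < list.length && list[i] < 0x100; i += 2) {
            // Clip the range limit to the end of Latin-1.
            int limit = Math.min(list[i + 1], 0x100);
            for (int c = list[i]; c < limit; ++c) {
                latin1[c] = true;
            }
        }
        return latin1;
    }
}
```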
table7FF
private int[] table7FF
One bit per code point from U+0000..U+07FF. The bits are organized vertically: consecutive code points correspond to the same bit position in consecutive table words. With the code point split into lead=c{10..6} (bits 10 through 6 of c) and trail=c{5..0}, set.contains(c) == (bit lead of table7FF[trail]). Bits for U+0000..U+00FF are unused (0).
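To make the vertical organization concrete, here is a brute-force sketch that fills a 64-word table from a hypothetical `IntPredicate` membership test and then performs the lookup described above. The names are illustrative only; per this page, the real class fills the table with set32x64Bits() range by range rather than code point by code point.

```java
import java.util.function.IntPredicate;

final class Table7FFSketch {
    // For each code point U+0100..U+07FF in the set, set bit lead = c >> 6
    // in word trail = c & 0x3f. Bits for U+0000..U+00FF stay 0.
    static int[] build(IntPredicate inSet) {
        int[] table = new int[64];
        for (int c = 0x100; c < 0x800; ++c) {
            if (inSet.test(c)) {
                table[c & 0x3f] |= 1 << (c >> 6);
            }
        }
        return table;
    }

    // Requires 0x100 <= c < 0x800.
    static boolean contains(int[] table, int c) {
        return (table[c & 0x3f] & (1 << (c >> 6))) != 0;
    }
}
```

Because lead has only 5 bits (0..31), one 32-bit int per trail value covers all 2048 code points below U+0800.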
bmpBlockBits
private int[] bmpBlockBits
One bit per 64 BMP code points. The bits are organized vertically: consecutive 64-code point blocks correspond to the same bit position in consecutive table words. With the code point split into lead=c{15..12} (bits 15 through 12 of c) and t1=c{11..6}, test bits (lead+16) and lead in bmpBlockBits[t1]. If the upper bit is 0, then the lower bit indicates whether contains(c) holds for all code points in the 64-block. If the upper bit is 1, then the block is mixed and set.contains(c) must be called. Bits for U+0000..U+07FF are unused (0).
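The two-bit test can be sketched as follows. The builder is a hypothetical brute-force classifier over a membership predicate; unlike the original, which according to the class description marks illegal ranges as mixed, it classifies every block purely by membership. `lookup` returns 1 or 0 for uniform blocks and the hypothetical marker `MIXED` when the caller would have to fall back to the inversion list.

```java
import java.util.function.IntPredicate;

final class BmpBlockBitsSketch {
    static final int MIXED = -1;

    // Classifies each 64-code-point block in U+0800..U+FFFF as all-out, all-in or mixed.
    static int[] build(IntPredicate inSet) {
        int[] bits = new int[64];
        for (int block = 0x800 >> 6; block < (0x10000 >> 6); ++block) {
            int first = block << 6;
            boolean any = false;
            boolean all = true;
            for (int c = first; c < first + 64; ++c) {
                if (inSet.test(c)) {
                    any = true;
                } else {
                    all = false;
                }
            }
            int lead = block >> 6;   // c{15..12}
            int t1 = block & 0x3f;   // c{11..6}
            if (all) {
                bits[t1] |= 1 << lead;          // lower bit: whole block is in the set
            } else if (any) {
                bits[t1] |= 1 << (lead + 16);   // upper bit: mixed block
            }
        }
        return bits;
    }

    // Requires 0x800 <= c <= 0xFFFF.
    static int lookup(int[] bits, int c) {
        int lead = c >> 12;
        // After shifting, bit 0 is the "all-in" bit and bit 16 is the "mixed" bit.
        int twoBits = (bits[(c >> 6) & 0x3f] >> lead) & 0x10001;
        return twoBits <= 1 ? twoBits : MIXED;
    }
}
```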
list4kStarts
private int[] list4kStarts
Inversion list indexes for restricted binary searches in findCodePoint(), from findCodePoint(U+0800, U+1000, U+2000, .., U+F000, U+10000). U+0800 is the first code point that needs 3 bytes in UTF-8; code points below U+0800 are always looked up in the bit tables. The last pair of indexes is for finding supplementary code points.
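A sketch of how such an index array could be built and used, under stated assumptions: the array passed in is exactly the inversion list (its length equals listLength), and a final 18th entry holding the index of the 0x110000 terminator is appended so that the last pair brackets the supplementary range. The search is a plain lower-bound binary search; all names are hypothetical.

```java
final class List4kStartsSketch {
    // Smallest i in [lo, hi] with c < list[i]; assumes c < list[hi].
    static int findCodePoint(int[] list, int c, int lo, int hi) {
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (c < list[mid]) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }

    // Entries 0..16: findCodePoint(U+0800), findCodePoint(U+1000), ..., findCodePoint(U+10000).
    // Entry 17 (assumption of this sketch): index of the 0x110000 terminator.
    static int[] buildList4kStarts(int[] list) {
        int last = list.length - 1;
        int[] starts = new int[18];
        starts[0] = findCodePoint(list, 0x800, 0, last);
        for (int i = 1; i <= 0x10; ++i) {
            // Each search may begin where the previous one ended.
            starts[i] = findCodePoint(list, i << 12, starts[i - 1], last);
        }
        starts[0x11] = last;
        return starts;
    }

    // Requires c >= 0x800. A BMP code point searches only its 4k block's index range;
    // supplementary code points use the last pair.
    static boolean contains(int[] list, int[] starts, int c) {
        int lead = c < 0x10000 ? (c >> 12) : 0x10;
        int i = findCodePoint(list, c, starts[lead], starts[lead + 1]);
        return (i & 1) != 0;  // odd index: c is inside a range
    }
}
```

Since findCodePoint() is monotonic in c, the answer for any c in a 4k block lies between the indexes stored for that block's boundaries, which is what makes the restricted search safe.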
list
private final int[] list
The inversion list of the parent set, used by the slower contains() implementation for mixed BMP blocks and for supplementary code points. The list is terminated with list[listLength-1]=0x110000.
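For reference, membership in an inversion list reduces to the parity of the smallest index i with c < list[i]. This sketch swaps in `java.util.Arrays.binarySearch` over the first listLength entries instead of the class's own findCodePoint(); the class and method names are hypothetical.

```java
import java.util.Arrays;

final class InversionListSketch {
    // list[0..listLength-1] holds ascending range boundaries ending with 0x110000;
    // code points in [list[0], list[1]), [list[2], list[3]), ... are in the set.
    static boolean contains(int[] list, int listLength, int c) {
        int r = Arrays.binarySearch(list, 0, listLength, c);
        // Smallest index i with c < list[i]: one past an exact match,
        // otherwise the insertion point.
        int i = r >= 0 ? r + 1 : -r - 1;
        // An odd index means c lies inside one of the [start, limit) ranges.
        return (i & 1) != 0;
    }
}
```

Passing listLength explicitly lets the backing array have unused capacity after the terminator.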
listLength
private final int listLength
-
Constructor Details
-
BMPSet
public BMPSet(int[] parentList, int parentListLength)
BMPSet
-
Method Details
-
contains
public boolean contains(int c)
span
public final int span(CharSequence s, int start, UnicodeSet.SpanCondition spanCondition, OutputInt outCount)
Span the initial substring for which each character c has spanCondition==contains(c). spanCondition must be 0 or 1, that is, NOT_CONTAINED or CONTAINED.
Parameters:
start - The start index
outCount - If not null: receives the number of code points in the span.
Returns:
the limit (exclusive end) of the span
NOTE: To reduce the overhead of a function call to contains(c), contains(c) is manually inlined here. For each surrogate pair, the code checks that the string is long enough to hold the trail unit. Single (unpaired) surrogates are handled as surrogate code points, as usual in ICU.
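The span semantics can be sketched with standard JDK calls, under these assumptions: a hypothetical `IntPredicate contains` stands in for the set lookup, a boolean `spanContained` stands in for spanCondition (true for CONTAINED, false for NOT_CONTAINED), and an `int[1]` holder replaces OutputInt. This sketch does not inline the bit-table lookups as the original does.

```java
import java.util.function.IntPredicate;

final class SpanSketch {
    static int span(CharSequence s, int start, IntPredicate contains,
                    boolean spanContained, int[] outCount) {
        int i = start;
        int count = 0;
        while (i < s.length()) {
            // A valid surrogate pair yields one supplementary code point;
            // an unpaired surrogate is returned as itself (a surrogate code point).
            int c = Character.codePointAt(s, i);
            if (contains.test(c) != spanContained) {
                break;
            }
            i += Character.charCount(c);
            ++count;
        }
        if (outCount != null) {
            outCount[0] = count;  // number of code points, not UTF-16 units
        }
        return i;  // limit (exclusive end) of the span
    }
}
```

For example, spanning lowercase ASCII letters in "abc1" from index 0 returns 3, and spanning supplementary code points in a string of two emoji followed by "x" returns 4 with a count of 2.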
-
spanBack
final int spanBack(CharSequence s, int limit, UnicodeSet.SpanCondition spanCondition)
Symmetrical with span(). Span the trailing substring for which each character c has spanCondition==contains(c). Requires s.length() >= limit and spanCondition==0 or 1.
Returns:
the string index that starts the span (inclusive)
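A backward counterpart, with the same hypothetical stand-ins as a forward-span sketch would use (`IntPredicate contains` for the set lookup, boolean `spanContained` for spanCondition). It walks code points from limit toward the start of the string using `Character.codePointBefore`.

```java
import java.util.function.IntPredicate;

final class SpanBackSketch {
    static int spanBack(CharSequence s, int limit, IntPredicate contains, boolean spanContained) {
        int i = limit;
        while (i > 0) {
            // Reads a whole surrogate pair if s[i-2], s[i-1] form one;
            // otherwise returns the single unit at i-1.
            int c = Character.codePointBefore(s, i);
            if (contains.test(c) != spanContained) {
                break;
            }
            i -= Character.charCount(c);
        }
        return i;  // inclusive start of the trailing span
    }
}
```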
-
set32x64Bits
private static void set32x64Bits(int[] table, int start, int limit) Set bits in a bit rectangle in "vertical" bit organization. Requires start < limit <= 0x800.
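A sketch of the "bit rectangle" idea for the table7FF layout (word index c & 0x3f, bit index c >> 6): a range becomes at most a partial first column, a block of full columns set with one mask per word, and a partial last column. This is an illustrative reimplementation, not the ICU source; the class name is hypothetical.

```java
final class Set32x64BitsSketch {
    // Sets the bits for code points start..limit-1. Requires start < limit <= 0x800.
    static void set32x64Bits(int[] table, int start, int limit) {
        int lead = start >> 6;
        int trail = start & 0x3f;
        int limitLead = limit >> 6;
        int limitTrail = limit & 0x3f;

        if (lead == limitLead) {
            // The whole range lies in one column (one bit position).
            for (int t = trail; t < limitTrail; ++t) {
                table[t] |= 1 << lead;
            }
            return;
        }
        // 1. Partial first column: words trail..63 at bit 'lead'.
        if (trail > 0) {
            for (int t = trail; t < 64; ++t) {
                table[t] |= 1 << lead;
            }
            ++lead;
        }
        // 2. Full columns lead..limitLead-1: the same mask in all 64 words.
        if (lead < limitLead) {
            int bits = ~((1 << lead) - 1);          // bits >= lead
            if (limitLead < 32) {
                bits &= (1 << limitLead) - 1;       // bits < limitLead
            }
            for (int t = 0; t < 64; ++t) {
                table[t] |= bits;
            }
        }
        // 3. Partial last column: words 0..limitTrail-1 at bit 'limitLead'.
        //    When limit == 0x800, limitTrail is 0 and this loop does nothing.
        for (int t = 0; t < limitTrail; ++t) {
            table[t] |= 1 << limitLead;
        }
    }
}
```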
initBits
private void initBits()
findCodePoint
private int findCodePoint(int c, int lo, int hi)
Same as UnicodeSet.findCodePoint(int c) except that the binary search is restricted for finding code points in a certain range. To restrict the search to the range start..end, pass in lo=findCodePoint(start) and hi=findCodePoint(end) with 0 <= lo <= hi < len. findCodePoint(c) defaults to lo=0 and hi=len-1.
Parameters:
c - a character in a subrange of MIN_VALUE..MAX_VALUE
lo - The lowest index to be returned.
hi - The highest index to be returned.
Returns:
the smallest integer i in the range lo..hi, inclusive, such that c < list[i]
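A sketch of a restricted search with this contract, written over an explicit `list` parameter (the class and method names are hypothetical). It assumes c < list[hi], which holds when hi is chosen as described above or is the index of the 0x110000 terminator. Two early exits handle c before the subrange and c at or after its last boundary; the parity of the result then gives membership, which is presumably what containsSlow(c, lo, hi) relies on.

```java
final class FindCodePointSketch {
    // Smallest i in lo..hi (inclusive) with c < list[i]; assumes c < list[hi].
    static int findCodePoint(int[] list, int c, int lo, int hi) {
        if (c < list[lo]) {
            return lo;                       // c precedes the whole subrange
        }
        if (lo >= hi || c >= list[hi - 1]) {
            return hi;                       // c is at or after the last boundary before hi
        }
        // Invariant: list[lo] <= c < list[hi].
        while (hi - lo > 1) {
            int i = (lo + hi) >>> 1;
            if (c < list[i]) {
                hi = i;
            } else {
                lo = i;
            }
        }
        return hi;
    }

    // An odd index means c lies inside one of the set's [start, limit) ranges.
    static boolean containsSlow(int[] list, int c, int lo, int hi) {
        return (findCodePoint(list, c, lo, hi) & 1) != 0;
    }
}
```

For the set [U+0000-U+0003], whose list is [0, 4, 0x110000], code points 0..3 yield index 1 (contained) and 4 or above yield index 2 (not contained).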
-
containsSlow
private final boolean containsSlow(int c, int lo, int hi)
-