Class BMPSet

java.lang.Object
com.ibm.icu.impl.BMPSet

public final class BMPSet extends Object
Helper class for frozen UnicodeSets that implements contains() and span() optimized for BMP code points. Each range of code points uses its own lookup strategy:
  • Latin-1: direct lookup, one boolean per character.
  • 2-byte UTF-8 characters (up to U+07FF): bits organized vertically.
  • 3-byte UTF-8 characters: zero/one/mixed data per 64-code-point block in U+0000..U+FFFF, with illegal ranges marked as mixed.
  • Supplementary characters: binary search over the supplementary part of the parent set's inversion list.
  • Field Summary

    Fields
    Modifier and Type     Field and Description
    private int[]         bmpBlockBits
                          One bit per 64 BMP code points.
    private boolean[]     latin1Contains
                          One boolean ('true' or 'false') per Latin-1 character.
    private final int[]   list
                          The inversion list of the parent set, for the slower contains() implementation for mixed BMP blocks and for supplementary code points.
    private int[]         list4kStarts
                          Inversion list indexes for restricted binary searches in findCodePoint(), from findCodePoint(U+0800, U+1000, U+2000, .., U+F000, U+10000).
    private final int     listLength
    private int[]         table7FF
                          One bit per code point from U+0000..U+07FF.
    static int            U16_SURROGATE_OFFSET
  • Constructor Summary

    Constructors
    Constructor and Description
    BMPSet(int[] parentList, int parentListLength)
    BMPSet(BMPSet otherBMPSet, int[] newParentList, int newParentListLength)
  • Method Summary

    Modifier and Type       Method and Description
    boolean                 contains(int c)
    private final boolean   containsSlow(int c, int lo, int hi)
    private int             findCodePoint(int c, int lo, int hi)
                            Same as UnicodeSet.findCodePoint(int c) except that the binary search is restricted to finding code points in a certain range.
    private void            initBits()
    private static void     set32x64Bits(int[] table, int start, int limit)
                            Set bits in a bit rectangle in "vertical" bit organization.
    final int               span(CharSequence s, int start, UnicodeSet.SpanCondition spanCondition, OutputInt outCount)
                            Span the initial substring for which each character c has spanCondition==contains(c).
    final int               spanBack(CharSequence s, int limit, UnicodeSet.SpanCondition spanCondition)
                            Symmetrical with span().

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • U16_SURROGATE_OFFSET

      public static int U16_SURROGATE_OFFSET
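      This page gives no value or description for U16_SURROGATE_OFFSET. Assuming it mirrors ICU4C's U16_SURROGATE_OFFSET macro, ((0xd800<<10)+0xdc00-0x10000), it folds both surrogate biases and the 0x10000 base into one constant, so a lead/trail pair becomes a code point with one shift, one add, and one subtraction. The class and method names below are illustrative, not ICU API.

```java
/** Sketch: combining a UTF-16 surrogate pair with a single precomputed offset. */
public class SurrogateOffsetDemo {
    // Assumed to match ICU4C's macro: (0xD800 << 10) + 0xDC00 - 0x10000 == 0x35FDC00
    static final int U16_SURROGATE_OFFSET = (0xD800 << 10) + 0xDC00 - 0x10000;

    /** Combines a lead (high) and trail (low) surrogate into a supplementary code point. */
    static int toCodePoint(char lead, char trail) {
        // Equivalent to 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00)
        return (lead << 10) + trail - U16_SURROGATE_OFFSET;
    }

    public static void main(String[] args) {
        // U+1F600 is encoded in UTF-16 as D83D DE00.
        System.out.printf("U+%X%n", toCodePoint('\uD83D', '\uDE00'));  // prints U+1F600
    }
}
```

The result agrees with java.lang.Character.toCodePoint(char, char) for every valid pair; the single constant simply avoids subtracting the three biases separately.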
    • latin1Contains

      private boolean[] latin1Contains
      One boolean ('true' or 'false') per Latin-1 character.
    • table7FF

      private int[] table7FF
      One bit per code point from U+0000..U+07FF. The bits are organized vertically: consecutive code points correspond to the same bit position in consecutive table words. With code point parts lead=c{10..6} and trail=c{5..0}, set.contains(c) == (bit lead of table7FF[trail]). Bits for U+0000..U+00FF are unused (0).
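      A minimal sketch of the vertical layout described above, assuming a code point in U+0000..U+07FF: the trail bits c{5..0} select one of the 64 table words, and the lead bits c{10..6} (0..31) select the bit within that word. The add() helper and class name are hypothetical; only the lookup expression follows the documented rule.

```java
/** Sketch of the "vertical" bit layout documented for table7FF (U+0000..U+07FF). */
public class Table7FFDemo {
    // 64 words, one per trail value c{5..0}; bit number 'lead' = c{10..6} in each word.
    final int[] table7FF = new int[64];

    // Hypothetical helper; assumes 0 <= c <= 0x7FF.
    void add(int c) {
        table7FF[c & 0x3F] |= 1 << (c >> 6);
    }

    // Assumes 0 <= c <= 0x7FF.
    boolean contains(int c) {
        return (table7FF[c & 0x3F] & (1 << (c >> 6))) != 0;
    }

    public static void main(String[] args) {
        Table7FFDemo t = new Table7FFDemo();
        t.add(0x3A9);  // GREEK CAPITAL LETTER OMEGA: lead = 0x0E, trail = 0x29
        System.out.println(t.contains(0x3A9));  // true
        System.out.println(t.contains(0x3AA));  // false: same lead, next table word
    }
}
```

Because 32 leads fit exactly into one int, the whole U+0000..U+07FF range needs only 64 words.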
    • bmpBlockBits

      private int[] bmpBlockBits
      One bit per 64 BMP code points. The bits are organized vertically: consecutive 64-code-point blocks correspond to the same bit position in consecutive table words. With code point parts lead=c{15..12} and t1=c{11..6}, test bits (lead+16) and lead in bmpBlockBits[t1]. If the upper bit is 0, then the lower bit indicates whether contains(c) is true for all code points in the 64-block. If the upper bit is 1, then the block is mixed and set.contains(c) must be called. Bits for U+0000..U+07FF are unused (0).
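      The two-bit test can be sketched as follows, assuming c is in U+0800..U+FFFF outside the surrogates. Shifting the word right by lead brings bit lead to position 0 and bit lead+16 to position 16, so masking with 0x10001 extracts both at once. The marking helpers and the Boolean-or-null return are hypothetical conveniences for the example.

```java
/** Sketch of the two-bit test documented for bmpBlockBits (U+0800..U+FFFF, excluding surrogates). */
public class BmpBlockBitsDemo {
    final int[] bmpBlockBits = new int[64];  // indexed by t1 = c{11..6}

    // Hypothetical helper: mark the 64-block containing c as entirely in the set (bit lead).
    void markAllIn(int c) {
        bmpBlockBits[(c >> 6) & 0x3F] |= 1 << (c >> 12);
    }

    // Hypothetical helper: mark the 64-block containing c as mixed (bit lead + 16).
    void markMixed(int c) {
        bmpBlockBits[(c >> 6) & 0x3F] |= 1 << ((c >> 12) + 16);
    }

    /** TRUE/FALSE if the whole 64-block is in/out of the set; null if the block is mixed. */
    Boolean quickContains(int c) {
        int lead = c >> 12;                                              // c{15..12}
        int twoBits = (bmpBlockBits[(c >> 6) & 0x3F] >> lead) & 0x10001; // bits lead+16 and lead
        if (twoBits <= 1) {
            return twoBits != 0;  // upper bit 0: lower bit answers for all 64 code points
        }
        return null;              // upper bit 1: fall back to the inversion list
    }

    public static void main(String[] args) {
        BmpBlockBitsDemo d = new BmpBlockBitsDemo();
        d.markAllIn(0x4E00);  // U+4E00..U+4E3F entirely in the set
        d.markMixed(0x0E00);  // U+0E00..U+0E3F partly in the set; shares word 0x38 with U+4E00
        System.out.println(d.quickContains(0x4E3F));  // true
        System.out.println(d.quickContains(0x0E10));  // null
        System.out.println(d.quickContains(0x5000));  // false
    }
}
```

The example deliberately puts two blocks with different leads into the same word to show that their bits do not interfere.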
    • list4kStarts

      private int[] list4kStarts
      Inversion list indexes for restricted binary searches in findCodePoint(), from findCodePoint(U+0800, U+1000, U+2000, .., U+F000, U+10000). U+0800 is the first 3-byte-UTF-8 code point. Code points below U+0800 are always looked up in the bit tables. The last pair of indexes is for finding supplementary code points.
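      One way to fill such an index array is sketched below. It assumes list4kStarts[0] holds the index for U+0800, list4kStarts[i] the index for i<<12 (i = 1..0x10), and a final entry pointing at the 0x110000 terminator, which gives 18 entries; these details and the linear-scan helper are assumptions for illustration. A code point c in U+0800..U+FFFF then only needs to be searched between list4kStarts[c>>12] and list4kStarts[(c>>12)+1].

```java
import java.util.Arrays;

/** Sketch: inversion-list indexes at U+0800 and at each 4K boundary, as described for list4kStarts. */
public class List4kStartsDemo {
    /** Smallest i in lo..hi with c < list[i] (linear scan for clarity; assumes list[hi] > c). */
    static int findCodePoint(int[] list, int c, int lo, int hi) {
        int i = lo;
        while (i < hi && c >= list[i]) {
            ++i;
        }
        return i;
    }

    /** Assumed layout: [0] for U+0800, [i] for i << 12 (i = 1..0x10), [0x11] = terminator index. */
    static int[] computeList4kStarts(int[] list, int listLength) {
        int[] starts = new int[0x12];
        starts[0] = findCodePoint(list, 0x800, 0, listLength - 1);
        for (int i = 1; i <= 0x10; ++i) {
            // Each search can start where the previous one ended, because i << 12 increases.
            starts[i] = findCodePoint(list, i << 12, starts[i - 1], listLength - 1);
        }
        starts[0x11] = listLength - 1;  // list[listLength - 1] == 0x110000
        return starts;
    }

    /** Membership test for U+0800..U+FFFF that searches only within c's 4K block. */
    static boolean containsIn4kBlock(int[] list, int[] starts, int c) {
        int lead = c >> 12;
        return (findCodePoint(list, c, starts[lead], starts[lead + 1]) & 1) != 0;
    }

    public static void main(String[] args) {
        int[] list = {0x4E00, 0xA000, 0x110000};  // the set U+4E00..U+9FFF
        int[] starts = computeList4kStarts(list, list.length);
        System.out.println(Arrays.toString(starts));
        // [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]
        System.out.println(containsIn4kBlock(list, starts, 0x4E10));  // true
    }
}
```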
    • list

      private final int[] list
      The inversion list of the parent set, for the slower contains() implementation for mixed BMP blocks and for supplementary code points. The list is terminated with list[listLength-1]=0x110000.
    • listLength

      private final int listLength
  • Constructor Details

    • BMPSet

      public BMPSet(int[] parentList, int parentListLength)
    • BMPSet

      public BMPSet(BMPSet otherBMPSet, int[] newParentList, int newParentListLength)
  • Method Details

    • contains

      public boolean contains(int c)
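      contains(int c) has no description here. Following the class comment and the field descriptions, it presumably dispatches on the range of c. The sketch below illustrates that four-tier dispatch under stated assumptions: the tables are filled naively from the inversion list (one code point or one 64-block at a time), a linear scan stands in for the restricted binary search, c is assumed to be non-negative, and the class name BmpContainsSketch is hypothetical. It is not ICU's implementation.

```java
/**
 * Sketch of a four-tier contains() dispatch (Latin-1, U+0100..U+07FF, rest of the BMP,
 * surrogates/supplementary). Field names mirror the documentation; the table setup is naive.
 */
public class BmpContainsSketch {
    private final int[] list;                          // inversion list ending in 0x110000
    private final boolean[] latin1Contains = new boolean[0x100];
    private final int[] table7FF = new int[64];
    private final int[] bmpBlockBits = new int[64];
    private final int[] list4kStarts = new int[0x12];

    public BmpContainsSketch(int[] list) {
        this.list = list;
        int last = list.length - 1;
        list4kStarts[0] = findCodePoint(0x800, 0, last);
        for (int i = 1; i <= 0x10; ++i) {
            list4kStarts[i] = findCodePoint(i << 12, list4kStarts[i - 1], last);
        }
        list4kStarts[0x11] = last;
        for (int c = 0; c < 0x100; ++c) {
            latin1Contains[c] = containsSlow(c, 0, last);
        }
        for (int c = 0x100; c < 0x800; ++c) {
            if (containsSlow(c, 0, last)) {
                table7FF[c & 0x3F] |= 1 << (c >> 6);
            }
        }
        // Classify each 64-block of U+0800..U+FFFF as all-out, all-in, or mixed.
        for (int start = 0x800; start < 0x10000; start += 0x40) {
            int lead = start >> 12;
            int t1 = (start >> 6) & 0x3F;
            int first = findCodePoint(start, 0, last);
            int lastInBlock = findCodePoint(start + 0x3F, 0, last);
            if (first != lastInBlock) {
                bmpBlockBits[t1] |= 1 << (lead + 16);  // mixed: a range boundary falls inside
            } else if ((first & 1) != 0) {
                bmpBlockBits[t1] |= 1 << lead;         // the whole block is in the set
            }
        }
    }

    // Smallest i in lo..hi with c < list[i]; a linear scan stands in for a binary search.
    private int findCodePoint(int c, int lo, int hi) {
        int i = lo;
        while (i < hi && c >= list[i]) {
            ++i;
        }
        return i;
    }

    private boolean containsSlow(int c, int lo, int hi) {
        return (findCodePoint(c, lo, hi) & 1) != 0;  // odd index means "inside a range"
    }

    /** Assumes c >= 0. */
    public boolean contains(int c) {
        if (c <= 0xFF) {
            return latin1Contains[c];                                   // Latin-1
        } else if (c <= 0x7FF) {
            return (table7FF[c & 0x3F] & (1 << (c >> 6))) != 0;         // 2-byte UTF-8 range
        } else if (c < 0xD800 || (c >= 0xE000 && c <= 0xFFFF)) {
            int lead = c >> 12;
            int twoBits = (bmpBlockBits[(c >> 6) & 0x3F] >> lead) & 0x10001;
            if (twoBits <= 1) {
                return twoBits != 0;                                    // uniform 64-block
            }
            return containsSlow(c, list4kStarts[lead], list4kStarts[lead + 1]);  // mixed block
        } else if (c <= 0x10FFFF) {
            // Surrogates and supplementary code points: search from the U+D000 index onward.
            return containsSlow(c, list4kStarts[0xD], list4kStarts[0x11]);
        }
        return false;                                                   // out of range
    }
}
```

The point of the design is that only mixed blocks and supplementary code points ever touch the inversion list; every other BMP lookup is a constant number of array reads.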
    • span

      public final int span(CharSequence s, int start, UnicodeSet.SpanCondition spanCondition, OutputInt outCount)
      Span the initial substring for which each character c has spanCondition==contains(c). The span condition must be 0 (NOT_CONTAINED) or 1 (CONTAINED).
      Parameters:
      start - The start index
      outCount - If not null: Receives the number of code points in the span.
      Returns:
      the limit (exclusive end) of the span
      NOTE: To reduce the overhead of the function call to contains(c), it is manually inlined here. Before decoding a surrogate pair, the code checks that the string is long enough to hold the trail unit. Unpaired surrogates are handled as surrogate code points, as usual in ICU.
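      The forward span loop can be sketched under simplifying assumptions: an IntPredicate stands in for the inlined contains(c), a boolean replaces UnicodeSet.SpanCondition, and an int[1] holder replaces OutputInt. Like the note above, it combines a lead surrogate with the next unit only when a trail surrogate actually follows, so unpaired surrogates are tested as surrogate code points. All names are hypothetical.

```java
import java.util.function.IntPredicate;

/** Sketch of a forward span over UTF-16 with explicit surrogate handling. */
public class SpanSketch {
    /**
     * Returns the limit of the span starting at 'start' in which every code point c
     * satisfies contains.test(c) == spanContained. If outCount is non-null,
     * outCount[0] receives the number of code points in the span.
     */
    static int span(CharSequence s, int start, IntPredicate contains,
                    boolean spanContained, int[] outCount) {
        int limit = s.length();
        int i = start;
        int count = 0;
        while (i < limit) {
            char c = s.charAt(i);
            int cp = c;
            int len = 1;
            // Combine only if a trail surrogate is really there; otherwise test c on its own.
            if (Character.isHighSurrogate(c) && i + 1 < limit
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                cp = Character.toCodePoint(c, s.charAt(i + 1));
                len = 2;
            }
            if (contains.test(cp) != spanContained) {
                break;
            }
            i += len;
            ++count;
        }
        if (outCount != null) {
            outCount[0] = count;
        }
        return i;
    }

    public static void main(String[] args) {
        IntPredicate isAsciiLetter = cp -> (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z');
        int[] n = new int[1];
        String s = "abc\uD83D\uDE00def";
        System.out.println(span(s, 0, isAsciiLetter, true, n) + " " + n[0]);   // 3 3
        System.out.println(span(s, 3, isAsciiLetter, false, n) + " " + n[0]);  // 5 1
    }
}
```

Note that the returned limit counts UTF-16 units while outCount counts code points, which is why the second call returns 5 with a count of 1.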
    • spanBack

      public final int spanBack(CharSequence s, int limit, UnicodeSet.SpanCondition spanCondition)
      Symmetrical with span(). Span the trailing substring for which each character c has spanCondition==contains(c). Requires s.length() >= limit; the span condition must be 0 (NOT_CONTAINED) or 1 (CONTAINED).
      Returns:
      The string index which starts the span (i.e. inclusive).
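      The backward direction mirrors the forward loop: read the unit before the current index, and combine it with the unit before that only if the pair is a real lead/trail surrogate pair. As in the span() sketch, an IntPredicate and a boolean are hypothetical stand-ins for contains(c) and UnicodeSet.SpanCondition.

```java
import java.util.function.IntPredicate;

/** Sketch of a backward span over UTF-16; mirrors a forward span loop. */
public class SpanBackSketch {
    /**
     * Returns the start (inclusive) of the span that ends at 'limit', in which every
     * code point c satisfies contains.test(c) == spanContained. Requires s.length() >= limit.
     */
    static int spanBack(CharSequence s, int limit, IntPredicate contains, boolean spanContained) {
        int i = limit;
        while (i > 0) {
            char c = s.charAt(i - 1);
            int cp = c;
            int len = 1;
            // Combine with the preceding unit only if it is a lead surrogate.
            if (Character.isLowSurrogate(c) && i >= 2 && Character.isHighSurrogate(s.charAt(i - 2))) {
                cp = Character.toCodePoint(s.charAt(i - 2), c);
                len = 2;
            }
            if (contains.test(cp) != spanContained) {
                break;
            }
            i -= len;
        }
        return i;
    }

    public static void main(String[] args) {
        IntPredicate isAsciiLetter = cp -> (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z');
        String s = "abc\uD83D\uDE00def";
        System.out.println(spanBack(s, s.length(), isAsciiLetter, true));  // 5
        System.out.println(spanBack(s, 5, isAsciiLetter, false));          // 3
    }
}
```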
    • set32x64Bits

      private static void set32x64Bits(int[] table, int start, int limit)
      Set bits in a bit rectangle in "vertical" bit organization. Requires start < limit <= 0x800.
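      Whatever shortcut the rectangle fill takes, its result is easy to state: for every c in start..limit-1, bit c{10..6} of word c{5..0} ends up set. The naive loop below produces those same table contents one code point at a time, without the rectangle optimization, which makes it a convenient reference when checking a faster version. The 64-word length check is an assumption based on the table7FF layout.

```java
/** Naive reference for set32x64Bits: sets the same bits, one code point at a time. */
public class Set32x64BitsSketch {
    static void set32x64Bits(int[] table, int start, int limit) {
        // Documented precondition: start < limit <= 0x800; assumed: one word per trail value.
        if (table.length != 64 || start < 0 || start >= limit || limit > 0x800) {
            throw new IllegalArgumentException("need 0 <= start < limit <= 0x800 and 64 words");
        }
        for (int c = start; c < limit; ++c) {
            table[c & 0x3F] |= 1 << (c >> 6);  // word = trail c{5..0}, bit = lead c{10..6}
        }
    }

    public static void main(String[] args) {
        int[] table = new int[64];
        set32x64Bits(table, 0x100, 0x800);  // leads 4..31 in every word
        System.out.println(Integer.toHexString(table[0]));  // fffffff0
    }
}
```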
    • initBits

      private void initBits()
    • findCodePoint

      private int findCodePoint(int c, int lo, int hi)
      Same as UnicodeSet.findCodePoint(int c) except that the binary search is restricted to finding code points in a certain range. To restrict the search to the range start..end, pass in lo=findCodePoint(start) and hi=findCodePoint(end) with 0 <= lo <= hi < len. findCodePoint(c) defaults to lo=0 and hi=len-1.
      Parameters:
      c - a character in a subrange of MIN_VALUE..MAX_VALUE
      lo - The lowest index to be returned.
      hi - The highest index to be returned.
      Returns:
      the smallest integer i in the range lo..hi, inclusive, such that c < list[i]
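      A sketch of a search that satisfies this contract, assuming a strictly ascending list with list[hi] > c and an answer that lies in lo..hi. The first two checks handle c before the range and the common case of c after the last boundary; the loop then keeps list[lo] <= c < list[hi] until the two indexes are adjacent. The class name is hypothetical, and containsSlow here simply tests the parity of the returned index, the usual inversion-list membership rule.

```java
/** Sketch of a restricted inversion-list search: smallest i in lo..hi with c < list[i]. */
public class FindCodePointSketch {
    /** Assumes list is strictly ascending, list[hi] > c, and the answer lies in lo..hi. */
    static int findCodePoint(int[] list, int c, int lo, int hi) {
        if (c < list[lo]) {
            return lo;
        }
        // Common case: c lies after the last boundary in the range.
        if (lo >= hi || c >= list[hi - 1]) {
            return hi;
        }
        // Invariant: list[lo] <= c < list[hi].
        while (true) {
            int i = (lo + hi) >>> 1;
            if (i == lo) {
                return hi;  // lo and hi are adjacent
            } else if (c < list[i]) {
                hi = i;
            } else {
                lo = i;
            }
        }
    }

    /** An odd index means c lies inside one of the set's ranges. */
    static boolean containsSlow(int[] list, int c, int lo, int hi) {
        return (findCodePoint(list, c, lo, hi) & 1) != 0;
    }

    public static void main(String[] args) {
        int[] list = {0x0, 0x4, 0x110000};  // the set U+0000..U+0003
        for (int c : new int[] {0, 1, 3, 4, 7, 8}) {
            System.out.print(findCodePoint(list, c, 0, list.length - 1) + " ");  // 1 1 1 2 2 2
        }
        System.out.println();
    }
}
```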
    • containsSlow

      private final boolean containsSlow(int c, int lo, int hi)