Class CharsetSelector

java.lang.Object
com.ibm.icu.charset.CharsetSelector

public final class CharsetSelector extends Object
Charset Selector

A charset selector is built from a list of charset names. Given an input CharSequence, it returns the names of those charsets that can convert the whole CharSequence.
  • Field Details

    • trie

      private IntTrie trie
    • pv

      private int[] pv
    • encodings

      private String[] encodings
  • Constructor Details

    • CharsetSelector

      public CharsetSelector(List<String> charsetList, UnicodeSet excludedCodePoints, int mappingTypes)
      Construct a CharsetSelector from a list of charset names.
      Parameters:
      charsetList - a list of charset names in the form of strings. If charsetList is empty, a selector for all available charsets is constructed.
      excludedCodePoints - a set of code points to be excluded from consideration. Excluded code points appearing in the input CharSequence do not change the selection result. The set may be empty when no code point should be excluded.
      mappingTypes - an int which determines whether to consider only roundtrip mappings or also fallbacks, e.g. CharsetICU.ROUNDTRIP_SET. See CharsetICU.java for the constants that are currently supported.
      Throws:
      IllegalArgumentException - If a parameter is invalid.
      IllegalCharsetNameException - If the given charset name is illegal.
      UnsupportedCharsetException - If no support for the named charset is available in this instance of the Java virtual machine.
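The name-handling rules above (an empty list selects every available charset, and bad names raise the two java.nio.charset exceptions) can be illustrated with a minimal JDK-only sketch. It is a stand-in, not the ICU implementation: the class CharsetListSketch and its resolve method are hypothetical names, the lookup uses java.nio.charset.Charset rather than ICU's own converters, and excludedCodePoints and mappingTypes are not modeled.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: mimics how a list of charset names might be validated.
class CharsetListSketch {
    static List<String> resolve(List<String> charsetList) {
        if (charsetList.isEmpty()) {
            // Empty input: fall back to every charset the JVM knows about.
            return new ArrayList<>(Charset.availableCharsets().keySet());
        }
        List<String> resolved = new ArrayList<>();
        for (String name : charsetList) {
            // Charset.forName throws IllegalCharsetNameException for a
            // syntactically illegal name and UnsupportedCharsetException
            // for a legal name that the JVM does not support.
            Charset.forName(name);
            resolved.add(name); // keep the caller's spelling and order
        }
        return resolved;
    }
}
```

With this sketch, resolve(List.of("bad name!")) fails with IllegalCharsetNameException, while a well-formed but unknown name fails with UnsupportedCharsetException.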
  • Method Details

    • generateSelectorData

      private void generateSelectorData(PropsVectors pvec, UnicodeSet excludedCodePoints, int mappingTypes)
    • intersectMasks

      private boolean intersectMasks(int[] dest, int pvIndex, int len)
    • selectForMask

      private List<String> selectForMask(int[] mask)
    • countOnes

      private int countOnes(int[] mask, int len)
    • selectForString

      public List<String> selectForString(CharSequence unicodeText)
      Select charsets that can map all characters in a CharSequence, ignoring the excluded code points.
      Parameters:
      unicodeText - a CharSequence. It may be empty.
      Returns:
      a list that contains charset names in the form of strings. The returned names are spelled as they were supplied when the selector was built and appear in the same relative order.
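The selection semantics described above can be sketched with the JDK alone. This is an illustration under stated assumptions, not how CharsetSelector works internally: the class SelectSketch and its select method are hypothetical, a plain Set of code points stands in for UnicodeSet, and CharsetEncoder.canEncode stands in for ICU's mapping tables, so the roundtrip versus fallback distinction of mappingTypes is not modeled.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the selection step, using JDK encoders.
class SelectSketch {
    static List<String> select(List<String> names, Set<Integer> excluded, CharSequence text) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            Charset cs = Charset.forName(name);
            if (!cs.canEncode()) {
                continue; // decode-only charsets cannot convert any text
            }
            CharsetEncoder encoder = cs.newEncoder();
            boolean mapsAll = true;
            for (int i = 0; i < text.length(); ) {
                int cp = Character.codePointAt(text, i);
                i += Character.charCount(cp);
                if (excluded.contains(cp)) {
                    continue; // excluded code points never affect the result
                }
                if (!encoder.canEncode(new String(Character.toChars(cp)))) {
                    mapsAll = false;
                    break;
                }
            }
            if (mapsAll) {
                result.add(name); // preserve the supplied name and order
            }
        }
        return result;
    }
}
```

For example, with the names US-ASCII, ISO-8859-1 and UTF-8 and the text "caf\u00e9", the sketch drops US-ASCII unless U+00E9 is in the excluded set. An empty text returns every encodable charset in the list, since there is no character that could fail to map.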