Package nom.tam.util

Class ByteParser


  • public class ByteParser
    extends java.lang.Object
    This class provides routines for efficient parsing of data stored in a byte array. It is optimized (in theory at least!) for efficiency rather than accuracy. The values read in for doubles or floats may differ in the last bit or so from those produced by the standard input utilities, especially where a float is specified as a very long string of digits (substantially longer than the precision of the type).

    The get methods are generally available with or without a length parameter. When a length is specified, only the bytes within that range from the current offset will be searched for the number. If no length is specified, the entire buffer from the current offset will be searched.

    The getString method returns a string with leading and trailing white space left intact. For all other get calls, leading white space is ignored. If fillFields is set, the get methods check that only white space follows valid data and throw a FormatException if that is not the case. If fillFields is not set and valid data is found, the methods return having read as much as possible. For example, for the sequence "T123.258E13", successive getBoolean, getInt and getFloat calls would return true, 123, and 2.58e12.
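The succession of calls described above can be illustrated with a small stand-alone sketch. It does not use ByteParser itself: the class SequentialParseSketch and its methods are hypothetical, and they only mimic the documented behaviour when fillFields is not set (skip leading white space, consume as much valid data as possible, leave the offset just past it). The assumption that a boolean is a single T or F character is made for illustration only.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in, NOT the real ByteParser: reads values in succession. */
final class SequentialParseSketch {
    private final byte[] input;
    private int offset;

    SequentialParseSketch(String text) {
        this.input = text.getBytes(StandardCharsets.US_ASCII);
    }

    private void skipWhite() {
        while (offset < input.length && (input[offset] == ' ' || input[offset] == '\t'
                || input[offset] == '\n' || input[offset] == '\r')) {
            offset++;
        }
    }

    private boolean isDigit(int i) {
        return i < input.length && input[i] >= '0' && input[i] <= '9';
    }

    private boolean isSign(int i) {
        return i < input.length && (input[i] == '+' || input[i] == '-');
    }

    /** Assumption: a boolean is a single 'T'/'t' or 'F'/'f' character. */
    boolean getBoolean() {
        skipWhite();
        if (offset < input.length) {
            char c = (char) input[offset];
            if (c == 'T' || c == 't') { offset++; return true; }
            if (c == 'F' || c == 'f') { offset++; return false; }
        }
        throw new IllegalStateException("no boolean at offset " + offset);
    }

    /** Reads an optional sign and digits, stopping at the first non-digit. */
    int getInt() {
        skipWhite();
        int start = offset;
        int i = isSign(offset) ? offset + 1 : offset;
        int digitsStart = i;
        while (isDigit(i)) i++;
        if (i == digitsStart) throw new IllegalStateException("no integer at offset " + start);
        offset = i;
        return Integer.parseInt(new String(input, start, i - start, StandardCharsets.US_ASCII));
    }

    /** Reads sign, digits, optional fraction and optional exponent, then stops. */
    float getFloat() {
        skipWhite();
        int start = offset;
        int i = isSign(offset) ? offset + 1 : offset;
        int digits = 0;
        while (isDigit(i)) { i++; digits++; }
        if (i < input.length && input[i] == '.') {
            i++;
            while (isDigit(i)) { i++; digits++; }
        }
        if (digits == 0) throw new IllegalStateException("no number at offset " + start);
        if (i < input.length && (input[i] == 'E' || input[i] == 'e')) {
            int j = isSign(i + 1) ? i + 2 : i + 1;
            if (isDigit(j)) {            // only consume 'E' if an exponent follows
                while (isDigit(j)) j++;
                i = j;
            }
        }
        offset = i;
        return (float) Double.parseDouble(
                new String(input, start, i - start, StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        SequentialParseSketch p = new SequentialParseSketch("T123.258E13");
        System.out.println(p.getBoolean());
        System.out.println(p.getInt());
        System.out.println(p.getFloat());
    }
}
```

The integer read stops at the '.' because it is not a digit, which is why the following float read starts at ".258E13".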

    • Field Detail

      • EXPONENT_DENORMALISATION_CORR_LIMIT

        private static final int EXPONENT_DENORMALISATION_CORR_LIMIT
        See Also:
        Constant Field Values
      • EXPONENT_DENORMALISATION_FACTOR

        private static final double EXPONENT_DENORMALISATION_FACTOR
        See Also:
        Constant Field Values
      • INFINITY_LOWER

        private static final byte[] INFINITY_LOWER
      • INFINITY_UPPER

        private static final byte[] INFINITY_UPPER
      • INFINITY_LENGTH

        private static final int INFINITY_LENGTH
      • INFINITY_SHORTCUT_LENGTH

        private static final int INFINITY_SHORTCUT_LENGTH
        See Also:
        Constant Field Values
      • NOT_A_NUMBER_LOWER

        private static final byte[] NOT_A_NUMBER_LOWER
      • NOT_A_NUMBER_UPPER

        private static final byte[] NOT_A_NUMBER_UPPER
      • NOT_A_NUMBER_LENGTH

        private static final int NOT_A_NUMBER_LENGTH
      • NUMBER_BASE

        private static final int NUMBER_BASE
        The underlying number base used in this class.
        See Also:
        Constant Field Values
      • NUMBER_BASE_DOUBLE

        private static final double NUMBER_BASE_DOUBLE
        The underlying number base used in this class as a double value.
        See Also:
        Constant Field Values
      • foundSign

        private boolean foundSign
        Did we find a sign last time we checked?
      • input

        private byte[] input
        Array being parsed
      • numberLength

        private int numberLength
        Length of last parsed value
      • offset

        private int offset
        Current offset into input.
    • Constructor Detail

      • ByteParser

        public ByteParser​(byte[] input)
        Construct a parser.
        Parameters:
        input - The byte array to be parsed. Note that the array can be re-used by refilling its contents and resetting the offset.
    • Method Detail

      • checkSign

        private int checkSign()
        Find the sign for a number. This routine looks for a sign (+/-) at the current location and returns +1/-1 if one is found, or +1 if not. If a sign is found, the foundSign boolean is set and offset is incremented.
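A minimal sketch of the sign check described above, written as a stand-alone class. SignSketch and its members are hypothetical names that mirror the documented fields; this is not the library's private implementation.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the documented sign check, not the library's code. */
final class SignSketch {
    final byte[] input;
    int offset;          // current position in input
    boolean foundSign;   // set when an explicit '+' or '-' was consumed

    SignSketch(String text) {
        this.input = text.getBytes(StandardCharsets.US_ASCII);
    }

    /** Returns -1 for '-', +1 for '+' or for no sign; advances only past a sign. */
    int checkSign() {
        foundSign = false;
        if (offset < input.length) {
            if (input[offset] == '+') { foundSign = true; offset++; return 1; }
            if (input[offset] == '-') { foundSign = true; offset++; return -1; }
        }
        return 1;
    }

    public static void main(String[] args) {
        SignSketch s = new SignSketch("-42");
        System.out.println(s.checkSign() + " " + s.foundSign + " " + s.offset);
    }
}
```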
      • getBareInteger

        private double getBareInteger​(int length)
        Get the integer value starting at the current position. This routine returns a double rather than an int/long to enable it to read very long integers (with reduced precision) such as 111111111111111111111111111111111111111111. Note that this routine does set numberLength.
        Parameters:
        length - The maximum number of characters to use.
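To show why a double return type allows very long digit strings to be read, here is a hedged stand-alone sketch. BareIntegerSketch, bareInteger and lastLength are hypothetical names (lastLength stands in for numberLength); the accumulation loop is an assumption about one reasonable approach, not the library's actual code.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: digits are accumulated in a double, trading precision for range. */
final class BareIntegerSketch {
    /** Number of digit characters consumed by the last call. */
    static int lastLength;

    static double bareInteger(byte[] input, int offset, int length) {
        double value = 0;
        int i = offset;
        int end = Math.min(input.length, offset + length);
        while (i < end && input[i] >= '0' && input[i] <= '9') {
            value = value * 10 + (input[i] - '0');  // would overflow an int/long for long inputs
            i++;
        }
        lastLength = i - offset;
        return value;
    }

    public static void main(String[] args) {
        byte[] digits = "111111111111111111111111111111111111111111"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(bareInteger(digits, 0, digits.length));
        System.out.println(lastLength);
    }
}
```

The result carries only the precision of a double, so the low-order digits of such a long input are lost.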
      • getBoolean

        public boolean getBoolean()
                           throws FormatException
        Returns:
        a boolean value from the beginning of the buffer.
        Throws:
        FormatException - if the boolean was in an unknown format
      • getBoolean

        public boolean getBoolean​(int length)
                           throws FormatException
        Parameters:
        length - The maximum number of characters used to parse this boolean.
        Returns:
        a boolean value from a specified region of the buffer
        Throws:
        FormatException - if the boolean was in an unknown format
      • getBuffer

        public byte[] getBuffer()
        Returns:
        the buffer being used by the parser
      • getDouble

        public double getDouble()
                         throws FormatException
        Read in the buffer until a double is read. This will read the entire buffer if fillFields is set.
        Returns:
        The value found.
        Throws:
        FormatException - if the double was in an unknown format
      • getDouble

        public double getDouble​(int length)
                         throws FormatException
        Parameters:
        length - The maximum number of characters used to parse this number. If fillFields is specified then only whitespace may follow a valid double value.
        Returns:
        a parsed double from the buffer. Leading spaces are ignored.
        Throws:
        FormatException - if the double was in an unknown format
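The fillFields rule (only white space may follow a valid value within the field) can be sketched as a simple trailing check. FillFieldsSketch and onlyWhiteSpace are hypothetical names, and IllegalArgumentException is used here as a stand-in for the library's FormatException so that the example stays self-contained.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a trailing-white-space check, not the library's code. */
final class FillFieldsSketch {
    /** True if every byte in input[from, to) is ' ', '\t', '\n' or '\r'. */
    static boolean onlyWhiteSpace(byte[] input, int from, int to) {
        for (int i = from; i < to && i < input.length; i++) {
            byte b = input[i];
            if (b != ' ' && b != '\t' && b != '\n' && b != '\r') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] field = "3.14 x ".getBytes(StandardCharsets.US_ASCII);
        int valueEnd = 4; // index just past "3.14"
        try {
            if (!onlyWhiteSpace(field, valueEnd, field.length)) {
                // Stand-in for the FormatException the real parser is documented to throw.
                throw new IllegalArgumentException("non-blank data after value");
            }
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```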
      • getFloat

        public float getFloat()
                       throws FormatException
        Returns:
        a floating point value from the buffer (see getDouble(int)).
        Throws:
        FormatException - if the float was in an unknown format
      • getFloat

        public float getFloat​(int length)
                       throws FormatException
        Parameters:
        length - The maximum number of characters used to parse this float.
        Returns:
        a floating point value in a region of the buffer
        Throws:
        FormatException - if the float was in an unknown format
      • getInt

        public int getInt()
                   throws FormatException
        Returns:
        an integer at the beginning of the buffer
        Throws:
        FormatException - if the integer was in an unknown format
      • getInt

        public int getInt​(int length)
                   throws FormatException
        Parameters:
        length - The maximum number of characters used to parse this integer.
        Returns:
        an integer parsed from a specified region of the buffer
        Throws:
        FormatException - if the integer was in an unknown format
      • getLong

        public long getLong​(int length)
                     throws FormatException
        Parameters:
        length - The maximum number of characters used to parse this long.
        Returns:
        a long in a specified region of the buffer
        Throws:
        FormatException - if the long was in an unknown format
      • getNumberLength

        public int getNumberLength()
        Returns:
        the number of characters used to parse the previous number (or the length of the previous String returned).
      • getOffset

        public int getOffset()
        Get the current offset.
        Returns:
        The current offset within the buffer.
      • getString

        public java.lang.String getString​(int length)
        Parameters:
        length - The length of the string.
        Returns:
        a string.
      • isCaseInsensitiv

        private boolean isCaseInsensitiv​(int length,
                                         int constantLength,
                                         byte[] lowerConstant,
                                         byte[] upperConstant)
      • setBuffer

        public void setBuffer​(byte[] buf)
        Set the buffer for the parser.
        Parameters:
        buf - buffer to set
      • setOffset

        public void setOffset​(int offset)
        Set the offset into the array.
        Parameters:
        offset - The desired offset from the beginning of the array.
      • skip

        public void skip​(int nBytes)
        Skip bytes in the buffer.
        Parameters:
        nBytes - number of bytes to skip
      • skipWhite

        public int skipWhite​(int length)
        Skip white space. This routine skips white space in the input.
        Parameters:
        length - The maximum number of characters to skip.
        Returns:
        the number of characters skipped. White space is defined as ' ', '\t', '\n' or '\r'.
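A small sketch of the skipping rule, using the white space definition given above. SkipWhiteSketch is a hypothetical stand-alone helper that takes the buffer and offset as arguments; the real method works on the parser's own state.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of white space skipping, not the library's code. */
final class SkipWhiteSketch {
    /** Counts leading ' ', '\t', '\n' or '\r' bytes from offset, up to length bytes. */
    static int skipWhite(byte[] input, int offset, int length) {
        int skipped = 0;
        while (skipped < length && offset + skipped < input.length) {
            byte b = input[offset + skipped];
            if (b != ' ' && b != '\t' && b != '\n' && b != '\r') {
                break;
            }
            skipped++;
        }
        return skipped;
    }

    public static void main(String[] args) {
        byte[] data = " \t\n42".getBytes(StandardCharsets.US_ASCII);
        System.out.println(skipWhite(data, 0, data.length));
    }
}
```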