Class HTMLStripCharFilter

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable, java.lang.Readable

    public final class HTMLStripCharFilter
    extends BaseCharFilter
    A CharFilter that wraps another Reader and attempts to strip out HTML constructs.
    • Field Detail

      • YYEOF

        private static final int YYEOF
        This character denotes the end of file
        See Also:
        Constant Field Values
      • ZZ_BUFFERSIZE

        private static final int ZZ_BUFFERSIZE
        initial size of the lookahead buffer
        See Also:
        Constant Field Values
      • CHARACTER_REFERENCE_TAIL

        private static final int CHARACTER_REFERENCE_TAIL
        See Also:
        Constant Field Values
      • LEFT_ANGLE_BRACKET_SLASH

        private static final int LEFT_ANGLE_BRACKET_SLASH
        See Also:
        Constant Field Values
      • LEFT_ANGLE_BRACKET_SPACE

        private static final int LEFT_ANGLE_BRACKET_SPACE
        See Also:
        Constant Field Values
      • END_TAG_TAIL_SUBSTITUTE

        private static final int END_TAG_TAIL_SUBSTITUTE
        See Also:
        Constant Field Values
      • START_TAG_TAIL_INCLUDE

        private static final int START_TAG_TAIL_INCLUDE
        See Also:
        Constant Field Values
      • START_TAG_TAIL_EXCLUDE

        private static final int START_TAG_TAIL_EXCLUDE
        See Also:
        Constant Field Values
      • START_TAG_TAIL_SUBSTITUTE

        private static final int START_TAG_TAIL_SUBSTITUTE
        See Also:
        Constant Field Values
      • ZZ_LEXSTATE

        private static final int[] ZZ_LEXSTATE
        ZZ_LEXSTATE[l] is the state in the DFA for the lexical state l. ZZ_LEXSTATE[l+1] is the state in the DFA for the lexical state l at the beginning of a line. l is of the form l = 2*k, where k is a non-negative integer.
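        The even/odd layout described for ZZ_LEXSTATE can be sketched as a plain array lookup. The table contents, the class name LexStateLookup and the method startState below are hypothetical; only the indexing scheme (l for the normal case, l+1 at the beginning of a line, l always even) comes from the description above.

```java
final class LexStateLookup {
    // Hypothetical table for two lexical states (l = 0 and l = 2).
    // Even index: DFA start state. Odd index: DFA start state at beginning of line.
    static final int[] LEXSTATE = {0, 0, 1, 2};

    /** Looks up the DFA start state for an even lexical state l. */
    static int startState(int lexicalState, boolean atBeginningOfLine) {
        if (lexicalState < 0 || lexicalState % 2 != 0) {
            throw new IllegalArgumentException("lexical state must be of the form 2*k");
        }
        return LEXSTATE[lexicalState + (atBeginningOfLine ? 1 : 0)];
    }

    public static void main(String[] args) {
        System.out.println(startState(2, false)); // prints 1
        System.out.println(startState(2, true));  // prints 2
    }
}
```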
      • ZZ_CMAP_PACKED

        private static final java.lang.String ZZ_CMAP_PACKED
        Translates characters to character classes
        See Also:
        Constant Field Values
      • ZZ_CMAP

        private static final char[] ZZ_CMAP
        Translates characters to character classes
      • ZZ_ACTION

        private static final int[] ZZ_ACTION
        Translates DFA states to action switch labels.
      • ZZ_ACTION_PACKED_0

        private static final java.lang.String ZZ_ACTION_PACKED_0
        See Also:
        Constant Field Values
      • ZZ_ROWMAP

        private static final int[] ZZ_ROWMAP
        Translates a state to a row index in the transition table
      • ZZ_ROWMAP_PACKED_0

        private static final java.lang.String ZZ_ROWMAP_PACKED_0
        See Also:
        Constant Field Values
      • ZZ_TRANS

        private static final int[] ZZ_TRANS
        The transition table of the DFA
      • ZZ_TRANS_PACKED_0

        private static final java.lang.String ZZ_TRANS_PACKED_0
        See Also:
        Constant Field Values
      • ZZ_TRANS_PACKED_1

        private static final java.lang.String ZZ_TRANS_PACKED_1
        See Also:
        Constant Field Values
      • ZZ_TRANS_PACKED_2

        private static final java.lang.String ZZ_TRANS_PACKED_2
        See Also:
        Constant Field Values
      • ZZ_TRANS_PACKED_3

        private static final java.lang.String ZZ_TRANS_PACKED_3
        See Also:
        Constant Field Values
      • ZZ_TRANS_PACKED_4

        private static final java.lang.String ZZ_TRANS_PACKED_4
        See Also:
        Constant Field Values
      • ZZ_TRANS_PACKED_5

        private static final java.lang.String ZZ_TRANS_PACKED_5
        See Also:
        Constant Field Values
      • ZZ_TRANS_PACKED_6

        private static final java.lang.String ZZ_TRANS_PACKED_6
        See Also:
        Constant Field Values
      • ZZ_TRANS_PACKED_7

        private static final java.lang.String ZZ_TRANS_PACKED_7
        See Also:
        Constant Field Values
      • ZZ_TRANS_PACKED_8

        private static final java.lang.String ZZ_TRANS_PACKED_8
        See Also:
        Constant Field Values
      • ZZ_TRANS_PACKED_9

        private static final java.lang.String ZZ_TRANS_PACKED_9
        See Also:
        Constant Field Values
      • ZZ_TRANS_PACKED_10

        private static final java.lang.String ZZ_TRANS_PACKED_10
        See Also:
        Constant Field Values
      • ZZ_TRANS_PACKED_11

        private static final java.lang.String ZZ_TRANS_PACKED_11
        See Also:
        Constant Field Values
      • ZZ_TRANS_PACKED_12

        private static final java.lang.String ZZ_TRANS_PACKED_12
        See Also:
        Constant Field Values
      • ZZ_ERROR_MSG

        private static final java.lang.String[] ZZ_ERROR_MSG
      • ZZ_ATTRIBUTE

        private static final int[] ZZ_ATTRIBUTE
        ZZ_ATTRIBUTE[aState] contains the attributes of state aState
      • ZZ_ATTRIBUTE_PACKED_0

        private static final java.lang.String ZZ_ATTRIBUTE_PACKED_0
        See Also:
        Constant Field Values
      • zzReader

        private java.io.Reader zzReader
        the input device
      • zzState

        private int zzState
        the current state of the DFA
      • zzLexicalState

        private int zzLexicalState
        the current lexical state
      • zzBuffer

        private char[] zzBuffer
        this buffer contains the current text to be matched and is the source of the yytext() string
      • zzMarkedPos

        private int zzMarkedPos
        the text position at the last accepting state
      • zzCurrentPos

        private int zzCurrentPos
        the current text position in the buffer
      • zzStartRead

        private int zzStartRead
        startRead marks the beginning of the yytext() string in the buffer
      • zzEndRead

        private int zzEndRead
        endRead marks the last character in the buffer that has been read from input
      • yyline

        private int yyline
        number of newlines encountered up to the start of the matched text
      • yychar

        private int yychar
        the number of characters up to the start of the matched text
      • yycolumn

        private int yycolumn
        the number of characters from the last newline up to the start of the matched text
      • zzAtBOL

        private boolean zzAtBOL
        zzAtBOL == true iff the scanner is currently at the beginning of a line
      • zzAtEOF

        private boolean zzAtEOF
        zzAtEOF == true iff the scanner is at the EOF
      • zzEOFDone

        private boolean zzEOFDone
        denotes if the user-EOF-code has already been executed
      • zzFinalHighSurrogate

        private int zzFinalHighSurrogate
        The number of occupied positions in zzBuffer beyond zzEndRead. When a lead/high surrogate has been read from the input stream into the final zzBuffer position, this will have a value of 1; otherwise, it will have a value of 0.
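        As an illustration of the 0-or-1 value described for zzFinalHighSurrogate, the sketch below holds back a trailing lead/high surrogate so that a surrogate pair is never split across two buffer fills. The class and method names are hypothetical, and this is not the generated scanner's code, only the idea under that assumption.

```java
final class SurrogateHoldBack {
    /**
     * Returns the index up to which the buffer may be scanned.
     * If the last filled position holds a high surrogate, it is excluded
     * so that it can be paired with the low surrogate from the next fill.
     */
    static int usableEnd(char[] buffer, int filled) {
        if (filled > 0 && Character.isHighSurrogate(buffer[filled - 1])) {
            return filled - 1;
        }
        return filled;
    }

    public static void main(String[] args) {
        char[] buffer = {'a', '\uD83D'}; // 'a' followed by a lone high surrogate
        int filled = buffer.length;
        int end = usableEnd(buffer, filled);
        // Occupied positions beyond the usable end: 1 here, 0 otherwise.
        int finalHighSurrogate = filled - end;
        System.out.println(finalHighSurrogate); // prints 1
    }
}
```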
      • upperCaseVariantsAccepted

        private static final java.util.Map<java.lang.String,​java.lang.String> upperCaseVariantsAccepted
      • entityValues

        private static final CharArrayMap<java.lang.Character> entityValues
      • INITIAL_INPUT_SEGMENT_SIZE

        private static final int INITIAL_INPUT_SEGMENT_SIZE
        See Also:
        Constant Field Values
      • BLOCK_LEVEL_START_TAG_REPLACEMENT

        private static final char BLOCK_LEVEL_START_TAG_REPLACEMENT
        See Also:
        Constant Field Values
      • BLOCK_LEVEL_END_TAG_REPLACEMENT

        private static final char BLOCK_LEVEL_END_TAG_REPLACEMENT
        See Also:
        Constant Field Values
      • BR_START_TAG_REPLACEMENT

        private static final char BR_START_TAG_REPLACEMENT
        See Also:
        Constant Field Values
      • BR_END_TAG_REPLACEMENT

        private static final char BR_END_TAG_REPLACEMENT
        See Also:
        Constant Field Values
      • inputStart

        private int inputStart
      • cumulativeDiff

        private int cumulativeDiff
      • escapeBR

        private boolean escapeBR
      • escapeSCRIPT

        private boolean escapeSCRIPT
      • escapeSTYLE

        private boolean escapeSTYLE
      • restoreState

        private int restoreState
      • previousRestoreState

        private int previousRestoreState
      • outputCharCount

        private int outputCharCount
      • eofReturnValue

        private int eofReturnValue
    • Constructor Detail

      • HTMLStripCharFilter

        public HTMLStripCharFilter​(java.io.Reader in,
                                   java.util.Set<java.lang.String> escapedTags)
        Creates a new HTMLStripCharFilter over the provided Reader with the specified set of escaped tags.
        Parameters:
        in - Reader to strip HTML tags from.
        escapedTags - Tags in this set (both start and end tags) will not be filtered out.
      • HTMLStripCharFilter

        public HTMLStripCharFilter​(java.io.Reader in)
        Creates a new scanner
        Parameters:
        in - the java.io.Reader to read input from.
    • Method Detail

      • zzUnpackAction

        private static int[] zzUnpackAction()
      • zzUnpackAction

        private static int zzUnpackAction​(java.lang.String packed,
                                          int offset,
                                          int[] result)
      • zzUnpackRowMap

        private static int[] zzUnpackRowMap()
      • zzUnpackRowMap

        private static int zzUnpackRowMap​(java.lang.String packed,
                                          int offset,
                                          int[] result)
      • zzUnpackTrans

        private static int[] zzUnpackTrans()
      • zzUnpackTrans

        private static int zzUnpackTrans​(java.lang.String packed,
                                         int offset,
                                         int[] result)
      • zzUnpackAttribute

        private static int[] zzUnpackAttribute()
      • zzUnpackAttribute

        private static int zzUnpackAttribute​(java.lang.String packed,
                                             int offset,
                                             int[] result)
      • read

        public int read()
                 throws java.io.IOException
        Overrides:
        read in class java.io.Reader
        Throws:
        java.io.IOException
      • read

        public int read​(char[] cbuf,
                        int off,
                        int len)
                 throws java.io.IOException
        Specified by:
        read in class java.io.Reader
        Throws:
        java.io.IOException
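        Because both read methods follow the java.io.Reader contract, the filter can be consumed like any other Reader. The sketch below uses only the standard library: the helper name drain is hypothetical, and a StringReader stands in for the filter so the example runs without Lucene on the classpath. An HTMLStripCharFilter instance could presumably be passed in its place.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class ReaderDrain {
    /** Reads a Reader to exhaustion using read(char[], int, int). */
    static String drain(Reader reader) {
        StringBuilder out = new StringBuilder();
        char[] cbuf = new char[1024];
        try {
            int n;
            // read returns the number of chars read, or -1 at end of stream.
            while ((n = reader.read(cbuf, 0, cbuf.length)) != -1) {
                out.append(cbuf, 0, n);
            }
        } catch (IOException e) {
            // Rethrown unchecked only to keep this example short.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in input; a StringReader performs no stripping.
        System.out.println(drain(new StringReader("plain text")));
    }
}
```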
      • close

        public void close()
                   throws java.io.IOException
        Description copied from class: CharFilter
        Closes the underlying input stream.

        NOTE: The default implementation closes the input Reader, so be sure to call super.close() when overriding this method.

        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Overrides:
        close in class CharFilter
        Throws:
        java.io.IOException
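        The note about calling super.close() can be illustrated with a standard-library analogue. In this sketch, PassThroughFilter and TrackingReader are hypothetical classes built on java.io.FilterReader and java.io.StringReader, not Lucene types. The wrapper's close() does its own cleanup and then delegates to super.close(), which closes the wrapped Reader.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class CloseDemo {
    /** Hypothetical Reader that records whether close() was called. */
    static final class TrackingReader extends StringReader {
        boolean closed = false;
        TrackingReader(String s) { super(s); }
        @Override public void close() {
            closed = true;
            super.close();
        }
    }

    /** Hypothetical wrapper standing in for a filter that overrides close(). */
    static final class PassThroughFilter extends FilterReader {
        PassThroughFilter(Reader in) { super(in); }
        @Override public void close() throws IOException {
            // Filter-specific cleanup would go here.
            super.close(); // closes the wrapped Reader
        }
    }

    /** Returns true if closing the wrapper also closed the wrapped Reader. */
    static boolean closesUnderlying() {
        TrackingReader inner = new TrackingReader("x");
        try (PassThroughFilter filter = new PassThroughFilter(inner)) {
            filter.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return inner.closed;
    }

    public static void main(String[] args) {
        System.out.println(closesUnderlying()); // prints true
    }
}
```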
      • getInitialBufferSize

        static int getInitialBufferSize()
      • zzUnpackCMap

        private static char[] zzUnpackCMap​(java.lang.String packed)
        Unpacks the compressed character translation table.
        Parameters:
        packed - the packed character translation table
        Returns:
        the unpacked character translation table
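        The packed format is not described here. As a rough sketch, assuming the packed string is a sequence of (count, value) character pairs, which is a common run-length scheme, unpacking could look like the following. The class name, the unpack method and its explicit size parameter are hypothetical.

```java
final class CMapUnpackSketch {
    /**
     * Expands a run-length encoded string into a table.
     * Assumed format: pairs of chars (count, value), meaning
     * "repeat value count times".
     */
    static char[] unpack(String packed, int size) {
        char[] map = new char[size];
        int i = 0; // index in the packed string
        int j = 0; // index in the unpacked table
        while (i < packed.length()) {
            int count = packed.charAt(i++);
            char value = packed.charAt(i++);
            for (int k = 0; k < count; k++) {
                map[j++] = value;
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // Three entries of class 0 followed by two entries of class 1.
        char[] map = unpack("\3\0\2\1", 5);
        System.out.println(map.length); // prints 5
    }
}
```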
      • zzRefill

        private boolean zzRefill()
                          throws java.io.IOException
        Refills the input buffer.
        Returns:
        false iff there was new input.
        Throws:
        java.io.IOException - if any I/O-Error occurs
      • yyclose

        private final void yyclose()
                            throws java.io.IOException
        Closes the input stream.
        Throws:
        java.io.IOException
      • yyreset

        private final void yyreset​(java.io.Reader reader)
        Resets the scanner to read from a new input stream. Does not close the old reader. All internal variables are reset; the old input stream cannot be reused (the internal buffer is discarded and lost). The lexical state is set to ZZ_INITIAL. The internal scan buffer is resized down to its initial length if it has grown.
        Parameters:
        reader - the new input stream
      • yystate

        private final int yystate()
        Returns the current lexical state.
      • yybegin

        private final void yybegin​(int newState)
        Enters a new lexical state
        Parameters:
        newState - the new lexical state
      • yytext

        private final java.lang.String yytext()
        Returns the text matched by the current regular expression.
      • yycharat

        private final char yycharat​(int pos)
        Returns the character at position pos from the matched text. It is equivalent to yytext().charAt(pos), but faster
        Parameters:
        pos - the position of the character to fetch. A value from 0 to yylength()-1.
        Returns:
        the character at position pos
      • yylength

        private final int yylength()
        Returns the length of the matched text region.
      • zzScanError

        private void zzScanError​(int errorCode)
        Reports an error that occurred while scanning. In a well-formed scanner (no or only correct usage of yypushback(int) and a match-all fallback rule) this method will only be called with things that "Can't Possibly Happen". If this method is called, something is seriously wrong (e.g. a JFlex bug producing a faulty scanner etc.). Usual syntax/scanner level error handling should be done in error fallback rules.
        Parameters:
        errorCode - the code of the error message to display
      • yypushback

        private void yypushback​(int number)
        Pushes the specified number of characters back into the input stream. They will be read again by the next call of the scanning method.
        Parameters:
        number - the number of characters to be read again. This number must not be greater than yylength()!
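        The relationship between yytext(), yycharat(int), yylength() and yypushback(int) can be sketched with a toy match window over a buffer. MatchWindow and its members are hypothetical; the sketch assumes the match spans the buffer from a start position (like zzStartRead) up to, but not including, a marked position (like zzMarkedPos).

```java
final class MatchWindow {
    private final char[] buffer;
    private final int startRead; // start of the matched text
    private int markedPos;       // end of the matched text (exclusive)

    MatchWindow(String input, int startRead, int markedPos) {
        this.buffer = input.toCharArray();
        this.startRead = startRead;
        this.markedPos = markedPos;
    }

    /** The matched text, copied out of the buffer. */
    String text() {
        return new String(buffer, startRead, markedPos - startRead);
    }

    /** Same result as text().charAt(pos), without building a String. */
    char charAt(int pos) {
        return buffer[startRead + pos];
    }

    int length() {
        return markedPos - startRead;
    }

    /** Shrinks the match so the last `number` chars are scanned again. */
    void pushback(int number) {
        if (number < 0 || number > length()) {
            throw new IllegalArgumentException("number must be between 0 and length()");
        }
        markedPos -= number;
    }

    public static void main(String[] args) {
        MatchWindow w = new MatchWindow("<br/>rest", 0, 5);
        System.out.println(w.text());   // prints <br/>
        w.pushback(2);
        System.out.println(w.text());   // prints <br
    }
}
```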
      • zzDoEOF

        private void zzDoEOF()
        Contains user EOF-code, which will be executed exactly once, when the end of file is reached
      • nextChar

        private int nextChar()
                      throws java.io.IOException
        Resumes scanning until the next regular expression is matched, the end of input is encountered or an I/O-Error occurs.
        Returns:
        the next token
        Throws:
        java.io.IOException - if any I/O-Error occurs