Class LegacyPDFStreamEngine

  • Direct Known Subclasses:
    PDFMarkedContentExtractor, PDFTextStripper

    class LegacyPDFStreamEngine
    extends PDFStreamEngine
    Contains LEGACY text calculations that are known to be incorrect but that PDFTextStripper depends on. This class exists only so that we don't break the code of users who have their own subclasses of PDFTextStripper. It replaces the mostly empty implementation of showGlyph() in PDFStreamEngine with a backwards-compatible heuristic implementation. DO NOT USE THIS CODE UNLESS YOU ARE WORKING WITH PDFTextStripper. THIS CODE IS DELIBERATELY INCORRECT; USE PDFStreamEngine INSTEAD.
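The division of labor described above can be illustrated with plain Java stand-ins: a base engine whose showGlyph() is a no-op, a legacy subclass that turns each glyph into a text-position event, and a PDFTextStripper-like subclass that consumes those events. This is a sketch under stated assumptions, not PDFBox code: GlyphEvent, BaseStreamEngine, LegacyEngine and CollectingEngine are hypothetical, and skipping glyphs without Unicode text is a simplification chosen only for this sketch.

```java
/** Hypothetical stand-in for a text position: Unicode text plus the glyph origin. */
final class GlyphEvent {
    final String unicode;
    final float x;
    final float y;

    GlyphEvent(String unicode, float x, float y) {
        this.unicode = unicode;
        this.x = x;
        this.y = y;
    }
}

/** Hypothetical stand-in for PDFStreamEngine: showGlyph() does nothing by default. */
class BaseStreamEngine {
    protected void showGlyph(float x, float y, int code, String unicode) {
        // Intentionally empty: the base engine only walks the content stream.
    }

    /** Simulates the content stream asking the engine to show one glyph. */
    final void emitGlyph(float x, float y, int code, String unicode) {
        showGlyph(x, y, code, unicode);
    }
}

/** Hypothetical legacy engine: overrides showGlyph() and dispatches one event per glyph. */
class LegacyEngine extends BaseStreamEngine {
    @Override
    protected void showGlyph(float x, float y, int code, String unicode) {
        if (unicode == null) {
            return; // simplification for this sketch: ignore glyphs without Unicode text
        }
        processTextPosition(new GlyphEvent(unicode, x, y));
    }

    /** Event hook with an empty default; subclasses override it. */
    protected void processTextPosition(GlyphEvent text) {
    }
}

/** Plays the role of a PDFTextStripper-style subclass: collects the text it receives. */
class CollectingEngine extends LegacyEngine {
    final StringBuilder out = new StringBuilder();

    @Override
    protected void processTextPosition(GlyphEvent text) {
        out.append(text.unicode);
    }
}
```

Because the heuristic lives entirely in the overriding subclass, code written against the base engine is unaffected, which matches the stated goal of keeping existing PDFTextStripper subclasses working.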
    • Field Detail

      • LOG

        private static final org.apache.commons.logging.Log LOG
      • pageRotation

        private int pageRotation
      • translateMatrix

        private Matrix translateMatrix
      • glyphList

        private final GlyphList glyphList
      • fontHeightMap

        private final java.util.Map<COSDictionary,java.lang.Float> fontHeightMap
    • Constructor Detail

      • LegacyPDFStreamEngine

        LegacyPDFStreamEngine()
                       throws java.io.IOException
        Constructor.
        Throws:
        java.io.IOException
    • Method Detail

      • processPage

        public void processPage(PDPage page)
                         throws java.io.IOException
        Initializes the engine for the given page and processes the page's content stream.
        Overrides:
        processPage in class PDFStreamEngine
        Parameters:
        page - the page to process
        Throws:
        java.io.IOException - if there is an error accessing the stream.
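The private fields pageRotation and translateMatrix suggest that processPage() captures per-page state before the content stream is processed. The sketch below assumes, without confirmation from this documentation, that the rotation is normalized to 0, 90, 180 or 270 and that the translation shifts a page box whose lower-left corner is not at the origin back to (0, 0). PageState and its methods are hypothetical.

```java
/** Hypothetical per-page state, mirroring the pageRotation and translateMatrix fields. */
final class PageState {
    final int rotation;          // normalized to 0, 90, 180 or 270 degrees
    final float[] translation;   // (tx, ty), or null when no shift is needed

    private PageState(int rotation, float[] translation) {
        this.rotation = rotation;
        this.translation = translation;
    }

    /**
     * Per-page initialization as processPage() might perform it (assumption).
     * lowerLeftX and lowerLeftY are the lower-left corner of the page box.
     */
    static PageState init(int rotation, float lowerLeftX, float lowerLeftY) {
        int normalized = ((rotation % 360) + 360) % 360; // e.g. -90 becomes 270
        float[] t = (lowerLeftX == 0f && lowerLeftY == 0f)
                ? null
                : new float[] { -lowerLeftX, -lowerLeftY };
        return new PageState(normalized, t);
    }

    /** Moves a point so that the page box's lower-left corner becomes the origin. */
    float[] toPageOrigin(float x, float y) {
        if (translation == null) {
            return new float[] { x, y };
        }
        return new float[] { x + translation[0], y + translation[1] };
    }
}
```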
      • showGlyph

        protected void showGlyph(Matrix textRenderingMatrix,
                                 PDFont font,
                                 int code,
                                 java.lang.String unicode,
                                 Vector displacement)
                          throws java.io.IOException
        Called when a glyph is to be processed. The heuristic calculations here were originally written by Ben Litchfield for PDFStreamEngine.
        Overrides:
        showGlyph in class PDFStreamEngine
        Parameters:
        textRenderingMatrix - the current text rendering matrix, Trm
        font - the current font
        code - internal PDF character code for the glyph
        unicode - the Unicode text for this glyph, or null if the PDF does not provide it
        displacement - the displacement (i.e. advance) of the glyph in text space
        Throws:
        java.io.IOException - if the glyph cannot be processed
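The textRenderingMatrix and displacement parameters are enough to estimate how far a glyph advances on the page. The following sketch applies the text-matrix update from the PDF specification (ISO 32000-1, section 9.4), with the simplifying assumption that character spacing (Tc) and word spacing (Tw) are zero. Affine and GlyphAdvance are hypothetical helpers rather than PDFBox's Matrix class, and the code is not a transcription of the legacy heuristic.

```java
/** Minimal 2D affine matrix [a b 0; c d 0; e f 1] in the PDF row-vector convention. */
final class Affine {
    final double a, b, c, d, e, f;

    Affine(double a, double b, double c, double d, double e, double f) {
        this.a = a; this.b = b; this.c = c; this.d = d; this.e = e; this.f = f;
    }

    static Affine translate(double tx, double ty) {
        return new Affine(1, 0, 0, 1, tx, ty);
    }

    static Affine scale(double sx, double sy) {
        return new Affine(sx, 0, 0, sy, 0, 0);
    }

    /** Returns this x o (for row vectors: apply this first, then o). */
    Affine multiply(Affine o) {
        return new Affine(
                a * o.a + b * o.c, a * o.b + b * o.d,
                c * o.a + d * o.c, c * o.b + d * o.d,
                e * o.a + f * o.c + o.e, e * o.b + f * o.d + o.f);
    }
}

final class GlyphAdvance {
    /** Trm = [Tfs*Th 0 0; 0 Tfs 0; 0 Trise 1] x Tm x CTM. */
    static Affine renderingMatrix(double fontSize, double hScale, double rise,
                                  Affine tm, Affine ctm) {
        Affine params = new Affine(fontSize * hScale, 0, 0, fontSize, 0, rise);
        return params.multiply(tm).multiply(ctm);
    }

    /**
     * Horizontal advance of one glyph in device space, assuming Tc = Tw = 0.
     * displacementX is the advance in text space, e.g. 0.5 for a 500-unit glyph width.
     */
    static double displayAdvanceX(double displacementX, double fontSize, double hScale,
                                  double rise, Affine tm, Affine ctm) {
        Affine trm = renderingMatrix(fontSize, hScale, rise, tm, ctm);
        double tx = displacementX * fontSize * hScale;        // advance in text space
        Affine nextTm = Affine.translate(tx, 0).multiply(tm); // updated text matrix
        Affine nextTrm = renderingMatrix(fontSize, hScale, rise, nextTm, ctm);
        return nextTrm.e - trm.e;                             // change in x of the glyph origin
    }
}
```

For example, a glyph with a displacement of 0.5 at 12 pt and 100% horizontal scaling, with the text matrix translated to (100, 700) and a CTM that scales by 2, advances 6 units in text space and 12 units on the page.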
      • computeFontHeight

        protected float computeFontHeight(PDFont font)
                                   throws java.io.IOException
        Computes the font height. Override this method to use your own calculation.
        Parameters:
        font - the font.
        Returns:
        the font height.
        Throws:
        java.io.IOException - if there is an error while getting the font bounding box.
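computeFontHeight() is described as a calculation that subclasses may replace, and the private fontHeightMap field suggests that results are cached per font dictionary. A minimal sketch of that arrangement follows. FontMetricsInfo and FontHeightEstimator are hypothetical, a string key stands in for the COSDictionary, and the specific rule (half the bounding-box height, replaced by a smaller nonzero cap height, then divided by 1000 to convert glyph space to text space as for non-Type 3 fonts) is an illustrative assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical font metrics in glyph-space units. */
final class FontMetricsInfo {
    final String fontKey;    // stands in for the font's COSDictionary
    final float bboxHeight;  // height of the font bounding box
    final float capHeight;   // 0 when no cap height is available

    FontMetricsInfo(String fontKey, float bboxHeight, float capHeight) {
        this.fontKey = fontKey;
        this.bboxHeight = bboxHeight;
        this.capHeight = capHeight;
    }
}

class FontHeightEstimator {
    private final Map<String, Float> cache = new HashMap<>();
    int computations; // number of cache misses, for illustration

    /** Returns the cached height, computing it the first time a font is seen. */
    final float fontHeight(FontMetricsInfo font) {
        return cache.computeIfAbsent(font.fontKey, key -> computeFontHeight(font));
    }

    /** Heuristic font height in text space; override to use your own calculation. */
    protected float computeFontHeight(FontMetricsInfo font) {
        computations++;
        float glyphHeight = font.bboxHeight / 2; // assumption: half the bounding box
        if (font.capHeight > 0 && (font.capHeight < glyphHeight || glyphHeight == 0)) {
            glyphHeight = font.capHeight;        // assumption: prefer a smaller cap height
        }
        return glyphHeight / 1000f;              // glyph space to text space
    }
}
```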
      • processTextPosition

        protected void processTextPosition(TextPosition text)
        An event hook that lets a subclass perform its own processing whenever text is encountered.
        Parameters:
        text - The text to be processed.
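Because processTextPosition() is an event hook, a subclass can accumulate state across calls. The sketch below is built on hypothetical stand-in classes (TextPos, TextEventSource, LineCollector) rather than PDFBox types; it starts a new line whenever the y coordinate of the incoming text changes by more than an assumed tolerance of 0.5 units.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for TextPosition: Unicode text plus the glyph origin. */
final class TextPos {
    final String unicode;
    final float x;
    final float y;

    TextPos(String unicode, float x, float y) {
        this.unicode = unicode;
        this.x = x;
        this.y = y;
    }
}

/** Hypothetical engine base class exposing the hook with an empty default. */
class TextEventSource {
    protected void processTextPosition(TextPos text) {
        // Default: ignore the text.
    }

    /** Feeds positions to the hook, standing in for content-stream processing. */
    final void feed(List<TextPos> positions) {
        for (TextPos p : positions) {
            processTextPosition(p);
        }
    }
}

/** Subclass that uses the hook to split incoming text into lines by y coordinate. */
class LineCollector extends TextEventSource {
    private static final float Y_TOLERANCE = 0.5f; // assumption
    private final List<StringBuilder> lines = new ArrayList<>();
    private float currentY;

    @Override
    protected void processTextPosition(TextPos text) {
        // Start a new line for the first glyph or when the y coordinate moves.
        if (lines.isEmpty() || Math.abs(text.y - currentY) > Y_TOLERANCE) {
            lines.add(new StringBuilder());
            currentY = text.y;
        }
        lines.get(lines.size() - 1).append(text.unicode);
    }

    List<String> lines() {
        List<String> result = new ArrayList<>();
        for (StringBuilder sb : lines) {
            result.add(sb.toString());
        }
        return result;
    }
}
```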