Package org.apache.pdfbox.text
Class LegacyPDFStreamEngine
java.lang.Object
org.apache.pdfbox.contentstream.PDFStreamEngine
org.apache.pdfbox.text.LegacyPDFStreamEngine
Direct Known Subclasses:
PDFMarkedContentExtractor, PDFTextStripper
LEGACY text calculations which are known to be incorrect but are depended on by PDFTextStripper.
This class exists only so that we don't break the code of users who have their own subclasses of
PDFTextStripper. It replaces the mostly empty implementation of showGlyph() in PDFStreamEngine
with a heuristic implementation which is backwards compatible.
DO NOT USE THIS CODE UNLESS YOU ARE WORKING WITH PDFTextStripper.
THIS CODE IS DELIBERATELY INCORRECT, USE PDFStreamEngine INSTEAD.
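The layering described above can be sketched in plain Java. It has three levels: an engine whose glyph hook does almost nothing, a legacy subclass that overrides the hook and forwards each glyph as an event (the role processTextPosition appears to play), and a stripper-like subclass that consumes those events. Every class and method name below is a hypothetical stand-in, not the PDFBox API. Skipping glyphs that have no Unicode text is a simplification made by this sketch, not documented PDFBox behavior.

```java
import java.util.List;

// Hypothetical stand-ins, NOT the PDFBox API. They only mirror the layering
// PDFStreamEngine -> LegacyPDFStreamEngine -> PDFTextStripper.

/** Minimal glyph event (PDFBox itself uses TextPosition). */
final class GlyphEvent {
    final String unicode; // may be null when no Unicode text is known
    final float x;
    final float y;

    GlyphEvent(String unicode, float x, float y) {
        this.unicode = unicode;
        this.x = x;
        this.y = y;
    }
}

/** Base engine: the glyph hook is empty, like the mostly empty default. */
class BaseEngine {
    protected void showGlyph(String unicode, float x, float y) {
        // intentionally does nothing
    }

    /** Feeds glyphs to the hook, standing in for content-stream parsing. */
    final void run(List<GlyphEvent> glyphs) {
        for (GlyphEvent g : glyphs) {
            showGlyph(g.unicode, g.x, g.y);
        }
    }
}

/** Legacy layer: overrides the hook and forwards an event to subclasses. */
class LegacyEngine extends BaseEngine {
    @Override
    protected void showGlyph(String unicode, float x, float y) {
        if (unicode == null) {
            return; // simplification of this sketch: nothing to report
        }
        processTextPosition(new GlyphEvent(unicode, x, y));
    }

    /** Event hook; subclasses decide what to do with each glyph. */
    protected void processTextPosition(GlyphEvent text) {
    }
}

/** Consumer layer: collects text, as a stripper-like subclass would. */
class CollectingStripper extends LegacyEngine {
    private final StringBuilder out = new StringBuilder();

    @Override
    protected void processTextPosition(GlyphEvent text) {
        out.append(text.unicode);
    }

    String getText() {
        return out.toString();
    }
}
```

For instance, running a CollectingStripper over the events "H", a glyph with no Unicode text, and "i" yields "Hi". Running the same events through a plain BaseEngine produces nothing, because its hook is empty.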
Field Summary
Fields (Modifier and Type, Field):
private final Map<COSDictionary, Float> fontHeightMap
private static final GlyphList GLYPHLIST
private static final org.apache.commons.logging.Log LOG
private int pageRotation
private PDRectangle pageSize
private Matrix translateMatrix
Constructor Summary
Constructors
LegacyPDFStreamEngine() - Constructor.
Method Summary
Methods (Modifier and Type, Method, Description):
protected float computeFontHeight(PDFont font) - Compute the font height.
void processPage(PDPage page) - This will initialize and process the contents of the stream.
protected void processTextPosition(TextPosition text) - A method provided as an event interface to allow a subclass to perform some specific functionality when text needs to be processed.
protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement) - Called when a glyph is to be processed.
Methods inherited from class org.apache.pdfbox.contentstream.PDFStreamEngine
addOperator, applyTextAdjustment, beginMarkedContentSequence, beginText, decreaseLevel, endMarkedContentSequence, endText, getAppearance, getCurrentPage, getGraphicsStackSize, getGraphicsState, getInitialMatrix, getLevel, getResources, getTextLineMatrix, getTextMatrix, increaseLevel, operatorException, processAnnotation, processChildStream, processOperator, processOperator, processSoftMask, processTilingPattern, processTilingPattern, processTransparencyGroup, processType3Stream, registerOperatorProcessor, restoreGraphicsStack, restoreGraphicsState, saveGraphicsStack, saveGraphicsState, setLineDashPattern, setTextLineMatrix, setTextMatrix, showAnnotation, showFontGlyph, showFontGlyph, showForm, showGlyph, showText, showTextString, showTextStrings, showTransparencyGroup, showType3Glyph, showType3Glyph, transformedPoint, transformWidth, unsupportedOperator
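Field Details
LOG
private static final org.apache.commons.logging.Log LOG
pageRotation
private int pageRotation
pageSize
private PDRectangle pageSize
translateMatrix
private Matrix translateMatrix
GLYPHLIST
private static final GlyphList GLYPHLIST
fontHeightMap
private final Map<COSDictionary, Float> fontHeightMap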
Constructor Details
LegacyPDFStreamEngine
LegacyPDFStreamEngine() throws IOException
Constructor.
Throws:
IOException
Method Details
processPage
public void processPage(PDPage page) throws IOException
This will initialize and process the contents of the stream.
Overrides:
processPage in class PDFStreamEngine
Parameters:
page - the page to process
Throws:
IOException - if there is an error accessing the stream.
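processPage is documented only as initializing and processing the page. The private fields pageRotation, pageSize (a PDRectangle) and translateMatrix hint that per-page geometry is captured before the content stream runs. As an assumption this page does not confirm, one such step could be a translation that moves the lower-left corner of the page box to the origin. The pure-Java sketch below shows that arithmetic using PDF's six-number matrix form [a b c d e f]. CropBoxTranslation and its methods are hypothetical, and nothing here calls PDFBox.

```java
/**
 * Hypothetical helper. ASSUMPTION: the page-box translation is inferred from
 * the field names pageSize and translateMatrix; the Javadoc does not document it.
 */
final class CropBoxTranslation {
    private CropBoxTranslation() {
    }

    /** PDF-style affine matrix [a b c d e f] that moves (llx, lly) to (0, 0). */
    static double[] translateMatrix(double llx, double lly) {
        return new double[] {1, 0, 0, 1, -llx, -lly};
    }

    /** Applies [a b c d e f] to a point: x' = a*x + c*y + e, y' = b*x + d*y + f. */
    static double[] transform(double[] m, double x, double y) {
        return new double[] {
            m[0] * x + m[2] * y + m[4],
            m[1] * x + m[3] * y + m[5]
        };
    }
}
```

With a page box whose lower-left corner is (10, 20), the point (15, 25) maps to (5, 5), and the corner itself maps to (0, 0).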
showGlyph
protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement) throws IOException
Called when a glyph is to be processed. The heuristic calculations here were originally written by Ben Litchfield for PDFStreamEngine.
Overrides:
showGlyph in class PDFStreamEngine
Parameters:
textRenderingMatrix - the current text rendering matrix, Trm
font - the current font
code - internal PDF character code for the glyph
unicode - the Unicode text for this glyph, or null if the PDF does not provide it
displacement - the displacement (i.e. advance) of the glyph in text space
Throws:
IOException - if the glyph cannot be processed
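The parameters textRenderingMatrix (Trm) and displacement are standard PDF text-space quantities. The sketch below reproduces the PDF specification's formulas (ISO 32000-1, section 9.4), not PDFBox's code. It computes Trm = [Tfs*Th 0 0; 0 Tfs 0; 0 Trise 1] x Tm x CTM and the horizontal advance tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th. Here w0 is the glyph's horizontal displacement in text space units: a width of 500 glyph units gives w0 = 0.5 for fonts other than Type 3. TextSpace and its method names are hypothetical.

```java
/** Text-space arithmetic from the PDF specification (ISO 32000-1, section 9.4). */
final class TextSpace {
    private TextSpace() {
    }

    /** Multiplies two 3x3 matrices (PDF uses row vectors, so points are p x M). */
    static double[][] multiply(double[][] a, double[][] b) {
        double[][] r = new double[3][3];
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                for (int k = 0; k < 3; k++) {
                    r[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return r;
    }

    /**
     * Trm = [Tfs*Th 0 0; 0 Tfs 0; 0 Trise 1] x Tm x CTM.
     * th is the horizontal scaling as a fraction (1.0 means 100%).
     */
    static double[][] textRenderingMatrix(double tfs, double th, double trise,
                                          double[][] tm, double[][] ctm) {
        double[][] params = {
            {tfs * th, 0, 0},
            {0, tfs, 0},
            {0, trise, 1}
        };
        return multiply(multiply(params, tm), ctm);
    }

    /**
     * tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th.
     * tj is a TJ array adjustment (0 if none). Pass tw only for the
     * single-byte character code 32; otherwise pass 0.
     */
    static double horizontalAdvance(double w0, double tj, double tfs,
                                    double tc, double tw, double th) {
        return ((w0 - tj / 1000.0) * tfs + tc + tw) * th;
    }
}
```

For example, with a 12-point font, identity CTM and a text matrix translated to (100, 700), Trm has 12 on its diagonal scale entries and (100, 700) as its translation row. A glyph with w0 = 0.5 advances by 6 text space units.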
computeFontHeight
protected float computeFontHeight(PDFont font) throws IOException
Compute the font height. Override this if you want to use your own calculations.
Parameters:
font - the font.
Returns:
the font height.
Throws:
IOException - if there is an error while getting the font bounding box.
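computeFontHeight is the documented override point, and its Throws clause indicates that the default reads the font bounding box. The sketch below is an illustrative heuristic, not PDFBox's implementation. It starts from the bounding-box height in glyph space, prefers a smaller non-zero cap height (an assumption of this sketch), and divides by 1000 to reach text space. That factor holds for fonts other than Type 3, whose glyph space is 1/1000 of text space; Type 3 fonts define their own FontMatrix. The sketch also caches one result per font, which is what the private fontHeightMap field (Map<COSDictionary, Float>) suggests but does not confirm. SimpleFontMetrics, FontHeightEstimator and the string cache key are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical font metrics in glyph space units. */
final class SimpleFontMetrics {
    final String key;        // stands in for the font dictionary used as cache key
    final float bboxHeight;  // font bounding box height, glyph space units
    final float capHeight;   // 0 when not provided

    SimpleFontMetrics(String key, float bboxHeight, float capHeight) {
        this.key = key;
        this.bboxHeight = bboxHeight;
        this.capHeight = capHeight;
    }
}

/** Hypothetical estimator with a per-font cache and an overridable hook. */
class FontHeightEstimator {
    private final Map<String, Float> cache = new HashMap<>();
    private int computations = 0; // counts cache misses, for illustration

    /** Returns the cached height, computing it once per font key. */
    float fontHeight(SimpleFontMetrics font) {
        return cache.computeIfAbsent(font.key, k -> {
            computations++;
            return computeFontHeight(font);
        });
    }

    /** Illustrative heuristic; subclasses may override with their own calculation. */
    protected float computeFontHeight(SimpleFontMetrics font) {
        float glyphHeight = font.bboxHeight;
        if (font.capHeight > 0 && font.capHeight < glyphHeight) {
            glyphHeight = font.capHeight; // ASSUMPTION: prefer a smaller, non-zero cap height
        }
        return glyphHeight / 1000f; // glyph space -> text space (non-Type 3 fonts)
    }

    int computations() {
        return computations;
    }
}
```

Caching by font keeps repeated glyphs of the same font from recomputing the height; overriding computeFontHeight swaps the heuristic while keeping the cache.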
processTextPosition
protected void processTextPosition(TextPosition text)
A method provided as an event interface to allow a subclass to perform some specific functionality when text needs to be processed.
Parameters:
text - The text to be processed.
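As an illustration of what a subclass might do with this event hook, the sketch below keeps only glyphs whose origin falls inside a rectangle. PDFBox's own PDFTextStripperByArea targets this use case in practice. The classes here are hypothetical plain-Java stand-ins: Glyph replaces TextPosition, and GlyphEventSource replaces the engine that fires the events. No PDFBox API is used.

```java
import java.util.List;

/** Hypothetical glyph event; PDFBox passes a TextPosition instead. */
final class Glyph {
    final String text;
    final float x;
    final float y;

    Glyph(String text, float x, float y) {
        this.text = text;
        this.x = x;
        this.y = y;
    }
}

/** Hypothetical event source with an empty hook, mirroring processTextPosition. */
class GlyphEventSource {
    protected void processTextPosition(Glyph text) {
    }

    /** Fires one event per glyph, standing in for page processing. */
    final void emit(List<Glyph> glyphs) {
        for (Glyph g : glyphs) {
            processTextPosition(g);
        }
    }
}

/** Subclass that keeps only glyphs whose origin lies inside a rectangle. */
class RegionCollector extends GlyphEventSource {
    private final float minX;
    private final float minY;
    private final float maxX;
    private final float maxY;
    private final StringBuilder text = new StringBuilder();

    RegionCollector(float minX, float minY, float maxX, float maxY) {
        this.minX = minX;
        this.minY = minY;
        this.maxX = maxX;
        this.maxY = maxY;
    }

    @Override
    protected void processTextPosition(Glyph glyph) {
        boolean inside = glyph.x >= minX && glyph.x <= maxX
                && glyph.y >= minY && glyph.y <= maxY;
        if (inside) {
            text.append(glyph.text);
        }
    }

    String getText() {
        return text.toString();
    }
}
```

With the region (0, 0) to (100, 100), glyphs "A" at (10, 10), "B" at (150, 10) and "C" at (50, 99) produce "AC"; "B" falls outside the region and is dropped.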