Class PreflightParser


public class PreflightParser extends PDFParser
  • Field Details

    • encoding

      public static final Charset encoding
      Defines a one-byte encoding that has no specific encoding in the UTF-8 charset. This avoids unexpected errors when the encoding is Cp5816.
    • dataSource

      protected javax.activation.DataSource dataSource
    • validationResult

      protected ValidationResult validationResult
    • preflightDocument

      protected PreflightDocument preflightDocument
    • ctx

      protected PreflightContext ctx
  • Constructor Details

    • PreflightParser

      public PreflightParser(File file) throws IOException
      Constructor.
      Parameters:
      file - the PDF file to parse
      Throws:
      IOException - if there is a reading error.
    • PreflightParser

      public PreflightParser(File file, ScratchFile scratch) throws IOException
      Constructor.
      Parameters:
      file - the PDF file to parse
      scratch -
      Throws:
      IOException - if there is a reading error.
    • PreflightParser

      public PreflightParser(String filename) throws IOException
      Constructor.
      Parameters:
      filename - the name of the PDF file to parse
      Throws:
      IOException - if there is a reading error.
    • PreflightParser

      public PreflightParser(String filename, ScratchFile scratch) throws IOException
      Constructor.
      Parameters:
      filename - the name of the PDF file to parse
      scratch -
      Throws:
      IOException - if there is a reading error.
    • PreflightParser

      public PreflightParser(javax.activation.DataSource dataSource) throws IOException
      Constructor. This one is slower than the file and the filename constructors, because a temporary file will be created.
      Parameters:
      dataSource - the datasource
      Throws:
      IOException - if there is a reading error.
    • PreflightParser

      public PreflightParser(javax.activation.DataSource dataSource, ScratchFile scratch) throws IOException
      Constructor. This one is slower than the file and the filename constructors, because a temporary file will be created.
      Parameters:
      dataSource - the datasource
      scratch -
      Throws:
      IOException - if there is a reading error.
  • Method Details

    • createUnknownErrorResult

      protected static ValidationResult createUnknownErrorResult()
      Creates an instance of ValidationResult containing a ValidationError(UNKNOWN_ERROR).
      Returns:
      the ValidationResult instance.
    • addValidationError

      protected void addValidationError(ValidationResult.ValidationError error)
      Adds the error to the ValidationResult. If validationResult is null, a new instance is created, and the isWarning flag of the ValidationError determines whether the ValidationResult is flagged as valid.
      Parameters:
      error -
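      The lazy creation described above can be sketched with stand-in types. Issue and Result below are hypothetical classes written for this illustration, not the PDFBox ValidationError and ValidationResult, and the rule for later errors (a non-warning turns the result invalid) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are not the PDFBox classes.
final class LazyResultSketch {
    static final class Issue {
        final String code;
        final boolean warning;

        Issue(String code, boolean warning) {
            this.code = code;
            this.warning = warning;
        }
    }

    static final class Result {
        boolean valid;
        final List<Issue> issues = new ArrayList<>();

        Result(boolean valid) {
            this.valid = valid;
        }
    }

    // Starts out null, like the validationResult field.
    Result result;

    void addValidationError(Issue issue) {
        if (result == null) {
            // First issue: a warning leaves the result valid, a real error does not.
            result = new Result(issue.warning);
        } else {
            // Assumption: a later non-warning issue turns a valid result invalid.
            result.valid &= issue.warning;
        }
        result.issues.add(issue);
    }
}
```

      With this sketch, adding a warning first yields a valid result, and adding a non-warning issue afterwards flips it to invalid.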
    • addValidationErrors

      protected void addValidationErrors(List<ValidationResult.ValidationError> errors)
    • parse

      public void parse() throws IOException
      Description copied from class: PDFParser
      This will parse the stream and populate the COSDocument object. This will close the keystore stream when it is done parsing.
      Overrides:
      parse in class PDFParser
      Throws:
      InvalidPasswordException - If the password is incorrect.
      IOException - If there is an error reading from the stream or corrupt data is found.
    • parse

      public void parse(Format format) throws IOException
      Parses the given file and checks whether it conforms to the given format.
      Parameters:
      format - format that the document should follow (default Format.PDF_A1B)
      Throws:
      IOException
    • parse

      public void parse(Format format, PreflightConfiguration config) throws IOException
      Parses the given file and checks whether it conforms to the given format.
      Parameters:
      format - format that the document should follow (default Format.PDF_A1B)
      config - Configuration bean that will be used by the PreflightDocument. If null the format is used to determine the default configuration.
      Throws:
      IOException
    • createPdfADocument

      protected void createPdfADocument(Format format, PreflightConfiguration config) throws IOException
      Throws:
      IOException
    • createContext

      protected void createContext()
      Create a validation context. This context is set to the PreflightDocument.
    • getPDDocument

      public PDDocument getPDDocument() throws IOException
      Description copied from class: PDFParser
      This will get the PD document that was parsed. When you are done with this document you must call close() on it to release resources.
      Overrides:
      getPDDocument in class PDFParser
      Returns:
      The document at the PD layer.
      Throws:
      IOException - If there is an error getting the document.
    • getPreflightDocument

      public PreflightDocument getPreflightDocument() throws IOException
      Throws:
      IOException
    • initialParse

      protected void initialParse() throws IOException
      Description copied from class: PDFParser
      The initial parse will first parse only the trailer, the xrefstart and all xref tables to have a pointer (offset) to all the pdf's objects. It can handle linearized pdfs, which will have an xref at the end pointing to an xref at the beginning of the file. Last the root object is parsed.
      Overrides:
      initialParse in class PDFParser
      Throws:
      InvalidPasswordException - If the password is incorrect.
      IOException - If something went wrong.
    • checkPdfHeader

      protected void checkPdfHeader()
      Checks that the PDF header matches the rules of the PDF/A specification. The first line (offset 0) must be a comment with the PDF version (version 1.0 does not conform to the PDF/A specification). The second line must be a comment with at least 4 bytes greater than 0x80.
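      A minimal sketch of the two header rules, using only the JDK. The class and method names are hypothetical, the version pattern (minor version 1 to 9) is an assumption, and the four bytes are assumed to follow the '%' directly; the checks actually performed by this method may differ in detail.

```java
// Hypothetical helper for illustration; not part of PDFBox.
final class PdfAHeaderSketch {

    /** First line: "%PDF-1.x" with x from 1 to 9, so version 1.0 is rejected (assumed pattern). */
    static boolean isVersionLineValid(String firstLine) {
        return firstLine != null && firstLine.matches("%PDF-1\\.[1-9]");
    }

    /** Second line: '%' followed by four bytes that are each greater than 0x80. */
    static boolean isBinaryCommentValid(byte[] secondLine) {
        if (secondLine == null || secondLine.length < 5 || secondLine[0] != '%') {
            return false;
        }
        for (int i = 1; i <= 4; i++) {
            // Mask to read the byte as an unsigned value before comparing.
            if ((secondLine[i] & 0xFF) <= 0x80) {
                return false;
            }
        }
        return true;
    }
}
```

      For example, "%PDF-1.4" passes the first check and "%PDF-1.0" fails it, while a second line made of '%' and four plain ASCII letters fails the second check.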
    • parseXrefTable

      protected boolean parseXrefTable(long startByteOffset) throws IOException
      Same as COSParser.parseXrefTable(long), with additional checks:
      - an EOL is mandatory after the 'xref' keyword
      - the cross-reference subsection header uses a single white space as separator
      - and so on
      Overrides:
      parseXrefTable in class COSParser
      Parameters:
      startByteOffset - the offset to start at
      Returns:
      false on parsing error
      Throws:
      IOException - If an IO error occurs.
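      The two additional checks named for parseXrefTable can be illustrated without PDFBox. The helper names below are hypothetical, and treating CR or LF as the EOL after 'xref' is an assumption.

```java
// Hypothetical helper for illustration; not part of PDFBox.
final class StrictXrefSketch {

    /** True if "xref" starts at offset and is directly followed by CR or LF. */
    static boolean hasEolAfterXref(byte[] buf, int offset) {
        byte[] keyword = {'x', 'r', 'e', 'f'};
        if (offset < 0 || offset + keyword.length >= buf.length) {
            return false;
        }
        for (int i = 0; i < keyword.length; i++) {
            if (buf[offset + i] != keyword[i]) {
                return false;
            }
        }
        byte next = buf[offset + keyword.length];
        return next == '\r' || next == '\n';
    }

    /** True if the header is two unsigned numbers separated by exactly one space. */
    static boolean isStrictSubsectionHeader(String line) {
        return line.matches("\\d+ \\d+");
    }
}
```

      Under these assumptions "0 6" is accepted as a subsection header, while the same header with two spaces or a tab as separator is rejected.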
    • parseCOSStream

      protected COSStream parseCOSStream(COSDictionary dic) throws IOException
      Overrides:
      parseCOSStream in class COSParser
      Parameters:
      dic - dictionary that goes with this stream.
      Returns:
      parsed pdf stream.
      Throws:
      IOException - if an error occurred reading the stream, like problems with reading length attribute, stream does not end with 'endstream' after data read, stream too short etc.
    • checkStreamKeyWord

      protected void checkStreamKeyWord() throws IOException
      'stream' must be followed by <CR><LF> or only <LF>
      Throws:
      IOException
    • checkEndstreamKeyWord

      protected void checkEndstreamKeyWord() throws IOException
      'endstream' must be preceded by an EOL
      Throws:
      IOException
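      The rules for the 'stream' and 'endstream' keywords can be sketched as two byte-level checks. The names are hypothetical, and accepting either CR or LF as the EOL before 'endstream' is an assumption.

```java
// Hypothetical helper for illustration; not part of PDFBox.
final class StreamKeywordSketch {

    /** True if the bytes at pos are CR LF or a single LF, the endings allowed after 'stream'. */
    static boolean isValidStreamEol(byte[] buf, int pos) {
        if (pos < 0 || pos >= buf.length) {
            return false;
        }
        if (buf[pos] == '\n') {
            return true;
        }
        return buf[pos] == '\r' && pos + 1 < buf.length && buf[pos + 1] == '\n';
    }

    /** True if the byte just before the 'endstream' keyword at keywordPos is CR or LF. */
    static boolean hasEolBeforeEndstream(byte[] buf, int keywordPos) {
        if (keywordPos <= 0 || keywordPos > buf.length) {
            return false;
        }
        byte previous = buf[keywordPos - 1];
        return previous == '\r' || previous == '\n';
    }
}
```

      A lone CR after 'stream' is rejected by the first check, which matches the rule that only CR LF or LF may follow the keyword.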
    • nextIsEOL

      private boolean nextIsEOL() throws IOException
      Throws:
      IOException
    • parseCOSArray

      protected COSArray parseCOSArray() throws IOException
      Description copied from class: BaseParser
      This will parse a PDF array object.
      Overrides:
      parseCOSArray in class BaseParser
      Returns:
      The parsed PDF array.
      Throws:
      IOException - If there is an error parsing the stream.
    • parseCOSName

      protected COSName parseCOSName() throws IOException
      Description copied from class: BaseParser
      This will parse a PDF name from the stream.
      Overrides:
      parseCOSName in class BaseParser
      Returns:
      The parsed PDF name.
      Throws:
      IOException - If there is an error reading from the stream.
    • parseCOSString

      protected COSString parseCOSString() throws IOException
      Checks that the hexadecimal string contains an even number of hexadecimal characters. Once this is done, resets the offset to the beginning of the string and calls BaseParser.parseCOSString().
      Overrides:
      parseCOSString in class BaseParser
      Returns:
      The parsed PDF string.
      Throws:
      IOException - If there is an error reading from the stream.
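      The even-count rule for hexadecimal strings can be sketched as follows. The helper is hypothetical and works on a complete string such as "<48656C6C6F>" rather than on the parser's input source; skipping white space and rejecting any other non-hexadecimal character are assumptions.

```java
// Hypothetical helper for illustration; not part of PDFBox.
final class HexStringSketch {

    /**
     * Counts the hexadecimal digits between '<' and '>' and reports whether the count is even.
     * Assumption: white space inside the string is skipped, any other character is invalid.
     */
    static boolean hasEvenHexDigits(String hexString) {
        int last = hexString.length() - 1;
        if (last < 1 || hexString.charAt(0) != '<' || hexString.charAt(last) != '>') {
            return false;
        }
        int count = 0;
        for (int i = 1; i < last; i++) {
            char c = hexString.charAt(i);
            if (Character.isWhitespace(c)) {
                continue;
            }
            if (Character.digit(c, 16) < 0) {
                return false; // not a hexadecimal digit
            }
            count++;
        }
        return count % 2 == 0;
    }
}
```

      Here "<48656C6C6F>" has ten digits and passes, while "<486>" has three and fails.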
    • parseDirObject

      protected COSBase parseDirObject() throws IOException
      Calls BaseParser.parseDirObject() and checks the limit ranges for Float and Integer values and for the number of Dictionary entries.
      Overrides:
      parseDirObject in class BaseParser
      Returns:
      The parsed object.
      Throws:
      IOException - if there is an error during parsing.
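      The kind of range checks described for parseDirObject can be sketched as below. The thresholds are assumptions taken from the architectural limits of PDF 1.4 (reals up to about ±32767, 32-bit integers, 4095 dictionary entries); this page does not state the constants the class actually enforces, and the names are hypothetical.

```java
// Hypothetical helper for illustration; the limits are assumed, not read from PDFBox.
final class PdfALimitsSketch {
    static final float MAX_REAL = 32767f;     // assumed limit for real numbers
    static final int MAX_DICT_ENTRIES = 4095; // assumed limit for dictionary entries

    /** True if the real value lies within the assumed symmetric range. */
    static boolean isRealInRange(float value) {
        return Math.abs(value) <= MAX_REAL;
    }

    /** True if the value fits into a signed 32-bit integer. */
    static boolean isIntegerInRange(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    /** True if a dictionary with this many entries stays within the assumed limit. */
    static boolean isDictionarySizeValid(int entries) {
        return entries >= 0 && entries <= MAX_DICT_ENTRIES;
    }
}
```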
    • parseObjectDynamically

      protected COSBase parseObjectDynamically(long objNr, int objGenNr, boolean requireExistingNotCompressedObj) throws IOException
      Description copied from class: COSParser
      This will parse the next object from the stream and add it to the local state. It's reduced to parsing an indirect object.
      Overrides:
      parseObjectDynamically in class COSParser
      Parameters:
      objNr - object number of object to be parsed
      objGenNr - object generation number of object to be parsed
      requireExistingNotCompressedObj - if true the object to be parsed must be defined in xref (comment: null objects may be missing from xref) and it must not be a compressed object within object stream (this is used to circumvent being stuck in a loop in a malicious PDF)
      Returns:
      the parsed object (which is also added to document object)
      Throws:
      IOException - If an IO error occurs.
    • lastIndexOf

      protected int lastIndexOf(char[] pattern, byte[] buf, int endOff)
      Description copied from class: COSParser
      Searches for the last appearance of the pattern within the buffer. The lookup starts before _lastOff and goes back to 0.
      Overrides:
      lastIndexOf in class COSParser
      Parameters:
      pattern - pattern to search for
      buf - buffer to search pattern in
      endOff - offset (exclusive) where lookup starts at
      Returns:
      start offset of pattern within buffer or -1 if pattern could not be found
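      A minimal sketch of a backward search with the same signature and documented contract (exclusive end offset, -1 when the pattern is not found). It is a straightforward reimplementation for illustration and does not reproduce any extra checks this override may perform.

```java
// Hypothetical standalone version for illustration; not the PDFBox implementation.
final class LastIndexOfSketch {

    /** Returns the start offset of the last match that ends before endOff, or -1. */
    static int lastIndexOf(char[] pattern, byte[] buf, int endOff) {
        int limit = Math.min(endOff, buf.length);
        // Try every candidate start position, moving from the end towards 0.
        for (int start = limit - pattern.length; start >= 0; start--) {
            boolean match = true;
            for (int i = 0; i < pattern.length; i++) {
                // Mask the byte so values above 0x7F compare as unsigned.
                if ((buf[start + i] & 0xFF) != pattern[i]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return start;
            }
        }
        return -1;
    }
}
```

      Lowering endOff excludes later matches, which is how a caller can step backwards from one occurrence to the previous one.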