Class PdfBoxUtilities

java.lang.Object
net.sourceforge.tess4j.util.PdfBoxUtilities

public class PdfBoxUtilities extends Object
PDF utilities based on PDFBox.
Author:
Robert Drysdale, Quan Nguyen
  • Constructor Details

    • PdfBoxUtilities

      public PdfBoxUtilities()
  • Method Details

    • convertPdf2Tiff

      public static File convertPdf2Tiff(File inputPdfFile) throws IOException
      Converts PDF to TIFF format.
      Parameters:
      inputPdfFile - input file
      Returns:
      a multi-page TIFF image
      Throws:
      IOException
    • convertPdf2Png

      public static File[] convertPdf2Png(File inputPdfFile) throws IOException
      Converts PDF to PNG format.
      Parameters:
      inputPdfFile - input file
      Returns:
      an array of PNG images
      Throws:
      IOException
    • splitPdf

      public static void splitPdf(File inputPdfFile, File outputPdfFile, int firstPage, int lastPage)
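      Splits a PDF, writing the pages from firstPage through lastPage to the output file.
      Parameters:
      inputPdfFile - input file
      outputPdfFile - output file
      firstPage - first page of the range to extract
      lastPage - last page of the range to extract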
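      When a document has to be split into several fixed-size parts, the page ranges can be computed up front. The sketch below is one possible way to do that; PageRanges and chunk are hypothetical names, not part of Tess4J, and the code assumes that firstPage and lastPage are 1-based and inclusive, which this page does not state.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Tess4J. */
final class PageRanges {

    private PageRanges() {
    }

    /**
     * Divides pages 1..pageCount into consecutive {firstPage, lastPage} pairs,
     * each covering at most chunkSize pages.
     * Assumption: page numbers are 1-based and both bounds are inclusive.
     */
    static List<int[]> chunk(int pageCount, int chunkSize) {
        if (pageCount < 0) {
            throw new IllegalArgumentException("pageCount must be >= 0");
        }
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        List<int[]> ranges = new ArrayList<>();
        // A long counter avoids int overflow when chunkSize is very large.
        for (long first = 1; first <= pageCount; first += chunkSize) {
            int last = (int) Math.min(first + chunkSize - 1, pageCount);
            ranges.add(new int[] {(int) first, last});
        }
        return ranges;
    }
}
```

      With pageCount taken from getPdfPageCount(inputPdfFile), each pair could then be passed on as splitPdf(inputPdfFile, partFile, range[0], range[1]), where partFile is a caller-chosen output file for that part.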
    • getPdfPageCount

      public static int getPdfPageCount(File inputPdfFile)
      Gets the number of pages in a PDF file.
      Parameters:
      inputPdfFile - input file
      Returns:
      number of pages
    • mergePdf

      public static void mergePdf(File[] inputPdfFiles, File outputPdfFile)
      Merges PDF files.
      Parameters:
      inputPdfFiles - array of input files
      outputPdfFile - output file
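      Files gathered from a directory listing often sort lexicographically, so page-10.pdf ends up before page-2.pdf. If the merged output is expected to follow the order of inputPdfFiles (an assumption; this page does not say so), the array can be put into numeric order first. MergeOrder, pageNumberOf and sortedForMerge below are hypothetical names, not part of Tess4J.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Tess4J. */
final class MergeOrder {

    private static final Pattern DIGITS = Pattern.compile("\\d+");

    private MergeOrder() {
    }

    /** Returns the first run of digits in the file name, or Long.MAX_VALUE if there is none. */
    static long pageNumberOf(File file) {
        Matcher m = DIGITS.matcher(file.getName());
        if (!m.find()) {
            return Long.MAX_VALUE; // unnumbered files sort last
        }
        String digits = m.group();
        // Runs longer than 18 digits may not fit in a long; treat them as unnumbered.
        return digits.length() > 18 ? Long.MAX_VALUE : Long.parseLong(digits);
    }

    /** Returns a copy sorted by embedded number, then by name; the input array is left untouched. */
    static File[] sortedForMerge(File[] files) {
        File[] copy = files.clone();
        Arrays.sort(copy, Comparator.<File>comparingLong(MergeOrder::pageNumberOf)
                .thenComparing(File::getName));
        return copy;
    }
}
```

      The sorted array could then be handed to mergePdf(MergeOrder.sortedForMerge(parts), outputPdfFile), where parts is the caller's unsorted array.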
    • mergeHocrIntoAPdf

      public static void mergeHocrIntoAPdf(String inputHocr, String inputPdfStr, String outputPdfStr, boolean visible) throws Exception
      Merges text from an hOCR file into a PDF.
      Parameters:
      inputHocr - input hOCR file
      inputPdfStr - input PDF file
      outputPdfStr - output PDF file produced by the merge
      visible - whether the merged text is visible
      Throws:
      Exception