net.sourceforge.tess4j
Class Tesseract

java.lang.Object
  extended by net.sourceforge.tess4j.Tesseract

public class Tesseract
extends java.lang.Object

An object layer on top of TessDllAPI that provides character recognition support for common image formats and multi-page TIFF images, beyond the uncompressed, binary TIFF format supported by the Tesseract OCR engine. These extended capabilities are provided by the Java Advanced Imaging Image I/O Tools.

Support for PDF documents is available through Ghost4J, a JNA wrapper for GPL Ghostscript. Ghostscript must be installed and included in the system path.

Any program that uses the library must have the required libraries (the .jar files for jna, jai-imageio, and ghost4j) on both its compile-time and run-time classpaths.


Method Summary
 java.lang.String doOCR(java.awt.image.BufferedImage bi)
          Performs OCR on a buffered image.
 java.lang.String doOCR(java.awt.image.BufferedImage bi, java.awt.Rectangle rect)
          Performs OCR on a region of a buffered image.
 java.lang.String doOCR(java.io.File imageFile)
          Performs OCR on an image file.
 java.lang.String doOCR(java.io.File imageFile, java.awt.Rectangle rect)
          Performs OCR on a region of an image file.
 java.lang.String doOCR(int xsize, int ysize, java.nio.ByteBuffer buf, java.awt.Rectangle rect, int bpp)
          Performs OCR on raw pixel data, optionally restricted to a region.
 java.lang.String doOCR(java.util.List<javax.imageio.IIOImage> imageList, java.awt.Rectangle rect)
          Performs OCR on a list of images, optionally restricted to a region.
static Tesseract getInstance()
          Gets an instance of this class.
 void setLanguage(java.lang.String language)
          Sets language for OCR.
static byte[] wrapperListToByteArray(java.util.List<java.lang.Byte> list)
          A utility method to convert a generic Byte list to a byte array.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

getInstance

public static Tesseract getInstance()
Gets an instance of this class.

Returns:
a Tesseract instance

setLanguage

public void setLanguage(java.lang.String language)
Sets language for OCR.

Parameters:
language - the language code, which follows the ISO 639-3 standard (for example, eng for English).
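
As an illustration of where a three-letter language code might come from, the sketch below derives one from a two-letter language tag using java.util.Locale. The class LanguageCodeSketch and its method threeLetterCode are hypothetical and not part of Tess4J. Note that Locale documents its three-letter codes as ISO 639-2/T, which use the same identifiers as ISO 639-3 for the languages shown here; treating the two as interchangeable in general is an assumption.

```java
import java.util.Locale;

public class LanguageCodeSketch {

    /**
     * Maps a language tag such as "en" to the three-letter code reported
     * by java.util.Locale. Hypothetical helper, not part of Tess4J.
     */
    public static String threeLetterCode(String languageTag) {
        return Locale.forLanguageTag(languageTag).getISO3Language();
    }

    public static void main(String[] args) {
        String code = threeLetterCode("en");
        System.out.println(code);
        // With Tess4J on the classpath, the code could then be passed on:
        // Tesseract.getInstance().setLanguage(code);
    }
}
```

Whether recognition succeeds for a given code also depends on the matching language data being installed for the OCR engine, which this sketch does not check.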

doOCR

public java.lang.String doOCR(java.io.File imageFile)
                       throws TesseractException
Performs OCR on an image file.

Parameters:
imageFile - an image file
Returns:
the recognized text
Throws:
TesseractException

doOCR

public java.lang.String doOCR(java.io.File imageFile,
                              java.awt.Rectangle rect)
                       throws TesseractException
Performs OCR on a region of an image file.

Parameters:
imageFile - an image file
rect - the bounding rectangle that defines the region of the image to be recognized. A null rectangle, or one of zero dimension, selects the whole image.
Returns:
the recognized text
Throws:
TesseractException
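
The rule for rect (null or zero dimension means the whole image) can be pictured with the following minimal sketch. It is not the library's internal logic; RegionSketch and effectiveRegion are hypothetical names, and using Rectangle.isEmpty() to decide what counts as "zero dimension" is an assumption.

```java
import java.awt.Rectangle;

public class RegionSketch {

    /**
     * Returns the region that would be recognized: the whole image when
     * rect is null or empty, otherwise rect itself. Hypothetical helper.
     */
    public static Rectangle effectiveRegion(Rectangle rect, int imageWidth, int imageHeight) {
        // Assumption: Rectangle.isEmpty() (width <= 0 or height <= 0)
        // stands in for "a rectangle of zero dimension".
        if (rect == null || rect.isEmpty()) {
            return new Rectangle(0, 0, imageWidth, imageHeight);
        }
        return rect;
    }

    public static void main(String[] args) {
        System.out.println(effectiveRegion(null, 640, 480));
        System.out.println(effectiveRegion(new Rectangle(10, 20, 100, 50), 640, 480));
        // With Tess4J on the classpath, a call restricted to a region
        // would follow the documented signature:
        // String text = Tesseract.getInstance().doOCR(imageFile, new Rectangle(10, 20, 100, 50));
    }
}
```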

doOCR

public java.lang.String doOCR(java.awt.image.BufferedImage bi)
                       throws TesseractException
Performs OCR on a buffered image.

Parameters:
bi - a buffered image
Returns:
the recognized text
Throws:
TesseractException

doOCR

public java.lang.String doOCR(java.awt.image.BufferedImage bi,
                              java.awt.Rectangle rect)
                       throws TesseractException
Performs OCR on a region of a buffered image.

Parameters:
bi - a buffered image
rect - the bounding rectangle that defines the region of the image to be recognized. A null rectangle, or one of zero dimension, selects the whole image.
Returns:
the recognized text
Throws:
TesseractException

doOCR

public java.lang.String doOCR(java.util.List<javax.imageio.IIOImage> imageList,
                              java.awt.Rectangle rect)
                       throws TesseractException
Performs OCR on a list of images, optionally restricted to a region.

Parameters:
imageList - a list of IIOImage objects
rect - the bounding rectangle that defines the region of the image to be recognized. A null rectangle, or one of zero dimension, selects the whole image.
Returns:
the recognized text
Throws:
TesseractException
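
One way a caller might build the imageList argument is to wrap in-memory pages in IIOImage objects. The sketch below uses only the standard javax.imageio API; ImageListSketch and wrapPages are hypothetical names, and passing null for thumbnails and metadata is an assumption about what a caller needs.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.IIOImage;

public class ImageListSketch {

    /**
     * Wraps each page in an IIOImage with no thumbnails and no metadata.
     * Hypothetical helper, not part of Tess4J.
     */
    public static List<IIOImage> wrapPages(List<BufferedImage> pages) {
        List<IIOImage> imageList = new ArrayList<>();
        for (BufferedImage page : pages) {
            imageList.add(new IIOImage(page, null, null));
        }
        return imageList;
    }

    public static void main(String[] args) {
        List<BufferedImage> pages = new ArrayList<>();
        pages.add(new BufferedImage(200, 100, BufferedImage.TYPE_BYTE_GRAY));
        pages.add(new BufferedImage(200, 100, BufferedImage.TYPE_BYTE_GRAY));
        List<IIOImage> imageList = wrapPages(pages);
        System.out.println(imageList.size());
        // With Tess4J on the classpath, the list could then be recognized:
        // String text = Tesseract.getInstance().doOCR(imageList, null);
    }
}
```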

doOCR

public java.lang.String doOCR(int xsize,
                              int ysize,
                              java.nio.ByteBuffer buf,
                              java.awt.Rectangle rect,
                              int bpp)
                       throws TesseractException
Performs OCR on raw pixel data, optionally restricted to a region.

Parameters:
xsize - width of image
ysize - height of image
buf - pixel data
rect - the bounding rectangle that defines the region of the image to be recognized. A null rectangle, or one of zero dimension, selects the whole image.
bpp - bits per pixel (the bit depth of the image): 1 for a binary bitmap, 8 for grayscale, and 24 for RGB color.
Returns:
the recognized text
Throws:
TesseractException
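
To show how xsize, ysize, buf, and bpp relate, the sketch below prepares these values for an 8-bit grayscale image using standard Java classes only. RawBufferSketch, bytesPerRow, and grayPixels are hypothetical names. The row layout (rows packed one after another, each padded only up to a whole byte) and the use of a direct buffer are assumptions, not documented requirements of this method.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferByte;
import java.nio.ByteBuffer;

public class RawBufferSketch {

    /** Bytes needed for one row, assuming padding only up to a whole byte. */
    public static int bytesPerRow(int xsize, int bpp) {
        return (xsize * bpp + 7) / 8;
    }

    /**
     * Copies the pixels of a TYPE_BYTE_GRAY image (8 bits per pixel) into a
     * new buffer positioned at zero. Assumes the image is not a sub-image,
     * so its backing array holds exactly width * height bytes.
     */
    public static ByteBuffer grayPixels(BufferedImage gray) {
        if (gray.getType() != BufferedImage.TYPE_BYTE_GRAY) {
            throw new IllegalArgumentException("expected TYPE_BYTE_GRAY");
        }
        byte[] data = ((DataBufferByte) gray.getRaster().getDataBuffer()).getData();
        ByteBuffer buf = ByteBuffer.allocateDirect(data.length);
        buf.put(data);
        buf.flip(); // make the copied bytes readable from position 0
        return buf;
    }

    public static void main(String[] args) {
        BufferedImage gray = new BufferedImage(320, 240, BufferedImage.TYPE_BYTE_GRAY);
        ByteBuffer buf = grayPixels(gray);
        System.out.println(buf.remaining() == bytesPerRow(320, 8) * 240);
        // With Tess4J on the classpath, the documented overload would be called as:
        // String text = Tesseract.getInstance().doOCR(320, 240, buf, null, 8);
    }
}
```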

wrapperListToByteArray

public static byte[] wrapperListToByteArray(java.util.List<java.lang.Byte> list)
A utility method that converts a list of Byte objects to a byte array.

Parameters:
list - a list of Byte objects
Returns:
an array of bytes
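
For readers unfamiliar with this kind of conversion, the sketch below shows what turning a List<Byte> into a byte[] can look like in plain Java. It is an illustrative equivalent under the name ByteListSketch.toByteArray (hypothetical), not the library's actual implementation.

```java
import java.util.Arrays;
import java.util.List;

public class ByteListSketch {

    /** Unboxes each element of the list into a new primitive byte array. */
    public static byte[] toByteArray(List<Byte> list) {
        byte[] out = new byte[list.size()];
        int i = 0;
        for (Byte b : list) {
            out[i++] = b; // auto-unboxing; a null element would throw NullPointerException
        }
        return out;
    }

    public static void main(String[] args) {
        List<Byte> boxed = Arrays.asList((byte) 1, (byte) -2, (byte) 127);
        System.out.println(Arrays.toString(toByteArray(boxed)));
        // With Tess4J on the classpath, the documented static method would be:
        // byte[] bytes = Tesseract.wrapperListToByteArray(boxed);
    }
}
```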