net.sourceforge.tess4j
Class Tesseract1

java.lang.Object
  extended by net.sourceforge.tess4j.TessDllAPI1
      extended by net.sourceforge.tess4j.Tesseract1
All Implemented Interfaces:
com.sun.jna.Library

public class Tesseract1
extends TessDllAPI1

An object layer on top of TessDllAPI1 that provides character recognition support for common image formats and multi-page TIFF images, beyond the uncompressed, binary TIFF format supported by the Tesseract OCR engine. The extended capabilities are provided by the Java Advanced Imaging Image I/O Tools.

Support for PDF documents is available through Ghost4J, a JNA wrapper for GPL Ghostscript, which should be installed and included in the system path.

Any program that uses the library must ensure that the required libraries (the .jar files for jna, jai-imageio, and ghost4j) are on its compile-time and run-time classpath.
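
A missing .jar usually surfaces only when a class is first loaded, so a start-up check can make the failure easier to diagnose. The sketch below is not part of the library. ClasspathCheck and missing are hypothetical names, and the caller supplies the class names to probe.

```java
import java.util.ArrayList;
import java.util.List;

public class ClasspathCheck {
    /**
     * Returns the subset of the given class names that cannot be loaded
     * from the current classpath. Classes are located but not initialized.
     */
    public static List<String> missing(String... classNames) {
        List<String> notFound = new ArrayList<>();
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: only look the class up, run no static code
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                notFound.add(name);
            }
        }
        return notFound;
    }
}
```

For example, missing("com.sun.jna.Library", "net.sourceforge.tess4j.Tesseract1") probes two class names that appear on this page. Representative class names for jai-imageio and ghost4j are not given here and would have to be taken from those libraries' own documentation.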


Nested Class Summary
 
Nested classes/interfaces inherited from class net.sourceforge.tess4j.TessDllAPI1
TessDllAPI1.CANCEL_FUNC
 
Nested classes/interfaces inherited from interface com.sun.jna.Library
com.sun.jna.Library.Handler
 
Field Summary
 
Fields inherited from class net.sourceforge.tess4j.TessDllAPI1
LIB_NAME
 
Fields inherited from interface com.sun.jna.Library
OPTION_ALLOW_OBJECTS, OPTION_CALLING_CONVENTION, OPTION_FUNCTION_MAPPER, OPTION_INVOCATION_MAPPER, OPTION_STRUCTURE_ALIGNMENT, OPTION_TYPE_MAPPER
 
Constructor Summary
Tesseract1()
           
 
Method Summary
 java.lang.String doOCR(java.awt.image.BufferedImage bi)
          Performs OCR on a buffered image.
 java.lang.String doOCR(java.awt.image.BufferedImage bi, java.awt.Rectangle rect)
          Performs OCR on a region of a buffered image.
 java.lang.String doOCR(java.io.File imageFile)
          Performs OCR on an image file.
 java.lang.String doOCR(java.io.File imageFile, java.awt.Rectangle rect)
          Performs OCR on a region of an image file.
 java.lang.String doOCR(int xsize, int ysize, java.nio.ByteBuffer buf, java.awt.Rectangle rect, int bpp)
          Performs OCR on raw pixel data.
 java.lang.String doOCR(java.util.List<javax.imageio.IIOImage> imageList, java.awt.Rectangle rect)
          Performs OCR on a list of images.
 void setLanguage(java.lang.String language)
          Sets the language for OCR.
static byte[] wrapperListToByteArray(java.util.List<java.lang.Byte> list)
          A utility method to convert a generic Byte list to a byte array.
 
Methods inherited from class net.sourceforge.tess4j.TessDllAPI1
TessDllBeginPage, TessDllBeginPageBPP, TessDllBeginPageLang, TessDllBeginPageLangBPP, TessDllBeginPageUpright, TessDllBeginPageUprightBPP, TessDllEndPage, TessDllRecognize_a_Block, TessDllRecognize_all_Words, TessDllRelease
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Tesseract1

public Tesseract1()
Method Detail

setLanguage

public void setLanguage(java.lang.String language)
Sets the language for OCR.

Parameters:
language - the language code, which follows the ISO 639-3 standard.
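
Callers that hold two-letter language tags may need to convert them to three-letter codes first. The sketch below uses the JDK's Locale for this. LanguageCodes and threeLetterCode are hypothetical names. Note that the JDK documents getISO3Language as returning ISO 639-2 codes, which coincide with ISO 639-3 for many individual languages but are not guaranteed to match in every case.

```java
import java.util.Locale;

public class LanguageCodes {
    /**
     * Converts a two-letter language tag such as "en" to a three-letter code.
     * Throws java.util.MissingResourceException if the JDK knows no
     * three-letter code for the tag.
     */
    public static String threeLetterCode(String twoLetterTag) {
        return Locale.forLanguageTag(twoLetterTag).getISO3Language();
    }
}
```

The result, for example "eng" for "en", could then be passed to setLanguage. Whether Tesseract has language data installed for a given code is a separate question that this sketch does not check.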

doOCR

public java.lang.String doOCR(java.io.File imageFile)
                       throws TesseractException
Performs OCR on an image file.

Parameters:
imageFile - an image file
Returns:
the recognized text
Throws:
TesseractException

doOCR

public java.lang.String doOCR(java.io.File imageFile,
                              java.awt.Rectangle rect)
                       throws TesseractException
Performs OCR on a region of an image file.

Parameters:
imageFile - an image file
rect - the bounding rectangle that defines the region of the image to be recognized. A null rectangle, or one with zero dimensions, selects the whole image.
Returns:
the recognized text
Throws:
TesseractException
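
The rect convention shared by the doOCR overloads can be illustrated with a small helper. This is a sketch of the documented behavior, not the library's implementation. RegionResolver and resolve are hypothetical names, "zero dimension" is assumed to mean a width or height of zero (or less), and clipping to the image bounds is an addition made here for safety.

```java
import java.awt.Rectangle;

public class RegionResolver {
    /**
     * Returns the region to recognize: the whole image when rect is null
     * or empty, otherwise rect clipped to the image bounds.
     */
    public static Rectangle resolve(Rectangle rect, int imageWidth, int imageHeight) {
        Rectangle whole = new Rectangle(0, 0, imageWidth, imageHeight);
        // Rectangle.isEmpty() is true when width or height is <= 0
        if (rect == null || rect.isEmpty()) {
            return whole;
        }
        // Assumption: clipping is added here, the source does not describe it
        return rect.intersection(whole);
    }
}
```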

doOCR

public java.lang.String doOCR(java.awt.image.BufferedImage bi)
                       throws TesseractException
Performs OCR on a buffered image.

Parameters:
bi - a buffered image
Returns:
the recognized text
Throws:
TesseractException

doOCR

public java.lang.String doOCR(java.awt.image.BufferedImage bi,
                              java.awt.Rectangle rect)
                       throws TesseractException
Performs OCR on a region of a buffered image.

Parameters:
bi - a buffered image
rect - the bounding rectangle that defines the region of the image to be recognized. A null rectangle, or one with zero dimensions, selects the whole image.
Returns:
the recognized text
Throws:
TesseractException

doOCR

public java.lang.String doOCR(java.util.List<javax.imageio.IIOImage> imageList,
                              java.awt.Rectangle rect)
                       throws TesseractException
Performs OCR on a list of images.

Parameters:
imageList - a list of IIOImage objects
rect - the bounding rectangle that defines the region of the image to be recognized. A null rectangle, or one with zero dimensions, selects the whole image.
Returns:
the recognized text
Throws:
TesseractException
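
One way to obtain the imageList argument is to read every frame of a (possibly multi-page) file with the standard javax.imageio API. The sketch below is an illustration under that assumption and does not show how the library itself loads images. PageLoader and readPages are hypothetical names. Reading TIFF files this way requires a TIFF reader plugin on the classpath, such as the one in jai-imageio.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

public class PageLoader {
    /** Reads all frames of an image file into a list of IIOImage objects. */
    public static List<IIOImage> readPages(File file) throws IOException {
        List<IIOImage> pages = new ArrayList<>();
        try (ImageInputStream in = ImageIO.createImageInputStream(file)) {
            if (in == null) {
                throw new IOException("Cannot open " + file);
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("No image reader available for " + file);
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                // allowSearch=true lets the reader scan the file to count frames
                int count = reader.getNumImages(true);
                for (int i = 0; i < count; i++) {
                    pages.add(reader.readAll(i, null));
                }
            } finally {
                reader.dispose();
            }
        }
        return pages;
    }
}
```

The returned list could then be passed as doOCR(pages, null) to recognize the whole of each page.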

doOCR

public java.lang.String doOCR(int xsize,
                              int ysize,
                              java.nio.ByteBuffer buf,
                              java.awt.Rectangle rect,
                              int bpp)
                       throws TesseractException
Performs OCR on raw pixel data.

Parameters:
xsize - width of the image
ysize - height of the image
buf - pixel data
rect - the bounding rectangle that defines the region of the image to be recognized. A null rectangle, or one with zero dimensions, selects the whole image.
bpp - bits per pixel (the bit depth of the image): 1 for a binary bitmap, 8 for grayscale, and 24 for RGB color.
Returns:
the recognized text
Throws:
TesseractException
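
To show how the xsize, ysize, buf, and bpp arguments relate, the sketch below packs a BufferedImage into an 8-bit grayscale buffer, one byte per pixel in row-major order. GrayBuffer and toGray8 are hypothetical names. The luma weights are the common Rec. 601 approximation, chosen here for illustration. This page does not say whether the native layer expects row padding or a direct buffer, so the unpadded heap buffer is an assumption.

```java
import java.awt.image.BufferedImage;
import java.nio.ByteBuffer;

public class GrayBuffer {
    /** Bit depth to pass as bpp for the buffers produced by toGray8. */
    public static final int BPP_GRAY = 8;

    /** Packs the image into width * height gray bytes, row by row. */
    public static ByteBuffer toGray8(BufferedImage image) {
        int w = image.getWidth();
        int h = image.getHeight();
        ByteBuffer buf = ByteBuffer.allocate(w * h);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of Rec. 601 luma
                int gray = (299 * r + 587 * g + 114 * b) / 1000;
                buf.put((byte) gray);
            }
        }
        buf.flip(); // make the buffer readable from position 0
        return buf;
    }
}
```

Under these assumptions the call would take the form doOCR(image.getWidth(), image.getHeight(), buf, null, GrayBuffer.BPP_GRAY).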

wrapperListToByteArray

public static byte[] wrapperListToByteArray(java.util.List<java.lang.Byte> list)
A utility method to convert a generic Byte list to a byte array.

Parameters:
list - a list of Byte objects
Returns:
an array of bytes
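
The conversion this method describes can be sketched as a simple unboxing loop. The code below is an illustrative equivalent, not the library's source. ByteLists and toByteArray are hypothetical names, and how the real method treats null elements is not documented here.

```java
import java.util.List;

public class ByteLists {
    /** Copies a list of boxed Byte values into a primitive byte array. */
    public static byte[] toByteArray(List<Byte> list) {
        byte[] out = new byte[list.size()];
        int i = 0;
        for (Byte b : list) {
            // Auto-unboxing: a null element throws NullPointerException
            out[i++] = b;
        }
        return out;
    }
}
```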