public class Tesseract
extends java.lang.Object
An object layer on top of TessAPI that provides character recognition support for common image formats, and for multi-page TIFF images beyond the uncompressed, binary TIFF format supported by the Tesseract OCR engine. The extended capabilities are provided by the Java Advanced Imaging Image I/O Tools. Support for PDF documents is available through Ghost4J, a JNA wrapper for GPL Ghostscript, which should be installed and included in the system path. Any program that uses the library must ensure that the required .jar files (for jna, jai-imageio, and ghost4j) are in its compile-time and run-time classpath.

| Modifier and Type | Field and Description |
|---|---|
| static java.lang.String | htmlBeginTag |
| static java.lang.String | htmlEndTag |

| Modifier and Type | Method | Description |
|---|---|---|
| java.lang.String | doOCR(java.awt.image.BufferedImage bi) | Performs OCR on a buffered image. |
| java.lang.String | doOCR(java.awt.image.BufferedImage bi, java.awt.Rectangle rect) | Performs OCR on a buffered image, optionally restricted to a rectangular region. |
| java.lang.String | doOCR(java.io.File imageFile) | Performs OCR on an image file. |
| java.lang.String | doOCR(java.io.File imageFile, java.awt.Rectangle rect) | Performs OCR on an image file, optionally restricted to a rectangular region. |
| java.lang.String | doOCR(int xsize, int ysize, java.nio.ByteBuffer buf, java.awt.Rectangle rect, int bpp) | Performs OCR on raw pixel data. |
| java.lang.String | doOCR(java.util.List<javax.imageio.IIOImage> imageList, java.awt.Rectangle rect) | Performs OCR on a list of IIOImage objects. |
| static Tesseract | getInstance() | Gets an instance of the class library. |
| void | setDatapath(java.lang.String datapath) | Sets the tessdata path. |
| void | setHocr(boolean hocr) | Enables or disables hOCR output. |
| void | setLanguage(java.lang.String language) | Sets the language for OCR. |
| void | setOcrEngineMode(int ocrEngineMode) | Sets the OCR engine mode. |
| void | setPageSegMode(int mode) | Sets the page segmentation mode. |
| void | setTessVariable(java.lang.String key, java.lang.String value) | Sets the value of a Tesseract internal parameter. |

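The doOCR(java.util.List<javax.imageio.IIOImage> imageList, java.awt.Rectangle rect) overload leaves it to the caller to load the pages. The following is a minimal sketch, using only the standard javax.imageio API, of reading every page of a file (for example a multi-page TIFF) into such a list. PageLoader and readAllPages are hypothetical names, not part of this library, and the sketch assumes an Image I/O reader for the file's format is registered: the JDK ships a TIFF plugin from Java 9 onward, and the Java Advanced Imaging Image I/O Tools named in the class description can provide one on older runtimes.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

/** Hypothetical helper (not part of this library): loads every page of an image file. */
public final class PageLoader {

    private PageLoader() {}

    /** Reads all pages of imageFile, in order, as IIOImage objects. */
    public static List<IIOImage> readAllPages(File imageFile) throws IOException {
        List<IIOImage> pages = new ArrayList<>();
        try (ImageInputStream iis = ImageIO.createImageInputStream(imageFile)) {
            if (iis == null) {
                throw new IOException("Cannot open " + imageFile);
            }
            // Pick the first registered reader that recognizes the stream's format
            Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
            if (!readers.hasNext()) {
                throw new IOException("No Image I/O reader for " + imageFile);
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(iis);
                // allowSearch = true lets the reader scan the stream to count pages
                int count = reader.getNumImages(true);
                for (int i = 0; i < count; i++) {
                    // readAll returns the page's pixels together with its metadata
                    pages.add(reader.readAll(i, null));
                }
            } finally {
                reader.dispose();
            }
        }
        return pages;
    }
}
```

The list could then be passed as Tesseract.getInstance().doOCR(PageLoader.readAllPages(new File("scan.tif")), null), where "scan.tif" is a placeholder path and null requests the whole image, as documented for the rect parameter. ImageReader.readAll is used rather than read because doOCR takes IIOImage objects, which bundle the pixels with the page's metadata.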
public static final java.lang.String htmlBeginTag

public static final java.lang.String htmlEndTag

public static Tesseract getInstance()

public void setDatapath(java.lang.String datapath)
Parameters:
datapath - the tessdata path to set

public void setLanguage(java.lang.String language)
Parameters:
language - the language code, which follows the ISO 639-3 standard

public void setOcrEngineMode(int ocrEngineMode)
Parameters:
ocrEngineMode - the OcrEngineMode to set

public void setPageSegMode(int mode)
Parameters:
mode - the page segmentation mode to set

public void setHocr(boolean hocr)
Parameters:
hocr - true to enable hOCR output, false to disable it

public void setTessVariable(java.lang.String key, java.lang.String value)
Parameters:
key - the variable name, e.g., tessedit_create_hocr, tessedit_char_whitelist, etc.
value - the value for the corresponding variable, e.g., "1", "0", "0123456789", etc.

public java.lang.String doOCR(java.io.File imageFile) throws TesseractException
Parameters:
imageFile - an image file
Throws: TesseractException

public java.lang.String doOCR(java.io.File imageFile, java.awt.Rectangle rect) throws TesseractException
Parameters:
imageFile - an image file
rect - the bounding rectangle that defines the region of the image to be recognized. A rectangle of zero dimension or null indicates the whole image.
Throws: TesseractException

public java.lang.String doOCR(java.awt.image.BufferedImage bi) throws TesseractException
Parameters:
bi - a buffered image
Throws: TesseractException

public java.lang.String doOCR(java.awt.image.BufferedImage bi, java.awt.Rectangle rect) throws TesseractException
Parameters:
bi - a buffered image
rect - the bounding rectangle that defines the region of the image to be recognized. A rectangle of zero dimension or null indicates the whole image.
Throws: TesseractException

public java.lang.String doOCR(java.util.List<javax.imageio.IIOImage> imageList, java.awt.Rectangle rect) throws TesseractException
Parameters:
imageList - a list of IIOImage objects
rect - the bounding rectangle that defines the region of the image to be recognized. A rectangle of zero dimension or null indicates the whole image.
Throws: TesseractException

public java.lang.String doOCR(int xsize, int ysize, java.nio.ByteBuffer buf, java.awt.Rectangle rect, int bpp) throws TesseractException
Performs OCR operation. Uses SetImage, (optionally) SetRectangle, and one or more of the Get*Text functions.
Parameters:
xsize - width of the image, in pixels
ysize - height of the image, in pixels
buf - pixel data
rect - the bounding rectangle that defines the region of the image to be recognized. A rectangle of zero dimension or null indicates the whole image.
bpp - bits per pixel, i.e., the bit depth of the image: 1 for a binary bitmap, 8 for grayscale, and 24 for RGB color
Throws: TesseractException
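The raw-buffer overload leaves the pixel layout of buf to the caller. Below is a minimal sketch of preparing buf and rect from a BufferedImage. RawImageArgs and its methods are hypothetical helpers, not part of this library. The layout is taken partly from the comments on TesseractRect and SetImage in Tesseract's baseapi.h (1 bpp packed most significant bit first, with a set bit meaning white) and is partly assumed: rows are stored top to bottom, each padded only to a whole byte so that a row takes ceil(xsize * bpp / 8) bytes, and 24 bpp pixels are stored as R, G, B. The Rec. 601 luma weights and the threshold of 128 are arbitrary choices for the sketch; verify the layout against the Tesseract and Tess4J versions in use.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.nio.ByteBuffer;

/** Hypothetical helper (not part of this library) for the raw-buffer doOCR overload. */
public final class RawImageArgs {

    private RawImageArgs() {}

    /** Bytes in one row, rounded up to a whole byte (assumes no further row padding). */
    public static int bytesPerLine(int xsize, int bpp) {
        return (xsize * bpp + 7) / 8;
    }

    /** Packs the image row by row, top to bottom, at 1, 8 or 24 bits per pixel. */
    public static ByteBuffer pack(BufferedImage bi, int bpp) {
        if (bpp != 1 && bpp != 8 && bpp != 24) {
            throw new IllegalArgumentException("Unsupported bpp: " + bpp);
        }
        int w = bi.getWidth();
        int h = bi.getHeight();
        int stride = bytesPerLine(w, bpp);
        // Direct buffer, zero-filled on allocation; assumed to be handed to native code via JNA
        ByteBuffer buf = ByteBuffer.allocateDirect(stride * h);
        for (int y = 0; y < h; y++) {
            int rowStart = y * stride;
            for (int x = 0; x < w; x++) {
                int rgb = bi.getRGB(x, y); // packed as 0xAARRGGBB in the sRGB space
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                if (bpp == 24) {
                    int i = rowStart + x * 3;
                    buf.put(i, (byte) r);
                    buf.put(i + 1, (byte) g);
                    buf.put(i + 2, (byte) b);
                } else {
                    // Integer Rec. 601 luma weights (an arbitrary choice for this sketch)
                    int luma = (r * 299 + g * 587 + b * 114) / 1000;
                    if (bpp == 8) {
                        buf.put(rowStart + x, (byte) luma);
                    } else if (luma >= 128) {
                        // 1 bpp: leftmost pixel in the MSB; a set bit means white
                        int i = rowStart + x / 8;
                        buf.put(i, (byte) (buf.get(i) | (0x80 >> (x % 8))));
                    }
                }
            }
        }
        return buf;
    }

    /** Makes the documented rule explicit: null or a zero dimension means the whole image. */
    public static Rectangle effectiveRegion(Rectangle rect, int xsize, int ysize) {
        if (rect == null || rect.width == 0 || rect.height == 0) {
            return new Rectangle(0, 0, xsize, ysize);
        }
        return new Rectangle(rect);
    }
}
```

For a color image bi, pack(bi, 24) and effectiveRegion(null, bi.getWidth(), bi.getHeight()) would supply buf and rect for doOCR(bi.getWidth(), bi.getHeight(), buf, rect, 24). The 1 bpp case shows why the row length is rounded up: a 9-pixel-wide binary row needs 2 bytes, not 9 / 8.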