public class Tesseract extends java.lang.Object implements ITesseract
An object layer on top of TessAPI that provides character recognition support for common image formats and for multi-page TIFF images, beyond the uncompressed, binary TIFF format supported by the Tesseract OCR engine. The extended capabilities are provided by the Java Advanced Imaging Image I/O Tools.

Support for PDF documents is available through Ghost4J, a JNA wrapper for GPL Ghostscript, which should be installed and included in the system path.

Any program that uses the library will need to ensure that the required libraries (the .jar files for jna, jai-imageio, and ghost4j) are in its compile and run-time classpath.

Fields: htmlBeginTag, htmlEndTag

| Modifier and Type | Method and Description |
|---|---|
| java.lang.String | doOCR(java.awt.image.BufferedImage bi): Performs OCR operation. |
| java.lang.String | doOCR(java.awt.image.BufferedImage bi, java.awt.Rectangle rect): Performs OCR operation. |
| java.lang.String | doOCR(java.io.File imageFile): Performs OCR operation. |
| java.lang.String | doOCR(java.io.File imageFile, java.awt.Rectangle rect): Performs OCR operation. |
| java.lang.String | doOCR(int xsize, int ysize, java.nio.ByteBuffer buf, java.awt.Rectangle rect, int bpp): Performs OCR operation. |
| java.lang.String | doOCR(java.util.List<javax.imageio.IIOImage> imageList, java.awt.Rectangle rect): Performs OCR operation. |
| static Tesseract | getInstance(): Gets an instance of the class library. |
| void | setConfigs(java.util.List<java.lang.String> configs): Sets configs to be passed to Tesseract's Init method. |
| void | setDatapath(java.lang.String datapath): Sets path to tessdata. |
| void | setHocr(boolean hocr): Enables hocr output. |
| void | setLanguage(java.lang.String language): Sets language for OCR. |
| void | setOcrEngineMode(int ocrEngineMode): Sets OCR engine mode. |
| void | setPageSegMode(int mode): Sets page segmentation mode. |
| void | setTessVariable(java.lang.String key, java.lang.String value): Sets the value of Tesseract's internal parameter. |
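
setLanguage takes a three-letter language code following ISO 639-3. As a minimal sketch, and not part of this class, a code could be derived from a java.util.Locale. The helper name TessLanguageCodes is hypothetical. Locale.getISO3Language() returns ISO 639-2/T codes, which match the ISO 639-3 codes for many individual languages, but some Tesseract language data uses other names (for example chi_sim), so the result should be checked against the installed tessdata files.

```java
import java.util.Locale;

/** Hypothetical helper, not part of the documented API. */
final class TessLanguageCodes {
    private TessLanguageCodes() {}

    /**
     * Returns the three-letter code of the locale's language, e.g. "eng" for English.
     * Assumption: the matching language data file exists in the tessdata directory.
     */
    static String threeLetterCode(Locale locale) {
        // getISO3Language() returns "" when the locale carries no language.
        String code = locale.getISO3Language();
        if (code.isEmpty()) {
            throw new IllegalArgumentException("Locale has no language: " + locale);
        }
        return code;
    }
}
```

The returned string could then be passed to setLanguage, for instance setLanguage(TessLanguageCodes.threeLetterCode(Locale.GERMAN)).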
public static Tesseract getInstance()
Gets an instance of the class library.

public void setDatapath(java.lang.String datapath)
Sets path to tessdata.
Specified by: setDatapath in interface ITesseract
Parameters:
datapath - the tessdata path to set

public void setLanguage(java.lang.String language)
Sets language for OCR.
Specified by: setLanguage in interface ITesseract
Parameters:
language - the language code, which follows the ISO 639-3 standard

public void setOcrEngineMode(int ocrEngineMode)
Sets OCR engine mode.
Specified by: setOcrEngineMode in interface ITesseract
Parameters:
ocrEngineMode - the OcrEngineMode to set

public void setPageSegMode(int mode)
Sets page segmentation mode.
Specified by: setPageSegMode in interface ITesseract
Parameters:
mode - the page segmentation mode to set

public void setHocr(boolean hocr)
Enables hocr output.
Parameters:
hocr - true to enable hocr output; false to disable it

public void setTessVariable(java.lang.String key,
                            java.lang.String value)
Sets the value of Tesseract's internal parameter.
Specified by: setTessVariable in interface ITesseract
Parameters:
key - variable name, e.g., tessedit_create_hocr, tessedit_char_whitelist, etc.
value - value for corresponding variable, e.g., "1", "0", "0123456789", etc.

public void setConfigs(java.util.List<java.lang.String> configs)
Sets configs to be passed to Tesseract's Init method.
Specified by: setConfigs in interface ITesseract
Parameters:
configs - list of config filenames, e.g., "digits", "bazaar", "quiet"

public java.lang.String doOCR(java.io.File imageFile)
throws TesseractException
Performs OCR operation.
Specified by: doOCR in interface ITesseract
Parameters:
imageFile - an image file
Throws:
TesseractException

public java.lang.String doOCR(java.io.File imageFile,
java.awt.Rectangle rect)
throws TesseractException
Performs OCR operation.
Specified by: doOCR in interface ITesseract
Parameters:
imageFile - an image file
rect - the bounding rectangle that defines the region of the image to be recognized. A rectangle of zero dimension or null indicates the whole image.
Throws:
TesseractException

public java.lang.String doOCR(java.awt.image.BufferedImage bi)
throws TesseractException
Performs OCR operation.
Specified by: doOCR in interface ITesseract
Parameters:
bi - a buffered image
Throws:
TesseractException

public java.lang.String doOCR(java.awt.image.BufferedImage bi,
java.awt.Rectangle rect)
throws TesseractException
Performs OCR operation.
Specified by: doOCR in interface ITesseract
Parameters:
bi - a buffered image
rect - the bounding rectangle that defines the region of the image to be recognized. A rectangle of zero dimension or null indicates the whole image.
Throws:
TesseractException

public java.lang.String doOCR(java.util.List<javax.imageio.IIOImage> imageList,
java.awt.Rectangle rect)
throws TesseractException
Performs OCR operation.
Specified by: doOCR in interface ITesseract
Parameters:
imageList - a list of IIOImage objects
rect - the bounding rectangle that defines the region of the image to be recognized. A rectangle of zero dimension or null indicates the whole image.
Throws:
TesseractException

public java.lang.String doOCR(int xsize,
int ysize,
java.nio.ByteBuffer buf,
java.awt.Rectangle rect,
int bpp)
throws TesseractException
Performs OCR operation using SetImage, (optionally) SetRectangle, and one or more of the Get*Text functions.
Specified by: doOCR in interface ITesseract
Parameters:
xsize - width of image
ysize - height of image
buf - pixel data
rect - the bounding rectangle that defines the region of the image to be recognized. A rectangle of zero dimension or null indicates the whole image.
bpp - bits per pixel, which represents the bit depth of the image: 1 for binary bitmap, 8 for gray, and 24 for color RGB
Throws:
TesseractException
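
The raw-buffer overload needs pixel data whose layout matches the bpp argument. The following is a minimal sketch of one way to prepare 8-bpp gray data from a BufferedImage using only the JDK. The class name GrayPixelBuffer is hypothetical, and the row-major, top-to-bottom layout without row padding, the direct buffer, and the luminance weights are all assumptions of this sketch rather than documented requirements.

```java
import java.awt.image.BufferedImage;
import java.nio.ByteBuffer;

/** Hypothetical helper, not part of the documented API. */
final class GrayPixelBuffer {
    /** Bit depth of the data produced below: 8 for gray, per the bpp description. */
    static final int BPP = 8;

    private GrayPixelBuffer() {}

    /** Packs the image as one gray byte per pixel, row by row from the top. */
    static ByteBuffer toGrayBuffer(BufferedImage src) {
        int width = src.getWidth();
        int height = src.getHeight();
        // Assumption: a direct buffer is used because the data is handed to native code.
        ByteBuffer buf = ByteBuffer.allocateDirect(width * height);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of a common luminance weighting.
                buf.put((byte) ((r * 299 + g * 587 + b * 114) / 1000));
            }
        }
        buf.flip(); // make the written bytes readable from position 0
        return buf;
    }
}
```

Under these assumptions, the buffer could be passed as doOCR(image.getWidth(), image.getHeight(), buf, null, GrayPixelBuffer.BPP), where the null rectangle selects the whole image.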