Package net.sourceforge.tess4j
Interface ITesseract
-
All Known Implementing Classes:
Tesseract, Tesseract1
public interface ITesseract
An interface that represents common OCR methods.
-
-
Nested Class Summary
Modifier and Type | Interface | Description
static class | ITesseract.RenderedFormat | Rendered formats supported by Tesseract.
-
Field Summary
Modifier and Type | Field | Description
static java.lang.String | htmlBeginTag |
static java.lang.String | htmlEndTag |
-
Method Summary
Modifier and Type | Method | Description
void | createDocuments(java.lang.String[] filenames, java.lang.String[] outputbases, java.util.List<ITesseract.RenderedFormat> formats) | Creates documents for given renderers.
void | createDocuments(java.lang.String filename, java.lang.String outputbase, java.util.List<ITesseract.RenderedFormat> formats) | Creates documents for given renderers.
java.util.List<OCRResult> | createDocumentsWithResults(java.awt.image.BufferedImage[] bis, java.lang.String[] filenames, java.lang.String[] outputbases, java.util.List<ITesseract.RenderedFormat> formats, int pageIteratorLevel) | Creates documents with OCR results for given renderers at specified page iterator level.
OCRResult | createDocumentsWithResults(java.awt.image.BufferedImage bi, java.lang.String filename, java.lang.String outputbase, java.util.List<ITesseract.RenderedFormat> formats, int pageIteratorLevel) | Creates documents with OCR result for given renderers at specified page iterator level.
java.util.List<OCRResult> | createDocumentsWithResults(java.lang.String[] filenames, java.lang.String[] outputbases, java.util.List<ITesseract.RenderedFormat> formats, int pageIteratorLevel) | Creates documents with OCR results for given renderers at specified page iterator level.
OCRResult | createDocumentsWithResults(java.lang.String filename, java.lang.String outputbase, java.util.List<ITesseract.RenderedFormat> formats, int pageIteratorLevel) | Creates documents with OCR result for given renderers at specified page iterator level.
java.lang.String | doOCR(int xsize, int ysize, java.nio.ByteBuffer buf, java.awt.Rectangle rect, int bpp) | Performs OCR operation.
java.lang.String | doOCR(int xsize, int ysize, java.nio.ByteBuffer buf, java.lang.String filename, java.awt.Rectangle rect, int bpp) | Performs OCR operation.
java.lang.String | doOCR(java.awt.image.BufferedImage bi) | Performs OCR operation.
java.lang.String | doOCR(java.awt.image.BufferedImage bi, java.awt.Rectangle rect) | Performs OCR operation.
java.lang.String | doOCR(java.io.File imageFile) | Performs OCR operation.
java.lang.String | doOCR(java.io.File imageFile, java.awt.Rectangle rect) | Performs OCR operation.
java.lang.String | doOCR(java.util.List<javax.imageio.IIOImage> imageList, java.awt.Rectangle rect) | Performs OCR operation.
java.lang.String | doOCR(java.util.List<javax.imageio.IIOImage> imageList, java.lang.String filename, java.awt.Rectangle rect) | Performs OCR operation.
java.util.List<java.awt.Rectangle> | getSegmentedRegions(java.awt.image.BufferedImage bi, int pageIteratorLevel) | Gets segmented regions at specified page iterator level.
java.util.List<Word> | getWords(java.awt.image.BufferedImage bi, int pageIteratorLevel) | Gets recognized words at specified page iterator level.
void | setConfigs(java.util.List<java.lang.String> configs) | Sets configs to be passed to Tesseract's Init method.
void | setDatapath(java.lang.String datapath) | Sets tessdata path.
void | setLanguage(java.lang.String language) | Sets language for OCR.
void | setOcrEngineMode(int ocrEngineMode) | Sets OCR engine mode.
void | setPageSegMode(int mode) | Sets page segmentation mode.
void | setTessVariable(java.lang.String key, java.lang.String value) | Sets the value of Tesseract's internal parameter.
-
-
-
Field Detail
-
htmlBeginTag
static final java.lang.String htmlBeginTag
See Also: Constant Field Values
-
htmlEndTag
static final java.lang.String htmlEndTag
See Also: Constant Field Values
-
-
Method Detail
-
doOCR
java.lang.String doOCR(java.io.File imageFile) throws TesseractException
Performs OCR operation.
Parameters:
imageFile - an image file
Returns:
the recognized text
Throws:
TesseractException
-
doOCR
java.lang.String doOCR(java.io.File imageFile, java.awt.Rectangle rect) throws TesseractException
Performs OCR operation.
Parameters:
imageFile - an image file
rect - the bounding rectangle that defines the region of the image to be recognized. A rectangle of zero dimension or null indicates the whole image.
Returns:
the recognized text
Throws:
TesseractException
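The rect convention shared by the doOCR overloads can be illustrated without Tess4J on the classpath. The following is a minimal sketch, assuming that "zero dimension" corresponds to java.awt.Rectangle.isEmpty(); the class and method names (RegionSketch, effectiveRegion) are hypothetical and not part of the library.

```java
import java.awt.Rectangle;

public class RegionSketch {
    /**
     * Returns the region that would be recognized for an image of the given
     * size under the documented convention: the whole image when rect is null
     * or empty, otherwise a copy of rect.
     * Assumption: "zero dimension" is interpreted as Rectangle.isEmpty().
     */
    static Rectangle effectiveRegion(Rectangle rect, int imageWidth, int imageHeight) {
        if (rect == null || rect.isEmpty()) {
            return new Rectangle(0, 0, imageWidth, imageHeight);
        }
        return new Rectangle(rect);
    }

    public static void main(String[] args) {
        // null selects the whole 640x480 image
        System.out.println(effectiveRegion(null, 640, 480));
        // a non-empty rectangle is used as given
        System.out.println(effectiveRegion(new Rectangle(10, 20, 100, 50), 640, 480));
    }
}
```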
-
doOCR
java.lang.String doOCR(java.awt.image.BufferedImage bi) throws TesseractException
Performs OCR operation.
Parameters:
bi - a buffered image
Returns:
the recognized text
Throws:
TesseractException
-
doOCR
java.lang.String doOCR(java.awt.image.BufferedImage bi, java.awt.Rectangle rect) throws TesseractException
Performs OCR operation.
Parameters:
bi - a buffered image
rect - the bounding rectangle that defines the region of the image to be recognized. A rectangle of zero dimension or null indicates the whole image.
Returns:
the recognized text
Throws:
TesseractException
-
doOCR
java.lang.String doOCR(java.util.List<javax.imageio.IIOImage> imageList, java.awt.Rectangle rect) throws TesseractException
Performs OCR operation.
Parameters:
imageList - a list of IIOImage objects
rect - the bounding rectangle that defines the region of the image to be recognized. A rectangle of zero dimension or null indicates the whole image.
Returns:
the recognized text
Throws:
TesseractException
-
doOCR
java.lang.String doOCR(java.util.List<javax.imageio.IIOImage> imageList, java.lang.String filename, java.awt.Rectangle rect) throws TesseractException
Performs OCR operation.
Parameters:
imageList - a list of IIOImage objects
filename - input file name. Needed only for training and reading a UNLV zone file.
rect - the bounding rectangle that defines the region of the image to be recognized. A rectangle of zero dimension or null indicates the whole image.
Returns:
the recognized text
Throws:
TesseractException
-
doOCR
java.lang.String doOCR(int xsize, int ysize, java.nio.ByteBuffer buf, java.awt.Rectangle rect, int bpp) throws TesseractException
Performs OCR operation. Use SetImage, (optionally) SetRectangle, and one or more of the Get*Text functions.
Parameters:
xsize - width of image
ysize - height of image
buf - pixel data
rect - the bounding rectangle that defines the region of the image to be recognized. A rectangle of zero dimension or null indicates the whole image.
bpp - bits per pixel, i.e. the bit depth of the image: 1 for binary bitmap, 8 for gray, and 24 for color RGB.
Returns:
the recognized text
Throws:
TesseractException
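Preparing the xsize, ysize, buf, and bpp arguments for the ByteBuffer overloads can be sketched with the JDK alone. The example below is one possible approach, not the library's own conversion: it converts an image to 8-bit grayscale and copies the pixels into a direct buffer, assuming rows are packed without padding. The class and method names (GrayBufferSketch, toGrayBuffer) are hypothetical.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferByte;
import java.nio.ByteBuffer;

public class GrayBufferSketch {
    /**
     * Converts any BufferedImage to 8-bit grayscale and returns its pixels
     * in a direct ByteBuffer, one byte per pixel, row by row.
     */
    static ByteBuffer toGrayBuffer(BufferedImage src) {
        BufferedImage gray = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        g.drawImage(src, 0, 0, null); // color conversion happens during drawing
        g.dispose();

        byte[] pixels = ((DataBufferByte) gray.getRaster().getDataBuffer()).getData();
        ByteBuffer buf = ByteBuffer.allocateDirect(pixels.length);
        buf.put(pixels);
        buf.flip(); // make the buffer readable from position 0
        return buf;
    }

    public static void main(String[] args) {
        BufferedImage image = new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB);
        ByteBuffer buf = toGrayBuffer(image);
        int xsize = image.getWidth();
        int ysize = image.getHeight();
        int bpp = 8; // gray, per the bpp parameter description
        System.out.println(xsize + "x" + ysize + ", bpp=" + bpp + ", bytes=" + buf.remaining());
        // With an ITesseract instance named ocr (hypothetical variable), the call
        // would then be: ocr.doOCR(xsize, ysize, buf, null, bpp);
    }
}
```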
-
doOCR
java.lang.String doOCR(int xsize, int ysize, java.nio.ByteBuffer buf, java.lang.String filename, java.awt.Rectangle rect, int bpp) throws TesseractException
Performs OCR operation. Use SetImage, (optionally) SetRectangle, and one or more of the Get*Text functions.
Parameters:
xsize - width of image
ysize - height of image
buf - pixel data
filename - input file name. Needed only for training and reading a UNLV zone file.
rect - the bounding rectangle that defines the region of the image to be recognized. A rectangle of zero dimension or null indicates the whole image.
bpp - bits per pixel, i.e. the bit depth of the image: 1 for binary bitmap, 8 for gray, and 24 for color RGB.
Returns:
the recognized text
Throws:
TesseractException
-
setDatapath
void setDatapath(java.lang.String datapath)
Sets tessdata path.
Parameters:
datapath - the tessdata path to set
-
setLanguage
void setLanguage(java.lang.String language)
Sets language for OCR.
Parameters:
language - the language code, which follows the ISO 639-3 standard.
-
setOcrEngineMode
void setOcrEngineMode(int ocrEngineMode)
Sets OCR engine mode.
Parameters:
ocrEngineMode - the OcrEngineMode to set
-
setPageSegMode
void setPageSegMode(int mode)
Sets page segmentation mode.
Parameters:
mode - the page segmentation mode to set
-
setTessVariable
void setTessVariable(java.lang.String key, java.lang.String value)
Sets the value of Tesseract's internal parameter.
Parameters:
key - variable name, e.g., tessedit_create_hocr, tessedit_char_whitelist, etc.
value - value for corresponding variable, e.g., "1", "0", "0123456789", etc.
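Because both key and value are plain strings, several variables can be collected in a map and applied one by one. The sketch below uses a BiConsumer as a stand-in for setTessVariable so that it runs without Tess4J; the class and method names (TessVariablesSketch, digitsOnly, apply) are hypothetical, and the chosen values are only the examples listed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

public class TessVariablesSketch {
    /** Builds an ordered set of variables using the example keys from the docs. */
    static Map<String, String> digitsOnly() {
        Map<String, String> vars = new LinkedHashMap<>();
        vars.put("tessedit_char_whitelist", "0123456789");
        vars.put("tessedit_create_hocr", "0");
        return vars;
    }

    /** Passes every key/value pair to the given setter, in insertion order. */
    static void apply(Map<String, String> vars, BiConsumer<String, String> setter) {
        vars.forEach(setter);
    }

    public static void main(String[] args) {
        // Stand-in setter that only prints. With an ITesseract instance named
        // ocr (hypothetical variable), the setter would be ocr::setTessVariable.
        apply(digitsOnly(), (key, value) -> System.out.println(key + " = " + value));
    }
}
```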
-
setConfigs
void setConfigs(java.util.List<java.lang.String> configs)
Sets configs to be passed to Tesseract's Init method.
Parameters:
configs - list of config filenames, e.g., "digits", "bazaar", "quiet"
-
createDocuments
void createDocuments(java.lang.String filename, java.lang.String outputbase, java.util.List<ITesseract.RenderedFormat> formats) throws TesseractException
Creates documents for given renderers.
Parameters:
filename - input image
outputbase - output filename without extension
formats - types of renderers
Throws:
TesseractException
-
createDocuments
void createDocuments(java.lang.String[] filenames, java.lang.String[] outputbases, java.util.List<ITesseract.RenderedFormat> formats) throws TesseractException
Creates documents for given renderers.
Parameters:
filenames - array of input files
outputbases - array of output filenames without extension
formats - types of renderers
Throws:
TesseractException
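Since outputbase is an output filename without extension, a common need is deriving it from the input filename. The following sketch shows one way to do that with the JDK alone, for a single name and for an array. It assumes that outputbases[i] is meant to pair with filenames[i], which the description above does not state explicitly. The class and method names (OutputBaseSketch, outputBase, outputBases) are hypothetical.

```java
import java.util.Arrays;

public class OutputBaseSketch {
    /**
     * Strips the extension from the last path segment, if there is one.
     * Dots in directory names and a leading dot in the file name are kept.
     */
    static String outputBase(String filename) {
        int sep = Math.max(filename.lastIndexOf('/'), filename.lastIndexOf('\\'));
        int dot = filename.lastIndexOf('.');
        return (dot > sep + 1) ? filename.substring(0, dot) : filename;
    }

    /** Maps each input filename to its output base, preserving order. */
    static String[] outputBases(String[] filenames) {
        return Arrays.stream(filenames)
                .map(OutputBaseSketch::outputBase)
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] inputs = {"scans/page1.png", "page2.tif"};
        System.out.println(Arrays.toString(outputBases(inputs)));
    }
}
```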
-
createDocumentsWithResults
OCRResult createDocumentsWithResults(java.awt.image.BufferedImage bi, java.lang.String filename, java.lang.String outputbase, java.util.List<ITesseract.RenderedFormat> formats, int pageIteratorLevel) throws TesseractException
Creates documents with OCR result for given renderers at specified page iterator level.
Parameters:
bi - input buffered image
filename - filename (optional)
outputbase - output filename without extension
formats - types of renderers
pageIteratorLevel - TessPageIteratorLevel enum
Returns:
OCR result
Throws:
TesseractException
-
createDocumentsWithResults
java.util.List<OCRResult> createDocumentsWithResults(java.awt.image.BufferedImage[] bis, java.lang.String[] filenames, java.lang.String[] outputbases, java.util.List<ITesseract.RenderedFormat> formats, int pageIteratorLevel) throws TesseractException
Creates documents with OCR results for given renderers at specified page iterator level.
Parameters:
bis - array of input buffered images
filenames - array of filenames
outputbases - array of output filenames without extension
formats - types of renderers
pageIteratorLevel - TessPageIteratorLevel enum
Returns:
list of OCR results
Throws:
TesseractException
-
createDocumentsWithResults
OCRResult createDocumentsWithResults(java.lang.String filename, java.lang.String outputbase, java.util.List<ITesseract.RenderedFormat> formats, int pageIteratorLevel) throws TesseractException
Creates documents with OCR result for given renderers at specified page iterator level.
Parameters:
filename - input file
outputbase - output filename without extension
formats - types of renderers
pageIteratorLevel - TessPageIteratorLevel enum
Returns:
OCR result
Throws:
TesseractException
-
createDocumentsWithResults
java.util.List<OCRResult> createDocumentsWithResults(java.lang.String[] filenames, java.lang.String[] outputbases, java.util.List<ITesseract.RenderedFormat> formats, int pageIteratorLevel) throws TesseractException
Creates documents with OCR results for given renderers at specified page iterator level.
Parameters:
filenames - array of input files
outputbases - array of output filenames without extension
formats - types of renderers
pageIteratorLevel - TessPageIteratorLevel enum
Returns:
list of OCR results
Throws:
TesseractException
-
getSegmentedRegions
java.util.List<java.awt.Rectangle> getSegmentedRegions(java.awt.image.BufferedImage bi, int pageIteratorLevel) throws TesseractException
Gets segmented regions at specified page iterator level.
Parameters:
bi - input buffered image
pageIteratorLevel - TessPageIteratorLevel enum
Returns:
list of Rectangle
Throws:
TesseractException
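A returned list of Rectangle can be used to cut the source image into sub-images, for example to inspect each region or to recognize it separately. The sketch below shows this with the JDK alone and takes the rectangles as a plain list, so it does not depend on Tess4J. The class and method names (RegionCropSketch, crop) are hypothetical, and clipping to the image bounds is a defensive assumption rather than documented behavior.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RegionCropSketch {
    /**
     * Returns one sub-image per region. Regions are clipped to the image
     * bounds first; regions that fall entirely outside are skipped.
     */
    static List<BufferedImage> crop(BufferedImage image, List<Rectangle> regions) {
        Rectangle bounds = new Rectangle(0, 0, image.getWidth(), image.getHeight());
        List<BufferedImage> crops = new ArrayList<>();
        for (Rectangle region : regions) {
            Rectangle clipped = region.intersection(bounds);
            if (!clipped.isEmpty()) {
                crops.add(image.getSubimage(
                        clipped.x, clipped.y, clipped.width, clipped.height));
            }
        }
        return crops;
    }

    public static void main(String[] args) {
        BufferedImage image = new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB);
        // Stand-in for the list returned by getSegmentedRegions
        List<Rectangle> regions = Arrays.asList(
                new Rectangle(2, 3, 4, 5),
                new Rectangle(8, 8, 5, 5));
        for (BufferedImage part : crop(image, regions)) {
            System.out.println(part.getWidth() + "x" + part.getHeight());
        }
    }
}
```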
-
getWords
java.util.List<Word> getWords(java.awt.image.BufferedImage bi, int pageIteratorLevel)
Gets recognized words at specified page iterator level.
Parameters:
bi - input buffered image
pageIteratorLevel - TessPageIteratorLevel enum
Returns:
list of Word
-
-