Package net.sourceforge.tess4j
Interface ITesseract
- All Known Implementing Classes:
Tesseract, Tesseract1
public interface ITesseract
An interface that represents common OCR methods.
-
Nested Class Summary
Nested Classes
Modifier and Type / Interface / Description
static enum ITesseract.RenderedFormat - Rendered formats supported by Tesseract.
-
Field Summary
Fields
htmlBeginTag
htmlEndTag
PAGE_SEPARATOR
DOCUMENT_TITLE
-
Method Summary
Modifier and Type / Method / Description
void createDocuments(String[] filenames, String[] outputbases, List<ITesseract.RenderedFormat> formats) - Creates documents for given renderers.
default void createDocuments(String filename, String outputbase, List<ITesseract.RenderedFormat> formats) - Creates documents for given renderers.
List<OCRResult> createDocumentsWithResults(BufferedImage[] bis, String[] filenames, String[] outputbases, List<ITesseract.RenderedFormat> formats, int pageIteratorLevel) - Creates documents with OCR results for given renderers at specified page iterator level.
OCRResult createDocumentsWithResults(BufferedImage bi, String filename, String outputbase, List<ITesseract.RenderedFormat> formats, int pageIteratorLevel) - Creates documents with OCR result for given renderers at specified page iterator level.
List<OCRResult> createDocumentsWithResults(String[] filenames, String[] outputbases, List<ITesseract.RenderedFormat> formats, int pageIteratorLevel) - Creates documents with OCR results for given renderers at specified page iterator level.
OCRResult createDocumentsWithResults(String filename, String outputbase, List<ITesseract.RenderedFormat> formats, int pageIteratorLevel) - Creates documents with OCR result for given renderers at specified page iterator level.
String doOCR(int xsize, int ysize, ByteBuffer buf, int bpp, String filename, List<Rectangle> rects) - Performs OCR operation.
default String doOCR(int xsize, int ysize, ByteBuffer buf, Rectangle rect, int bpp) - Deprecated.
default String doOCR(int xsize, int ysize, ByteBuffer buf, String filename, Rectangle rect, int bpp) - Deprecated.
default String doOCR(BufferedImage bi) - Performs OCR operation.
default String doOCR(BufferedImage bi, Rectangle rect) - Deprecated.
default String doOCR(BufferedImage bi, String filename, List<Rectangle> rects) - Performs OCR operation.
default String doOCR(File imageFile) - Performs OCR operation.
default String doOCR(File inputFile, Rectangle rect) - Deprecated.
String doOCR(File imageFile, List<Rectangle> rects) - Performs OCR operation.
default String doOCR(List<IIOImage> imageList, Rectangle rect) - Deprecated.
default String doOCR(List<IIOImage> imageList, String filename, Rectangle rect) - Deprecated.
String doOCR(List<IIOImage> imageList, String filename, List<List<Rectangle>> roiss) - Performs OCR operation.
getOSD(BufferedImage bi) - Gets the detected orientation of the input image and apparent script (alphabet).
getOSD(File imageFile) - Gets the detected orientation of the input image and apparent script (alphabet).
List<Rectangle> getSegmentedRegions(BufferedImage bi, int pageIteratorLevel) - Gets segmented regions at specified page iterator level.
getWords(BufferedImage bi, int pageIteratorLevel) - Gets recognized words at specified page iterator level.
getWords(List<BufferedImage> biList, int pageIteratorLevel) - Gets recognized words at specified page iterator level.
void setConfigs(List<String> configs) - Sets configs to be passed to Tesseract's Init method.
void setDatapath(String datapath) - Sets tessdata path.
void setLanguage(String language) - Sets language for OCR.
void setOcrEngineMode(int ocrEngineMode) - Sets OCR engine mode.
void setPageSegMode(int mode) - Sets page segmentation mode.
default void setTessVariable(String key, String value) - Deprecated.
void setVariable(String key, String value) - Sets the value of Tesseract's internal parameter.
-
Field Details
-
htmlBeginTag
-
htmlEndTag
-
PAGE_SEPARATOR
-
DOCUMENT_TITLE
-
-
Method Details
-
doOCR
default String doOCR(File imageFile) throws TesseractException
Performs OCR operation.
- Parameters:
imageFile - an image file
- Returns:
the recognized text
- Throws:
TesseractException
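As a rough illustration of calling this overload and handling its checked exception, the sketch below wraps the call behind a small functional interface. `FileOcr` and `recognizeOrDefault` are hypothetical names introduced here so that the example compiles without the Tess4J jar. With the library on the classpath, a method reference such as `tesseract::doOCR` on a configured `Tesseract` instance would be expected to fit the same shape, but that wiring is an assumption and is not shown.

```java
import java.io.File;

public class OcrFallbackSketch {

    /** Hypothetical stand-in that mirrors the shape of doOCR(imageFile). */
    @FunctionalInterface
    public interface FileOcr {
        String recognize(File imageFile) throws Exception;
    }

    /** Returns the trimmed recognized text, or the fallback if recognition fails. */
    public static String recognizeOrDefault(FileOcr ocr, File imageFile, String fallback) {
        try {
            String text = ocr.recognize(imageFile);
            return text == null ? fallback : text.trim();
        } catch (Exception e) {
            // With Tess4J this would typically be a TesseractException.
            return fallback;
        }
    }

    public static void main(String[] args) {
        // A fake engine keeps the sketch runnable without native libraries.
        FileOcr fake = f -> "  text from " + f.getName() + "\n";
        System.out.println(recognizeOrDefault(fake, new File("scan.png"), ""));
    }
}
```

Returning a fallback is only one possible policy; callers that need to distinguish an empty page from a failed recognition would rethrow or log the exception instead.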
-
doOCR
@Deprecated default String doOCR(File inputFile, Rectangle rect) throws TesseractException
Deprecated.
Performs OCR operation.
- Parameters:
inputFile - an image file
rect - the bounding rectangle that defines the region of the image to be recognized. A rectangle of zero dimension or null indicates the whole image.
- Returns:
the recognized text
- Throws:
TesseractException
-
doOCR
String doOCR(File imageFile, List<Rectangle> rects) throws TesseractException
Performs OCR operation.
- Parameters:
imageFile - an image file
rects - list of bounding rectangles that define the regions of the image to be recognized. A rectangle of zero dimension or null indicates the whole image.
- Returns:
the recognized text
- Throws:
TesseractException
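The rule that a null or zero-dimension rectangle stands for the whole image can be sketched with plain `java.awt.Rectangle` operations. The class and method names below are hypothetical, and clipping to the image bounds is an added assumption rather than documented library behavior.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;

public class RegionSketch {

    /** Maps null or empty rectangles to the whole image; otherwise clips to the image bounds. */
    public static Rectangle effectiveRegion(Rectangle rect, int imageWidth, int imageHeight) {
        Rectangle whole = new Rectangle(0, 0, imageWidth, imageHeight);
        if (rect == null || rect.isEmpty()) {
            return whole;
        }
        return rect.intersection(whole);
    }

    /** Applies effectiveRegion to every entry; a null or empty list yields the whole image once. */
    public static List<Rectangle> effectiveRegions(List<Rectangle> rects, int imageWidth, int imageHeight) {
        List<Rectangle> result = new ArrayList<>();
        if (rects == null || rects.isEmpty()) {
            result.add(new Rectangle(0, 0, imageWidth, imageHeight));
            return result;
        }
        for (Rectangle r : rects) {
            result.add(effectiveRegion(r, imageWidth, imageHeight));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Rectangle> rects = new ArrayList<>();
        rects.add(new Rectangle(10, 10, 50, 20)); // a header strip
        rects.add(null);                          // whole image
        System.out.println(effectiveRegions(rects, 100, 80));
    }
}
```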
-
doOCR
default String doOCR(BufferedImage bi) throws TesseractException
Performs OCR operation.
- Parameters:
bi - a buffered image
- Returns:
the recognized text
- Throws:
TesseractException
-
doOCR
@Deprecated default String doOCR(BufferedImage bi, Rectangle rect) throws TesseractException
Deprecated.
Performs OCR operation.
- Parameters:
bi - a buffered image
rect - the bounding rectangle that defines the region of the image to be recognized. A rectangle of zero dimension or null indicates the whole image.
- Returns:
the recognized text
- Throws:
TesseractException
-
doOCR
default String doOCR(BufferedImage bi, String filename, List<Rectangle> rects) throws TesseractException
Performs OCR operation.
- Parameters:
bi - a buffered image
filename - input file name. Needed only for training and reading a UNLV zone file.
rects - list of bounding rectangles that define the regions of the image to be recognized. A rectangle of zero dimension or null indicates the whole image.
- Returns:
the recognized text
- Throws:
TesseractException
-
doOCR
@Deprecated default String doOCR(List<IIOImage> imageList, Rectangle rect) throws TesseractException
Deprecated.
Performs OCR operation.
- Parameters:
imageList - a list of IIOImage objects
rect - the bounding rectangle that defines the region of the image to be recognized. A rectangle of zero dimension or null indicates the whole image.
- Returns:
the recognized text
- Throws:
TesseractException
-
doOCR
@Deprecated default String doOCR(List<IIOImage> imageList, String filename, Rectangle rect) throws TesseractException
Deprecated.
Performs OCR operation.
- Parameters:
imageList - a list of IIOImage objects
filename - input file name. Needed only for training and reading a UNLV zone file.
rect - the bounding rectangle that defines the region of the image to be recognized. A rectangle of zero dimension or null indicates the whole image.
- Returns:
the recognized text
- Throws:
TesseractException
-
doOCR
String doOCR(List<IIOImage> imageList, String filename, List<List<Rectangle>> roiss) throws TesseractException
Performs OCR operation.
- Parameters:
imageList - a list of IIOImage objects
filename - input file name. Needed only for training and reading a UNLV zone file.
roiss - list of lists of bounding rectangles that define the regions of the images to be recognized. A rectangle of zero dimension or null indicates the whole image.
- Returns:
the recognized text
- Throws:
TesseractException
-
doOCR
@Deprecated default String doOCR(int xsize, int ysize, ByteBuffer buf, Rectangle rect, int bpp) throws TesseractException
Deprecated.
Performs OCR operation. Use SetImage, (optionally) SetRectangle, and one or more of the Get*Text functions.
- Parameters:
xsize - width of image
ysize - height of image
buf - pixel data
rect - the bounding rectangle that defines the region of the image to be recognized. A rectangle of zero dimension or null indicates the whole image.
bpp - bits per pixel; represents the bit depth of the image, with 1 for binary bitmap, 8 for gray, and 24 for color RGB.
- Returns:
the recognized text
- Throws:
TesseractException
-
doOCR
@Deprecated default String doOCR(int xsize, int ysize, ByteBuffer buf, String filename, Rectangle rect, int bpp) throws TesseractException
Deprecated.
Performs OCR operation. Use SetImage, (optionally) SetRectangle, and one or more of the Get*Text functions.
- Parameters:
xsize - width of image
ysize - height of image
buf - pixel data
filename - input file name. Needed only for training and reading a UNLV zone file.
rect - the bounding rectangle that defines the region of the image to be recognized. A rectangle of zero dimension or null indicates the whole image.
bpp - bits per pixel; represents the bit depth of the image, with 1 for binary bitmap, 8 for gray, and 24 for color RGB.
- Returns:
the recognized text
- Throws:
TesseractException
-
doOCR
String doOCR(int xsize, int ysize, ByteBuffer buf, int bpp, String filename, List<Rectangle> rects) throws TesseractException
Performs OCR operation. Use SetImage, (optionally) SetRectangle, and one or more of the Get*Text functions.
- Parameters:
xsize - width of image
ysize - height of image
buf - pixel data
bpp - bits per pixel; represents the bit depth of the image, with 1 for binary bitmap, 8 for gray, and 24 for color RGB.
filename - input file name. Needed only for training and reading a UNLV zone file.
rects - list of bounding rectangles that define the regions of the image to be recognized. A rectangle of zero dimension or null indicates the whole image.
- Returns:
the recognized text
- Throws:
TesseractException
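To make the relationship between `xsize`, `ysize`, `bpp`, and `buf` concrete, here is a minimal sketch that computes the buffer size for tightly packed rows and copies an 8-bit grayscale `BufferedImage` into a direct `ByteBuffer`. The class and method names are hypothetical, and the row layout (each row rounded up to a whole byte, no extra padding) is an assumption, not something the documentation above states.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferByte;
import java.nio.ByteBuffer;

public class PixelBufferSketch {

    /** Bytes needed when each row is packed and rounded up to a whole byte (assumed layout). */
    public static int expectedBytes(int xsize, int ysize, int bpp) {
        if (bpp != 1 && bpp != 8 && bpp != 24) {
            throw new IllegalArgumentException("bpp must be 1, 8, or 24 but was " + bpp);
        }
        int bytesPerLine = (xsize * bpp + 7) / 8;
        return bytesPerLine * ysize;
    }

    /** Copies the pixels of a TYPE_BYTE_GRAY image into a direct buffer (bpp = 8). */
    public static ByteBuffer grayBuffer(BufferedImage gray) {
        if (gray.getType() != BufferedImage.TYPE_BYTE_GRAY) {
            throw new IllegalArgumentException("expected a TYPE_BYTE_GRAY image");
        }
        byte[] pixels = ((DataBufferByte) gray.getRaster().getDataBuffer()).getData();
        ByteBuffer buf = ByteBuffer.allocateDirect(pixels.length);
        buf.put(pixels);
        buf.flip(); // make the buffer readable from position 0
        return buf;
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(4, 3, BufferedImage.TYPE_BYTE_GRAY);
        ByteBuffer buf = grayBuffer(img);
        System.out.println(buf.remaining() + " bytes, expected "
                + expectedBytes(img.getWidth(), img.getHeight(), 8));
    }
}
```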
-
setDatapath
void setDatapath(String datapath)
Sets tessdata path.
- Parameters:
datapath - the tessdata path to set
-
setLanguage
void setLanguage(String language)
Sets language for OCR.
- Parameters:
language - the language code, which follows the ISO 639-3 standard.
-
setOcrEngineMode
void setOcrEngineMode(int ocrEngineMode)
Sets OCR engine mode.
- Parameters:
ocrEngineMode - the OcrEngineMode to set
-
setPageSegMode
void setPageSegMode(int mode)
Sets page segmentation mode.
- Parameters:
mode - the page segmentation mode to set
-
setTessVariable
@Deprecated default void setTessVariable(String key, String value)
Deprecated. Use setVariable(java.lang.String,java.lang.String) instead.
Sets the value of Tesseract's internal parameter.
- Parameters:
key - variable name, e.g., tessedit_create_hocr, tessedit_char_whitelist, etc.
value - value for corresponding variable, e.g., "1", "0", "0123456789", etc.
-
setVariable
void setVariable(String key, String value)
Sets the value of Tesseract's internal parameter.
- Parameters:
key - variable name, e.g., tessedit_create_hocr, tessedit_char_whitelist, etc.
value - value for corresponding variable, e.g., "1", "0", "0123456789", etc.
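One way to keep several key/value settings together is to collect them in a map and apply them through a setter callback. The sketch below uses the variable names and values quoted above; `VariableSketch`, `exampleVariables`, and `applyAll` are hypothetical helpers. With Tess4J available, passing `tesseract::setVariable` as the callback should fit the `BiConsumer<String, String>` shape, which is an assumption based on the signature documented here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

public class VariableSketch {

    /** Example settings taken from the parameter description: hOCR output and a digit whitelist. */
    public static Map<String, String> exampleVariables() {
        Map<String, String> vars = new LinkedHashMap<>(); // keeps insertion order
        vars.put("tessedit_create_hocr", "1");
        vars.put("tessedit_char_whitelist", "0123456789");
        return vars;
    }

    /** Applies every key/value pair through the given setter, in map order. */
    public static void applyAll(Map<String, String> vars, BiConsumer<String, String> setter) {
        vars.forEach(setter);
    }

    public static void main(String[] args) {
        // Printing stands in for a real engine's setVariable(key, value).
        applyAll(exampleVariables(), (k, v) -> System.out.println(k + " = " + v));
    }
}
```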
-
setConfigs
void setConfigs(List<String> configs)
Sets configs to be passed to Tesseract's Init method.
- Parameters:
configs - list of config filenames, e.g., "digits", "bazaar", "quiet"
-
createDocuments
default void createDocuments(String filename, String outputbase, List<ITesseract.RenderedFormat> formats) throws TesseractException
Creates documents for given renderers.
- Parameters:
filename - input image
outputbase - output filename without extension
formats - types of renderers
- Throws:
TesseractException
-
createDocuments
void createDocuments(String[] filenames, String[] outputbases, List<ITesseract.RenderedFormat> formats) throws TesseractException
Creates documents for given renderers.
- Parameters:
filenames - array of input files
outputbases - array of output filenames without extension
formats - types of renderers
- Throws:
TesseractException
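Since `outputbases` holds output filenames without extensions, a caller usually has to derive them from the input names. The following sketch shows one way to do that with plain string handling. `OutputBaseSketch` and its methods are hypothetical, and the idea that `filenames[i]` pairs with `outputbases[i]` is an assumption drawn from the parallel-array signature.

```java
public class OutputBaseSketch {

    /** Strips the last extension from the file-name part of a path, leaving directories intact. */
    public static String outputBase(String filename) {
        int sep = Math.max(filename.lastIndexOf('/'), filename.lastIndexOf('\\'));
        int dot = filename.lastIndexOf('.');
        // Only strip when the dot belongs to the file name and is not its first character.
        return dot > sep + 1 ? filename.substring(0, dot) : filename;
    }

    /** Builds an array of output bases parallel to the input file names. */
    public static String[] outputBases(String[] filenames) {
        String[] bases = new String[filenames.length];
        for (int i = 0; i < filenames.length; i++) {
            bases[i] = outputBase(filenames[i]);
        }
        return bases;
    }

    public static void main(String[] args) {
        String[] inputs = {"scans/page1.tif", "scans/page2.png"};
        for (String base : outputBases(inputs)) {
            System.out.println(base);
        }
    }
}
```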
-
createDocumentsWithResults
OCRResult createDocumentsWithResults(BufferedImage bi, String filename, String outputbase, List<ITesseract.RenderedFormat> formats, int pageIteratorLevel) throws TesseractException
Creates documents with OCR result for given renderers at specified page iterator level.
- Parameters:
bi - input buffered image
filename - filename (optional)
outputbase - output filename without extension
formats - types of renderers
pageIteratorLevel - TessPageIteratorLevel enum
- Returns:
OCR result
- Throws:
TesseractException
-
createDocumentsWithResults
List<OCRResult> createDocumentsWithResults(BufferedImage[] bis, String[] filenames, String[] outputbases, List<ITesseract.RenderedFormat> formats, int pageIteratorLevel) throws TesseractException
Creates documents with OCR results for given renderers at specified page iterator level.
- Parameters:
bis - array of input buffered images
filenames - array of filenames
outputbases - array of output filenames without extension
formats - types of renderers
pageIteratorLevel - TessPageIteratorLevel enum
- Returns:
list of OCR results
- Throws:
TesseractException
-
createDocumentsWithResults
OCRResult createDocumentsWithResults(String filename, String outputbase, List<ITesseract.RenderedFormat> formats, int pageIteratorLevel) throws TesseractException
Creates documents with OCR result for given renderers at specified page iterator level.
- Parameters:
filename - input file
outputbase - output filename without extension
formats - types of renderers
pageIteratorLevel - TessPageIteratorLevel enum
- Returns:
OCR result
- Throws:
TesseractException
-
createDocumentsWithResults
List<OCRResult> createDocumentsWithResults(String[] filenames, String[] outputbases, List<ITesseract.RenderedFormat> formats, int pageIteratorLevel) throws TesseractException
Creates documents with OCR results for given renderers at specified page iterator level.
- Parameters:
filenames - array of input files
outputbases - array of output filenames without extension
formats - types of renderers
pageIteratorLevel - TessPageIteratorLevel enum
- Returns:
list of OCR results
- Throws:
TesseractException
-
getSegmentedRegions
List<Rectangle> getSegmentedRegions(BufferedImage bi, int pageIteratorLevel) throws TesseractException
Gets segmented regions at specified page iterator level.
- Parameters:
bi - input buffered image
pageIteratorLevel - TessPageIteratorLevel enum
- Returns:
list of Rectangle
- Throws:
TesseractException
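The `pageIteratorLevel` argument is a plain `int`. In Tesseract's own API the page iterator levels run from block (0) through paragraph (1), text line (2), and word (3) to symbol (4). The sketch below mirrors those values in a local enum so the mapping is visible. `PageLevelSketch.Level` is a hypothetical type for illustration only; real code would normally use the constants that Tess4J itself provides for `TessPageIteratorLevel`.

```java
public class PageLevelSketch {

    /** Local mirror of Tesseract's page iterator levels (values 0 to 4). */
    public enum Level {
        BLOCK(0), PARA(1), TEXTLINE(2), WORD(3), SYMBOL(4);

        private final int value;

        Level(int value) {
            this.value = value;
        }

        /** The int to pass as pageIteratorLevel. */
        public int value() {
            return value;
        }

        /** Looks up a level by its int value, rejecting anything outside 0 to 4. */
        public static Level fromValue(int value) {
            for (Level level : values()) {
                if (level.value == value) {
                    return level;
                }
            }
            throw new IllegalArgumentException("Unknown page iterator level: " + value);
        }
    }

    public static void main(String[] args) {
        for (Level level : Level.values()) {
            System.out.println(level + " -> " + level.value());
        }
    }
}
```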
-
getWords
Gets recognized words at specified page iterator level.
- Parameters:
bi - input buffered image
pageIteratorLevel - TessPageIteratorLevel enum
- Returns:
list of Word
-
getWords
Gets recognized words at specified page iterator level.
- Parameters:
biList - list of input buffered images
pageIteratorLevel - TessPageIteratorLevel enum
- Returns:
list of Word
-
getOSD
Gets the detected orientation of the input image and apparent script (alphabet).
- Parameters:
imageFile - an image file
- Returns:
image orientation and script name
-
getOSD
Gets the detected orientation of the input image and apparent script (alphabet).
- Parameters:
bi - a buffered image
- Returns:
image orientation and script name
-