Package net.sourceforge.tess4j
Interface TessAPI
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface net.sourceforge.tess4j.ITessAPI
ITessAPI.CANCEL_FUNC, ITessAPI.EANYCODE_CHAR, ITessAPI.ETEXT_DESC, ITessAPI.TessBaseAPI, ITessAPI.TessCancelFunc, ITessAPI.TessChoiceIterator, ITessAPI.TessMutableIterator, ITessAPI.TessOcrEngineMode, ITessAPI.TessOrientation, ITessAPI.TessPageIterator, ITessAPI.TessPageIteratorLevel, ITessAPI.TessPageSegMode, ITessAPI.TessParagraphJustification, ITessAPI.TessPolyBlockType, ITessAPI.TessProgressFunc, ITessAPI.TessResultIterator, ITessAPI.TessResultRenderer, ITessAPI.TessTextlineOrder, ITessAPI.TessWritingDirection, ITessAPI.TimeVal
-
-
Method Summary
Modifier and Type Method Description
ITessAPI.TessResultRenderer
TessAltoRendererCreate(java.lang.String outputbase)
int
TessBaseAPIAdaptToWordStr(ITessAPI.TessBaseAPI handle, int mode, java.lang.String wordstr)
Applies the given word to the adaptive classifier if possible.
com.sun.jna.ptr.IntByReference
TessBaseAPIAllWordConfidences(ITessAPI.TessBaseAPI handle)
Returns an array of all word confidences, terminated by -1.
ITessAPI.TessPageIterator
TessBaseAPIAnalyseLayout(ITessAPI.TessBaseAPI handle)
Runs page layout analysis in the mode set by SetPageSegMode.
void
TessBaseAPIClear(ITessAPI.TessBaseAPI handle)
Free up recognition results and any stored image data, without actually freeing any recognition data that would be time-consuming to reload.
void
TessBaseAPIClearAdaptiveClassifier(ITessAPI.TessBaseAPI handle)
Call between pages or documents, etc., to free up memory and forget adaptive data.
void
TessBaseAPIClearPersistentCache(ITessAPI.TessBaseAPI handle)
Clear any library-level memory caches.
ITessAPI.TessBaseAPI
TessBaseAPICreate()
Creates an instance of the base class for all Tesseract APIs.
void
TessBaseAPIDelete(ITessAPI.TessBaseAPI handle)
Disposes the TesseractAPI instance.
int
TessBaseAPIDetectOrientationScript(ITessAPI.TessBaseAPI handle, java.nio.IntBuffer orient_deg, java.nio.FloatBuffer orient_conf, com.sun.jna.ptr.PointerByReference script_name, java.nio.FloatBuffer script_conf)
Detect the orientation of the input image and apparent script (alphabet).
void
TessBaseAPIEnd(ITessAPI.TessBaseAPI handle)
Close down tesseract and free up all memory.
com.sun.jna.Pointer
TessBaseAPIGetAltoText(ITessAPI.TessBaseAPI handle, int page_number)
Make an XML-formatted string with Alto markup from the internal data structures.
com.sun.jna.ptr.PointerByReference
TessBaseAPIGetAvailableLanguagesAsVector(ITessAPI.TessBaseAPI handle)
Returns the available languages in the vector of STRINGs.
int
TessBaseAPIGetBoolVariable(ITessAPI.TessBaseAPI handle, java.lang.String name, java.nio.IntBuffer value)
Get the value of an internal bool parameter.
com.sun.jna.Pointer
TessBaseAPIGetBoxText(ITessAPI.TessBaseAPI handle, int page_number)
The recognized text is returned as a char* which is coded as a UTF8 box file and must be freed with the delete [] operator.
net.sourceforge.lept4j.Boxa
TessBaseAPIGetComponentImages(ITessAPI.TessBaseAPI handle, int level, int text_only, com.sun.jna.ptr.PointerByReference pixa, com.sun.jna.ptr.PointerByReference blockids)
Get the given level kind of components (block, textline, word etc.) as a Leptonica-style Boxa, Pixa pair, in reading order.
net.sourceforge.lept4j.Boxa
TessBaseAPIGetComponentImages1(ITessAPI.TessBaseAPI handle, int level, int text_only, int raw_image, int raw_padding, com.sun.jna.ptr.PointerByReference pixa, com.sun.jna.ptr.PointerByReference blockids, com.sun.jna.ptr.PointerByReference paraids)
Get the given level kind of components (block, textline, word etc.) as a Leptonica-style Boxa, Pixa pair, in reading order.
net.sourceforge.lept4j.Boxa
TessBaseAPIGetConnectedComponents(ITessAPI.TessBaseAPI handle, com.sun.jna.ptr.PointerByReference cc)
Gets the individual connected (text) components (created after the page segmentation step, but before recognition) as a Leptonica-style Boxa, Pixa pair, in reading order.
java.lang.String
TessBaseAPIGetDatapath(ITessAPI.TessBaseAPI handle)
int
TessBaseAPIGetDoubleVariable(ITessAPI.TessBaseAPI handle, java.lang.String name, java.nio.DoubleBuffer value)
Get the value of an internal double parameter.
com.sun.jna.Pointer
TessBaseAPIGetHOCRText(ITessAPI.TessBaseAPI handle, int page_number)
Make an HTML-formatted string with hOCR markup from the internal data structures.
java.lang.String
TessBaseAPIGetInitLanguagesAsString(ITessAPI.TessBaseAPI handle)
Returns the languages string used in the last valid initialization.
net.sourceforge.lept4j.Pix
TessBaseAPIGetInputImage(ITessAPI.TessBaseAPI handle)
java.lang.String
TessBaseAPIGetInputName(ITessAPI.TessBaseAPI handle)
These functions are required for searchable PDF output.
int
TessBaseAPIGetIntVariable(ITessAPI.TessBaseAPI handle, java.lang.String name, java.nio.IntBuffer value)
Get the value of an internal int parameter.
ITessAPI.TessResultIterator
TessBaseAPIGetIterator(ITessAPI.TessBaseAPI handle)
Get a reading-order iterator to the results of LayoutAnalysis and/or Recognize.
com.sun.jna.ptr.PointerByReference
TessBaseAPIGetLoadedLanguagesAsVector(ITessAPI.TessBaseAPI handle)
Returns the loaded languages in the vector of STRINGs.
com.sun.jna.Pointer
TessBaseAPIGetLSTMBoxText(ITessAPI.TessBaseAPI handle, int page_number)
Create a UTF8 box file for LSTM training from the internal data structures.
ITessAPI.TessMutableIterator
TessBaseAPIGetMutableIterator(ITessAPI.TessBaseAPI handle)
Get a mutable iterator to the results of LayoutAnalysis and/or Recognize.
int
TessBaseAPIGetPageSegMode(ITessAPI.TessBaseAPI handle)
Return the current page segmentation mode.
net.sourceforge.lept4j.Boxa
TessBaseAPIGetRegions(ITessAPI.TessBaseAPI handle, com.sun.jna.ptr.PointerByReference pixa)
Get the result of page layout analysis as a Leptonica-style Boxa, Pixa pair, in reading order.
int
TessBaseAPIGetSourceYResolution(ITessAPI.TessBaseAPI handle)
java.lang.String
TessBaseAPIGetStringVariable(ITessAPI.TessBaseAPI handle, java.lang.String name)
Get the value of an internal string parameter.
net.sourceforge.lept4j.Boxa
TessBaseAPIGetStrips(ITessAPI.TessBaseAPI handle, com.sun.jna.ptr.PointerByReference pixa, com.sun.jna.ptr.PointerByReference blockids)
Get textlines and strips of image regions as a Leptonica-style Boxa, Pixa pair, in reading order.
int
TessBaseAPIGetTextDirection(ITessAPI.TessBaseAPI handle, java.nio.IntBuffer out_offset, java.nio.FloatBuffer out_slope)
Gets text direction.
net.sourceforge.lept4j.Boxa
TessBaseAPIGetTextlines(ITessAPI.TessBaseAPI handle, com.sun.jna.ptr.PointerByReference pixa, com.sun.jna.ptr.PointerByReference blockids)
Get the textlines as a Leptonica-style Boxa, Pixa pair, in reading order.
net.sourceforge.lept4j.Boxa
TessBaseAPIGetTextlines1(ITessAPI.TessBaseAPI handle, int raw_image, int raw_padding, com.sun.jna.ptr.PointerByReference pixa, com.sun.jna.ptr.PointerByReference blockids, com.sun.jna.ptr.PointerByReference paraids)
Get the textlines as a Leptonica-style Boxa, Pixa pair, in reading order.
net.sourceforge.lept4j.Pix
TessBaseAPIGetThresholdedImage(ITessAPI.TessBaseAPI handle)
ONLY available after SetImage if you have Leptonica installed.
int
TessBaseAPIGetThresholdedImageScaleFactor(ITessAPI.TessBaseAPI handle)
com.sun.jna.Pointer
TessBaseAPIGetTsvText(ITessAPI.TessBaseAPI handle, int page_number)
Make a TSV-formatted string from the internal data structures.
java.lang.String
TessBaseAPIGetUnichar(ITessAPI.TessBaseAPI handle, int unichar_id)
Gets the string of the specified unichar.
com.sun.jna.Pointer
TessBaseAPIGetUNLVText(ITessAPI.TessBaseAPI handle)
The recognized text is returned as a char* which is coded as UNLV format Latin-1 with specific reject and suspect codes and must be freed with the delete [] operator.
com.sun.jna.Pointer
TessBaseAPIGetUTF8Text(ITessAPI.TessBaseAPI handle)
The recognized text is returned as a char* which is coded as UTF-8 and must be freed with the delete [] operator.
net.sourceforge.lept4j.Boxa
TessBaseAPIGetWords(ITessAPI.TessBaseAPI handle, com.sun.jna.ptr.PointerByReference pixa)
Get the words as a Leptonica-style Boxa, Pixa pair, in reading order.
com.sun.jna.Pointer
TessBaseAPIGetWordStrBoxText(ITessAPI.TessBaseAPI handle, int page_number)
Create a UTF8 box file with WordStr strings from the internal data structures.
int
TessBaseAPIInit1(ITessAPI.TessBaseAPI handle, java.lang.String datapath, java.lang.String language, int oem, com.sun.jna.ptr.PointerByReference configs, int configs_size)
Instances are now mostly thread-safe and totally independent, but some global parameters remain.
int
TessBaseAPIInit2(ITessAPI.TessBaseAPI handle, java.lang.String datapath, java.lang.String language, int oem)
int
TessBaseAPIInit3(ITessAPI.TessBaseAPI handle, java.lang.String datapath, java.lang.String language)
int
TessBaseAPIInit4(ITessAPI.TessBaseAPI handle, java.lang.String datapath, java.lang.String language, int oem, com.sun.jna.ptr.PointerByReference configs, int configs_size, com.sun.jna.ptr.PointerByReference vars_vec, com.sun.jna.ptr.PointerByReference vars_values, com.ochafik.lang.jnaerator.runtime.NativeSize vars_vec_size, int set_only_non_debug_params)
void
TessBaseAPIInitForAnalysePage(ITessAPI.TessBaseAPI handle)
Init only for page layout analysis.
int
TessBaseAPIInitLangMod(ITessAPI.TessBaseAPI handle, java.lang.String datapath, java.lang.String language)
Init only the lang model component of Tesseract.
int
TessBaseAPIIsValidWord(ITessAPI.TessBaseAPI handle, java.lang.String word)
Check whether a word is valid according to Tesseract's language model.
int
TessBaseAPIMeanTextConf(ITessAPI.TessBaseAPI handle)
Returns the average word confidence for Tesseract page result.
void
TessBaseAPIPrintVariablesToFile(ITessAPI.TessBaseAPI handle, java.lang.String filename)
Print Tesseract parameters to the given file.
Note: Must not be the first method called after the instance is created.
int
TessBaseAPIProcessPage(ITessAPI.TessBaseAPI handle, net.sourceforge.lept4j.Pix pix, int page_index, java.lang.String filename, java.lang.String retry_config, int timeout_millisec, ITessAPI.TessResultRenderer renderer)
int
TessBaseAPIProcessPages(ITessAPI.TessBaseAPI handle, java.lang.String filename, java.lang.String retry_config, int timeout_millisec, ITessAPI.TessResultRenderer renderer)
Recognizes all the pages in the named file, as a multi-page tiff or list of filenames, or single image, and gets the appropriate kind of text according to parameters: tessedit_create_boxfile, tessedit_make_boxes_from_boxes, tessedit_write_unlv, tessedit_create_hocr.
void
TessBaseAPIReadConfigFile(ITessAPI.TessBaseAPI handle, java.lang.String filename, int init_only)
Read a "config" file containing a set of param, value pairs.
int
TessBaseAPIRecognize(ITessAPI.TessBaseAPI handle, ITessAPI.ETEXT_DESC monitor)
Recognize the image from SetAndThresholdImage, generating Tesseract internal structures.
int
TessBaseAPIRecognizeForChopTest(ITessAPI.TessBaseAPI handle, ITessAPI.ETEXT_DESC monitor)
Variant on Recognize used for testing chopper.
com.sun.jna.Pointer
TessBaseAPIRect(ITessAPI.TessBaseAPI handle, java.nio.ByteBuffer imagedata, int bytes_per_pixel, int bytes_per_line, int left, int top, int width, int height)
Recognize a rectangle from an image and return the result as a string.
void
TessBaseAPISetImage(ITessAPI.TessBaseAPI handle, java.nio.ByteBuffer imagedata, int width, int height, int bytes_per_pixel, int bytes_per_line)
Provide an image for Tesseract to recognize.
void
TessBaseAPISetImage2(ITessAPI.TessBaseAPI handle, net.sourceforge.lept4j.Pix pix)
Provide an image for Tesseract to recognize.
void
TessBaseAPISetInputImage(ITessAPI.TessBaseAPI handle, net.sourceforge.lept4j.Pix pix)
void
TessBaseAPISetInputName(ITessAPI.TessBaseAPI handle, java.lang.String name)
Set the name of the input file.
void
TessBaseAPISetOutputName(ITessAPI.TessBaseAPI handle, java.lang.String name)
Set the name of the bonus output files.
void
TessBaseAPISetPageSegMode(ITessAPI.TessBaseAPI handle, int mode)
Set the current page segmentation mode.
void
TessBaseAPISetRectangle(ITessAPI.TessBaseAPI handle, int left, int top, int width, int height)
Restrict recognition to a sub-rectangle of the image.
void
TessBaseAPISetSourceResolution(ITessAPI.TessBaseAPI handle, int ppi)
Set the resolution of the source image in pixels per inch so font size information can be calculated in results.
int
TessBaseAPISetVariable(ITessAPI.TessBaseAPI handle, java.lang.String name, java.lang.String value)
Set the value of an internal "parameter." Supply the name of the parameter and the value as a string, just as you would in a config file.
ITessAPI.TessResultRenderer
TessBoxTextRendererCreate(java.lang.String outputbase)
float
TessChoiceIteratorConfidence(ITessAPI.TessChoiceIterator handle)
void
TessChoiceIteratorDelete(ITessAPI.TessChoiceIterator handle)
java.lang.String
TessChoiceIteratorGetUTF8Text(ITessAPI.TessChoiceIterator handle)
int
TessChoiceIteratorNext(ITessAPI.TessChoiceIterator handle)
void
TessDeleteIntArray(java.nio.IntBuffer arr)
Deallocates the memory block occupied by integer array.
void
TessDeleteResultRenderer(ITessAPI.TessResultRenderer renderer)
void
TessDeleteText(com.sun.jna.Pointer text)
Deallocates the memory block occupied by text.
void
TessDeleteTextArray(com.sun.jna.ptr.PointerByReference arr)
Deallocates the memory block occupied by text array.
ITessAPI.TessResultRenderer
TessHOcrRendererCreate(java.lang.String outputbase)
ITessAPI.TessResultRenderer
TessHOcrRendererCreate2(java.lang.String outputbase, int font_info)
ITessAPI.TessResultRenderer
TessLSTMBoxRendererCreate(java.lang.String outputbase)
ITessAPI.ETEXT_DESC
TessMonitorCreate()
void
TessMonitorDelete(ITessAPI.ETEXT_DESC monitor)
com.sun.jna.Pointer
TessMonitorGetCancelThis(ITessAPI.ETEXT_DESC monitor)
int
TessMonitorGetProgress(ITessAPI.ETEXT_DESC monitor)
void
TessMonitorSetCancelFunc(ITessAPI.ETEXT_DESC monitor, ITessAPI.TessCancelFunc cancelFunc)
void
TessMonitorSetCancelThis(ITessAPI.ETEXT_DESC monitor, com.sun.jna.Pointer cancelThis)
void
TessMonitorSetDeadlineMSecs(ITessAPI.ETEXT_DESC monitor, int deadline)
void
TessMonitorSetProgressFunc(ITessAPI.ETEXT_DESC monitor, ITessAPI.TessProgressFunc progressFunc)
int
TessPageIteratorBaseline(ITessAPI.TessPageIterator handle, int level, java.nio.IntBuffer x1, java.nio.IntBuffer y1, java.nio.IntBuffer x2, java.nio.IntBuffer y2)
Returns the baseline of the current object at the given level.
void
TessPageIteratorBegin(ITessAPI.TessPageIterator handle)
Resets the iterator to point to the start of the page.
int
TessPageIteratorBlockType(ITessAPI.TessPageIterator handle)
Returns the type of the current block.
int
TessPageIteratorBoundingBox(ITessAPI.TessPageIterator handle, int level, java.nio.IntBuffer left, java.nio.IntBuffer top, java.nio.IntBuffer right, java.nio.IntBuffer bottom)
Returns the bounding rectangle of the current object at the given level in coordinates of the original image.
ITessAPI.TessPageIterator
TessPageIteratorCopy(ITessAPI.TessPageIterator handle)
Creates a copy of the specified PageIterator instance.
void
TessPageIteratorDelete(ITessAPI.TessPageIterator handle)
Deletes the specified PageIterator instance.
net.sourceforge.lept4j.Pix
TessPageIteratorGetBinaryImage(ITessAPI.TessPageIterator handle, int level)
Returns a binary image of the current object at the given level.
net.sourceforge.lept4j.Pix
TessPageIteratorGetImage(ITessAPI.TessPageIterator handle, int level, int padding, net.sourceforge.lept4j.Pix original_image, java.nio.IntBuffer left, java.nio.IntBuffer top)
Returns an image of the current object at the given level in greyscale if available in the input.
int
TessPageIteratorIsAtBeginningOf(ITessAPI.TessPageIterator handle, int level)
Returns TRUE if the iterator is at the start of an object at the given level.
int
TessPageIteratorIsAtFinalElement(ITessAPI.TessPageIterator handle, int level, int element)
Returns whether the iterator is positioned at the last element in a given level.
int
TessPageIteratorNext(ITessAPI.TessPageIterator handle, int level)
Moves to the start of the next object at the given level in the page hierarchy, and returns false if the end of the page was reached.
void
TessPageIteratorOrientation(ITessAPI.TessPageIterator handle, java.nio.IntBuffer orientation, java.nio.IntBuffer writing_direction, java.nio.IntBuffer textline_order, java.nio.FloatBuffer deskew_angle)
Returns the orientation.
void
TessPageIteratorParagraphInfo(ITessAPI.TessPageIterator handle, java.nio.IntBuffer justification, java.nio.IntBuffer is_list_item, java.nio.IntBuffer is_crown, java.nio.IntBuffer first_line_indent)
Gets paragraph information.
ITessAPI.TessResultRenderer
TessPDFRendererCreate(java.lang.String outputbase, java.lang.String datadir, int textonly)
float
TessResultIteratorConfidence(ITessAPI.TessResultIterator handle, int level)
Returns the mean confidence of the current object at the given level.
ITessAPI.TessResultIterator
TessResultIteratorCopy(ITessAPI.TessResultIterator handle)
Creates a copy of the specified ResultIterator instance.
void
TessResultIteratorDelete(ITessAPI.TessResultIterator handle)
Deletes the specified ResultIterator handle.
ITessAPI.TessChoiceIterator
TessResultIteratorGetChoiceIterator(ITessAPI.TessResultIterator handle)
ITessAPI.TessPageIterator
TessResultIteratorGetPageIterator(ITessAPI.TessResultIterator handle)
Gets the PageIterator of the specified ResultIterator instance.
ITessAPI.TessPageIterator
TessResultIteratorGetPageIteratorConst(ITessAPI.TessResultIterator handle)
Gets the PageIterator of the specified ResultIterator instance.
com.sun.jna.Pointer
TessResultIteratorGetUTF8Text(ITessAPI.TessResultIterator handle, int level)
Returns the null-terminated UTF-8 encoded text string for the current object at the given level.
int
TessResultIteratorNext(ITessAPI.TessResultIterator handle, int level)
int
TessResultIteratorSymbolIsDropcap(ITessAPI.TessResultIterator handle)
Returns TRUE if the current symbol is a dropcap.
int
TessResultIteratorSymbolIsSubscript(ITessAPI.TessResultIterator handle)
Returns TRUE if the current symbol is a subscript.
int
TessResultIteratorSymbolIsSuperscript(ITessAPI.TessResultIterator handle)
Returns TRUE if the current symbol is a superscript.
java.lang.String
TessResultIteratorWordFontAttributes(ITessAPI.TessResultIterator handle, java.nio.IntBuffer is_bold, java.nio.IntBuffer is_italic, java.nio.IntBuffer is_underlined, java.nio.IntBuffer is_monospace, java.nio.IntBuffer is_serif, java.nio.IntBuffer is_smallcaps, java.nio.IntBuffer pointsize, java.nio.IntBuffer font_id)
Returns the font attributes of the current word.
int
TessResultIteratorWordIsFromDictionary(ITessAPI.TessResultIterator handle)
Returns TRUE if the current word was found in a dictionary.
int
TessResultIteratorWordIsNumeric(ITessAPI.TessResultIterator handle)
Returns TRUE if the current word is numeric.
java.lang.String
TessResultIteratorWordRecognitionLanguage(ITessAPI.TessResultIterator handle)
int
TessResultRendererAddImage(ITessAPI.TessResultRenderer renderer, com.sun.jna.ptr.PointerByReference api)
int
TessResultRendererBeginDocument(ITessAPI.TessResultRenderer renderer, java.lang.String title)
int
TessResultRendererEndDocument(ITessAPI.TessResultRenderer renderer)
com.sun.jna.Pointer
TessResultRendererExtention(ITessAPI.TessResultRenderer renderer)
int
TessResultRendererImageNum(ITessAPI.TessResultRenderer renderer)
void
TessResultRendererInsert(ITessAPI.TessResultRenderer renderer, ITessAPI.TessResultRenderer next)
ITessAPI.TessResultRenderer
TessResultRendererNext(ITessAPI.TessResultRenderer renderer)
com.sun.jna.Pointer
TessResultRendererTitle(ITessAPI.TessResultRenderer renderer)
ITessAPI.TessResultRenderer
TessTextRendererCreate(java.lang.String outputbase)
ITessAPI.TessResultRenderer
TessTsvRendererCreate(java.lang.String outputbase)
ITessAPI.TessResultRenderer
TessUnlvRendererCreate(java.lang.String outputbase)
java.lang.String
TessVersion()
Gets the version identifier.
ITessAPI.TessResultRenderer
TessWordStrBoxRendererCreate(java.lang.String outputbase)
-
-
-
Field Detail
-
INSTANCE
static final TessAPI INSTANCE
An instance of the class library.
-
-
Method Detail
-
TessVersion
java.lang.String TessVersion()
Gets the version identifier.
- Returns:
the version identifier
-
TessDeleteText
void TessDeleteText(com.sun.jna.Pointer text)
Deallocates the memory block occupied by text.
- Parameters:
text - the pointer to text
-
TessDeleteTextArray
void TessDeleteTextArray(com.sun.jna.ptr.PointerByReference arr)
Deallocates the memory block occupied by text array.
- Parameters:
arr - text array pointer reference
-
TessDeleteIntArray
void TessDeleteIntArray(java.nio.IntBuffer arr)
Deallocates the memory block occupied by integer array.
- Parameters:
arr - int array
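The method summary states that TessBaseAPIAllWordConfidences returns an array of word confidences terminated by -1. The sketch below shows one way to consume such a sentinel-terminated array once its values have been copied into a plain Java `int[]`. The class `WordConfidences`, its methods, and the sample values are hypothetical and are not part of Tess4J; how the values are copied out of native memory, and when TessDeleteIntArray is called on the native array, is deliberately left out.

```java
import java.util.Arrays;

/** Illustrative helper, not part of Tess4J. */
final class WordConfidences {

    /** Copies the values that precede the first -1 terminator. */
    static int[] untilTerminator(int[] raw) {
        int n = 0;
        while (n < raw.length && raw[n] != -1) {
            n++;
        }
        return Arrays.copyOf(raw, n);
    }

    /** Arithmetic mean of the confidences, or -1 when there are none. */
    static double mean(int[] confidences) {
        if (confidences.length == 0) {
            return -1;
        }
        long sum = 0;
        for (int c : confidences) {
            sum += c;
        }
        return (double) sum / confidences.length;
    }

    public static void main(String[] args) {
        // Made-up values standing in for data read from the native array.
        int[] raw = {92, 87, 95, -1, 0, 0};
        int[] confidences = untilTerminator(raw);
        System.out.println(Arrays.toString(confidences) + " mean=" + mean(confidences));
    }
}
```

Stopping at the sentinel rather than at the buffer length matters because the native side does not report a length separately.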
-
TessTextRendererCreate
ITessAPI.TessResultRenderer TessTextRendererCreate(java.lang.String outputbase)
-
TessHOcrRendererCreate
ITessAPI.TessResultRenderer TessHOcrRendererCreate(java.lang.String outputbase)
-
TessHOcrRendererCreate2
ITessAPI.TessResultRenderer TessHOcrRendererCreate2(java.lang.String outputbase, int font_info)
-
TessAltoRendererCreate
ITessAPI.TessResultRenderer TessAltoRendererCreate(java.lang.String outputbase)
-
TessTsvRendererCreate
ITessAPI.TessResultRenderer TessTsvRendererCreate(java.lang.String outputbase)
-
TessPDFRendererCreate
ITessAPI.TessResultRenderer TessPDFRendererCreate(java.lang.String outputbase, java.lang.String datadir, int textonly)
-
TessUnlvRendererCreate
ITessAPI.TessResultRenderer TessUnlvRendererCreate(java.lang.String outputbase)
-
TessBoxTextRendererCreate
ITessAPI.TessResultRenderer TessBoxTextRendererCreate(java.lang.String outputbase)
-
TessLSTMBoxRendererCreate
ITessAPI.TessResultRenderer TessLSTMBoxRendererCreate(java.lang.String outputbase)
-
TessWordStrBoxRendererCreate
ITessAPI.TessResultRenderer TessWordStrBoxRendererCreate(java.lang.String outputbase)
-
TessDeleteResultRenderer
void TessDeleteResultRenderer(ITessAPI.TessResultRenderer renderer)
-
TessResultRendererInsert
void TessResultRendererInsert(ITessAPI.TessResultRenderer renderer, ITessAPI.TessResultRenderer next)
-
TessResultRendererNext
ITessAPI.TessResultRenderer TessResultRendererNext(ITessAPI.TessResultRenderer renderer)
-
TessResultRendererBeginDocument
int TessResultRendererBeginDocument(ITessAPI.TessResultRenderer renderer, java.lang.String title)
-
TessResultRendererAddImage
int TessResultRendererAddImage(ITessAPI.TessResultRenderer renderer, com.sun.jna.ptr.PointerByReference api)
-
TessResultRendererEndDocument
int TessResultRendererEndDocument(ITessAPI.TessResultRenderer renderer)
-
TessResultRendererExtention
com.sun.jna.Pointer TessResultRendererExtention(ITessAPI.TessResultRenderer renderer)
-
TessResultRendererTitle
com.sun.jna.Pointer TessResultRendererTitle(ITessAPI.TessResultRenderer renderer)
-
TessResultRendererImageNum
int TessResultRendererImageNum(ITessAPI.TessResultRenderer renderer)
-
TessBaseAPICreate
ITessAPI.TessBaseAPI TessBaseAPICreate()
Creates an instance of the base class for all Tesseract APIs.
- Returns:
the TesseractAPI instance
-
TessBaseAPIDelete
void TessBaseAPIDelete(ITessAPI.TessBaseAPI handle)
Disposes the TesseractAPI instance.
- Parameters:
handle - the TesseractAPI instance
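Because TessBaseAPICreate and TessBaseAPIDelete form a create/dispose pair, callers usually want the dispose call in a `finally` block so the handle is released even when the work in between throws. The following is a minimal sketch of that pattern using a hypothetical stand-in interface (`Api`, with `create` and `delete` playing the roles of the two functions) so that it runs without the native library; it is not the Tess4J interface itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Illustrative only: models the create/delete pairing with a stand-in API. */
final class HandleLifecycle {

    /** Hypothetical stand-in, NOT the Tess4J interface. */
    interface Api {
        Object create();            // role of TessBaseAPICreate()
        void delete(Object handle); // role of TessBaseAPIDelete(handle)
    }

    /** Runs the work against a fresh handle and always deletes the handle. */
    static void withHandle(Api api, Consumer<Object> work) {
        Object handle = api.create();
        try {
            work.accept(handle);
        } finally {
            api.delete(handle);
        }
    }

    /** Records the call order with a fake API, optionally failing mid-way. */
    static List<String> demo(boolean fail) {
        List<String> log = new ArrayList<>();
        Api fake = new Api() {
            public Object create() { log.add("create"); return new Object(); }
            public void delete(Object handle) { log.add("delete"); }
        };
        try {
            withHandle(fake, handle -> {
                log.add("work");
                if (fail) {
                    throw new IllegalStateException("simulated failure");
                }
            });
        } catch (IllegalStateException e) {
            log.add("caught");
        }
        return log;
    }
}
```

In the failing case the log still contains "delete" before "caught", which is the property the `finally` block is there to guarantee.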
-
TessBaseAPISetInputName
void TessBaseAPISetInputName(ITessAPI.TessBaseAPI handle, java.lang.String name)
Set the name of the input file. Needed only for training and reading a UNLV zone file, and for searchable PDF output.
- Parameters:
handle - the TesseractAPI instance
name - name of the input file
-
TessBaseAPIGetInputName
java.lang.String TessBaseAPIGetInputName(ITessAPI.TessBaseAPI handle)
These functions are required for searchable PDF output. We need our hands on the input file so that we can include it in the PDF without transcoding. If that is not possible, we need the original image. Finally, resolution metadata is stored in the PDF so we need that as well.
- Parameters:
handle - the TesseractAPI instance
- Returns:
input file name
-
TessBaseAPISetInputImage
void TessBaseAPISetInputImage(ITessAPI.TessBaseAPI handle, net.sourceforge.lept4j.Pix pix)
-
TessBaseAPIGetInputImage
net.sourceforge.lept4j.Pix TessBaseAPIGetInputImage(ITessAPI.TessBaseAPI handle)
-
TessBaseAPIGetSourceYResolution
int TessBaseAPIGetSourceYResolution(ITessAPI.TessBaseAPI handle)
-
TessBaseAPIGetDatapath
java.lang.String TessBaseAPIGetDatapath(ITessAPI.TessBaseAPI handle)
-
TessBaseAPISetOutputName
void TessBaseAPISetOutputName(ITessAPI.TessBaseAPI handle, java.lang.String name)
Set the name of the bonus output files. Needed only for debugging.
- Parameters:
handle - the TesseractAPI instance
name - name of the output file
-
TessBaseAPISetVariable
int TessBaseAPISetVariable(ITessAPI.TessBaseAPI handle, java.lang.String name, java.lang.String value)
Set the value of an internal "parameter." Supply the name of the parameter and the value as a string, just as you would in a config file. Returns false if the name lookup failed. E.g., SetVariable("tessedit_char_blacklist", "xyz"); to ignore x, y and z. Or SetVariable("classify_bln_numeric_mode", "1"); to set numeric-only mode. SetVariable may be used before Init, but settings will revert to defaults on End().
Note: Must be called after Init(). Only works for non-init variables (init variables should be passed to Init()).
- Parameters:
handle - the TesseractAPI instance
name - name of the parameter
value - variable value
- Returns:
1 on success
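The integer return values on this page follow two different conventions: the variable setters and getters are documented as returning 1 on success, while the Init functions are documented as returning 0 on success and -1 on failure. A small helper can keep the two from being mixed up. The class `TessStatus` and its method names below are hypothetical and are not part of Tess4J; this is a sketch of one way to wrap the status codes, not a required pattern.

```java
/** Illustrative status checks, not part of Tess4J. */
final class TessStatus {

    /** For calls documented as "1 on success", such as the variable accessors. */
    static void requireOne(int status, String what) {
        if (status != 1) {
            throw new IllegalStateException(what + " failed (status " + status + ")");
        }
    }

    /** For calls documented as "0 on success", such as the Init family. */
    static void requireZero(int status, String what) {
        if (status != 0) {
            throw new IllegalStateException(what + " failed (status " + status + ")");
        }
    }
}
```

A call site would then read along the lines of `TessStatus.requireOne(api.TessBaseAPISetVariable(handle, name, value), "SetVariable")`, where `api` and `handle` are whatever instance and handle the caller already holds.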
-
TessBaseAPIGetIntVariable
int TessBaseAPIGetIntVariable(ITessAPI.TessBaseAPI handle, java.lang.String name, java.nio.IntBuffer value)
Get the value of an internal int parameter.
- Parameters:
handle - the TesseractAPI instance
name - name of the variable
value - the int buffer that receives the value
- Returns:
1 on success
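The getter reports success through its return value and delivers the actual value through the `java.nio.IntBuffer` argument, which acts as an out-parameter. The sketch below imitates that calling convention with a hypothetical stand-in (`getIntVariable` backed by a `Map`) so it can run without the native library; the parameter name `example_int_param` is made up, and the stand-in omits the handle argument.

```java
import java.nio.IntBuffer;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: shows the IntBuffer out-parameter convention. */
final class OutParamDemo {

    /**
     * Hypothetical stand-in with the same shape as the getter (minus the handle):
     * writes the value into slot 0 of the buffer and returns 1, or returns 0
     * and leaves the buffer untouched when the name is unknown.
     */
    static int getIntVariable(Map<String, Integer> params, String name, IntBuffer value) {
        Integer found = params.get(name);
        if (found == null) {
            return 0;
        }
        value.put(0, found); // absolute put, does not move the buffer position
        return 1;
    }

    /** Caller side: allocate a one-element buffer, check the status, then read slot 0. */
    static Integer read(Map<String, Integer> params, String name) {
        IntBuffer out = IntBuffer.allocate(1);
        if (getIntVariable(params, name, out) == 1) {
            return out.get(0);
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Integer> params = new HashMap<>();
        params.put("example_int_param", 7);
        System.out.println(read(params, "example_int_param"));
        System.out.println(read(params, "missing"));
    }
}
```

Reading the buffer only after checking the status avoids treating a stale or zero-initialized slot as a real parameter value.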
-
TessBaseAPIGetBoolVariable
int TessBaseAPIGetBoolVariable(ITessAPI.TessBaseAPI handle, java.lang.String name, java.nio.IntBuffer value)
Get the value of an internal bool parameter.
- Parameters:
handle - the TesseractAPI instance
name - pass the name of the variable
value - pass the int buffer value
- Returns:
1 on success
-
TessBaseAPIGetDoubleVariable
int TessBaseAPIGetDoubleVariable(ITessAPI.TessBaseAPI handle, java.lang.String name, java.nio.DoubleBuffer value)
Get the value of an internal double parameter.
- Parameters:
handle - the TesseractAPI instance
name - pass the name of the variable
value - pass the double buffer value
- Returns:
1 on success
-
TessBaseAPIGetStringVariable
java.lang.String TessBaseAPIGetStringVariable(ITessAPI.TessBaseAPI handle, java.lang.String name)
Get the value of an internal string parameter.
- Parameters:
handle - the TesseractAPI instance
name - pass the name of the variable
- Returns:
the string value
-
TessBaseAPIPrintVariablesToFile
void TessBaseAPIPrintVariablesToFile(ITessAPI.TessBaseAPI handle, java.lang.String filename)
Print Tesseract parameters to the given file.
Note: Must not be the first method called after the instance is created.
- Parameters:
handle - the TesseractAPI instance
filename - name of the file where the variables will be persisted
-
TessBaseAPIInit1
int TessBaseAPIInit1(ITessAPI.TessBaseAPI handle, java.lang.String datapath, java.lang.String language, int oem, com.sun.jna.ptr.PointerByReference configs, int configs_size)
Instances are now mostly thread-safe and totally independent, but some global parameters remain. Basically it is safe to use multiple TessBaseAPIs in different threads in parallel, UNLESS you use SetVariable on some of the Params in classify and textord. If you do, then the effect will be to change it for all your instances.
Start tesseract. Returns zero on success and -1 on failure. NOTE that the only members that may be called before Init are those listed above here in the class definition.
It is entirely safe (and eventually will be efficient too) to call Init multiple times on the same instance to change language, or just to reset the classifier. Languages may specify internally that they want to be loaded with one or more other languages, so the ~ sign is available to override that. E.g., if hin were set to load eng by default, then hin+~eng would force loading only hin. The number of loaded languages is limited only by memory, with the caveat that loading additional languages will impact both speed and accuracy, as there is more work to do to decide on the applicable language, and there is more chance of hallucinating incorrect words. WARNING: On changing languages, all Tesseract parameters are reset back to their default values (which may vary between languages). If you have a rare need to set a Variable that controls initialization for a second call to Init, you should explicitly call End() and then use SetVariable before Init.
This is only a very rare use case, since there are very few uses that require any parameters to be set before Init.
If set_only_non_debug_params is true, only params that do not contain "debug" in the name will be set.
- Parameters:
handle - the TesseractAPI instance
datapath - The datapath must be the name of the parent directory of tessdata and must end in /. Any name after the last / will be stripped.
language - The language is (usually) an ISO 639-3 string; NULL defaults to eng. The language may be a string of the form [~]<lang>[+[~]<lang>] indicating that multiple languages are to be loaded. E.g., hin+eng will load Hindi and English.
oem - ocr engine mode
configs - pointer configuration
configs_size - pointer configuration size
- Returns:
0 on success and -1 on initialization failure
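The language argument described above is a small mini-language: entries are joined with `+`, a leading `~` marks a language that should not be loaded, and NULL defaults to eng. The sketch below splits such a string the way the description reads and also appends the trailing `/` that the datapath is documented to require. The class `LanguageSpec` and its members are hypothetical helpers for illustration; they are not part of Tess4J and do not claim to reproduce Tesseract's own parsing.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative parser for the [~]<lang>[+[~]<lang>] form, not part of Tess4J. */
final class LanguageSpec {
    final List<String> load = new ArrayList<>();     // languages requested for loading
    final List<String> suppress = new ArrayList<>(); // languages prefixed with ~

    static LanguageSpec parse(String spec) {
        LanguageSpec result = new LanguageSpec();
        if (spec == null) {
            result.load.add("eng"); // documented default when the language is NULL
            return result;
        }
        for (String part : spec.split("\\+")) {
            if (part.isEmpty()) {
                continue; // tolerate stray separators
            }
            if (part.startsWith("~")) {
                result.suppress.add(part.substring(1));
            } else {
                result.load.add(part);
            }
        }
        return result;
    }

    /** Ensures the datapath ends in "/" as the parameter description requires. */
    static String normalizeDatapath(String datapath) {
        return datapath.endsWith("/") ? datapath : datapath + "/";
    }

    public static void main(String[] args) {
        LanguageSpec spec = parse("hin+~eng");
        System.out.println("load=" + spec.load + " suppress=" + spec.suppress);
        System.out.println(normalizeDatapath("/usr/share"));
    }
}
```

Validating the string on the Java side like this can make a failed Init (return value -1) easier to diagnose, since the native call itself gives no detail about which part of the argument was rejected.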
-
TessBaseAPIInit2
int TessBaseAPIInit2(ITessAPI.TessBaseAPI handle, java.lang.String datapath, java.lang.String language, int oem)
- Parameters:
handle - the TesseractAPI instance
datapath - The datapath must be the name of the parent directory of tessdata and must end in /. Any name after the last / will be stripped.
language - The language is (usually) an ISO 639-3 string; NULL defaults to eng. The language may be a string of the form [~]<lang>[+[~]<lang>] indicating that multiple languages are to be loaded. E.g., hin+eng will load Hindi and English.
oem - ocr engine mode
- Returns:
0 on success and -1 on initialization failure
-
TessBaseAPIInit3
int TessBaseAPIInit3(ITessAPI.TessBaseAPI handle, java.lang.String datapath, java.lang.String language)
- Parameters:
handle - the TesseractAPI instance
datapath - The datapath must be the name of the parent directory of tessdata and must end in /. Any name after the last / will be stripped.
language - The language is (usually) an ISO 639-3 string; NULL defaults to eng. The language may be a string of the form [~]<lang>[+[~]<lang>] indicating that multiple languages are to be loaded. E.g., hin+eng will load Hindi and English.
- Returns:
0 on success and -1 on initialization failure
-
TessBaseAPIInit4
int TessBaseAPIInit4(ITessAPI.TessBaseAPI handle, java.lang.String datapath, java.lang.String language, int oem, com.sun.jna.ptr.PointerByReference configs, int configs_size, com.sun.jna.ptr.PointerByReference vars_vec, com.sun.jna.ptr.PointerByReference vars_values, com.ochafik.lang.jnaerator.runtime.NativeSize vars_vec_size, int set_only_non_debug_params)
- Parameters:
handle - the TesseractAPI instance
datapath - The datapath must be the name of the parent directory of tessdata and must end in /. Any name after the last / will be stripped.
language - The language is (usually) an ISO 639-3 string; NULL defaults to eng. The language may be a string of the form [~]<lang>[+[~]<lang>] indicating that multiple languages are to be loaded. E.g., hin+eng will load Hindi and English.
oem - ocr engine mode
configs - pointer configuration
configs_size - pointer configuration size
vars_vec -
vars_values -
vars_vec_size -
set_only_non_debug_params -
- Returns:
0 on success and -1 on initialization failure
-
TessBaseAPIGetInitLanguagesAsString
java.lang.String TessBaseAPIGetInitLanguagesAsString(ITessAPI.TessBaseAPI handle)
Returns the languages string used in the last valid initialization. If the last initialization specified "deu+hin", then that will be returned. If hin loaded eng automatically as well, then eng will not be included in this list. To find the languages actually loaded, use GetLoadedLanguagesAsVector. The returned string should NOT be deleted.
- Parameters:
handle
- the TesseractAPI instance
- Returns:
- languages as string
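Because the returned value uses the same "+"-separated form as the Init language argument, it can be split back into individual codes. This is a minimal sketch in plain Java; InitLanguages is a hypothetical helper, not part of Tess4J, and it does not interpret the optional ~ prefix.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Tess4J): splits an init-languages string. */
final class InitLanguages {
    private InitLanguages() {}

    /** Splits a string such as "deu+hin" into its codes, skipping empty parts. */
    static List<String> split(String initLanguages) {
        List<String> codes = new ArrayList<>();
        for (String part : initLanguages.split("\\+")) {
            if (!part.isEmpty()) {
                codes.add(part);
            }
        }
        return codes;
    }
}
```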
-
TessBaseAPIGetLoadedLanguagesAsVector
com.sun.jna.ptr.PointerByReference TessBaseAPIGetLoadedLanguagesAsVector(ITessAPI.TessBaseAPI handle)
Returns the loaded languages in the vector of STRINGs. Includes all languages loaded by the last Init, including those loaded as dependencies of other loaded languages.
- Parameters:
handle
- the TesseractAPI instance
- Returns:
- loaded languages as vector
-
TessBaseAPIGetAvailableLanguagesAsVector
com.sun.jna.ptr.PointerByReference TessBaseAPIGetAvailableLanguagesAsVector(ITessAPI.TessBaseAPI handle)
Returns the available languages in the vector of STRINGs.
- Parameters:
handle
- the TesseractAPI instance
- Returns:
- available languages as vector
-
TessBaseAPIInitLangMod
int TessBaseAPIInitLangMod(ITessAPI.TessBaseAPI handle, java.lang.String datapath, java.lang.String language)
Init only the lang model component of Tesseract. The only functions that work after this init are SetVariable and IsValidWord. WARNING: temporary! This function will be removed from here and placed in a separate API at some future time.
- Parameters:
handle
- the TesseractAPI instance
datapath
- The datapath must be the name of the parent directory of tessdata and must end in /. Any name after the last / will be stripped.
language
- The language is (usually) an ISO 639-3 string; NULL defaults to eng. The language may be a string of the form [~]<lang>[+[~]<lang>], indicating that multiple languages are to be loaded. E.g., hin+eng will load Hindi and English.
- Returns:
- status of the language model initialization
-
TessBaseAPIInitForAnalysePage
void TessBaseAPIInitForAnalysePage(ITessAPI.TessBaseAPI handle)
Init only for page layout analysis. Use only for calls to SetImage and AnalysePage. Calls that attempt recognition will generate an error.
- Parameters:
handle
- the TesseractAPI instance
-
TessBaseAPIReadConfigFile
void TessBaseAPIReadConfigFile(ITessAPI.TessBaseAPI handle, java.lang.String filename, int init_only)
Read a "config" file containing a set of param, value pairs. Searches the standard places: tessdata/configs, tessdata/tessconfigs and also accepts a relative or absolute path name. Note: only non-init params will be set (init params are set by Init()).
- Parameters:
handle
- the TesseractAPI instance
filename
- relative or absolute path for the "config" file containing a set of param and value pairs
init_only
-
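To make the idea of "param, value pairs" concrete, the sketch below parses config-style text into a map. It assumes, without confirmation from this page, that each non-blank line holds a parameter name followed by whitespace and its value. ConfigText is a hypothetical helper in plain Java, not part of Tess4J; the parameter names in the usage note are ones mentioned elsewhere on this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Tess4J): reads "name value" lines into a map. */
final class ConfigText {
    private ConfigText() {}

    /** Parses config-style text; assumes one whitespace-separated pair per line. */
    static Map<String, String> parse(String text) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] pair = trimmed.split("\\s+", 2);
            params.put(pair[0], pair.length > 1 ? pair[1] : "");
        }
        return params;
    }
}
```

For example, parsing the two lines "tessedit_pageseg_mode 6" and "tessedit_create_hocr 1" yields a map with those two names as keys.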
-
TessBaseAPISetPageSegMode
void TessBaseAPISetPageSegMode(ITessAPI.TessBaseAPI handle, int mode)
Set the current page segmentation mode. Defaults to PSM_SINGLE_BLOCK. The mode is stored as an IntParam so it can also be modified by ReadConfigFile or SetVariable("tessedit_pageseg_mode", mode as string).
- Parameters:
handle
- the TesseractAPI instance
mode
- tesseract page segmentation mode
-
TessBaseAPIGetPageSegMode
int TessBaseAPIGetPageSegMode(ITessAPI.TessBaseAPI handle)
Return the current page segmentation mode.
- Parameters:
handle
- the TesseractAPI instance
- Returns:
- page segmentation mode value
-
TessBaseAPIRect
com.sun.jna.Pointer TessBaseAPIRect(ITessAPI.TessBaseAPI handle, java.nio.ByteBuffer imagedata, int bytes_per_pixel, int bytes_per_line, int left, int top, int width, int height)
Recognize a rectangle from an image and return the result as a string. May be called many times for a single Init. Currently has no error checking. Greyscale of 8 and color of 24 or 32 bits per pixel may be given. Palette color images will not work properly and must be converted to 24 bit. Binary images of 1 bit per pixel may also be given, but they must be byte packed with the MSB of the first byte being the first pixel, and a 1 represents WHITE. For binary images set bytes_per_pixel=0. The recognized text is returned as a char* which is coded as UTF-8 and must be freed with the delete [] operator.
Note that TesseractRect is the simplified convenience interface. For advanced uses, use SetImage, (optionally) SetRectangle, Recognize, and one or more of the Get*Text functions below.
- Parameters:
handle
- the TesseractAPI instance
imagedata
- image byte buffer
bytes_per_pixel
- bytes per pixel (0 for binary images)
bytes_per_line
- bytes per line
left
- left edge of the rectangle
top
- top edge of the rectangle
width
- width of the rectangle
height
- height of the rectangle
- Returns:
- the pointer to recognized text
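The binary-image rule above (byte packed, MSB of the first byte is the first pixel, 1 means WHITE) can be sketched as a row packer. This is an illustration in plain Java under the assumption that each row is padded to a whole number of bytes with zero bits; BinaryRows is a hypothetical helper, not part of Tess4J.

```java
/** Hypothetical helper (not part of Tess4J): packs 1-bit-per-pixel rows. */
final class BinaryRows {
    private BinaryRows() {}

    /** Bytes needed for one byte-packed row of 1 bpp pixels. */
    static int bytesPerLine(int width) {
        return (width + 7) / 8;
    }

    /** Packs one row; white[x] == true sets the bit for pixel x, MSB first in each byte. */
    static byte[] packRow(boolean[] white) {
        byte[] row = new byte[bytesPerLine(white.length)];
        for (int x = 0; x < white.length; x++) {
            if (white[x]) {
                row[x / 8] |= (byte) (0x80 >> (x % 8));
            }
        }
        return row;
    }
}
```

Rows packed this way would be concatenated into the image buffer, with bytes_per_pixel passed as 0 as described above.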
-
TessBaseAPIClearAdaptiveClassifier
void TessBaseAPIClearAdaptiveClassifier(ITessAPI.TessBaseAPI handle)
Call between pages or documents etc. to free up memory and forget adaptive data.
- Parameters:
handle
- the TesseractAPI instance
-
TessBaseAPISetImage
void TessBaseAPISetImage(ITessAPI.TessBaseAPI handle, java.nio.ByteBuffer imagedata, int width, int height, int bytes_per_pixel, int bytes_per_line)
Provide an image for Tesseract to recognize. Format is as TesseractRect above. Does not copy the image buffer, or take ownership. The source image may be destroyed after Recognize is called, either explicitly or implicitly via one of the Get*Text functions. SetImage clears all recognition results, and sets the rectangle to the full image, so it may be followed immediately by a GetUTF8Text, and it will automatically perform recognition.
- Parameters:
handle
- the TesseractAPI instance
imagedata
- image byte buffer
width
- image width
height
- image height
bytes_per_pixel
- bytes per pixel
bytes_per_line
- bytes per line
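The bytes_per_pixel and bytes_per_line arguments follow from the pixel depth and image width. The sketch below derives both in plain Java, assuming tightly packed rows with no extra alignment padding; RasterLayout is a hypothetical helper, not part of Tess4J, and the supported depths are the ones this page names (1, 8, 24 and 32 bits per pixel).

```java
/** Hypothetical helper (not part of Tess4J): derives buffer layout arguments. */
final class RasterLayout {
    private RasterLayout() {}

    /** Maps bits per pixel to the bytes_per_pixel argument (0 for binary images). */
    static int bytesPerPixel(int bitsPerPixel) {
        switch (bitsPerPixel) {
            case 1:  return 0; // binary images are passed with bytes_per_pixel = 0
            case 8:  return 1;
            case 24: return 3;
            case 32: return 4;
            default:
                throw new IllegalArgumentException("unsupported depth: " + bitsPerPixel);
        }
    }

    /** Row length in bytes, assuming rows are tightly packed and rounded up to whole bytes. */
    static int bytesPerLine(int width, int bitsPerPixel) {
        return (width * bitsPerPixel + 7) / 8;
    }
}
```

If the source buffer pads rows to a larger stride, that stride should be passed instead of the value computed here.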
-
TessBaseAPISetImage2
void TessBaseAPISetImage2(ITessAPI.TessBaseAPI handle, net.sourceforge.lept4j.Pix pix)
Provide an image for Tesseract to recognize. As with SetImage above, Tesseract doesn't take a copy or ownership or pixDestroy the image, so it must persist until after Recognize. Pix vs raw, which to use? Use Pix where possible. A future version of Tesseract may choose to use Pix as its internal representation and discard IMAGE altogether. Because of that, an implementation that sources and targets Pix may end up with fewer copies than an implementation that does not.
- Parameters:
handle
- the TesseractAPI instance
pix
- image
-
TessBaseAPISetSourceResolution
void TessBaseAPISetSourceResolution(ITessAPI.TessBaseAPI handle, int ppi)
Set the resolution of the source image in pixels per inch so font size information can be calculated in results. Call this after SetImage().
- Parameters:
handle
- the TesseractAPI instance
ppi
- source resolution in pixels per inch
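The reason resolution matters for font size can be shown with the standard conversion between pixels and points (1 point is 1/72 inch). This is only an illustration of the arithmetic in plain Java, not a description of how Tesseract computes its results; FontSize is a hypothetical helper, not part of Tess4J.

```java
/** Hypothetical helper (not part of Tess4J): converts pixel heights to points. */
final class FontSize {
    private FontSize() {}

    /** Converts a height in pixels to points, given the source resolution in ppi. */
    static double pixelsToPoints(int pixels, int ppi) {
        if (ppi <= 0) {
            throw new IllegalArgumentException("ppi must be positive");
        }
        return pixels * 72.0 / ppi;
    }
}
```

With this conversion, the same 100-pixel glyph height corresponds to 24 points at 300 ppi but 48 points at 150 ppi, which is why an incorrect resolution skews reported sizes.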
-
TessBaseAPISetRectangle
void TessBaseAPISetRectangle(ITessAPI.TessBaseAPI handle, int left, int top, int width, int height)
Restrict recognition to a sub-rectangle of the image. Call after SetImage. Each SetRectangle clears the recognition results so multiple rectangles can be recognized with the same image.
- Parameters:
handle
- the TesseractAPI instance
left
- left edge of the rectangle
top
- top edge of the rectangle
width
- width of the rectangle
height
- height of the rectangle
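This page does not say what happens when the rectangle extends past the image, so a caller may want to clip it first. The sketch below does that in plain Java; RegionOfInterest is a hypothetical helper, not part of Tess4J, and the clipping policy is an assumption made for illustration.

```java
/** Hypothetical helper (not part of Tess4J): clips a rectangle to the image bounds. */
final class RegionOfInterest {
    private RegionOfInterest() {}

    /** Returns {left, top, width, height} clipped to an image of the given size. */
    static int[] clip(int left, int top, int width, int height,
                      int imageWidth, int imageHeight) {
        int x0 = Math.max(left, 0);
        int y0 = Math.max(top, 0);
        int x1 = Math.min(left + width, imageWidth);
        int y1 = Math.min(top + height, imageHeight);
        // A rectangle entirely outside the image collapses to zero width or height.
        return new int[] {x0, y0, Math.max(0, x1 - x0), Math.max(0, y1 - y0)};
    }
}
```

The four returned values line up with the left, top, width and height arguments above.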
-
TessBaseAPIGetThresholdedImage
net.sourceforge.lept4j.Pix TessBaseAPIGetThresholdedImage(ITessAPI.TessBaseAPI handle)
ONLY available after SetImage if you have Leptonica installed. Get a copy of the internal thresholded image from Tesseract.
- Parameters:
handle
- the TesseractAPI instance
- Returns:
- internal thresholded image
-
TessBaseAPIGetRegions
net.sourceforge.lept4j.Boxa TessBaseAPIGetRegions(ITessAPI.TessBaseAPI handle, com.sun.jna.ptr.PointerByReference pixa)
Get the result of page layout analysis as a Leptonica-style Boxa, Pixa pair, in reading order. Can be called before or after Recognize.
- Parameters:
handle
- the TesseractAPI instance
pixa
- array of Pix
- Returns:
- array of Box
-
TessBaseAPIGetTextlines
net.sourceforge.lept4j.Boxa TessBaseAPIGetTextlines(ITessAPI.TessBaseAPI handle, com.sun.jna.ptr.PointerByReference pixa, com.sun.jna.ptr.PointerByReference blockids)
Get the textlines as a Leptonica-style Boxa, Pixa pair, in reading order. Can be called before or after Recognize. If blockids is not NULL, the block-id of each line is also returned as an array of one element per line. delete [] after use. If paraids is not NULL (the parameter is exposed by TessBaseAPIGetTextlines1), the paragraph-id of each line within its block is also returned as an array of one element per line. delete [] after use.
Helper method to extract from the thresholded image (most common usage).
- Parameters:
handle
- the TesseractAPI instance
pixa
- array of Pix
blockids
-
- Returns:
- array of Box
-
TessBaseAPIGetTextlines1
net.sourceforge.lept4j.Boxa TessBaseAPIGetTextlines1(ITessAPI.TessBaseAPI handle, int raw_image, int raw_padding, com.sun.jna.ptr.PointerByReference pixa, com.sun.jna.ptr.PointerByReference blockids, com.sun.jna.ptr.PointerByReference paraids)
Get the textlines as a Leptonica-style Boxa, Pixa pair, in reading order. Can be called before or after Recognize. If blockids is not NULL, the block-id of each line is also returned as an array of one element per line. delete [] after use. If paraids is not NULL, the paragraph-id of each line within its block is also returned as an array of one element per line. delete [] after use.
- Parameters:
handle
- the TesseractAPI instance
raw_image
-
raw_padding
-
pixa
- array of Pix
blockids
-
paraids
-
- Returns:
- array of Box
-
TessBaseAPIGetStrips
net.sourceforge.lept4j.Boxa TessBaseAPIGetStrips(ITessAPI.TessBaseAPI handle, com.sun.jna.ptr.PointerByReference pixa, com.sun.jna.ptr.PointerByReference blockids)
Get textlines and strips of image regions as a Leptonica-style Boxa, Pixa pair, in reading order. Enables downstream handling of non-rectangular regions. Can be called before or after Recognize. If blockids is not NULL, the block-id of each line is also returned as an array of one element per line. delete [] after use.
- Parameters:
handle
- the TesseractAPI instance
pixa
- array of Pix
blockids
-
- Returns:
- array of Box
-
TessBaseAPIGetWords
net.sourceforge.lept4j.Boxa TessBaseAPIGetWords(ITessAPI.TessBaseAPI handle, com.sun.jna.ptr.PointerByReference pixa)
Get the words as a Leptonica-style Boxa, Pixa pair, in reading order. Can be called before or after Recognize.
- Parameters:
handle
- the TesseractAPI instance
pixa
- array of Pix
- Returns:
- array of Box
-
TessBaseAPIGetConnectedComponents
net.sourceforge.lept4j.Boxa TessBaseAPIGetConnectedComponents(ITessAPI.TessBaseAPI handle, com.sun.jna.ptr.PointerByReference cc)
Gets the individual connected (text) components (created after the page segmentation step, but before recognition) as a Leptonica-style Boxa, Pixa pair, in reading order. Can be called before or after Recognize.
- Parameters:
handle
- the TesseractAPI instance
cc
- array of Pix
- Returns:
- array of Box
-
TessBaseAPIGetComponentImages
net.sourceforge.lept4j.Boxa TessBaseAPIGetComponentImages(ITessAPI.TessBaseAPI handle, int level, int text_only, com.sun.jna.ptr.PointerByReference pixa, com.sun.jna.ptr.PointerByReference blockids)
Get the components of the given level (block, textline, word etc.) as a Leptonica-style Boxa, Pixa pair, in reading order. Can be called before or after Recognize. If blockids is not NULL, the block-id of each component is also returned as an array of one element per component. delete [] after use. If text_only is true, then only text components are returned. Helper function to get binary images with no padding (most common usage).
- Parameters:
handle
- the TesseractAPI instance
level
- PageIteratorLevel
text_only
-
pixa
- array of Pix
blockids
-
- Returns:
- array of Box
-
TessBaseAPIGetComponentImages1
net.sourceforge.lept4j.Boxa TessBaseAPIGetComponentImages1(ITessAPI.TessBaseAPI handle, int level, int text_only, int raw_image, int raw_padding, com.sun.jna.ptr.PointerByReference pixa, com.sun.jna.ptr.PointerByReference blockids, com.sun.jna.ptr.PointerByReference paraids)
Get the components of the given level (block, textline, word etc.) as a Leptonica-style Boxa, Pixa pair, in reading order. Can be called before or after Recognize. If blockids is not NULL, the block-id of each component is also returned as an array of one element per component. delete [] after use. If paraids is not NULL, the paragraph-id of each component within its block is also returned as an array of one element per component. delete [] after use. If raw_image is true, then portions of the original image are extracted instead of the thresholded image and padded with raw_padding. If text_only is true, then only text components are returned.
- Parameters:
handle
- the TesseractAPI instance
level
- PageIteratorLevel
text_only
-
raw_image
-
raw_padding
-
pixa
- array of Pix
blockids
-
paraids
-
- Returns:
- array of Box
-
TessBaseAPIGetThresholdedImageScaleFactor
int TessBaseAPIGetThresholdedImageScaleFactor(ITessAPI.TessBaseAPI handle)
- Parameters:
handle
- the TesseractAPI instance
- Returns:
- scale factor from the original image
-
TessBaseAPIAnalyseLayout
ITessAPI.TessPageIterator TessBaseAPIAnalyseLayout(ITessAPI.TessBaseAPI handle)
Runs page layout analysis in the mode set by SetPageSegMode. May optionally be called prior to Recognize to get access to just the page layout results. Returns an iterator to the results. Returns NULL on error. The returned iterator must be deleted after use. WARNING! The iterator points to data held within the TessBaseAPI class, and therefore can only be used while the TessBaseAPI class still exists and has not been subjected to a call of Init, SetImage, Recognize, Clear, End, DetectOS, or anything else that changes the internal PAGE_RES.
- Parameters:
handle
- the TesseractAPI instance
- Returns:
- an iterator to the results, or NULL on error. The returned iterator must be deleted after use.
-
TessBaseAPIRecognize
int TessBaseAPIRecognize(ITessAPI.TessBaseAPI handle, ITessAPI.ETEXT_DESC monitor)
Recognize the image from SetAndThresholdImage, generating Tesseract internal structures. Returns 0 on success. Optional. The Get*Text functions below will call Recognize if needed. After Recognize, the output is kept internally until the next SetImage.
- Parameters:
handle
- the TesseractAPI instance
monitor
- progress monitor (ETEXT_DESC) for the recognition run
- Returns:
- 0 on success
-
TessBaseAPIRecognizeForChopTest
int TessBaseAPIRecognizeForChopTest(ITessAPI.TessBaseAPI handle, ITessAPI.ETEXT_DESC monitor)
Variant on Recognize used for testing the chopper.
- Parameters:
handle
- the TesseractAPI instance
monitor
- progress monitor (ETEXT_DESC) for the recognition run
- Returns:
- 0 on success
-
TessBaseAPIGetIterator
ITessAPI.TessResultIterator TessBaseAPIGetIterator(ITessAPI.TessBaseAPI handle)
Get a reading-order iterator to the results of LayoutAnalysis and/or Recognize. The returned iterator must be deleted after use. WARNING! The iterator points to data held within the TessBaseAPI class, and therefore can only be used while the TessBaseAPI class still exists and has not been subjected to a call of Init, SetImage, Recognize, Clear, End, DetectOS, or anything else that changes the internal PAGE_RES.
- Parameters:
handle
- the TesseractAPI instance
- Returns:
- the result iterator
-
TessBaseAPIGetMutableIterator
ITessAPI.TessMutableIterator TessBaseAPIGetMutableIterator(ITessAPI.TessBaseAPI handle)
Get a mutable iterator to the results of LayoutAnalysis and/or Recognize. The returned iterator must be deleted after use. WARNING! The iterator points to data held within the TessBaseAPI class, and therefore can only be used while the TessBaseAPI class still exists and has not been subjected to a call of Init, SetImage, Recognize, Clear, End, DetectOS, or anything else that changes the internal PAGE_RES.
- Parameters:
handle
- the TesseractAPI instance
- Returns:
- the mutable iterator
-
TessBaseAPIProcessPages
int TessBaseAPIProcessPages(ITessAPI.TessBaseAPI handle, java.lang.String filename, java.lang.String retry_config, int timeout_millisec, ITessAPI.TessResultRenderer renderer)
Recognizes all the pages in the named file, as a multi-page tiff or list of filenames, or single image, and gets the appropriate kind of text according to parameters: tessedit_create_boxfile, tessedit_make_boxes_from_boxes, tessedit_write_unlv, tessedit_create_hocr. Calls ProcessPage on each page in the input file, which may be a multi-page tiff, single-page other file format, or a plain text list of images to read. If tessedit_page_number is non-negative, processing begins at that page of a multi-page tiff file, or filelist. The text is returned in text_out. Returns false on error. If timeout_millisec is non-zero, processing of a single page is terminated after the timeout. If retry_config is non-NULL and non-empty, and some page fails for some reason, the page is reprocessed with the retry_config config file. Useful for interactively debugging a bad page.
- Parameters:
handle
- the TesseractAPI instance
filename
- multi-page tiff or list of filenames
retry_config
- name of the config file used to reprocess a failed page
timeout_millisec
- per-page timeout in milliseconds (0 for no timeout)
renderer
- result renderer
- Returns:
- the status (false on error)
-
TessBaseAPIProcessPage
int TessBaseAPIProcessPage(ITessAPI.TessBaseAPI handle, net.sourceforge.lept4j.Pix pix, int page_index, java.lang.String filename, java.lang.String retry_config, int timeout_millisec, ITessAPI.TessResultRenderer renderer)
-
TessBaseAPIGetUTF8Text
com.sun.jna.Pointer TessBaseAPIGetUTF8Text(ITessAPI.TessBaseAPI handle)
The recognized text is returned as a char* which is coded as UTF-8 and must be freed with the delete [] operator (from Java, release the returned pointer with TessDeleteText).
- Parameters:
handle
- the TesseractAPI instance
- Returns:
- the pointer to output text
-
TessBaseAPIGetHOCRText
com.sun.jna.Pointer TessBaseAPIGetHOCRText(ITessAPI.TessBaseAPI handle, int page_number)
Make an HTML-formatted string with hOCR markup from the internal data structures. page_number is 0-based but will appear in the output as 1-based.
- Parameters:
handle
- the TesseractAPI instance
page_number
- page number (0-based)
- Returns:
- the pointer to hOCR text
-
TessBaseAPIGetAltoText
com.sun.jna.Pointer TessBaseAPIGetAltoText(ITessAPI.TessBaseAPI handle, int page_number)
Make an XML-formatted string with Alto markup from the internal data structures.
- Parameters:
handle
- the TesseractAPI instance
page_number
- page number
- Returns:
- the pointer to Alto text
-
TessBaseAPIGetTsvText
com.sun.jna.Pointer TessBaseAPIGetTsvText(ITessAPI.TessBaseAPI handle, int page_number)
Make a TSV-formatted string from the internal data structures. page_number is 0-based but will appear in the output as 1-based. Returned string must be freed with the delete [] operator.
- Parameters:
handle
- the TesseractAPI instance
page_number
- page number (0-based)
- Returns:
- the pointer to TSV text
-
TessBaseAPIGetBoxText
com.sun.jna.Pointer TessBaseAPIGetBoxText(ITessAPI.TessBaseAPI handle, int page_number)
The recognized text is returned as a char* which is coded as a UTF-8 box file and must be freed with the delete [] operator. page_number is a 0-based page index that will appear in the box file.
- Parameters:
handle
- the TesseractAPI instance
page_number
- page number (0-based)
- Returns:
- the pointer to box text
-
TessBaseAPIGetLSTMBoxText
com.sun.jna.Pointer TessBaseAPIGetLSTMBoxText(ITessAPI.TessBaseAPI handle, int page_number)
Create a UTF-8 box file for LSTM training from the internal data structures. page_number is a 0-based page index that will appear in the box file. Returned string must be freed with the delete [] operator.
- Parameters:
handle
- the TesseractAPI instance
page_number
- page number (0-based)
- Returns:
- the pointer to LSTM Box text
-
TessBaseAPIGetWordStrBoxText
com.sun.jna.Pointer TessBaseAPIGetWordStrBoxText(ITessAPI.TessBaseAPI handle, int page_number)
Create a UTF-8 box file with WordStr strings from the internal data structures. page_number is a 0-based page index that will appear in the box file. Returned string must be freed with the delete [] operator.
- Parameters:
handle
- the TesseractAPI instance
page_number
- page number (0-based)
- Returns:
- the pointer to WordStr Box text
-
TessBaseAPIGetUNLVText
com.sun.jna.Pointer TessBaseAPIGetUNLVText(ITessAPI.TessBaseAPI handle)
The recognized text is returned as a char* which is coded as UNLV format Latin-1 with specific reject and suspect codes and must be freed with the delete [] operator.
- Parameters:
handle
- the TesseractAPI instance
- Returns:
- the pointer to UNLV text
-
TessBaseAPIMeanTextConf
int TessBaseAPIMeanTextConf(ITessAPI.TessBaseAPI handle)
Returns the average word confidence for the Tesseract page result.
- Parameters:
handle
- the TesseractAPI instance
- Returns:
- the (average) confidence value between 0 and 100.
-
TessBaseAPIAllWordConfidences
com.sun.jna.ptr.IntByReference TessBaseAPIAllWordConfidences(ITessAPI.TessBaseAPI handle)
Returns an array of all word confidences, terminated by -1. The calling function must delete [] after use. The number of confidences should correspond to the number of space-delimited words in GetUTF8Text.
- Parameters:
handle
- the TesseractAPI instance
- Returns:
- all word confidences (between 0 and 100) in an array, terminated by -1
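Since the array carries no length and relies on the -1 terminator, a reader has to scan for it. The sketch below shows that scan in plain Java on an int[] that stands in for the native memory; WordConfidences is a hypothetical helper, not part of Tess4J, and in real use the values would first have to be read from the returned native pointer.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Tess4J): trims a -1 terminated confidence array. */
final class WordConfidences {
    private WordConfidences() {}

    /** Returns the values that precede the first -1 (or all values if none is found). */
    static int[] untilTerminator(int[] raw) {
        int n = 0;
        while (n < raw.length && raw[n] != -1) {
            n++;
        }
        return Arrays.copyOf(raw, n);
    }
}
```

The length of the trimmed array can then be compared with the number of space-delimited words in the recognized text.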
-
TessBaseAPIAdaptToWordStr
int TessBaseAPIAdaptToWordStr(ITessAPI.TessBaseAPI handle, int mode, java.lang.String wordstr)
Applies the given word to the adaptive classifier if possible. The word must be SPACE-DELIMITED UTF-8 (l i k e t h i s), so it can tell the boundaries of the graphemes. Assumes that SetImage/SetRectangle have been used to set the image to the given word. The mode arg should be PSM_SINGLE_WORD or PSM_CIRCLE_WORD, as that will be used to control layout analysis. The currently set PageSegMode is preserved.
- Parameters:
handle
- the TesseractAPI instance
mode
- tesseract page segmentation mode
wordstr
- The word must be SPACE-DELIMITED UTF-8 (l i k e t h i s), so it can tell the boundaries of the graphemes.
- Returns:
- false if adaptation was not possible for some reason.
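The space-delimited form required for wordstr can be produced from an ordinary string. The sketch below inserts a space between Unicode code points, which is a simplifying assumption: a grapheme made of several code points (for example a base letter plus a combining mark) would need proper grapheme segmentation instead. WordStr is a hypothetical helper in plain Java, not part of Tess4J.

```java
/** Hypothetical helper (not part of Tess4J): builds a space-delimited word string. */
final class WordStr {
    private WordStr() {}

    /** Separates code points with single spaces, e.g. "like" becomes "l i k e". */
    static String spaceDelimit(String word) {
        StringBuilder out = new StringBuilder();
        word.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.appendCodePoint(cp);
        });
        return out.toString();
    }
}
```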
-
TessBaseAPIClear
void TessBaseAPIClear(ITessAPI.TessBaseAPI handle)
Free up recognition results and any stored image data, without actually freeing any recognition data that would be time-consuming to reload. Afterwards, you must call SetImage or TesseractRect before doing any Recognize or Get* operation.
- Parameters:
handle
- the TesseractAPI instance
-
TessBaseAPIEnd
void TessBaseAPIEnd(ITessAPI.TessBaseAPI handle)
Close down Tesseract and free up all memory. End() is equivalent to destructing and reconstructing your TessBaseAPI. Once End() has been used, none of the other API functions may be used other than Init and anything declared above it in the class definition.
- Parameters:
handle
- the TesseractAPI instance
-
TessBaseAPIIsValidWord
int TessBaseAPIIsValidWord(ITessAPI.TessBaseAPI handle, java.lang.String word)
Check whether a word is valid according to Tesseract's language model.
- Parameters:
handle
- the TesseractAPI instance
word
- the word to check
- Returns:
- 0 if the word is invalid, non-zero if valid
-
TessBaseAPIGetTextDirection
int TessBaseAPIGetTextDirection(ITessAPI.TessBaseAPI handle, java.nio.IntBuffer out_offset, java.nio.FloatBuffer out_slope)
Gets text direction.
- Parameters:
handle
- the TesseractAPI instance
out_offset
- offset
out_slope
- slope
- Returns:
- TRUE if text direction is valid
-
TessBaseAPIClearPersistentCache
void TessBaseAPIClearPersistentCache(ITessAPI.TessBaseAPI handle)
Clear any library-level memory caches. There are a variety of expensive-to-load constant data structures (mostly language dictionaries) that are cached globally, surviving the Init() and End() of individual TessBaseAPI instances. This function allows the clearing of these caches.
- Parameters:
handle
- the TesseractAPI instance
-
TessBaseAPIDetectOrientationScript
int TessBaseAPIDetectOrientationScript(ITessAPI.TessBaseAPI handle, java.nio.IntBuffer orient_deg, java.nio.FloatBuffer orient_conf, com.sun.jna.ptr.PointerByReference script_name, java.nio.FloatBuffer script_conf)
Detect the orientation of the input image and apparent script (alphabet). orient_deg is the detected clockwise rotation of the input image in degrees (0, 90, 180, 270); orient_conf is the confidence (15.0 is reasonably confident); script_name is an ASCII string, the name of the script, e.g. "Latin"; script_conf is the confidence level in the script.
- Returns:
- TRUE on success and writes values to each parameter as an output
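One way to act on these outputs is to accept the orientation only above the confidence mentioned above and to derive the rotation that would undo it. The sketch below is plain Java; OrientationResult is a hypothetical helper, not part of Tess4J, and the corrective rotation assumes orient_deg is read as the clockwise rotation present in the image, which callers should verify on a known sample.

```java
/** Hypothetical helper (not part of Tess4J): interprets orientation outputs. */
final class OrientationResult {
    /** Threshold taken from the description above ("15.0 is reasonably confident"). */
    static final float REASONABLE_CONFIDENCE = 15.0f;

    private OrientationResult() {}

    static boolean isReasonablyConfident(float orientConf) {
        return orientConf >= REASONABLE_CONFIDENCE;
    }

    /** Clockwise rotation that cancels a detected clockwise rotation of orientDeg. */
    static int clockwiseCorrection(int orientDeg) {
        if (orientDeg != 0 && orientDeg != 90 && orientDeg != 180 && orientDeg != 270) {
            throw new IllegalArgumentException("expected 0, 90, 180 or 270: " + orientDeg);
        }
        return (360 - orientDeg) % 360;
    }
}
```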
-
TessBaseAPIGetUnichar
java.lang.String TessBaseAPIGetUnichar(ITessAPI.TessBaseAPI handle, int unichar_id)
Gets the string of the specified unichar.
- Parameters:
handle
- the TesseractAPI instance
unichar_id
- the unichar id
- Returns:
- the string form of the specified unichar.
-
TessPageIteratorDelete
void TessPageIteratorDelete(ITessAPI.TessPageIterator handle)
Deletes the specified PageIterator instance.
- Parameters:
handle
- the TessPageIterator instance
-
TessPageIteratorCopy
ITessAPI.TessPageIterator TessPageIteratorCopy(ITessAPI.TessPageIterator handle)
Creates a copy of the specified PageIterator instance.
- Parameters:
handle
- the TessPageIterator instance
- Returns:
- page iterator copy
-
TessPageIteratorBegin
void TessPageIteratorBegin(ITessAPI.TessPageIterator handle)
Resets the iterator to point to the start of the page.
- Parameters:
handle
- the TessPageIterator instance
-
TessPageIteratorNext
int TessPageIteratorNext(ITessAPI.TessPageIterator handle, int level)
Moves to the start of the next object at the given level in the page hierarchy, and returns false if the end of the page was reached. NOTE (CHANGED!) that ALL PageIteratorLevel level values will visit each non-text block at least once.
Think of non-text blocks as containing a single para, with at least one line, with a single imaginary word, containing a single symbol. The bounding boxes mark out any polygonal nature of the block, and PTIsTextType(BlockType()) is false for non-text blocks.
Calls to Next with different levels may be freely intermixed. This function iterates words in right-to-left scripts correctly, if the appropriate language has been loaded into Tesseract.
- Parameters:
handle
- the TessPageIterator instance
level
- tesseract page level
- Returns:
- false (0) if the end of the page was reached, otherwise non-zero
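Because the iterator already points at the first object when it is obtained, a do/while loop is the natural shape: handle the current object, then advance until the end of the page is reported. The sketch below shows that loop in plain Java; IteratorWalk is a hypothetical helper, and the IntUnaryOperator stands in for a call such as TessPageIteratorNext on a valid, non-null iterator.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical helper (not part of Tess4J): walks an iterator with a do/while loop. */
final class IteratorWalk {
    private IteratorWalk() {}

    /**
     * Visits the current object, then advances while next reports success (non-zero).
     * Returns how many objects were visited.
     */
    static int countElements(IntUnaryOperator next, int level) {
        int visited = 0;
        do {
            visited++; // a real loop would read text or bounding boxes here
        } while (next.applyAsInt(level) != 0);
        return visited;
    }
}
```

In real use the operator would wrap the native call, and the loop should only be entered after checking that the iterator was actually obtained.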
-
TessPageIteratorIsAtBeginningOf
int TessPageIteratorIsAtBeginningOf(ITessAPI.TessPageIterator handle, int level)
Returns TRUE if the iterator is at the start of an object at the given level. Possible uses include determining if a call to Next(RIL_WORD) moved to the start of a RIL_PARA.
- Parameters:
handle
- the TessPageIterator instance
level
- tesseract page level
- Returns:
- 1 if true
-
TessPageIteratorIsAtFinalElement
int TessPageIteratorIsAtFinalElement(ITessAPI.TessPageIterator handle, int level, int element)
Returns whether the iterator is positioned at the last element in a given level (e.g. the last word in a line, the last line in a block).
- Parameters:
handle
- the TessPageIterator instance
level
- tesseract page level
element
- page iterator level
- Returns:
- 1 if true
-
TessPageIteratorBoundingBox
int TessPageIteratorBoundingBox(ITessAPI.TessPageIterator handle, int level, java.nio.IntBuffer left, java.nio.IntBuffer top, java.nio.IntBuffer right, java.nio.IntBuffer bottom)
Returns the bounding rectangle of the current object at the given level in coordinates of the original image.
- Parameters:
handle
- the TessPageIterator instance
level
- tesseract page level
left
- int buffer that receives the left coordinate
top
- int buffer that receives the top coordinate
right
- int buffer that receives the right coordinate
bottom
- int buffer that receives the bottom coordinate
- Returns:
- FALSE if there is no such object at the current position
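The four edge values are often more convenient as a position plus size, for example to feed back into a rectangle-based call. The sketch below is plain Java; Bounds is a hypothetical value class, not part of Tess4J, and it assumes the width and height are simply the differences between opposite edges.

```java
/** Hypothetical value class (not part of Tess4J): edges of a bounding rectangle. */
final class Bounds {
    final int left;
    final int top;
    final int right;
    final int bottom;

    Bounds(int left, int top, int right, int bottom) {
        this.left = left;
        this.top = top;
        this.right = right;
        this.bottom = bottom;
    }

    /** Horizontal extent, assuming right is measured from the same origin as left. */
    int width() {
        return right - left;
    }

    /** Vertical extent, assuming bottom is measured from the same origin as top. */
    int height() {
        return bottom - top;
    }
}
```

In real use the constructor arguments would be read from the four int buffers after a successful call.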
-
TessPageIteratorBlockType
int TessPageIteratorBlockType(ITessAPI.TessPageIterator handle)
Returns the type of the current block.
- Parameters:
handle
- the TessPageIterator instance
- Returns:
- TessPolyBlockType value
-
TessPageIteratorGetBinaryImage
net.sourceforge.lept4j.Pix TessPageIteratorGetBinaryImage(ITessAPI.TessPageIterator handle, int level)
Returns a binary image of the current object at the given level. The position and size match the return from BoundingBoxInternal, and so this could be upscaled with respect to the original input image. Use pixDestroy to delete the image after use. The following methods are used to generate the images:
RIL_BLOCK: mask the page image with the block polygon.
RIL_TEXTLINE: Clip the rectangle of the line box from the page image. TODO(rays) fix this to generate and use a line polygon.
RIL_WORD: Clip the rectangle of the word box from the page image.
RIL_SYMBOL: Render the symbol outline to an image for cblobs (prior to recognition) or the bounding box otherwise.
A reconstruction of the original image (using xor to check for double representation) should be reasonably accurate, apart from removed noise, at the block level. Below the block level, the reconstruction will be missing images and line separators. At the symbol level, kerned characters will invade the bounding box if rendered after recognition, making an xor reconstruction inaccurate, but an or reconstruction better. Before recognition, symbol-level reconstruction should be good, even with xor, since the images come from the connected components.
- Parameters:
handle
- the TessPageIterator instance
level
- PageIteratorLevel
- Returns:
-
TessPageIteratorGetImage
net.sourceforge.lept4j.Pix TessPageIteratorGetImage(ITessAPI.TessPageIterator handle, int level, int padding, net.sourceforge.lept4j.Pix original_image, java.nio.IntBuffer left, java.nio.IntBuffer top)
Returns an image of the current object at the given level in greyscale if available in the input. To guarantee a binary image use BinaryImage. NOTE that in order to give the best possible image, the bounds are expanded slightly over the binary connected component, by the supplied padding, so the top-left position of the returned image is returned in (left,top). These will most likely not match the coordinates returned by BoundingBox. If you do not supply an original image, you will get a binary one. Use pixDestroy to delete the image after use.
- Parameters:
handle
- the TessPageIterator instance
level
- PageIteratorLevel
padding
-
original_image
-
left
-
top
-
- Returns:
-
TessPageIteratorBaseline
int TessPageIteratorBaseline(ITessAPI.TessPageIterator handle, int level, java.nio.IntBuffer x1, java.nio.IntBuffer y1, java.nio.IntBuffer x2, java.nio.IntBuffer y2)
Returns the baseline of the current object at the given level. The baseline is the line that passes through (x1, y1) and (x2, y2).
WARNING: with vertical text, baselines may be vertical!
- Parameters:
handle
- the TessPageIterator instance
level
- PageIteratorLevel
x1
- int buffer that receives x1
y1
- int buffer that receives y1
x2
- int buffer that receives x2
y2
- int buffer that receives y2
- Returns:
- TRUE if the baseline is valid
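Since a baseline may be vertical, computing its slope as a ratio can divide by zero; an angle computed with atan2 avoids that. The sketch below is plain Java; BaselineAngle is a hypothetical helper, not part of Tess4J. It assumes image coordinates in which y grows downward, so a positive angle means the baseline descends from the first point to the second.

```java
/** Hypothetical helper (not part of Tess4J): angle of a baseline through two points. */
final class BaselineAngle {
    private BaselineAngle() {}

    /** Angle in degrees from (x1, y1) to (x2, y2); well defined for vertical baselines. */
    static double degrees(int x1, int y1, int x2, int y2) {
        return Math.toDegrees(Math.atan2(y2 - y1, x2 - x1));
    }
}
```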
-
TessPageIteratorOrientation
void TessPageIteratorOrientation(ITessAPI.TessPageIterator handle, java.nio.IntBuffer orientation, java.nio.IntBuffer writing_direction, java.nio.IntBuffer textline_order, java.nio.FloatBuffer deskew_angle)
Returns the orientation.
- Parameters:
handle
- the TessPageIterator instance
orientation
- orientation value
writing_direction
- writing direction value
textline_order
- text line order
deskew_angle
- deskew angle
-
TessPageIteratorParagraphInfo
void TessPageIteratorParagraphInfo(ITessAPI.TessPageIterator handle, java.nio.IntBuffer justification, java.nio.IntBuffer is_list_item, java.nio.IntBuffer is_crown, java.nio.IntBuffer first_line_indent)
Gets paragraph information.
- Parameters:
handle
- the TessPageIterator instance
justification
- justification type
is_list_item
- list item
is_crown
- very first or continuation
first_line_indent
- first line indentation
-
TessResultIteratorDelete
void TessResultIteratorDelete(ITessAPI.TessResultIterator handle)
Deletes the specified ResultIterator handle.
- Parameters:
handle
- the TessResultIterator instance
-
TessResultIteratorCopy
ITessAPI.TessResultIterator TessResultIteratorCopy(ITessAPI.TessResultIterator handle)
Creates a copy of the specified ResultIterator instance.
- Parameters:
handle
- the TessResultIterator instance
- Returns:
- the copy object
-
TessResultIteratorGetPageIterator
ITessAPI.TessPageIterator TessResultIteratorGetPageIterator(ITessAPI.TessResultIterator handle)
Gets the PageIterator of the specified ResultIterator instance.
- Parameters:
handle
- the TessResultIterator instance
- Returns:
- the page iterator
-
TessResultIteratorGetPageIteratorConst
ITessAPI.TessPageIterator TessResultIteratorGetPageIteratorConst(ITessAPI.TessResultIterator handle)
Gets the PageIterator of the specified ResultIterator instance.
- Parameters:
handle
- the TessResultIterator instance
- Returns:
- the page iterator constant
-
TessResultIteratorNext
int TessResultIteratorNext(ITessAPI.TessResultIterator handle, int level)
-
TessResultIteratorGetUTF8Text
com.sun.jna.Pointer TessResultIteratorGetUTF8Text(ITessAPI.TessResultIterator handle, int level)
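Returns the null-terminated UTF-8 encoded text string for the current object at the given level. The string is allocated by the native library; release the returned pointer with TessDeleteText after use (the C++ delete [] instruction does not apply to Java callers).
Parameters:
handle
- the TessResultIterator instance
level
- tesseract page level
Returns:
- the pointer to recognized text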
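To make "null-terminated UTF-8" concrete, the sketch below decodes such a string from a plain byte array. It is only a model of the native memory behind the returned pointer: with JNA one would normally read the text through the `Pointer` itself, and the `Utf8CString` class and its sample bytes are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Utf8CString {

    // Reads bytes from offset up to (not including) the first zero byte
    // and decodes them as UTF-8.
    static String decode(byte[] memory, int offset) {
        int end = offset;
        while (end < memory.length && memory[end] != 0) {
            end++;
        }
        return new String(memory, offset, end - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "café" in UTF-8, a terminating zero, then unrelated trailing bytes.
        byte[] memory = {'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9, 0, 'x', 'y'};
        System.out.println(decode(memory, 0));
    }
}
```

The terminator, not the buffer length, marks the end of the text, which is why bytes after the zero are ignored.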
-
TessResultIteratorConfidence
float TessResultIteratorConfidence(ITessAPI.TessResultIterator handle, int level)
Returns the mean confidence of the current object at the given level. The number should be interpreted as a percent probability (0.0f-100.0f).
Parameters:
handle
- the TessResultIterator instance
level
- tesseract page level
Returns:
- confidence value
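Because the value is a percentage rather than a 0-1 probability, callers often rescale it before combining it with other scores. A minimal sketch of that conversion follows. The helper names and the threshold passed to `isReliable` are hypothetical and not part of Tess4J.

```java
public class ConfidenceScale {

    // Clamps a percent confidence to [0, 100] and rescales it to [0, 1].
    static float toProbability(float percent) {
        float clamped = Math.max(0.0f, Math.min(100.0f, percent));
        return clamped / 100.0f;
    }

    // Hypothetical acceptance test against a caller-chosen minimum percent.
    static boolean isReliable(float percent, float minPercent) {
        return percent >= minPercent;
    }

    public static void main(String[] args) {
        float confidence = 87.5f; // sample value standing in for the native result
        System.out.println(toProbability(confidence));
        System.out.println(isReliable(confidence, 60.0f));
    }
}
```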
-
TessResultIteratorWordRecognitionLanguage
java.lang.String TessResultIteratorWordRecognitionLanguage(ITessAPI.TessResultIterator handle)
-
TessResultIteratorWordFontAttributes
java.lang.String TessResultIteratorWordFontAttributes(ITessAPI.TessResultIterator handle, java.nio.IntBuffer is_bold, java.nio.IntBuffer is_italic, java.nio.IntBuffer is_underlined, java.nio.IntBuffer is_monospace, java.nio.IntBuffer is_serif, java.nio.IntBuffer is_smallcaps, java.nio.IntBuffer pointsize, java.nio.IntBuffer font_id)
Returns the font attributes of the current word. If iterating at a higher-level object than words, e.g., textlines, this returns the attributes of the first word in that textline. The return value is a string holding the font name. It points to an internal table and SHOULD NOT BE DELETED. Its lifespan is the same as that of the iterator itself, i.e., it is rendered invalid by various members of TessBaseAPI, including Init, SetImage, End, or deleting the TessBaseAPI. Pointsize is returned in printer's points (1/72 inch).
Parameters:
handle
- the TessResultIterator instance
is_bold
- font attribute
is_italic
- font attribute
is_underlined
- font attribute
is_monospace
- font attribute
is_serif
- font attribute
is_smallcaps
- font attribute
pointsize
- font attribute
font_id
- font attribute
Returns:
- font name
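The six style out-parameters are integer flags and the point size is in units of 1/72 inch. The sketch below collects the active flags into a list and converts points to inches. It assumes each buffer holds its value at index 0 and that non-zero means true. The `FontAttributeFlags` class is illustrative and not part of Tess4J.

```java
import java.nio.IntBuffer;
import java.util.ArrayList;
import java.util.List;

public class FontAttributeFlags {

    // Returns the names of all style flags whose buffer holds a non-zero value.
    static List<String> activeStyles(IntBuffer isBold, IntBuffer isItalic,
                                     IntBuffer isUnderlined, IntBuffer isMonospace,
                                     IntBuffer isSerif, IntBuffer isSmallcaps) {
        String[] names = {"bold", "italic", "underlined", "monospace", "serif", "smallcaps"};
        IntBuffer[] flags = {isBold, isItalic, isUnderlined, isMonospace, isSerif, isSmallcaps};
        List<String> active = new ArrayList<>();
        for (int i = 0; i < flags.length; i++) {
            if (flags[i].get(0) != 0) {
                active.add(names[i]);
            }
        }
        return active;
    }

    // Converts printer's points (1/72 inch) to inches.
    static double pointsToInches(int pointsize) {
        return pointsize / 72.0;
    }

    public static void main(String[] args) {
        // Sample values standing in for what the native call would write.
        List<String> styles = activeStyles(
                IntBuffer.wrap(new int[] {1}), IntBuffer.wrap(new int[] {0}),
                IntBuffer.wrap(new int[] {0}), IntBuffer.wrap(new int[] {0}),
                IntBuffer.wrap(new int[] {1}), IntBuffer.wrap(new int[] {0}));
        System.out.println(styles + " at " + pointsToInches(36) + " in");
    }
}
```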
-
TessResultIteratorWordIsFromDictionary
int TessResultIteratorWordIsFromDictionary(ITessAPI.TessResultIterator handle)
Returns TRUE if the current word was found in a dictionary.
Parameters:
handle
- the TessResultIterator instance
Returns:
- 1 if word is from dictionary
-
TessResultIteratorWordIsNumeric
int TessResultIteratorWordIsNumeric(ITessAPI.TessResultIterator handle)
Returns TRUE if the current word is numeric.
Parameters:
handle
- the TessResultIterator instance
Returns:
- 1 if word is numeric
-
TessResultIteratorSymbolIsSuperscript
int TessResultIteratorSymbolIsSuperscript(ITessAPI.TessResultIterator handle)
Returns TRUE if the current symbol is a superscript. If iterating at a higher-level object than symbols, e.g., words, this returns the attributes of the first symbol in that word.
Parameters:
handle
- the TessResultIterator instance
Returns:
- 1 if symbol is superscript
-
TessResultIteratorSymbolIsSubscript
int TessResultIteratorSymbolIsSubscript(ITessAPI.TessResultIterator handle)
Returns TRUE if the current symbol is a subscript. If iterating at a higher-level object than symbols, e.g., words, this returns the attributes of the first symbol in that word.
Parameters:
handle
- the TessResultIterator instance
Returns:
- 1 if symbol is subscript
-
TessResultIteratorSymbolIsDropcap
int TessResultIteratorSymbolIsDropcap(ITessAPI.TessResultIterator handle)
Returns TRUE if the current symbol is a dropcap. If iterating at a higher-level object than symbols, e.g., words, this returns the attributes of the first symbol in that word.
Parameters:
handle
- the TessResultIterator instance
Returns:
- 1 if symbol is dropcap
-
TessResultIteratorGetChoiceIterator
ITessAPI.TessChoiceIterator TessResultIteratorGetChoiceIterator(ITessAPI.TessResultIterator handle)
-
TessChoiceIteratorDelete
void TessChoiceIteratorDelete(ITessAPI.TessChoiceIterator handle)
-
TessChoiceIteratorNext
int TessChoiceIteratorNext(ITessAPI.TessChoiceIterator handle)
-
TessChoiceIteratorGetUTF8Text
java.lang.String TessChoiceIteratorGetUTF8Text(ITessAPI.TessChoiceIterator handle)
-
TessChoiceIteratorConfidence
float TessChoiceIteratorConfidence(ITessAPI.TessChoiceIterator handle)
-
TessMonitorCreate
ITessAPI.ETEXT_DESC TessMonitorCreate()
-
TessMonitorDelete
void TessMonitorDelete(ITessAPI.ETEXT_DESC monitor)
-
TessMonitorSetCancelFunc
void TessMonitorSetCancelFunc(ITessAPI.ETEXT_DESC monitor, ITessAPI.TessCancelFunc cancelFunc)
-
TessMonitorSetCancelThis
void TessMonitorSetCancelThis(ITessAPI.ETEXT_DESC monitor, com.sun.jna.Pointer cancelThis)
-
TessMonitorGetCancelThis
com.sun.jna.Pointer TessMonitorGetCancelThis(ITessAPI.ETEXT_DESC monitor)
-
TessMonitorSetProgressFunc
void TessMonitorSetProgressFunc(ITessAPI.ETEXT_DESC monitor, ITessAPI.TessProgressFunc progressFunc)
-
TessMonitorGetProgress
int TessMonitorGetProgress(ITessAPI.ETEXT_DESC monitor)
-
TessMonitorSetDeadlineMSecs
void TessMonitorSetDeadlineMSecs(ITessAPI.ETEXT_DESC monitor, int deadline)
-
-