All Classes and Interfaces
Class
Description
An interface representing common TessAPI classes and constants.
Callback for cancel_func.
It should be noted that the format for char_code for version 2.0 and
beyond is UTF-8, which means that ASCII characters will come out as one
structure but other characters will be returned in two or more instances
of this structure with a single byte of the UTF-8 code in each, but each
will have the same bounding box.
Programs that want to handle languages with different character sets will need to handle extended characters appropriately, but all code needs to be prepared to receive UTF-8 coded characters for characters such as bullets and fancy quotes.
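As a minimal sketch of what "being prepared to receive UTF-8 coded characters" could look like, the following Java snippet regroups per-byte entries into whole characters. The `CharEntry` class is a hypothetical stand-in for the per-character result structure (it is not a Tess4J type), and the grouping rule (UTF-8 lead byte plus continuation bytes that share the same bounding box) is an assumption drawn from the description above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class Utf8CharCodes {

    /** Hypothetical stand-in for one result structure: one UTF-8 byte plus a bounding box. */
    static final class CharEntry {
        final int charCode;                 // a single byte of the UTF-8 encoding (0..255)
        final int left, right, top, bottom; // bounding box shared by all bytes of one character

        CharEntry(int charCode, int left, int right, int top, int bottom) {
            this.charCode = charCode & 0xFF;
            this.left = left;
            this.right = right;
            this.top = top;
            this.bottom = bottom;
        }

        boolean sameBox(CharEntry o) {
            return left == o.left && right == o.right && top == o.top && bottom == o.bottom;
        }
    }

    /** Number of bytes announced by a UTF-8 lead byte; 1 for ASCII or an unexpected byte. */
    static int sequenceLength(int b) {
        if ((b & 0x80) == 0x00) return 1;
        if ((b & 0xE0) == 0xC0) return 2;
        if ((b & 0xF0) == 0xE0) return 3;
        if ((b & 0xF8) == 0xF0) return 4;
        return 1;
    }

    static boolean isContinuation(int b) {
        return (b & 0xC0) == 0x80;
    }

    /** Regroups per-byte entries into one String per recognized character. */
    static List<String> decode(List<CharEntry> entries) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < entries.size()) {
            CharEntry first = entries.get(i);
            int len = sequenceLength(first.charCode);
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            buf.write(first.charCode);
            int j = i + 1;
            // Consume continuation bytes only while they belong to the same bounding box.
            while (j < entries.size() && j < i + len
                    && isContinuation(entries.get(j).charCode)
                    && entries.get(j).sameBox(first)) {
                buf.write(entries.get(j).charCode);
                j++;
            }
            out.add(new String(buf.toByteArray(), StandardCharsets.UTF_8));
            i = j;
        }
        return out;
    }
}
```

Feeding it an ASCII entry followed by the three bytes of a bullet (0xE2, 0x80, 0xA2) that share one bounding box yields two strings, one per visible character.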
Description of the output of the OCR engine.
Base class for all tesseract APIs.
MutableIterator adds access to internal data structures.
When Tesseract/Cube is initialized we can choose to instantiate/load/run
only the Tesseract part, only the Cube part or both along with the
combiner.
+------------------+
| 1 Aaaa Aaaa Aaaa |
| Aaa aa aaa aa |
| aaaaaa A aa aaa.
Class to iterate over tesseract page structure, providing access to all
levels of the page hierarchy, without including any tesseract headers or
having to handle any tesseract structures.
WARNING! This class points to data held within the TessBaseAPI class, and therefore can only be used while the TessBaseAPI class still exists and has not been subjected to a call of
Init, SetImage, Recognize, Clear, End, DetectOS, or anything else that changes the internal PAGE_RES.
Enum of the elements of the page hierarchy, used in ResultIterator to provide functions that operate on each level without having to have 5x as many functions.
Possible modes for page layout analysis.
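The point of the page-hierarchy enum described above, one function taking a level argument instead of a separate function per level, can be illustrated with a small self-contained sketch. Everything here is hypothetical: the `Level` enum only mirrors the idea of five hierarchy levels, and the plain-text "page" with newline and space delimiters is a toy model, not how Tesseract stores results.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class PageLevels {

    /** Illustrative levels, coarsest to finest; each carries the regex that splits its parent. */
    enum Level {
        BLOCK("\n{3,}"), PARA("\n{2}"), TEXTLINE("\n"), WORD(" +"), SYMBOL("");

        final String delimiter;

        Level(String delimiter) {
            this.delimiter = delimiter;
        }
    }

    /** One function serves every level: split step by step until the requested level is reached. */
    static List<String> elements(String page, Level level) {
        List<String> current = Collections.singletonList(page);
        for (Level l : Level.values()) {
            List<String> next = new ArrayList<>();
            for (String parent : current) {
                for (String piece : parent.split(l.delimiter)) {
                    if (!piece.isEmpty()) {
                        next.add(piece);
                    }
                }
            }
            current = next;
            if (l == level) {
                break;
            }
        }
        return current;
    }
}
```

Calling `PageLevels.elements(page, PageLevels.Level.WORD)` and `PageLevels.elements(page, PageLevels.Level.TEXTLINE)` reuses the same code path, which is the design benefit the enum is meant to provide.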
NOTA BENE: Fully justified paragraphs (text aligned to both left and
right margins) are marked by Tesseract with JUSTIFICATION_LEFT if their
text is written with a left-to-right script and with JUSTIFICATION_RIGHT
if their text is written in a right-to-left script.
Interpretation for text read in vertical lines: "Left" is wherever the starting reading position is.
Possible types for a POLY_BLOCK or ColPartition.
Iterator for tesseract results that is capable of iterating in proper
reading order over Bi Directional (e.g.
Interface for rendering tesseract results into a document, such as text,
HOCR or pdf.
The text lines are read in the given sequence.
In English, the order is top-to-bottom.
The grapheme clusters within a line of text are laid out logically in
this direction, judged when looking at the text line rotated so that its
Orientation is "page up".
For English text, the writing direction is left-to-right.
An interface representing common OCR methods.
Rendered formats supported by Tesseract.
Loads native libraries from JAR or project folder.
Helper for logging.
Encapsulates Tesseract OCR results at file level.
PDF utilities based on PDFBox.
PDF utilities based on Ghostscript.
PDF utilities based on Ghostscript or PDFBox with Ghostscript as default.
A Java wrapper for Tesseract OCR 4.1.0 API using JNA Interface Mapping.
A Java wrapper for Tesseract OCR 4.1.0 API using JNA Direct Mapping.
An object layer on top of TessAPI that provides character recognition support for common image formats and for multi-page TIFF images beyond the uncompressed, binary TIFF format supported by the Tesseract OCR engine.
An object layer on top of TessAPI1 that provides character recognition support for common image formats and for multi-page TIFF images beyond the uncompressed, binary TIFF format supported by the Tesseract OCR engine.
Encapsulates Tesseract OCR results at a certain page iterator level.