NAME
tesseract - command line OCR tool
SYNOPSIS
tesseract imagename outputbase [configfile] [-l <langid>]
DESCRIPTION
This manual page briefly documents the tesseract command.
tesseract is a commercial-quality OCR engine originally developed at HP
between 1985 and 1995. In 1995, this engine was among the top three
evaluated by UNLV. It was open-sourced by HP and UNLV in 2005.
OPTIONS
imagename must be a TIFF image with a .tif extension.
outputbase is the base name of the text file created with the OCR
output; tesseract appends .txt to it.
configfile is a file of control parameters used for debugging or
modifying tesseract’s behaviour. Config files are stored in
/usr/share/tesseract-ocr/tessdata/configs/
The -l <langid> option must come last. At the time of writing, there
are language packages available for English (eng), German (deu), German
Fraktur (deu-f), French (fra), Italian (ita), Dutch (nld), Portuguese
(por), Spanish (spa), and Vietnamese (vie).
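EXAMPLES
The argument order described under OPTIONS can be sketched as a small
Python helper. This is only an illustration, not part of tesseract: the
function name build_tesseract_argv and the file names page.tif and page
are hypothetical. The helper builds the argument list and prints it; it
does not run tesseract.

```python
import shlex


def build_tesseract_argv(imagename, outputbase, configfile=None, langid=None):
    """Assemble a tesseract argument list in the order given in the SYNOPSIS.

    Hypothetical helper for illustration; it only builds the list and
    never invokes tesseract.
    """
    # OPTIONS: imagename must be an image with a .tif extension.
    if not imagename.endswith(".tif"):
        raise ValueError("imagename must have a .tif extension")

    argv = ["tesseract", imagename, outputbase]

    # The optional configfile follows outputbase.
    if configfile is not None:
        argv.append(configfile)

    # OPTIONS: the -l <langid> option must come last.
    if langid is not None:
        argv.extend(["-l", langid])

    return argv


# Example file names are assumptions made for this sketch.
argv = build_tesseract_argv("page.tif", "page", langid="deu")
print(shlex.join(argv))  # prints: tesseract page.tif page -l deu
```

On a system where tesseract is installed, a list like this could be
passed to subprocess.run; the sketch stops at building it so that the
ordering rule is easy to check in isolation.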
SEE ALSO
feh(1), convert(1), mftraining(1), cntraining(1),
unicharset_extractor(1), wordlist2dawg(1).
AUTHOR
tesseract was written by Ray Smith.
This manual page was written by Jeffrey Ratcliffe
<Jeffrey.Ratcliffe@gmail.com>, for the Debian project (but may be used
by others).
December 16, 2009