NAME

       tesseract - command line OCR tool

SYNOPSIS

       tesseract imagename outputbase [configfile] [-l <langid>]

DESCRIPTION

       This manual page briefly documents the tesseract command.

       tesseract is a commercial-quality OCR engine originally developed at HP
       between 1985 and 1995. In 1995, this engine was among the top three
       engines evaluated by UNLV. It was open-sourced by HP and UNLV in 2005.

OPTIONS

       imagename must be a TIF image with a .tif extension.

       outputbase is the base name of the text file that receives the OCR
       output; tesseract appends the .txt extension.

       configfile is a file of control parameters used for debugging or
       modifying tesseract’s behaviour. Config files are stored in
       /usr/share/tesseract-ocr/tessdata/configs/

       The -l <langid> option must come last. At the time of writing, there
       are language packages available for English (eng), German (deu), German
       Fraktur (deu-f), French (fra), Italian (ita), Dutch (nld), Portuguese
       (por), Spanish (spa), and Vietnamese (vie).
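
       The argument order described above can be illustrated with a small
       shell sketch. The helper ocr_cmd and the file names scan.tif and scan
       are hypothetical and are not part of tesseract; the helper only prints
       a command line with the arguments in the order given in the SYNOPSIS,
       placing -l <langid> last and rejecting images without a .tif
       extension.

```shell
# Sketch only: ocr_cmd is a hypothetical helper, not part of tesseract.
# It prints a command line in the order the SYNOPSIS requires:
# imagename, outputbase, optional configfile, then -l <langid> last.
ocr_cmd() {
    image=${1:-}        # input image; must have a .tif extension
    outputbase=${2:-}   # base name of the output text file
    langid=${3:-}       # language code, for example eng or deu
    configfile=${4:-}   # optional config file name (may be empty)

    # Reject images that do not end in .tif, as OPTIONS requires.
    case $image in
        *.tif) ;;
        *)
            echo "error: $image does not have a .tif extension" >&2
            return 1
            ;;
    esac

    # Emit the arguments in order, keeping -l <langid> at the end.
    if [ -n "$configfile" ]; then
        echo "tesseract $image $outputbase $configfile -l $langid"
    else
        echo "tesseract $image $outputbase -l $langid"
    fi
}

# Print the command for a hypothetical German-language scan.
ocr_cmd scan.tif scan deu
```

       Running the printed command on a system with the German language
       package installed should write the recognized text to scan.txt.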

SEE ALSO

       feh(1),        convert(1),        mftraining(1),         cntraining(1),
       unicharset_extractor(1), wordlist2dawg(1).

AUTHOR

       tesseract was written by Ray Smith.

       This     manual    page    was    written    by    Jeffrey    Ratcliffe
       <Jeffrey.Ratcliffe@gmail.com>, for the Debian project (but may be  used
       by others).

                               December 16, 2009