cntraining - character normalization training for tesseract

NAME

       cntraining - character normalization training for tesseract

SYNOPSIS

       cntraining fontfile_1.tr fontfile_2.tr ...

DESCRIPTION

       This manual page briefly documents the cntraining command.

       cntraining is part of the process of training tesseract for a new
       language. Once the character features of all the training pages have
       been extracted, they need to be clustered to create the prototypes.
       The character shape features can be clustered using the mftraining
       and cntraining programs.

       cntraining outputs the normproto data file (the character
       normalization sensitivity prototypes).
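
       As an illustration only, the invocation shown in this page can be
       scripted. The sketch below is not part of tesseract: the helper names
       build_cntraining_command and run_cntraining are hypothetical, and it
       assumes that normproto is written to the working directory, which
       this page does not state.

```python
import shutil
import subprocess
from pathlib import Path


def build_cntraining_command(tr_files):
    """Return the argument list for one cntraining run (hypothetical helper)."""
    if not tr_files:
        raise ValueError("cntraining needs at least one .tr feature file")
    # Same shape as the documented usage: cntraining fontfile_1.tr fontfile_2.tr ...
    return ["cntraining"] + [str(f) for f in tr_files]


def run_cntraining(tr_files, workdir="."):
    """Run cntraining in workdir; return None if the program is not on PATH."""
    if shutil.which("cntraining") is None:
        return None
    subprocess.run(build_cntraining_command(tr_files), cwd=workdir, check=True)
    # Assumption: normproto lands in the working directory.
    return Path(workdir) / "normproto"
```

       For example, build_cntraining_command(["fontfile_1.tr",
       "fontfile_2.tr"]) yields the same command line as the one in the
       SYNOPSIS.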

       tesseract is a commercial-quality OCR engine originally developed at
       HP between 1985 and 1995. In 1995, this engine was among the top
       three evaluated by UNLV. It was open-sourced by HP and UNLV in 2005.

AUTHOR

       tesseract was written by Ray Smith.

       This     manual    page    was    written    by    Jeffrey    Ratcliffe
       <Jeffrey.Ratcliffe@gmail.com>, for the Debian project (but may be  used
       by others).

                                August 21, 2007
