

       cntraining - character normalization training for tesseract


       cntraining is part of the process of training tesseract for a new
       language. Once the character features of all the training pages have
       been extracted, they must be clustered to create the prototypes. The
       character shape features can be clustered using the mftraining and
       cntraining programs:

       cntraining ...

       This  will  output the normproto data file (the character normalization
       sensitivity prototypes).


       This manual page briefly documents the cntraining command.

       tesseract is a commercial quality OCR engine originally developed at HP
       between  1985  and  1995.  In  1995,  this  engine  was among the top 3
       evaluated by UNLV. It was open-sourced by HP and UNLV in 2005.


       feh(1),        convert(1),         mftraining(1),         tesseract(1),
       unicharset_extractor(1), wordlist2dawg(1).


       tesseract was written by Ray Smith.

       This manual page was written by Jeffrey Ratcliffe for the Debian
       project (but may be used by others).

                                August 21, 2007