NAME

       mftraining - cluster character shape features for tesseract training

SYNOPSIS

       mftraining is one step in training tesseract for a new language. Once
       the character features of all the training pages have been extracted,
       they must be clustered to create the prototypes. The character shape
       features are clustered with the mftraining and cntraining programs:

       mftraining fontfile_1.tr fontfile_2.tr ...

       This writes two data files: inttemp (the shape prototypes) and
       pffmtable (the number of expected features for each character). A
       third file, Microfeat, is also written by this program, but it is
       not used.
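
       The invocation above can be scripted. The following is a minimal
       Python sketch, not part of tesseract itself: the helper names
       (build_mftraining_command, missing_outputs, run_mftraining) are
       hypothetical, and it assumes that mftraining is on the PATH and
       writes inttemp and pffmtable into the directory it is run from.

```python
import subprocess
from pathlib import Path

# Output files named in this manual page; Microfeat is also written but unused.
EXPECTED_OUTPUTS = ("inttemp", "pffmtable")


def build_mftraining_command(tr_files):
    """Return the argument list for one mftraining run over all .tr files."""
    files = sorted(str(f) for f in tr_files)
    if not files:
        raise ValueError("mftraining needs at least one .tr file")
    return ["mftraining", *files]


def missing_outputs(workdir):
    """List the expected output files that are absent from workdir."""
    return [name for name in EXPECTED_OUTPUTS
            if not (Path(workdir) / name).is_file()]


def run_mftraining(workdir):
    """Cluster every .tr file in workdir, then report missing outputs.

    Assumption: mftraining is installed and writes its output files
    into the current working directory.
    """
    cmd = build_mftraining_command(p.name for p in Path(workdir).glob("*.tr"))
    subprocess.run(cmd, cwd=workdir, check=True)
    return missing_outputs(workdir)
```

       Calling run_mftraining on a directory of .tr files returns an empty
       list when both expected files were produced. Building the argument
       list separately from running it keeps the command easy to inspect
       before anything is executed.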

DESCRIPTION

       This manual page briefly documents the mftraining command.

       tesseract is a commercial-quality OCR engine originally developed at
       HP between 1985 and 1995. In 1995, this engine was among the top
       three evaluated by UNLV. It was open-sourced by HP and UNLV in 2005.

SEE ALSO

       feh(1),        convert(1),         tesseract(1),         cntraining(1),
       unicharset_extractor(1), wordlist2dawg(1).

AUTHOR

       tesseract was written by Ray Smith.

       This     manual    page    was    written    by    Jeffrey    Ratcliffe
       <Jeffrey.Ratcliffe@gmail.com>, for the Debian project (but may be  used
       by others).

                                August 21, 2007