NAME

       ocropus - command line OCR tool

SYNOPSIS

       ocroscript <script> <arguments>

DESCRIPTION

       To see a list of all available scripts, look in the $OCROSCRIPTS
       directory (/usr/share/ocropus/scripts/ by default).
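
       As a minimal sketch, the listing can be wrapped in a small shell
       function. The function name list_ocroscripts is hypothetical, and the
       sketch assumes that OCROSCRIPTS is an ordinary environment variable
       which may be unset, in which case the default path given above is
       used:

```shell
# Hypothetical helper, not part of OCRopus.
# Lists the contents of the script directory. Assumes OCROSCRIPTS is an
# environment variable; falls back to the documented default when it is
# unset or empty.
list_ocroscripts() {
    dir="${OCROSCRIPTS:-/usr/share/ocropus/scripts}"
    ls "$dir"
}
```

       Calling list_ocroscripts with no arguments prints one file name per
       line when the output is redirected or captured.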

       The ‘recognize’ script uses Tesseract for recognition and writes the
       HTML-based hOCR output to stdout. Tesseract is probably the most
       mature text recognizer within OCRopus at the moment. On its own,
       Tesseract doesn’t do layout analysis, but combined with OCRopus it
       makes for a pretty good OCR system:
              $ ocroscript recognize page.png > page.html
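
       Because the hOCR output goes to stdout, several pages can be
       processed with a plain shell loop. The following is a sketch only:
       the function name recognize_pages is hypothetical, and it assumes
       every input file name has an extension that can be swapped for
       .html:

```shell
# Hypothetical helper, not part of OCRopus.
# Runs "ocroscript recognize" on each image passed as an argument and
# redirects the hOCR output to a file next to it (page.png -> page.html).
# Assumes each file name has an extension. Stops at the first failure.
recognize_pages() {
    for img in "$@"; do
        out="${img%.*}.html"
        ocroscript recognize "$img" > "$out" || return 1
    done
}
```

       For example, recognize_pages page1.png page2.png would leave
       page1.html and page2.html in the same directory as the images.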

       Here is a brief summary of the remaining scripts. To find out which
       command line arguments a script takes, you will need to read the
       script itself:

       degrade.lua
              Simple document image degradation.

       hocr-to-text.lua
              Convert hOCR output to plain text.

       line-clean.lua
              Given a line image, remove marginal noise  and  fix  some  other
              problems.

       sauvola.lua
              Perform Sauvola thresholding.
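
       Since the arguments are only documented in the scripts themselves,
       it can help to print the start of a script before running it. This
       is a sketch under assumptions: the function name show_ocroscript is
       hypothetical, OCROSCRIPTS is treated as an environment variable with
       the default path given in DESCRIPTION, and nothing guarantees that a
       given script describes its arguments near the top:

```shell
# Hypothetical helper, not part of OCRopus.
# Prints the first lines of a script from the script directory so that its
# argument handling can be read.
#   $1  script name, with or without the .lua suffix
#   $2  number of lines to print (optional, defaults to 40)
show_ocroscript() {
    dir="${OCROSCRIPTS:-/usr/share/ocropus/scripts}"
    name="${1%.lua}.lua"
    head -n "${2:-40}" "$dir/$name"
}
```

       For example, show_ocroscript degrade prints the first 40 lines of
       degrade.lua.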

SEE ALSO

       tesseract(1)

AUTHOR

       ocroscript was written by Thomas Breuel.

       This     manual    page    was    written    by    Jeffrey    Ratcliffe
       <Jeffrey.Ratcliffe@gmail.com>, for the Debian project (but may be  used
       by others).

                                 June 06, 2008                   ocroscript(1)