NAME

       hocr2djvused - hOCR to djvused script converter

SYNOPSIS

       hocr2djvused [option...]

DESCRIPTION

       hocr2djvused reads an hOCR[1] file (as produced by OCRopus[2] or
       Cuneiform[3]) from the standard input and converts it to a djvused
       script.
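
       The following is a minimal sketch of how the conversion might be
       driven from Python, not a definitive recipe. It assumes that
       hocr2djvused writes the script to the standard output, that both
       hocr2djvused and djvused are on the PATH, and that the script is
       applied with djvused's -f (script file) and -s (save) options. The
       function names and the file names are hypothetical.

```python
import subprocess


def build_commands(djvu_path, script_path, details="words"):
    """Return the argv lists for the convert step and the apply step."""
    # Assumption: --details accepts lines, words or chars, as documented below.
    convert = ["hocr2djvused", "--details=" + details]
    # Assumption: djvused reads the script with -f and saves the file with -s.
    apply_script = ["djvused", "-f", script_path, "-s", djvu_path]
    return convert, apply_script


def embed_text(hocr_path, djvu_path, script_path="text.djvused"):
    """Convert an hOCR file to a djvused script, then apply it to a DjVu file."""
    convert, apply_script = build_commands(djvu_path, script_path)
    # hOCR goes in on standard input; the script is captured from standard output.
    with open(hocr_path, "rb") as hocr, open(script_path, "wb") as script:
        subprocess.run(convert, stdin=hocr, stdout=script, check=True)
    subprocess.run(apply_script, check=True)
```

       Keeping the script in an intermediate file makes it possible to
       inspect or edit it before djvused modifies the document.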

OPTIONS

   Text segmentation options
       -t lines, --details=lines
           Record location of every line. Don't record locations of particular
           words or characters.

       -t words, --details=words
           Record location of every line and every word. Don't record
           locations of particular characters.

           This is the default.

       -t chars, --details=chars
           Record location of every line, every word and every character.

       --word-segmentation=simple
           Consider each non-empty sequence of non-whitespace characters a
           single word.

           This is the default, despite being linguistically incorrect.
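
           The rule can be illustrated with a short Python sketch. This is
           an approximation for illustration only, not the tool's actual
           code: the function name is hypothetical, and Python's notion of
           whitespace may differ from the one hocr2djvused uses.

```python
def simple_words(line):
    """Split a line into maximal runs of non-whitespace characters."""
    # str.split() with no arguments splits on runs of whitespace
    # and never yields empty strings.
    return line.split()


# Punctuation stays attached to the neighbouring word.
print(simple_words("Hello,  world! It's fine."))
```

           Because punctuation is never separated from adjacent letters,
           "Hello," and "world!" each count as a single word here.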

       --word-segmentation=uax29
           Use the Unicode Text Segmentation[4] algorithm to break lines into
           words.

           This option breaks the assumption made by some DjVu tools that
           words are separated by spaces, and is therefore not recommended.

   Other options
       --rotation=n
           Assume that DjVu pages are rotated by n degrees.

       --page-size=widthxheight
           Specify that the page size is width × height pixels.

           This option is required for hOCR generated by Cuneiform and
           superfluous otherwise.

       --version
           Output version information and exit.

       -h, --help
           Display help and exit.

SEE ALSO

       ocrodjvu(1), djvused(1)

AUTHOR

       Jakub Wilk <jwilk@jwilk.net>
           Author.

COPYRIGHT

       Copyright © 2008, 2009, 2010 Jakub Wilk

NOTES

        1. hOCR
           http://docs.google.com/View?docid=dfxcv4vc_67g844kf

        2. OCRopus
           http://ocropus.googlecode.com/

        3. Cuneiform
           http://launchpad.net/cuneiform-linux

        4. Unicode Text Segmentation
           http://unicode.org/reports/tr29/