NAME
hocr2djvused - hOCR to djvused script converter
SYNOPSIS
hocr2djvused [option...]
DESCRIPTION
hocr2djvused reads an hOCR[1] file (as produced by OCRopus[2] or
Cuneiform[3]) from the standard input and converts it to a djvused
script.
OPTIONS
Text segmentation options
-t lines, --details=lines
Record location of every line. Don't record locations of particular
words or characters.
-t words, --details=words
Record location of every line and every word. Don't record
locations of particular characters.
This is the default.
-t chars, --details=chars
Record location of every line, every word and every character.
--word-segmentation=simple
Consider each non-empty sequence of non-whitespace characters a
single word.
This is the default, despite being linguistically incorrect.
--word-segmentation=uax29
Use the Unicode Text Segmentation[4] algorithm to break lines into
words.
This option breaks the assumption made by some DjVu tools that words
are separated by spaces, and is therefore not recommended.
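The simple segmentation rule can be illustrated with a short Python sketch. This is not the code used by hocr2djvused; the function name simple_words is hypothetical, and the sketch only mirrors the rule as stated above (each maximal run of non-whitespace characters is one word).

```python
import re


def simple_words(line):
    """Split a line into words following the 'simple' rule as described:
    every non-empty run of non-whitespace characters is one word.

    Hypothetical helper for illustration only; not part of hocr2djvused.
    """
    return re.findall(r"\S+", line)


if __name__ == "__main__":
    # Punctuation stays attached to the neighbouring letters, which is
    # why the rule is called linguistically incorrect.
    print(simple_words("Don't stop, e.g. here."))
```

Because words produced this way are always delimited by whitespace, the result stays compatible with tools that expect space-separated words, which a UAX #29 segmentation does not guarantee.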
Other options
--rotation=n
Assume that DjVu pages are rotated by n degrees.
--page-size=widthxheight
Specify that the page size is width pixels × height pixels.
This option is required for hOCR generated by Cuneiform and
superfluous otherwise.
--version
Output version information and exit.
-h, --help
Display help and exit.
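As an illustration of how the options above combine, the following Python sketch assembles a command line and feeds an hOCR file to the program on standard input. The helper names build_command and convert are hypothetical. The sketch assumes that hocr2djvused is installed and on PATH, and that the resulting djvused script is written to standard output, which this page does not state explicitly.

```python
import subprocess


def build_command(details="words", segmentation="simple",
                  rotation=None, page_size=None):
    """Assemble an hocr2djvused argument list from the documented options.

    Hypothetical helper; the defaults mirror the defaults named above.
    """
    cmd = [
        "hocr2djvused",
        "--details={}".format(details),
        "--word-segmentation={}".format(segmentation),
    ]
    if rotation is not None:
        cmd.append("--rotation={}".format(rotation))
    if page_size is not None:
        # --page-size expects the form WIDTHxHEIGHT, in pixels.
        width, height = page_size
        cmd.append("--page-size={}x{}".format(width, height))
    return cmd


def convert(hocr_path, **options):
    """Run hocr2djvused with the hOCR file on standard input.

    Assumption: the djvused script appears on standard output.
    """
    with open(hocr_path, "rb") as hocr:
        result = subprocess.run(
            build_command(**options),
            stdin=hocr,
            stdout=subprocess.PIPE,
            check=True,
        )
    return result.stdout
```

For hOCR generated by Cuneiform, a call such as convert("page.html", page_size=(2480, 3508)) would supply the required page size; the file name and dimensions here are placeholders.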
SEE ALSO
ocrodjvu(1), djvused(1)
AUTHOR
Jakub Wilk <jwilk@jwilk.net>
Author.
COPYRIGHT
Copyright © 2008, 2009, 2010 Jakub Wilk
NOTES
1. hOCR
http://docs.google.com/View?docid=dfxcv4vc_67g844kf
2. OCRopus
http://ocropus.googlecode.com/
3. Cuneiform
http://launchpad.net/cuneiform-linux
4. Unicode Text Segmentation
http://unicode.org/reports/tr29/