pdf2djvu - creates DjVu files from PDF files

NAME

       pdf2djvu - creates DjVu files from PDF files

SYNOPSIS

       pdf2djvu [{-o | --output} output-djvu-file] [option...] pdf-file

       pdf2djvu {-i | --indirect} index-djvu-file [option...] pdf-file

       pdf2djvu {--version | --help | -h}

DESCRIPTION

       This program creates a DjVu file from the Portable Document Format file
       pdf-file.

OPTIONS

       pdf2djvu accepts the following options:

   Document type, file names
       -o, --output=output-djvu-file
           Generate a bundled multi-page document. Write the file into
           output-djvu-file instead of standard output.

       -i, --indirect=index-djvu-file
           Generate an indirect multi-page document. Use index-djvu-file as
           the index file name; put the component files into the same
           directory. The directory must exist and be writable.

       --pageid-template=template
           Specifies the naming scheme for page identifiers. Consult the
           "TEMPLATE LANGUAGE" section for the template language description.

           The default template is "p{page:04*}.djvu".

           For portability reasons, page identifiers:

           o   must consist only of lowercase ASCII letters, digits, _, +, -
               and dot,

           o   cannot start with a dot,

           o   cannot contain two consecutive dots,

           o   must end with the .djvu or the .djv extension.
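
           A hypothetical helper (not part of pdf2djvu) can check a candidate
           identifier against these rules with one regular expression. A
           minimal Python sketch:

```python
import re

# Hypothetical validator for the page identifier rules listed above.
_PAGEID_RE = re.compile(
    r"(?!\.)"          # cannot start with a dot
    r"(?!.*\.\.)"      # cannot contain two consecutive dots
    r"[a-z0-9_+.-]+"   # lowercase ASCII letters, digits, _, +, - and dot
    r"\.djvu?"         # must end with .djvu or .djv
)


def is_portable_pageid(pageid: str) -> bool:
    return _PAGEID_RE.fullmatch(pageid) is not None


print(is_portable_pageid("p0001.djvu"))   # True
print(is_portable_pageid(".cover.djvu"))  # False
```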

       --pageid-prefix=prefix
           Equivalent to "--pageid-template=prefix{page:04*}.djvu".

       --page-title-template=template
           Specifies the template for page titles. Consult the "TEMPLATE
           LANGUAGE" section for the template language description.

           The default is to set no page titles.

   Resolution, page size
       -d, --dpi=resolution
           Specifies the desired resolution, in dots per inch. The default
           is 300 dpi. The allowed range is: 72 <= resolution <= 6000.
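
           PDF page boxes are measured in points, by default 1/72 inch, so
           the pixel size of a rendered page follows from the resolution. A
           minimal Python sketch; the helper name and the rounding to the
           nearest pixel are assumptions, not pdf2djvu's exact behavior:

```python
# Hypothetical helper: pixel size of a page box given in PDF points.
MIN_DPI, MAX_DPI = 72, 6000
POINTS_PER_INCH = 72


def page_pixels(width_pt: float, height_pt: float, dpi: int = 300) -> tuple[int, int]:
    if not MIN_DPI <= dpi <= MAX_DPI:
        raise ValueError(f"resolution must be between {MIN_DPI} and {MAX_DPI} dpi")
    # Rounding to the nearest pixel is an assumption of this sketch.
    return (round(width_pt / POINTS_PER_INCH * dpi),
            round(height_pt / POINTS_PER_INCH * dpi))


# An A4 page box is roughly 595 x 842 points.
print(page_pixels(595, 842))  # (2479, 3508)
```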

       --media-box
           Use MediaBox to determine page size.  CropBox is used by default.

       --page-size=widthxheight
           Specifies the preferred page size as width x height pixels. The
           actual page size may be altered in order to preserve the aspect
           ratio and to respect DjVu limitations on resolution. (This option
           takes precedence over -d/--dpi.)

       --guess-dpi
           Try to guess the native resolution by inspecting embedded images.
           Use with care.

   Image quality
       --bg-slices=n+...+n, --bg-slices=n,...,n
           Specifies the encoding quality of the IW44 background layer. This
           option is similar to the -slice option of c44. Consult the c44(1)
           manual page for details. The default is 72+11+10+10.
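
           Assuming the c44 convention, in which values joined with + are
           increments and values joined with , are absolute slice counts,
           the default 72+11+10+10 is the same as 72,83,93,103. A minimal
           Python sketch of that conversion (the parser is hypothetical):

```python
# Hypothetical parser for --bg-slices. Assumption (c44 convention): values
# joined with "+" are increments, values joined with "," are absolute,
# cumulative slice counts.
def parse_bg_slices(spec: str) -> list[int]:
    if "+" in spec and "," in spec:
        raise ValueError("use either '+' or ',' as the separator, not both")
    totals: list[int] = []
    if "+" in spec:
        running = 0
        for increment in spec.split("+"):
            running += int(increment)
            totals.append(running)
    else:
        totals = [int(n) for n in spec.split(",")]
    # Each chunk must add at least one slice to the previous one.
    if totals[0] < 1 or any(b <= a for a, b in zip(totals, totals[1:])):
        raise ValueError(f"slice counts must be positive and increasing: {spec!r}")
    return totals


print(parse_bg_slices("72+11+10+10"))   # [72, 83, 93, 103]
print(parse_bg_slices("72,83,93,103"))  # [72, 83, 93, 103]
```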

       --bg-subsample=n
           Specifies the background subsampling ratio. The default is 3. Valid
           values are integers between 1 and 12, inclusive.

       --fg-colors=default
           Try to preserve all the foreground layer colors. This is the
           default.

       --fg-colors=web
           Reduce foreground layer colors to the web palette (216 colors).
           This option is not recommended.
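
           The web palette consists of all combinations of six levels per
           channel (0, 51, 102, 153, 204 and 255; 6 x 6 x 6 = 216 colors).
           Snapping each channel to the nearest level is one simple way to
           map a color onto it; this Python sketch is illustrative and does
           not describe how pdf2djvu performs the reduction:

```python
# Illustrative only: snap each channel to the nearest of the six web-palette
# levels. Not a description of pdf2djvu's internal color reduction.
WEB_LEVELS = (0, 51, 102, 153, 204, 255)


def to_web_palette(rgb: tuple[int, int, int]) -> tuple[int, ...]:
    # The levels are 51 apart (255 / 5), so round(c / 51) is the index of the
    # nearest level; exact ties cannot occur for integer channel values.
    return tuple(WEB_LEVELS[round(c / 51)] for c in rgb)


print(to_web_palette((200, 30, 130)))  # (204, 51, 153)
```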

       --fg-colors=n
           Use GraphicsMagick to reduce the number of distinct colors in the
           foreground layer to n. Valid values are integers between 1 and
           4080. This option is not recommended.

       --fg-colors=black
           Discard any color information from the foreground layer.

       --monochrome
           Render pages as monochrome bitmaps. With this option, the --bg-...
           and --fg-... options are ignored.

       --loss-level=n
           Specifies the aggressiveness of the lossy compression. The default
           is 0 (lossless). Valid values are integers between 0 and 200,
           inclusive. This option is similar to the -losslevel option of cjb2;
           consult the cjb2(1) manual page for details. This option is
           respected only along with the --monochrome option.

       --lossy
           Synonym for --loss-level=100.

       --anti-alias
           Enable font and vector anti-aliasing. This option is not
           recommended.

   Extraction
       --no-metadata
           Don't extract the metadata.

           By default:

           o   The following entries of the document information dictionary
               are extracted: Title, Author, Subject, Creator, Producer,
               CreationDate, ModDate. Timestamps are formatted according to
               RFC 3339[1], with date and time components separated by a
               single space.

           o   The XMP metadata is extracted (or created) and updated
               accordingly.
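
           PDF stores these dates as D:YYYYMMDDHHmmSS followed by a time
           zone such as Z or +01'00', and RFC 3339 permits a space instead of
           T between the date and the time. A minimal Python sketch of the
           conversion; the helper is hypothetical, handles only the full
           date form, and is not taken from pdf2djvu:

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical converter: full PDF date form only, e.g. D:20100314150926+01'00'.
_PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})"
    r"(?:(Z)|([+-])(\d{2})'(\d{2})'?)"
)


def pdf_date_to_rfc3339(value: str) -> str:
    m = _PDF_DATE.fullmatch(value)
    if m is None:
        raise ValueError(f"unsupported PDF date: {value!r}")
    year, month, day, hour, minute, second = (int(g) for g in m.group(1, 2, 3, 4, 5, 6))
    if m.group(7):  # "Z" means UTC
        offset = timedelta(0)
    else:
        sign = 1 if m.group(8) == "+" else -1
        offset = sign * timedelta(hours=int(m.group(9)), minutes=int(m.group(10)))
    dt = datetime(year, month, day, hour, minute, second, tzinfo=timezone(offset))
    # Separate the date and time components with a single space, not "T".
    return dt.isoformat(sep=" ")


print(pdf_date_to_rfc3339("D:20100314150926+01'00'"))  # 2010-03-14 15:09:26+01:00
```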

       --verbatim-metadata
           Keep the original metadata intact.

       --no-outline
           Don't extract the document outline.

       --hyperlinks=border-avis
           Make hyperlink borders always visible.

           By default, a hyperlink border is visible only when the mouse is
           over the hyperlink.

       --hyperlinks=#RRGGBB
           Force the specified border color for hyperlinks.

       --no-hyperlinks, --hyperlinks=none
           Don't extract hyperlinks.

       --no-text
           Don't extract the text.

       --words
           Extract the text. Record the location of every word. This is the
           default.

       --lines
           Extract the text. Record the location of every line, rather than
           every word.

       --crop-text
           Extract no text outside the page boundary.

       --no-nfkc
           Don't NFKC[2]-normalize the text.
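
           NFKC normalization replaces compatibility characters, such as the
           typographic ligatures found in many PDF files, with their plain
           equivalents. Python's standard unicodedata module shows the
           effect that this option turns off:

```python
import unicodedata

# U+FB01 (LATIN SMALL LIGATURE FI) and U+FF21 (FULLWIDTH LATIN CAPITAL
# LETTER A) are compatibility characters; NFKC replaces them with plain
# "fi" and "A".
text = "\ufb01le \uff21"
normalized = unicodedata.normalize("NFKC", text)
print(normalized)  # file A
```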

       --filter-text=command-line
           Filter the text through the given command line. The filter must
           preserve whitespace, control characters and decimal digits.

           This option implies --no-nfkc.
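
           A minimal Python sketch of a filter that meets this requirement.
           It assumes the filter receives the text on standard input and
           writes the result to standard output, and it reads "preserve" as
           "keep the same sequence of whitespace, control characters and
           digits"; both are assumptions, and the helper names are
           hypothetical:

```python
import unicodedata


# Hypothetical text filter: it only replaces the long s (U+017F) with "s", so
# whitespace, control characters and digits pass through unchanged.
# Assumed wiring in a script: sys.stdout.write(filter_text(sys.stdin.read()))
def filter_text(text: str) -> str:
    return text.replace("\u017f", "s")


def protected(text: str) -> list[str]:
    # Characters the filter must preserve: whitespace, control characters
    # (Unicode category Cc) and decimal digits, in their original order.
    return [c for c in text
            if c.isspace() or unicodedata.category(c) == "Cc" or c.isdecimal()]


sample = "Congre\u017fs, 1776\n"
assert protected(filter_text(sample)) == protected(sample)
```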

       -p, --pages=page-range
           Specifies pages to convert.  page-range is a comma-separated list
           of sub-ranges. Each sub-range is either a single page (e.g. 17) or
           a contiguous range of pages (e.g. 37-42). Pages are numbered from
           1.

           The default is to convert all pages.
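
           A minimal Python sketch of a parser for this syntax. It is
           hypothetical, returns pages in the order given, and does not
           model whether pdf2djvu sorts or deduplicates them:

```python
# Hypothetical parser for the page-range syntax described above.
def parse_page_range(spec: str) -> list[int]:
    pages: list[int] = []
    for part in spec.split(","):
        first, sep, last = part.partition("-")
        start = int(first)
        end = int(last) if sep else start
        # Pages are numbered from 1 and a range must not run backwards.
        if not 1 <= start <= end:
            raise ValueError(f"invalid sub-range: {part!r}")
        pages.extend(range(start, end + 1))
    return pages


print(parse_page_range("1,17,37-42"))  # [1, 17, 37, 38, 39, 40, 41, 42]
```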

   Performance
       -j, --jobs=n
           Use n threads to perform conversion. The default is to use one
           thread.

       -j0, --jobs=0
           Determine automatically how many threads to use to perform
           conversion.

   Verbosity, help
       -v, --verbose
           Display more informational messages while converting the file.

       -q, --quiet
           Don't display informational messages while converting the file.

       --version
           Output version information and exit.

       -h, --help
           Display help and exit.

ENVIRONMENT

       OMP_*
           Details of runtime behaviour with respect to parallelism can be
           controlled by several environment variables. Please refer to the
           OpenMP API specification[3] for details.

TEMPLATE LANGUAGE

   Template syntax
       The template language is roughly modelled on the Python string
       formatting syntax[4].

       A template is a piece of text which contains fields, surrounded by
       curly braces {}. Fields are replaced with appropriately formatted
       values when the template is evaluated. Moreover, {{ is replaced with a
       single { and }} is replaced with a single }.

   Field syntax
       Each field consists of a variable name, optionally followed by a shift,
       optionally followed by a format specification.

       The shift is a signed (i.e. starting with a + or - character) integer.

       The format specification consists of a colon, followed by a width
       specification.

       The width specification is a decimal integer defining the minimum field
       width. If not specified, then the field width will be determined by the
       content. Preceding the width specification with a zero (0) character
       enables zero-padding.

       The width specification is optionally followed by an asterisk (*)
       character, which increases the minimum field width to the width of the
       longest possible content of the variable.

   Available variables
       page, spage
           Page number in the PDF document.

       dpage
           Page number in the DjVu document.
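
       The rules above can be sketched as a small Python evaluator. It is
       hypothetical, and it assumes details this manual does not spell out:
       a shift is added to the variable's value, every variable ranges from
       1 to a known maximum, and padding follows Python's integer formatting
       (spaces by default, zeros after a leading 0). With these assumptions
       the default page identifier template "p{page:04*}.djvu" expands as
       follows:

```python
import re

# Hypothetical evaluator for the template language (assumptions in lead-in).
_TOKEN = re.compile(
    r"\{\{|\}\}"                                        # escaped braces
    r"|\{(?P<name>[a-z]+)(?P<shift>[+-]\d+)?"           # name, optional shift
    r"(?::(?P<zero>0)?(?P<width>\d*)(?P<star>\*)?)?\}"  # optional format spec
    r"|[{}]"                                            # unmatched brace
)


def evaluate(template: str, values: dict[str, int], maxima: dict[str, int]) -> str:
    def replace(m: re.Match) -> str:
        token = m.group(0)
        if token in ("{{", "}}"):
            return token[0]
        name = m.group("name")
        if name is None:
            raise ValueError(f"unmatched brace in {template!r}")
        shift = int(m.group("shift") or 0)
        value = values[name] + shift
        width = int(m.group("width") or 0)
        if m.group("star"):
            # Widen to the longest possible content: the widest of the
            # smallest (1) and largest shifted values of the variable.
            longest = max(len(str(1 + shift)), len(str(maxima[name] + shift)))
            width = max(width, longest)
        if width == 0:
            return str(value)
        fill = "0" if m.group("zero") else ""
        return format(value, f"{fill}{width}d")

    return _TOKEN.sub(replace, template)


print(evaluate("p{page:04*}.djvu", {"page": 7}, {"page": 250}))    # p0007.djvu
print(evaluate("p{page:04*}.djvu", {"page": 7}, {"page": 12000}))  # p00007.djvu
print(evaluate("{{{dpage-1}}}", {"dpage": 10}, {"dpage": 10}))     # {9}
```

       The asterisk matters only for long documents: with 250 pages the
       explicit width of 4 already suffices, while with 12000 pages the
       field grows to five digits so that identifiers still sort correctly.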

IMPLEMENTATION DETAILS

   Layer separation algorithm
       Unless the --monochrome option is on, pdf2djvu uses the following naive
       layer separation algorithm:

        1. For each page, do the following:

            1. Rasterize the page into a pixmap, in the usual manner.

            2. Rasterize the page into another pixmap, omitting the following
               page elements:

               o   text,

               o   1 bit-per-pixel raster images,

               o   vector elements (except fills of large areas).

            3. Compare both pixmaps, pixel by pixel:

                1. If their colors match, classify the pixel as a part of the
                   background layer.

                2. Otherwise, classify the pixel as a part of the foreground
                   layer.
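
       Step 3 can be sketched in Python with pixmaps modelled as lists of
       rows of RGB tuples, an illustrative representation rather than
       pdf2djvu's internal one. The result is a mask in which True marks a
       foreground pixel and False a background pixel:

```python
Pixel = tuple[int, int, int]


def separate_layers(full: list[list[Pixel]],
                    stripped: list[list[Pixel]]) -> list[list[bool]]:
    # "full" is the usual rendering; "stripped" omits text, 1-bpp images and
    # (most) vector elements. Both must have the same dimensions.
    if len(full) != len(stripped) or any(len(a) != len(b) for a, b in zip(full, stripped)):
        raise ValueError("pixmaps must have identical dimensions")
    # Matching colors -> background (False); differing colors -> foreground (True).
    return [[a != b for a, b in zip(row_full, row_stripped)]
            for row_full, row_stripped in zip(full, stripped)]


WHITE, BLACK = (255, 255, 255), (0, 0, 0)
print(separate_layers([[WHITE, BLACK]], [[WHITE, WHITE]]))  # [[False, True]]
```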

BUG REPORTS

       If you find a bug in pdf2djvu, please report it at the issue
       tracker[5].

AUTHOR

       Jakub Wilk <jwilk@jwilk.net>
           Author.

COPYRIGHT

       Copyright (C) 2007, 2008, 2009, 2010 Jakub Wilk

NOTES

        1. RFC 3339
           http://www.ietf.org/rfc/rfc3339

        2. NFKC
           http://unicode.org/reports/tr15/

        3. OpenMP API specification
           http://openmp.org/wp/openmp-specifications/

        4. Python string formatting syntax
           http://docs.python.org/library/string.html#format-string-syntax

        5. the issue tracker
           http://code.google.com/p/pdf2djvu/issues/
