csepdjvu - DjVu encoder for separated data files.

NAME

       csepdjvu - DjVu encoder for separated data files.

SYNOPSIS

       csepdjvu  [options] [sepfiles]... outputdjvufile

DESCRIPTION

       This  program creates a DjVuDocument file outputdjvufile from separated
       data files sepfiles.  It can read  separated  data  from  the  standard
       input  when  given  a  single  dash  instead of the separated data file
       names.  This feature is intended for pre-processing programs that  push
       separated data into csepdjvu via a pipe.
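
       The pipe usage described above can be sketched as follows.  This is a
       minimal illustration, assuming that csepdjvu is installed and found on
       the PATH; the function names build_command and encode_from_memory are
       hypothetical helpers and are not part of csepdjvu.

```python
import subprocess


def build_command(output_path, dpi=300, verbose=False):
    """Build a csepdjvu command line that reads separated data from stdin."""
    command = ["csepdjvu", "-d", str(dpi)]
    if verbose:
        command.append("-v")
    # A single dash replaces the separated data file names.
    command += ["-", output_path]
    return command


def encode_from_memory(separated_data, output_path, dpi=300):
    """Push separated data (bytes) into csepdjvu through a pipe."""
    # Assumption: the csepdjvu executable is available on the PATH.
    subprocess.run(build_command(output_path, dpi),
                   input=separated_data, check=True)
```

       The separated_data argument is expected to hold the bytes of one or
       more separated page images in the format described later in this page.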

       Each  separated data file represents one or more page images.  When the
       program arguments specify multiple pages, all the pages are encoded and
       saved  as  a  bundled  multi-page document.  When the program arguments
       specify a single page, the page is encoded and saved as a  single  page
       file.

OPTIONS

       -d n   Specify the resolution, in dots per inch, recorded in the output
              file.  The resolution recorded in a DjVu file determines how the
              decoder scales the image on a particular display.  Meaningful
              resolutions range from 25 to 6000.  The default value is 300
              dpi.

       -q n,...,n

       -q n+...+n
              Specify the encoding quality of the IW44 encoded background
              layer.  The option argument contains several integers (one per
              chunk) separated by either commas or pluses.  This option is
              similar to option -slice of program c44.  Please refer to the
              c44(1) man page for additional details.  The default quality
              specification is -q 72,83,93,103.

              This option does not apply to uniformly white backgrounds that
              were not specified by the separated data but are called for by
              the DjVu specification.  Such background images always come at
              the lowest possible resolution and with a standard quality
              setting that ensures color uniformity.

       -t     Program csepdjvu interprets certain comments in the separated
              file to construct a hidden text layer in the DjVu file.  By
              default, this layer records the location of each word for
              highlighting purposes.  This option reduces the file size by
              recording only the location of each line.

       -v     Display a brief message describing each page.

       -vv    Display extensive informational messages during encoding.

SEPARATED DATA FILE FORMAT

       Each separated data file  contains  a  concatenation  of  one  or  more
       separated  page  images.   Each  page  is  logically  represented  by a
       foreground image with a transparent color and  by  a  background  image
       visible  through  the  transparent pixels.  The data for each separated
       page image is the concatenation of the following data blocks:

       *  A foreground image encoded using either the "Color  RLE  format"  or
          the "Bitonal RLE format".  These formats are described later in this
          section.

       *  An optional background image encoded as a "Portable Pixmap" (PPM).
          This well-known format is summarized later in this section.  The
          absence of a background image simply indicates that a uniformly
          white background should be assumed.

       *  An arbitrary number of comment lines starting with character "#" and
          terminated by a linefeed character. Comment lines whose  first  word
          starts  with a capital letter have special meanings documented later
          in this document.
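
       The concatenation described by the list above can be sketched as
       follows.  The function name assemble_page is hypothetical, and the
       foreground and background arguments are assumed to be byte strings
       already encoded in the formats described later in this section.

```python
def assemble_page(foreground, background=None, comments=()):
    """Concatenate the data blocks of one separated page image.

    foreground: bytes in Color RLE or Bitonal RLE format.
    background: optional bytes in PPM format (None means uniformly white).
    comments:   text lines, each starting with the character "#".
    """
    parts = [foreground]
    if background is not None:
        parts.append(background)
    for line in comments:
        if not line.startswith("#"):
            raise ValueError("comment lines must start with '#'")
        # Each comment line is terminated by a linefeed character.
        parts.append(line.rstrip("\n").encode("utf-8") + b"\n")
    return b"".join(parts)
```

       Several pages produced this way can be concatenated to form a
       multi-page separated data file.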

       The dimensions (width and height) of the background image must be
       obtained by rounding up the quotient of the foreground image dimensions
       by an integer reduction factor ranging from 1 to 12.  Assume, for
       instance, that the width of the foreground is 2507 and the reduction
       factor is 3.  The width of the background image must then be the
       integer quotient (2507+2)/3, that is, 836.
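
       The rounding rule above amounts to a ceiling division.  The following
       sketch illustrates it; the function name background_size and the
       foreground height of 3300 are made up for the example.

```python
def background_size(fg_width, fg_height, reduction):
    """Return the (width, height) required for the background image."""
    if not 1 <= reduction <= 12:
        raise ValueError("reduction factor must range from 1 to 12")
    # Adding (reduction - 1) before the integer division rounds up.
    bg_width = (fg_width + reduction - 1) // reduction
    bg_height = (fg_height + reduction - 1) // reduction
    return bg_width, bg_height


print(background_size(2507, 3300, 3))  # prints (836, 1100)
```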

   Color RLE format
       The  Color  RLE format is a simple run-length encoding scheme for color
       images with a limited number of distinct colors.  The data always begin
       with  a  text header composed of the two characters "R6", the number of
       columns, the number of rows, and the number of color  palette  entries.
       All  numbers  are  expressed  in  decimal  ASCII.  These four items are
       separated  by  blank  characters  (space,  tab,  carriage  return,   or
       linefeed)  or  by  comment lines introduced by character "#".  The last
       number is followed by exactly one character which usually is a linefeed
       character.

       The  header is followed by the color palette containing three bytes per
       color entry.  The bytes represent the red, green, and  blue  components
       of the color.

       The palette is followed by a sequence of four-byte integers (most
       significant byte first) representing runs of pixels with an identical
       color.  The twelve upper bits of each integer indicate the index of the
       run color in the palette.  The twenty lower bits of the integer
       indicate the run length.  Color indices greater than 0xff0 are
       reserved.  Color index 0xfff is used for transparent runs.  Each row is
       represented by a sequence of runs whose lengths add up to the image
       width.  Rows are encoded starting with the top row and progressing
       toward the bottom row.
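
       A minimal encoder for the layout described above might look like the
       sketch below.  It is an illustration written from this description,
       not code taken from csepdjvu.  The function name encode_color_rle and
       the use of None to mark transparent pixels are assumptions, and the
       four-byte integers are assumed to be stored in big-endian byte order.

```python
import struct

TRANSPARENT = 0xFFF   # color index used for transparent runs
MAX_RUN = 0xFFFFF     # the run length is stored in the twenty lower bits


def encode_color_rle(width, height, palette, rows):
    """Encode an image in Color RLE format.

    palette: list of (red, green, blue) tuples with values 0..255.
    rows:    list of rows, top row first; each row is a list of palette
             indices, with None standing for a transparent pixel.
    """
    if len(rows) != height:
        raise ValueError("wrong number of rows")
    out = bytearray(b"R6 %d %d %d\n" % (width, height, len(palette)))
    for red, green, blue in palette:
        out += bytes((red, green, blue))
    for row in rows:
        if len(row) != width:
            raise ValueError("row length differs from image width")
        x = 0
        while x < width:
            pixel = row[x]
            if pixel is None:
                index = TRANSPARENT
            elif 0 <= pixel < len(palette) and pixel <= 0xFF0:
                index = pixel
            else:
                raise ValueError("invalid color index")
            # Extend the run while the following pixels are identical.
            run = 1
            while x + run < width and row[x + run] == pixel and run < MAX_RUN:
                run += 1
            # Twelve upper bits: color index; twenty lower bits: run length.
            out += struct.pack(">I", (index << 20) | run)
            x += run
    return bytes(out)
```

       For instance, a 4x1 image with a single red palette entry and the row
       [None, 0, 0, None] produces the header "R6 4 1 1", three palette
       bytes, and three runs: transparent of length 1, red of length 2, and
       transparent of length 1.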

   Bitonal RLE format
       The  Bitonal  RLE  format  is  a  simple run-length encoding scheme for
       bitonal images.  The data always begin with a text header  composed  of
       the two characters "R4", the number of columns, and the number of rows.
       All numbers are expressed in decimal  ASCII.   These  three  items  are
       separated   by  blank  characters  (space,  tab,  carriage  return,  or
       linefeed) or by comment lines introduced by character  "#".   The  last
       number is followed by exactly one character which usually is a linefeed
       character.

       The rest of the file encodes a sequence of numbers representing the
       lengths of alternating runs of transparent and black pixels.  Lines are
       encoded starting with the top line and progressing toward the bottom
       line.  Each line starts with a transparent (white) run.  The decoder
       knows that a line is finished when the sum of the run lengths for that
       line is equal to the number of columns in the image.  Numbers in range
       0 to 191 are represented by a single byte in range 0x00 to 0xbf.
       Numbers in range 192 to 16383 are represented by a two-byte sequence:
       the first byte, in range 0xc0 to 0xff, encodes the six most significant
       bits of the number, and the second byte encodes the remaining eight
       bits of the number.  This scheme allows for runs of length zero, which
       are useful when a line starts with a black pixel, and when a very long
       run (whose length exceeds 16383) must be split into smaller runs.
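
       The run-length byte encoding described above can be sketched as
       follows.  The function names encode_run, encode_row, and row_to_runs
       are hypothetical, and rows are assumed to be sequences of truthy
       (black) and falsy (transparent) values.

```python
MAX_RUN = 0x3FFF  # 16383, the longest run that fits in two bytes


def encode_run(length):
    """Encode one run length as one or two bytes."""
    if 0 <= length <= 191:
        return bytes((length,))
    if length <= MAX_RUN:
        # First byte: 0xc0 plus the six most significant bits.
        return bytes((0xC0 | (length >> 8), length & 0xFF))
    raise ValueError("run too long; it must be split")


def row_to_runs(row):
    """Convert a row of pixels into alternating run lengths.

    The first run is always transparent, so a row that starts with a
    black pixel yields a leading run of length zero.
    """
    runs, black, count = [], False, 0
    for pixel in row:
        if bool(pixel) == black:
            count += 1
        else:
            runs.append(count)
            black, count = not black, 1
    runs.append(count)
    return runs


def encode_row(runs):
    """Encode the run lengths of one row, splitting overlong runs."""
    out = bytearray()
    for length in runs:
        while length > MAX_RUN:
            out += encode_run(MAX_RUN)
            out += encode_run(0)  # zero-length run of the other color
            length -= MAX_RUN
        out += encode_run(length)
    return bytes(out)
```

       A complete file would consist of the text header, for example
       "R4 2507 3300" followed by a linefeed, and then the encoded rows from
       top to bottom.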

   Portable Pixmap (PPM) format
       The Portable Pixmap format is a  well  known  format  for  representing
       color images.  Check the ppm(1) man page for complete information.

       The data always begin with a text header composed of the two characters
       "P6", the number of columns, the number of rows, and the maximal value
       of a color component (usually 255).  All numbers are expressed in
       decimal ASCII.  These four items are separated by blank characters
       (space, tab, carriage return, or linefeed) or by comment lines
       introduced by character "#".  The last number is followed by exactly
       one character which usually is a linefeed character.

       The rest of the file encodes all the pixels.  Each pixel is represented
       by three bytes giving the red, green, and blue components of the
       pixel.  Pixels are ordered left to right, top to bottom.
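
       A background image in this format can be produced with a few lines of
       code.  The sketch below assumes a maximal component value of 255, so
       that each component fits in one byte; the function name encode_ppm is
       hypothetical.

```python
def encode_ppm(width, height, pixels):
    """Encode pixels as a binary PPM ("P6") image.

    pixels: iterable of (red, green, blue) tuples with values 0..255,
            ordered left to right, top to bottom.
    """
    body = bytearray()
    for red, green, blue in pixels:
        body += bytes((red, green, blue))
    if len(body) != 3 * width * height:
        raise ValueError("pixel count does not match the image dimensions")
    # The last header number is followed by exactly one linefeed.
    return b"P6 %d %d 255\n" % (width, height) + bytes(body)
```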

   Comments in separated files
       Each  page is followed by an arbitrary number of comment lines starting
       with character "#" and terminated by  a  linefeed  character.   Comment
       lines  whose  first  word  starts  with  a  capital letter have special
       meanings. The following constructs are currently defined:

       *  # T px:py dx:dy wxh+x+y (string)
          This construct indicates that the piece of text string must be
          associated with an area of size wxh at position x,y relative to the
          lower left corner of the page.  The string is UTF-8 encoded.  Special
          characters can be escaped as in PostScript using the backslash
          character.  Integers px and py represent the position of the
          current point on the text baseline before the text was drawn.  The
          drawing operation then moves the current point by dx and dy pixels.
          When such comments are present, csepdjvu produces a hidden text
          layer for the corresponding pages.

       *  # L wxh+x+y (url)
          This construct indicates that a hyperlink to url should be
          associated with an area of size wxh at position x,y.  When such
          comments are present, csepdjvu produces pages with an annotation
          chunk containing the specified hyperlinks.

       *  # B count (string) (#pageno)
          This construct provides outline information for the document.  An
          outline entry entitled string is associated with page pageno.
          Integer count indicates how many of the following outline entries
          must be attached to the current entry as subentries.  When such
          comments are present in the first page, csepdjvu produces a
          navigation chunk with the specified outline.
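
       The three constructs above can be generated with small formatting
       helpers.  The sketch below is an illustration only: the function names
       are hypothetical, coordinates are assumed to be non-negative integers,
       and the escaping covers only backslashes and parentheses, which is an
       assumption about what "escaped as in PostScript" requires in practice.

```python
def ps_escape(text):
    """Escape backslashes and parentheses with a backslash."""
    return (text.replace("\\", "\\\\")
                .replace("(", "\\(")
                .replace(")", "\\)"))


def text_comment(px, py, dx, dy, w, h, x, y, string):
    """Format a '# T' comment describing one piece of text."""
    return "# T %d:%d %d:%d %dx%d+%d+%d (%s)\n" % (
        px, py, dx, dy, w, h, x, y, ps_escape(string))


def link_comment(w, h, x, y, url):
    """Format a '# L' comment describing one hyperlink area."""
    return "# L %dx%d+%d+%d (%s)\n" % (w, h, x, y, ps_escape(url))


def outline_comment(count, string, pageno):
    """Format a '# B' comment describing one outline entry."""
    return "# B %d (%s) (#%d)\n" % (count, ps_escape(string), pageno)
```

       The returned strings should be encoded as UTF-8 before being appended
       after the image data of the page they describe.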

CREDITS

       This    program    was    initially    written    by    Leon     Bottou
       <leonb@users.sourceforge.net>   and   was   improved  by  Bill  Riemers
       <docbill@sourceforge.net> and many others.
