NAME
plucker-build - generate a document (e-book) in Plucker format
SYNOPSIS
plucker-build [--alt-maxheight=pixel-height] [--alt-maxwidth=pixel-
width] [--author=string] [--backup] [--beamable] [--bpp=image-depth]
[--category=default-category-name] [--charset=charset-indicator]
[--compression=compression-type] [--depth-first] [--doc-file=name-
prefix] [--doc-name=document-name] [--doc-compression] [--exclusion-
list=filename] [--extra-section=section-name] [--help] [--home-
url=base-URL] [--icon=image-filename] [--launchable] [--maxdepth=depth]
[--maxheight=pixel-height] [--maxwidth=pixel-width] [--no-backup]
[--noimages] [--not-beamable] [--not-launchable] [--no-urlinfo]
[--owner-id=name] [--pluckerdir=output-directory]
[--pluckerhome=plucker-home-directory] [--quiet] [--referrer=string]
[--status-file=filename] [--staybelow=url-prefix] [--stayondomain]
[--stayonhost] [--title=string] [--update-cache] [--url-pattern=pattern]
[--user-agent=string] [--verbosity=verbosity-level] [--zlib-compression]
[HOME-URL]
DESCRIPTION
plucker-build creates a Plucker binary document, which is a kind of e-
book, from a URL. This document is formatted for the Plucker viewer
program, which currently runs on Palm devices. The normal mode of
operation is to take a home URL and ’pluck’ it to produce a Plucker
document, either to stdout, or to a file if --doc-file is specified.
Alternatively, specifying the option --update-cache will update a cache
of Plucker records (though it’s not clear what this is good for). The
Plucker document format is specified at
http://www.plkr.org/index.pl/cvs/docs/DBFormat.html?rev=HEAD.
OPTIONS
Many options are also available as parameters in the configuration file
$HOME/.pluckerrc, or in the default configuration file. Where
applicable, the name of the configuration file parameter is shown after
the documentation on the option. An option given on the command line
will override any configuration file parameter. For more on
configuration files, see below.
--alt-maxheight=pixel-height
Specifies the maximum height, in pixels, of the alternate
rendition of an image. (When inline images are too large to be
included full-size, they are converted into smaller versions,
with sizes governed by the MAXHEIGHT and MAXWIDTH parameters,
and are linked to larger renditions of the images, called the
alternate rendition.) [alt_maxheight]
--alt-maxwidth=pixel-width
Specifies the maximum width, in pixels, of the alternate
rendition of an image. (When inline images are too large to be
included full-size, they are converted into smaller versions,
with sizes governed by the MAXHEIGHT and MAXWIDTH parameters,
and are linked to larger renditions of the images, called the
alternate rendition.) [alt_maxwidth]
--author=string
Sets the author of the document to string, which is assumed to
be in the charset of the document (see --charset), or ASCII if
no charset is specified. [author_md]
--backup
Sets the bit in the output file that causes the document to be
backed up on Palm HotSync. By default, the document is backed
up. [backup_bit]
--beamable
Sets the bit in the output file that allows the document to be
beamed. By default, the document is beamable.
[copyprevention_bit]
--bpp=image-depth
Specifies the number of bits-per-pixel to be used for images.
Valid values as of Plucker 1.1 are 0, 1 (the default), 2, 4, or
8. If 0 is specified, no images will be included in the
document. See also --noimages. [bpp]
--category=default-category-name
Specifies a default Plucker category or categories to include in
the document. If more than one category is specified, the
category names should be separated by semicolons. [category]
--charset=charset-indicator
Specifies the default character set encoding used in the text of
the documents being plucked. charset-indicator is either a
charset name (from a small list; see
src/parser/python/PyPlucker/__init__.py.in for a list of valid
names), or a decimal integer indicating the charset’s MIBenum
value, as shown in the table at
http://www.iana.org/assignments/character-sets.
[default_charset]
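As an illustration of the two forms charset-indicator can take, the following Python sketch resolves a few MIBenum values from the IANA table to commonly used charset names. The function name and the three-entry table are hypothetical; the names plucker-build itself accepts are those listed in PyPlucker/__init__.py.in, which this sketch does not consult.

```python
# Hypothetical excerpt of the IANA table: MIBenum value -> common charset name.
MIBENUM_TO_NAME = {3: "US-ASCII", 4: "ISO-8859-1", 106: "UTF-8"}

def describe_charset(indicator):
    """Return a charset name for an indicator given as a name or a MIBenum."""
    if indicator.isdigit():
        # Decimal integer: look it up in the (partial) MIBenum table.
        return MIBENUM_TO_NAME.get(int(indicator), "unknown MIBenum " + indicator)
    # Anything else is taken to be a charset name and returned unchanged.
    return indicator

print(describe_charset("106"))
print(describe_charset("utf-8"))
```

Under these assumptions, --charset=106 and --charset=utf-8 would name the same encoding, provided the name form is on Plucker's list.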
--compression=compression-type
Specifies the type of compression to use in the document. There
are two possible values for compression-type: doc or zlib. The
default is doc, which is the same compression system used in
Palm DOC-format documents. zlib compression usually results in
smaller documents. See also --zlib-compression and --doc-
compression. [compression]
--depth-first
Specifies a depth-first traversal of the web graph, rather than
the default breadth-first traversal. This often works better on
bushy acyclic graph structures than the breadth-first traversal.
[depth_first]
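The difference between the two traversal orders, and the effect of the depth limit described under --maxdepth, can be sketched in Python as follows. This is not Plucker's spider code; the function name, the links dictionary, and the convention that the home page counts as depth 1 are assumptions made for the illustration.

```python
from collections import deque

def visiting_order(links, home, maxdepth, depth_first=False):
    """Return pages in the order a crawler would visit them.

    links maps each page to the list of pages it links to (hypothetical input).
    The home page counts as depth 1, so maxdepth=1 yields the home page only.
    """
    pending = deque([(home, 1)])
    seen = {home}
    order = []
    while pending:
        # Depth-first takes the newest entry; breadth-first takes the oldest.
        page, depth = pending.pop() if depth_first else pending.popleft()
        order.append(page)
        if depth >= maxdepth:
            continue  # do not follow links from pages at the depth limit
        targets = links.get(page, [])
        if depth_first:
            targets = reversed(targets)  # so the first link is explored first
        for target in targets:
            if target not in seen:
                seen.add(target)
                pending.append((target, depth + 1))
    return order

site = {"home": ["a", "b"], "a": ["a1"], "b": ["b1"]}
print(visiting_order(site, "home", 3))                    # breadth-first
print(visiting_order(site, "home", 3, depth_first=True))  # depth-first
```

Breadth-first visits both children of the home page before any grandchild, while depth-first finishes one branch before starting the next.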
--doc-file=name-prefix (or -f name-prefix)
Specifies the name of the document
output file, without the directory (specified with --pluckerdir)
or extension (always .pdb). If not specified, and if stdout is
not a tty, the document will be written to stdout. [doc_file]
--doc-name=document-name (or -N document-name)
Specifies the short name by which the document will be
identified in the viewer. Defaults to the value of --doc-file. If
--doc-file is not specified, the document name defaults to the
home URL. This name should be limited to 26 characters.
[doc_name]
--doc-compression
Specifies that Doc compression, the compression scheme developed
for the Palm DOC format, should be used for the parts of this
document. This is the default. See also --zlib-compression and
--compression.
--exclusion-list=filename (or -E filename)
Used to add additional files to the exclusion list, a list
of files containing information on URLs to exclude from the
document. See the User’s Guide for more information on
exclusion lists. [exclusion_lists]
--extra-section=section-name (or -s section-name)
Used to add additional sections to the list of sections searched
in the configuration files. A section is a named set of
configuration information. By default, the DEFAULT section will
be searched, then any operating-system-specific sections, then
any sections specified on the command line.
--help (or -h)
Outputs help on command-line parameters.
--home-url=base-URL (or -H base-URL)
Specifies the URL from which the document is to be constructed.
This may also be specified as a single argument on the command
line. If a home URL is not specified, it will default to
file:/$HOME/.plucker/home.html. This default may be changed in
your .pluckerrc file. Note that this value must be a valid
absolute URL. A special URL scheme is supported, plucker:.
This specifies files on the Plucker search path, which consists
of PluckerDir (the Plucker current working directory) followed
by PluckerHome (the Plucker home directory). [home_url]
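The search order for the plucker: scheme can be pictured with the Python sketch below. It is only an illustration: the function name is hypothetical, and the way the URL is mapped to a relative file name is an assumption, not taken from the Plucker source.

```python
import os

def resolve_plucker_url(url, pluckerdir, pluckerhome, exists=os.path.exists):
    """Return the first existing file for a plucker: URL, or None."""
    prefix = "plucker:/"
    if not url.startswith(prefix):
        raise ValueError("not a plucker: URL: " + url)
    # Assumption: the rest of the URL is a path relative to the search path.
    relative = url[len(prefix):]
    # Search PluckerDir first, then PluckerHome, as the text describes.
    for base in (pluckerdir, pluckerhome):
        candidate = os.path.join(base, relative)
        if exists(candidate):
            return candidate
    return None
```

The exists parameter is there only so the lookup can be exercised without touching the file system; a file present in both directories is taken from PluckerDir.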
--icon=image-filename
If the output file is launchable, this switch can be used to
specify the large icon shown in the launcher for the document.
If not specified, a default icon is used. If the output file is
not launchable, this switch has no effect. See also
--launchable. [big_icon]
--launchable
Specifies that the output document should be shown as an icon in
the system launcher. Clicking on the icon will start Plucker
and select this document. By default, documents are not
launchable. [launchable_bit]
--maxdepth=depth (or -M depth)
This specifies the number of levels of links the parser will
traverse when converting the input. It is best to keep this
value small, or the size of your document can get very large.
If you want just a page, but none of the pages pointed to by
that page, use a value of 1. [home_maxdepth]
--maxheight=pixel-height
Specifies the maximum height, in pixels, for an inline image.
Overrides the MAXHEIGHT parameter in the configuration file, but
is in turn overridden by any height specification in the image
link itself. [maxheight]
--maxwidth=pixel-width
Specifies the maximum width, in pixels, for an inline image.
Overrides the MAXWIDTH parameter in the configuration file, but
is in turn overridden by any width specification in the image
link itself. [maxwidth]
--no-backup
Clears the bit in the output file that causes the document to be
backed up on Palm HotSync. By default, the document is backed
up. [backup_bit]
--noimages
Specifies that no images will be included. Identical to
--bpp=0. See also --bpp.
--not-beamable
Sets the bit in the output file that prevents the document from
being beamed. By default, the document is beamable.
[copyprevention_bit]
--not-launchable
Specifies that the output document should not be shown as an
icon in the system launcher. By default, documents are not
launchable. [launchable_bit]
--no-urlinfo
Specifies that no URL information will be included in the
document. When links are included in documents, the information
about the actual URL is included by default. This is often
handy for external references (links to documents not included
in the document). Use of this option may result in a slightly
smaller document. [no_urlinfo]
--owner-id=name
Specifies an owner-id for the document. This causes the
document to be lightly encrypted in such a way that it will only
open on a device with a matching owner-id. With the PalmOS
viewer, the HotSync UserName is used as the owner-id.
[owner_id_build]
--pluckerhome=plucker-home-directory (or -P plucker-home-directory)
Overrides the default value for PluckerHome, which is
$HOME/.plucker/. Can also be specified by setting the
environment variable PLUCKERHOME. An explicit value for
--pluckerhome overrides any setting of PLUCKERHOME.
[PLUCKERHOME]
--pluckerdir=output-directory (or -p output-directory)
Overrides the default value for PluckerDir, which defaults to
PluckerHome (see --pluckerhome). PluckerDir is the default
directory to which output documents will be written, and which
will be searched for input files if the plucker: URL scheme is
used. [pluckerdir]
--quiet (or -q)
Same as --verbosity=0.
--referrer=string
When using HTTP to gather input, send string as the value of the
Referer HTTP header. Default is to send no Referer header.
[referrer]
--status-file=filename
Gives the name of a file to read to get an estimate for the
total number of pages that have to be processed, and to
continually write with a single line giving the number of pages
collected so far, the number of links still to process, and the
estimated number of total pages that will be gathered (or zero
if this is not known). The three values are written as space-
separated ASCII numbers. The status line in the file is
continually over-written as the pluck progresses, so the file
will always contain only a single line. [status_file]
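A program monitoring a pluck could read the status line along the following lines. This is a minimal Python sketch based only on the format described above (three space-separated ASCII numbers, with zero meaning the total is unknown); the function names are hypothetical.

```python
def parse_status_line(line):
    """Split a status line into (collected, to_do, estimated_total)."""
    collected, to_do, estimated = (int(field) for field in line.split())
    return collected, to_do, estimated

def percent_done(line):
    """Return the completion percentage, or None if the total is unknown (0)."""
    collected, _to_do, estimated = parse_status_line(line)
    if estimated == 0:
        return None
    return 100.0 * collected / estimated

print(percent_done("12 30 48\n"))
```

Because the file is overwritten continually, a reader should be prepared to retry if it happens to catch an incomplete line.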
--staybelow=url-prefix
Automatically excludes all URLs that do not start with url-
prefix. A handy way to process a subtree. [home_staybelow]
--stayondomain
Specifies that no web hosts other than those in the same domain
as the original base URL will be visited for parts of the
document. [home_stayondomain]
--stayonhost
Specifies that no web hosts other than that named in the
original base URL will be visited for parts of the document.
[home_stayonhost]
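The --staybelow and --stayonhost filters amount to simple tests on each candidate URL. The Python sketch below shows one way to express them; the function name is hypothetical and the comparison details (for example, case handling of host names) are assumptions rather than Plucker's actual behaviour. --stayondomain is left out because this page does not say how the domain is derived from the host name.

```python
from urllib.parse import urlparse

def allowed(url, home_url, staybelow=None, stayonhost=False):
    """Return True if url passes the prefix and host restrictions."""
    # --staybelow: the URL must start with the given prefix.
    if staybelow is not None and not url.startswith(staybelow):
        return False
    # --stayonhost: the URL must name the same host as the home URL.
    if stayonhost and urlparse(url).hostname != urlparse(home_url).hostname:
        return False
    return True

home = "http://www.foo.com/cafe/weeklymenu.html"
print(allowed("http://www.foo.com/cafe/monday.html", home,
              staybelow="http://www.foo.com/cafe/"))
```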
--title=string
Sets the title of the document to string. This is different
from the name of the document (see --doc-name) in that it may
be relatively long. The string is assumed to be in the charset
of the document (see --charset), or ASCII if no charset is
specified. [title_md]
--update-cache (or -c)
Update the Plucker cache of records, rather than build a
document. [use_cache]
--url-pattern=pattern
Automatically excludes all URLs that do not match the regular
expression pattern. The regular expression language used is
that of the Python ’re’ module, as specified in
http://www.python.org/doc/current/lib/re-syntax.html.
[home_url_pattern]
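A pattern can be tried out with the Python re module before it is passed to --url-pattern. In the sketch below the function name and the URL list are made up for the illustration. Whether Plucker anchors the pattern at the start of the URL or searches anywhere in it is not stated here, so the example pattern covers the whole URL and ends in $ to behave the same way in either case.

```python
import re

def select_urls(urls, pattern):
    """Return the URLs that match pattern; the others would be excluded."""
    compiled = re.compile(pattern)
    # Assumption: the pattern is tried from the start of the URL (re.match).
    return [url for url in urls if compiled.match(url)]

candidates = [
    "http://www.foo.com/cafe/weeklymenu.html",
    "http://www.foo.com/cafe/logo.gif",
    "http://www.foo.com/ops/index.html",
]
pattern = r"http://www\.foo\.com/cafe/.*\.html$"
print(select_urls(candidates, pattern))
```

Note the escaped dots: an unescaped "." matches any character, so foo.com would also match fooXcom.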
--user-agent=string
When using HTTP to gather input, send string as the value of the
User-Agent HTTP header. Default is to send "Plucker/Py-XX",
where XX is the Plucker version. [user_agent]
--verbosity=verbosity-level (or -V verbosity-level)
Sets the level of status information output to the value
specified by verbosity-level. Appropriate values are 0, for
total silence, 1, for standard progress status (the default
value), and 2, for lots of output about gathering and parsing
the input (usually reserved for debugging). Values larger than
2 will also work, but tend to give profuse output that’s only
useful to developers. See also --quiet. [verbosity]
--zlib-compression
Specifies that Zlib compression should be used for the parts of
this document. This is considerably more efficient than the
default compression format, Doc compression. See also --doc-
compression and --compression.
EXAMPLES
To build a pocket version of the weekly cafeteria menu at the foo.com
cafeteria, available on the Web at
http://www.foo.com/cafe/weeklymenu.html, without following any
links, and without including any images, and naming the document
"Cafeteria Menu", and putting the document in a file named
/tmp/Menu.pdb, one would say:
% plucker-build http://www.foo.com/cafe/weeklymenu.html >/tmp/Menu.pdb
Or alternatively,
% plucker-build --pluckerdir=/tmp \
--doc-name="Cafeteria Menu" \
--doc-file=Menu \
--home-url="http://www.foo.com/cafe/weeklymenu.html" \
--maxdepth=1 \
--bpp=0
Pluckerdir is ’/tmp’...
---- 0 collected, 1 to do ----
Processing http://www.foo.com/cafe/weeklymenu.html...
Retrieved ok.
Parsed ok.
---- all pages retrieved and parsed ----
Writing out collected data...
Writing document ’Cafeteria Menu’ to file /tmp/Menu.pdb
Converting http://www.foo.com/cafe/weeklymenu.html...
Wrote 1 <= plucker:/~special~/index
Wrote 2 <= http://www.foo.com/cafe/weeklymenu.html
Wrote 3 <= plucker:/~special~/pluckerlinks
Wrote 5 <= plucker:/~special~/metadata
Wrote 11 <= plucker:/~special~/links1
Done!
% ls -l /tmp/Menu.pdb
-rw-rw-r-- 1 user somegroup 2646 Nov 2 21:19 /tmp/Menu.pdb
%
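When plucker-build is driven from a script, the second command line above can be assembled as an argument list. The Python sketch below uses only options documented on this page; the function name and its parameter defaults are hypothetical, and nothing is executed.

```python
def build_command(home_url, doc_file, doc_name, pluckerdir, maxdepth=1, bpp=0):
    """Return the plucker-build argument list for a single-document pluck."""
    return [
        "plucker-build",
        "--pluckerdir=" + pluckerdir,
        "--doc-name=" + doc_name,
        "--doc-file=" + doc_file,
        "--home-url=" + home_url,
        "--maxdepth=%d" % maxdepth,
        "--bpp=%d" % bpp,
    ]

command = build_command(
    "http://www.foo.com/cafe/weeklymenu.html", "Menu", "Cafeteria Menu", "/tmp"
)
print(command)
```

On a system where plucker-build is installed, the list can be handed to subprocess.run(command); passing a list rather than a single string avoids shell quoting problems with the space in the document name.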
ENVIRONMENT VARIABLES
HOME Used to determine the location of the user’s configuration file.
If not set, the system-wide configuration file is used.
HTTP_PROXY, HTTP_PROXY_USER, HTTP_PROXY_PASS
If set, will be used to retrieve URLs with the http URL scheme.
PLUCKERHOME
Specifies value for PluckerHome. See the option --pluckerhome
for more details.
PLUCKERDIR
Specifies value for PluckerDir. See the option --pluckerdir for
more details.
CONFIGURATION FILES
Two configuration files are examined for customized settings of the
various plucker-build parameters. The first is a system-wide
configuration file, by default /usr/local/etc/pluckerrc, or
/etc/pluckerrc on Debian systems. Any settings in this file may be
overridden with a personal configuration file, $HOME/.pluckerrc. Both
files contain any number of sections, each of which may contain any
number of configuration parameter settings. Each section has a name,
which is enclosed in square brackets, followed by parameter settings.
Normally, only the section called "default" will be examined. Extra
sections may be specified with the --extra-section option to plucker-
build; settings in these sections will override values in the default
section.
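The override rules can be summarised as successive layers, with later layers replacing earlier ones. The Python sketch below is an illustration only: the function name is hypothetical, each layer is modelled as a plain dictionary, and it assumes that when several extra sections are given, those searched later win.

```python
def effective_settings(default_section, extra_sections, command_line):
    """Merge settings so that later layers override earlier ones."""
    settings = dict(default_section)      # lowest priority
    for section in extra_sections:        # in the order they are searched
        settings.update(section)
    settings.update(command_line)         # command-line options always win
    return settings

merged = effective_settings(
    {"bpp": "1", "verbosity": "1"},       # default section
    [{"bpp": "0"}],                       # one extra section
    {"verbosity": "2"},                   # given on the command line
)
print(merged)
```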
Parameter settings have the form name = value, where name is the
name of a plucker-build parameter, and value is a string, integer,
floating-point, or boolean value. A colon character (:) may be used
instead of the equals sign to separate name and value. Comments may be
expressed by starting any line with the characters "rem", or with the
character "#", or with the character ";". Boolean values of True may
be expressed with "TRUE", "true", "True", "on", or "1". Boolean values
of False may be expressed with "FALSE", "false", "False", "off", or
"0".
Configuration sections are useful for groups of options that are used
together frequently. Such a group can be defined once in a named section
of the configuration file and then selected by passing the section name
to plucker-build with --extra-section; the options are drawn from that
section.
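The file syntax and the use of a named section can be tried out with Python's standard configparser module, whose format is close to the one described here. This is a sketch, not Plucker's own loader: the section name cafeteria and the sample values are invented, the default section is written as DEFAULT because that is the spelling configparser treats specially, and "rem" comments are not used because configparser does not recognise them by default.

```python
import configparser

SAMPLE = """\
[DEFAULT]
# a comment line
bpp = 1
home_maxdepth: 2
verbosity = 1

[cafeteria]
; options for one frequently built document
home_url = http://www.foo.com/cafe/weeklymenu.html
doc_name = Cafeteria Menu
home_maxdepth = 1
bpp = 0
backup_bit = off
"""

def load(text):
    """Parse configuration text and return the parser object."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser

parser = load(SAMPLE)
# A value set in the named section overrides the default section.
print(parser.getint("cafeteria", "home_maxdepth"))
# A value missing from the named section falls back to the default section.
print(parser.get("cafeteria", "verbosity"))
# "off" is read as a boolean False.
print(parser.getboolean("cafeteria", "backup_bit"))
```

With a file like this, the whole group would be selected with --extra-section=cafeteria.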
The following parameters are understood:
PLUCKERHOME
See option --pluckerhome.
alt_maxheight
See option --alt-maxheight.
alt_maxwidth
See option --alt-maxwidth.
anchor_color
A color to draw all links in, expressed as one of the 16
standard Web color names, or in the Web standard RGB color
notation. See the HTML 4.01 specification for more details on
allowed color names and RGB notation.
author_md
See option --author.
auto_scale_images
A boolean; if true, plucker-build will automatically attempt to
convert images which are too large to include in the document,
to a smaller form which will fit in the document. Defaults to
false.
backup_bit
See option --backup.
big_icon
See option --icon.
bmp_to_tbmp
Name of the bmp2tbmp program in Windows. Defaults to
Bmp2Tbmp.exe.
bmp_to_tbmp_parameter
Parameter for the bmp2tbmp program in the Windows ImageMagick
image parser.
bpp See option --bpp.
cache_dir_name
Specify the subdirectory of PluckerDir to use for cache storage.
The default is "cache".
category
See option --category.
color_paragraphs
Boolean; if set, will insert a specific foreground color at
beginning of every paragraph. Shouldn’t be necessary, and
defaults to off.
compression
See option --compression.
convert_program
If using the deprecated imagemagick image parser, the name of
the convert program. Defaults to convert (convert.exe for
Windows).
convert_program_parameter
Parameter for the Windows ImageMagick image parser’s use of
convert.
copyprevention_bit
See option --beamable.
db_file
Deprecated alternative to doc_file. May disappear in any
release.
db_name
Deprecated alternative to doc_name. May disappear in any
release.
default_charset
See option --charset.
depth_first
See option --depth-first.
djpeg_program
Name of the djpeg program. Defaults to djpeg. Used by the
netpbm2 image parser.
doc_file
See option --doc-file.
doc_name
See option --doc-name.
exclusion_lists
See option --exclusion-list. If multiple files are specified
here, they should be separated by the appropriate separator
character for your operating system (a colon on Unix platforms,
a semicolon on Windows platforms).
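The platform-dependent separator corresponds to what Python exposes as os.pathsep. The sketch below shows how such a value could be split into file names; the function name is hypothetical and this is not necessarily how plucker-build does it.

```python
import os

def split_exclusion_lists(value, separator=os.pathsep):
    """Split an exclusion_lists value into file names, dropping empty entries."""
    return [name for name in value.split(separator) if name]

print(split_exclusion_lists("/etc/plucker/excl.txt:/home/user/excl.txt", ":"))
```

The semicolon on Windows avoids a clash with the colon in drive letters such as C:.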
filename_extension
Extension to use for the filename. Defaults to pdb. Another
possibility is plkr.
giftopnm_program
Name of program used to convert GIF image files to PNM image
files. Used by the netpbm and netpbm2 image parsers. Defaults
to giftopnm.
guess_tbmp_size
Boolean, defaults to on. Used by the Windows image parser.
home_maxdepth
See option --maxdepth.
home_staybelow
See option --staybelow.
home_stayondomain
See option --stayondomain.
home_stayonhost
See option --stayonhost.
home_url
See option --home-url.
home_url_pattern
See option --url-pattern.
http_proxy
String giving any HTTP proxy server to use. Sets the
environment variable HTTP_PROXY to this value.
http_proxy_pass
String giving a password for any HTTP proxy. Sets the
environment variable HTTP_PROXY_PASS to this value.
http_proxy_user
String giving a username for any HTTP proxy. Sets the
environment variable HTTP_PROXY_USER to this value.
image_compression_limit
Integer giving the minimum number of image bytes to compress.
Defaults to 0. Images smaller than this will not be compressed.
image_parser
String specifying which image parser to use. If not specified,
a working default will be used. It’s suggested that you not
specify this configuration parameter unless you know what you
are doing. Acceptable values are netpbm2, pil2, imagemagick2,
netpbm (deprecated), pil (deprecated), imagemagick (deprecated),
windowspil, windows (deprecated). This value is ignored in the
Java version of plucker-build.
imagemagick_convert_command
Identifies the ImageMagick convert program in the imagemagick2
image parser. Defaults to convert.
indent_paragraphs
Boolean which when set will cause paragraphs to have leading
indentation, but no extra leading space. Defaults to off.
launchable_bit
See option --launchable.
max_tbmp_size
Integer, maximum size for an image in the windows image parser.
maxheight
See option --maxheight.
maxwidth
See option --maxwidth.
no_dithering_in_java_image_quantization
Boolean, used in the Java plucker-build image parser to turn off
dithering when an image is being quantized to the fixed set of
colors used in Palm grayscale or eight-bit colormaps. Defaults
to false.
no_urlinfo
See option --no-urlinfo.
owner_id_build
See option --owner-id.
palm1bit_graymap_file
String, used by the netpbm2 and netpbm image parsers to get the
location of the Palm colormap file.
palm2bit_graymap_file
String, used by the netpbm2 and netpbm image parsers to get the
location of the Palm colormap file.
palm4bit_graymap_file
String, used by the netpbm2 and netpbm image parsers to get the
location of the Palm colormap file.
palm8bit_stdcolormap_file
String, used by the netpbm2 and netpbm image parsers to get the
location of the Palm colormap file.
palmtopnm_program
String, used by the netpbm2 image parser, giving the location of
the palmtopnm program. Defaults to palmtopnm.
pgmtopbm_program
String, used by the netpbm2 image parser, giving the location of
the pgmtopbm program. Defaults to pgmtopbm.
pluckerdir
See option --pluckerdir.
pngtopnm_program
String, used by the netpbm2 image parser, giving the location of
the pngtopnm program. Defaults to pngtopnm.
pnmcut_program
String, used by the netpbm2 image parser, giving the location of
the pnmcut program. Defaults to pnmcut.
pnmdepth_program
String, used by the netpbm2 image parser, giving the location of
the pnmdepth program. Defaults to pnmdepth.
pnmfile_program
String, used by the netpbm2 image parser, giving the location of
the pnmfile program. Defaults to pnmfile.
pnmscale_program
String, used by the netpbm2 image parser, giving the location of
the pnmscale program. Defaults to pnmscale.
ppmquant_program
String, used by the netpbm2 image parser, giving the location of
the pnmquant program. Defaults to pnmquant.
ppmtoTbmp_program
String, used by various image parsers, giving the location of
either the ppmtoTbmp program (in various deprecated image
parsers), or in netpbm2, the pnmtopalm program. In netpbm2,
defaults to pnmtopalm.
ppmtopgm_program
String, used by the netpbm2 image parser, giving the location of
the ppmtopgm program. Defaults to ppmtopgm.
referrer
See option --referrer.
retrieval_timeout
Integer, used to attempt to set a timeout in seconds on all
retrievals. Will not affect timeouts on Java version of
plucker-build.
small_icon
Filename of file containing a Palm icon to use as the small icon
for the document, if the launchable bit is set. Defaults to a
built-in icon.
status_file
See option --status-file.
status_line_length
Integer, specifying, in characters, the length of status lines
output by the distiller. Defaults to 60. If a line is too
long, some of the characters in the center are elided.
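Eliding the center of an over-long line can be done as in the Python sketch below. The function name, the "..." marker, and the way the remaining width is divided between the two ends are assumptions; the distiller's actual output may differ.

```python
def elide_center(text, limit=60, marker="..."):
    """Shorten text to at most limit characters by cutting out the middle.

    Assumes limit is larger than len(marker).
    """
    if len(text) <= limit:
        return text
    keep = limit - len(marker)   # characters of the original that survive
    head = (keep + 1) // 2       # the front gets the extra one if keep is odd
    tail = keep // 2
    return text[:head] + marker + text[len(text) - tail:]

print(elide_center("abcdefghij", 7))
```

Keeping both ends preserves the host name at the start of a URL and the file name at its end, which are usually the most informative parts.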
tbmp_compression
Boolean, used by the windows image parser to indicate whether or
not to use Palm compression on images. Defaults to true.
tbmp_compression_type
Apparently also boolean, used by the windows image parser to
indicate whether or not to use Palm compression on images.
Defaults to true. The difference between this parameter and
tbmp_compression is not known.
title_md
See option --title.
try_reduce_bpp
Boolean, controls whether the image parser will attempt to scale
a large picture to fit by reducing the number of bits-per-pixel
of the image. Only valid for netpbm2, imagemagick2, pil2, java,
and windows image parsers. Defaults to off. try_reduce_bpp has
precedence over try_reduce_dimension or auto_scale_images.
try_reduce_dimension
Boolean, controls whether the image parser will attempt to scale
a large picture to fit by reducing the size of the image. Only
valid for netpbm2, imagemagick2, pil2, java, and windows image parsers.
use_cache
See option --update-cache. Misleadingly named.
user_agent
See option --user-agent.
verbosity
See option --verbosity.
zlib_compression
Specifies that zlib compression should be used. Deprecated in
favor of compression.
SEE ALSO
The Plucker User’s Guide, at http://plkr.org/docs/.
BUGS
Report bugs using the Debian BTS and the reportbug tool, or directly
upstream to http://bugs.plkr.org/ or <plucker-bugs@rubberchicken.org>.
AUTHORS
Holger Duerer, <holly@starship.python.net>, and Bill Janssen,
<bill@janssen.org>
Plucker 1.2 - http://plkr.org/