NAME
unpaper - post-processing tool for scanned pages
SYNOPSIS
unpaper [options] input-file(s) output-file(s)
DESCRIPTION
unpaper is a post-processing tool for scanned sheets of paper,
especially for book pages that have been scanned from previously
created photocopies.
The main purpose is to make scanned book pages more readable on
screen after conversion to PDF. Additionally, unpaper might be useful
to enhance the quality of scanned pages before performing optical
character recognition (OCR).
OPTIONS
Filenames may contain a formatting placeholder starting with % to
insert a page counter for multi-page processing. E.g.: scan%03d.pbm to
process files scan001.pbm, scan002.pbm, scan003.pbm etc.
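As an illustration of the placeholder, the following minimal Python sketch expands such a pattern. It assumes the counter behaves like a C-style %d conversion, as the scan%03d.pbm example suggests; the helper name expand_pattern is hypothetical and not part of unpaper.

```python
def expand_pattern(pattern: str, first: int, count: int) -> list[str]:
    """Expand a printf-style page-counter pattern into filenames.

    Assumption: the placeholder is a C-style integer conversion such as
    %d or %03d, which Python's % operator formats the same way.
    """
    return [pattern % n for n in range(first, first + count)]


if __name__ == "__main__":
    for name in expand_pattern("scan%03d.pbm", 1, 3):
        print(name)
```

With the pattern from the text and a counter starting at 1, this yields the three filenames listed above.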
-l, --layout single|double|none
Set default layout options for a sheet:
single: One page per sheet.
double: Two pages per sheet, landscape orientation (one page on
the left half, one page on the right half).
none: No auto-layout; --mask-scan-point may be specified
individually.
Using single or double automatically sets the corresponding
--mask-scan-point values. The default is single.
-start, --start-sheet <sheet>
Number of first sheet to process in multi-sheet mode. (default:
1)
-end, --end-sheet <sheet>
Number of last sheet to process in multi-sheet mode. -1
indicates processing until no further input file with the
corresponding page number is available. (default: -1)
-#, --sheet <sheet>{,<sheet>[-<sheet>]}
Optionally specifies which sheets to process in the range
between start-sheet and end-sheet.
-x, --exclude <sheet>{,<sheet>[-<sheet>]}
Excludes sheets from processing in the range between start-sheet
and end-sheet.
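The sheet-list syntax <sheet>{,<sheet>[-<sheet>]} used by --sheet and --exclude can be read as a comma-separated list of single numbers and ranges. The sketch below shows one way to interpret it. It assumes that ranges are inclusive and that an explicit end sheet is given (not -1); parse_sheets and sheets_to_process are hypothetical helper names, not unpaper functions.

```python
from typing import Optional


def parse_sheets(spec: str) -> set[int]:
    """Parse a list such as "1,3,5-7" into a set of sheet numbers.

    Assumption: a range "a-b" includes both a and b.
    """
    sheets: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            sheets.update(range(int(low), int(high) + 1))
        else:
            sheets.add(int(part))
    return sheets


def sheets_to_process(start: int, end: int,
                      include: Optional[str] = None,
                      exclude: Optional[str] = None) -> list[int]:
    """Combine the start/end range with optional include and exclude lists."""
    selected = set(range(start, end + 1))
    if include is not None:
        selected &= parse_sheets(include)   # --sheet narrows the range
    if exclude is not None:
        selected -= parse_sheets(exclude)   # --exclude removes sheets
    return sorted(selected)


if __name__ == "__main__":
    print(sheets_to_process(1, 10, include="1,3,5-7", exclude="6"))
```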
--pre-rotate -90|90
Rotates the whole image clockwise (90) or counter-clockwise
(-90) before any other processing.
--post-rotate -90|90
Rotates the whole image clockwise (90) or counter-clockwise
(-90) after any other processing.
-M, --pre-mirror [v[ertical]][,][h[orizontal]]
Mirror the image, after possible pre-rotation. Either v (for
vertical mirroring), h (for horizontal mirroring) or v,h (for
both) can be specified.
--post-mirror [v[ertical]][,][h[orizontal]]
Mirror the image, after any other processing except possible
post-rotation.
--pre-shift <h>,<v>
Shift the image before further processing. Values for h
(horizontal shift) and v (vertical shift) can either be positive
or negative.
--post-shift <h>,<v>
Shift the image after other processing. Values for h (horizontal
shift) and v (vertical shift) can either be positive or
negative.
--pre-wipe <left>,<top>,<right>,<bottom>
Manually wipe out an area before further processing. Any pixel
in a wiped area will be set to white. Multiple areas to be wiped
may be specified by multiple occurrences of this option.
--post-wipe <left>,<top>,<right>,<bottom>
Manually wipe out an area after processing. Any pixel in a wiped
area will be set to white. Multiple areas to be wiped may be
specified by multiple occurrences of this option.
--pre-border <left>,<top>,<right>,<bottom>
Clear the border-area of the sheet before further processing.
Any pixel in the border area will be set to white.
--post-border <left>,<top>,<right>,<bottom>
Clear the border-area after processing. Any pixel in the border
area will be set to white.
--pre-mask <x1>,<y1>,<x2>,<y2>
Specify masks to apply before any other processing. Any pixel
outside a mask will be set to white, unless another mask
includes this pixel. Only pixels inside a mask will remain.
Multiple masks may be specified. No deskewing will be applied to
the masks specified by --pre-mask.
-s, --size <width>,<height>|<size-name>
Change the sheet size before other processing is applied.
Content on the sheet gets zoomed to fit to the appropriate size,
but the aspect ratio is preserved. Instead, if the sheet’s
aspect ratio changes, the zoomed content gets centered on the
sheet. Size-name can be a standard name such as a4, letter, etc.
Possible size names are:
a5
a4
a3
letter
legal
All size names can also be applied in rotated landscape
orientation, use a4-landscape, letter-landscape etc.
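The fit-and-center behaviour described for --size can be sketched as follows. This is an illustration only, not unpaper's implementation: the rounding and the integer centering offsets are assumptions, and fit_and_center is a hypothetical name.

```python
def fit_and_center(content_w: int, content_h: int,
                   sheet_w: int, sheet_h: int) -> tuple[int, int, int, int]:
    """Scale content to fit the sheet while keeping its aspect ratio.

    Returns (new_width, new_height, x_offset, y_offset), where the
    offsets place the scaled content in the middle of the sheet.
    """
    # The smaller of the two scale factors guarantees the content fits
    # in both directions without distortion.
    scale = min(sheet_w / content_w, sheet_h / content_h)
    new_w = round(content_w * scale)
    new_h = round(content_h * scale)
    # Any leftover space is split evenly on both sides.
    x_offset = (sheet_w - new_w) // 2
    y_offset = (sheet_h - new_h) // 2
    return new_w, new_h, x_offset, y_offset


if __name__ == "__main__":
    print(fit_and_center(1000, 2000, 2000, 2000))
```

In contrast, the --stretch options described below would scale width and height independently, so no centering offsets arise.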
--post-size <width>,<height>|<name>
Change the sheet size preserving the content’s aspect ratio
after other processing steps are applied.
--stretch <width>,<height>|<name>
Change the sheet size before other processing is applied.
Content on the sheet gets stretched to the specified size,
possibly changing the aspect ratio.
--post-stretch <width>,<height>|<name>
Change the sheet size after other processing is applied. Content
on the sheet gets stretched to the specified size, possibly
changing the aspect ratio.
-z, --zoom <factor>
Change the sheet size according to the given factor before other
processing is done.
--post-zoom <factor>
Change the sheet size according to the given factor after
processing is done.
-bn, --blackfilter-scan-direction [v[ertical]][,][h[orizontal]]
Directions in which to search for solidly black areas. Either v
(for vertical scanning), h (for horizontal scanning) or v,h (for
both) can be specified. (default: v,h)
-bs, --blackfilter-scan-size <size>|<h-size>,<v-size>
Width of virtual bar used for black area detection. Two values may be
specified to individually set horizontal and vertical size.
(default: 20,20)
-bd, --blackfilter-scan-depth <depth>|<h-depth,v-depth>
Size of virtual bar used for black area detection. (default:
500,500)
-bp, --blackfilter-scan-step <step>|<h-step,v-step>
Steps to move virtual bar for black area detection. (default:
5,5)
-bt, --blackfilter-scan-threshold <t>
Ratio of dark pixels above which a black area gets detected.
(default: 0.95)
-bx, --blackfilter-scan-exclude <left>,<top>,<right>,<bottom>
Area on which the blackfilter should not operate. This can be
useful to prevent the blackfilter from working on inner page
content. May be specified multiple times to set more than one
area.
-bi, --blackfilter-intensity <i>
Intensity with which to delete black areas. Larger values will
leave less noise-pixels around former black areas, but may
delete page content. (default: 20)
-ni, --noisefilter-intensity <n>
Intensity with which to delete individual pixels or tiny
clusters of pixels. Any cluster which only contains n dark
pixels together will be deleted. (default: 4)
-ls, --blurfilter-size <size>|<h-size>,<v-size>
Size of blurfilter area to search for ’lonely’ clusters of
pixels. (default: 100,100)
-lp, --blurfilter-step <step>|<h-step>,<v-step>
Size of ’blurring’ steps in each direction. (default: 50,50)
-li, --blurfilter-intensity <ratio>
Relative intensity with which to delete tiny clusters of pixels.
Any blurred area which contains at most the ratio of dark pixels
will be cleared. (default: 0.01)
-gs, --grayfilter-size <size>|<h-size>,<v-size>
Size of grayfilter mask to search for ’gray-only’ areas of
pixels. (default: 50,50)
-gp, --grayfilter-step <step>|<h-step>,<v-step>
Size of steps moving the grayfilter mask in each direction.
(default: 20,20)
-gt, --grayfilter-threshold <ratio>
Relative intensity of grayness which is accepted before clearing
the grayfilter mask in cases where no black pixel is found in
the mask. (default: 0.5)
-p, --mask-scan-point <x>,<y>
Manually set starting point for mask-detection. Multiple
--mask-scan-point options may be specified to detect multiple
masks.
-m, --mask <x1>,<y1>,<x2>,<y2>
Manually add a mask, in addition to masks automatically detected
around the --mask-scan-point coordinates (unless --no-mask-scan
is specified). Any pixel outside a mask will be set to white,
unless another mask covers this pixel.
-mn, --mask-scan-direction [v[ertical]][,][h[orizontal]]
Directions in which to search for mask borders, starting from
--mask-scan-point coordinates. Either v (for vertical scanning),
h (for horizontal scanning) or v,h (for both) can be specified.
(default: h (v may cut text-paragraphs on single-page sheets))
-ms, --mask-scan-size <size>|<h,v>
Width of the virtual bar used for mask detection. Two values may
be specified to individually set horizontal and vertical size.
(default: 50,50)
-md, --mask-scan-depth <dep>|<h,v>
Height of the virtual bar used for mask detection. (default:
-1,-1, using the total width or height of the sheet)
-mp, --mask-scan-step <step>|<h,v>
Steps to move the virtual bar for mask detection. (default: 5,5)
-mt, --mask-scan-threshold <t>|<h,v>
Ratio of dark pixels below which an edge gets detected, relative
to max. blackness when counting from the start coordinate
heading towards one edge. (default: 0.1)
-mm, --mask-scan-minimum <w>,<h>
Minimum allowed size of an auto-detected mask. Masks detected
below this size will be ignored and set to the size specified by
mask-scan-maximum. (default: 100,100)
-mM, --mask-scan-maximum <w>,<h>
Maximum allowed size of an auto-detected mask. Masks detected
above this size will be shrunk to the maximum value, each
direction individually. (default: sheet size, or page size
derived from --layout option)
-mc, --mask-color <color>
Color value with which to wipe out pixels not covered by any
mask. May be useful for testing in order to visualize the effect
of masking. (Note that an RGB-value is expected: R*65536 + G*256
+ B)
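The RGB value expected by --mask-color follows directly from the formula R*65536 + G*256 + B. A small sketch of the arithmetic, assuming each channel is an integer from 0 to 255 (the helper name rgb_to_color_value is hypothetical):

```python
def rgb_to_color_value(r: int, g: int, b: int) -> int:
    """Pack three 8-bit channels into R*65536 + G*256 + B."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must be in the range 0..255")
    return r * 65536 + g * 256 + b


if __name__ == "__main__":
    print(rgb_to_color_value(255, 0, 0))      # pure red
    print(rgb_to_color_value(255, 255, 255))  # white
```

Pure red, for example, evaluates to 16711680, which could then be passed as the <color> argument.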
-dn, --deskew-scan-direction <left>,<top>,<right>,<bottom>
Edges from which to scan for rotation. Each edge of a mask can
be used to detect the mask’s rotation. If multiple edges are
specified, the average value will be used, unless the
statistical deviation exceeds --deskew-scan-deviation. Use left
for scanning from the left edge, top for scanning from the top
edge, right for scanning from the right edge, bottom for
scanning from the bottom. Multiple directions can be separated
by commas. (default: left,right)
-ds, --deskew-scan-size <pixels>
Size of virtual line for rotation detection. (default: 1500)
-dd, --deskew-scan-depth <ratio>
Amount of dark pixels to accumulate until scanning is stopped,
relative to scan-bar size. (default: 0.5)
-dr, --deskew-scan-range <degrees>
Range in which to search for rotation, from -degrees to +degrees
rotation. (default: 5.0)
-dp, --deskew-scan-step <degrees>
Steps between single rotation-angle detections. Lower numbers
lead to better results but slow down processing. (default: 0.1)
-dv, --deskew-scan-deviation <dev>
Maximum statistical deviation allowed among the results from
detected edges. No rotation if exceeded. (default: 1.0)
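The interplay between the per-edge results and --deskew-scan-deviation can be sketched as below. The text above does not define the exact deviation measure, so the population standard deviation used here is an assumption; combine_edge_rotations is a hypothetical name.

```python
from statistics import mean, pstdev


def combine_edge_rotations(angles: list[float],
                           max_deviation: float = 1.0) -> float:
    """Average the rotation angles detected from several edges.

    Returns 0.0 (no rotation) when the results disagree by more than
    max_deviation. Assumption: disagreement is measured as the
    population standard deviation of the detected angles.
    """
    if not angles:
        return 0.0
    if len(angles) > 1 and pstdev(angles) > max_deviation:
        return 0.0
    return mean(angles)


if __name__ == "__main__":
    print(combine_edge_rotations([1.0, 1.2]))   # edges agree: average used
    print(combine_edge_rotations([-3.0, 3.0]))  # edges disagree: no rotation
```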
-W, --wipe <left>,<top>,<right>,<bottom>
Manually wipe out an area. Any pixel in a wiped area will be set
to white. Multiple --wipe areas may be specified. This is
applied after deskewing and before automatic border-scan.
-mw, --middle-wipe <size>|<left>,<right>
If --layout is set to double, this may specify the size of a
middle area to wipe out between the two pages on the sheet. This
may be useful if the blackfilter fails to remove some black
areas (e.g. resulting from photo-copying in the middle between
two pages).
-B, --border <left>,<top>,<right>,<bottom>
Manually add a border. Any pixel in the border area will be set
to white. This is applied after deskewing and before automatic
border-scan.
-Bn, --border-scan-direction [v[ertical]][,][h[orizontal]]
Directions in which to search for outer border. Either v (for
vertical scanning), h (for horizontal scanning) or v,h (for
both) can be specified. (default: v)
-Bs, --border-scan-size <size>|<h,v>
Width of virtual bar used for border detection. Two values may
be specified to individually set horizontal and vertical size.
(default: 5,5)
-Bp, --border-scan-step <step>|<h,v>
Steps to move virtual bar for border detection. (default: 5,5)
-Bt, --border-scan-threshold <t>
Absolute number of dark pixels covered by the border-scan mask
above which a border is detected. (default: 5)
-Ba, --border-align <left>,<top>,<right>,<bottom>
Direction where to shift the detected border-area. Use
--border-margin to specify horizontal and vertical distances to
be kept from the sheet-edge. (default: none)
-Bm, --border-margin <vertical>,<horizontal>
Distance to keep from the sheet edge when aligning a border
area. May use measurement suffixes such as cm, in.
-w, --white-threshold <threshold>
Brightness ratio above which a pixel is considered white.
(default: 0.9)
-b, --black-threshold <threshold>
Brightness ratio below which a pixel is considered black
(non-gray). This is used by the gray-filter. This value is also
used when converting a grayscale image to black-and-white mode.
(default: 0.33)
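Taken together, the two thresholds split pixels into three classes. The sketch below illustrates this with the default values. It assumes brightness is expressed as a ratio from 0.0 (darkest) to 1.0 (brightest); classify_pixel is a hypothetical name.

```python
def classify_pixel(brightness: float,
                   white_threshold: float = 0.9,
                   black_threshold: float = 0.33) -> str:
    """Classify a pixel by its brightness ratio.

    Assumption: brightness is normalized to the range 0.0..1.0.
    """
    if brightness > white_threshold:
        return "white"
    if brightness < black_threshold:
        return "black"
    return "gray"


if __name__ == "__main__":
    for value in (0.95, 0.5, 0.2):
        print(value, classify_pixel(value))
```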
-ip, --input-pages 1|2
If 2 is specified, read two input images instead of one and
internally combine them to a doubled-layout sheet before further
processing. Before internally combining, --pre-rotate is
optionally applied individually to both input images as the very
first processing step.
-op, --output-pages 1|2
If 2 is specified, write two output images instead of one, as a
result of splitting a doubled-layout sheet after processing.
After splitting the sheet, --post-rotate is optionally applied
individually to both output images as the very last processing
step.
-S, --sheet-size <width>,<height>|<size-name>
Force a fixed sheet size. Usually, the sheet size is determined by
the input image size (if input-pages=1), or by the double size
of the first page in a two-page input set (if input-pages=2). If
the input image is smaller than the size specified here, it will
appear centered and surrounded with a white border on the sheet.
If the input image is bigger, it will be centered and the edges
will be cropped. This option may also be helpful to get regular
sized output images if the input image sizes differ. Standard
size-names like a4-landscape, letter, etc. may be used (see
--size). (default: as in input file)
--sheet-background black|white
Sets a color with which the sheet is filled before any image is
loaded and placed onto it. This can be useful when the sheet
size and the image size differ.
--no-blackfilter <sheet>{,<sheet>[-<sheet>]}
Disables black area scan. Individual sheet indices can be
specified.
--no-noisefilter <sheet>{,<sheet>[-<sheet>]}
Disables the noise filter. Individual sheet indices can be
specified.
--no-blurfilter <sheet>{,<sheet>[-<sheet>]}
Disables the blur filter. Individual sheet indices can be
specified.
--no-grayfilter <sheet>{,<sheet>[-<sheet>]}
Disables the gray filter. Individual sheet indices can be
specified.
--no-mask-scan <sheet>{,<sheet>[-<sheet>]}
Disables mask-detection. Masks explicitly set by --mask will
still have effect. Individual sheet indices can be specified.
--no-mask-center <sheet>{,<sheet>[-<sheet>]}
Disables auto-centering of each mask. Auto-centering is
performed by default if the --layout option has been set.
Individual sheet indices can be specified.
--no-deskew <sheet>{,<sheet>[-<sheet>]}
Disables deskewing. Individual sheet indices can be specified.
--no-wipe <sheet>{,<sheet>[-<sheet>]}
Disables explicit wipe-areas. This means the effect of parameter
--wipe can be disabled individually per sheet.
--no-border <sheet>{,<sheet>[-<sheet>]}
Disables explicitly set borders. This means the effect of
parameter --border can be disabled individually per sheet.
--no-border-scan <sheet>{,<sheet>[-<sheet>]}
Disables border-scanning from the edges of the sheet. Individual
sheet indices can be specified.
--no-border-align <sheet>{,<sheet>[-<sheet>]}
Disables aligning of the area detected by border-scanning (see
--border-align). Individual sheet indices can be specified.
-n, --no-processing <sheet>{,<sheet>[-<sheet>]}
Do not perform any processing on a sheet except pre/post
rotating and mirroring, and file-depth conversions on saving.
This option has the same effect as setting all --no-xxx options
together. Individual sheet indices can be specified.
--no-qpixels
Disable qpixel-mode for deskewing (do not internally use a 4x
bigger image when rotating).
--no-multi-pages
Disable multi-page processing even if the input filename
contains a counter.
--dpi <dpi>
Dots per inch used for conversion of measured size values,
e.g. 21cm,27.9cm. Note that this parameter should occur before
specifying any size value with measurement suffix. (default:
300)
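The conversion implied by --dpi is plain unit arithmetic: inches times dpi, with 2.54 cm per inch. The following sketch handles the cm and in suffixes mentioned in this manual and treats a bare number as pixels. How unpaper rounds the result is not stated here, so the sketch returns an unrounded float; to_pixels is a hypothetical name.

```python
import re


def to_pixels(value: str, dpi: int = 300) -> float:
    """Convert a size such as "21cm", "8.5in" or "100" to pixels.

    Assumption: a value without suffix is already a pixel count.
    """
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*(cm|in)?\s*", value)
    if match is None:
        raise ValueError(f"cannot parse size value: {value!r}")
    number = float(match.group(1))
    unit = match.group(2)
    if unit == "cm":
        return number / 2.54 * dpi   # 2.54 cm per inch
    if unit == "in":
        return number * dpi
    return number


if __name__ == "__main__":
    width, height = (to_pixels(v) for v in "21cm,27.9cm".split(","))
    print(width, height)
```

At the default of 300 dpi, 21cm corresponds to roughly 2480 pixels.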
-t, --type pbm|pgm
Output file type. (default: as input)
-d, --depth <bits>
Output pixel depth. (default: as input)
-T, --test-only
Do not write any output. May be useful in combination with
--verbose to get information about the input.
-in, --input-file-sequence <file-patterns>
Sequence of input filename patterns which is repeatedly
traversed while resolving input filenames. Specifying a single
entry is equivalent to the first filename argument after the
options-list.
-out, --output-file-sequence <file-patterns>
Sequence of output filename patterns which is repeatedly
traversed while resolving output filenames. Specifying a single
entry is equivalent to the second filename argument after the
options-list.
-si, --start-input <nr>
Set the first page number to substitute for %d in input
filenames. Every time the input file sequence is repeated, this
number gets increased by 1. (default:
(startsheet-1)*inputpages+1)
-so, --start-output <nr>
Set the first page number to substitute for %d in output
filenames. Every time the output file sequence is repeated, this
number gets increased by 1. (default:
(startsheet-1)*outputpages+1)
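The default start numbers can be worked out from the formulas given for --start-input and --start-output. The sketch below evaluates them and lists the page numbers that would belong to one sheet. The second helper assumes a single filename pattern, so that the counter advances by one for every file; both function names are hypothetical.

```python
def default_start(start_sheet: int, pages_per_sheet: int) -> int:
    """Evaluate (startsheet - 1) * pages + 1 from the option defaults."""
    return (start_sheet - 1) * pages_per_sheet + 1


def page_numbers_for_sheet(sheet: int, pages_per_sheet: int) -> list[int]:
    """Page numbers substituted for %d when processing one sheet.

    Assumption: a single filename pattern is in use, so consecutive
    files receive consecutive numbers.
    """
    first = default_start(sheet, pages_per_sheet)
    return list(range(first, first + pages_per_sheet))


if __name__ == "__main__":
    # Starting at sheet 3 with two input pages per sheet.
    print(default_start(3, 2))
    print(page_numbers_for_sheet(3, 2))
```

Under these assumptions, starting at sheet 3 with --input-pages 2 begins reading at page number 5.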
--insert-blank <nr>{,<nr>[-<nr>]}
Use blank input instead of an input file from the input file
sequence at the specified index-positions. The input file
sequence will be interrupted temporarily and will continue with
the next input file afterwards. This can be useful to insert
blank content into a sequence of input images.
--replace-blank <nr>{,<nr>[-<nr>]}
Like --insert-blank, but the input images at the specified index
positions get replaced with blank content and thus will be
ignored.
--overwrite
Allow overwriting existing files. Otherwise the program
terminates with an error if an output-file to be written already
exists.
-q, --quiet
Quiet mode, no output at all.
-v, --verbose
Verbose output, more info messages.
-vv
Even more verbose output, show parameter settings before
processing.
--time
Output processing time consumed.
-V, --version
Output version and build information.
AUTHOR
unpaper was written by Jens Gulden <unpaper@jensgulden.de>.
This manual page was written by Julien BLACHE <jblache@debian.org>, for
the Debian project (but may be used by others).
December 31, 2007