NAME
datapacker - Tool to pack files into the minimum number of bins
SYNOPSIS
datapacker [ -0 ] [ -a ACTION ] [ -b FORMAT ] [ -d ] [ -p ] [ -S SIZE ]
-s SIZE FILE ...
datapacker -h | --help
DESCRIPTION
datapacker is a tool to group files by size. It is designed to group
files such that they fill fixed-size containers (called "bins") using
the minimum number of containers. This is useful, for instance, if you
want to archive a number of files to CD or DVD, and want to organize
them such that you use the minimum possible number of CDs or DVDs.
In many cases, datapacker executes almost instantaneously. Of
particular note, the hardlink action (see OPTIONS below) can be used to
effectively copy data into bins without having to actually copy the
data at all.
datapacker is a tool in the traditional Unix style; it can be used in
pipes and call other tools.
OPTIONS
Here are the command-line options you may set for datapacker. Please
note that -s and at least one file (see FILE SPECIFICATION below) is
mandatory.
-0
--null When reading a list of files from standard input (see FILE
SPECIFICATION below), expect the input to be separated by NULL
(ASCII 0) characters instead of one per line. Especially useful
with find -print0.
-a ACTION
--action=ACTION
Defines what action to take with the matches. Please note that,
with any action, the output will be sorted by bin, with bin 1
first. Possible actions include:
print Print one human-readable line per file. Each line
contains the bin number (in the format given by -b), an
ASCII tab character, then the filename.
printfull
Print one semi-human-readable line per bin. Each line
contains the bin number, then a list of filenames to
place in that bin, with an ASCII tab character after the
bin number and between each filename.
print0 For each file, output the bin number (according to the
format given by -b), an ASCII NULL character, the
filename, and another ASCII NULL character. Ideal for
use with xargs -0 -L 2.
exec:COMMAND
For each file, execute the specified COMMAND via the
shell. The program COMMAND will be passed information on
its command line as indicated below.
It is an error if the generated command line for a given
bin is too large for the system.
A nonzero exit code from any COMMAND will cause
datapacker to terminate. If COMMAND contains quotes,
don’t forget to quote the entire command, as in:
datapacker ’--action=exec:echo "Bin: $1"; shift; ls "$@"’
The arguments to the given command will be:
· argv[0] ($0 in shell) will be the name of the shell
used to invoke the command -- $SHELL or /bin/sh.
· argv[1] ($1 in shell) will be the bin number, formatted
according to -b.
· argv[2] and on ($2 and on in shell) will be the files
to place in that bin
hardlink
For each file, create a hardlink at bin/filename pointing
to the original input filename. Creates the directory
bin as necessary. Alternative locations and formats for
bin can be specified with -b. All bin directories and
all input must reside on the same filesystem.
After you are done processing the results of the bin, you
may safely delete the bins without deleting original
data. Alternatively, you could leave the bins and delete
the original data. Either approach will be workable.
It is an error to attempt to make a hard link across
filesystems, or to have two input files with the same
filename in different paths. datapacker will exit on
either of these situations.
See also --deep-links.
symlink
Like hardlink, but create symlinks instead. Symlinks can
span filesystems, but you will lose information if you
remove the original (pre-bin) data. Like hardlink, it is
an error to have a single filename occur in multiple
input directories with this option.
See also --deep-links.
-b FORMAT
--binfmt=FORMAT
Defines the output format for the bin name. This format is
given as a %d input to a function that interprets it as
printf(3) would. This can be useful both to define the name and
the location of your bins. When running datapacker with certain
arguments, the bin format can be taken to be a directory in
which files in that bin are linked. The default is %03d, which
outputs integers with leading zeros to make all bin names at
least three characters wide.
Other useful variants could include destdir/%d to put the string
"destdir/" in front of the bin number, which is rendered without
leading zeros.
-d
--debug
Enable debug mode. This is here for future expansion and does
not currently have any effect.
-D
--deep-links
When used with the symlink or hardlink action, instead of making
all links in a single flat directory under the bin, mimic the
source directory structure under the bin. Makes most sense when
used with -p, but could also be useful without it if there are
files with the same name in different source directories.
--help Display brief usage information and exit.
-p
--preserve-order
Normally, datapacker uses an efficient algorithm that tries to
rearrange files such that the number of bins required is
minimized. Sometimes you may instead wish to preserve the
ordering of files at the expense of potentially using more bins.
In these cases, you would want to use this option.
As an example of such a situation: perhaps you have taken one
photo a day for several years. You would like to archive these
photos to CD, but you want them to be stored in chronological
order. You have named the files such that the names indicate
order, so you can pass the file list to datapacker using -p to
preserve the ordering in your bins. Thus, bin 1 will contain
the oldest files, bin 2 the second-oldest, and so on. If -p
wasn’t used, you might use fewer CDs, but the photos would be
spread out across all CDs without preserving your chronological
order.
-s SIZE
--size=SIZE
Gives the size of each bin in bytes. Suffixes such as "k", "m",
"g", etc. may be used to indicate kilobytes, megabytes,
gigabytes, and so forth. Numbers such as 1.5g are valid, and if
needed, will be rounded to the nearest possible integer value.
The size of the first bin may be overridden with -S.
Here are the sizes of some commonly-used bins. For each item, I
have provided you with both the underlying recording capacity of
the disc and a suggested value for -s. The suggested value for
-s is lower than the underlying capacity because there is
overhead imposed by the filesystem stored on the disc. You will
perhaps find that the suggested value for -s is lower than
optimal for discs that contain few large files, and higher than
desired for discs that contain vast amounts of small files.
· CD-ROM, 74-minute (standard): 650m / 600m
· CD-ROM, 80-minute: 703m / 650m
· CD-ROM, 90-minute: 790m / 740m
· CD-ROM, 99-minute: 870m / 820m
· DVD+-R: 4.377g / 4g
· DVD+R, dual layer: 8.5g / 8g
-S
--size-first
The size of the first bin. If not given, defaults to the value
given with -s. This may be useful if you will be using a
mechanism outside datapacker to add additional information to
the first bin: perhaps an index of which bin has which file, the
information necessary to make a CD bootable, etc. You may use
the same suffixes as with -s with this option.
--sort Sorts the list of files to process before acting upon them.
When combined with -p, causes the output to be sorted. This
option has no effect save increasing CPU usage when not combined
with -p.
FILE SPECIFICATION
After the options, you must supply one or more files to consider for
packing into bins. Alternatively, instead of listing files on the
command line, you may list a single hyphen (-), which tells datapacker
to read the list of files from standard input (stdin).
datapacker never recurses into subdirectories. If you want a recursive
search -- finding all files in a given directory and all its
subdirectories -- see the second example in the EXAMPLES section below.
datapacker is designed to integrate with find(1) in this situation to
let you take advantage of find’s built-in powerful recursion and
filtering features.
When reading files from standard input, it is assumed that the list
contains one distinct filename per line. Seasoned POSIX veterans will
recognize the inherent limitations in this format. For that reason,
when given -0 in conjunction with the single file -, datapacker will
instead expect, on standard input, a list of files, each one terminated
by an ASCII NULL character. Such a list can be easily generated with
find(1) using its -print0 option.
EXAMPLES
· Put all JPEG images in ~/Pictures into bins (using hardlinks) under
the pre-existing directory ~/bins, no more than 600MB per bin:
datapacker -b ~/bins/%03d -s 600m -a hardlink ~/Pictures/*.jpg
· Put all files in ~/Pictures or any subdirectory thereof into 600MB
bins under ~/bins, using hardlinking. This is a simple example to
follow if you simply want a recursive search of all files.
find ~/Pictures -type f -print0 | \
datapacker -0 -b ~/bins/%03d -s 600m -a hardlink -
· Find all JPEG images in ~/Pictures or any subdirectory thereof, put
them into bins (using hardlinks) under the pre-existing directory
~/bins, no more than 600MB per bin:
find ~/Pictures -name "*.jpg" -print0 | \
datapacker -0 -b ~/bins/%03d -s 600m -a hardlink -
· Find all JPEG images as above, put them in 4GB bins, but instead of
putting them anywhere, calculate the size of each bin and display it.
find ~/Pictures -name "*.jpg" -print0 | \
datapacker -0 -b ~/bins/%03d -s 4g \
’--action=exec:echo -n "$1: "; shift; du -ch "$@" | grep total’ \
-
This will display output like so:
/home/jgoerzen/bins/001: 4.0G total
/home/jgoerzen/bins/002: 4.0G total
/home/jgoerzen/bins/003: 4.0G total
/home/jgoerzen/bins/004: 992M total
Note: the grep pattern in this example is simple, but will cause
unexpected results if any matching file contains the word "total".
· Find all JPEG images as above, and generate 600MB ISO images of them
in ~/bins. This will generate the ISO images directly without ever
hardlinking files into ~/bins.
find ~/Pictures -name "*.jpg" -print0 | \
datapacker -0 -b ~/bins/%03d.iso -s 4g \
’--action=exec:BIN="$1"; shift; mkisofs -r -J -o "$BIN" "$@"’ \
-
You could, if you so desired, pipe this result directly into a DVD-
burning application. Or, you could use growisofs to burn a DVD+R in
a single step.
ERRORS
It is an error if any specified file exceeds the value given with -s or
-S.
It is also an error if any specified files disappear while datapacker
is running.
BUGS
Reports of bugs should be reported online at the datapacker homepage.
Debian users are encouraged to instead use the Debian bug-tracking
system.
COPYRIGHT
datapacker, and this manual, are Copyright (C) 2008 John Goerzen.
All code, documentation, and build scripts are under the following
license unless otherwise noted:
This program is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation, either version 3 of the License, or (at your
option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program. If not, see
<URL:http://www.gnu.org/licenses/>.
The GNU General Public License is available in the file COPYING in the
source distribution. Debian GNU/Linux users may find this in
/usr/share/common-licenses/GPL-3.
If the GPL is unacceptable for your uses, please e-mail me; alternative
terms can be negotiated for your project.
AUTHOR
datapacker, its libraries, documentation, and all included files,
except where noted, was written by John Goerzen <jgoerzen@complete.org>
and copyright is held as stated in the COPYRIGHT section.
datapacker may be downloaded, and information found, from its homepage
<URL:http://software.complete.org/datapacker>.
SEE ALSO
mkisofs(1), genisoimage(1)