Man Linux: Main Page and Category List

NAME

       datapacker - Tool to pack files into the minimum number        of bins

SYNOPSIS

       datapacker [ -0 ] [ -a ACTION ] [ -b FORMAT ] [ -d ] [ -p ] [ -S SIZE ]
       -s SIZE FILE ...

       datapacker -h | --help

DESCRIPTION

       datapacker is a tool to group files by size.  It is designed  to  group
       files  such  that they fill fixed-size containers (called "bins") using
       the minimum number of containers.  This is useful, for instance, if you
       want  to  archive  a number of files to CD or DVD, and want to organize
       them such that you use the minimum possible number of CDs or DVDs.

       In  many  cases,  datapacker  executes  almost   instantaneously.    Of
       particular note, the hardlink action (see OPTIONS below) can be used to
       effectively copy data into bins without having  to  actually  copy  the
       data at all.

       datapacker  is  a tool in the traditional Unix style; it can be used in
       pipes and call other tools.

OPTIONS

       Here are the command-line options you may set for  datapacker.   Please
       note  that  -s  and at least one file (see FILE SPECIFICATION below) is
       mandatory.

       -0

       --null When reading a list of  files  from  standard  input  (see  FILE
              SPECIFICATION  below),  expect the input to be separated by NULL
              (ASCII 0) characters instead of one per line.  Especially useful
              with find -print0.

       -a ACTION

       --action=ACTION
              Defines what action to take with the matches.  Please note that,
              with any action, the output will be sorted by bin,  with  bin  1
              first.  Possible actions include:

              print  Print  one  human-readable  line  per  file.   Each  line
                     contains the bin number (in the format given by  -b),  an
                     ASCII tab character, then the filename.

              printfull
                     Print  one  semi-human-readable  line per bin.  Each line
                     contains the bin number, then  a  list  of  filenames  to
                     place  in that bin, with an ASCII tab character after the
                     bin number and between each filename.

              print0 For each file, output the bin number  (according  to  the
                     format  given  by  -b),  an  ASCII  NULL  character,  the
                     filename, and another ASCII NULL  character.   Ideal  for
                     use with xargs -0 -L 2.

              exec:COMMAND
                     For  each  file,  execute  the  specified COMMAND via the
                     shell.  The program COMMAND will be passed information on
                     its command line as indicated below.

                     It  is an error if the generated command line for a given
                     bin is too large for the system.

                     A  nonzero  exit  code  from  any  COMMAND   will   cause
                     datapacker  to  terminate.   If  COMMAND contains quotes,
                     don’t forget to quote the entire command, as in:

                     datapacker ’--action=exec:echo "Bin: $1"; shift; ls "$@"’

                     The arguments to the given command will be:

                     · argv[0] ($0 in shell) will be the  name  of  the  shell
                       used to invoke the command -- $SHELL or /bin/sh.

                     · argv[1] ($1 in shell) will be the bin number, formatted
                       according to -b.

                     · argv[2] and on ($2 and on in shell) will be  the  files
                       to place in that bin

              hardlink
                     For each file, create a hardlink at bin/filename pointing
                     to the original input filename.   Creates  the  directory
                     bin  as necessary.  Alternative locations and formats for
                     bin can be specified with -b.  All  bin  directories  and
                     all input must reside on the same filesystem.

                     After you are done processing the results of the bin, you
                     may safely delete  the  bins  without  deleting  original
                     data.  Alternatively, you could leave the bins and delete
                     the original data.  Either approach will be workable.

                     It is an error to attempt to  make  a  hard  link  across
                     filesystems,  or  to  have  two input files with the same
                     filename in different paths.   datapacker  will  exit  on
                     either of these situations.

                     See also --deep-links.

              symlink
                     Like hardlink, but create symlinks instead.  Symlinks can
                     span filesystems, but you will lose  information  if  you
                     remove the original (pre-bin) data.  Like hardlink, it is
                     an error to have a  single  filename  occur  in  multiple
                     input directories with this option.

                     See also --deep-links.

       -b FORMAT

       --binfmt=FORMAT
              Defines  the  output  format  for  the bin name.  This format is
              given as a  %d  input  to  a  function  that  interprets  it  as
              printf(3) would.  This can be useful both to define the name and
              the location of your bins.  When running datapacker with certain
              arguments,  the  bin  format  can  be taken to be a directory in
              which files in that bin are linked.  The default is %03d,  which
              outputs  integers  with  leading  zeros to make all bin names at
              least three characters wide.

              Other useful variants could include destdir/%d to put the string
              "destdir/" in front of the bin number, which is rendered without
              leading zeros.

       -d

       --debug
              Enable debug mode.  This is here for future expansion  and  does
              not currently have any effect.

       -D

       --deep-links
              When used with the symlink or hardlink action, instead of making
              all links in a single flat directory under the  bin,  mimic  the
              source directory structure under the bin.  Makes most sense when
              used with -p, but could also be useful without it if  there  are
              files with the same name in different source directories.

       --help Display brief usage information and exit.

       -p

       --preserve-order
              Normally,  datapacker  uses an efficient algorithm that tries to
              rearrange files  such  that  the  number  of  bins  required  is
              minimized.   Sometimes  you  may  instead  wish  to preserve the
              ordering of files at the expense of potentially using more bins.
              In these cases, you would want to use this option.

              As  an  example  of such a situation: perhaps you have taken one
              photo a day for several years.  You would like to archive  these
              photos  to  CD,  but you want them to be stored in chronological
              order.  You have named the files such that  the  names  indicate
              order,  so  you can pass the file list to datapacker using -p to
              preserve the ordering in your bins.  Thus, bin  1  will  contain
              the  oldest  files,  bin  2 the second-oldest, and so on.  If -p
              wasn’t used, you might use fewer CDs, but the  photos  would  be
              spread  out across all CDs without preserving your chronological
              order.

       -s SIZE

       --size=SIZE
              Gives the size of each bin in bytes.  Suffixes such as "k", "m",
              "g",   etc.  may  be  used  to  indicate  kilobytes,  megabytes,
              gigabytes, and so forth.  Numbers such as 1.5g are valid, and if
              needed, will be rounded to the nearest possible integer value.

              The size of the first bin may be overridden with -S.

              Here are the sizes of some commonly-used bins.  For each item, I
              have provided you with both the underlying recording capacity of
              the  disc and a suggested value for -s.  The suggested value for
              -s is lower  than  the  underlying  capacity  because  there  is
              overhead imposed by the filesystem stored on the disc.  You will
              perhaps find that the suggested  value  for  -s  is  lower  than
              optimal  for discs that contain few large files, and higher than
              desired for discs that contain vast amounts of small files.

              · CD-ROM, 74-minute (standard): 650m / 600m

              · CD-ROM, 80-minute: 703m / 650m

              · CD-ROM, 90-minute: 790m / 740m

              · CD-ROM, 99-minute: 870m / 820m

              · DVD+-R: 4.377g / 4g

              · DVD+R, dual layer: 8.5g / 8g

       -S

       --size-first
              The size of the first bin.  If not given, defaults to the  value
              given  with  -s.   This  may  be  useful  if you will be using a
              mechanism outside datapacker to add  additional  information  to
              the first bin: perhaps an index of which bin has which file, the
              information necessary to make a CD bootable, etc.  You  may  use
              the same suffixes as with -s with this option.

       --sort Sorts  the  list  of  files  to process before acting upon them.
              When combined with -p, causes the output  to  be  sorted.   This
              option has no effect save increasing CPU usage when not combined
              with -p.

   FILE SPECIFICATION
       After the options, you must supply one or more files  to  consider  for
       packing  into  bins.   Alternatively,  instead  of listing files on the
       command line, you may list a single hyphen (-), which tells  datapacker
       to read the list of files from standard input (stdin).

       datapacker never recurses into subdirectories.  If you want a recursive
       search  --  finding  all  files  in  a  given  directory  and  all  its
       subdirectories -- see the second example in the EXAMPLES section below.
       datapacker is designed to integrate with find(1) in this  situation  to
       let  you  take  advantage  of  find’s  built-in  powerful recursion and
       filtering features.

       When reading files from standard input, it is  assumed  that  the  list
       contains  one distinct filename per line.  Seasoned POSIX veterans will
       recognize the inherent limitations in this format.   For  that  reason,
       when  given  -0  in conjunction with the single file -, datapacker will
       instead expect, on standard input, a list of files, each one terminated
       by  an  ASCII NULL character.  Such a list can be easily generated with
       find(1) using its -print0 option.

EXAMPLES

       · Put all JPEG images in ~/Pictures into bins (using  hardlinks)  under
         the pre-existing directory ~/bins, no more than 600MB per bin:

         datapacker -b ~/bins/%03d -s 600m -a hardlink ~/Pictures/*.jpg

       · Put  all  files  in ~/Pictures or any subdirectory thereof into 600MB
         bins under ~/bins, using hardlinking.  This is a  simple  example  to
         follow if you simply want a recursive search of all files.

         find ~/Pictures -type f -print0 | \
           datapacker -0 -b ~/bins/%03d -s 600m -a hardlink -

       · Find  all  JPEG images in ~/Pictures or any subdirectory thereof, put
         them into bins (using hardlinks)  under  the  pre-existing  directory
         ~/bins, no more than 600MB per bin:

         find ~/Pictures -name "*.jpg" -print0 | \
           datapacker -0 -b ~/bins/%03d -s 600m -a hardlink -

       · Find  all  JPEG images as above, put them in 4GB bins, but instead of
         putting them anywhere, calculate the size of each bin and display it.

         find ~/Pictures -name "*.jpg" -print0 | \
           datapacker -0 -b ~/bins/%03d -s 4g \
         ’--action=exec:echo -n "$1: "; shift; du -ch "$@" | grep total’ \
           -

         This will display output like so:

         /home/jgoerzen/bins/001: 4.0G   total
         /home/jgoerzen/bins/002: 4.0G   total
         /home/jgoerzen/bins/003: 4.0G   total
         /home/jgoerzen/bins/004: 992M   total

         Note:  the  grep  pattern  in  this example is simple, but will cause
         unexpected results if any matching file contains the word "total".

       · Find all JPEG images as above, and generate 600MB ISO images of  them
         in  ~/bins.   This will generate the ISO images directly without ever
         hardlinking files into ~/bins.

         find ~/Pictures -name "*.jpg" -print0 | \
           datapacker -0 -b ~/bins/%03d.iso -s 4g \
         ’--action=exec:BIN="$1"; shift; mkisofs -r -J -o "$BIN" "$@"’ \
           -

         You could, if you so desired, pipe this result directly into  a  DVD-
         burning  application.  Or, you could use growisofs to burn a DVD+R in
         a single step.

ERRORS

       It is an error if any specified file exceeds the value given with -s or
       -S.

       It  is  also an error if any specified files disappear while datapacker
       is running.

BUGS

       Reports of bugs should be reported online at the  datapacker  homepage.
       Debian  users  are  encouraged  to  instead use the Debian bug-tracking
       system.

COPYRIGHT

       datapacker, and this manual, are Copyright (C) 2008 John Goerzen.

       All code, documentation, and build  scripts  are  under  the  following
       license unless otherwise noted:

       This program is free software: you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published  by  the
       Free  Software Foundation, either version 3 of the License, or (at your
       option) any later version.

       This program is distributed in the hope that it  will  be  useful,  but
       WITHOUT   ANY   WARRANTY;   without   even   the  implied  warranty  of
       MERCHANTABILITY or FITNESS FOR  A  PARTICULAR  PURPOSE.   See  the  GNU
       General Public License for more details.

       You should have received a copy of the GNU General Public License along
       with this program.  If not, see
        <URL:http://www.gnu.org/licenses/>.

       The GNU General Public License is available in the file COPYING in  the
       source   distribution.    Debian  GNU/Linux  users  may  find  this  in
       /usr/share/common-licenses/GPL-3.

       If the GPL is unacceptable for your uses, please e-mail me; alternative
       terms can be negotiated for your project.

AUTHOR

       datapacker,  its  libraries,  documentation,  and  all  included files,
       except where noted, was written by John Goerzen <jgoerzen@complete.org>
       and copyright is held as stated in the COPYRIGHT section.

       datapacker  may be downloaded, and information found, from its homepage
       <URL:http://software.complete.org/datapacker>.

SEE ALSO

       mkisofs(1), genisoimage(1)