rdd-copy - copy a file, even if read errors occur

NAME

       rdd-copy - copy a file, even if read errors occur

SYNOPSIS

       rdd-copy [OPTION] src [dst]

       rdd-copy -C [CLIENT OPTION] src [host:]dst

       rdd-copy -S [SERVER OPTION]

DESCRIPTION

       Rdd-copy  is  a  file and device copying utility that includes features
       that are useful in a forensic environment.  In particular, rdd-copy can
       compute  cryptographic  hashes  over the data it copies, is robust with
       respect to read errors, and can copy data across a network.

       Rdd-copy is best understood as a program  that  consists  of  a  reader
       stage  and one or more processing stages.  The reader stage reads input
       data in a robust way.  It will retry failed reads.   If  a  read  error
       persists,  the  reader stage substitutes zero bytes for the input bytes
       that it  fails  to  read.   The  resulting  bytes  are  passed  to  all
       subsequent processing stages.

       The  processing  stages  are enabled through command-line options.  The
       current stages are: checksumming (Adler32 and CRC32), hashing (MD5  and
       SHA1), file output, network output, and statistics.
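
       The reader/stage split described above can be pictured  with  a  small
       sketch.   It is not rdd-copy's implementation: the class and function
       names (HashStage, CountStage, run_pipeline) are  hypothetical  and  are
       used  only  to  show  how  every  block produced by the reader could be
       handed to each enabled stage in turn.

```python
import hashlib


class HashStage:
    """Hypothetical hashing stage: accumulates one hash over all blocks."""

    def __init__(self, algorithm):
        self._hash = hashlib.new(algorithm)

    def process(self, block):
        self._hash.update(block)

    def result(self):
        return self._hash.hexdigest()


class CountStage:
    """Hypothetical statistics stage: counts the bytes it has seen."""

    def __init__(self):
        self.nbytes = 0

    def process(self, block):
        self.nbytes += len(block)

    def result(self):
        return self.nbytes


def run_pipeline(blocks, stages):
    """Pass each block from the reader to every stage, in order."""
    for block in blocks:
        for stage in stages:
            stage.process(block)
    return [stage.result() for stage in stages]
```

       Because  every stage sees the same byte stream, zero bytes substituted
       by the reader end up in the hashes as well as in the output.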

       Rdd-copy  can be run in local mode, in client mode, and in server mode.
       The mode is indicated by the first command-line argument.

       Copying data across a network requires two rdd-copy processes: a client
       process  that  reads  the  data  from  disk and transmits it across the
       network, and a server process that reads the data from the network  and
       writes it to a file or device.

LOCAL MODE

       In local mode, rdd-copy copies source file src to destination file dst,
       handling  read  errors  according  to  the  options.   If  dst  is  not
       specified,  the  data in src will be read and optionally hashed, but it
       will not be written.  To write to standard output, specify - as dst.

       Rdd-copy will optionally compute an MD5 or a SHA1 hash value  over  the
       input  bytes  and  the  zero  bytes it substitutes for blocks it cannot
       read.  These hash values should be interpreted with care (see below).

       Rdd-copy does NOT guarantee that the bytes it reads are the same  bytes
       that  are  stored  on  the  input medium.  It simply takes what read(2)
       returns.  Any hash values (see options) are  computed  over  the  bytes
       that read(2) returns or, if read(2) fails, over zero-valued fill bytes.

       Rdd-copy does NOT guarantee that the bytes that it  reads  into  memory
       (or the zero-valued bytes that it substitutes when a read error occurs)
       will be written to the output file correctly.  If you  wish  to  verify
       the  correspondence  between  what rdd-copy saw and what got written to
       disk, you will have to recompute the MD5 and/or SHA1 hash  values  over
       the  output file and compare them with the hash values reported by rdd-
       copy.  This is a useful verification step, but beware  that  even  this
       step  cannot  guarantee  perfect correspondence with the data stored on
       the source medium.
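
       A minimal sketch of that verification step follows.  It assumes  that
       the  output  is  a  single  file  holding exactly the bytes that were
       hashed, and that the reported hash values are hexadecimal strings; the
       function names are illustrative and not part of rdd-copy.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Return (md5_hex, sha1_hex) computed over the whole file at path."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def matches_reported(path, reported_md5=None, reported_sha1=None):
    """Compare recomputed digests with the values copied from the log."""
    md5, sha1 = file_digests(path)
    ok = True
    if reported_md5 is not None:
        ok = ok and md5 == reported_md5.strip().lower()
    if reported_sha1 is not None:
        ok = ok and sha1 == reported_sha1.strip().lower()
    return ok
```

       A match shows only that the output file holds the bytes that rdd-copy
       hashed; it says nothing about blocks that were replaced by zero bytes.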

       The best end-to-end test is probably to read back the output  file  and
       compare  each  output byte to the corresponding input byte, unless that
       input byte was part of a block  for  which  rdd-copy  reported  a  read
       error.
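
       Such a comparison could be sketched as follows.  The  list  of  failed
       blocks  is  passed in as (offset, length) pairs; how those pairs are
       extracted from the rdd-copy log is left open, and the  function  names
       are  hypothetical.   The  sketch  also  assumes  that  the  copy  was
       made without --offset, so that source and output offsets coincide.

```python
import os


def good_ranges(size, bad_ranges):
    """Yield (start, end) spans of [0, size) not covered by bad_ranges."""
    pos = 0
    for start, length in sorted(bad_ranges):
        if start > pos and pos < size:
            yield pos, min(start, size)
        pos = max(pos, start + length)
    if pos < size:
        yield pos, size


def first_mismatch(src_path, dst_path, bad_ranges=(), chunk_size=1 << 16):
    """Return the offset of the first differing byte, or None if equal.

    Bytes inside bad_ranges are never read from the source.
    """
    size = os.path.getsize(dst_path)
    with open(src_path, "rb") as src, open(dst_path, "rb") as dst:
        for start, end in good_ranges(size, bad_ranges):
            pos = start
            while pos < end:
                n = min(chunk_size, end - pos)
                src.seek(pos)
                dst.seek(pos)
                a = src.read(n)
                b = dst.read(n)
                if a != b:
                    for i, (x, y) in enumerate(zip(a, b)):
                        if x != y:
                            return pos + i
                    # One side was shorter than the other.
                    return pos + min(len(a), len(b))
                pos += n
    return None
```

       Skipping the failed ranges avoids rereading sectors that  are  already
       known to be unreadable.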

       Rdd-copy  does  NOT recover from persistent write errors.  Rdd-copy was
       designed to handle unfriendly source media  only.   If  you  get  write
       errors, you should replace your target medium.

READ ERRORS

       In  local  mode and in client mode, rdd-copy reads from disk.  Rdd-copy
       assumes that the source disk may be faulty and tries to be robust  with
       respect  to  disk-read errors.  In server mode, rdd-copy reads from the
       network and makes no attempt to survive read errors.   The  explanation
       below  applies  only  to  read  errors  that occur in local mode and in
       client mode.

       When a read error occurs,  rdd-copy  reduces  the  block  size  to  the
       minimum  block  size (see --min-block-size) and resets the read pointer
       to the location at which it started the read that failed.

       Next, rdd-copy tries to read a  series  of  minimum-sized  blocks  (see
       --min-block-size).   When  such  a  read  fails,  it is retried a user-
       specified  number  of  times  (see  --nretry).   If  the  read  failure
       persists,  rdd-copy  normally  will skip a minimum-sized block of input
       data and will  write  a  minimum-sized  block  of  zero  bytes  to  the
       destination  file.   These zero bytes are also passed to all other rdd-
       copy processing stages (checksumming, hashing, and statistics).

       Any persistent read failure counts toward the maximum  number  of  read
       errors  that  the  user  will  tolerate  (see --max-read-err).  If this
       maximum is  reached,  rdd-copy  will  exit  immediately.   By  default,
       however, an unlimited number of read errors is allowed.

       After  a read failure, rdd-copy continues to use the minimum block size
       to read data until it has read block-size bytes of data without errors.
       (block-size  is the user-specified block size, see --block-size.)  Only
       then will rdd-copy increase its block size again, doubling the size  at
       each successful read, until it reaches the default block size.
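
       The recovery strategy described in this section can be  sketched  as  a
       small  generator.   This  is one reading of the text above, not the
       actual rdd-copy code.  The read_at callable is hypothetical, the  use
       of  0  to mean "no limit" for max_read_err is an assumption, and so is
       the choice to treat a read that succeeds on a retry as error-free.

```python
def robust_read(read_at, total, block_size, min_block_size,
                nretry=1, max_read_err=0):
    """Yield (offset, data, ok) tuples covering bytes [0, total).

    read_at(offset, n) is a hypothetical callable that returns up to n
    bytes or raises OSError on a read error.  max_read_err == 0 means
    "unlimited" here; that convention is an assumption.
    """
    offset = 0
    cur = block_size      # current read size
    clean = 0             # bytes read without error since the last failure
    errors = 0            # persistent read errors so far
    while offset < total:
        n = min(cur, total - offset)
        try:
            data = read_at(offset, n)
        except OSError:
            if cur > min_block_size:
                # Drop to the minimum block size and restart at the
                # offset where the failed read began.
                cur = min_block_size
                clean = 0
                continue
            data = None
            for _ in range(nretry):
                try:
                    data = read_at(offset, n)
                    break
                except OSError:
                    pass
            if data is None:
                # Persistent failure: count it, then substitute zeros.
                errors += 1
                if max_read_err and errors >= max_read_err:
                    raise RuntimeError("too many read errors")
                yield offset, bytes(n), False
                offset += n
                clean = 0
                continue
        if not data:
            break         # unexpected end of input
        yield offset, data, True
        offset += len(data)
        if cur < block_size:
            clean += len(data)
            if clean >= block_size:
                # Enough clean data: double the read size again.
                cur = min(cur * 2, block_size)
```

       Each  yielded  tuple  with  ok  set  to  False corresponds to one
       minimum-sized block of zero bytes in the output.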

CLIENT MODE

       In  client  mode,  rdd-copy  operates as in local mode, except that the
       data will not be copied to a  file,  but  will  be  written  to  a  TCP
       connection to an rdd-copy server process.

       In  client mode, a destination file, dst, on a destination host must be
       specified.  If no host is specified, localhost will be used.

SERVER MODE

       In server mode, rdd-copy accepts one TCP connection  from  an  rdd-copy
       client.   The server process must be started before the client process.
       In server mode, rdd-copy will read data from a TCP connection and write
       it to a target file.  For now, the target file must always be specified
       by the client.  The main reason for this decision is to keep  open  the
       option  of  having  inetd(8)  or  xinetd(8)  start  an  rdd-copy server
       process.

OUTPUT

       Informative messages, error messages, and statistics are all written to
       stderr.

OPTIONS

       -C, --client
              Run  rdd-copy  in  client mode.  If you use this option, it must
              come first.

       -S, --server
              Run rdd-copy in server mode.  If you use this  option,  it  must
              come first.

       -p, --port <portnum>
              Modes: client, server.

              Specifies  the port number <portnum> at which the server listens
              for an incoming connection.  The default port is 4832.

       -?, --help
              Modes: all.

              Print a usage message that includes this list of options.

       -V, --version
              Modes: all.

              Print version information and exit.

       -v, --verbose
              Modes: all.

              Be verbose.

       -q, --quiet
              Modes: all.

              Do not pose interactive questions.

       -l, --log-file <logfile>
              Modes: all.

              Log all messages except progress messages to <logfile>.

       -f, --force
              Modes: local, server.

              Force existing files to be overwritten.  The default behavior is
              to bail out when the output file already exists.

       -b, --block-size <size>
              Modes: local, client.

              Specify  the  default block size; <size> must be a power of two.
              While no read errors occur, rdd-copy will read and write  blocks
              of <size> bytes.

       -m, --min-block-size <size>
              Modes: local, client.

              Specify  the  minimum  read size; <size> must be a power of two.
              When a persistent read error occurs, at least this many bytes of
              data  will  be  skipped  and  replaced  with  zero  bytes in the
              destination file.

       -n, --nretry <count>
              Modes: local, client.

              Retry failed reads up to <count> times.  In many cases, using  a
              large  retry  value  makes  little  sense, because the operating
              system’s device driver will not indicate a failed read until  it
              has, itself, retried the read several times.

       -o, --offset <size>
              Modes: local, client.

              Skip  <size>  bytes  from  the  start  of  the input file before
              reading any data.  The  bytes  that  are  skipped  will  not  be
              included  in any hash computation and will not be written to the
              output file.

       -c, --count <size>
              Modes: local, client.

              Read at most <size> input bytes; reading stops earlier  if  end-
              of-file is reached.

       -z, --compress
              Modes: client.

              Compress network data.

       -s, --split <size>
              Modes: local, server.

              If necessary, create multiple output files, none of  which  will
              be  larger than <size> bytes.  Each output file will have a name
              that consists of a sequence number followed by a  dash  and  the
              name specified on the command line.

       -r, --raw
              Modes: local, client.

              Access the device using the raw device. The data will not travel
              through the buffer cache.

       -P, --progress <sec>
              Modes: all.

              Report progress (bytes read  and  percentage  of  data  covered)
              every <sec> seconds.

       -M, --max-read-err <count>
              Modes: local, client.

              Give up after <count> read errors.

       --md5  Modes: all.

              Compute  an  MD5  hash value over all data that was read without
              errors and over the zero-filled blocks that are used to  replace
              bad blocks.

       --sha, --sha1
              Modes: all.

              Compute  a  SHA1  hash value over all data that was read without
              errors and over the zero-filled blocks that are used to  replace
              bad blocks.

       --checksum, --adler32 <file>
              Modes: all.

              Compute  an  Adler32 checksum value over blocks of data produced
              by the reader stage.  The last block to be  checksummed  may  be
              smaller  than  the  block  size  that  is  used.   All  checksum
              values are written to <file>.

       --checksum-block-size, --adler32-block-size <size>
              Modes: all.

              Compute Adler32 checksum values over data blocks with a size  of
              <size> bytes.  Only the last data block to be checksummed may be
              smaller than <size>.  The default block size is 32 Kbyte.

       --crc32 <file>
              Modes: all.

              Compute a CRC32 checksum value over blocks of data  produced  by
              the  reader  stage.   The  last  block  to be checksummed may be
              smaller  than  the  block  size  that  is  used.   All  checksum
              values are written to <file>.

       --crc32-block-size <size>
              Modes: all.

              Compute  CRC32  checksum  values over data blocks with a size of
              <size> bytes.  Only the last data block to be checksummed may be
              smaller than <size>.  The default block size is 32 Kbyte.

       -H, --histogram <file>
              Modes: all.

              Compute  a  histogram  over  each  block of data produced by the
              reader stage.  The histogramming block size can be  set  by  the
              user  (see  --hist-block-size).   For each block, write a single
              text line of statistics to <file>.

       -h, --hist-block-size <size>
              Modes: all.

              Set the histogramming block size to <size> bytes.   The  default
              block size is 256 Kbyte.

       --block-md5 <file>
              Modes: all.

              Compute  the  MD5 hash value over blocks of data produced by the
              reader stage.  The last block to be hashed may be  smaller  than
              the block size.  All MD5 values are written to text file <file>.
              Each line in this file contains a block number,  followed  by  a
              space, followed by the hash value of the corresponding block.

       --block-md5-size <size>
              Modes: all.

              Sets  the  block  size  of  the block-wise MD5 computation.  The
              default block size is 4 Kbyte.
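
       To  check an image against a --block-md5 listing, a comparable listing
       can be produced independently.  The  sketch  below  follows  the  line
       format described under --block-md5 (block number, space, hash value).
       It assumes that block numbers start at 0  and  that  hash  values  are
       written  in  lowercase  hexadecimal;  neither  detail is stated in this
       manual, so adjust first_block and the formatting as needed.

```python
import hashlib
import io


def block_md5_lines(stream, block_size=4096, first_block=0):
    """Yield one "<block number> <md5 hex>" line per block of stream.

    stream must be a buffered binary file object, so that only the
    last block can be shorter than block_size.
    """
    number = first_block
    while True:
        block = stream.read(block_size)
        if not block:
            break
        yield f"{number} {hashlib.md5(block).hexdigest()}"
        number += 1
```

       For example, passing open("f.img", "rb") as stream lists  one  MD5
       value per 4 Kbyte block of the image.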

       A <size> argument may be followed by one of the following multiplicative
       suffixes: c (1), w (2), b (512), k (1024), M (1,048,576), and
       G (1,073,741,824).
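
       The suffix rule can be illustrated with a short parser.   This  is  a
       sketch,  not  rdd-copy's own argument handling; it assumes that the
       suffixes are case-sensitive exactly as listed and that the  number  is
       a plain decimal integer.  The helper names are hypothetical.

```python
SUFFIXES = {
    "c": 1,
    "w": 2,
    "b": 512,
    "k": 1024,
    "M": 1024 ** 2,
    "G": 1024 ** 3,
}


def parse_size(text):
    """Convert a string such as "16k" into a number of bytes."""
    text = text.strip()
    if text and text[-1] in SUFFIXES:
        number, factor = text[:-1], SUFFIXES[text[-1]]
    else:
        number, factor = text, 1
    if not (number.isascii() and number.isdigit()):
        raise ValueError(f"invalid size: {text!r}")
    return int(number) * factor


def is_power_of_two(n):
    """True if n is a positive power of two (as --block-size requires)."""
    return n > 0 and n & (n - 1) == 0
```

       With this rule, the arguments 16k and 512 used in the EXAMPLES section
       denote 16384 and 512 bytes, respectively.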

EXAMPLES

       rdd-copy --md5 /dev/hda1

              Compute and print the MD5 hash value over /dev/hda1.  On  Linux,
              /dev/hda1  denotes  the  first  partition  of the primary master
              disk.

       rdd-copy -b 16k -m 512 -l rdd-log.txt /dev/fd0 f.img

              Create an image of a floppy disk (/dev/fd0).  Copy 16 Kbyte at a
              time,  but  use  blocks  as small as a single sector (512 bytes)
              when read errors occur. Write all log messages to the file  rdd-
              log.txt.

       On the server: rdd-copy -S --sha1

       On the client: rdd-copy -C --sha1 /dev/hdb snake:/images/disk.img

              Copy  the primary slave disk to host snake and store the data in
              file /images/disk.img.  The client host  computes  a  SHA1  hash
              over the data it reads from the disk; the server host computes a
              SHA1 hash over the data it receives from the network.

       rdd-copy --count 512 /dev/hda mbr.img

              Copy the master boot record (MBR) from the primary  master  disk
              to file mbr.img.

NOTES

       If  you  encounter  read  errors,  do examine /var/log/messages (or the
       equivalent file on your  operating  system).   It  may  contain  useful
       device driver error messages.

       On Linux (kernel 2.4 and earlier), rdd-copy and other programs that read
       from  a  block device may yield an I/O error when they reach the end of
       the device, even if there’s nothing wrong with the device.  To the best
       of  my  knowledge,  this  is  a  Linux  problem rather than an rdd-copy
       problem; the same problem occurs with GNU dd and other  programs.   The
       problem is described in the following document:
       http://www.cftt.nist.gov/Notes_on_dd_and_Odd_Sized_Disks4.doc.      The
       problem has apparently been solved in the Linux 2.6 kernel.

       If you use rdd-copy to access a device, consider using the  raw  device
       (see  raw(8)).   This way, your data will not travel through the buffer
       cache.

BUGS

       Server-side errors are not reported back to  the  client.   Users  must
       watch the server’s output.

REPORTING BUGS

       Report bugs to <rdd@holmes.nl>.

ACKNOWLEDGEMENTS

       Many  thanks  to all who reported bugs and successes, and who suggested
       improvements.  You know who you are.

COPYRIGHT

       Copyright © 2002-2003 Netherlands Forensic Institute
       This software comes with NO warranty; not even for  MERCHANTABILITY  or
       FITNESS FOR A PARTICULAR PURPOSE.

HISTORY

       Up to version 1.2-7a, rdd-copy (then called  rdd)  used  a  different
       error  recovery  strategy.  With the new strategy, users can no longer
       set the recovery threshold, so the --recovery-len option has been
       retired.
