NAME
rdd-copy - copy a file, even if read errors occur
SYNOPSIS
rdd-copy [OPTION] src [dst]
rdd-copy -C [CLIENT OPTION] src [host:]dst
rdd-copy -S [SERVER OPTION]
DESCRIPTION
Rdd-copy is a file and device copying utility that includes features
that are useful in a forensic environment. In particular, rdd-copy can
compute cryptographic hashes over the data it copies, is robust with
respect to read errors, and can copy data across a network.
Rdd-copy is best understood as a program that consists of a reader
stage and one or more processing stages. The reader stage reads input
data in a robust way. It will retry failed reads. If a read error
persists, the reader stage substitutes zero bytes for the input bytes
that it fails to read. The resulting bytes are passed to all
subsequent processing stages.
The processing stages are enabled through command-line options. The
current stages are: checksumming (Adler32 and CRC32), hashing (MD5 and
SHA1), file output, network output, and statistics.
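The stage structure can be pictured with a short Python sketch. It is an illustration only and not rdd-copy's implementation: the function name run_stages is hypothetical, the blocks are assumed to come from a reader stage that has already replaced unreadable data with zero bytes, and for simplicity every stage sees the same block boundaries (rdd-copy lets the checksum block sizes be set separately).

```python
import hashlib
import zlib


def run_stages(blocks):
    """Pass every block to each stage in turn (hypothetical sketch).

    `blocks` is any iterable of bytes objects, standing in for the
    output of the reader stage.
    """
    md5 = hashlib.md5()      # hashing stage: one digest over all data
    sha1 = hashlib.sha1()
    adler32 = []             # checksumming stage: one value per block
    crc32 = []
    total = 0                # statistics stage: byte count only
    for block in blocks:
        md5.update(block)
        sha1.update(block)
        adler32.append(zlib.adler32(block) & 0xFFFFFFFF)
        crc32.append(zlib.crc32(block) & 0xFFFFFFFF)
        total += len(block)
    return {
        "md5": md5.hexdigest(),
        "sha1": sha1.hexdigest(),
        "adler32": adler32,
        "crc32": crc32,
        "bytes": total,
    }
```

The point of the sketch is that every stage consumes the same byte stream, so a zero-filled block affects the hashes and checksums exactly as it affects the output file.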
Rdd-copy can be run in local mode, in client mode, and in server mode.
The mode is indicated by the first command-line argument.
Copying data across a network requires two rdd-copy processes: a client
process that reads the data from disk and transmits it across the
network, and a server process that reads the data from the network and
writes it to a file or device.
LOCAL MODE
In local mode, rdd-copy copies source file src to destination file dst,
handling read errors according to the options. If dst is not
specified, the data in src will be read and optionally hashed, but it
will not be written. To write to standard output, specify - as dst.
Rdd-copy will optionally compute an MD5 or a SHA1 hash value over the
input bytes and the zero bytes it substitutes for blocks it cannot
read. These hash values should be interpreted with care (see below).
Rdd-copy does NOT guarantee that the bytes it reads are the same bytes
that are stored on the input medium. It simply takes what read(2)
returns. Any hash values (see options) are computed over the bytes
that read(2) returns or, if read(2) fails, over zero-valued fill bytes.
Rdd-copy does NOT guarantee that the bytes that it reads into memory
(or the zero-valued bytes that it substitutes when a read error occurs)
will be written to the output file correctly. If you wish to verify
the correspondence between what rdd-copy saw and what got written to
disk, you will have to recompute the MD5 and/or SHA1 hash values over
the output file and compare them with the hash values reported by
rdd-copy. This is a useful verification step, but beware that even this
step cannot guarantee perfect correspondence with the data stored on
the source medium.
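A minimal sketch of this verification step, written in Python with the standard hashlib module. The function names file_digests and matches_reported are hypothetical, and the reported hash values are assumed to have been copied by hand from rdd-copy's messages as hexadecimal strings.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Return (md5_hex, sha1_hex) for the file at `path`."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so that large images do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def matches_reported(path, reported_md5=None, reported_sha1=None):
    """Compare recomputed digests with values reported during the copy."""
    md5_hex, sha1_hex = file_digests(path)
    ok = True
    if reported_md5 is not None:
        ok = ok and md5_hex == reported_md5.strip().lower()
    if reported_sha1 is not None:
        ok = ok and sha1_hex == reported_sha1.strip().lower()
    return ok
```

A match shows only that the output file holds the bytes rdd-copy hashed; as noted above, it says nothing about bytes that read(2) returned incorrectly.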
The best end-to-end test is probably to read back the output file and
compare each output byte to the corresponding input byte, unless that
input byte was part of a block for which rdd-copy reported a read
error.
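The byte-for-byte comparison could be sketched as follows. This is an assumption-laden illustration: the function name first_mismatch is hypothetical, src and dst are seekable binary file objects, and bad_ranges is a list of (offset, length) pairs for the blocks that rdd-copy reported as unreadable. How that list is extracted from rdd-copy's messages is not shown here.

```python
def first_mismatch(src, dst, size, bad_ranges, chunk_size=64 * 1024):
    """Return the offset of the first differing byte outside bad_ranges.

    Returns None when all compared bytes are equal. Regions listed in
    bad_ranges are skipped and never read from the source.
    """
    pos = 0
    # A zero-length sentinel range at `size` makes the loop compare the tail.
    for start, length in sorted(bad_ranges) + [(size, 0)]:
        while pos < start:
            n = min(chunk_size, start - pos)
            src.seek(pos)
            dst.seek(pos)
            a = src.read(n)
            b = dst.read(n)
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return pos + i
                # One side is shorter: the mismatch is at the end of the
                # common prefix.
                return pos + min(len(a), len(b))
            pos += n
        # Skip the unreadable region (ranges may overlap).
        pos = max(pos, start + length)
    return None
```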
Rdd-copy does NOT recover from persistent write errors. Rdd-copy was
designed to handle unfriendly source media only. If you get write
errors, you should replace your target medium.
READ ERRORS
In local mode and in client mode, rdd-copy reads from disk. Rdd-copy
assumes that the source disk may be faulty and tries to be robust with
respect to disk-read errors. In server mode, rdd-copy reads from the
network and makes no attempt to survive read errors. The explanation
below applies only to read errors that occur in local mode and in
client mode.
When a read error occurs, rdd-copy reduces the block size to the
minimum block size (see --min-block-size) and resets the read pointer
to the location at which it started the read that failed.
Next, rdd-copy tries to read a series of minimum-sized blocks (see
--min-block-size). When such a read fails, it is retried a
user-specified number of times (see --nretry). If the read failure
persists, rdd-copy normally will skip a minimum-sized block of input
data and will write a minimum-sized block of zero bytes to the
destination file. These zero bytes are also passed to all other
rdd-copy processing stages (checksumming, hashing, and statistics).
Any persistent read failure counts toward the maximum number of read
errors that the user will tolerate (see --max-read-err). If this
maximum is reached, rdd-copy will exit immediately. By default,
however, an unlimited number of read errors is allowed.
After a read failure, rdd-copy continues to use the minimum block size
to read data until it has read block-size bytes of data without errors.
(block-size is the user-specified block size, see --block-size.) Only
then will rdd-copy increase its block size again, doubling the size at
each successful read, until it reaches the default block size.
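The recovery strategy described above can be sketched in Python. This is one reading of the text, not rdd-copy's source code: the name robust_copy and the read_at(offset, length) callable are hypothetical, max_read_err=None stands for "unlimited", and resetting the error-free byte count when a retry succeeds is an assumption.

```python
def robust_copy(read_at, total, block_size, min_block_size,
                nretry=1, max_read_err=None):
    """Copy `total` bytes through read_at, zero-filling unreadable blocks.

    read_at(offset, length) is a hypothetical callable that returns
    bytes or raises OSError. Returns (data, persistent_error_count).
    """
    out = bytearray()
    pos = 0
    cur = block_size      # current read size
    clean = 0             # error-free bytes read since the last failure
    errors = 0            # persistent read errors seen so far
    while pos < total:
        n = min(cur, total - pos)
        try:
            data = read_at(pos, n)
        except OSError:
            clean = 0
            if cur > min_block_size:
                # Drop to the minimum size and restart at the same offset.
                cur = min_block_size
                continue
            data = None
            for _ in range(nretry):
                try:
                    data = read_at(pos, n)
                    break
                except OSError:
                    pass
            if data is None:
                # Persistent failure: substitute zero bytes and move on.
                errors += 1
                out += bytes(n)
                pos += n
                if max_read_err is not None and errors >= max_read_err:
                    raise RuntimeError("too many read errors")
                continue
        if not data:
            break         # unexpected end of input
        out += data
        pos += len(data)
        if cur < block_size:
            clean += len(data)
            if clean >= block_size:
                # Enough clean data: double the read size, up to block_size.
                cur = min(cur * 2, block_size)
    return bytes(out), errors
```

With a 2048-byte block size, a 512-byte minimum, and one unreadable 512-byte sector, the sketch zero-fills exactly that sector and returns to larger reads only after 2048 error-free bytes.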
CLIENT MODE
In client mode, rdd-copy operates as in local mode, except that the
data will not be copied to a file, but will be written to a TCP
connection to an rdd-copy server process.
In client mode, a destination file, dst, on a destination host must be
specified. If no host is specified, localhost will be used.
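The [host:]dst form can be illustrated with a small Python sketch. The function name parse_destination is hypothetical. Splitting at the first colon and treating an empty host as localhost are assumptions; this page does not say how rdd-copy handles file names that contain a colon.

```python
def parse_destination(spec, default_host="localhost"):
    """Split '[host:]dst' into (host, dst), defaulting the host."""
    host, sep, path = spec.partition(":")
    if not sep:
        # No colon at all: the whole argument is the destination file.
        return default_host, spec
    return host or default_host, path
```

For example, parse_destination("snake:/images/disk.img") yields ("snake", "/images/disk.img"), and parse_destination("/images/disk.img") yields ("localhost", "/images/disk.img").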
SERVER MODE
In server mode, rdd-copy accepts one TCP connection from an rdd-copy
client. The server process must be started before the client process.
In server mode, rdd-copy will read data from a TCP connection and write
it to a target file. For now, the target file must always be specified
by the client. The main reason for this decision is to keep open the
option of having inetd(8) or xinetd(8) start an rdd-copy server
process.
OUTPUT
Informative messages, error messages, and statistics are all written to
stderr.
OPTIONS
-C, --client
Run rdd-copy in client mode. If you use this option, it must
come first.
-S, --server
Run rdd-copy in server mode. If you use this option, it must
come first.
-p, --port <portnum>
Modes: client, server.
Specifies the port number <portnum> at which the server listens
for an incoming connection. The default port is 4832.
-?, --help
Modes: all.
Print a usage message that includes this list of options.
-V, --version
Modes: all.
Print version information and exit.
-v, --verbose
Modes: all.
Be verbose.
-q, --quiet
Modes: all.
Do not ask interactive questions.
-l, --log-file <logfile>
Modes: all.
Log all messages except progress messages to <logfile>.
-f, --force
Modes: local, server.
Force existing files to be overwritten. The default behavior is
to bail out when the output file already exists.
-b, --block-size <size>
Modes: local, client.
Specify the default block size; <size> must be a power of two.
While no read errors occur, rdd-copy will read and write blocks
of <size> bytes.
-m, --min-block-size <size>
Modes: local, client.
Specify the minimum read size; <size> must be a power of two.
When a persistent read error occurs, at least this many bytes of
data will be skipped and replaced with zero bytes in the
destination file.
-n, --nretry <count>
Modes: local, client.
Retry failed reads up to <count> times. In many cases, using a
large retry value makes little sense, because the operating
system’s device driver will not indicate a failed read until it
has, itself, retried the read several times.
-o, --offset <size>
Modes: local, client.
Skip <size> bytes from the start of the input file before
reading any data. The bytes that are skipped will not be
included in any hash computation and will not be written to the
output file.
-c, --count <size>
Modes: local, client.
Read at most <size> input bytes; stop earlier if end-of-file is
reached first.
-z, --compress
Modes: client.
Compress network data.
-s, --split <size>
Modes: local, server.
If necessary, create multiple output files, none of which will
be larger than <size> bytes. Each output file will have a name
that consists of a sequence number followed by a dash and the
name specified on the command line.
-r, --raw
Modes: local, client.
Access the device using the raw device. The data will not travel
through the buffer cache.
-P, --progress <sec>
Modes: all.
Report progress (bytes read and percentage of data covered)
every <sec> seconds.
-M, --max-read-err <count>
Modes: local, client.
Give up after <count> read errors.
--md5 Modes: all.
Compute an MD5 hash value over all data that was read without
errors and over the zero-filled blocks that are used to replace
bad blocks.
--sha, --sha1
Modes: all.
Compute a SHA1 hash value over all data that was read without
errors and over the zero-filled blocks that are used to replace
bad blocks.
--checksum, --adler32 <file>
Modes: all.
Compute an Adler32 checksum value over blocks of data produced
by the reader stage. The last block to be checksummed may be
smaller than the block size that is used. All checksum
values are written to <file>.
--checksum-block-size, --adler32-block-size <size>
Modes: all.
Compute Adler32 checksum values over data blocks with a size of
<size> bytes. Only the last data block to be checksummed may be
smaller than <size>. The default block size is 32 Kbyte.
--crc32 <file>
Modes: all.
Compute a CRC32 checksum value over blocks of data produced by
the reader stage. The last block to be checksummed may be
smaller than the block size that is used. All checksum
values are written to <file>.
--crc32-block-size <size>
Modes: all.
Compute CRC32 checksum values over data blocks with a size of
<size> bytes. Only the last data block to be checksummed may be
smaller than <size>. The default block size is 32 Kbyte.
-H, --histogram <file>
Modes: all.
Compute a histogram over each block of data produced by the
reader stage. The histogramming block size can be set by the
user (see --hist-block-size). For each block, write a single
text line of statistics to <file>.
-h, --hist-block-size <size>
Modes: all.
Set the histogramming block size to <size> bytes. The default
block size is 256 Kbyte.
--block-md5 <file>
Modes: all.
Compute the MD5 hash value over blocks of data produced by the
reader stage. The last block to be hashed may be smaller than
the block size. All MD5 values are written to text file <file>.
Each line in this file contains a block number, followed by a
space, followed by the hash value of the corresponding block.
--block-md5-size <size>
Modes: all.
Sets the block size of the block-wise MD5 computation. The
default block size is 4 Kbyte.
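The line format of the --block-md5 output file can be sketched in Python. The function name block_md5_lines is hypothetical. The number given to the first block and the use of lowercase hexadecimal digits are assumptions, since this page specifies only "block number, space, hash value".

```python
import hashlib


def block_md5_lines(reader, block_size=4096, first_block=0):
    """Return one 'number hash' line per block read from `reader`."""
    lines = []
    number = first_block          # assumption: numbering starts here
    while True:
        block = reader.read(block_size)
        if not block:
            break
        # The last block may be shorter than block_size.
        lines.append(f"{number} {hashlib.md5(block).hexdigest()}")
        number += 1
    return lines
```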
A <size> argument may be followed by one of the following
multiplicative suffixes: c (1), w (2), b (512), k (1024),
M (1,048,576), and G (1,073,741,824).
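As an illustration of how these suffixes multiply out, here is a small Python sketch. The names parse_size and is_power_of_two are hypothetical, and the sketch assumes that suffixes are case-sensitive and that a bare number means bytes.

```python
# Multipliers taken from the suffix list above.
SUFFIXES = {"c": 1, "w": 2, "b": 512, "k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a <size> argument such as '16k' into a byte count."""
    if text and text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)


def is_power_of_two(n):
    """Check the constraint that --block-size and --min-block-size impose."""
    return n > 0 and n & (n - 1) == 0
```

For example, parse_size("16k") gives 16384 and parse_size("2b") gives 1024; both pass the power-of-two check.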
EXAMPLES
rdd-copy --md5 /dev/hda1
Compute and print the MD5 hash value over /dev/hda1. On Linux,
/dev/hda1 denotes the first partition of the primary master
disk.
rdd-copy -b 16k -m 512 -l rdd-log.txt /dev/fd0 f.img
Create an image of a floppy disk (/dev/fd0). Copy 16 Kbyte at a
time, but use blocks as small as a single sector (512 bytes)
when read errors occur. Write all log messages to the file
rdd-log.txt.
On the server: rdd-copy -S --sha1
On the client: rdd-copy -C --sha1 /dev/hdb snake:/images/disk.img
Copy the primary slave disk to host snake and store the data in
file /images/disk.img. The client host computes a SHA1 hash
over the data it reads from the disk; the server host computes a
SHA1 hash over the data it receives from the network.
rdd-copy --count 512 /dev/hda mbr.img
Copy the master boot record (MBR) from the primary master disk
to file mbr.img.
SEE ALSO
rdd-verify(1), raw(8)
NOTES
If you encounter read errors, do examine /var/log/messages (or the
equivalent file on your operating system). It may contain useful
device driver error messages.
On Linux (kernel 2.4 and lower), rdd-copy and other programs that read
from a block device may yield an I/O error when they reach the end of
the device, even if there’s nothing wrong with the device. To the best
of my knowledge, this is a Linux problem rather than an rdd-copy
problem; the same problem occurs with GNU dd and other programs.
The problem is described in the following document:
http://www.cftt.nist.gov/Notes_on_dd_and_Odd_Sized_Disks4.doc. The
problem has apparently been solved in the Linux 2.6 kernel.
If you use rdd-copy to access a device, consider using the raw device
(see raw(8)). This way, your data will not travel through the buffer
cache.
BUGS
Server-side errors are not reported back to the client. Users must
watch the server’s output.
REPORTING BUGS
Report bugs to <rdd@holmes.nl>.
ACKNOWLEDGEMENTS
Many thanks to all who reported bugs and successes, and who suggested
improvements. You know who you are.
COPYRIGHT
Copyright © 2002-2003 Netherlands Forensic Institute
This software comes with NO warranty; not even for MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.
HISTORY
Up to version 1.2-7a, rdd-copy (then called rdd) used a different error
recovery strategy. With the new strategy, users can no longer set the
recovery threshold, so the --recovery-len option has been retired.