fio - flexible I/O tester
fio [options] [jobfile]...
fio is a tool that will spawn a number of threads or processes doing a
particular type of I/O action as specified by the user. The typical
use of fio is to write a job file matching the I/O load one wants to
Write output to filename.
Limit run time to timeout seconds.
Generate per-job latency logs.
Generate per-job bandwidth logs.
Print statistics in a terse, semicolon-delimited format.
Convert jobfile to a set of command-line options.
Enable read-only safety checks.
Specifies when real-time ETA estimate should be printed. when
may be one of ‘always’, ‘never’ or ‘auto’.
Only run section sec from job file.
Print help information for command. May be ‘all’ for all
Enable verbose tracing of various fio actions. May be ‘all’ for
all types or individual types separated by a comma (e.g.
--debug=io,file). ‘help’ will list all available tracing
--help Display usage information and exit.
Display version information and exit.
JOB FILE FORMAT
Job files are in ‘ini’ format. They consist of one or more job
definitions, which begin with a job name in square brackets and extend
to the next job name. The job name can be any ASCII string except
‘global’, which has a special meaning. Following the job name is a
sequence of zero or more parameters, one per line, that define the
behavior of the job. Any line starting with a ‘;’ or ‘#’ character is
considered a comment and ignored.
If jobfile is specified as ‘-’, the job file will be read from standard
The global section contains default parameters for jobs specified in
the job file. A job is only affected by global sections residing above
it, and there may be any number of global sections. Specific job
definitions may override any parameter set in global sections.
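As a minimal sketch of this layout, the job file below defines one
global section and two jobs. The job names and values are arbitrary
and only illustrative; rw, bs and size are ordinary fio parameters.
job1 inherits every default, while job2 overrides the I/O pattern and
the block size.

```ini
; Defaults inherited by every job defined below this section.
[global]
rw=randread
size=128m
bs=4k

; Uses the global defaults unchanged.
[job1]

; Overrides the I/O pattern and block size for this job only.
[job2]
rw=write
bs=64k
```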
Some parameters may take arguments of a specific type. The types used
str String: a sequence of alphanumeric characters.
int SI integer: a whole number, possibly containing a suffix
denoting the base unit of the value. Accepted suffixes are ‘k’,
’M’, ’G’, ’T’, and ’P’, denoting kilo (1024), mega (1024^2),
giga (1024^3), tera (1024^4), and peta (1024^5) respectively.
The suffix is not case sensitive. If prefixed with ’0x’, the
value is assumed to be base 16 (hexadecimal).
bool Boolean: a true or false value. ‘0’ denotes false, ‘1’ denotes
irange Integer range: a range of integers specified in the format
lower:upper or lower-upper. lower and upper may contain a suffix
as described above. If an option allows two sets of ranges,
they are separated with a ‘,’ or ‘/’ character. For example:
May be used to override the job name. On the command line, this
parameter has the special purpose of signalling the start of a
Human-readable description of the job. It is printed when the
job is run, but otherwise has no special purpose.
Prefix filenames with this directory. Used to place files in a
location other than ‘./’.
fio normally makes up a file name based on the job name, thread
number, and file number. If you want to share files between
threads in a job or several jobs, specify a filename for each of
them to override the default. If the I/O engine used is ‘net’,
filename is the host and port to connect to in the format
host/port. If the I/O engine is file-based, you can specify a
number of files by separating the names with a ‘:’ character.
‘-’ is a reserved name, meaning stdin or stdout, depending on
the read/write direction set.
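A short sketch of sharing one file between two jobs follows. The path
/tmp/fio-shared.dat is a hypothetical placeholder and the job names
are arbitrary.

```ini
; Both jobs operate on the same file instead of fio-generated names.
[global]
filename=/tmp/fio-shared.dat
size=64m

[reader-a]
rw=read

[reader-b]
rw=randread
```

For a file-based I/O engine, several files could be listed in the
same way by separating the names with a colon, for instance
filename=/tmp/a.dat:/tmp/b.dat (again hypothetical paths).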
Fio defaults to not locking any files before it does IO to them.
If a file or file descriptor is shared, fio can serialize IO to
that file to make the end result consistent. This is usual for
emulating real workloads that share files. The lock modes are:
none No locking. This is the default.
Only one thread or process may do IO at a time,
excluding all others.
Read-write locking on the file. Many readers may
access the file at the same time, but writes get
The option may be post-fixed with a lock batch number. If set,
then each thread/process may do that amount of IOs to the file
before giving up the lock. Since lock acquisition is expensive,
batching the lock/unlocks will speed up IO.
opendir=str Recursively open any files below directory str.
Type of I/O pattern. Accepted values are:
read Sequential reads.
write Sequential writes.
rw Mixed sequential reads and writes.
randrw Mixed random reads and writes.
For mixed I/O, the default split is 50/50. For random I/O, the
number of I/Os to perform before getting a new offset can be
specified by appending ‘:int’ to the pattern type. The default
The base unit for a kilobyte. The de facto base is 2^10, 1024.
Storage manufacturers like to use 10^3 or 1000 as a base ten
unit instead, for obvious reasons. Allowed values are 1024 or
1000, with 1024 being the default.
Seed the random number generator in a predictable way so results
are repeatable across runs. Default: true.
By default, fio will use fallocate() to advise the system of the
size of the file we are going to write. This can be turned off
with fallocate=0. May not be available on all supported
Disable use of posix_fadvise(2) to advise the kernel what I/O
patterns are likely to be issued. Default: true.
Total size of I/O for this job. fio will run until this many
bytes have been transferred, unless limited by other options
(runtime, for instance). Unless nr_files and filesize options
are given, this amount will be divided between the available
files for the job.
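As an illustration of that division, the sketch below gives a total
size and a file count but no filesize. Under the rule above, the 1g
total would be expected to be split evenly, i.e. 256m per file. The
job name and values are arbitrary.

```ini
; 1g of total I/O spread over four files (no filesize given).
[split-size]
rw=write
size=1g
nr_files=4
```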
Sets size to something really large and waits for ENOSPC (no
space left on device) as the terminating condition. Only makes
sense with sequential write. For a read workload, the mount
point will be filled first then IO started on the result.
Individual file sizes. May be a range, in which case fio will
select sizes for files at random within the given range, limited
to size in total (if that is given). If filesize is not
specified, each created file is the same size.
Block size for I/O units. Default: 4k. Values for reads and
writes can be specified separately in the format read,write,
either of which may be empty to leave that value at its default.
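A minimal sketch of the read,write form, with an arbitrary job name
and sizes:

```ini
; Mixed random workload issuing 4k reads and 64k writes.
[mixed-bs]
rw=randrw
size=256m
bs=4k,64k
```

Writing bs=,8k instead would set only the write block size and leave
reads at the default.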
Specify a range of I/O block sizes. The issued I/O unit will
always be a multiple of the minimum size, unless
blocksize_unaligned is set. Applies to both reads and writes if
only one range is given, but can be specified separately with a
comma separating the values. Example: bsrange=1k-4k,2k-8k. Also
This option allows even finer grained control of the block sizes
issued, not just even splits between them. With this option, you
can weight various block sizes for exact control of the issued
IO for a job that has mixed block sizes. The format of the
option is bssplit=blocksize/percentage, optionally adding as
many definitions as needed separated by a colon. Example:
bssplit=4k/10:64k/50:32k/40 would issue 50% 64k blocks, 10% 4k
blocks and 40% 32k blocks. bssplit also supports giving separate
splits to reads and writes. The format is identical to what the
bs option accepts, the read and write parts are separated with a
If set, any size in blocksize_range may be used. This typically
won’t work with direct I/O, as that normally requires sector
At what boundary to align random IO offsets. Defaults to the
same as ’blocksize’ (the minimum blocksize given). Minimum
alignment is typically 512b when using direct IO, though it
usually depends on the hardware block size. This option is
mutually exclusive with using a random map for files, so it will
turn off that option.
Initialise buffers with all zeros. Default: fill buffers with
If this option is given, fio will refill the IO buffers on every
submit. The default is to only fill it at init time and reuse
that data. Only makes sense if zero_buffers isn’t specified,
naturally. If data verification is enabled, refill_buffers is
also automatically enabled.
Number of files to use for this job. Default: 1.
Number of files to keep open at the same time. Default:
Defines how files to service are selected. The following types
random Choose a file at random
Round robin over open files (default).
sequential Do each file in the set sequentially.
The number of I/Os to issue before switching to a new file can be
specified by appending ‘:int’ to the service type.
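A sketch of the ‘:int’ postfix on the service type, using arbitrary
values: the job spreads its I/O over four files, picking a file at
random and issuing eight I/Os before choosing again.

```ini
; Random file selection, 8 I/Os per file before switching.
[multi-file]
rw=randread
size=256m
nr_files=4
file_service_type=random:8
```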
Defines how the job issues I/O. The following types are
sync Basic read(2) or write(2) I/O. lseek(2) is used
to position the I/O location.
psync Basic pread(2) or pwrite(2) I/O.
vsync Basic readv(2) or writev(2) I/O. Will emulate
queuing by coalescing adjacent IOs into a single
libaio Linux native asynchronous I/O.
glibc POSIX asynchronous I/O using aio_read(3)
mmap File is memory mapped with mmap(2) and data
copied using memcpy(3).
splice splice(2) is used to transfer the data and
vmsplice(2) to transfer data from user-space to
Use the syslet system calls to make regular
sg SCSI generic sg v3 I/O. May be either synchronous
using the SG_IO ioctl, or if the target is an sg
character device, we use read(2) and write(2) for
null Doesn’t transfer any data, just pretends to.
Mainly used to exercise fio itself and for
debugging and testing purposes.
net Transfer over the network. filename must be set
appropriately to ‘host/port’ regardless of data
direction. If receiving, only the port argument
Like net, but uses splice(2) and vmsplice(2) to
map data and send/receive.
cpuio Doesn’t transfer any data, but burns CPU cycles
according to cpuload and cpucycles parameters.
guasi The GUASI I/O engine is the Generic Userspace
Asynchronous Syscall Interface approach to
Loads an external I/O engine object file. Append
the engine filename as ‘:enginepath’.
Number of I/O units to keep in flight against the file.
Number of I/Os to submit at once. Default: iodepth.
This defines how many pieces of IO to retrieve at once. It
defaults to 1, which means that we’ll ask for a minimum of 1 IO
in the retrieval process from the kernel. The IO retrieval will
go on until we hit the limit set by iodepth_low. If this variable
is set to 0, then fio will always check for completed events
before queuing more IO. This helps reduce IO latency, at the cost
of more retrieval system calls.
Low watermark indicating when to start filling the queue again.
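The queue depth settings are typically combined with an asynchronous
engine. The sketch below assumes the usual fio parameter names
iodepth, iodepth_batch, iodepth_batch_complete and iodepth_low; the
numbers are arbitrary and not tuning advice.

```ini
; Asynchronous random reads with up to 32 I/Os in flight.
[aio-randread]
ioengine=libaio
direct=1
rw=randread
bs=4k
size=1g
iodepth=32
; Submit up to 8 I/Os per submission call.
iodepth_batch=8
; Ask for at least 4 completions per retrieval.
iodepth_batch_complete=4
; Start refilling the queue once it drains to 16.
iodepth_low=16
```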
If true, use non-buffered I/O (usually O_DIRECT). Default:
If true, use buffered I/O. This is the opposite of the direct
parameter. Default: true.
Offset in the file to start I/O. Data before the offset will not
How many I/Os to perform before issuing an fsync(2) of dirty
data. If 0, don’t sync. Default: 0.
Like fsync, but uses fdatasync(2) instead to only sync the data
parts of the file. Default: 0.
Use sync_file_range() for every val number of write operations.
Fio will track range of writes that have happened since the last
sync_file_range() call. str can currently be one or more of:
So if you do sync_file_range=wait_before,write:8, fio would use
SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE for every 8
writes. Also see the sync_file_range(2) man page. This option
is Linux specific.
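Placed in a job file, the setting quoted above might look like the
following sketch (job name and sizes are arbitrary):

```ini
; Sequential writes, calling sync_file_range() every 8 writes.
[sfr-write]
rw=write
size=256m
sync_file_range=wait_before,write:8
```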
If writing, setup the file first and do overwrites. Default:
Sync file contents when job exits. Default: false.
If true, sync file contents on close. This differs from
end_fsync in that it will happen on every close, not just at the
end of the job. Default: false.
How many milliseconds before switching between reads and writes
for a mixed workload. Default: 500ms.
Percentage of a mixed workload that should be reads. Default:
Percentage of a mixed workload that should be writes. If
rwmixread and rwmixwrite are given and do not sum to 100%, the
latter of the two overrides the first. This may interfere with a
given rate setting, if fio is asked to limit reads or writes to
a certain rate. If that is the case, then the distribution may
be skewed. Default: 50.
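A sketch of a 70/30 read/write mix with arbitrary values. Only
rwmixread is set here, which avoids the case where the two options
do not sum to 100%.

```ini
; Random mixed workload: 70% reads, 30% writes.
[mix70]
rw=randrw
size=256m
rwmixread=70
```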
Normally fio will cover every block of the file when doing
random I/O. If this parameter is given, a new offset will be
chosen without looking at past I/O history. This parameter is
mutually exclusive with verify.
See norandommap. If fio runs with the random block map enabled
and it fails to allocate the map, if this option is set it will
continue without a random block map. As coverage will not be as
complete as with random maps, this option is disabled by
Run job with given nice value. See nice(2).
Set I/O priority value of this job between 0 (highest) and 7
(lowest). See ionice(1).
Set I/O priority class. See ionice(1).
Stall job for given number of microseconds between issuing I/Os.
Pretend to spend CPU time for given number of microseconds,
sleeping the rest of the time specified by thinktime. Only
valid if thinktime is set.
Number of blocks to issue before waiting thinktime microseconds.
Cap bandwidth used by this job. The number is in bytes/sec, the
normal postfix rules apply. You can use rate=500k to limit reads
and writes to 500k each, or you can specify read and writes
separately. Using rate=1m,500k would limit reads to 1MB/sec and
writes to 500KB/sec. Capping only reads or writes can be done
with rate=,500k or rate=500k,. The former will only limit writes
(to 500KB/sec), the latter will only limit reads.
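The separate read and write caps described above could be used in a
job file as in this sketch (job name and size are arbitrary):

```ini
; Reads capped at 1MB/sec, writes capped at 500KB/sec.
[capped]
rw=rw
size=512m
rate=1m,500k
```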
Tell fio to do whatever it can to maintain at least the given
bandwidth. Failing to meet this requirement will cause the job
to exit. The same format as rate is used for read vs write
Cap the bandwidth to this number of IOPS. Basically the same as
rate, just specified independently of bandwidth. The same format
as rate is used for read vs write separation. If blocksize is a
range, the smallest block size is used as the metric.
If this rate of I/O is not met, the job will exit. The same
format as rate is used for read vs write separation.
Average bandwidth for rate and ratemin over this number of
milliseconds. Default: 1000ms.
Set CPU affinity for this job. int is a bitmask of allowed CPUs
the job may run on. See sched_setaffinity(2).
Same as cpumask, but allows a comma-delimited list of CPU
Delay start of job for the specified number of seconds.
Terminate processing after the specified number of seconds.
If given, run for the specified runtime duration even if the
files are completely read or written. The same workload will be
repeated as many times as runtime allows.
If set, fio will run the specified workload for this amount of
time before logging any performance numbers. Useful for letting
performance settle before logging results, thus minimizing the
runtime required for stable results. Note that the ramp_time is
considered lead in time for a job, thus it will increase the
total runtime if a special timeout or runtime is specified.
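A sketch combining the timing options, assuming the usual parameter
names runtime, time_based and ramp_time. With these arbitrary values
the job would be expected to run for roughly 70 seconds in total: 10
seconds of unlogged lead-in followed by 60 measured seconds, looping
over the file as often as needed.

```ini
; 10s warm-up, then 60s of measured random reads.
[steady-state]
rw=randread
size=256m
runtime=60
time_based
ramp_time=10
```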
Invalidate buffer-cache for the file prior to starting I/O.
Use synchronous I/O for buffered writes. For the majority of
I/O engines, this means using O_SYNC. Default: false.
Allocation method for I/O unit buffer. Allowed values are:
malloc Allocate memory with malloc(3).
shm Use shared memory buffers allocated through
Same as shm, but use huge pages as backing.
mmap Use mmap(2) for allocation. Uses anonymous
memory unless a filename is given after the
option in the format ‘:file’.
Same as mmap, but use huge files as backing.
The amount of memory allocated is the maximum allowed blocksize
for the job multiplied by iodepth. For shmhuge or mmaphuge to
work, the system must have free huge pages allocated. mmaphuge
also needs to have hugetlbfs mounted, and file must point there.
At least on Linux, huge pages must be manually allocated. See
/proc/sys/vm/nr_hugepages and the documentation for that.
Normally you just need to echo an appropriate number, e.g. echoing
8 will ensure that the OS has 8 huge pages ready for use.
This indicates the memory alignment of the IO memory buffers.
Note that the given alignment is applied to the first IO unit
buffer; if using iodepth, the alignment of the following buffers
is given by the bs used. In other words, if using a bs that is
a multiple of the page size of the system, all buffers will be
aligned to this value. If using a bs that is not page aligned,
the alignment of subsequent IO memory buffers is the sum of the
iomem_align and bs used.
Defines the size of a huge page. Must be at least equal to the
system setting. Should be a multiple of 1MB. Default: 4MB.
Terminate all jobs when one finishes. Default: wait for each
job to finish.
Average bandwidth calculations over the given time in
milliseconds. Default: 500ms.
If true, serialize file creation for the jobs. Default: true.
fsync(2) data file after creation. Default: true.
If true, the files are not created until they are opened for IO
by the job.
If this is given, files will be pre-read into memory before
starting the given IO operation. This will also clear the
invalidate flag, since it is pointless to pre-read and then drop
the cache. This will only work for IO engines that are seekable,
since they allow you to read the same data multiple times. Thus
it will not work on e.g. network or splice IO.
Unlink job files when done. Default: false.
Specifies the number of iterations (runs of the same workload)
of this job. Default: 1.
Run the verify phase after a write phase. Only valid if verify
is set. Default: true.
Method of verifying file contents after each iteration of the
job. Allowed values are:
md5 crc16 crc32 crc32c crc32c-intel crc64 crc7 sha256
Store appropriate checksum in the header of each
meta Write extra information about each I/O
(timestamp, block number, etc.). The block number
Fill I/O buffers with a specific pattern that is
used to verify. If the pattern is < 4 bytes, it
can either be a decimal or a hexadecimal number.
If the pattern is > 4 bytes, currently, it can
only be a hexadecimal pattern starting with
either "0x" or "0X".
null Pretend to verify. Used for testing internals.
This option can be used for repeated burn-in tests of a system
to make sure that the written data is also correctly read back.
If the data direction given is a read or random read, fio will
assume that it should verify a previously written file. If the
data direction includes any form of write, the verify will be of
the newly written data.
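A minimal burn-in sketch follows. The file path is a hypothetical
placeholder and crc32c is just one of the checksum types listed
above.

```ini
; Write the file with a checksum in each block header; the data
; direction includes a write, so the new data is what gets verified.
[write-verify]
filename=/tmp/fio-verify.dat
rw=write
size=128m
bs=4k
verify=crc32c
```

Running a second job against the same file with rw=read and the same
verify setting would, per the description above, check the
previously written contents instead.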
If true, written verify blocks are sorted if fio deems it to be
faster to read them back in a sorted manner. Default: true.
Swap the verification header with data somewhere else in the
block before writing. It is swapped back before verifying.
Write the verification header for this number of bytes, which
should divide blocksize. Default: blocksize.
If true, exit the job on the first observed verification
failure. Default: false.
Fio will normally verify IO inline from the submitting thread.
This option takes an integer describing how many async offload
threads to create for IO verification instead, causing fio to
offload the duty of verifying IO contents to one or more
separate threads. If using this offload option, even sync IO
engines can benefit from using an iodepth setting higher than 1,
as it allows them to have IO in flight while verifies are
Tell fio to set the given CPU affinity on the async IO
verification threads. See cpus_allowed for the format used.
Wait for preceding jobs in the job file to exit before starting
this one. stonewall implies new_group.
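A sketch of serializing two jobs with stonewall. The file path is a
hypothetical placeholder; without the stonewall line both jobs would
start together.

```ini
[global]
filename=/tmp/fio-phase.dat
size=128m

; Runs first.
[seq-write]
rw=write

; Starts only after seq-write has exited, and is reported in its
; own group.
[seq-read]
stonewall
rw=read
```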
Start a new reporting group. If not given, all jobs in a file
will be part of the same reporting group, unless separated by a
Number of clones (processes/threads performing the same
workload) of this job. Default: 1.
If set, display per-group reports instead of per-job when
numjobs is specified.
thread Use threads created with pthread_create(3) instead of processes
created with fork(2).
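The cloning options are often used together. The sketch below
assumes the usual parameter names numjobs and group_reporting; the
values are arbitrary.

```ini
; Four identical workers, run as threads, summarized as one group.
[workers]
rw=randread
size=64m
numjobs=4
group_reporting
thread
```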
Divide file into zones of the specified size in bytes. See
Skip the specified number of bytes when zonesize bytes of data
have been read.
Write the issued I/O patterns to the specified file.
Replay the I/O patterns contained in the specified file
generated by write_iolog, or may be a blktrace binary file.
If given, write a bandwidth log of the jobs in this job file.
Can be used to store data of the bandwidth of the jobs in their
lifetime. The included fio_generate_plots script uses gnuplot to
turn these text files into nice graphs. See write_lat_log for
behaviour of given filename. For this option, the postfix is
Same as write_bw_log, but writes I/O completion latencies. If
no filename is given with this option, the default filename of
"jobname_type.log" is used. Even if the filename is given, fio
will still append the type of log.
Disable measurements of completion latency numbers. Useful only
for cutting back the number of calls to gettimeofday, as that
does impact performance at really high IOPS rates. Note that to
really get rid of a large amount of these calls, this option
must be used with disable_slat and disable_bw as well.
Disable measurements of submission latency numbers. See
Disable measurements of throughput/bandwidth numbers. See
Pin the specified amount of memory with mlock(2). Can be used
to simulate a smaller amount of memory.
Before running the job, execute the specified command with
Same as exec_prerun, but the command is executed after the job
Attempt to switch the device hosting the file to the specified
If the job is a CPU cycle-eater, attempt to use the specified
percentage of CPU cycles.
If the job is a CPU cycle-eater, split the load into cycles of
the given time in milliseconds.
Generate disk utilization statistics if the platform supports
it. Default: true.
Enable all of the gettimeofday() reducing options (disable_clat,
disable_slat, disable_bw) plus reduce precision of the timeout
somewhat to really shrink the gettimeofday() call count. With
this option enabled, we only do about 0.4% of the gtod() calls
we would have done if all time keeping was enabled.
Sometimes it’s cheaper to dedicate a single thread of execution
to just getting the current time. Fio (and databases, for
instance) are very intensive on gettimeofday() calls. With this
option, you can set one CPU aside for doing nothing but logging
current time to a shared memory location. Then the other
threads/processes that run IO workloads need only copy that
segment, instead of entering the kernel with a gettimeofday()
call. The CPU set aside for doing these time calls will be
excluded from other uses. Fio will manually clear it from the
CPU mask of other jobs.
Add job to this control group. If it doesn’t exist, it will be
created. The system must have a mounted cgroup blkio mount
point for this to work. If your system doesn’t have it mounted,
you can do so with:
# mount -t cgroup -o blkio none /cgroup
Set the weight of the cgroup to this value. See the
documentation that comes with the kernel, allowed values are in
the range of 100..1000.
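A sketch of two jobs placed in differently weighted control groups.
It assumes the parameter names cgroup and cgroup_weight; the group
names are hypothetical, and the weights are simply the two ends of
the allowed range.

```ini
[global]
rw=randread
size=128m
direct=1

; Highest weight in the allowed range.
[favoured]
cgroup=fio-high
cgroup_weight=1000

; Lowest weight in the allowed range.
[background]
cgroup=fio-low
cgroup_weight=100
```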
Instead of running as the invoking user, set the user ID to this
value before the thread/process does any work.
Set group ID, see uid.
While running, fio will display the status of the created jobs. For
Threads: 1: [_r] [24.8% done] [ 13509/ 8334 kb/s] [eta
The characters in the first set of brackets denote the current status
of each thread. The possible values are:
P Setup but not started.
C Thread created.
I Initialized, waiting.
R Running, doing sequential reads.
r Running, doing random reads.
W Running, doing sequential writes.
w Running, doing random writes.
M Running, doing mixed sequential reads/writes.
m Running, doing mixed random reads/writes.
F Running, currently waiting for fsync(2).
V Running, verifying written data.
E Exited, not reaped by main thread.
- Exited, thread reaped.
The second set of brackets shows the estimated completion percentage of
the current group. The third set shows the read and write I/O rate,
respectively. Finally, the estimated run time of the job is displayed.
When fio completes (or is interrupted by Ctrl-C), it will show data for
each thread, each group of threads, and each disk, in that order.
Per-thread statistics first show the thread’s client number, group-id,
and error code. The remaining figures are as follows:
io Number of megabytes of I/O performed.
bw Average data rate (bandwidth).
runt Thread’s run time.
slat Submission latency minimum, maximum, average and standard
deviation. This is the time it took to submit the I/O.
clat Completion latency minimum, maximum, average and standard
deviation. This is the time between submission and
bw Bandwidth minimum, maximum, percentage of aggregate
bandwidth received, average and standard deviation.
cpu CPU usage statistics. Includes user and system time,
number of context switches this thread went through and
number of major and minor page faults.
Distribution of I/O depths. Each depth includes
everything less than (or equal to) it, but greater than
the previous depth.
Number of read/write requests issued, and number of short
Distribution of I/O completion latencies. The numbers
follow the same pattern as IO depths.
The group statistics show:
io Number of megabytes I/O performed.
aggrb Aggregate bandwidth of threads in the group.
minb Minimum average bandwidth a thread saw.
maxb Maximum average bandwidth a thread saw.
mint Shortest runtime of threads in the group.
maxt Longest runtime of threads in the group.
Finally, disk statistics are printed with reads first:
ios Number of I/Os performed by all groups.
merge Number of merges in the I/O scheduler.
ticks Number of ticks we kept the disk busy.
Total time spent in the disk queue.
util Disk utilization.
If the --minimal option is given, the results will be printed in a
semicolon-delimited format suitable for scripted use. The fields are:
jobname, groupid, error
KB I/O, bandwidth (KB/s), runtime (ms)
min, max, mean, standard deviation
min, max, mean, standard deviation
min, max, aggregate percentage of total, mean,
KB I/O, bandwidth (KB/s), runtime (ms)
min, max, mean, standard deviation
min, max, mean, standard deviation
min, max, aggregate percentage of total, mean,
user, system, context switches, major page faults, minor
IO depth distribution:
<=1, 2, 4, 8, 16, 32, >=64
IO latency distribution (ms):
<=2, 4, 10, 20, 50, 100, 250, 500, 750, 1000, >=2000
fio was written by Jens Axboe <firstname.lastname@example.org>.
This man page was written by Aaron Carroll <email@example.com>
based on documentation by Jens Axboe.
Report bugs to the fio mailing list <firstname.lastname@example.org>. See README.
For further documentation see HOWTO and README.
Sample jobfiles are available in the examples directory.