doodle - a tool to search the meta-data in your files
doodle [OPTIONS] ([FILENAMES]*|[KEYWORDS]*)
doodle is a tool to index files. doodle uses libextractor to find
meta-data in files. Once a database has been built, doodle can be used
to quickly find files whose meta-data matches a given search string.
This way, doodle can be used to quickly search your files.
Generally, the first time you run doodle you pass the option -b to
build the database. Together with -b you specify the list of files or
directories to index, for example
$ doodle -b $HOME
Indexing with doodle is incremental. If doodle -b is run twice with the
same database, the second run updates the index for files that have
changed. doodle also removes files that are no longer accessible.
doodle will NOT remove files that are still present but no longer
specified in the argument list. Thus invoking either
$ doodle -b /foo /bar # or
$ doodle -b /foo ; doodle -b /bar
will result in the same database, containing the index for both /foo
and /bar. Note that the only way to un-index just /foo at this point is
to make /foo inaccessible (for example, using chmod 000 /foo or even
rm -rf /foo) and then run doodle -b again.
In networked environments, it often makes sense to build a database at
the root of each file system, containing the entries for that file
system. To do this, run doodle for each file system on the file server
where that file system resides on a local disk, which avoids thrashing
the network. Users can select which databases doodle searches.
Databases cannot be concatenated together.
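As a rough illustration of selecting several per-file-system databases, the sketch below joins database file names into the colon-separated form that doodle accepts when searching (through -d or DOODLE_PATH). The helper name join_db_paths and the two database locations are hypothetical, chosen only for this example.

```shell
#!/bin/sh
# join_db_paths FILE...
# Join database file names with colons, the list format that the
# -d option and DOODLE_PATH use when searching several databases.
# (Hypothetical helper, not part of doodle.)
join_db_paths() {
    joined=""
    for db in "$@"; do
        if [ -z "$joined" ]; then
            joined=$db
        else
            joined="$joined:$db"
        fi
    done
    printf '%s\n' "$joined"
}

# Assumed locations: one database at the root of each file system.
DOODLE_PATH=$(join_db_paths /home/.doodle-db /srv/.doodle-db)
export DOODLE_PATH

# A later search would then consult both databases:
#   doodle keyword
```

Keeping the databases as separate files and combining them only in the search path fits the note above that databases cannot be concatenated.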
Once the files have been indexed, you can quickly query the doodle
database. Just run
$ doodle keyword
to search all of your files for keyword. Note that only the meta-data
extracted by libextractor is searched. Thus if libextractor does not
find any meta-data in the files, you may not get any results. You can
use the option -l to specify non-standard libextractor plugins. For
example, doodle could be used to replace the locate tool from the GNU
findutils like this:
$ alias updatedb="doodle -bn -d /var/lib/doodle/doodle-locate-db -l libextractor_filename /"
$ alias locate="doodle -d /var/lib/doodle/doodle-locate-db"
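bash does not expand aliases in non-interactive shells by default, so inside scripts the same idea can be sketched with shell functions instead. The function names doodle_updatedb and doodle_locate and the variable DOODLE_LOCATE_DB are hypothetical; the doodle options are exactly those used in the aliases above.

```shell
#!/bin/sh
# Database file used for the locate-style index (path taken from the
# alias example; the variable name is an assumption of this sketch).
DOODLE_LOCATE_DB=/var/lib/doodle/doodle-locate-db

# Rebuild or update the filename-only index, like updatedb.
doodle_updatedb() {
    doodle -bn -d "$DOODLE_LOCATE_DB" -l libextractor_filename /
}

# Search the filename-only index, like locate. All arguments are
# passed through to doodle unchanged.
doodle_locate() {
    doodle -d "$DOODLE_LOCATE_DB" "$@"
}
```

With these definitions, doodle_updatedb would be run periodically to refresh the index, and doodle_locate keyword would be used to query it.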
-a NUMBER, --approximate=NUMBER
do approximate matching with mismatches of up to NUMBER letters
-b
build the doodle database (passed arguments are directories and
filenames that are to be indexed). In comparison with GNU
locate, the doodle binary encapsulates both the locate and the
updatedb tool. With the -b option doodle builds or updates the
database (equivalent to updatedb); without -b it behaves similarly
to locate.
-B LANG, --binary=LANG
Use the generic plaintext extractor for the language with the
2-letter language code LANG. Supported languages are DA
(Danish), DE (German), EN (English), ES (Spanish), IT (Italian)
and NO (Norwegian). Use this option to enable fulltext indexing
(for a particular language). This option only makes sense
together with the -b option.
-d FILENAME, --database=FILENAME
use FILENAME for the location of the database (use when building
or searching). This option is particularly useful when doodle
is used to search different types of files (or is operated with
different extractor options). Using this option doodle can be
used to build specialized indices (e.g. one per file system),
which can in turn improve search performance. When searching,
you can pass a colon-separated list of database file names; in
that case all databases are searched. Note that the disk-space
consumption of a single database is typically slightly smaller
than if the database is split into multiple files.
Nevertheless, the space-savings are likely to be small (a few
percent). You can also use the environment variable
DOODLE_PATH to set the list of database files to search. The
option overrides the environment variable if both are used. If
the option is not given and DOODLE_PATH is not set,
"/var/lib/doodle" is used.
-e
print the extracted keywords for each matching file found. Note
that this will slow down the program a lot, especially if there
are many matches in the database. Note that if the options
given for libextractor are different from the options used for
building the index, the results may not contain the search
string.
include filenames (full path) in the set of keywords
print help page
-H ALGORITHM, --hash=ALGORITHM
Use the ALGORITHM to compute a hash of each file (possible
algorithms are sha1 and md5).
-l LIBRARIES, --library=LIBRARIES
specify which libextractor plugins to use (for building the
index with -b or for printing information about files with -e)
-L FILENAME, --log=FILENAME
log all encountered keywords into a log file named FILENAME.
This option is mostly useful for debugging.
-m LIMIT, --memory=LIMIT
use at most LIMIT MB of memory for the nodes of the suffix-tree
(after that, serialize to disk). Note that a smaller value will
reduce memory consumption but increase the size of the temporary
file (and slow down indexing). The default is 8 MB.
-n
do not load the default set of plugins (only load plugins
specified with -l)
make a human-readable screen dump of the doodle database (only
really useful for debugging)
-P PATH, --prunepaths=PATH
Directories to exclude from the database that would otherwise
be indexed. The environment variable PRUNEPATHS also sets this value.
Default is "/tmp /usr/tmp /var/tmp /dev /proc /sys". This
option can also be used when searching, in which case search
results in the specified directories will be ignored.
print the version number
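The precedence described for the database location (the -d option first, then DOODLE_PATH, then the default) can be modeled with a small shell function. This is only a sketch of the documented rule, not doodle's own code: the function name resolve_db_list is hypothetical, and treating an empty DOODLE_PATH the same as an unset one is an assumption.

```shell
#!/bin/sh
# resolve_db_list OPTION_VALUE
# Print the colon-separated database list that would be searched,
# following the documented precedence:
#   1. the value given with -d (passed here as $1, empty if absent)
#   2. the DOODLE_PATH environment variable
#   3. the default, /var/lib/doodle
# Assumption: an empty DOODLE_PATH is treated like an unset one.
resolve_db_list() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    elif [ -n "${DOODLE_PATH:-}" ]; then
        printf '%s\n' "$DOODLE_PATH"
    else
        printf '%s\n' "/var/lib/doodle"
    fi
}

# list_dbs LIST
# Print each database file name of a colon-separated list on its own line.
list_dbs() {
    printf '%s\n' "$1" | tr ':' '\n'
}
```

For example, list_dbs "$(resolve_db_list "")" prints one database file name per line for whatever list is currently in effect.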
DOODLE_PATH
Colon-separated list of databases to search. Note that when
building the database this path must either contain only one
filename or the option -d must be used to specify the database
file. Default is "/var/lib/doodle".
PRUNEPATHS
Space-separated list of paths to exclude. Can be overridden
with the -P option.
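To make the effect of a space-separated prune list concrete, the following sketch checks whether a path lies in or below one of the pruned directories. It illustrates the idea only: the function name is_pruned is hypothetical, and matching by directory prefix is an assumption about how exclusion behaves, not a statement about doodle's implementation.

```shell
#!/bin/sh
# is_pruned PATH
# Succeed (exit status 0) if PATH equals, or lies below, one of the
# directories in the space-separated PRUNEPATHS list.
# Falls back to the documented default list when PRUNEPATHS is unset
# or empty. Directory names containing spaces or glob characters are
# not handled by this sketch.
is_pruned() {
    path=$1
    prune=${PRUNEPATHS:-/tmp /usr/tmp /var/tmp /dev /proc /sys}
    for dir in $prune; do
        case $path in
            "$dir"|"$dir"/*) return 0 ;;
        esac
    done
    return 1
}
```

Matching on "$dir"/* rather than "$dir"* keeps a path such as /tmpfoo from being treated as part of /tmp.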
Doodle depends on libextractor. You can download libextractor from
extract(1), slocate(1), updatedb(1), libextractor(3), libdoodle(3)
libdoodle and doodle are released under the GPL.
Report bugs to mantis <http://gnunet.org/mantis/> or by sending
electronic mail to <firstname.lastname@example.org>
doodle was originally written by Christian Grothoff
You can obtain the original author’s latest version from