NAME
recoll.conf - main personal configuration file for Recoll
DESCRIPTION
This file defines the indexing configuration for the Recoll full-text
search system.
The system-wide configuration file is normally located inside
/usr/[local]/share/recoll/examples. Any parameter set in the common
file may be overridden by setting it in the personal configuration file
(by default $HOME/.recoll/recoll.conf).
Please note that while we try to keep this manual page reasonably up to
date, it will frequently lag behind the current state of the software.
The best source of information about the configuration is the comments
in the configuration file.
A short extract of the file might look as follows:
# Space-separated list of directories to index.
topdirs = ~/docs /usr/share/doc
[~/somedirectory-with-utf8-txt-files]
defaultcharset = utf-8
There are three kinds of lines:
· Comment or empty
· Parameter assignment
· Section definition
Empty lines or lines beginning with # are ignored.
Assignment lines are in the form ’name = value’.
Section lines consist of a directory name inside square brackets, as in
the extract above, and allow redefining a parameter for a directory
subtree. Some of the parameters used for indexing are looked up
hierarchically from the more to the less specific. Not all parameters
can be meaningfully redefined; this is specified for each in the next
section.
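As an illustration of the hierarchical lookup, the following sketch
sets a global value and overrides it for one subtree. The directory
name is hypothetical; defaultcharset is used because it is documented
below as redefinable for any subdirectory.

```conf
# Global value, used wherever no section applies.
defaultcharset = iso-8859-1
# Hypothetical subtree: files here and below use utf-8 instead.
[~/docs/utf8-notes]
defaultcharset = utf-8
```

With such a file, a document under ~/docs/utf8-notes/2005 would be
expected to pick up utf-8 from the nearest enclosing section, while
documents elsewhere fall back to the global value. Note that global
assignments are placed before the first section line.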
The tilde character (~) is expanded in file names to the name of the
user’s home directory.
Where values are lists, white space is used for separation, and
elements with embedded spaces can be quoted with double-quotes.
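A minimal sketch of a list value containing a quoted element follows.
The second path is hypothetical and only serves to show the quoting.

```conf
# The second element contains a space, so it is double-quoted.
topdirs = ~/docs "/home/me/My Documents" /usr/share/doc
```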
OPTIONS
topdirs = directories
Specifies the list of directories to index (recursively).
dbdir = directory
The name of the Xapian database directory. It will be created if
needed when the database is initialized. If this is not an
absolute pathname, it will be taken relative to the
configuration directory.
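For example (both values are illustrative, not recommendations):

```conf
# Relative name: resolved against the configuration directory.
dbdir = xapiandb
# Absolute alternative (hypothetical path), used as given:
# dbdir = /var/tmp/recoll-index
```

Assuming the default personal configuration location, the relative
form would presumably resolve to $HOME/.recoll/xapiandb.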
skippedNames = patterns
A space-separated list of patterns for names of files or
directories that should be completely ignored. The list defined
in the default file is:
*~ #* bin CVS Cache caughtspam tmp
The list can be redefined for subdirectories, but is only
actually changed for the top-level ones in topdirs.
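A sketch of a redefinition, assuming that setting the parameter
replaces the default list rather than extending it (so the defaults
are repeated). The two patterns at the end are illustrative additions.

```conf
# Default list plus two extra, illustrative patterns.
skippedNames = *~ #* bin CVS Cache caughtspam tmp *.o .git
```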
skippedPaths = patterns
A space-separated list of patterns for paths the indexer should
not descend into. Together with topdirs, this allows pruning the
indexed tree to one’s liking. daemSkippedPaths can be used to
define a specific value for the real time indexing monitor.
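The following sketch prunes two subtrees from a single top
directory. All paths are hypothetical.

```conf
topdirs = /home/me
# Do not descend into these (hypothetical) subtrees.
skippedPaths = /home/me/tmp /home/me/mail/spam*
# Separate value for the real time indexing monitor.
daemSkippedPaths = /home/me/tmp /home/me/downloads
```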
followLinks = boolean
Specifies if the indexer should follow symbolic links while
walking the file tree. The default is to ignore symbolic links
to avoid multiple indexing of linked files. No effort is made to
avoid duplication when this option is set to true. This option
can be set individually for each of the topdirs members by using
sections. It cannot be changed below the topdirs level.
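A possible per-directory setup is sketched below. It assumes that
boolean values are written as 0 and 1; the directory names are
hypothetical.

```conf
topdirs = ~/docs ~/projects
# Default behaviour, stated explicitly: ignore symbolic links.
followLinks = 0
# Follow links only under this topdirs member.
[~/projects]
followLinks = 1
```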
loglevel = value
Verbosity level for recoll and recollindex. A value of 4 lists
quite a lot of debug/information messages. 3 lists only errors.
daemloglevel can be used to specify a different value for the
real-time indexing daemon.
logfilename = file
Where the messages should go. ’stderr’ can be used as a special
value. daemlogfilename can be used to specify a different value
for the real-time indexing daemon.
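The logging parameters might be combined as follows. The log file
path is hypothetical.

```conf
# Errors only for recoll and recollindex, sent to stderr.
loglevel = 3
logfilename = stderr
# More detail for the real-time indexing daemon, in its own file.
daemloglevel = 4
daemlogfilename = /tmp/recollindex-daemon.log
```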
indexstemminglanguages = languages
A list of languages for which the stem expansion databases will
be built. See recollindex(1) for possible values.
defaultcharset = charset
The name of the character set used for files that do not contain
a character set definition (e.g. plain text files). This can be
redefined for any subdirectory.
maxfsoccuppc = percentnumber
Maximum file system usage above which indexing stops. The
value is a percentage, corresponding to what the "Capacity" df
output column shows. The default value is 0, meaning no
checking.
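For instance (the threshold is an arbitrary illustration):

```conf
# Stop indexing once df reports a capacity of 90% or more.
maxfsoccuppc = 90
```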
idxflushmb = megabytes
Threshold (in megabytes of new text data) at which the index is
flushed from memory to disk. Setting this can help control memory
usage. A value of 0 means no explicit flushing, letting Xapian
use its own default, which is flushing every 10000 documents
(memory usage depends on average document size). The default
value is 10.
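A sketch of the two modes described above. The value 50 is
arbitrary and chosen only for illustration.

```conf
# Flush to disk after about 50 megabytes of new text data.
idxflushmb = 50
# Alternative: leave flushing to Xapian (every 10000 documents).
# idxflushmb = 0
```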
filtersdir = directory
A directory to search for the external filter scripts used to
index some types of files. The value should not be changed,
except if you want to modify one of the default scripts. The
value can be redefined for any subdirectory.
iconsdir = directory
The name of the directory where recoll result list icons are
stored. You can change this if you want different images.
guesscharset = boolean
Try to guess the character set of files if no internal value is
available (e.g. for plain text files). This does not work well in
general, and should probably not be used.
usesystemfilecommand = boolean
Decides whether the file -i system command is used as a final
step for determining the mime type of a file (the main procedure
uses suffix associations as defined in the mimemap file). This
can be useful for files with suffixless names, but it will also
cause many bogus "text" files to be indexed.
indexedmimetypes = list
Recoll normally indexes any file which it knows how to read.
This list lets you restrict the indexed mime types to what you
specify. If the variable is unspecified or the list empty (the
default), all supported types are processed.
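As a sketch, restricting indexing to three common types could look
like this, assuming the list is white-space separated like the other
list values. The selection of types is illustrative.

```conf
# Index only plain text, HTML and PDF documents.
indexedmimetypes = text/plain text/html application/pdf
```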
compressedfilemaxkbs = value
Size limit for compressed (.gz or .bz2) files. These need to be
decompressed in a temporary directory for identification, which
can be very wasteful if ’uninteresting’ big compressed files are
present. Negative means no limit, 0 means no processing of any
compressed file. Defaults to -1.
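The three cases can be sketched as follows, assuming the value is
expressed in kilobytes, as the parameter name suggests. The figure
20000 is arbitrary.

```conf
# Do not process compressed files above roughly 20 MB.
compressedfilemaxkbs = 20000
# Alternative: process no compressed file at all.
# compressedfilemaxkbs = 0
# Alternative: the default, no limit.
# compressedfilemaxkbs = -1
```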
indexallfilenames = boolean
Recoll indexes file names into a special section of the database
to allow file name searches using wildcards. This parameter
decides if file name indexing is performed only for files with
mime types that would qualify them for full text indexing, or
for all files inside the selected subtrees, independent of mime
type.
idxabsmlen = value
Recoll stores an abstract for each indexed file inside the
database. The text can come from an actual ’abstract’ section in
the document or will just be the beginning of the document. It
is stored in the index so that it can be displayed inside the
result lists without decoding the original file. The idxabsmlen
parameter defines the size of the stored abstract. The default
value is 250 bytes. The search interface gives you the choice
to display this stored text or a synthetic abstract built by
extracting text around the search terms. If you always prefer
the synthetic abstract, you can reduce this value and save a
little space.
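For users who always display the synthetic abstract, a reduced
value might look like this (100 is an arbitrary illustration):

```conf
# Keep only a short stored abstract (the default is 250).
idxabsmlen = 100
```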
aspellLanguage = lang
Language definitions to use when creating the aspell dictionary.
The value must match a set of aspell language definition files.
You can type "aspell config" to see where these are installed
(look for data-dir). The default if the variable is not set is
to use your desktop national language environment to guess the
value.
noaspell = boolean
If this is set, the aspell dictionary generation is turned off.
Useful for cases where you don’t need the functionality or when
it is unusable because aspell crashes during dictionary
generation.
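The two aspell parameters might be used as sketched below. The
language value is only an example and must match definitions that
are actually installed; the boolean is assumed to be written as 1.

```conf
# Build the aspell dictionary from the English definitions.
aspellLanguage = en
# Alternative: disable dictionary generation altogether.
# noaspell = 1
```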
nocjk = boolean
If this is set to true, specific East Asian (Chinese, Japanese,
Korean) character/word splitting is turned off. This will save
a small amount of CPU time if you have no CJK documents. If your
document base does include such text but you are not interested
in searching it, setting nocjk may be a significant time and
space saver.
cjkngramlen = value
This lets you adjust the size of n-grams used for indexing CJK
text. The default value of 2 is probably appropriate in most
cases. A value of 3 would allow more precision and efficiency on
longer words, but the index will be approximately twice as
large.
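A sketch of the two CJK settings, assuming boolean values are
written as 0 and 1:

```conf
# Use 3-character n-grams for CJK text (the default is 2).
cjkngramlen = 3
# Alternative: skip CJK processing if such text is never searched.
# nocjk = 1
```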
SEE ALSO
recollindex(1) recoll(1)
8 January 2006