NAME
sa-learn-cyrus - Train SpamAssassin with spam/ham from users' IMAP
mailboxes
USAGE
sa-learn-cyrus [ options ] user-name(s)
user-name(s) One or more user/mailbox name(s).
options:
--help Prints a brief help message and exits.
-h
--man Prints the manual page and exits.
--verbose level Be verbose if level > 0
-v level
--config file Use a configuration file other than the default
-c file one.
--sa-debug Run sa-learn in debug mode.
-d
--simulate Run in simulation mode (show commands only).
-s
--imap-domains domains Search mailboxes in list of domains.
-D domains
DESCRIPTION
sa-learn-cyrus feeds spam and non-spam (ham) messages to SpamAssassin's
database. Its main purpose is to train SA's Bayes database with
spam/ham messages sorted by the mailbox owners into special subfolders.
It is intended to be used on small mail systems (e.g. home office) with
a single server-wide SA configuration.
Launching sa-learn-cyrus at regular intervals (cron job) may improve
SA's hit rate considerably, provided that the users are well instructed
about what to move to their ham/spam folders and what not.
FUNCTION
sa-learn-cyrus scans local mail spools as used by Cyrus IMAPd for
special subfolders. These subfolders are supposed to contain mails
which have been classified as spam or ham by the mailbox owners.
Example: The users move spam mails which have not been tagged as spam
by SpamAssassin (false negatives) to a subfolder INBOX.Learn.Spam.
Other mails, which SA might classify as spam in the future
because of certain characteristics, are copied to a subfolder
INBOX.Learn.Ham.
sa-learn-cyrus feeds the content of these spam/ham folders to SA's
Bayes database using the sa-learn tool which is shipped with the
SpamAssassin package.
Afterwards these mails are deleted (optionally) by means of ipurge
which is a helper tool coming along with the Cyrus IMAPd package.
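The scanning step can be illustrated with a minimal Python sketch. It is not taken from sa-learn-cyrus itself. It assumes that Cyrus stores each message of a folder as a file named with its UID followed by a dot ("1.", "2.", ...) next to its cyrus.* index files; the function name find_messages is hypothetical.

```python
import re
from pathlib import Path

# Assumption: message files are named "<uid>." (digits followed by a dot);
# everything else in the folder (cyrus.index, cyrus.header, ...) is skipped.
MESSAGE_FILE = re.compile(r"^\d+\.$")


def find_messages(folder_dir):
    """Return the message files of one spool folder, sorted by UID."""
    folder = Path(folder_dir)
    if not folder.is_dir():
        # A user without the special subfolder simply has nothing to learn.
        return []
    files = [p for p in folder.iterdir()
             if p.is_file() and MESSAGE_FILE.match(p.name)]
    # Strip the trailing dot before converting the UID to an integer.
    return sorted(files, key=lambda p: int(p.name[:-1]))
```

A folder that does not exist yields an empty list, so mailboxes without a Learn.Spam or Learn.Ham subfolder can be skipped without special handling.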
ARGUMENTS
sa-learn-cyrus optionally takes a list of mailbox/user names as
arguments:
sa-learn-cyrus fred wilma fritz hjb
If no names are supplied, all mailboxes found will be handled.
OPTIONS
All options supplied on the command line will override corresponding
parameters given in the configuration file.
Please note that the basic parameters of sa-learn-cyrus have to be
defined in a configuration file. sa-learn-cyrus cannot be controlled
solely by means of command line options.
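The precedence rule can be pictured as a simple overlay. The following Python sketch only illustrates the idea; the function name effective_settings and the dictionary keys are hypothetical and do not come from the sa-learn-cyrus source.

```python
def effective_settings(config, cli):
    """Overlay command-line values on configuration file values.

    Options that were not given on the command line are passed as None
    and leave the configured value untouched.
    """
    merged = dict(config)
    merged.update({key: value for key, value in cli.items()
                   if value is not None})
    return merged
```

For example, a configured "verbose" of 0 combined with "-v 2" on the command line gives 2, while a "simulate" setting that is not mentioned on the command line keeps its configured value.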
--config file, -c file
Use a configuration file other than the default one. Always adapt
the configuration file to your needs before using sa-learn-cyrus on
a live system. Otherwise you may lose data or corrupt your SA
database!
--verbose level, -v level
Specify level of verbosity. (Default = 0)
--sa-debug, -d
Run sa-learn in debug mode. This may be useful to examine problems
with sa-learn.
--simulate, -s
Run sa-learn-cyrus in simulation mode. This is useful for first
tests after initial configuration or if problems are encountered. In
simulation mode sa-learn-cyrus doesn't execute any system commands
nor does it touch any data. It just displays what it would do.
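A simulation mode of this kind is commonly built as a thin wrapper around command execution. The Python sketch below shows the pattern under that assumption; run_command is a hypothetical name and the output format is not the one sa-learn-cyrus prints.

```python
import shlex
import subprocess


def run_command(argv, simulate=False):
    """Run argv, or only print it when simulate is true.

    Returns the command's exit status, or None in simulation mode.
    """
    if simulate:
        # Nothing is executed; the command line is shown for inspection.
        print("would run:", shlex.join(argv))
        return None
    return subprocess.run(argv, check=False).returncode
```

Routing every external call through one such function keeps the "show commands only" guarantee in a single place.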
--imap-domains list-of-domains, -D list-of-domains
If your Cyrus installation uses "domain support", you may use
this option to specify which domains are to be searched.
--imap-domains example.com,another.org
is equivalent to
[imap]
...
domains = example.com another.org
...
in the configuration file.
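The two notations differ only in the separator: commas on the command line, spaces in the configuration file. A minimal Python sketch that normalizes both into the same list could look like this (split_domains is a hypothetical helper, not part of sa-learn-cyrus):

```python
def split_domains(value):
    """Split a domain list given with commas and/or whitespace."""
    return value.replace(",", " ").split()
```

Both "example.com,another.org" and "example.com another.org" then yield the same two-element list.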
CONFIGURATION
By default sa-learn-cyrus expects its configuration file at
/etc/spamassassin/sa-learn-cyrus.conf.
To change this default, the setting has to be changed in the code.
A file other than the default can always be chosen with the
"--config" option.
A sample configuration file is shipped with sa-learn-cyrus.
Format
The configuration file has a format as known from rsync or samba, which is
very similar to the format of Windows ini files. The file consists of a
sequence of sections. The beginning of each section is designated with a
section name, a word in square brackets, e.g. "[global]". The section
entries consist of parameters, which are key/value pairs, each on a
single line. Key and value are separated by an equal sign like
key = value
The value is a single word or a list of words each of them representing
a number or a string. Words may be surrounded by any number of spaces
for better readability. Empty lines and lines with a leading hash
character "#" are ignored.
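To make the format concrete, the following Python sketch reads such a file with the standard configparser module and splits every value into its list of words. This is a stand-in, not the parser sa-learn-cyrus uses: configparser accepts more than the format described here (for instance ";" comments and ":" as a separator) and lowercases key names. The sample text and the function name parse_config are illustrative.

```python
import configparser

SAMPLE = """\
# global settings
[global]
verbose = 1
simulate = yes

[mailbox]
include_list = fred wilma fritz hjb
"""


def parse_config(text):
    """Parse configuration text into {section: {key: [words]}}."""
    # interpolation=None keeps "%" and regular expressions untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {name: {key: value.split()
                   for key, value in parser[name].items()}
            for name in parser.sections()}


settings = parse_config(SAMPLE)
```

Here settings["mailbox"]["include_list"] is the four-element list of mailbox names, and a single-word value such as "verbose = 1" becomes a one-element list.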
Section [global]
The [global] section contains all global control parameters.
tmp_dir = temporary-directory
sa-learn-cyrus creates some temporary files during each run. This
is the directory where these files are created.
lock_file = full-path-to-lock-file
To avoid race conditions, sa-learn-cyrus uses a simple file locking
mechanism. Each new sa-learn-cyrus process looks for this file
before it really does anything. If this file exists, the process
exits with a warning, assuming that another sa-learn-cyrus process
is running.
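A lock file of this kind can be sketched in Python as follows. Note that the sketch swaps in a slightly different technique from the one described: instead of first looking for the file and then creating it, it creates the file with O_EXCL, which makes "check and create" a single step. Writing the process id into the file is an additional assumption, and the function names are hypothetical.

```python
import os


def acquire_lock(lock_file):
    """Create lock_file; return False if it already exists."""
    try:
        # O_EXCL makes the creation fail if the file is already there.
        fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        # Assumption: record the owner's PID to help with stale locks.
        handle.write(str(os.getpid()) + "\n")
    return True


def release_lock(lock_file):
    """Remove the lock file at the end of a run."""
    os.remove(lock_file)
```

With any such scheme, a run that is killed before it removes the file leaves a stale lock behind, which then has to be deleted by hand.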
verbose = level
The level of verbosity. Values range from 0 (low) to 3 (high). A
reasonable level to start with is 1.
simulate = yes|no
sa-learn-cyrus should be run in simulation mode ("simulate = yes")
after the first customization of the configuration to avoid loss of
data or corruption of SA's database in case of wrongly configured
parameters.
Section [mailbox]
Section [mailbox] contains all parameters to select the mailboxes, to
specify the special subfolders, and to define the actions to apply.
include_list = list-of-mailboxes
Only spam/ham mails of these mailboxes are fed to SpamAssassin's
database. If this list is empty, all mailboxes will be used.
"include_list" may be used instead of the list on the command line.
Example:
include_list = fred wilma fritz hjb
include_regexp = regular-expression
If include_list is empty, a regular expression given here is
applied to all mailbox names to select mailboxes. This parameter is
ignored if include_list is not empty.
Example: Include all mailboxes beginning with 'knf-'.
include_regexp = ^knf-
exclude_list = list-of-mailboxes
A list of mailboxes which will be excluded. If include_list is not
empty, this parameter is ignored.
exclude_regexp = regular-expression
Mailbox names which match this regular expression are excluded
from processing.
Example: Ignore all mailboxes ending with '.beie'
exclude_regexp = \.beie$
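The interplay of the four selection parameters can be summarized in a short Python sketch. It follows the rules stated above; whether exclude_regexp is also applied when include_list is set is not stated, so the sketch assumes that it is. The function name select_mailboxes is hypothetical, and Python's regular expressions may differ from the script's in complex patterns.

```python
import re


def select_mailboxes(names, include_list=(), include_regexp=None,
                     exclude_list=(), exclude_regexp=None):
    """Filter mailbox names with the include/exclude parameters."""
    if include_list:
        # include_regexp and exclude_list are ignored in this case.
        selected = [n for n in names if n in include_list]
    else:
        selected = list(names)
        if include_regexp:
            selected = [n for n in selected
                        if re.search(include_regexp, n)]
        selected = [n for n in selected if n not in exclude_list]
    # Assumption: exclude_regexp is applied in both cases.
    if exclude_regexp:
        selected = [n for n in selected
                    if not re.search(exclude_regexp, n)]
    return selected
```

With include_regexp = ^knf- and exclude_regexp = \.beie$, a mailbox named knf-a is kept while knf-b.beie and fred are dropped.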
spam_folder = folder-name
The name of the special subfolder in each mailbox which contains
spam. The name should be a complete folder path relative to the
root folder INBOX. The Cyrus nomenclature is applied (same as with
cyradm).
Example:
spam_folder = Learn.Spam
This is a subfolder in a folder tree like this:
INBOX
 +--Drafts
 +--Templates
 +--Sent
 +--Learn
 |   +--Ham
 |   +--Spam    <-- spam subfolder
 |
ham_folder = folder-name
The name of the special subfolder in each mailbox which contains
ham. (Same naming scheme as with "spam_folder", see above.)
remove_spam = yes|no
Are the spam messages in the "spam_folder" to be removed after
feeding them to the SA database or not?
remove_ham = yes|no
Are the ham messages in the "ham_folder" to be removed after
feeding them to the SA database or not?
Section [sa]
SpamAssassin (SA) configuration items.
site_config_path = path
Path to system-wide SA preferences.
Example:
site_config_path = /etc/spamassassin
prefs_file = file
Path of the system-wide SA configuration file.
Example:
prefs_file = /etc/spamassassin/local.cf
learn_cmd = path
Path to the sa-learn utility.
Example:
learn_cmd = /usr/bin/sa-learn
user = user-id
The user id SA runs with.
Example:
user = mail
group = group-id
The group id SA runs with.
Example:
group = mail
debug = yes|no
Run sa-learn in debug mode or not. "debug = yes" may be useful to
examine problems.
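How these parameters might translate into an sa-learn call is sketched below in Python. The options used (--spam, --ham, --siteconfigpath, --prefs-file, --debug) are documented sa-learn options, but which of them sa-learn-cyrus actually passes, and how it switches to the configured user and group, is not described here, so the mapping is an assumption. The function name and the folder path in the usage note are hypothetical.

```python
def build_learn_command(sa, kind, folder_dir):
    """Build an sa-learn argument list for one folder.

    sa is the [sa] section as a dict, kind is "spam" or "ham".
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    argv = [sa["learn_cmd"]]
    if sa.get("debug") == "yes":
        # Placed before another option, so that an optional debug
        # argument cannot swallow the folder path.
        argv.append("--debug")
    argv.append("--siteconfigpath=" + sa["site_config_path"])
    argv.append("--prefs-file=" + sa["prefs_file"])
    argv.append("--" + kind)
    argv.append(folder_dir)
    return argv
```

With the example values from this section and a made-up folder path such as /spool/f/user/fred/Learn/Spam, the result starts with /usr/bin/sa-learn and ends with --spam followed by that path.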
Section [imap]
The section [imap] contains the necessary configuration parameters to
locate and manage the (Cyrus) IMAPd spool files.
base_dir = dir
The base directory of the IMAP spool, below which the
mailboxes are located.
initial_letter = yes|no
If base_dir is divided into subdirectories named with the initial
letters of mailbox names, set "initial_letter = yes" (default);
otherwise choose no.
Examples for joe's mailbox:
<base_dir>/j/user/joe/ : initial_letter = yes
<base_dir>/user/joe/ : initial_letter = no
domains = list-of-domains
If your Cyrus spool uses a domain hierarchy, supply a list of domains.
If domain support is not used, leave this entry empty. The
"initial_letter" option (see above) is applied to domains, too.
Example for mailboxes fritz@bar.org and joe@foo.com :
The mail files within the Cyrus spool are located at
<base_dir>/domain/b/bar.org/f/fritz
<base_dir>/domain/f/foo.com/j/joe
List the domains as
domains = foo.com bar.org
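The path examples for "initial_letter" and "domains" can be combined into one small Python function. The sketch mirrors the example paths of this section literally; the case of domains with "initial_letter = no" is derived by analogy and is an assumption, so compare the result with the layout of your own spool. The function name mailbox_dir is hypothetical.

```python
from pathlib import PurePosixPath


def mailbox_dir(base_dir, user, domain=None, initial_letter=True):
    """Return the spool directory of a mailbox, following the examples."""
    path = PurePosixPath(base_dir)
    if domain:
        # <base_dir>/domain/b/bar.org/f/fritz
        path = path / "domain"
        if initial_letter:
            path = path / domain[0]
        path = path / domain
        if initial_letter:
            path = path / user[0]
    else:
        # <base_dir>/j/user/joe  or  <base_dir>/user/joe
        if initial_letter:
            path = path / user[0]
        path = path / "user"
    return path / user
```

For a base directory /spool, the mailbox joe resolves to /spool/j/user/joe, and fritz@bar.org to /spool/domain/b/bar.org/f/fritz.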
unixhierarchysep = yes|no
Choose "unixhierarchysep = yes" if Cyrus is configured to accept
usernames like 'hans.mueller.somedomain.tld'. Otherwise set
"unixhierarchysep = no".
purge_cmd = path-to-command
The path to the Cyrus ipurge utility for purging mail messages.
Example:
purge_cmd = /usr/sbin/ipurge
user = user
The user Cyrus-IMAPd runs as.
Example:
user = cyrus
FILES
/etc/spamassassin/sa-learn-cyrus.conf
SEE ALSO
sa-learn(1), spamassassin(1), Mail::SpamAssassin(3), imapd(8)
The current version of this script is available at
<http://www.pollux.franken.de/mail-server-tools/sa-learn-cyrus/>
PREREQUISITES
sa-learn (part of the SpamAssassin package), ipurge (part of Cyrus
IMAPd)
AUTHOR
Hans-Juergen Beie <hjb@pollux.franken.de>
COPYRIGHT AND LICENSE
Copyright 2004-2008 by Hans-Juergen Beie.
This program is free software; you can redistribute it and/or modify it
under the terms of the Artistic License 2.0
<http://foundation.perl.org/legal/licenses/artistic-2_0-plain.html> or
the GNU General Public License as published by the Free Software
Foundation; either version 2 of the license
<http://www.gnu.org/licenses/old-licenses/gpl-2.0.html>, or (at your
option) any later version.
DISCLAIMER
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
ACKNOWLEDGMENTS
Thanks to Robert Carnecky and Jan Hauke Rahm for testing and
suggestions for the implementation of the domain support.