NAME
dspam - DSPAM Anti-Spam Agent
SYNOPSIS
dspam [--mode=[teft|toe|tum|notrain|unlearn]] [--user user1
user2 ... userN] [--feature=[ch,no,wh,tb=N,sb]]
[--class=[spam|innocent]] [--source=[error|corpus|inoculation] ]
[--profile=[PROFILE] ] [--deliver=[spam,innocent] ] [--help ] [--process
] [--classify ] [--signature=[signature] ] [--stdout] [--debug]
[--daemon] [--client] [--rcpt-to] [--mail-from=[address] ]
[ delivery_arguments ]
DESCRIPTION
The DSPAM agent provides a direct interface to mail servers for
command-line spam filtering. The agent can masquerade as the mail
server’s local delivery agent and will process any email passed to it.
The agent will then call whatever delivery agent was specified at
compile time or quarantine/tag/drop messages identified as spam. The
DSPAM agent can function locally or as a proxy. It is also responsible
for processing classification errors so that DSPAM can learn from its
mistakes.
OPTIONS
--user user1 user2 ... userN
Specifies the destination users of the incoming message. In most
cases this is the local user on the system; however, some
implementations may call for virtual usernames, specific to
DSPAM, to be assigned. The agent processes an incoming message
once for each user specified. If the message is to be delivered,
the $u (or %u) placeholder in the delivery argument string is
replaced with the name of the user currently being processed.
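The per-user substitution can be sketched in Python as below. This is an
illustration only, not DSPAM's own code; the function interpolate_user and
the procmail-style delivery argument list are hypothetical.

```python
# Hypothetical sketch: expand the $u / %u placeholder once per --user.
def interpolate_user(delivery_args, user):
    """Return a copy of delivery_args with $u and %u replaced by user."""
    return [arg.replace("$u", user).replace("%u", user) for arg in delivery_args]


# Hypothetical delivery argument list; "%u" marks where the user name goes.
template = ["/usr/bin/procmail", "-d", "%u"]

for user in ["alice", "bob"]:
    # The message is processed once per user, so arguments are built per user.
    print(interpolate_user(template, user))
```

The template itself is left unchanged, so the same argument string can be
reused for every user named with --user.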
--mode=[teft|toe|tum|notrain|unlearn]
Configures the training mode to be used for this process,
overriding any defaults in dspam.conf:
teft : Train-Everything. Trains on all messages processed.
This is a very thorough training approach and should be
considered the standard training approach for most users. TEFT
may, however, prove too volatile on installations with extremely
high per-user traffic, or may not scale well on systems with
extremely large user bases. If TEFT proves ineffective, one of
the other modes is recommended.
toe : Train-on-Error. Trains only on a classification error,
once the user’s metadata has matured to 2500 innocent messages.
This training mode is much less resource intensive, as only
occasional metadata writes are necessary. It is also far less
volatile than the TEFT mode of training. One drawback, however,
is that TOE only learns when DSPAM has made a mistake, which
means the data is sometimes too static and unable to "ease
into" a different type of behavior.
tum : Train-until-Mature. This training mode is a hybrid
between the other two training modes and provides a great
balance between volatility and static metadata. TUM trains,
on a per-token basis, only those tokens that have had fewer than
25 "hits", unless an error is being retrained, in which case
all tokens are trained. This training mode provides a solid
core of stable tokens to keep accuracy consistent, but also
allows for dynamic adaptation to any new types of email behavior
a user might be experiencing.
notrain : No training. Do not train the user’s data, and do not
keep totals. This should only be used in cases where you want
to process mail for a particular user (based on a group, for
example), but don’t want the user to accumulate any learning
data.
unlearn : Unlearn original training. Use this if you wish to
unlearn a previously learned message. Be sure to specify
--source=error, and set --class to the classification under
which the message was originally learned. If not using
TrainPristine, this will require the original signature from
training.
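The token-selection rules stated above for teft, tum and notrain can be
sketched in Python as follows. This is a simplified illustration under
assumptions: token_hits is a hypothetical mapping of each token to its hit
count, toe and unlearn are not modelled, and DSPAM's real bookkeeping differs.

```python
# Simplified sketch of which tokens a training mode would update for one message.
# token_hits: hypothetical mapping of token -> hits recorded so far.
TUM_HIT_LIMIT = 25  # TUM trains a token only while it has fewer than 25 hits


def tokens_to_train(mode, token_hits, retraining_error=False):
    if mode == "notrain":
        return set()  # no learning data is written
    if mode == "teft":
        return set(token_hits)  # every token of every message is trained
    if mode == "tum":
        if retraining_error:
            return set(token_hits)  # retraining an error trains all tokens
        # Otherwise only "immature" tokens (fewer than 25 hits) are trained.
        return {tok for tok, hits in token_hits.items() if hits < TUM_HIT_LIMIT}
    raise ValueError(f"mode not modelled in this sketch: {mode}")


hits = {"viagra": 3, "meeting": 40, "invoice": 24}
print(sorted(tokens_to_train("tum", hits)))  # ['invoice', 'viagra']
```

The mature token "meeting" is left alone under tum, which is what keeps the
stable core of tokens described above from drifting.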
--feature=[chained,noise,tb=N,whitelist,sbph]
Specifies the features that should be activated for this filter
instance. The following features may be used individually or
combined using a comma as a delimiter:
chained : Chained Tokens (also known as bigrams). Chained
tokens combine adjacent tokens, presently with a window size of
2, to form token "chains". Chained tokens use additional
storage resources, but greatly improve accuracy. Recommended
as a default feature.
noise : Bayesian Noise Reduction (BNR). BNR takes effect once
the user’s data reaches 2500 innocent messages, and applies
advanced progressive noise logic to reduce Bayesian noise
(wordlist attacks) in spam. See http://bnr.nuclearelephant.com
for more information.
tb=N : Sets the training loop buffering level. Training loop
buffering is the amount of statistical dampening applied to
avoid false positives during the user’s training loop. The
training buffer sets the buffer sensitivity, and should be a
number from 0 (no buffering whatsoever) to 10 (heavy
buffering). The default is 5, half of what previous versions of
DSPAM used. To disable dampening during the training loop
entirely, set this to 0.
whitelist : Automatic whitelisting. DSPAM keeps track of the
entire "From:" line for each message received per user, and
automatically whitelists messages from senders with more than 20
innocent messages and zero spams. Once the user reports a spam
from a sender, automatic whitelisting is deactivated for that
sender. Since DSPAM uses the entire "From:" line, and not just
the sender’s email address, automatic whitelisting is a very
safe way to improve accuracy, especially during initial
training.
sbph : Sparse Binary Polynomial Hashing, Bill Yerazunis’
tokenizer method from CRM114. This changes the tokenizer only,
and works with the existing combination algorithms.
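Two of the features above, chained tokens and automatic whitelisting, can be
sketched in Python as follows. This is an illustration under assumptions, not
DSPAM's code: the "a+b" chain format, the choice to keep single tokens
alongside chains, and the AutoWhitelist class are hypothetical.

```python
from collections import Counter


# Sketch 1: chained tokens with a window size of 2 ("a+b" format is assumed).
def chained_tokens(tokens):
    """Return the single tokens followed by a chain for each adjacent pair."""
    chains = [f"{a}+{b}" for a, b in zip(tokens, tokens[1:])]
    return list(tokens) + chains


# Sketch 2: automatic whitelisting keyed on the entire "From:" line.
class AutoWhitelist:
    INNOCENT_THRESHOLD = 20  # whitelist only after MORE than 20 innocents

    def __init__(self):
        self.innocent = Counter()
        self.spam = Counter()

    def record(self, from_line, is_spam):
        """Count one message received from this exact From: line."""
        (self.spam if is_spam else self.innocent)[from_line] += 1

    def is_whitelisted(self, from_line):
        # A single reported spam disables whitelisting for this From: line.
        return (self.spam[from_line] == 0
                and self.innocent[from_line] > self.INNOCENT_THRESHOLD)


print(chained_tokens(["buy", "cheap", "meds"]))
# ['buy', 'cheap', 'meds', 'buy+cheap', 'cheap+meds']

wl = AutoWhitelist()
for _ in range(21):
    wl.record("Alice <alice@example.com>", is_spam=False)
print(wl.is_whitelisted("Alice <alice@example.com>"))  # True
print(wl.is_whitelisted("alice@example.com"))  # False: different From: line
```

Keying on the whole From: line means a spammer who forges only the address,
but not the display name, does not inherit the whitelist entry.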
--class=[spam|innocent]
Identifies the disposition (if any) of the message being
presented. This flag should be used when a misclassification has
occurred, when the user is corpus-feeding a message, or when an
inoculation is being presented. This flag should not be used for
standard processing. This flag must be used in conjunction with
the --source flag. Omitting this flag causes DSPAM to determine
the disposition of the message on its own (the standard
operating mode).
--source=[error|corpus|inoculation]
Where --class is used, the source of the classification must
also be provided. The source tells DSPAM how to learn the
message being presented:
error : The message being presented was a message previously
misclassified by DSPAM. When ’error’ is provided as a source,
DSPAM requires that the DSPAM signature be present in the
message, and will use the signature to recall the original
training metadata. If the signature is not present, the message
will be rejected. In this source mode, DSPAM will also
decrement the counts for the message’s original classification,
both for each token and in the user totals.
You should use error only when DSPAM has made an error in
classifying the message, and should present the message as
modified by DSPAM, including the DSPAM signature, when doing so.
corpus : The message being presented is from a mail corpus, and
should be trained as a new message, rather than re-trained based
on a signature. The message’s full headers and body will be
analyzed and the count for the specified class will be
incremented, without the opposite class being decremented.
You should use corpus only when feeding messages in from corpus.
inoculation : The message being presented is in pristine form,
and should be trained as an inoculation. Inoculations are a
more intense mode of training designed to cause DSPAM to train
the user’s metadata repeatedly on previously unknown tokens, in
an attempt to vaccinate the user from future messages similar to
the one being presented. You should use inoculation only on
honeypots and the like.
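The difference between the error and corpus sources can be sketched in Python
as below. This is a simplified illustration, not DSPAM's storage code: the
counts/totals layout is hypothetical, the tokens are assumed to be already
recovered (from the signature, in the error case), each token is counted once
per message, and inoculation is not modelled.

```python
# Simplified sketch of count updates for --source=error vs --source=corpus.
def apply_training(counts, totals, tokens, new_class, source):
    """counts: token -> {"spam": n, "innocent": n}; totals: class -> n."""
    if source not in ("error", "corpus"):
        raise ValueError(f"source not modelled in this sketch: {source}")
    old_class = "innocent" if new_class == "spam" else "spam"
    for tok in set(tokens):  # assumption: each token counted once per message
        c = counts.setdefault(tok, {"spam": 0, "innocent": 0})
        c[new_class] += 1
        if source == "error":
            # Undo the original (wrong) classification for this token.
            c[old_class] = max(0, c[old_class] - 1)
    totals[new_class] += 1
    if source == "error":
        totals[old_class] = max(0, totals[old_class] - 1)


counts = {"free": {"spam": 0, "innocent": 3}}
totals = {"spam": 0, "innocent": 10}
apply_training(counts, totals, ["free", "win"], "spam", "error")
print(counts["free"])  # {'spam': 1, 'innocent': 2}
print(totals)  # {'spam': 1, 'innocent': 9}
```

With source="corpus" the same call would leave the innocent counts at 3 and
10, since corpus training never decrements the opposite class.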
--profile=[PROFILE]
Specify a storage profile from dspam.conf. The storage profile
selected will be used for all database connectivity. See
dspam.conf for more information.
--deliver=[innocent,spam]
Tells DSPAM to deliver the message if its result falls within
the criteria specified. For example, --deliver=innocent will
cause DSPAM to only deliver the message if its classification
has been determined as innocent. Providing
--deliver=innocent,spam will cause DSPAM to deliver the message
regardless of its classification. This flag provides a
significant amount of flexibility for nonstandard
implementations.
--stdout
If the message is indeed deemed "deliverable" by the --deliver
flag, this flag will cause DSPAM to deliver the message to
stdout, rather than the configured delivery agent.
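The combined effect of --deliver and --stdout can be sketched in Python as
follows. The function delivery_target is hypothetical, and what DSPAM does
with a message that does not match --deliver (quarantine, tag or drop) is
deliberately not modelled; the sketch simply returns None.

```python
# Hypothetical sketch of the --deliver / --stdout decision for one result.
def delivery_target(deliver_arg, result, use_stdout=False):
    """Return "stdout", "delivery agent", or None if --deliver does not match."""
    allowed = {c.strip().lower() for c in deliver_arg.split(",") if c.strip()}
    if result.lower() not in allowed:
        return None  # not deliverable under the given --deliver criteria
    return "stdout" if use_stdout else "delivery agent"


print(delivery_target("innocent", "Spam"))  # None
print(delivery_target("innocent,spam", "Spam", use_stdout=True))  # stdout
```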
--process
Tells DSPAM to process the message. This is the default
behavior, and the flag is implied unless --classify is used.
--classify
Tells DSPAM to only classify the message, and not perform any
writes to the user’s data or attempt to deliver/quarantine the
message. The results of a classification are printed to stdout
in the following format:
X-DSPAM-Result: User; result="Spam"; probability=1.0000; confidence=0.80
NOTE : The output of the classification is specific to a user’s
own data, and does not include the output of any groups they
might be affiliated with, so it is entirely possible that the
message would be caught as spam by a group the user belongs to,
and appear as innocent in the output of a classification. To get
the classification for the group, use the group name as the
user instead of an individual.
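A script consuming --classify output might parse the result line as sketched
below in Python. The helper parse_dspam_result is hypothetical, and it assumes
the result is printed on a single line in the key=value format shown above;
unknown fields are ignored rather than rejected.

```python
import re

# Hypothetical helper that parses the X-DSPAM-Result line printed by --classify.
RESULT_RE = re.compile(r"X-DSPAM-Result:\s*([^;]+);(.*)")


def parse_dspam_result(line):
    m = RESULT_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not an X-DSPAM-Result line: {line!r}")
    fields = {}
    for part in m.group(2).split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip().strip('"')
    parsed = {"user": m.group(1).strip(), "result": fields.get("result")}
    for key in ("probability", "confidence"):
        if key in fields:
            parsed[key] = float(fields[key])  # numeric fields become floats
    return parsed


sample = 'X-DSPAM-Result: User; result="Spam"; probability=1.0000; confidence=0.80'
print(parse_dspam_result(sample))
# {'user': 'User', 'result': 'Spam', 'probability': 1.0, 'confidence': 0.8}
```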
--signature=[signature]
If only the signature is available for training, and not the
entire message, the --signature flag may be used to feed the
signature into DSPAM and forgo reading stdin. DSPAM will
process the signature with whatever command-line classification
was specified. NOTE: This should only be used with
--source=error.
--debug
If DSPAM was compiled with --enable-debug then using --debug
will turn on debugging messages to /tmp/dspam.debug.
--daemon
If DSPAM was compiled with --enable-daemon then using --daemon
will cause DSPAM to enter daemon mode, where it will listen for
DSPAM clients to connect and actively service requests.
--client
If DSPAM was compiled with --enable-daemon then using --client
will cause DSPAM to act as a client and attempt to connect to
the DSPAM server specified in the client’s configuration within
dspam.conf. If client behavior is desired, this option must be
specified; otherwise the agent simply operates as a self-contained
process and handles the message on its own, eliminating any
benefit of using the daemon.
--rcpt-to
If DSPAM is configured to deliver via LMTP or SMTP, this
flag may be used to define the RCPT TOs which will be used for
the delivery of each user specified with --user. If no
recipients are provided, each RCPT TO defaults to the
corresponding username.
NOTE: The recipient list should always be balanced with the user
list, or empty. Specifying an unbalanced number of recipients to
users will result in undefined behavior.
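The pairing of --user names with --rcpt-to recipients can be sketched in
Python as follows. The function pair_recipients is hypothetical; because
DSPAM's behavior for an unbalanced list is undefined, the sketch chooses to
reject that case explicitly.

```python
# Hypothetical sketch of pairing --user names with --rcpt-to addresses.
def pair_recipients(users, rcpt_to=None):
    """Return (user, RCPT TO) pairs; recipients default to the user names."""
    if not rcpt_to:
        return [(user, user) for user in users]
    if len(rcpt_to) != len(users):
        # DSPAM leaves this case undefined; the sketch refuses it instead.
        raise ValueError("--rcpt-to must list one recipient per --user, or none")
    return list(zip(users, rcpt_to))


print(pair_recipients(["alice", "bob"]))
print(pair_recipients(["alice"], ["alice@example.com"]))
```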
--mail-from=[address]
If DSPAM is configured to deliver via LMTP or SMTP, this
flag sets the MAIL FROM sent on delivery of the message. The
default MAIL FROM depends on how the message was originally
relayed to DSPAM. If it was relayed via the command line, an
empty MAIL FROM will be used. If it was relayed via LMTP, the
original MAIL FROM will be used.
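The MAIL FROM selection rules above can be sketched in Python as follows. The
function envelope_sender and its parameters are hypothetical; an empty string
stands for the empty (null) MAIL FROM, and relay paths other than the command
line and LMTP are not modelled.

```python
# Hypothetical sketch of choosing the MAIL FROM used on LMTP/SMTP delivery.
def envelope_sender(mail_from=None, relayed_via="commandline", original_mail_from=None):
    """Return the MAIL FROM address; "" stands for an empty sender."""
    if mail_from is not None:
        return mail_from  # --mail-from overrides the default
    if relayed_via == "lmtp":
        return original_mail_from  # keep the MAIL FROM DSPAM received
    if relayed_via == "commandline":
        return ""  # messages fed on the command line get an empty MAIL FROM
    raise ValueError(f"relay path not modelled in this sketch: {relayed_via}")
```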
EXIT VALUE
0 Operation was successful.
other Operation resulted in an error. If the error involved an error
in calling the delivery agent, the exit value of the delivery
agent will be returned.
AUTHORS
Jonathan A. Zdziarski
For more information, see http://dspam.nuclearelephant.com.
SEE ALSO
dspam_stats(1), dspam_train(1), dspam_clean(1), dspam_dump(1),
dspam_merge(1)