dspam - DSPAM Anti-Spam Agent

NAME

       dspam - DSPAM Anti-Spam Agent

SYNOPSIS

       dspam [--mode=[teft|toe|tum|notrain|unlearn]] [--user user1
       user2 ... userN] [--feature=[ch,no,wh,tb=N,sb]]
       [--class=[spam|innocent]] [--source=[error|corpus|inoculation] ]
       [--profile=[PROFILE] ] [--deliver=[spam,innocent] ] [--help ] [--process
       ] [--classify ] [--signature=[signature] ] [--stdout] [--debug]
       [--daemon] [--client] [--rcpt-to] [--mail-from=[address] ]
       [ delivery_arguments ]

DESCRIPTION

       The  DSPAM  agent  provides  a  direct  interface  to  mail servers for
       command-line spam filtering. The  agent  can  masquerade  as  the  mail
       server’s  local delivery agent and will process any email passed to it.
       The agent will then call  whatever  delivery  agent  was  specified  at
       compile  time  or  quarantine/tag/drop messages identified as spam. The
       DSPAM agent can function locally or as a proxy. It is also  responsible
       for  processing  classification errors so that DSPAM can learn from its
       mistakes.

OPTIONS

       --user user1 user2 ... userN
              Specifies the destination users of the incoming message. In most
              cases  this  is  the  local  user  on  the  system, however some
              implementations may call  for  virtual  usernames,  specific  to
              DSPAM,  to be assigned.  The agent processes an incoming message
              once for each user specified. If the message is to be delivered,
              the  $u  (or  %u)  parameters  of  the  argument  string will be
              interpolated for the current user being processed.
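
              The per-user interpolation described above can be sketched in
              Python as follows. This is only an illustration of the
              substitution, not DSPAM's own code; the function names and the
              sample delivery arguments ("-d", "%u") are hypothetical.

```python
def interpolate_user(delivery_args, user):
    """Return a copy of delivery_args with $u and %u replaced by user."""
    return [arg.replace("$u", user).replace("%u", user) for arg in delivery_args]


def per_user_delivery_args(delivery_args, users):
    """Build one interpolated argument list per destination user."""
    return {user: interpolate_user(delivery_args, user) for user in users}


if __name__ == "__main__":
    # Hypothetical delivery arguments; "-d" is only an example option.
    for user, args in per_user_delivery_args(["-d", "%u"], ["alice", "bob"]).items():
        print(user, args)
```

              Because the message is processed once per user, each user
              gets an independent copy of the argument string.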

       --mode=[toe|tum|teft|notrain|unlearn]
              Configures the training  mode  to  be  used  for  this  process,
              overriding any defaults in dspam.conf:

              teft  :  Train-Everything.   Trains  on  all messages processed.
              This  is  a  very  thorough  training  approach  and  should  be
              considered  the standard training approach for most users.  TEFT
              may, however, prove too volatile on installations with extremely
              high  per-user  traffic,  or  prove not very scalable on systems
              with extremely large user-bases.  In  the  event  that  TEFT  is
              proving ineffective, one of the other modes is recommended.

              toe  :  Train-on-Error.   Trains only on a classification error,
              once the user’s metadata has matured to 2500 innocent  messages.
              This  training  mode  is  much  less resource intensive, as only
              occasional metadata writes are necessary.  It is also  far  less
              volatile than the TEFT mode of training.  One drawback, however,
              is that TOE only learns when DSPAM has made a  mistake  -  which
              means  the  data  is  sometimes  too static, and unable to "ease
              into" a different type of behavior.

              tum : Train-until-Mature.  This training mode is a hybrid of
              the other two training modes and balances volatility against
              static metadata.  TuM trains on a per-token basis, training
              only tokens that have had fewer than 25 "hits", unless an
              error is being retrained, in which case all tokens are
              trained.  This training mode provides a solid core of stable
              tokens to keep accuracy consistent, but also allows for
              dynamic adaptation to any new types of email behavior a user
              might be experiencing.

              notrain : No training.  Do not train the user’s data, and do not
              keep  totals.   This should only be used in cases where you want
              to process mail for a particular user (based  on  a  group,  for
              example),  but  don’t  want  the user to accumulate any learning
              data.

              unlearn : Unlearn original training. Use this if you wish to
              unlearn a previously learned message. Be sure to specify
              --source=error and set --class to the classification under
              which the message was originally learned. If not using
              TrainPristine, this will require the original signature from
              training.
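
              The per-token rule given for tum can be illustrated with a
              short Python sketch. It assumes a simple mapping from token to
              hit count; the function name and data layout are hypothetical
              and do not reflect DSPAM's internal storage.

```python
TUM_HIT_LIMIT = 25  # "fewer than 25 hits", per the tum description


def tum_tokens_to_train(token_hits, retraining_error):
    """Select which tokens TuM would train for one message.

    token_hits maps each token in the message to its current hit count.
    """
    if retraining_error:
        # An error is being retrained, so every token is trained.
        return set(token_hits)
    # Otherwise only tokens that have not yet matured are trained.
    return {tok for tok, hits in token_hits.items() if hits < TUM_HIT_LIMIT}


if __name__ == "__main__":
    hits = {"meeting": 140, "refinance": 3, "unsubscribe": 25}
    print(sorted(tum_tokens_to_train(hits, retraining_error=False)))
    print(sorted(tum_tokens_to_train(hits, retraining_error=True)))
```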

       --feature=[chained,noise,tb=N,whitelist,sbph]
              Specifies  the features that should be activated for this filter
              instance.  The following features may be  used  individually  or
              combined using a comma as a delimiter:

              chained  :  Chained  Tokens  (also  known  as biGrams).  Chained
              Tokens combines adjacent tokens, presently with a window size of
              2,  to  form  token  "chains".   Chained  tokens uses additional
              storage resources, but greatly improves  accuracy.   Recommended
              as a default feature.

              noise   :   Bayesian  Noise  Reduction  (BNR).   Bayesian  Noise
              Reduction kicks in at 2500 innocent  messages  and  provides  an
              advanced  progressive  noise  logic  to  reduce  Bayesian  Noise
              (wordlist attacks) in spams.  See http://bnr.nuclearelephant.com
              for more information.

              tb=N  :   Sets the training loop buffering level.  Training loop
              buffering is the amount of  statistical  sedation  performed  to
              water  down  statistics  and  avoid  false  positives during the
              user’s training loop.   The  training  buffer  sets  the  buffer
              sensitivity, and should be a number from 0 (no buffering
              whatsoever) to 10 (heavy buffering).  The default is 5, half  of
              what  previous  versions  of  DSPAM used.  To avoid dulling down
              statistics at all during the training loop, set this to 0.

              whitelist :  Automatic whitelisting.  DSPAM will keep  track  of
              the  entire "From:" line for each message received per user, and
              automatically whitelist messages from senders with more than  20
              innocent  messages and zero spams.  Once the user reports a spam
              from the sender, automatic whitelisting  will  automatically  be
              deactivated  for  that  sender.   Since  DSPAM  uses  the entire
              "From:" line, and not just the sender’s email address, automatic
              whitelisting  is  a  very  safe  approach  to improving accuracy
              especially during initial training.

              sbph  :   Sparse  Binary  Polynomial  Hashing.  Bill  Yerazunis’
              tokenizer method from CRM114. Tokenizer method only - works with
              existing combination algorithms.
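
              The automatic whitelisting rule described above (more than 20
              innocent messages and zero spams, keyed on the entire "From:"
              line) might be modelled as in the following Python sketch. The
              class and method names are hypothetical, and counts are kept
              in memory purely for illustration.

```python
from collections import defaultdict

WHITELIST_THRESHOLD = 20  # "more than 20 innocent messages"


class FromLineTracker:
    """Track per-sender counts keyed on the full From: line."""

    def __init__(self):
        self._counts = defaultdict(lambda: {"innocent": 0, "spam": 0})

    def record(self, from_line, classification):
        if classification not in ("innocent", "spam"):
            raise ValueError("classification must be 'innocent' or 'spam'")
        self._counts[from_line][classification] += 1

    def is_whitelisted(self, from_line):
        counts = self._counts.get(from_line)
        if counts is None:
            return False
        # A single reported spam disables whitelisting for this sender.
        return counts["innocent"] > WHITELIST_THRESHOLD and counts["spam"] == 0
```

              Keying on the whole header line means that a forged address
              with a different display name is treated as a different
              sender.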

       --class=[spam|innocent]
              Identifies  the  disposition  (if  any)  of  the  message  being
              presented. This flag should be used when a misclassification has
              occurred, when the user is corpus-feeding a message, or when an
              inoculation is being presented. This flag should not be used for
              standard processing. This flag must be used in conjunction  with
              the  --source flag. Omitting this flag causes DSPAM to determine
              the  disposition  of  the  message  on  its  own  (the  standard
              operating mode).

       --source=[error|corpus|inoculation]
              Where  --class  is  used,  the source of the classification must
              also be provided. The  source  tells  dspam  how  to  learn  the
              message being presented:

              error  :  The  message  being presented was a message previously
              misclassified by DSPAM.  When ’error’ is provided as  a  source,
              DSPAM  requires  that  the  DSPAM  signature  be  present in the
              message, and will use  the  signature  to  recall  the  original
              training metadata.  If the signature is not present, the message
              will  be  rejected.   In  this  source  mode,  DSPAM  will  also
              decrement  each  token’s previous classification’s count as well
              as the user totals.

              You should use error only  when  DSPAM  has  made  an  error  in
              classifying the message, and should present the modified version
              of the message with the DSPAM signature when doing so.

              corpus : The message being presented is from a mail corpus,  and
              should be trained as a new message, rather than re-trained based
              on a signature.  The message’s full headers  and  body  will  be
              analyzed  and  the  correct  classification will be incremented,
              without its opposite being decremented.

              You should use corpus only when feeding messages in from a
              corpus.

              inoculation  :  The message being presented is in pristine form,
              and should be trained as an  inoculation.   Inoculations  are  a
              more  intense  mode of training designed to cause DSPAM to train
              the user’s metadata repeatedly on previously unknown tokens, in
              an attempt to vaccinate the user from future messages similar to
              the one being presented.  You should  use  inoculation  only  on
              honeypots and the like.
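
              The difference between the error and corpus sources can be
              sketched in Python as below. This is a simplified model under
              stated assumptions: token counts live in two in-memory
              counters, user totals and inoculation are omitted, and
              clamping a count at zero is a choice made for this example
              rather than documented DSPAM behavior.

```python
from collections import Counter

OPPOSITE = {"spam": "innocent", "innocent": "spam"}


def train(counts, tokens, cls, source):
    """Update per-class token counts for one message.

    counts is {"spam": Counter, "innocent": Counter}.
    """
    if cls not in OPPOSITE:
        raise ValueError("cls must be 'spam' or 'innocent'")
    if source not in ("error", "corpus"):
        raise ValueError("this sketch only models 'error' and 'corpus'")
    previous = OPPOSITE[cls]
    for tok in tokens:
        counts[cls][tok] += 1
        if source == "error":
            # Retraining an error also undoes the earlier, wrong count.
            counts[previous][tok] = max(0, counts[previous][tok] - 1)


if __name__ == "__main__":
    counts = {"spam": Counter(), "innocent": Counter({"offer": 1})}
    train(counts, ["offer"], "spam", "error")
    print(counts)
```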

       --profile=[PROFILE]
              Specify  a  storage profile from dspam.conf. The storage profile
              selected  will  be  used  for  all  database  connectivity.  See
              dspam.conf for more information.

       --deliver=[innocent,spam]
              Tells  DSPAM  to  deliver the message if its result falls within
              the criteria specified.  For  example,  --deliver=innocent  will
              cause  DSPAM  to  only deliver the message if its classification
              has     been     determined     as      innocent.      Providing
              --deliver=innocent,spam  will cause DSPAM to deliver the message
              regardless  of  its  classification.  This   flag   provides   a
              significant    amount    of    flexibility    for    nonstandard
              implementations.

       --stdout
              If the message is indeed deemed "deliverable" by  the  --deliver
              flag,  this  flag  will  cause  DSPAM  to deliver the message to
              stdout, rather than the configured delivery agent.

       --process
              Tells  DSPAM  to  process  the  message.  This  is  the  default
              behavior, and the flag is implied unless --classify is used.

       --classify
              Tells  DSPAM  to  only classify the message, and not perform any
              writes to the user’s data or attempt to  deliver/quarantine  the
              message.  The  results of a classification are printed to stdout
              in the following format:

              X-DSPAM-Result:   User;    result="Spam";    probability=1.0000;
              confidence=0.80

              NOTE  : The output of the classification is specific to a user’s
              own data, and does not include the output  of  any  groups  they
              might  be  affiliated  with, so it is entirely possible that the
              message would be caught as spam by a group the user belongs  to,
              and appear as innocent in the output of a classification. To get
              the classification for the group, use the group name as the
              user instead of an individual.
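
              A calling script may want to read the result line shown
              above. The following Python sketch parses it, assuming the
              output has exactly the field layout of the example; the
              function name is hypothetical, and whitespace is normalized
              first in case the line is wrapped.

```python
def parse_dspam_result(line):
    """Parse an X-DSPAM-Result line into a dictionary."""
    line = " ".join(line.split())  # collapse wrapping and extra spaces
    header, _, rest = line.partition(":")
    if header.strip() != "X-DSPAM-Result":
        raise ValueError("not an X-DSPAM-Result line")
    fields = [field.strip() for field in rest.split(";")]
    attrs = {}
    for field in fields[1:]:
        if not field:
            continue
        key, _, value = field.partition("=")
        attrs[key.strip()] = value.strip().strip('"')
    try:
        return {
            "user": fields[0],
            "result": attrs["result"],
            "probability": float(attrs["probability"]),
            "confidence": float(attrs["confidence"]),
        }
    except KeyError as missing:
        raise ValueError(f"missing field: {missing}") from None


if __name__ == "__main__":
    sample = ('X-DSPAM-Result: User; result="Spam"; probability=1.0000; '
              'confidence=0.80')
    print(parse_dspam_result(sample))
```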

       --signature=[signature]
              If  only  the  signature  is available for training, and not the
              entire message, the --signature flag may be  used  to  feed  the
              signature into DSPAM and forego the reading of stdin. DSPAM will
              process the signature with whatever  commandline  classification
              was specified. NOTE: This should only be used with
              --source=error.
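
              As an illustration, a wrapper script could assemble the
              arguments for retraining an error as in the Python sketch
              below. Only flags documented on this page are used; the
              function name and the sample signature value are
              hypothetical.

```python
def build_retrain_argv(user, correct_class, signature=None):
    """Build an argument list for retraining a misclassified message.

    If signature is None, the full message is expected on stdin instead.
    """
    if correct_class not in ("spam", "innocent"):
        raise ValueError("correct_class must be 'spam' or 'innocent'")
    argv = ["dspam", "--user", user,
            f"--class={correct_class}", "--source=error"]
    if signature is not None:
        argv.append(f"--signature={signature}")
    return argv


if __name__ == "__main__":
    # "SIG-EXAMPLE" is a placeholder, not a real signature format.
    print(build_retrain_argv("alice", "spam", "SIG-EXAMPLE"))
```

              The resulting list can be passed to a process launcher such
              as Python's subprocess.run, with the message supplied on
              stdin when no signature is given.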

       --debug
              If DSPAM was compiled with  --enable-debug  then  using  --debug
              will turn on debugging messages to /tmp/dspam.debug.

       --daemon
              If  DSPAM  was compiled with --enable-daemon then using --daemon
              will cause DSPAM to enter daemon mode, where it will listen  for
              DSPAM clients to connect and actively service requests.

       --client
              If  DSPAM  was compiled with --enable-daemon then using --client
              will cause DSPAM to act as a client and attempt  to  connect  to
              the  DSPAM server specified in the client’s configuration within
              dspam.conf. If client behavior is desired, this option must be
              specified; otherwise the agent simply operates as a
              self-contained process and handles the message on its own,
              eliminating any benefit of using the daemon.

       --rcpt-to
              If  DSPAM  will  be configured to deliver via LMTP or SMTP, this
              flag may be used to define the RCPT TOs which will be  used  for
              the   delivery  of  each  user  specified  with  --user.  If  no
              recipients are provided, the RCPT TOs will match  the  username.
              NOTE: The recipient list should always be balanced with the user
              list, or empty. Specifying an unbalanced number of recipients to
              users will result in undefined behavior.
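
              Since an unbalanced list leads to undefined behavior, a
              calling script may wish to check the lists before invoking
              dspam. The Python sketch below does so; pairing users and
              recipients by position is an assumption of this example, and
              the function name is hypothetical.

```python
def pair_recipients(users, rcpt_to=()):
    """Pair each user with a RCPT TO address, or fall back to the username."""
    users = list(users)
    rcpt_to = list(rcpt_to)
    if not rcpt_to:
        # No recipients given: the RCPT TOs match the usernames.
        return [(user, user) for user in users]
    if len(rcpt_to) != len(users):
        raise ValueError("recipient list must be empty or match the user list")
    return list(zip(users, rcpt_to))


if __name__ == "__main__":
    print(pair_recipients(["alice", "bob"]))
    print(pair_recipients(["alice"], ["alice@example.org"]))
```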

       --mail-from=[address]
              If DSPAM will be configured to deliver via LMTP or SMTP, this
              flag will set the MAIL FROM sent on delivery of the message. The
              default  MAIL  FROM  depends  on  how the message was originally
              relayed to DSPAM. If it was  relayed  via  the  commandline,  an
              empty  MAIL  FROM  will be used. If it was relayed via LMTP, the
              original MAIL FROM will be used.

EXIT VALUE

       0      Operation was successful.
       other  Operation resulted in an error. If the error occurred while
              calling the delivery agent, the exit value of the delivery
              agent will be returned.

AUTHORS

       Jonathan A. Zdziarski

       For more information, see http://dspam.nuclearelephant.com.
