small and fast Bayesian spam filter
Quick Spam Filter (QSF) is an Open Source email classification filter,
designed to be small, fast, and accurate, which works to classify incoming
email as either spam or non-spam.
To recognise spam, QSF strips the text out of the email (using MIME decoding
and HTML stripping) and then splits it into tokens (words, word pairs, URLs,
and so on). These tokens are then looked up in a database and analysed using
the Bayesian technique to see whether the email should be classified as spam
or not.
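The pipeline described above (strip HTML, split into tokens, look up and
combine probabilities) can be sketched roughly as follows. This is a minimal
illustration in Python, not QSF's actual C implementation: the function
names, the token regular expression, the 0.4 default for unknown tokens, and
the plain naive Bayes combination are all assumptions made for the example.
MIME decoding is left out.

```python
import math
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and discards tags (a simple form of HTML stripping)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(html):
    """Return the text content of an HTML fragment, tags removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def tokenise(text):
    """Return lower-cased single words followed by adjacent word pairs."""
    words = re.findall(r"[a-z0-9$'-]+", text.lower())
    pairs = [a + " " + b for a, b in zip(words, words[1:])]
    return words + pairs


def spam_probability(tokens, token_probs, unknown=0.4):
    """Combine per-token spam probabilities with the naive Bayes rule.

    token_probs maps token -> P(spam | token). Tokens missing from the
    mapping get the assumed default `unknown`. The products are computed
    in log space to avoid floating-point underflow.
    """
    log_spam = 0.0
    log_ham = 0.0
    for tok in set(tokens):
        p = min(max(token_probs.get(tok, unknown), 0.01), 0.99)
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    diff = log_ham - log_spam
    if diff > 700:   # exp() would overflow: overwhelmingly non-spam
        return 0.0
    if diff < -700:  # overwhelmingly spam
        return 1.0
    # Equivalent to prod(p) / (prod(p) + prod(1 - p)).
    return 1.0 / (1.0 + math.exp(diff))
```

A message would then be treated as spam when the combined probability
exceeds some chosen threshold, for example 0.9 (again an assumed value).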
The database is generated by a process of training: QSF is given two
mailboxes, one containing known spam and the other containing known
non-spam, to train itself on. After training, if QSF misfiles any email,
the message it got wrong can be fed back into the database so that QSF
learns from its mistakes.
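As a rough sketch of what such training could look like, the example below
keeps per-token counts for each class and derives a spam probability from
them. The class name TokenDatabase, its methods, and the add-one smoothing
are hypothetical choices for illustration; QSF's real database format and
weighting are described in its manual and may differ.

```python
from collections import Counter


class TokenDatabase:
    """In-memory token counts for spam and non-spam training messages."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.spam_messages = 0
        self.ham_messages = 0

    def train(self, tokens, is_spam):
        """Record one message; each distinct token counts once per message."""
        if is_spam:
            self.spam_counts.update(set(tokens))
            self.spam_messages += 1
        else:
            self.ham_counts.update(set(tokens))
            self.ham_messages += 1

    def token_probability(self, token):
        """Estimate P(spam | token), using add-one smoothing (an assumption)."""
        spam_rate = (self.spam_counts[token] + 1) / (self.spam_messages + 2)
        ham_rate = (self.ham_counts[token] + 1) / (self.ham_messages + 2)
        return spam_rate / (spam_rate + ham_rate)


# Initial training from two "mailboxes" of already-tokenised messages.
db = TokenDatabase()
for message in [["cheap", "pills"], ["win", "cash"]]:
    db.train(message, is_spam=True)
for message in [["meeting", "notes"], ["lunch", "friday"]]:
    db.train(message, is_spam=False)

# A spam message that was misfiled as non-spam is fed back with the
# correct label, which shifts the counts for its tokens.
db.train(["meeting", "pills"], is_spam=True)
```

Feeding a misfiled message back is just another call to train() with the
correct label, so the same code path serves both initial training and later
corrections.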
For a more in-depth look at the way in which QSF tokenises and classifies
messages, please see the Technical Details section of the manual.
QSF is designed to be run by an MDA, such as procmail.
QSF's targets are speed, accuracy, and simplicity. So:
* It is small and is written in C so it starts up quickly, unlike filters
written in Perl.
* It understands MIME and HTML, so it can intelligently deal with modern
spam, unlike older Bayesian filters such as ifile.
* It runs as an inline filter rather than as a daemon, so it is simple to
install and use.
* It is written to do only one job - decide whether an email is spam or not
using the content of the message alone - so it is less complex than filters
such as SpamAssassin. Less complexity means bugs and security problems are
less likely.
* As well as words and word pairs, QSF also spots special patterns in email
such as runs of gibberish, HTML comments embedded in text, and other common
spam giveaways, and its flexible tokeniser allows more patterns to be added
as spammers change their tactics.
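To illustrate the idea of an extensible pattern tokeniser, the sketch below
emits a marker token whenever a special pattern is found in the raw message
text. The pattern names and regular expressions are invented for this
example and are not taken from QSF; a real filter would need more careful
rules (the gibberish rule here would also flag a word such as "rhythms").

```python
import re

# Hypothetical pattern table: marker token -> compiled regular expression.
# Supporting a new spam trick means adding one more entry.
SPECIAL_PATTERNS = {
    # An HTML comment splitting a word, e.g. "Vi<!-- x -->agra".
    "*html-comment-in-word*": re.compile(r"\w<!--.*?-->\w", re.DOTALL),
    # A whole word made of six or more consonants, e.g. "xkqzvbnm".
    "*gibberish-run*": re.compile(r"\b[b-df-hj-np-tv-z]{6,}\b", re.IGNORECASE),
}


def special_tokens(raw_text):
    """Return the marker tokens whose patterns occur in raw_text."""
    return [
        name
        for name, pattern in SPECIAL_PATTERNS.items()
        if pattern.search(raw_text)
    ]
```

The marker tokens would be added to the ordinary word and word-pair tokens,
so they are trained on and scored in exactly the same way.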
Homepage: http://www.ivarch.com/programs/qsf/
