Protecting Mailman against spam with bogofilter

Tired of manually cleaning trapped spam out of the Mailman queue? You can easily insert the bogofilter spam filter into the mail flow. Here is how to set it up.

Prerequisites

Naturally, you need a working installation of Mailman (the GNU Mailing List Manager) and an installation of bogofilter.

Setting up bogofilter: training the filter

After adjusting '/etc/bogofilter.cf' (little needs to change), we first train the filter. This requires two "mbox" style mail files, one containing good messages (ham) and one containing bad messages (spam). The Mutt e-mail client is handy for selecting messages and saving them to these files. We create them as '/var/spool/mail/bogotrain-as-spam' and '/var/spool/mail/bogotrain-as-good'.
  1. Train on spam:
    bogofilter -s -v < /var/spool/mail/bogotrain-as-spam
  2. Train on ham (non-spam):
    bogofilter -n -v < /var/spool/mail/bogotrain-as-good
Next, set the permissions of '/var/spool/bogofilter/' so that bogofilter can update its wordlist there automatically. Later, all filtered spam will arrive in '/var/spool/mail/spam.bogofilter' (see below).
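A minimal sketch of the permission step, assuming the wordlist lives in '/var/spool/bogofilter/' and that your MTA runs the procmail pipe from '/etc/aliases' as user and group 'mail'. That user is an assumption: check which account your MTA uses for alias pipes and adjust. The helper name prepare_bogodir is hypothetical. Because the training above was probably run as root, files it created (such as the wordlist database) may be owned by root, so ownership is reset recursively.

```shell
#!/bin/sh
# prepare_bogodir: hypothetical helper, not part of bogofilter.
# Makes the wordlist directory writable by the user that delivers list mail.
# Usage: prepare_bogodir DIR USER GROUP
prepare_bogodir() {
    dir=$1
    owner=$2
    group=$3
    mkdir -p "$dir" || return 1
    # Training as root may have left root-owned files behind; hand them over
    chown -R "$owner:$group" "$dir" || return 1
    # Owner and group may read and write; setgid keeps new files in the group
    chmod 2770 "$dir"
}
```

As root, something like 'prepare_bogodir /var/spool/bogofilter mail mail' would then apply it (again, 'mail' is only an assumed delivery user).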

Procmail definitions for bogofilter

We use procmail to pass incoming mail through bogofilter. Get the bogofilterrc file and store it as:
/etc/mail/procmail/bogofilterrc
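The bogofilterrc file itself is not reproduced on this page. To give a rough idea of what such a recipe file can contain, here is a hypothetical version, written out by a small shell function (write_bogofilterrc, a made-up name). It is a sketch, not the author's file. It uses bogofilter's passthrough mode ('-p', which adds an 'X-Bogosity:' header), '-u' to update the wordlist with each classification, and '-e' so that bogofilter exits 0 for spam and ham alike and procmail's 'w' flag only catches real errors. The spam folder and the Mailman wrapper path are taken from this page; the error handling via EXITCODE=75 (temporary failure, so the MTA retries) is an assumption you may want to change.

```shell
#!/bin/sh
# write_bogofilterrc: hypothetical helper that writes an EXAMPLE procmail
# recipe file; it is a sketch, not the original bogofilterrc.
# Usage: write_bogofilterrc /etc/mail/procmail/bogofilterrc
write_bogofilterrc() {
    # The quoted 'EOF' keeps $MAILMAN literal, so procmail expands it at delivery
    cat > "$1" <<'EOF'
SHELL=/bin/sh
PATH=/bin:/usr/bin
# MAILMAN (the list name) is set on the procmail command line in /etc/aliases

# Classify: -p passes the message through with an X-Bogosity: header,
# -u updates the wordlist, -e exits 0 for both spam and ham
:0fw
| /usr/bin/bogofilter -p -u -e

# If bogofilter itself failed, return a temporary failure to the MTA
:0e
{ EXITCODE=75 HOST }

# Spam goes to the quarantine mbox (with a local lock file)
:0:
* ^X-Bogosity: Spam
/var/spool/mail/spam.bogofilter

# Everything else is posted to the Mailman list
:0
| /usr/lib/mailman/mail/mailman post $MAILMAN
EOF
}
```

Since procmail runs with '-m' here, the final unconditional recipe matters: mail that falls off the end of the file would not be delivered anywhere.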

Modifying the Mailman definitions

The next step is to pass incoming list mail to bogofilter before handing it over to Mailman. Only one line per list changes in '/etc/aliases' (we assume a working Mailman installation here):
## grass-dev mailing list
#old:
#grass-dev:              "|/usr/lib/mailman/mail/mailman post grass-dev"
#new:
grass-dev:              "|/usr/bin/procmail -m MAILMAN=grass-dev /etc/mail/procmail/bogofilterrc"
grass-dev-admin:        "|/usr/lib/mailman/mail/mailman admin grass-dev"
grass-dev-bounces:      "|/usr/lib/mailman/mail/mailman bounces grass-dev"
grass-dev-confirm:      "|/usr/lib/mailman/mail/mailman confirm grass-dev"
grass-dev-join:         "|/usr/lib/mailman/mail/mailman join grass-dev"
grass-dev-leave:        "|/usr/lib/mailman/mail/mailman leave grass-dev"
grass-dev-owner:        "|/usr/lib/mailman/mail/mailman owner grass-dev"
grass-dev-request:      "|/usr/lib/mailman/mail/mailman request grass-dev"
grass-dev-subscribe:    "|/usr/lib/mailman/mail/mailman subscribe grass-dev"
grass-dev-unsubscribe:  "|/usr/lib/mailman/mail/mailman unsubscribe grass-dev"
For other lists, replace the list name in the alias and in the 'MAILMAN=xxxx' parameter accordingly, and do this for every list you run. Easy, isn't it?
After editing, don't forget to run the following as root so that the changes take effect:
newaliases
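When several lists need the change, the alias block can be generated instead of edited by hand. The sketch below (mailman_aliases is a hypothetical name) prints the same entries as the grass-dev example for any list name, using the paths from this page, with only the posting address routed through procmail. Column alignment is not reproduced, which '/etc/aliases' does not need.

```shell
#!/bin/sh
# mailman_aliases: hypothetical helper printing the /etc/aliases entries for
# one Mailman list, with the posting address routed through procmail/bogofilter.
# Usage: mailman_aliases LISTNAME
mailman_aliases() {
    list=$1
    wrapper=/usr/lib/mailman/mail/mailman
    # The posting address is filtered by bogofilter first
    printf '%s: "|/usr/bin/procmail -m MAILMAN=%s /etc/mail/procmail/bogofilterrc"\n' "$list" "$list"
    # All other addresses go to Mailman directly, exactly as before
    for action in admin bounces confirm join leave owner request subscribe unsubscribe; do
        printf '%s-%s: "|%s %s %s"\n' "$list" "$action" "$wrapper" "$action" "$list"
    done
}
```

For example, after removing a list's old entries, 'mailman_aliases mylist >> /etc/aliases' followed by newaliases would install the filtered version ('mylist' being a placeholder).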

Defining a training cronjob for continuous learning

At the beginning you will notice that some messages are not yet classified correctly. For the GRASS mailing lists it took less than three days until classification worked almost perfectly, so there is no need to worry.

We simply define an overnight cronjob to re-train bogofilter from the spam/ham collection. Save this as '/usr/bin/bogolearn.sh':

#!/bin/sh
#
# TRAIN bogofilter CRONJOB
# save this as /usr/bin/bogolearn.sh
# train bogofilter with new spam and non-spam (ham)

PATH=/bin:/usr/bin
BOGOFILTER="/usr/bin/bogofilter"
MAILDIR="/var/spool/mail"
SPAMTRAINFILE="bogotrain-as-spam"
NOSPAMTRAINFILE="bogotrain-as-good"

# abort if the mail spool directory cannot be entered
cd "$MAILDIR" || exit 1
# register collected spam (-s) and ham (-n); -v makes bogofilter report what it did
"$BOGOFILTER" -s -v < "$SPAMTRAINFILE"
"$BOGOFILTER" -n -v < "$NOSPAMTRAINFILE"
### end
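One thing to keep in mind: the script above re-registers the complete contents of both files every night, so messages that stay in them are counted again each time and gradually gain more weight. If you prefer to learn each message only once, a variant could empty a file after bogofilter has registered it successfully. The sketch below does this with a hypothetical helper, train_and_empty; it assumes nobody is saving messages to the files while it runs, since it does not lock them.

```shell
#!/bin/sh
# train_and_empty: hypothetical helper, an alternative to the plain
# registration lines in bogolearn.sh.
# Usage: train_and_empty MBOXFILE FLAG   (FLAG is -s for spam, -n for ham)
train_and_empty() {
    file=$1
    flag=$2
    bogofilter=${BOGOFILTER:-/usr/bin/bogofilter}
    # Nothing new to learn if the file is missing or empty
    [ -s "$file" ] || return 0
    if "$bogofilter" "$flag" -v < "$file"; then
        # Registration succeeded: truncate (keeping owner and mode)
        # so these messages are not counted again next night
        : > "$file"
    else
        echo "bogolearn: training on $file failed, keeping it" >&2
        return 1
    fi
}
```

In bogolearn.sh, the two registration lines would then become 'train_and_empty /var/spool/mail/bogotrain-as-spam -s' and 'train_and_empty /var/spool/mail/bogotrain-as-good -n'.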
Install it:
chmod a+x /usr/bin/bogolearn.sh
As root, define the following cronjob:
crontab -e
## insert:
#train bogolearn every morning at 3:30.
30 3 * * * sh /usr/bin/bogolearn.sh
Verify the job list:
crontab -l
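If you prefer not to edit the crontab interactively, the entry can also be added from a script. The filter below (add_cron_line, a hypothetical name) copies an existing crontab from standard input and appends the job only if an identical line is not already present, so running it twice does no harm.

```shell
#!/bin/sh
# add_cron_line: hypothetical filter; reads a crontab on stdin and writes it
# to stdout with LINE appended, unless an identical line already exists.
# Usage: crontab -l 2>/dev/null | add_cron_line 'LINE' | crontab -
add_cron_line() {
    line=$1
    existing=$(cat)
    # Reproduce the current entries unchanged
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    # Append the job only if no line matches it exactly (-x whole line, -F fixed string)
    printf '%s\n' "$existing" | grep -qxF -- "$line" || printf '%s\n' "$line"
}
```

For this page's job that would be: crontab -l 2>/dev/null | add_cron_line '30 3 * * * sh /usr/bin/bogolearn.sh' | crontab -
Afterwards, verify the result with crontab -l as before.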
From now on, simply save misclassified messages to '/var/spool/mail/bogotrain-as-spam' or '/var/spool/mail/bogotrain-as-good', respectively; bogofilter will learn from them during the next nightly run.

Watch it work

Don't forget to check this mail folder from time to time:
mutt -f /var/spool/mail/spam.bogofilter
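To see at a glance how much has been quarantined, you can count the messages in the spam folder. The sketch below (count_mbox_messages is a hypothetical name) relies on the mbox convention that each message starts with a line beginning with 'From ' (body lines starting that way are normally escaped as '>From '), so the result is a close approximation rather than an exact parse.

```shell
#!/bin/sh
# count_mbox_messages: hypothetical helper counting messages in an mbox file.
# Usage: count_mbox_messages /var/spool/mail/spam.bogofilter
count_mbox_messages() {
    # A missing folder simply means nothing has been quarantined yet
    if [ ! -f "$1" ]; then
        echo 0
        return 0
    fi
    # Each message begins with a "From " separator line;
    # grep -c prints 0 but exits 1 when nothing matches, hence || true
    grep -c '^From ' "$1" || true
}
```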
Save misclassified messages to '/var/spool/mail/bogotrain-as-spam' or '/var/spool/mail/bogotrain-as-good', respectively.

Enjoy!


© 2007 Markus Neteler (neteler AT itc.it)
Last change: 2007-01-15 18:17:33 +0100