This document describes how to combine fetchmail, Exim 3.x and the DCC to filter out spam on a Debian Linux system with a small number of users, such as a home Linux box. The same method will probably work for any other fetchmail/Exim based system, too, although the paths to files may differ.
This system works for me. Test it first before relying on it. I provide this page in the hope that it will be useful, but if the system eats all your email and spits out the pieces, don't come running to me.
If you are running a large email system, you should run your own DCC server and investigate using the DCC as a sendmail milter. Hopefully someone will use the similar features in Exim 4 to incorporate the DCC into that at some point.
My email configuration uses fetchmail to download email using POP3. fetchmail then talks to Exim, the local mail server, and Exim delivers mail to user accounts. In this document, I'm assuming you already have a working fetchmail/Exim configuration. If you don't, stop reading now and get one. Make sure it works before continuing.
To deal with the spam, I decided to use the Distributed Checksum Clearinghouse or DCC. The DCC works by keeping track of the number of copies of a particular message which are flowing through various mail servers on the internet. A message which has been seen many times is either spam or a popular mailing list. The DCC requires you to white-list your legitimate mailing lists.
It seems to me that this is the most elegant method of filtering spam, since it relies on the one distinguishing feature of spam, namely unsolicited copies of the same thing lots of times (if it was solicited, you'd have whitelisted it when you subscribed, right?). As well as a straight checksum on the message body, the DCC uses "fuzzy" checksums to count how many times a message has been seen, so trivial variations on the same message are counted together.
Extract the dccproc source using uncompress and tar. You should configure the makefile to install under /usr/local/dcc/ to avoid clashing with the packaging system, by cd'ing to the directory created by the tar file and typing:
./configure --prefix=/usr/local/dcc
Once that's done, it's just make and make install, as usual. You will need to be root for the install step.
If you have a firewall of some description, you will need to open port 6277 for UDP packets. The client will attempt to contact one of the public DCC servers by default.
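As a rough illustration, on a firewall managed with iptables and a default-drop policy, rules along the following lines would let the client's queries out and the servers' replies back in. Using iptables is an assumption (your firewall may be something else entirely), and the function name allow_dcc_client is made up for this sketch.

```shell
#!/bin/sh
# allow_dcc_client: permit DCC client queries and their replies.
# Sketch only: assumes an iptables firewall; must be run as root.
allow_dcc_client() {
    # Outgoing queries to DCC servers on UDP port 6277.
    iptables -A OUTPUT -p udp --dport 6277 -j ACCEPT
    # Replies coming back from that port.
    iptables -A INPUT -p udp --sport 6277 -j ACCEPT
}
```

Call allow_dcc_client once from whatever script sets up your firewall. A stricter setup would limit the INPUT rule to replies from servers you have actually queried.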
Test dccproc by piping it an email. The email needs to be the raw mail as found in your mailbox, without any MIME processing. The output should be a copy of the input with a header added. The header will look a bit like this:
X-DCC-wanadoo-be-Metrics: verence 1016; Body=1 Fuz1=1 Fuz2=1
The header shows the "brand" of DCC server (in this case, wanadoo.be's server) and the counts for various checksums (the straight body checksum and the two fuzzy checksums). These counts are how many times the DCC server has seen messages like the one you're looking at.
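When testing, it can be handy to pull just the brand and counts out of dccproc's output. The helper below is a minimal sketch and not part of the DCC: the name dcc_counts is made up, and it assumes the header fits on a single line as in the example above.

```shell
#!/bin/sh
# dcc_counts: read a raw message on stdin and print the DCC brand and
# each checksum count found in its X-DCC-*-Metrics header.
# Hypothetical helper; assumes the header is not folded across lines.
dcc_counts() {
    # Stop at the blank line that ends the headers; for a metrics header,
    # keep the brand and everything after the semicolon.
    sed -n -e '/^$/q' \
           -e 's/^X-DCC-\(.*\)-Metrics: *[^;]*; *\(.*\)$/\1 \2/p' |
    while read -r brand counts; do
        echo "brand: $brand"
        # Each remaining word is a NAME=COUNT pair.
        for pair in $counts; do
            echo "  ${pair%%=*}: ${pair#*=}"
        done
    done
}
```

For example, assuming a saved raw message in a file called message.txt (a placeholder name), /usr/local/bin/dccproc < message.txt | dcc_counts would print the brand followed by one line each for Body, Fuz1 and Fuz2.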
To your existing fetchmail configuration (usually in the file .fetchmailrc in the directory of the user who runs fetchmail), add the line:
mda "/usr/sbin/exim -oi -oee -oMr fetchmail -f '<%F>' '<%T>'"
For example, my .fetchmailrc looks like:
poll pop3.demon.co.uk with proto sdps
    no dns
    localdomains verence.demon.co.uk
    user "verence" there with password "password" is * here
    mda "/usr/sbin/exim -oi -oee -oMr fetchmail -f '<%F>' '<%T>'"
    options fetchall
Note: Don't use this .fetchmailrc if you're not using Demon Internet. It won't work, as it specifies the SDPS protocol and disables any DNS lookups.
What this does is cause fetchmail to deliver mail by calling Exim from the command line and piping the mail to it, rather than using SMTP. Doing this lets us use the "-oMr" option which allows us to specify the protocol used to receive the mail (search the spec for "-oMr"). We set this protocol value to "fetchmail". As we'll see below, this means that we can then tell Exim to use the DCC to check only messages coming in via fetchmail, rather than local mail and outbound messages. The DCC should really only be used on non-local mail: there's no point cluttering up the system with checksums from internal mail, and you don't want to accidentally filter common automatic messages, say.
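To see what fetchmail ends up running, you can feed a saved message to Exim by hand with the same arguments. This is only a sketch: the function name deliver_as_fetchmail and the EXIM override variable are made up for illustration, and the two arguments stand in for the values fetchmail substitutes for %F and %T.

```shell
#!/bin/sh
# deliver_as_fetchmail SENDER RECIPIENT < raw-message
# Hypothetical helper: runs the same command line as the mda option,
# with SENDER and RECIPIENT in place of fetchmail's %F and %T.
# EXIM may be overridden; it defaults to the path used in this document.
deliver_as_fetchmail() {
    "${EXIM:-/usr/sbin/exim}" -oi -oee -oMr fetchmail -f "<$1>" "<$2>"
}
```

Something like deliver_as_fetchmail someone@example.org paul@localhost < message.txt (placeholder addresses and file name) should then inject the message with the received protocol set to "fetchmail". Exim only honours -oMr when it is called by a trusted user, so run this as the user who normally runs fetchmail.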
I'm assuming you've got a working Exim configuration based on the supplied template.
You need to alter the exim.conf file, /etc/exim/exim.conf. In the main configuration section, add or modify the trusted_users option so it includes the user who runs fetchmail. On my system "paul" runs fetchmail, so:
trusted_users = mail:paul
In the transports section, add the following transport. It doesn't matter where in that section you add it.
# This transport passes messages to the DCC process to see whether
# they are spam.
dcc:
driver = pipe
command = "/usr/sbin/exim -oMr dcc -bS"
transport_filter = "/usr/local/bin/dccproc -f $sender_address -w \
/usr/local/dcc/whiteclnt -A -t $recipients_count"
user = mail
group = mail
log_output
bsmtp = all
prefix =
When this transport is used, it filters the mail through the dccproc
program and then runs exim again to pass the mail back into the system
using batch SMTP. Now we need to use this transport on mail from
fetchmail. So, at the top of the directors configuration section (the
order does matter here as the directors are checked in order), add:
# This checksums incoming mail using the DCC
checksum:
driver = smartuser
transport = dcc
condition = "${if eq {$received_protocol}{fetchmail}{1}{0}}"
user = mail
This causes mail received with the "fetchmail" protocol to be fed to the "dcc" transport we just created. As we specified "-oMr fetchmail" in the arguments fetchmail uses to call Exim, Exim will use this director and the "dcc" transport to pass incoming mail through dccproc.
Test all this by sending yourself an email which will be delivered to your external POP3 account. Run fetchmail once, manually. You should end up with an email in your inbox with headers which look a bit like this:
Received: from mail by verence.demon.co.uk with dcc (Exim 3.36 #1 (Debian))
id 181SY1-00009F-00
for ...
Received: from paul by verence.demon.co.uk with fetchmail (Exim 3.36 #1
(Debian))
id 181SY0-00009A-00
for ...
Received: from pop3.demon.co.uk
by localhost with POP3 (fetchmail-5.9.11)
for ...
15 Oct 2002 15:19:48 +0100 (BST)
...
X-DCC-wanadoo-be-Metrics: verence 1016; Body=1 Fuz1=1 Fuz2=1
If this doesn't work, you'll need to figure out what's gone wrong. Good luck! The logs from exim in /var/log/exim/mainlog should help.
If you're like me, you probably use Exim's .forward file filtering language to sort mail from mailing lists into separate mailboxes. You can also use it to delete mail which looks like spam, or save it to a separate folder.
Here's an example section from a .forward file:
# Exim filter
# ... Deliver your mail from mailing lists into boxes BEFORE this test ...
if $message_headers matches "(?m)^X-DCC-.*-Metrics:(.*(?:\n\\\\s+.*)*)" then
if $1 contains "many" or ${extract{Body}{$1}{$value}{0}} is above 15 or
${extract{Fuz1}{$1}{$value}{0}} is above 15 or
${extract{Fuz2}{$1}{$value}{0}} is above 15 then
save $home/mail/spam
seen finish
endif
endif
Note: you must follow the instruction about delivering mail from mailing lists before doing this check: if you don't do this, you may find you start classifying your mailing lists as spam.
This looks for an X-DCC header, allowing for headers which have continuation lines. It pulls out the Body=, Fuz1= and Fuz2= parts and checks them against thresholds (after checking to see if any counts are "many", the special value which people use to indicate messages they consider to be spam). We cope with X-DCC headers which do not contain a particular checksum count (because the message is too short) by giving a default value of 0 for counts which are not there.
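To try out thresholds on saved mail without going through Exim, the same decision can be approximated in the shell. This is a sketch of the logic described above, not a replacement for the filter: the function name is_dcc_spam is made up, and the default threshold of 15 is simply the value used in the example filter.

```shell
#!/bin/sh
# is_dcc_spam [THRESHOLD] < raw-message
# Exit status 0 if the first X-DCC-*-Metrics header looks like bulk mail.
# Hypothetical helper mirroring the filter logic; default threshold 15.
is_dcc_spam() {
    awk -v limit="${1:-15}" '
        /^$/ { exit }                                  # end of headers
        /^[ \t]/ { if (in_dcc) hdr = hdr $0; next }    # continuation line
        {
            # Only the first metrics header is collected.
            in_dcc = (hdr == "" && $0 ~ /^X-DCC-.*-Metrics:/)
            if (in_dcc) { hdr = $0; sub(/^[^:]*:/, "", hdr) }
        }
        END {
            spam = (hdr ~ /many/)          # someone reported it as spam
            n = split(hdr, words, /[ \t]+/)
            for (i = 1; i <= n; i++)
                if (words[i] ~ /^(Body|Fuz1|Fuz2)=[0-9]+$/) {
                    split(words[i], kv, "=")
                    if (kv[2] + 0 > limit + 0) spam = 1
                }
            # A count that is absent never trips the test, which has the
            # same effect as defaulting it to 0.
            exit (spam ? 0 : 1)
        }
    '
}
```

For instance, is_dcc_spam 35 < message.txt && echo spam (with message.txt a placeholder for a saved raw message) would print "spam" only when a count exceeds 35 or is "many".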
Spam is delivered to ~/mail/spam. You probably want to change that. If you just want to delete spam, remove the save $home/mail/spam line. However, it's a bad idea to delete the spam mails immediately after you start using this system. First, you should look at the spam folder to see whether you're getting false positives (for example, mailing lists you'd forgotten about). You may also want to adjust the thresholds (the example above uses 15; pick whatever works for your mail).
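You can also report spam to the DCC yourself, for example mail sent to addresses which only ever receive spam. The easiest way to do this is to set up a .forward file for those spam addresses which pipes mail to dccproc and then discards it. Here's an example I use, which receives mail sent to various addresses which often get spam: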
# Exim filter
# spamtrap, report to DCC and razor
pipe "/usr/local/bin/dccproc -t many -o /dev/null"
pipe "/usr/bin/razor-report"
save $home/mail/spam
seen finish
This also reports to Vipul's Razor. I don't use Razor for filtering because I think the DCC is better (I don't trust Razor's trust mechanism), but there's no harm in giving them a hand. If you only have the DCC installed, remove the line containing the razor-report command.