How not to spend your time reading spam

There is a confusing multitude of spam filters out there. I once wrote an article listing all the ways of filtering spam I could think of. If you’re confused by all this, here’s what I do, along with ways of doing the same thing on both Unix and Windows systems.

<lj-cut> My first line of defence is a bunch of blacklists. These don’t work on the From address of the spam, which is usually forged, but rather on the IP address of the machine sending the email. There are a multitude of blacklists available, too. They differ in their listing criteria from narrow listings of machines which have sent spam, to broad listings of entire networks, intended to help you boycott ISPs which support spam. Getting legitimate email is more important to me than filtering all the spam, so I choose narrowly focussed blacklists. I use:

  • The Spamhaus Blocklist, a manually edited list of the worst corners of the Internet. These days, spammers tend to host their websites in these places and exploit other people’s machines to actually send their spam. Which is why I also use…
  • The Spamhaus Exploits Blocklist, an automatically compiled list of machines which have been taken over by spammers, probably without their owners’ knowledge. Windows users with cable modems, usually.
  • The Open Relay Database, another list of machines which are exploitable in a different way (mostly not a way which is used by spammers these days, but it occasionally catches something).

If you want to filter your email using these blacklists, and you’re on Windows, you could try Spampal. It is completely free and very stable. It will work for you if you collect your mail using something like Thunderbird or Outlook Express (but don’t use OE unless you want to become one of the aforementioned exploited Windows owners). It works by sitting between your mail server and your mail program and marking suspect mail as it goes by. You then configure a filtering rule in your mail program to move the suspect mail into a separate folder. If you pare down the blacklists Spampal uses to just those listed above, it shouldn’t slow your mail downloads too much.

If you’re on Unix and you run your own mail server, receiving mail directly from the Internet, that server will probably have support for using these blacklists. If you pull mail from elsewhere, using fetchmail, say, so that your mail server doesn’t see the IP address of the machine which originated the mail, there’s a little Perl script called rblfilter which will help. It doesn’t seem to be maintained anymore, so I’ve put a copy here. You’ll need to work out how to tie it into your email system and edit the script according to the instructions in the comments.

The next line of defence is the Distributed Checksum Clearinghouse. The DCC works by sharing information about how many other copies of a particular email are floating around the Internet. If there are a lot of copies, it’s either something like a mailing list, or it’s spam. To use the DCC, you tell it where you expect to get legitimate bulk email from. Everything else you get which is bulk is therefore spam. The DCC is designed for Unix, so the web pages and Google will tell you how to get it set up there. There is a plugin for Spampal which will also let Windows people use the DCC. It’s beta software, that is, released to the public for testing, so it may contain some bugs: I’ve no idea how stable it is (despite getting a credit on that page, I didn’t actually write it).

If someone else manages your email for you, and you read it via a web interface, for example, then you should have a look a the spam filtering options you have available. I’ve just noticed that Pobox.com, who provide a forwarding address for me, now let people configure their service to reject mail based on those blacklists.

Fight the pink menace!

2 Comments on "How not to spend your time reading spam"


  1. I’ve found Bayesian filtering to be incredibly effective.

    SpamBayes is what I’ve been using, and it blocks ~98% of spam for me, I haven’t had any ham counted as spam for a long long time either. There is a really good outlook plugin or you can use it as a imap/pop3 proxy or a filter for procmail.

    Also surprised you didn’t mention SpamAssassin.

    Reply

    1. Well, both of those are probably good, but the point of the article was to tell Windows and Unix people how to do what I do, as I know that works reasonably well. Occassionally I get a spate of stuff which the blacklists haven’t caught up with yet, so I may add SpamBayes eventually.

      Reply

Leave a Reply

Your email address will not be published. Required fields are marked *