There is a confusing multitude of spam filters out there. I once wrote an article listing all the ways of filtering spam I could think of. If you’re confused by all this, here’s what I do, along with ways of doing the same thing on both Unix and Windows systems.

My first line of defence is a bunch of blacklists. These don’t work on the From address of the spam, which is usually forged, but rather on the IP address of the machine sending the email. There are a multitude of blacklists available, too. They differ in their listing criteria from narrow listings of machines which have sent spam, to broad listings of entire networks, intended to help you boycott ISPs which support spam. Getting legitimate email is more important to me than filtering all the spam, so I choose narrowly focussed blacklists. I use:

  • The Spamhaus Blocklist, a manually edited list of the worst corners of the Internet. These days, spammers tend to host their websites in these places and exploit other people’s machines to actually send their spam. Which is why I also use…
  • The Spamhaus Exploits Blocklist, an automatically compiled list of machines which have been taken over by spammers, probably without their owners’ knowledge. Windows users with cable modems, usually.
  • The Open Relay Database, another list of machines which are exploitable in a different way (mostly not a way which is used by spammers these days, but it occasionally catches something).
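
For the curious, a blacklist of this kind is normally queried over DNS: you reverse the octets of the sending machine's IP address, stick the list's zone name on the end, and look the result up. If an address comes back, the machine is listed. Here is a minimal sketch of that in Python. The zone names and the helper names (`dnsbl_query_name`, `is_listed`) are my own assumptions for illustration, so check each list's own documentation before relying on them.

```python
import ipaddress
import socket

# Zone names are assumptions for illustration; check each list's own docs.
ZONES = ["sbl.spamhaus.org", "xbl.spamhaus.org"]


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up: reversed octets, then the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """True if the list returns an address record for this IP."""
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        # No such name (or the lookup failed): treat the IP as not listed.
        return False
```

Calling `is_listed(ip, zone)` for each zone in `ZONES` does one DNS lookup per list, which is why paring the lists down keeps things quick. The `resolve` argument is only there so the lookup can be swapped out when trying the code without a network.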

If you want to filter your email using these blacklists, and you’re on Windows, you could try Spampal. It is completely free and very stable. It will work for you if you collect your mail using something like Thunderbird or Outlook Express (but don’t use OE unless you want to become one of the aforementioned exploited Windows owners). It works by sitting between your mail server and your mail program and marking suspect mail as it goes by. You then configure a filtering rule in your mail program to move the suspect mail into a separate folder. If you pare down the blacklists Spampal uses to just those listed above, it shouldn’t slow your mail downloads too much.

If you’re on Unix and you run your own mail server, receiving mail directly from the Internet, that server will probably have support for using these blacklists. If you pull mail from elsewhere, using fetchmail, say, so that your mail server doesn’t see the IP address of the machine which originated the mail, there’s a little Perl script called rblfilter which will help. It doesn’t seem to be maintained anymore, so I’ve put a copy here. You’ll need to work out how to tie it into your email system and edit the script according to the instructions in the comments.
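
If you pull mail from elsewhere, the address you want is the one recorded in the Received header written by the server that accepted the mail from the outside world. The following is not rblfilter, just a rough sketch of the general idea in Python. The function name `handoff_ip` and the `trusted_hops` parameter are hypothetical, and it assumes the header records the peer's IPv4 address in square brackets, which many servers do but not all.

```python
import re
from email import message_from_string

# Matches a bracketed IPv4 address such as [192.0.2.1] in a Received header.
BRACKETED_IP = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def handoff_ip(raw_message, trusted_hops=0):
    """Return the IP recorded in the Received header you choose to trust.

    Received headers are added at the top, newest first. trusted_hops is
    how many of them were added by machines you trust *after* the server
    that met the outside world (for example, your own local delivery).
    """
    msg = message_from_string(raw_message)
    received = msg.get_all("Received") or []
    if trusted_hops >= len(received):
        return None
    match = BRACKETED_IP.search(received[trusted_hops])
    return match.group(1) if match else None
```

Anything below the header you trust could have been forged by the sender, so only that one header is worth checking against a blacklist.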

The next line of defence is the Distributed Checksum Clearinghouse. The DCC works by sharing information about how many other copies of a particular email are floating around the Internet. If there are a lot of copies, it’s either something like a mailing list, or it’s spam. To use the DCC, you tell it where you expect to get legitimate bulk email from. Everything else you get which is bulk is therefore spam. The DCC is designed for Unix, so the web pages and Google will tell you how to get it set up there. There is a plugin for Spampal which will also let Windows people use the DCC. It’s beta software, that is, released to the public for testing, so it may contain some bugs: I’ve no idea how stable it is (despite getting a credit on that page, I didn’t actually write it).
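
The logic is simple enough to sketch. What follows is a toy version in Python, not the DCC itself: the real thing shares counts between many servers and, as I understand it, uses fuzzier checksums than the plain SHA-256 hash I've swapped in here. The names `BulkCounter` and `classify` and the threshold of 10 are made up for illustration.

```python
import hashlib
from collections import Counter


class BulkCounter:
    """Toy stand-in for a shared clearinghouse of message checksums."""

    def __init__(self):
        self.counts = Counter()

    def report(self, body):
        """Record one sighting of this body and return the running total."""
        # Crude normalisation (case and whitespace only); an assumption,
        # not how the real DCC computes its checksums.
        normalised = " ".join(body.lower().split())
        digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        self.counts[digest] += 1
        return self.counts[digest]


def classify(sender, body, counter, whitelist, threshold=10):
    """Return 'ok', 'bulk-ok' or 'spam' for one incoming message."""
    seen = counter.report(body)
    if seen < threshold:
        return "ok"  # not bulk, so not the DCC's business
    return "bulk-ok" if sender in whitelist else "spam"
```

Note that the counter never judges content. It only says "this is bulk", and the whitelist is what turns bulk into either a mailing list or spam.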

If someone else manages your email for you, and you read it via a web interface, for example, then you should have a look at the spam filtering options you have available. I’ve just noticed that the people who provide a forwarding address for me now let their users configure the service to reject mail based on those blacklists.

Fight the pink menace!

A couple of students in Another Place are in trouble for “hacking”. The newspapers aren’t particularly specific about what they did, but it sounds like they installed a packet sniffer and listened in on traffic across their network.

Ethernet networks have everyone hanging off the same piece of wire. If you’re on an Ethernet network, your network card has a unique address. As the traffic for everyone on that piece of wire flows by, your computer picks up traffic addressed to it. It doesn’t listen to other people’s traffic because you usually don’t care about it. However, by running your network card in what is delightfully known as promiscuous mode, you can see other people’s traffic. Programs which do this and present the results to you are called packet sniffers. Ethereal is a popular free packet sniffer. Packet sniffers have legitimate uses, like diagnosing network problems or writing and debugging software which uses the network (I installed Ethereal the last time I was having problems with DNS lookups, for example). The remedies for undesired sniffing are encryption and restructuring the network so everyone’s packets don’t share the same piece of wire.
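
To make the difference concrete, here is a small simulation in Python of the decision a network card makes for each frame on a shared wire. It is a model only, with no real network access, and the class and function names (`Frame`, `NetworkCard`, `deliver`) are invented for the purpose.

```python
from dataclasses import dataclass

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class Frame:
    dst: str      # destination hardware address
    src: str      # source hardware address
    payload: str


@dataclass
class NetworkCard:
    address: str
    promiscuous: bool = False

    def accepts(self, frame):
        """Decide whether this card passes the frame up to the computer."""
        if self.promiscuous:
            return True  # promiscuous mode: take everything on the wire
        return frame.dst.lower() in (self.address.lower(), BROADCAST)


def deliver(frames, card):
    """Return the payloads a card sees out of everything on the shared wire."""
    return [f.payload for f in frames if card.accepts(f)]
```

Flip `promiscuous` to `True` and the same card sees every payload, including other people's. A sniffer is that one flag plus a nice display.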

The Oxford students seem to have been disciplined for drawing attention to what they did, but none of what they found is news. A college network probably has everyone hanging off the same wire. There are encrypted versions of telnet, HTTP, IMAP and POP3 but not many people use them. There are a lot of clever people with time on their hands. You work it out.

People who know this have done some sort of risk calculation and come up with a solution that they’re happy with, which balances convenience against privacy. For example, I only permit encrypted logins to my machines and don’t send my password itself when fetching email (although the mail itself comes across the wire as plain text). Now you know what’s possible, you can do that calculation too.

Mozex is an extension to the Mozilla/Netscape browser. It works on both Windows and Linux. Among the useful things it does is enable you to edit textareas in forms (such as, say, the LJ “post comment” form) using an external editor. I’ve been looking for something like this for a while. While my client lets me use an editor to compose journal entries, it doesn’t work for comments. I like being able to use my own favourite editor, where I can use Google tricks like the ghref script.

Among the best April Fools jokes today are LJ’s own pranks of changing “Friends” to “Stalkers” and creating lj_serialadder, who appears to have friended just about everyone, at least briefly. Count the sheer number of whining lamers in that journal. Astonishing.

I also liked morayallan’s apt-gentoo. And IF Quake is rather good, too. I loaded it into Frotz and it plays! Hmm… not quite sure I believe them about needing the .pak files from the original Quake.

This year’s crop of April Fool RFCs was not a patch on RFC 1149.

Gmail, Google’s proposed mail service, seems to be for real. One to watch, anyway.

I got into a brief flame war on lj_biz for pointing out that it’s possible to spam via LJ by sending out lots of community invites. People, if you’re using the DCC you know you have to whitelist sources of legitimate bulk email. So, I’ve not caused LJ’s emails to disappear into a bottomless pit.

On that subject, the DCC plugin for Spampal is in beta-testing, so the DCC is not just for Unix users anymore.

DoeS aNy Boydie know how to get friends of friends to only display people who are connected via other users and not via communities?

Kevin S. Wilson writes in NANAE:

You just don’t get it, do you? WE ARE PISSED, VENGEFUL, AND UNSYMPATHETIC. You helped to create the mess that e-mail has become, invading the privacy of millions of people and generally making an annoyance of yourself on a GLOBAL scale. Ultimately, you may have helped to render e-mail unuseable. You think anyone cares that you can’t find hosting for a vanity domain? Instead of looking for sympathy here, you ought to be thanking your lucky stars that someone sick of your spam hasn’t hunted you down and broken your arms, or worse.

I’m sure we all feel that way some days. (If ASR is the scary devil monastery, what does that make NANAE, I wonder?)

In other good news, Microsoft, AOL, Earthlink and Yahoo are going after some of the most prolific spammers. A quick look at the example emails in the lawsuit documents shows that many of the obvious suspects are in the frame. They’re filed against “John Doe” (the US legal equivalent of “John Smith”) as this allows the plaintiffs to get ISPs and other organisations to disclose the identities of the people behind the spam, but the targets here are well chosen, so I think the plaintiffs know who they expect to end up bankrupting. The mills of justice grind slowly, but we may hope they grind exceeding small.

There is a conspiracy theory which says this is just large commercial interests getting the porn’n’pills people out of the way, leaving the field clear for mainsleaze. Even if this is the plan of people like Microsoft, there are strategies in place for dealing with spamming from mainstream companies. Such companies can’t afford to use the deceptive and criminal tactics of the worst spammers, so blacklists and bulk email detectors like the DCC should see them off.

I am sick, although I’m getting better. Possibly I was brought low by a moody pistachio nut at PaulB’s board games thing on Sunday, or possibly I caught whatever S has had recently (although the effects seem to have been more spectacularly gastrointestinal in my case). Today I am moving about slowly, eating digestive biscuits and soup, and geeking out.

My web wanderings turned up Cory Doctorow’s notes on Tech Secrets of Overprolific Alpha Geeks, a talk at some uber-geek conference or other, given by Danny O’Brien of NTK notoriety. Nice quote from Python BDFL Guido van Rossum: “My 10-line python scripts are just like everyone else’s except I wrote a script to interpret them.”

I must be on track to becoming an Alpha Geek, because I have a TODO.txt file myself (actually, it’s just called TODO, putting me a cut above you Windows-using geeks).

Interesting statistic from bradfitz, too: 8 LiveJournal entries every 10 minutes are private, that is, locked to the poster only. Who’s posting these? What sorts of things are you writing in them?

In other news, Paul Vixie wrote a message to the DCC mailing list which nicely summarises my attitude to NTL (he’s actually writing about RoadRunner’s spamming problems, but NTL’s reliability problems with mail and news seem to have a similar solution).

When you get your shiny new cable modem, you usually configure your mail program to send email via your ISP’s mail server (whatever it happens to be called). That server then sends on your mail to the destination at its leisure (or in NTL’s case, doesn’t). There was no particular reason why a clever enough computer couldn’t just connect to the destination directly, especially if it’s a computer which is left on most of the time, so that if the destination is down or busy, it can try again later. This is what my computer did. But now lots of servers are blocking mail sent by my machine. This is because of spam.

Know, O King, that the modern porn’n’pills spammer uses open proxies to send email advertising his website. His website is hosted in China or Brazil (Spammy himself is actually a resident of Florida, and the mail originates from his machines in China, but the trail goes cold at the proxy, so it’s hard to prove this). Most of these open proxies are on machines connected to cable modems. Sometimes the proxy has been installed without the owner’s knowledge, perhaps by one of these “virus” things you Outlook users are so keen on. Sometimes, the owner installed the proxy themselves to share their cable connection with a local network, but misconfigured it. Misconfiguration is easy when your chosen software is insecure by design. Marc Thompson, author of the AnalogX proxy, must surely be a prime candidate for first trials of makali and jwz’s famed audio-cock technology.

But, anyway, the solution adopted by some servers is to block any cable modem (or technically, any machine with a dynamic IP address) from sending them mail directly. That’s why my mail bounces: my IP address is on a list of dynamically allocated IPs. I can advocate that the admins use the Spamhaus XBL instead, since that only lists the addresses of insecure machines. But then someone will point out that my address is right next door to someone who is compromised, and, being a dynamic address space, I could get that address tomorrow.

So, I’m going to start using Gradwell’s machines to relay my mail (they’ll let me do this as they also host my domain and incoming mail). They’re a lot more clued up than NTL, so their relay machine will probably be up most of the time and will probably ensure my email reaches its destination. But still, it’s a shame. It takes that little bit of control away, as I can only tell when something has left here, not when it’s been finally received. And it breaks something that wouldn’t need to be broken, were it not for those pesky spammers.

LJ has started publishing FOAF information for all its users. The main use of this is obviously to draw pictures. Click the small piccy for a bigger one.

Purdy, ain’t it? This is a graph of the connections out to friends-of-friends level. Anyone at friends-of-friends level without at least 2 lines into them was pruned. The graph came from GraphViz, the data from LJ using a Python script and rdflib. I’m not desperately keen on unleashing the script on an unsuspecting world, as it doesn’t do polite things like caching fetches between runs (although it doesn’t fetch the same person twice in a single run), but if you’d like to see it, let me know.
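
In the meantime, the pruning step is easy to show on its own. This is a fresh sketch in Python rather than the script itself: it assumes the friend lists have already been fetched into a plain dictionary (no rdflib, no network), and the names `prune` and `to_dot` are invented for the example. The output is text in GraphViz's DOT format.

```python
from collections import Counter


def prune(me, friends_of):
    """Keep my friends, plus friends-of-friends with at least two lines in.

    friends_of maps a user name to the list of people they have friended.
    Returns a sorted list of (from, to) edges among the people kept.
    """
    friends = set(friends_of.get(me, ()))
    links_in = Counter()
    for friend in friends:
        for other in set(friends_of.get(friend, ())):
            if other != me and other not in friends:
                links_in[other] += 1  # one more of my friends points here
    keep = {me} | friends | {p for p, n in links_in.items() if n >= 2}
    edges = set()
    for person in {me} | friends:
        for other in friends_of.get(person, ()):
            if other in keep and other != person:
                edges.add((person, other))
    return sorted(edges)


def to_dot(edges):
    """Render the edges as a GraphViz digraph."""
    lines = ["digraph friends {"]
    lines += ['    "%s" -> "%s";' % edge for edge in edges]
    lines.append("}")
    return "\n".join(lines)
```

Feed the result of `to_dot(prune("yourname", friends_of))` to one of the GraphViz layout programs to get the picture.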

I also went through the names on PaulB’s photo page and produced another graph from memory, but it’s much less connected than I remember. Anyone know what happened to the original?

Boingboing contains some interesting stuff today. There’s an article defending Ikea against that Fight Club scene: “If your life is mediocre, I promise you, Ingvar Kamprad didn’t make it that way”.

Spammers are apparently getting other people to solve captchas, the little puzzles you have to do to get free email or LiveJournal accounts, using the lure of free porn. There must be other ways we can harness the pornotropic lusers on the Internet for good rather than ill, in a human parallel to distributed computing efforts like SETI at Home.

Meanwhile, the author of the DeCSS Haiku is unmasked, and says “I set myself a strict rule against using hexadecimal constants, because they seemed unpoetic.” My own slim volume, entitled unsigned long letters;, will be available soon.

It’s snowed here. There may be pictures later.