For a while now, I’ve been getting comments on my LiveJournal which apparently aren’t spam, but rather are questions which are totally out of context. For instance, I got one the other day which said “Hi. I find forum about work and travel. Where can I to see it?”
I recently got some more comment spam advertising something called XRumer, a clever and nasty program for spamming bulletin boards and other forums (like LJ), which is brought to us by some evil Russians (“No Meester Bond, I expect you to die”). One of the things the authors claim it can do is a crude form of astroturfing. They say you can configure it to post a comment asking about something, and response apparently from another user mentioning the site you actually want to advertise. It looks like this feature doesn’t quite work, and that the questions I’ve been seeing are examples of it misfiring. Mystery solved.
The spammers seem to favour certain entries of mine, so I’m screening anonymous comments on those entries (and on this one too, since I imagine it might attract undesirables). I don’t want to do that for my entire journal, as I get comments from people who aren’t on LJ but who say worthwhile things. In an ideal world, the way round this would be OpenID, but that’s not in widespread use yet, possibly because people who have an OpenID often don’t know they do. [Attention LJ users: you have an OpenID. Congrats. You’ve got a Jabber instant messaging account, too. See how good bradfitz is to you?]
A system which allows easy communication between two people who have no previous connection to each other is susceptible to spam. The trick is to keep this desirable feature while not being buried in junk (you could go the other way and remove this feature, of course, as many some IM users have, or make a virtue of it with social networking sites, but that’s not really an option for public blogs). Anything an ordinary user might to do create an identity, a spammer can do too, so cryptographic certificates aren’t a magical solution. Legislation doesn’t help, because the police don’t care and anyhow, spammers are in Wild West states like China or Russia, or at least run front operations there.
Most spam is still sent via email. Email spammers have been subject to an evolutionary arms race. The remaining effective spammers are bright and totally amoral. They’ll hijack millions of other people’s computers to send their spam or even to host the website they’re advertising, making it hard for blacklists to keep up (and they’ll use these computers to flood centralised blacklist sites with traffic in an attempt to knock them off the net). They’ll vary the text they use, to defeat schemes which detect the same posting lots of times. They’ll use images rather than text, or simply links to those images, to defeat textual analysis. You can bet that blog spammers will learn from this (some of them are probably email spammers too).
What’s working for email spam, and will similar ideas work for blog spam?
- Banning mail sent directly from consumer ISP connections is the single most effective thing I do (you can do this with the Spamhaus PBL and with a few checks for generic rDNS to catch what the PBL misses). You can’t do that with blog comments, as spam or not, they almost all come from consumer ISP connections.
- Banning mail sent from IPs which are known sources of spam is also effective. You can do that with blog comments, but you either need to be big enough to generate your own list (as LJ might be) or have the resources to run a centralised list like Spamhaus (which will be attacked by spammers). There are currently no IP blacklists devoted to blog spamming, as far as I know, although some spam comments I’ve seen came from IPs which were in the Spamhaus XBL.
- Filtering on ways in which spamming programs differ from legitimate SMTP clients (greylisting, greet pause) is currently effective, but only as long as these methods don’t become so widespread that it’s worth the spammers’ while to look more like a legitimate sender. Still, this doesn’t seem that likely. Incompetent admins aren’t in short supply, and I don’t have to outrun the bear, only outrun them. This sounds promising against blog spammers. Apparently simple minded schemes are pretty effective.
What else can we do with a website that we can’t do on email?
- CAPTCHAs are popular, but a bit of a bugger if you’re blind. The evil Russians claim to have defeated most of the deployed ones which use obscured letters, though that still leaves the “click on the picture of a cat” variant.
- Proof-of-work or hashcash schemes are currently very effective, suggesting that blog spammers don’t yet have the huge amounts of stolen computing resources available to email spammers, or that they don’t have the knowledge to implement the hashcash algorithm in their spamming software. By using proof-of-work, we can at least drive the weak blog spammers to the wall.