Ok, blog spam. It’s like regular spam, but dumber. The theory is that if you post a comment to a popular blog with high Google rankings, then when Google spiders that blog, it’ll find the link back to the poster’s website and boost that site’s search engine ranking. The dumb part is that blog spam is, in a way, pretty easy to thwart. You can:
- disable comments
- moderate all comments
- moderate (or block outright) unregistered posters, and don’t allow people to register new accounts
Obviously, to varying degrees, these will turn off posters and reduce the appeal of having your say on whatever the topic at hand may be. Then there are other, more complex ways of dealing with the problem, and this part is actually interesting to me. So why is spam effective? Because it’s so cheap to produce that it doesn’t take many responses to turn a profit. But it’s only cheap when it’s automated. Over the past few years, spammers have gotten more devious about their livelihood. Some have apparently built tools that automatically spider the web and post comments wherever they find a blog. Look at the HTML, parse the names of the fields in the form, and it’s fairly straightforward to fill them in programmatically.
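Since a bot like that fills in whatever fields it finds by name, one cheap trick is a honeypot: an extra form field hidden from humans with CSS, which a real visitor leaves empty and a form-filling bot usually doesn’t. Here’s a minimal Python sketch, assuming a hypothetical field name `homepage_url`; neither function comes from any particular blog package.

```python
# Hypothetical trap field: hidden from humans via CSS, so it should come back
# empty. A bot that fills in every field it finds by name will often fill it.
HONEYPOT_FIELD = "homepage_url"


def render_honeypot() -> str:
    """Return the extra markup to drop into the comment form."""
    return (
        '<div style="display:none">'
        f'<label>Leave this empty <input type="text" name="{HONEYPOT_FIELD}" '
        'autocomplete="off" tabindex="-1"></label>'
        "</div>"
    )


def looks_like_bot(form: dict[str, str]) -> bool:
    """True if the hidden trap field came back with anything in it."""
    return form.get(HONEYPOT_FIELD, "").strip() != ""


print(looks_like_bot({"name": "Alice", "comment": "Nice post!"}))  # False
print(looks_like_bot({"comment": "buy now", HONEYPOT_FIELD: "http://spam.example"}))  # True
```

Browser autofill can occasionally fill a hidden field too, so a hit is probably better treated as “send to moderation” than “delete”.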
So how do you let real people in, but keep spam bots out?
There are a lot of ways, actually. I personally like a tool called Bad Behavior. It hooks into all incoming HTTP requests and inspects the headers, because bots tend to do things that humans don’t. For example, when I browse to a website, my web browser tells the server on the other end, “I’m using Firefox version 1.5; GET /index.html for me, please.” A spam bot, on the other hand, might say, “I’m not telling you what software I am, I want /index.html, and right after that I want to POST this bogus comment.” You can start to see where the differences show up.
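To make that concrete, here’s a rough Python sketch of two checks in the same spirit: a missing User-Agent header, and a POST that arrives without (or too soon after) a page fetch from the same IP. This is not Bad Behavior’s actual code or rule set; the `Request` type, the `HeaderScreen` class and the five-second threshold are all made up for illustration.

```python
from dataclasses import dataclass, field

MIN_SECONDS_BEFORE_POST = 5.0  # hypothetical: a human needs time to type


@dataclass
class Request:
    ip: str
    method: str
    path: str
    headers: dict[str, str]
    timestamp: float  # seconds, e.g. from time.time()


@dataclass
class HeaderScreen:
    # Time of the most recent GET seen from each IP address.
    last_get: dict[str, float] = field(default_factory=dict)

    def check(self, req: Request) -> list[str]:
        """Return reasons this request looks automated (empty list = looks OK)."""
        reasons: list[str] = []
        # HTTP header names are case-insensitive, so normalize them first.
        headers = {k.lower(): v for k, v in req.headers.items()}
        if not headers.get("user-agent", "").strip():
            reasons.append("no User-Agent header")
        if req.method == "GET":
            self.last_get[req.ip] = req.timestamp
        elif req.method == "POST":
            fetched_at = self.last_get.get(req.ip)
            if fetched_at is None:
                reasons.append("POST without fetching a page first")
            elif req.timestamp - fetched_at < MIN_SECONDS_BEFORE_POST:
                reasons.append("POST too soon after the page was fetched")
        return reasons


screen = HeaderScreen()
firefox = {"User-Agent": "Mozilla/5.0 (Windows; U) Gecko/20060101 Firefox/1.5"}
print(screen.check(Request("203.0.113.5", "GET", "/index.html", firefox, 0.0)))  # []
print(screen.check(Request("203.0.113.5", "POST", "/comment", firefox, 90.0)))  # []
print(screen.check(Request("198.51.100.7", "GET", "/index.html", {}, 100.0)))
# ['no User-Agent header']
print(screen.check(Request("198.51.100.7", "POST", "/comment", {}, 100.5)))
# ['no User-Agent header', 'POST too soon after the page was fetched']
```

Each check on its own is weak (plenty of legitimate tools omit headers), which is why a heuristic tool weighs several signals together rather than trusting any one.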
As I said, there are a lot of ways. You can check an RBL of known spammer IP addresses. You can moderate only the first comment with a given name, IP, website, or whatever other criteria your blog asks for, and then, once you’ve hand-verified that a post is good, automatically approve the ones that follow. Hmm, what else? You can check how often a given IP address or range of addresses is hitting you, and block it if it’s too often. You can embed custom code in your submission forms, code that a bot won’t know how to handle. Oh, and you may have seen “captchas” by now: a picture of a word or some random characters that you, being a human being, can read and type in, whereas a bot can’t.
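Two of those are easy to sketch in Python. The hypothetical `RateLimiter` below keeps a sliding window of recent hits per IP, and the hypothetical `ApproveOnce` holds a commenter’s first comment and trusts the same name/e-mail pair once you’ve approved one by hand. The limits and the choice of key are assumptions, and a real blog would store this state in its database rather than in memory.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most `limit` requests per `window` seconds from each IP."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.hits: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        q = self.hits[ip]
        # Forget hits that have slid out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # too many recent hits; rejected requests aren't recorded
        q.append(now)
        return True


class ApproveOnce:
    """Moderate a commenter's first comment, then auto-approve the rest."""

    def __init__(self) -> None:
        self.trusted: set[tuple[str, str]] = set()

    @staticmethod
    def _key(name: str, email: str) -> tuple[str, str]:
        return (name.strip().lower(), email.strip().lower())

    def needs_moderation(self, name: str, email: str) -> bool:
        return self._key(name, email) not in self.trusted

    def approve(self, name: str, email: str) -> None:
        """Call this when you hand-approve a comment."""
        self.trusted.add(self._key(name, email))


limiter = RateLimiter(limit=3, window=60.0)
print([limiter.allow("198.51.100.7", t) for t in (0, 1, 2, 3, 61)])
# [True, True, True, False, True]

mod = ApproveOnce()
print(mod.needs_moderation("Alice", "alice@example.com"))  # True
mod.approve("Alice", "alice@example.com")
print(mod.needs_moderation("alice", "Alice@Example.com"))  # False
```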
All of these have their pros and cons, of course. I think a heuristic analysis approach like BB’s gives you the most bang for your buck. Anyway, that’s what I use.
But what happens (HERE IS THE POINT OF THE POST, DOWN HERE AT THE END OF THE ARTICLE YOO HOO) when a human is doing the spamming? Check this crap out:

(it’s an image, dummy; you can’t click on it)
By itself, I would think nothing of this and would quickly approve it. I remembered the post it was referring to, and instantly knew that the comment was perfectly on topic. But here’s the deal: I keep getting these normal-looking comments on old posts! And this didn’t happen until a week or two ago. There’s no reason my blog should suddenly be more super-popular today than it was two weeks ago. Nobody ever comments on old posts; only spam bots do. I can only assume that this is spam!
Just for the heck of it, I followed the link to see what it was, then did some backtracking and looking around, and noticed that this website could probably use a little Google juice.
I dunno. Maybe I’m being too paranoid.
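If you wanted to turn that hunch into a rule, the simplest version is to hold any comment on a post older than some cutoff for moderation. A minimal Python sketch, assuming a hypothetical 30-day cutoff and a made-up `hold_for_review` helper:

```python
from datetime import datetime, timedelta

MAX_POST_AGE = timedelta(days=30)  # hypothetical cutoff


def hold_for_review(post_published: datetime, comment_received: datetime,
                    max_age: timedelta = MAX_POST_AGE) -> bool:
    """True if the comment arrived on a post older than max_age."""
    return comment_received - post_published > max_age


published = datetime(2006, 1, 10)
print(hold_for_review(published, datetime(2006, 1, 12)))  # False
print(hold_for_review(published, datetime(2006, 5, 2)))  # True
```

It only holds those comments rather than rejecting them, so the occasional real latecomer still gets through once you look.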

There’s another approach that has proven very effective. You can set up your blog to automatically add a rel="nofollow" attribute to the links in comments. When this attribute is present, Google’s (and Yahoo’s, and MSN’s) spider doesn’t give the linked page any ranking credit for that link. More generally, it’s meant to be used whenever you (the webmaster) can’t explicitly vouch for the link’s target.
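A rough sketch of that rewriting step, in Python with regular expressions: every opening <a> tag in the comment gets rel="nofollow", replacing any rel it already had. `add_nofollow` is a hypothetical helper, and a regex is not a real HTML parser (an attribute value containing > will confuse it), so in practice you’d rely on your blog software’s own comment filter.

```python
import re

# An opening <a ...> tag (not <abbr>, not </a>), capturing its attributes.
A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
# An existing rel attribute, quoted or unquoted, with its leading whitespace.
REL_ATTR = re.compile(r"""\s+rel\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE)


def add_nofollow(comment_html: str) -> str:
    """Give every link in a comment rel="nofollow", dropping any rel it had."""
    def fix(match: re.Match) -> str:
        attrs = REL_ATTR.sub("", match.group(1))
        return f'<a{attrs} rel="nofollow">'

    return A_TAG.sub(fix, comment_html)


print(add_nofollow('See <a href="http://example.com/">my site</a>!'))
# See <a href="http://example.com/" rel="nofollow">my site</a>!
```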
Strictly speaking, this doesn’t _block_ spammers, but it makes spamming completely pointless. Yes, it applies to regular commenters as well, but hey, people don’t expect their PageRank to go up when they leave a comment, do they?