I think a lot of it also has to do with the community itself. A site of this size would probably get 300-400 spam messages a day if it weren't for the fact that its audience sees right through them. Tech people are so conscious of spam that they ignore it on principle, which makes spamming a tech site pointless.
As for suggestions...
1. Obviously, CAPTCHA. It just makes sense.
2. I find keyword blocking very effective. For example, if I were running Hacker News, I'd block any news item containing the word Viagra that was submitted by a user below a certain feedback level (no feedback at all, say). With one caveat: give them a way to verify it manually (say, an e-mail that lets them confirm they are an actual person and have the item approved).
3. Use e-mail spam block lists. Lists like SBL, CBL and XBL publish the IP addresses that generate massive amounts of e-mail spam, and many of those same addresses also generate web spam.
4. I've never been a fan of this particular method, because I think it's discriminatory to an extent I'm uncomfortable with, but many sites impose special requirements on countries known for generating spam (Russia, China, etc.), such as making users from those IPs jump through extra registration hoops.
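Item 3 can be checked at submission time with an ordinary DNS lookup: DNS-based block lists publish their data as DNS zones, so you reverse the IPv4 octets, append the list's zone, and an answer of 127.0.0.x means the address is listed. A minimal sketch in Python, assuming Spamhaus's combined zone `zen.spamhaus.org` (which folds in SBL and XBL, the latter now carrying the CBL data) and IPv4 only; the function names are hypothetical. Check the operator's usage terms first, since Spamhaus refuses queries that arrive via large public resolvers.

```python
import ipaddress
import socket

# Assumption: Spamhaus's combined ZEN zone; swap in whichever list you use.
DNSBL_ZONE = "zen.spamhaus.org"


def dnsbl_query_name(ip: str, zone: str = DNSBL_ZONE) -> str:
    """Build the name to look up, e.g. 192.0.2.1 -> 1.2.0.192.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for non-IPv4 input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = DNSBL_ZONE) -> bool:
    """Return True if the list answers with a 127.0.0.x 'listed' code."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except OSError:
        # NXDOMAIN or any other lookup failure: treat as not listed.
        return False
    # Listing codes live in 127.0.0.0/24; other 127.x answers are
    # operator error codes (e.g. "query refused"), not listings.
    return answer.startswith("127.0.0.")
```

By convention (RFC 5782) every IPv4 DNSBL lists `127.0.0.2` as a test entry, so `is_listed("127.0.0.2")` is a quick way to confirm your resolver can actually reach the list. Failing open (returning `False` on lookup errors) keeps a DNS outage from blocking every new user.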
I don't see how captchas "just make sense", especially in their most common, image-based incarnation. I have worked with visually impaired people, and the most common request was always: "I want to do something on this website, but it has a captcha I can't see (and sometimes an audio captcha that makes no sense). Can you sign me up / comment for me / do this task for me?"
As a sighted person, I've even run across captchas that were impossible to decipher, both from third-party solutions and from something like reCAPTCHA; the latter bothers me to no end because sometimes both words are ambiguous.
Whether or not they make sense depends on your audience, your site and your implementation.
Well, you'd only do it for words that are almost certainly spam, like Viagra or male impotence or... well, you get the picture. It works on the theory that "this word would almost never be used legitimately in a post, so a post containing it is almost certainly spam."
I use this on my mail server and with 200 users I've yet to ever get a false positive.
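The rule described here and in item 2 amounts to: hold (never reject outright) a submission that matches a short list of near-certain spam terms when its author is below some feedback threshold. A minimal sketch; the term list, the `min_karma` threshold and the function name are hypothetical placeholders, not anything Hacker News actually uses.

```python
import re
from enum import Enum

# Hypothetical list of terms that "would almost never be used legitimately".
SPAM_TERMS = ["viagra", "male impotence"]

# Whole-word, case-insensitive match, so "Viagra" and "VIAGRA" both match
# but longer words that merely contain a term do not.
SPAM_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in SPAM_TERMS) + r")\b",
    re.IGNORECASE,
)


class Decision(Enum):
    PUBLISH = "publish"
    HOLD_FOR_VERIFICATION = "hold"  # deliberately no outright "reject"


def screen_submission(text: str, author_karma: int, min_karma: int = 1) -> Decision:
    """Hold keyword matches from low-feedback users; publish everything else."""
    if author_karma < min_karma and SPAM_PATTERN.search(text):
        return Decision.HOLD_FOR_VERIFICATION
    return Decision.PUBLISH
```

Because a match only holds the item, a legitimate post about Viagra (or about spam filtering) from a new user is delayed until the author verifies, rather than lost, and established users are never affected.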
> I use this on my mail server and with 200 users
> I've yet to ever get a false positive.
How do you know? I don't see how you would measure that; if you could tell something was a false positive, you would have discovered a better filter. You might get user complaints, but the absence of complaints doesn't prove you have no false positives (although their presence could prove that you do).
Also: The assertion that everything to do with viagra is spam makes it very difficult to have a discussion about viagra or spam. For example, this posting would be rejected.
If you read my initial post, you'll see I said specifically that it can't be a flat-out block. What you do is stop the post and send an e-mail to the person who submitted it, asking them to verify they are an actual person.
That's why it still works if you want to discuss Viagra, and it's also how you can tell whether you are getting too many false positives.
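The hold-and-verify step also gives a rough answer to the measurement question: every held item that a person later confirms is, approximately, a false positive of the keyword rule. A minimal sketch with an in-memory store; the class and method names are hypothetical, e-mail delivery is left out, and it assumes bots rarely complete the e-mail step, so the confirmation rate is an estimate rather than an exact false-positive count.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class HoldQueue:
    """Hypothetical in-memory queue of held submissions awaiting verification."""

    pending: Dict[str, str] = field(default_factory=dict)  # token -> item_id
    held_total: int = 0
    verified_total: int = 0

    def hold(self, item_id: str) -> str:
        """Hold an item and return a one-time token to e-mail to its author."""
        token = secrets.token_urlsafe(16)
        self.pending[token] = item_id
        self.held_total += 1
        return token

    def verify(self, token: str) -> Optional[str]:
        """Consume a token; return the item_id to approve, or None if unknown."""
        item_id = self.pending.pop(token, None)
        if item_id is not None:
            self.verified_total += 1
        return item_id

    def estimated_false_positive_rate(self) -> float:
        """Share of held items a person confirmed (0.0 if nothing held yet)."""
        if self.held_total == 0:
            return 0.0
        return self.verified_total / self.held_total
```

Items still waiting for a click count as unverified, so the rate is only meaningful after a reasonable expiry window; if it climbs, the keyword list is catching too many real posts.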
Hope it helps!