gordon.dewis.ca - Random musings from Gordon



February 28, 2010 @ 14:15 By: gordon Category: Meta

Gather ’round, boys and girls, for I am going to tell you the story of the CAPTCHAs. Once upon a time, the Internet was a wonderful place, full of intelligent people and useful sources of information. People could have serious conversations in public forums and everything was good.

Then things like AOL and the Freenets came along and suddenly this wonderful place started filling up with not-so-intelligent people. The useful sources of information became less useful as ads started appearing in the middle of conversation threads, because some people thought it was acceptable to post their ads wherever they went. Some even went so far as to create programs that surfed the web and automatically posted ads on every site they visited.

The intelligent people who ran the sites didn’t like this, so they started coming up with ways to confirm that the messages being posted in conversations were being written by people and not ‘bots.

And thus the CAPTCHAs were born.

Completely Automated Public Turing tests to tell Computers and Humans Apart, or CAPTCHAs, are challenges that only a human should be able to solve. Many CAPTCHAs involve letters and numbers that have been distorted and obfuscated in such a way that someone looking at them can figure out what they are, but an OCR program can’t. The people running ‘bots are forever trying to find ways to defeat CAPTCHAs, either by writing better OCR programs or by getting humans to figure them out.

Here is how the latter works: The ‘bot visits a website, grabs the CAPTCHA graphic and then puts it on a different website where people provide the answer to the CAPTCHA, either knowingly or unknowingly. People unknowingly answer the CAPTCHA because they think it’s the CAPTCHA for the ‘bot’s own website, which is probably “rewarding” them with porn. Once the CAPTCHA has been solved, the ‘bot then submits the answer to the original website.

I’ve used a number of CAPTCHAs on my blog over the last few years, with varying success. For the last while, I’ve used the SI CAPTCHA Anti-Spam module for WordPress. Very little spam makes it through, and what does almost certainly comes from semi-automated ‘bots where the spammer is sitting there, manually answering the CAPTCHAs.

But one complaint I’ve had is that SI CAPTCHA can be a little too effective in that it sometimes presents CAPTCHAs that are difficult for even a human to decipher. There is a little refresh button to request a different CAPTCHA and you can also listen to the CAPTCHA.

Because I’m a nice guy, I listen to the complaints and try to come up with solutions. Someone suggested the reCAPTCHA plugin, so I installed it yesterday afternoon. This is a popular CAPTCHA that presents you with two words that have been scanned from books and had things done to them, such as being blurred or “scribbled” on. You type in the two words and then are able to post your comment. A neat part is that you’re helping the people at reCAPTCHA digitize books with your answers.

Unfortunately, it appears that reCAPTCHA is too easy for the ‘bots because I was finding comment spam getting past it. Fortunately, the spam was getting caught by Akismet, so it wasn’t making it into my blog. But I don’t want to have to wade through a folder full of comment spam to make sure there aren’t any false positives every few hours.

So, I’ve switched back to SI CAPTCHA for the time being. Hopefully it won’t be a problem for too many people.

3 Responses to “CAPTCHAs”

  1. I notice that it’s physically larger now. That helps.

  2. Sean says:

    Captcha is a pain in the butt, but on my site I’ve made it as simple as I could. My captcha is not an image but rather a culturally neutral question that anyone except a robot can answer. The captcha rotates among a series of questions, and if I start getting spam, which I haven’t yet, then I’ll just change the questions.

    Found your site via the Ottawa start blog and link to your latest post about driving.

    • gordon says:

      That’s a good way to do it, actually. It’s just non-standard enough that the ‘bots probably won’t be able to handle it.
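
For the curious, the rotating-question idea Sean describes could be sketched roughly like this in Python. This is only an illustration under assumptions: Sean didn’t say how his site actually does it, and the question pool and the names `QUESTIONS`, `pick_challenge` and `check_answer` are all made up for the example.

```python
import random

# Hypothetical question pool (an assumption for illustration only).
# Each entry maps an ID to the question text and its accepted answers.
QUESTIONS = {
    "q1": ("How many days are in a week?", {"7", "seven"}),
    "q2": ("What is two plus three?", {"5", "five"}),
    "q3": ("Which is larger, 9 or 4?", {"9", "nine"}),
}


def pick_challenge(rng=random):
    """Return (question_id, question_text) chosen at random from the pool."""
    qid = rng.choice(sorted(QUESTIONS))
    return qid, QUESTIONS[qid][0]


def check_answer(qid, answer):
    """Return True if the answer matches, ignoring case and surrounding spaces."""
    entry = QUESTIONS.get(qid)
    if entry is None:
        # Unknown or retired question ID: treat it as a failed challenge.
        return False
    return answer.strip().lower() in entry[1]
```

The comment form would show the text from `pick_challenge()` and send the ID back along with the visitor’s answer for `check_answer()` to verify. If spam starts getting through, swapping out the entries in `QUESTIONS` is all it takes to change the questions.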

