Captchas Considered Harmful – Why Captchas Are Bad And How You Can Do Better

What Is A Captcha?

If you’ve ever joined a large website, then you’ve seen a captcha. Twitter uses them, Facebook uses them, and WordPress uses them. A captcha is a small test that you must complete in order to prove that you are human. Usually, the test involves looking at some text and typing that text into a nearby box.

What Are Captchas Meant To Do?

The simple answer is to keep your website spam-free. More generally, a captcha aims to stop automated spamming by assuming that every single user of the site is a robot until they prove otherwise.

Why Captchas Are Bad

On the surface, captchas look easy to use and interact with. Well, they only look that way to people who have perfect vision and an elastic mind. If you have these qualities, congratulations. If you don’t, here’s how captchas will be difficult for you and your users:

  • Warped letters are hard for users to read. Combine that with a strike-through, and you have a bad idea.
  • In certain fonts, users can’t tell the difference between 0 and O, or between I, l, and 1.
  • Blind people can’t use them. Even colour-blind people find some captcha implementations difficult to use.

Worst of all, even people with super sight and brilliant spatial awareness find captchas plain annoying. This drives away users, which drives down revenue.

How Can You Do Captchas Better?

The best way to do captchas better is to turn the concept on its head. Instead of asking every user to prove that they are human, you can trick every robot into telling you that it is an automated spamming machine. The easiest way to do this is to lay some traps.

Traps And Why They Work

In order to explain what a trap is, and how it’s going to stop your site from filling with spam, you first need to know how a spam-bot works. If you don’t know what you’re up against, then you will never defeat it.

The Two Kinds Of Robots

There are two kinds of spam-bots. The first kind is a replay-robot. This type of spam-bot records a form verbatim, and then posts the form with the fields filled in at a later date. It’s like a spam version of TiVo. Usually, the bot will do this again and again, literally replaying the recording it has made of the form.

The second kind of spam-bot is the form-filler. Form-fillers crawl the web, find a form, insert the spam, and submit. Some form-fillers are dumb, and will put the same information everywhere. Others recognize common field names such as username and email. A robot like that will place the spam in a text area, as this is usually where your users’ content goes in each post.

Traps For Form Fillers

The simplest way to trap a form-filling robot is to create a field which a human won’t (ideally can’t) fill in. You can do this by warning users not to fill in a certain field, but that’s just as bad as asking users to complete a captcha. Much better is to make your trap hidden. A form-filling robot doesn’t see the form; it reads it and parses it. You can make a field which is invisible to humans, but which is parsed by the robot. You can hide a field using CSS, or by changing the style of the field using JavaScript. As long as the visibility of the field is set separately from the form markup, a robot that only parses the form will usually fill in the field, and your users will not even know of its existence. Any submission that arrives with the hidden field filled in can then be treated as spam.
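Here is a minimal sketch of the server-side half of such a trap, written in Python. The field name "website" and the function name are made up for illustration. The sketch assumes your form includes an extra input with that name which your stylesheet hides, and that submitted values arrive as a plain dictionary of strings.

```python
# Hypothetical name of the hidden trap field. A human never sees it,
# so a genuine submission should leave it empty or omit it entirely.
TRAP_FIELD = "website"


def looks_like_form_filler(form_data: dict, trap_field: str = TRAP_FIELD) -> bool:
    """Return True when the hidden trap field came back with content."""
    value = form_data.get(trap_field, "")
    # Whitespace alone is not counted as an answer.
    return bool(value.strip())


if __name__ == "__main__":
    human = {"comment": "Nice article!", "website": ""}
    robot = {"comment": "Buy pills", "website": "http://spam.example"}
    print(looks_like_form_filler(human))
    print(looks_like_form_filler(robot))
```

A submission that trips the trap can be dropped silently, so the robot gets no hint about which field gave it away.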

Catching Repeats

With replay-robots, you should capitalize on the fact that a form is collected once, ahead of time, and then replayed sometime later. You can record when a form was produced by including a date and time stamp with each form served. By comparing that date and time with the current date and time, your site can deduce whether or not an unreasonable amount of time has passed between form generation and form submission.
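The comparison itself is only a few lines. The sketch below is in Python and assumes the time stamp is stored as seconds since the Unix epoch. The one-hour limit and the function name are assumptions chosen for illustration, so pick whatever limit counts as "unreasonable" on your site.

```python
import time
from typing import Optional

# Assumed limit: a form older than one hour is treated as a replay.
MAX_FORM_AGE_SECONDS = 60 * 60


def stamp_is_fresh(issued_at: float,
                   now: Optional[float] = None,
                   max_age: float = MAX_FORM_AGE_SECONDS) -> bool:
    """Return True when the form was served recently enough to accept."""
    if now is None:
        now = time.time()
    age = now - issued_at
    # A negative age means the stamp claims to come from the future,
    # which a form served by this site could never do.
    return 0 <= age <= max_age
```

Passing `now` explicitly is only there to make the function easy to test; in normal use you would leave it out and let it read the clock.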

When The Robots Get Smarter

The obvious problem with adding a time stamp to your form is that once it becomes widely known as a tool for stopping spam, the people who program replay-robots will generate good time stamps at the moment the form is replayed. The way around this is to protect the time stamp with a secret key that only your server knows, for example by signing or encrypting it. The data is checked with the same key when the form is sent, and only then evaluated. It is highly unlikely that a replay-robot will be able to guess your secret key, so it cannot produce a valid time stamp of its own and is stuck replaying the stale one it recorded.
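One way to do this is sketched below in Python. Note that it signs the time stamp with an HMAC rather than encrypting it; that is my choice for the example, not the only option. The stamp stays readable, but it cannot be altered without the key. The key value, the token layout, and the function names are all made up for illustration.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical secret; in practice, load a long random value from config.
SECRET_KEY = b"replace-with-a-long-random-secret"


def _sign(stamp: str, key: bytes) -> str:
    """Return the hex HMAC-SHA256 signature of the stamp."""
    return hmac.new(key, stamp.encode("ascii"), hashlib.sha256).hexdigest()


def make_token(issued_at: int, key: bytes = SECRET_KEY) -> str:
    """Build the value for a hidden form field: '<seconds>:<signature>'."""
    stamp = str(issued_at)
    return stamp + ":" + _sign(stamp, key)


def read_token(token: str, key: bytes = SECRET_KEY) -> Optional[int]:
    """Return the time stamp if the signature checks out, else None."""
    stamp, sep, signature = token.partition(":")
    if not sep or not (stamp.isascii() and stamp.isdigit()):
        return None
    expected = _sign(stamp, key)
    # compare_digest avoids leaking how many characters matched.
    if not hmac.compare_digest(signature.encode("utf-8"),
                               expected.encode("ascii")):
        return None
    return int(stamp)
```

You would put `make_token(int(time.time()))` into a hidden field when serving the form, and call `read_token` on the submitted value. A result of None means the stamp was forged or mangled. Otherwise, compare the returned time with the current time as described earlier. A robot can still replay a genuine token until it goes stale, so the age check remains the part that actually stops the replay.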