CAPTCHA: Hard for Humans, Easy for Bots – PerimeterX
CAPTCHA: A Well-worn Approach to Bot Defense
For years, website owners have used a number of approaches and technologies to battle constantly evolving bot threats. One of the most common ways to battle bots has been to use CAPTCHAs, a challenge-response mechanism that promised an easy way to distinguish between a bot and a human. CAPTCHA is an acronym for completely automated public Turing test to tell computers and humans apart. Used in millions of sites, CAPTCHA is employed to help prevent bots from doing form submissions, executing logins and accessing sensitive pages or processes.
How CAPTCHA Has Evolved
As bot-based threats have evolved, so have the CAPTCHA mechanisms intended to stop them. In its early forms, users were asked to read distorted text and submit it in a form.
An example of one of the types of Google reCAPTCHAs that are most commonly used today.
Today, Google reCAPTCHA represents the dominant form of CAPTCHA technology in use. One study found that, across one million of the world’s top websites that employ CAPTCHA, Google reCAPTCHA was deployed by 94% of them.
How CAPTCHA Is Failing
In spite of its widespread, continued usage, there are two very fundamental problems with CAPTCHA:
User experience: From a user standpoint, as just about anyone alive can tell you, the experience is a poor one. It’s time-consuming, increasingly difficult, and can often keep legitimate users from doing what they want and need to do.
Efficacy: From a security standpoint, quite simply, it doesn’t work. The challenge is supposed to be easy for users, and hard for bots, but in fact, it’s become quite the opposite.
Following is an overview of the plethora of options available that make it easy to bypass CAPTCHA challenges.
How Attackers are Easily Bypassing CAPTCHA Challenges
There are a number of CAPTCHA-solving technologies and services available to attackers today. Attackers choose the solvers that work best against the type of CAPTCHA used on a target site. Here are two high-level categories:
Automated Technologies and Plug-ins
There is a range of automated technologies, including APIs, browser plug-ins and extensions that enable attackers to bypass or solve CAPTCHA challenges. Here are a few examples:
A group of researchers from Lancaster University, Northwest University and Peking University used the concept of a generative adversarial network (GAN) in order to create an extremely fast and accurate CAPTCHA solver.
There are several free online CAPTCHA solving services and libraries that leverage deep learning-based technologies, including GRIS, Alchemy, Clarifai and NeuralTalk. Academic studies show that deep-learning-based approaches are highly accurate in solving CAPTCHA challenges.
DeCaptcher is an example of one of the solving services available via APIs making it easy to integrate into applications. Based on an optical character recognition system, the service solves challenges and provides a file to download that details the time, the challenge image, and the text used to solve the challenges.
Open-source tools and browser extensions, including Buster and UnCaptcha, abuse the audio challenge that was intended to help visually impaired users: they feed it to speech recognition in order to bypass CAPTCHA mechanisms in an automated fashion.
Human-assisted Solving Services
There are also human-powered solving services, often staffed by people who work in so-called farms. They are easy to find via a simple Google search, and they make it cost-effective for attackers to bypass the object recognition challenges used in reCAPTCHA.
2captcha and anti-captcha are two of the most popular examples of such services. At a high level, these services enable customers to submit challenges from target websites, often via an API, to the vendor. The vendor’s staff will solve the challenge and provide the solution back to the customer. These vendors advertise solving 1,000 regular CAPTCHA challenges for as little as $1.00, and 1,000 reCAPTCHA challenges for between $1.99 and $2.99.
Increasing Prevalence and Usage of CAPTCHA Solvers
Given their low/no cost, availability and efficacy, the use of CAPTCHA solvers continues to grow. With our PerimeterX Bot Defender solution, we’ve detected a rapid expansion in the use of CAPTCHA solvers. As the diagram below illustrates, between August 2019 and March 2020, we saw a significant increase in the volume of attempted attacks that employed CAPTCHA solvers.
Given their accessibility and ease, the use of CAPTCHA solvers has grown rapidly.
It’s abundantly clear that users and businesses can’t stand CAPTCHA mechanisms that interrupt the user flow and ultimately lower conversions on websites. Particularly as artificial intelligence continues to improve, standalone visual-challenge-response approaches aren’t viable. Quite simply, organizations can’t rely solely on CAPTCHA-based mechanisms to combat bots, given the abundance of CAPTCHA solvers. These realities are exposing a very clear demand for advanced mechanisms that don’t frustrate users and are difficult for bots to solve.
Is it still reasonable to use CAPTCHA? – ResearchGate
Of course, nobody wants to use CAPTCHAs. They’re a necessary evil, just like the locks on the doors to your home and your car. CAPTCHAs are designed to discriminate between computer scripts from spammers and real human beings.
There’s a popular misconception in technical circles that CAPTCHA has been “broken”:
CAPTCHA, which stands for (C)ompletely (A)utomated (P)ublic (T)uring test to tell (C)omputers and (H)umans (A)part, works well for small sites but larger ‘community’ sites where there are multiple SPAM targets CAPTCHA only provides a false sense of security: it can be broken fairly easily and serious spammers are getting more sophisticated all the time.
Some people actually believe that spammers can now “fairly easily” write scripts which use advanced optical character recognition to automatically defeat any online CAPTCHA. Although there have been a number of CAPTCHA-defeating proofs of concept published, there is no practical evidence that these exploits are actually working in the real world. And if CAPTCHA is so thoroughly defeated, why is it still in use on virtually every major website on the internet? Google, Yahoo, Hotmail, you name it: if the site is even remotely popular, its new account forms are protected by CAPTCHA.
The comment form of my blog is protected by what I refer to as “naive CAPTCHA”, where the CAPTCHA term is the same every single time. This has to be the most ineffective CAPTCHA of all time, and yet it stops 99.9% of comment spam. I can count on two hands the number of manually entered comment spams I’ve gotten since I implemented it. Granted, Yahoo is more popular than my blog by many orders of magnitude. But it’s still strong evidence that moving the difficulty bar up even one tiny notch can be quite effective in reducing spam. I went from cleaning up comment spam every day to cleaning one per month. Big difference.
I’ve been experimenting with improving the rendering algorithms in my CAPTCHA server control, and it’s interesting how fragile typical computer OCR really is. SimpleOCR has an online form that allows you to upload and OCR small greyscale TIF images. Here are the results of submitting a few standard 180×50 CAPTCHAs from my reworked rendering algorithm. Note that these CAPTCHAs all use the same font, Courier.
CAPTCHA image                 OCR result   Correct
Standard (no perturbation)    CQXKN        5/5
Low perturbation              KxT*2        3/5
Medium perturbation           acNx4        2/5
High perturbation             Kc           0/5
Extreme perturbation          (blank)      0/5
Standard, low noise           (blank)      0/5
I didn’t expect it to do well, but I was frankly surprised how poorly the SimpleOCR engine actually performed. Adding a tiny bit of noise or perturbation to the CAPTCHA text was all it took to break the OCR. I’m sure there are more advanced OCR engines out there that might be able to do somewhat better than the free SimpleOCR engine. Still, it’s unlikely that any OCR engine could beat high perturbation, where the characters are physically overlapping each other, plus a little background noise. And that level of CAPTCHA security is absolute overkill unless you happen to run one of the top 100 most popular sites on the internet.
Furthermore, none of these are particularly difficult CAPTCHAs. The most extreme perturbation sample shown above is eminently “human solvable”, at least in my opinion. The default settings for my new and improved CAPTCHA server control are a combination of:
high contrast for human readability
medium, per-character perturbation
random fonts per character
low background noise
That should be far more protection than most websites need. Remember, I use “naive CAPTCHA” with 99.9% effectiveness.
The “low” settings will be even easier to read than the defaults and may be more appropriate for your user base.
Of course, OCR isn’t the only way to attack CAPTCHA. But the other scenarios for spammers “beating” CAPTCHA are even more far-fetched. The Petmail documentation explains:
1. The Turing Farm
Let’s say spammers set up a sweatshop to employ people to look at computer screens and answer CAPTCHA challenges. They get to send one message for each challenge passed. Assuming 10 seconds per challenge, and paying roughly $5 per hour, that represents $14 per thousand messages. A typical spam run of 1 million messages per day would cost $14,000 per day and require 116 people working 24/7. This would break the economic model used by most current spammers. A recent Wired article showed one spammer earning $10 for each successful sale. At that rate, the cost of $14,000 for 1,000,000 spam emails requires a 1 in 1000 success rate just to break even, whereas current spammers are managing a 1 in 100,000 or even 1 in 1,000,000 success rate.
2. The Turing Porn Farm
A recent slashdot article described a trick in which spammers run a porn site that is gated by CAPTCHA challenges, which are actually ripped directly from Yahoo’s new account creation page. The humans unwittingly solve the challenge on behalf of the spammers, who can therefore automate a process that was meant to be rate-limited to humans. This attack is simply another way of paying the workers of a Turing Farm. The economics may be infeasible because porn hosting costs money too.
If you’re not using CAPTCHAs because you think they’re compromised, then you’re too gullible for your own good. There’s absolutely no concrete data supporting any of these attack scenarios happening outside laboratory (read: infinite money and time) conditions. Just ask Google:
Some captchas have been solved with more than 90% accuracy by scientists specializing in computer vision research at the University of California, Berkeley, and elsewhere. Hobbyists also regularly write code to solve captchas on commercial sites with a high degree of success. But several Internet companies say their captchas appeared to be highly effective at thwarting spammers. “Researchers are really good, and the attackers really are not,” says Mr. Jeske of Google, based in Mountain View, Calif. “Having these methods in place we find extremely effective against automated malicious attackers.”
The real secret to CAPTCHA is that it hits spammers where they are most vulnerable: in the pocketbook. The minute you put up a computational barrier, the entire economic model of spam comes crashing down.
If you’d prefer not to use CAPTCHA because it’s an inconvenience for the user, I can respect that. CAPTCHA isn’t the only way to block spammers. But give CAPTCHA its due: it was one of the original spam blocking measures used way back in 1997 by AltaVista. And, even more impressively, it’s still one of the most effective ways to block spam at its source today.
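The Turing Farm estimate quoted from the Petmail documentation can be rechecked with a few lines of arithmetic. The sketch below uses only the figures given in the text (10 seconds per challenge, $5 per hour, 1 million messages per day, $10 per sale); the function and variable names are made up for illustration. With unrounded numbers the break-even rate comes out nearer 1 in 720 than 1 in 1000, which is the same order of magnitude and does not change the argument.

```python
import math

def turing_farm_costs(seconds_per_challenge, hourly_wage, messages_per_day):
    """Return (cost per 1,000 messages, cost per day, workers needed around the clock)."""
    solves_per_hour = 3600 / seconds_per_challenge
    cost_per_thousand = hourly_wage / solves_per_hour * 1000
    daily_cost = cost_per_thousand * messages_per_day / 1000
    # Total solving time per day divided by the seconds in one day.
    workers = math.ceil(messages_per_day * seconds_per_challenge / 86_400)
    return cost_per_thousand, daily_cost, workers

# Figures taken from the quoted passage.
cost_per_thousand, daily_cost, workers = turing_farm_costs(10, 5.0, 1_000_000)

# Sales needed per day at $10 each, and the implied "1 in N" success rate.
sales_needed = daily_cost / 10.0
break_even_one_in = 1_000_000 / sales_needed

print(f"cost per 1,000 messages: ${cost_per_thousand:.2f}")
print(f"cost per day: ${daily_cost:,.0f}")
print(f"workers needed 24/7: {workers}")
print(f"break-even success rate: about 1 in {break_even_one_in:.0f}")
```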
How CAPTCHAs work | What does CAPTCHA mean?
What is a CAPTCHA?
A CAPTCHA test is designed to determine if an online user is really a human and not a bot. CAPTCHA is an acronym that stands for “Completely Automated Public Turing test to tell Computers and Humans Apart.” Users often encounter CAPTCHA and reCAPTCHA tests on the Internet. Such tests are one way of managing bot activity, although the approach has its drawbacks.
Although CAPTCHAs are designed to block automated bots, CAPTCHAs are themselves automated. They’re programmed to pop up in certain places on a website, and they automatically pass or fail users.
How does a CAPTCHA work?
Classic CAPTCHAs, which are still in use on some web properties today, involve asking users to identify letters. The letters are distorted so that bots are not likely to be able to identify them. To pass the test, users have to interpret the distorted text, type the correct letters into a form field, and submit the form. If the letters don’t match, users are prompted to try again. Such tests are common in login forms, account signup forms, online polls, and e-commerce checkout pages.
The idea is that a computer program such as a bot will be unable to interpret the distorted letters, while a human being, who is used to seeing and interpreting letters in all kinds of contexts – different fonts, different handwritings, etc. – will usually be able to identify them.
The best that many bots will be able to do is input some random letters, making it statistically unlikely that they will pass the test. Thus, bots fail the test and are blocked from interacting with the website or application, while humans are able to continue using it like normal.
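To make "statistically unlikely" concrete, here is a minimal sketch of the answer check and of the odds that a random guess passes. The 5-character length, the 36-symbol alphabet (A-Z plus 0-9), and the case-insensitive comparison are assumptions chosen for illustration, not properties of any particular CAPTCHA product.

```python
import string

def check_answer(expected: str, submitted: str) -> bool:
    """Pass only if the typed text matches the challenge text.

    Ignoring case and surrounding spaces is an assumption for this sketch.
    """
    return submitted.strip().upper() == expected.upper()

def random_guess_pass_probability(length: int, alphabet_size: int) -> float:
    """Chance that `length` uniformly random characters match the answer exactly."""
    return (1.0 / alphabet_size) ** length

ALPHABET = string.ascii_uppercase + string.digits  # 36 symbols (assumed)
p = random_guess_pass_probability(5, len(ALPHABET))

print(check_answer("CQXKN", " cqxkn "))  # a matching answer passes
print(check_answer("CQXKN", "KXT*2"))    # a wrong answer fails
print(f"random guess passes with probability {p:.2e}")
```

Under these assumptions a blind guess succeeds about once in 60 million attempts, which is why bots that cannot read the letters are effectively blocked.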
Advanced bots are able to use machine learning to identify these distorted letters, so these kinds of CAPTCHA tests are being replaced with more complex tests. Google reCAPTCHA has developed a number of other tests to sort out human users from bots.
What is reCAPTCHA?
reCAPTCHA is a free service Google offers as a replacement for traditional CAPTCHAs. reCAPTCHA technology was developed by researchers at Carnegie Mellon University, then acquired by Google in 2009.
reCAPTCHA is more advanced than the typical CAPTCHA tests. Like CAPTCHA, some reCAPTCHAs require users to type in text shown in images that computers have trouble deciphering. Unlike regular CAPTCHAs, reCAPTCHA sources the text from real-world images: pictures of street addresses, text from printed books, text from old newspapers, and so on.
Over time, Google has expanded the functionality of reCAPTCHA tests so that they no longer have to rely on the old style of identifying blurry or distorted text. Other types of reCAPTCHA tests include:
Image recognition
Checkbox
General user behavior assessment (no user interaction at all)
How does an image recognition reCAPTCHA test work?
For an image recognition reCAPTCHA test, typically users are presented with 9 or 16 square images. The images may all be from the same large image, or they may each be different. A user has to identify the images that contain certain objects, such as animals, trees, or street signs. If their response matches the responses from most other users who have submitted the same test, the answer is considered “correct” and the user passes the test.
Picking out certain objects from blurry photos is a hard problem for computers to solve. Even advanced artificial intelligence (AI) programs struggle with it – so a bot will struggle with it as well. However, a human user should be able to do this fairly easily, since humans are used to perceiving everyday objects in all kinds of contexts and situations.
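The "matches the responses from most other users" rule described above can be sketched as a simple majority vote over tiles. This is only an illustration of the idea as the text states it: the tile numbering, the 50% threshold, and the function names are assumptions, and the real reCAPTCHA scoring is not public.

```python
from collections import Counter

def consensus_tiles(previous_responses, min_share=0.5):
    """Tiles that more than `min_share` of earlier users selected."""
    total = len(previous_responses)
    if total == 0:
        return set()
    counts = Counter(tile for response in previous_responses for tile in set(response))
    return {tile for tile, n in counts.items() if n / total > min_share}

def passes(response, previous_responses, min_share=0.5):
    """A response is 'correct' when it equals the majority selection."""
    return set(response) == consensus_tiles(previous_responses, min_share)

# Tiles are numbered 0..8 for a 3x3 grid (assumed layout).
earlier = [{0, 1, 4}, {0, 1, 4}, {0, 1}, {0, 1, 4, 7}]
print(sorted(consensus_tiles(earlier)))  # tiles most users picked
print(passes({0, 1, 4}, earlier))
print(passes({0, 1}, earlier))
```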
How do reCAPTCHA tests with a single checkbox work?
Some reCAPTCHA tests simply prompt the user to check a box next to the statement, “I’m not a robot.” However, the test is not the actual action of clicking the checkbox; it’s everything leading up to the checkbox click.
This reCAPTCHA test takes into account the movement of the user’s cursor as it approaches the checkbox. Even the most direct motion by a human has some amount of randomness on the microscopic level: tiny unconscious movements that bots can’t easily mimic. If the cursor’s movement contains some of this unpredictability, then the test decides that the user is probably legitimate. The reCAPTCHA also may assess the cookies stored by the browser on a user device and the device’s history in order to tell if the user is likely to be a bot.
If the test is still unable to determine whether or not the user is a human, it may present an additional challenge, such as the image recognition test described above. However, most of the time the user’s cursor movements, cookies, and device history are conclusive enough.
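One way to picture the cursor-movement idea is to measure how straight a recorded path is. The sketch below is a toy heuristic under stated assumptions: the point format, the `straightness` ratio, and the 0.999 threshold are all invented for illustration. The signals Google actually uses are not public, and a bot that adds noise to its movements would defeat a check this simple.

```python
import math

def path_length(points):
    """Total distance travelled along consecutive (x, y) samples."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def straightness(points):
    """Straight-line distance divided by travelled distance (1.0 = perfectly straight)."""
    travelled = path_length(points)
    if travelled == 0:
        return 1.0
    return math.dist(points[0], points[-1]) / travelled

def looks_scripted(points, threshold=0.999):
    """Flag paths with too few samples or with almost no deviation from a straight line."""
    return len(points) < 3 or straightness(points) >= threshold

scripted_path = [(0, 0), (50, 50), (100, 100)]         # perfectly straight
human_path = [(0, 0), (30, 38), (62, 55), (100, 100)]  # slight wobble

print(looks_scripted(scripted_path))
print(looks_scripted(human_path))
```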
How does reCAPTCHA work without any user interaction?
The latest versions of reCAPTCHA are able to take a holistic look at a user’s behavior and history of interacting with content on the Internet. Most of the time, the program can decide based on those factors whether or not the user is a bot, without providing the user with a challenge to complete. If not, then the user will get a typical reCAPTCHA challenge.
What triggers a CAPTCHA test?
Some web properties just automatically have CAPTCHAs in place as a proactive defense against bots. Other times, a test may be triggered if user behavior seems to resemble a bot’s behavior: if users request webpages or click hyperlinks at a far higher rate than average, for instance.
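The rate-based trigger mentioned above can be sketched as a sliding-window counter: record each request's timestamp and challenge the client once more than a set number fall inside the window. The class name, the limits, and the single-client scope are assumptions for illustration; a real deployment would track one counter per client and combine it with other signals.

```python
from collections import deque

class RateTrigger:
    """Decide whether to show a CAPTCHA based on recent request rate."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def should_challenge(self, now: float) -> bool:
        """Record a request at time `now` and report whether the limit is exceeded."""
        self._timestamps.append(now)
        # Drop requests that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_requests

# Allow at most 3 requests per 10 seconds (assumed limits).
trigger = RateTrigger(max_requests=3, window_seconds=10.0)
burst = [trigger.should_challenge(t) for t in (0.0, 1.0, 2.0, 3.0)]
after_pause = trigger.should_challenge(20.0)
print(burst, after_pause)
```

In this run the fourth request inside the window triggers a challenge, and a request after a pause does not.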
Are CAPTCHAs and reCAPTCHAs enough for stopping malicious bots?
Some bots can get past the text CAPTCHAs on their own. Researchers have demonstrated ways to write a program that beats the image recognition CAPTCHAs as well. In addition, attackers can use click farms to beat the tests: thousands of low-paid workers solving CAPTCHAs on behalf of bots.
Besides a CAPTCHA, there need to be other strategies in place for stopping unwanted bots (such as content scraping bots, credential stuffing bots, or spam bots).
What are the drawbacks of using CAPTCHAs or reCAPTCHAs to stop bots?
Bad user experience: A CAPTCHA test can interrupt the flow of what users are trying to do, giving them a negative view of their experience on the web property, and leading to them abandoning the webpage altogether in some cases.
Not usable for visually impaired individuals: The problem with CAPTCHAs is that they rely on visual perception. This makes them nearly impossible, not just for people who are legally blind, but for anyone with seriously impaired vision.
These tests can be fooled by bots: As described above, CAPTCHAs are not fully bot-proof and shouldn’t be relied upon for bot management.
Are there alternatives to using CAPTCHAs or reCAPTCHAs?
Bot management solutions such as Cloudflare Bot Management or Super Bot Fight Mode can identify bad bots without impacting the user experience, based on the behavior of the bot. This way, bots can be mitigated without forcing users to complete CAPTCHAs.
How are CAPTCHA and reCAPTCHA related to artificial intelligence (AI) projects?
As millions of users identify hard-to-read text and pick out objects in blurry images, that data is fed into AI computer programs so that they become better at those tasks as well.
In general, computer programs struggle with identifying objects and letters in different contexts, because context can change almost infinitely in the real world. For instance, a stop sign is a red octagon with white letters reading “STOP.” A computer program could identify a shape-and-word combination like that fairly easily. However, a stop sign in a photo may look very different from that simple description depending on context: the angle of the photo, the lighting, the weather involved, and so on.
Via machine learning, AI programs can get better at overcoming these limitations. For the stop sign example, the programmer would feed the AI program a bunch of data on what is and is not a stop sign. For this to be effective, they need lots of examples of images with stop signs and images without stop signs, and they need human users to identify them until the program has enough data to be effective at it.
reCAPTCHA helps fill this need by getting humans to identify objects and texts, which slowly provides enough data to build robust AI programs.
What is a Turing test? How are Turing tests relevant to CAPTCHA tests?
A Turing test assesses a computer’s ability to mimic human behavior. Alan Turing, an early computing pioneer, invented the concept of a Turing test in 1950. A computer program “passes” the Turing test if its performance during the test is indistinguishable from that of a human – if it acts the way that a human would act. A Turing test is not dependent on getting answers correct; it’s about how “human” the answers sound, regardless of whether they’re right or wrong.
Although it’s called a “Public Turing test,” a CAPTCHA is really the opposite of a Turing test: it determines whether a supposedly human user is actually a computer program (a bot) or not, instead of trying to determine if a computer is human. To accomplish this, a CAPTCHA needs to assign a brief task that people tend to be good at and computers struggle with. Identifying text and images usually fits those criteria.
Frequently Asked Questions About Whether CAPTCHAs Work
Are CAPTCHAs effective?
Some captchas have been solved with more than 90% accuracy by scientists specializing in computer vision research at the University of California, Berkeley, and elsewhere. … But several Internet companies say their captchas appeared to be highly effective at thwarting spammers.
Do CAPTCHAs really work against bots?
A CAPTCHA test is designed to determine if an online user is really a human and not a bot. … Although CAPTCHAs are designed to block automated bots, CAPTCHAs are themselves automated. They’re programmed to pop up in certain places on a website, and they automatically pass or fail users.
Do CAPTCHA tests work?
According to Microsoft research experts Kumar Chellapilla and Patrice Simard, humans have about an 80 percent success rate at solving any CAPTCHA, but machines only have a 0.01 percent success rate. Therefore, it is beneficial to use CAPTCHA in order to keep your website safe. (Jan 9, 2019)