Google Earned $888 Billion While Wasting 819 Million Hours of User Time on Human-Machine Tests Annually


While surfing the web, you’ve likely come across various human-machine tests, such as identifying distorted letters and numbers, selecting specific objects in images, or solving puzzles. These tests are collectively known as CAPTCHA. 

One of the most controversial, and often most infuriating, of these systems is reCAPTCHA, Google’s own CAPTCHA service. Many people have vented online that its challenges are so difficult that they have considered abandoning the Internet altogether. If you get an answer wrong, you are forced to re-verify, and after several failed attempts it may even block your access. What many users do not realize is that every time reCAPTCHA asks you to solve one of these challenges, you are also performing free data-labeling work for Google.

reCAPTCHA’s predecessor was a large-scale collaborative project initiated by Carnegie Mellon University (CMU). The project aimed to digitize old books, but many of the texts were difficult for Optical Character Recognition (OCR) software to decipher. The solution was to involve the public, using the web to crowdsource this task. This led to the integration of these tasks into CAPTCHA tests on various websites.

In 2009, Google acquired this collaborative platform and expanded its use to other manual recognition tasks, such as digitizing Google Books and the New York Times archives. In 2012, reCAPTCHA took on a new challenge: helping to label imagery from Google Street View by recognizing crosswalks, bicycles, and minivans, the very objects most of us have been asked to pick out while using the service.

CAPTCHA helps stop bots from maliciously accessing websites, while reCAPTCHA also uses the time people spend on it to complete data-labeling tasks that improve Google’s AI. For instance, reCAPTCHA works hand in hand with Google Street View: it secures websites on one hand and helps improve Google Maps on the other.

reCAPTCHA later evolved into versions 2 and 3. Version 2 introduced the familiar “I’m not a robot” checkbox. When you check it, reCAPTCHA runs risk-analysis algorithms to decide whether further challenges are necessary. There is also an invisible reCAPTCHA that does not require users to click anything and instead uses cursor movement to judge whether the user is a bot. If any anomalies are detected, you may be presented with increasingly complex puzzles. Version 3, meanwhile, assigns each user a score, but the criteria behind that score remain vague: Google says only that it is based on behavioral characteristics.
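To make the version 3 scoring model more concrete, here is a minimal sketch of how a website might act on the score once Google has returned it. It assumes the response has already been fetched and parsed into a dictionary with the fields `success`, `score` (0.0 to 1.0, higher meaning more likely human), and `action`. The `decide` function, the `Decision` type, and the 0.5 threshold are hypothetical names and values chosen for illustration; each site picks its own threshold, and no network call is made here.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    allow: bool
    reason: str


def decide(verify_response: dict, expected_action: str,
           threshold: float = 0.5) -> Decision:
    """Turn a parsed score-based verification response into allow/deny.

    The threshold is an assumption for this sketch, not a recommended value.
    """
    # The token itself was invalid, expired, or already used.
    if not verify_response.get("success", False):
        return Decision(False, "token rejected")
    # The token was issued for a different page action than the one expected.
    if verify_response.get("action") != expected_action:
        return Decision(False, "action mismatch")
    # A missing score is treated as the most suspicious value.
    score = float(verify_response.get("score", 0.0))
    if score < threshold:
        return Decision(False, "score below threshold")
    return Decision(True, "ok")


if __name__ == "__main__":
    sample = {"success": True, "score": 0.9, "action": "login"}
    print(decide(sample, expected_action="login"))
```

The sketch also shows why the score feels like a black box to users: the site sees only a single number, not the behavioral signals that produced it.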

Despite its evolution, reCAPTCHA has been the subject of much controversy. It has been criticized for its potential to collect user data. In 2020, Cloudflare, an internet infrastructure provider, raised concerns that Google might be using reCAPTCHA data for advertising purposes, prompting some companies to switch to hCaptcha, a more privacy-focused alternative. Over time, reCAPTCHA has become a black box that users don’t fully understand. They don’t know what personal data is being gathered when they click a box, solve puzzles, or move their cursor around.

Today, some companies are exploring alternatives to CAPTCHA. For instance, Apple’s Private Access Tokens allow authentication via an encrypted Apple ID account, bypassing the need for traditional puzzles. While this method saves time, it remains limited in use, and we still often find ourselves completing more challenging verification tests.

The issues surrounding reCAPTCHA suggest that it might be time for a new approach. Its effectiveness is waning, and the growing concerns about privacy and security signal that it may soon need to be replaced.

A 2023 study from the University of California, Irvine surveyed more than 3,600 internet users, and, unsurprisingly, many found these image recognition tasks frustrating. Completing the visual challenges took 5.5 times longer than simply checking a box. The study concluded that, in terms of security, reCAPTCHA is no longer as effective as it once was, and that its primary value now seems to lie in data collection. The researchers calculated that reCAPTCHA wastes 819 million hours of human time annually, which equates to $6.1 billion in wages. The value of the data it collects is estimated at $888 billion.
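For a sense of scale, the figures quoted above can be related to one another with simple division. The short sketch below uses only the three numbers stated in this article; the derived quantities (implied hourly wage, value-to-wage ratio) are back-of-the-envelope illustrations, not the study’s own methodology, and the variable names are made up for this example.

```python
# Figures as quoted in the article.
HOURS_SPENT = 819_000_000          # hours of human time
WAGE_COST_USD = 6_100_000_000      # equivalent wages in US dollars
DATA_VALUE_USD = 888_000_000_000   # estimated value of collected data

# Hourly wage implied by dividing the wage total by the hours.
implied_hourly_wage = WAGE_COST_USD / HOURS_SPENT

# How many dollars of estimated data value correspond to each dollar of wages.
value_to_wage_ratio = DATA_VALUE_USD / WAGE_COST_USD

if __name__ == "__main__":
    print(f"Implied hourly wage: ${implied_hourly_wage:.2f}")    # about $7.45
    print(f"Data value per wage dollar: {value_to_wage_ratio:.1f}")  # about 145.6
```

In other words, the wage estimate works out to roughly $7.45 per hour, and the estimated data value is about 146 times the value of the time users spent.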