Google: Harvesting the masses

Google turned 18 last year, and is now an adult. You may ask: Do we really care? Well, the answer is “Yes, we do…” That’s because Google is everywhere, and whether we like it or not, its presence cannot be denied; in fact, few companies have had such a huge impact on our daily lives as Google. Some people believe that, besides answering anything we ask, Google has a hidden agenda, and as it turns out, they were right.

Harvesting the brain power of 750 million internet users to digitize books.

If you haven’t been living under a Dwayne Johnson for the last decade, you’ve probably seen this:

[Image: FancyCaptcha screenshot]

It’s called a Captcha, and it was used to make sure that a website user is actually human, because computers had (and to an extent still have) a tough time reading text like that. It literally stands for “Completely Automated Public Turing test to tell Computers and Humans Apart.”

When thousands of websites started using it in the mid-2000s to ensure bots didn’t screw them over, roughly 200 million captchas were typed every single day. Assuming each one takes about 10–12 seconds, that’s at least 33 million minutes (over 500,000 hours) of essentially wasted human effort every day.
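The arithmetic behind that estimate, as a quick back-of-the-envelope sketch. It uses the 200 million figure and the low end (10 seconds) of the per-captcha guess; the variable names are just for illustration.

```python
# Figures quoted in the post; 10 seconds is the low end of the
# 10-12 second estimate, so the result is a lower bound.
CAPTCHAS_PER_DAY = 200_000_000
SECONDS_PER_CAPTCHA = 10

total_seconds = CAPTCHAS_PER_DAY * SECONDS_PER_CAPTCHA
total_minutes = total_seconds / 60
total_hours = total_seconds / 3600

print(f"{total_minutes:,.0f} minutes per day")
print(f"{total_hours:,.0f} hours per day")
```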

Luis von Ahn, one of the creators of Captcha, explains in this amazing TEDx talk how he realized that this massive human effort could be put to use to digitize books, making them searchable and easily accessible.

So, they built reCaptcha. You may have noticed that at some point, you had to type two words instead of just one:

[Image: reCaptcha challenge showing two words]

Here, one word is used to genuinely check if you’re human. The other word comes from a scanned copy of an old book and is shown to you to, basically, ask you to digitize it.
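The two-word trick can be sketched roughly like this. It is a minimal illustration, not Google's actual implementation: the function names, the word ids, the threshold of 3 matching answers, and the case-insensitive comparison are all assumptions made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical threshold: how many matching answers we want before
# trusting a transcription. The real value is not given in the post.
VOTES_NEEDED = 3

# Votes collected so far for each unknown scanned word, keyed by a word id.
votes = defaultdict(Counter)


def submit(control_answer, control_truth, unknown_id, unknown_answer):
    """Return True if the user passes the human check.

    The control word has a known answer and decides pass/fail. Only when
    the user gets it right is their guess for the unknown word counted.
    """
    if control_answer.strip().lower() != control_truth.lower():
        return False
    votes[unknown_id][unknown_answer.strip().lower()] += 1
    return True


def transcription(unknown_id):
    """Return the agreed transcription, or None if there is no consensus yet."""
    if not votes[unknown_id]:
        return None
    word, count = votes[unknown_id].most_common(1)[0]
    return word if count >= VOTES_NEEDED else None


# Three users pass the control word and agree on the scanned word.
for _ in range(3):
    submit("overlooks", "overlooks", "scan-42", "inquiry")
# A failed control word means the guess for the scanned word is ignored.
submit("wrong", "overlooks", "scan-42", "enquiry")

print(transcription("scan-42"))
```

The point of the design is that the site never needs to know the answer to the scanned word up front: the control word vouches for the human, and agreement between humans vouches for the transcription.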

Google then acquired it in 2009. And guess what? reCaptcha was used to help digitize the books on Google Books, to make them searchable. We all did it together. And we did it for free!

And Google has moved on to image recognition using reCaptcha to get labeled data sets for its AI research:

[Image: reCaptcha image challenge]

And we are still doing it for free. Time to add “Senior Data Labeling Expert at Google Inc” to the resume, amirite? 😛

 
