Technology news and Jobs
Information Technology News
CAPTCHAs make up for OCR shortcomings
by Stephen Withers
Monday, 28 May 2007

A group at Carnegie Mellon University has come up with a way of helping efforts to digitise books while at the same time allowing web sites to check that a user is a human rather than a piece of software.

The reCAPTCHA system is a variation on the widely used CAPTCHA method of verifying human users by asking them to type in distorted or otherwise obscured words or other sequences of characters. reCAPTCHA presents users with a dual CAPTCHA. It 'knows' the answer to one, but the other is an unrecognised word obtained from scanning and OCRing a book. The reasoning is that if people can correctly recognise the known CAPTCHA, then their response to the unknown word is probably correct too. Once several people have given the same answer for an unknown word, it is taken as definitive.

This is a doubly clever idea. In addition to getting some useful work out of a chore many of us perform several times a day (tens of millions of CAPTCHAs are thought to be decoded daily), the fact that the words used have already proved resistant to OCR makes them good candidates for CAPTCHAs.

The reCAPTCHA project is currently helping digitise books from the Internet Archive. In addition to plugins for popular systems and languages including WordPress, phpBB, PHP, Perl and Ruby, the project also offers reCAPTCHA Mailhide, a way of concealing email addresses on even simple web pages.

CAPTCHA is an acronym - possibly back-formed - for Completely Automated Public Turing Test To Tell Computers and Humans Apart. CAPTCHAs are often used to ensure that only humans submit comments to web sites, sign up for accounts, vote in online polls, and perform other activities. reCAPTCHA is run by the original creators of CAPTCHA.
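
The dual-CAPTCHA logic described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the project's actual code: the class name `DualCaptcha`, the helper `normalise`, the word identifiers and the agreement threshold are all assumptions made for the example, and the article does not say how many matching answers the real system requires.

```python
from collections import Counter, defaultdict

# Hypothetical value: the number of matching answers needed is an assumption.
AGREEMENT_THRESHOLD = 3


def normalise(text):
    """Compare answers case-insensitively and ignore surrounding spaces."""
    return text.strip().casefold()


class DualCaptcha:
    """Illustrative sketch of a known-word / unknown-word CAPTCHA pair."""

    def __init__(self, agreement_threshold=AGREEMENT_THRESHOLD):
        self.agreement_threshold = agreement_threshold
        self.votes = defaultdict(Counter)  # unknown word id -> answer tallies
        self.resolved = {}                 # unknown word id -> accepted text

    def submit(self, known_answer, known_response, unknown_id, unknown_response):
        """Return True if the user passes the human check.

        The response to the unknown word is only counted when the
        known (control) word was typed correctly.
        """
        if normalise(known_response) != normalise(known_answer):
            return False  # control word wrong: ignore the other answer

        guess = normalise(unknown_response)
        self.votes[unknown_id][guess] += 1

        # Once enough users agree, treat the transcription as definitive.
        if self.votes[unknown_id][guess] >= self.agreement_threshold:
            self.resolved[unknown_id] = guess
        return True
```

A site would call `submit()` once per form submission. A user who mistypes the control word is rejected and contributes nothing to the transcription, which is what stops automated guesses from polluting the digitised text.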