
By

BHARATH B S
4VV05CS009
Agenda

 Definition
 Background
 Types
 Applications
 Constructing CAPTCHAs
 Breaking CAPTCHAs
 Issues with CAPTCHAs
 Conclusion
Intro

 CAPTCHA Completely Automated


Public Turing test to tell Computers
and Humans Apart

 Invented at CMU by Luis von Ahn, Manuel Blum, et al.

 A program that acts as a challenge-response test to separate humans from computer programs
 Generic CAPTCHAs distort letters and
numbers

 Distorted characters are presented to the user
 The user has to recognize the distorted letters
 If the entered letters are correct, the user is inferred to be human and allowed access
 Humans can read the distorted and
noisy text

 Current OCR software cannot read it reliably


Background

 Why was CAPTCHA needed?

 Sabotage of online polls


 Spam emails
 Abusing free online accounts
 Tampering with rankings on recommendation systems (like eBay, Amazon)
 AltaVista was the first to use a crude CAPTCHA on its sites

 Resulted in 95% spam reduction

 Yahoo partnered with CMU to counter these threats in its Messenger chat service

 Luis von Ahn and Manuel Blum of CMU coined the term CAPTCHA in 2000
 What is a Turing test?
 Proposed by Alan Turing
 Tests a machine's level of intelligence
 A human judge questions two participants, one human and one machine, without knowing which is which
 If the judge can't tell which is the machine, the machine passes the test
 CAPTCHA employs a reverse Turing test:
 Judge = CAPTCHA program
 Participant = user
 If the user passes the CAPTCHA, the user is taken to be human
 If the user fails, the user is taken to be a machine
Types of CAPTCHAs

 Text based:

 Simple, plain-language questions:
 What is the sum of three and thirty-five?
 If today is Saturday, what is the day after tomorrow?
 Which of mango, table, water is a fruit?
 Very effective, but needs a large question bank
 Hard for users with cognitive disabilities
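A question bank of this kind can be generated rather than written by hand. The sketch below is only an illustration: the number table, the function names and the answer normalization are assumptions, not part of any real CAPTCHA product.

```python
import random

# Hypothetical number-word table; a real question bank would be far larger.
NUMBER_WORDS = {3: "three", 7: "seven", 12: "twelve", 35: "thirty-five"}
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def sum_question(rng):
    """Return (question, answer) for a sum written out in words."""
    a, b = rng.sample(sorted(NUMBER_WORDS), 2)
    question = f"What is the sum of {NUMBER_WORDS[a]} and {NUMBER_WORDS[b]}?"
    return question, str(a + b)


def day_question(rng):
    """Return (question, answer) asking for the day after tomorrow."""
    today = rng.randrange(len(DAYS))
    answer = DAYS[(today + 2) % len(DAYS)]
    return f"If today is {DAYS[today]}, what is the day after tomorrow?", answer


def is_correct(expected, given):
    """Compare answers, ignoring case and surrounding whitespace."""
    return expected.strip().lower() == given.strip().lower()
```

Templates like these stretch a small bank, but a bot that learns the templates can answer them, which is why the slide stresses a large and varied bank.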
 Gimpy:
 Designed by Yahoo and CMU
 Picks 10 random words from a dictionary, distorts them and fills the image with noise
 User has to recognize at least 3 words
 If the user is correct, access is granted
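The Gimpy pass rule (at least 3 of the displayed words) could be checked roughly as follows. This is a minimal sketch; the function name and the case-insensitive matching are assumptions rather than details of the original Gimpy program.

```python
def gimpy_passes(shown_words, typed_words, required=3):
    """Return True if at least `required` distinct typed words were shown."""
    shown = {w.lower() for w in shown_words}
    typed = {w.strip().lower() for w in typed_words}
    return len(shown & typed) >= required
```

Using sets means a user cannot pass by typing the same correct word three times.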
 EZ-Gimpy:
 A modified version of Gimpy
 Yahoo used this version in Messenger
 Has only 1 random string of characters
 Not a dictionary word, so not prone to
dictionary attack
 Not a good implementation, already
broken by OCRs
 MSN’s Passport service CAPTCHAs:

 Provided for Microsoft’s MSN services


 Use 8 characters
 Warping is used to distort
 Very strong implementation, hasn’t been
broken
 It is segmentation-resistant
 Graphic based CAPTCHAs:

 BONGO:
 Named after M. M. Bongard, a pattern recognition expert
 User has to solve a pattern recognition problem
 Has to identify the characteristic that distinguishes two sets of figures
 Then tell which set a given figure belongs to
 PIX:
 Uses a large database of labelled images
 It shows a set of images; the user has to recognize the feature common to them
 E.g., "Pick the common characteristic among the following four pictures"; answer: "Aeroplane"
 Audio CAPTCHAs:
 Consist of a downloadable audio clip
 User listens and enters the spoken word
 Helps visually impaired users
 [Figure: Google's audio-enabled CAPTCHA]
 Not popular
Applications

 Protect online polls

 Prevent Web registration abuse, protect passwords from brute-force attacks
 Prevent comment spam and spam emails
 E-ticketing: prevent scalping


 Verify digitized books: reCAPTCHA
 Used in the Google Books project
 Two words are shown; the program knows only the first
 If the user enters the first word correctly, the program assumes the second, unknown word was also entered correctly
 The second word then becomes "known"
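The two-word scheme can be sketched as below. The image IDs, function names and especially the agreement threshold (`min_votes`) are assumptions added for illustration; the slides only say that the second word becomes known once the first is answered correctly.

```python
from collections import Counter, defaultdict

# Votes collected for each unknown word image, keyed by a hypothetical image ID.
votes = defaultdict(Counter)


def submit(control_answer, control_typed, unknown_id, unknown_typed):
    """Grade the user on the known word only; record their reading of the other."""
    if control_typed.strip().lower() != control_answer.lower():
        return False  # failed the known word: reject and discard the guess
    votes[unknown_id][unknown_typed.strip().lower()] += 1
    return True


def resolved_word(unknown_id, min_votes=3):
    """Return the agreed reading once enough users concur, else None."""
    if not votes[unknown_id]:
        return None
    word, count = votes[unknown_id].most_common(1)[0]
    return word if count >= min_votes else None
```

Requiring several matching readings before accepting a word is one way to guard against a single careless or malicious answer.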
 Help advance AI knowledge

 CAPTCHAs are built on hard AI problems
 A win-win scenario:
 If a CAPTCHA is broken by a bot, a hard AI problem is solved
 If it is not yet broken, the current implementation withstands attacks
 Thus AI knowledge advances whenever CAPTCHAs are broken
Constructing CAPTCHAs

 Things to keep in mind:


 Don't store the CAPTCHA solution in the Web page's metadata
 A CAPTCHA is no good if it doesn't distort
 Need a large database of different CAPTCHA questions
 Avoid repetition of questions


 CAPTCHA Logic:

 Generate the question

 Persist the correct answer

 Present the question to the user
 Evaluate the answer; if incorrect, start again and generate a different CAPTCHA
 If correct, allow the user access


 Embeddable CAPTCHAs:
 Available freely, just embed code into
Web page’s HTML, from e.g.,
www.recaptcha.net
 No maintenance

 Custom CAPTCHAs:
 Fit the theme of the page
 Better protected from spammers
 Can be written in any language, e.g., Perl, .NET, ASP, JavaScript
 Guidelines:
 Accessibility

 Image security

 Script security

 Security after widespread adoption

 Custom implementation or a general CAPTCHA?
Breaking CAPTCHAs

 Cracking CAPTCHAs through programs
 Convert the CAPTCHA into greyscale
 Detect patterns in the image corresponding to characters
 Or, read that user's session files to learn the CAPTCHA word
 Solution: Only store a hash of the CAPTCHA word in session files
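Storing only a hash of the CAPTCHA word could be done along these lines. The record layout, the per-challenge salt and the function names are assumptions for illustration, not a prescribed format.

```python
import hashlib
import hmac
import secrets


def make_session_record(captcha_word):
    """Return the salt and salted hash to store instead of the word itself."""
    salt = secrets.token_bytes(16)
    digest = hashlib.sha256(salt + captcha_word.lower().encode("utf-8")).hexdigest()
    return {"salt": salt.hex(), "digest": digest}


def verify(record, typed_word):
    """Re-hash the user's input with the stored salt and compare."""
    salt = bytes.fromhex(record["salt"])
    candidate = hashlib.sha256(salt + typed_word.strip().lower().encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, record["digest"])
```

Because CAPTCHA words are short, an attacker who can read the session file could still try every candidate against a plain hash; keying the hash with a server-side secret (for example with `hmac.new`) is one way to narrow that gap.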
 Greg Mori and Jitendra Malik have broken text CAPTCHAs, e.g., EZ-Gimpy
 To break this CAPTCHA:
 Segmentation: locate possible letters in the image
 Construct a graph of consistent letters
 Find plausible words from the graph and use scores to rank them
 E.g., roll = 11.94, profit = 9.42 (better match)
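To make the greyscale and segmentation steps concrete, here is a deliberately naive sketch: convert pixels to greyscale, threshold them, and split the image wherever a column contains no ink. This is not Mori and Malik's method, which works on heavily cluttered images; the function names and the threshold value are assumptions.

```python
def to_grey(rgb_rows):
    """Convert rows of (r, g, b) pixels to greyscale with the usual luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row] for row in rgb_rows]


def segment_columns(grey_rows, threshold=128):
    """Return (start, end) column spans that contain dark (ink) pixels."""
    width = len(grey_rows[0])
    has_ink = [any(row[x] < threshold for row in grey_rows) for x in range(width)]
    spans, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x  # a run of ink columns begins
        elif not ink and start is not None:
            spans.append((start, x))  # the run ended at the previous column
            start = None
    if start is not None:
        spans.append((start, width))
    return spans
```

Column splitting like this fails as soon as letters touch or noise lines cross the gaps, which is what segmentation-resistant designs rely on.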
 Social engineering to break
CAPTCHAs:
 Spammer encounters a CAPTCHA
 That CAPTCHA is copied to another site
 Humans are baited, e.g., with free MP3s
 To get those MP3s, users are told to solve the copied CAPTCHA
 The solution is routed back to the spammer
 Countermeasure: fix a time-to-live period for each question
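A time-to-live check might be sketched as follows. The 60-second lifetime, the record layout and the function names are assumptions; the `now` parameter exists only so the behaviour can be shown without waiting.

```python
import time

TTL_SECONDS = 60  # hypothetical lifetime; tune to how long a human needs


def issue(answer, now=None):
    """Record the answer together with the time the question was issued."""
    issued_at = time.time() if now is None else now
    return {"answer": answer, "issued_at": issued_at}


def check(challenge, response, now=None):
    """Accept only a correct answer that arrives before the challenge expires."""
    now = time.time() if now is None else now
    if now - challenge["issued_at"] > TTL_SECONDS:
        return False  # too old: possibly relayed to a third party to solve
    return response.strip().lower() == challenge["answer"].lower()
```

A short lifetime narrows the window in which a copied CAPTCHA can be relayed to a baited human and answered, though it does not close it entirely.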

 CAPTCHA cracking as a business:
 Firms offer CAPTCHA-cracking services in exchange for money
Issues with CAPTCHAs

 Usability issues:
 W3C guidelines call for the Web to be accessible to all people
 Some CAPTCHAs are inaccessible to people with visual or cognitive impairments

 Compatibility issues:
 JavaScript may need to be activated in
browsers
 Some may need Adobe Flash plugin
Summary

 CAPTCHAs are an effective way to counter bots and reduce spam
 They serve a dual purpose: they also help advance AI knowledge
 Applications are varied, from stopping bots to character recognition and pattern matching
 Some issues with current
implementations represent
challenges for future improvements
