Ieee - 01414527

A TEXT-GRAPHICS CHARACTER CAPTCHA FOR PASSWORD
AUTHENTICATION
Matthew Dailey and Chanathip Namprempre
Sirindhorn International Institute of Technology and
Electrical Engineering Department, Faculty of Engineering
Thammasat University
Patumtani, Thailand 12121
Email: mdailey@siit.tu.ac.th, nchanath@engr.tu.ac.th
ABSTRACT
We propose a new construct, the Text-Graphics Character (TGC)
CAPTCHA, for preventing dictionary attacks against password
authenticated systems allowing remote access via dumb terminals.
Password authentication is commonly used for computer access
control. But password authenticated systems are prone to dictionary
attacks, in which attackers repeatedly attempt to gain access using
the entries in a list of frequently-used passwords. CAPTCHAs
(Completely Automated Public Turing tests to tell Computers and
Humans Apart) are currently being used to prevent automated "bots"
from registering for email accounts. They have also been suggested
as a means for preventing dictionary attacks. However, current
CAPTCHAs are unsuitable for text-based remote access. Our TGC
CAPTCHA fills this gap. In this paper, we define the TGC CAPTCHA,
prove that it is a (secure) CAPTCHA, demonstrate its utility in a
prototype based on the SSH (Secure Shell) protocol suite, and
provide empirical evidence that the test is easy for humans and hard
for machines. We believe that the system will not only help improve
the security of servers allowing remote terminal access, but also
encourage a healthy spirit of competition in the fields of pattern
recognition, computer graphics, and psychology.
1. INTRODUCTION
Password authentication is one of the most common building blocks
in implementing access control. Each user has a relatively short
sequence of characters commonly referred to as a password. To gain
access, the user provides his/her password to the system. Access is
granted if the password is correct; it is denied otherwise.
A common attack against password authenticated systems is the
dictionary attack. An attacker can write a program that, imitating
a legitimate user, repeatedly tries different passwords, say from a
dictionary, until it finds one that works.
There are several well-known ways to cope with dictionary attacks.
For example, the system can deny access for the user in question
after some number of tries, a technique known as account locking.
However, this invites a denial of service attack: an attacker can
lock anyone out of the system by submitting a sequence of incorrect
passwords on behalf of the victim. Other solutions also have their
own shortcomings [1].
0-7803-8560-8/04/$20.00 © 2004 IEEE
In this paper, we present an alternative defense against dictionary
attacks. The idea is to make it harder for automated programs to
mount dictionary attacks by requiring the attacking programs to pass
a test that is easy for humans but is hard (in terms of accuracy
and/or compute time) for computer programs. A construct with this
property is called a CAPTCHA (Completely Automated Public Turing
test to tell Computers and Humans Apart) [2]. In particular, if
there exists a program that can pass a CAPTCHA with high
probability, then that program can be used to solve a hard AI
problem. CAPTCHAs are already in use in some systems that benefit
from distinguishing between humans and "bots" [3]. Recently, they
have also been suggested as a means for deterring dictionary attacks
in password authenticated systems [1].
One CAPTCHA suitable for password authenticated systems displays a
degraded image of a word to the human, who then responds by typing
the word he or she sees. A similar CAPTCHA using a sound clip
instead of an image can also be used. However, these CAPTCHAs are
not suitable for systems that allow remote access via consoles or
dumb terminal programs. Our goal is to make it possible for such
minimal systems to obtain the same benefits from CAPTCHA-assisted
password authentication as systems with graphical displays and/or
speakers.
To this end, we propose a new CAPTCHA based on Text-Graphics
Characters. We call it the TGC CAPTCHA. A text-graphics character
is an image of a character rendered on a text-only screen, namely a
screen in which ASCII characters are used to represent pixels.
Unfortunately, images rendered on a text-only screen are necessarily
low resolution. This limits the number of characters that can be put
on a single screen without hindering recognition. In fact, we find
that on an 80x24 screen, it is unreasonable to display more than one
or two distorted English characters. Without the contextual
information afforded when complete words are displayed, humans have
a harder time recognizing what is on each screen. Additionally,
neither color nor grayscale is usually available to help provide
visual clues. These limitations together make it difficult to
generate text-graphics characters that are easy for humans but hard
for computers to recognize.
CONTRIBUTIONS. First, we precisely define the TGC CAPTCHA and prove
that it is indeed a (secure) CAPTCHA according to the definitions
in [2]. Specifically, we offer a reductionistic proof to show that,
if a particular class of problems believed to be hard is in fact a
hard AI problem as defined in [2], then our TGC CAPTCHA is secure.
As such, the term "prove" here is used in the same sense as that in
the area of provable security [4], namely that the security of a
construct, i.e. the CAPTCHA, is proved based on the security of a
primitive, i.e. the corresponding hard AI problem.
Second, we also implement a prototype as part of the popular Secure
Shell (SSH) protocol suite by incorporating our CAPTCHA into the
user authentication mechanism of SSH, thus hardening SSH servers
against dictionary attacks. The prototype is available as a source
code patch for OpenSSH 3.6.1 at [5].
Finally, we provide empirical evidence that our CAPTCHA is easy for
humans and relatively hard for machines by running the CAPTCHA test
against human subjects and an Optical Character Recognition (OCR)
program [6], respectively. The results are encouraging: on a
character-by-character level, humans achieve 95% accuracy after two
practice trials whereas the OCR program's accuracy is less than 35%.
We believe that improving the OCR system's accuracy to approach that
of humans will require more effort on the part of OCR system
designers, and in any case, attackers using OCR systems to mount
automated dictionary attacks will require a substantial amount of
total compute time.
THE TGC CAPTCHA. Our CAPTCHA presents a sequence of $k$ distorted
characters, one at a time, to the user and accepts only correct
responses ($k = 8$ in our implementation). To make the test easy for
humans, we use only uppercase English characters. For each
character, we pick $n_d$ "distracters" and apply several operations
to both the character and the distracters before laying the former
over the latter. The operations are scaling, rotation, translation,
and "row sliding," an operation involving horizontally moving each
pixel in incremental steps, row-wise. The characters, the
distracters, the operations, and all of the parameters are chosen at
random from configurable ranges. The resulting image is then
displayed to the user.
THE PROTOTYPE IN SSH. All our modifications are compliant with SSH
specifications [7]. The server program, when configured with
CAPTCHA support, informs the client that CAPTCHA-based
authentication is a valid method. The client, when configured with
CAPTCHA support, requests CAPTCHA authentication. The server then
transmits a sequence of transformed images of characters to the
client. The client displays the received images one by one and
collects responses from the user before sending them all back to the
server with the user's password (thus minimizing the impact of
network delays). The server grants access to the user only if he/she
both passes the CAPTCHA test and enters the correct password. The
server denies access otherwise.
FEATURES OF THE TGC CAPTCHA AND THE PROTOTYPE. The TGC CAPTCHA
provides a defense against online dictionary attacks without
resorting to other countermeasures, e.g. account locking, with
well-known disadvantages. Also, as discussed in [1], even if solving
the underlying AI problem, in this case the problem of recognizing
distorted text-graphics characters, turns out to be easy for OCR
systems, it would still be useful to use the TGC CAPTCHA in password
authenticated systems. The reason is that an attacker mounting an
online dictionary attack still needs both to solve the CAPTCHA and
to find the password in order to log in successfully. In this
situation, our approach reduces to pricing via processing, a
paradigm originally proposed to combat senders of bulk junk email
("spam") [8]. In order to mount a dictionary attack, the adversary
must perform a moderately complex computational task on each
password authentication attempt. Thus, even if the attacker only
spends a few compute cycles on each trial, the cumulative effect
becomes significant over the many trials required for a dictionary
attack.
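
The cumulative effect can be illustrated with a back-of-the-envelope
calculation. The sketch below simply multiplies a per-character
solving cost by the number of characters per test and the number of
password attempts. The dictionary size and the per-character time are
hypothetical placeholders chosen for illustration, not measurements
from this paper; only the value of 8 characters per test comes from
our implementation.

```python
def attack_cost_seconds(dictionary_size, chars_per_test, seconds_per_char):
    """Total compute time to try every dictionary entry when each attempt
    requires solving one CAPTCHA of chars_per_test characters."""
    return dictionary_size * chars_per_test * seconds_per_char


# Hypothetical inputs: one million candidate passwords and half a second
# of recognition time per character. Neither number is from the paper.
total = attack_cost_seconds(1_000_000, 8, 0.5)
print(f"{total / 86400:.1f} days of compute")
```

Even a modest per-character cost therefore scales linearly with the
size of the dictionary, which is the essence of the pricing via
processing argument.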
We stress here that the use of the TGC CAPTCHA in password
authentication is only one possible application of our CAPTCHA.
Like other CAPTCHAs, the TGC CAPTCHA can be used in any application
in which it is useful to distinguish human users from bots [2, 3].
Furthermore, the TGC CAPTCHA extends these benefits to applications
with console-based interfaces. For example, it can help prevent bots
from signing up for free email accounts or help conduct online polls
via, say, text-based web browsers. Search engines that disallow bots
by requiring graphical CAPTCHAs to be recognized will no longer end
up automatically rejecting legitimate users who access the sites via
a text-based interface, if the TGC CAPTCHA is made available to
those users.
RELATED WORK. von Ahn et al. [2] were the first to propose the
concept of CAPTCHAs. They formally defined the CAPTCHA construct and
its security notion and specified two classes of AI problems. Our
CAPTCHA is an instance of their second problem class.
Pinkas and Sander proposed the use of Reverse Turing Tests (RTTs),
constructs similar to CAPTCHAs, to cope with dictionary attacks [1].
They focus on usability and scalability by using a persistent data
structure and by requiring users to solve RTTs only some fraction of
the time, respectively. Xu et al. proposed the use of character
recognition to separate humans from machines [9]. However, they did
not assume authenticated, replay-resilient channels. Consequently,
the communication had to be protected via message authentication
codes and serial numbers, timestamps, or state information. In
contrast, our CAPTCHA challenges and responses are sent over SSH
channels which are already encrypted, authenticated, and
replay-resilient [10]. This dramatically simplifies our protocol.
NOTATION. Let $k$ be a positive integer. We denote by
$a_1, \ldots, a_k \leftarrow A$ the act of sampling each element
$a_j$ uniformly and independently from the set $A$.
Fig. 1. Reference image set $\mathcal{I}$.
2. TEXT-GRAPHICS CHARACTER CAPTCHA
We call our CAPTCHA a Text-Graphics Character (TGC) CAPTCHA. We
present it here following the formalization in [2]. Let
$\mathcal{I}$ be a set of images of all upper case English
characters, $\mathcal{T}$ be a set of transformations on images,
$\lambda$ be the map from an image of a character to (the ASCII ID
of) the character portrayed in the image, and $\tau$ and $k$ be the
security parameters. A TGC CAPTCHA instance is a tuple
$\mathcal{V} = (\mathcal{I}, \mathcal{T}, \tau, k)$ defining the
following test. First, the verifier (i.e. server) draws
$i_1, \ldots, i_k \leftarrow \mathcal{I}$ and
$t_1, \ldots, t_k \leftarrow \mathcal{T}$. Then, it sends to the
prover (i.e. user) the transformed images
$t_1(i_1), \ldots, t_k(i_k)$ and sets the timer for $\tau$. The
prover responds with the labels $l_1, \ldots, l_k$, each of which is
(the ASCII ID of) a character in the English alphabet. The verifier
accepts if $l_j = \lambda(i_j)$ for all $1 \le j \le k$ and if the
timer has not expired. It rejects otherwise.
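
The test just defined can be sketched in a few lines of Python. This
is a minimal illustration under simplifying assumptions, not the code
of our prototype: characters stand in for the reference images (so the
label map is the identity), and the names `run_test`, `transform`,
and `ask_prover` are hypothetical. The caller supplies `transform`
(a function rendering one character as a challenge) and `ask_prover`
(a function returning one label per challenge).

```python
import random
import string
import time


def run_test(k, timeout_s, transform, ask_prover, rng=random):
    """One round of the test: draw k characters, send their transformed
    versions, and accept only if all labels match and the timer holds."""
    # Characters stand in for reference images, so the label of an
    # "image" is the character itself (an assumption of this sketch).
    targets = [rng.choice(string.ascii_uppercase) for _ in range(k)]
    challenges = [transform(c) for c in targets]
    start = time.monotonic()
    labels = ask_prover(challenges)
    in_time = (time.monotonic() - start) <= timeout_s
    return in_time and list(labels) == targets


# Toy usage: an identity "transform" and a prover that reads it perfectly.
print(run_test(8, 60.0, lambda c: c, lambda challenges: list(challenges)))
```

A single wrong label or an expired timer causes rejection, which
mirrors the all-or-nothing acceptance rule of the definition.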
We describe here our choices for the sets $\mathcal{I}$ and
$\mathcal{T}$. The reference image for each character, from the
standard X Window System "9x15" font, is shown in Figure 1. The
transformation process involves the following steps. First, $n_d$
distracters are chosen uniformly with replacement from the set of
all distracter images. In our implementation, we draw the $n_d$
samples from a set of 26 9x15 bitmaps that share some features with
English letters but are easily classified as non-letters by humans.
After converting each target and distracter bitmap to a 9x15 ASCII
charmap, we perform the following operations on each charmap:
1. Scale: The charmap is scaled by a random factor
between 1.3 and 1.7.
2. Rotate: The charmap is rotated by a random angle
between -20 and 20 degrees.
3. Translate: The charmap is translated to a random
location on the $n_c \times n_r$ screen with the constraint
that the entire character must still be visible.
4. Row Slide: Each row of the charmap is optionally
slid left or right relative to the row above. We shift
left with probability 0.33, right with probability
0.33, and otherwise do not shift.
In our implementation, all of the random samples are from a uniform
distribution and rely on the standard C library random() function
(seeded with the system time). Finally, the transformed target and
distracter images are overlaid as follows. Each overlay is given an
opaque whitespace boundary, and the target image is laid down last,
to ensure that the target stands out clearly from the background.
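
The four operations can be sketched as follows. This is an
illustrative Python approximation under stated assumptions, not the
C code of our prototype: it uses a small hypothetical 5x5 glyph
instead of the 9x15 font bitmaps, Python's `random` module instead of
the C library random(), nearest-neighbour sampling for scaling and
rotation, and it omits the distracters and the overlay step. All
function names are hypothetical. Cells that row sliding pushes off
the screen are clipped at render time.

```python
import math
import random

# Hypothetical 5x5 stand-in glyph; the paper uses 9x15 font bitmaps.
GLYPH_T = ["#####",
           "  #  ",
           "  #  ",
           "  #  ",
           "  #  "]


def scale_rotate(bitmap, factor, degrees):
    """Scale and rotate about the glyph centre by inverse nearest-neighbour
    sampling; returns the set of (row, col) cells that are 'on'."""
    h, w = len(bitmap), len(bitmap[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    a = math.radians(degrees)
    reach = math.ceil(factor * math.hypot(h, w) / 2) + 1
    cells = set()
    for y in range(-reach, reach + 1):
        for x in range(-reach, reach + 1):
            # Map the output cell back into source coordinates.
            sx = (x * math.cos(a) + y * math.sin(a)) / factor + cx
            sy = (-x * math.sin(a) + y * math.cos(a)) / factor + cy
            iy, ix = round(sy), round(sx)
            if 0 <= iy < h and 0 <= ix < w and bitmap[iy][ix] != " ":
                cells.add((y, x))
    return cells


def translate(cells, n_cols, n_rows, rng):
    """Move the glyph to a random spot where it is entirely on screen."""
    top, left = min(y for y, _ in cells), min(x for _, x in cells)
    h = max(y for y, _ in cells) - top + 1
    w = max(x for _, x in cells) - left + 1
    dy = rng.randint(0, max(0, n_rows - h)) - top
    dx = rng.randint(0, max(0, n_cols - w)) - left
    return {(y + dy, x + dx) for y, x in cells}


def row_slide(cells, rng):
    """Shift each row left (p = 0.33), right (p = 0.33) or not at all,
    relative to the row above, so the offsets accumulate downwards."""
    out, offset = set(), 0
    for y in sorted({r for r, _ in cells}):
        u = rng.random()
        offset += -1 if u < 0.33 else (1 if u < 0.66 else 0)
        out |= {(y, x + offset) for r, x in cells if r == y}
    return out


def render(cells, n_cols, n_rows, ink="#"):
    """Rasterize the cells onto a text screen, clipping at the borders."""
    grid = [[" "] * n_cols for _ in range(n_rows)]
    for y, x in cells:
        if 0 <= y < n_rows and 0 <= x < n_cols:
            grid[y][x] = ink
    return ["".join(row) for row in grid]


def transform(bitmap, rng, n_cols=80, n_rows=24):
    """Apply scale, rotate, translate and row slide with the parameter
    ranges given in the list above, then render to a text screen."""
    cells = scale_rotate(bitmap, rng.uniform(1.3, 1.7), rng.uniform(-20, 20))
    cells = translate(cells, n_cols, n_rows, rng)
    cells = row_slide(cells, rng)
    return render(cells, n_cols, n_rows)
```

Calling `transform(GLYPH_T, random.Random(0))` returns a list of 24
strings of 80 characters each, which can be joined with newlines and
printed to a terminal.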
We believe that this set of transformations gives sufficient
variability in the output images to make recognition fairly
difficult for machines while preserving the necessary property that
it is easy for humans. Experienced users (e.g. the authors)
consistently classify the characters with 100% accuracy, and as we
shall see in Experiment 1 below, even naive users perform well
enough to make our construction practical for deployment in real
systems.
3. CAPTCHA-BASED PASSWORD
AUTHENTICATION
The TGC CAPTCHA can complement any interactive password
authentication scheme, so long as a sufficiently large text console
is available for displaying the CAPTCHA. In a TGC CAPTCHA-enabled
authentication session, the user is challenged by a CAPTCHA test and
asked to enter his/her password. If the user both passes the CAPTCHA
test and enters the correct password, access is granted. Otherwise,
access is denied.
We have implemented a prototype TGC CAPTCHA password authentication
method compatible with the SSH user authentication protocol [7]. The
implementation is based on OpenSSH 3.6.1, but the method could
easily be incorporated into any SSH-compliant client or server. Due
to the lack of space, we leave the description of the steps involved
in the CAPTCHA-enabled password authentication exchange to the full
paper [5].
4. THEORETICAL RESULT
We instantiate a problem from the class
$P2_{\mathcal{I},\mathcal{T}}$ of [2] and reduce the security of the
TGC CAPTCHA to the hardness of the problem. We describe roughly the
definition of $P2_{\mathcal{I},\mathcal{T}}$ and some key terms.
See [2] for more precise definitions.
The family of AI problems $P2_{\mathcal{I},\mathcal{T}}$ is indexed
by the distributions of images $\mathcal{I}$ and $\mathcal{T}$ and
by the solution $\lambda$, which is a map of images to their
corresponding labels. Intuitively, it is the following problem:
given a transformed image $t(i)$ where $i \leftarrow \mathcal{I}$
and $t \leftarrow \mathcal{T}$, find the label $\lambda(i)$. A
problem is $(\delta, \tau)$-hard if no current program can solve it
with probability at least $\delta$ in time at most $\tau$. A test is
$(\alpha, \beta)$-human executable if at least an $\alpha$ portion
of the human population can pass it with at least $\beta$ success
probability. An $(\alpha, \beta, \eta)$-CAPTCHA is a test that is
$(\alpha, \beta)$-human executable and has the property that, if an
algorithm $B$ passes it with a success probability of at least
$\eta$, then $B$ can be used to solve a hard AI problem.
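Theorem 1 Let $k$ be the security parameter, and let $\mathcal{I}$,
$\mathcal{T}$, $\tau$ be as previously defined in Section 2. Let
$\alpha$, $\beta$, $\eta$ be non-negative real numbers. Assume that
$\mathcal{V} = (\mathcal{I}, \mathcal{T}, \tau, k)$ is
$(\alpha, \beta)$-human executable. If
$P2_{\mathcal{I},\mathcal{T}}$ is $(\eta, \tau + O(k))$-hard, then
$\mathcal{V}$ is an $(\alpha, \beta, \eta)$-CAPTCHA.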
Proof: Given an algorithm $B$ that has a success probability of at
least $\eta$ over $\mathcal{V}$ in time $\tau$, we construct an
algorithm $A$ that is an $(\eta, \tau + O(k))$ solution to
$P2_{\mathcal{I},\mathcal{T}}$. This proves the theorem.
On input a transformed image $t^*$, the algorithm $A$ simply draws
$q \leftarrow \{1, \ldots, k\}$ and sets $u_q = t^*$. Then, for all
$j$ such that $1 \le j \le k$ and $j \ne q$, it draws
$i_j \leftarrow \mathcal{I}$ and $t_j \leftarrow \mathcal{T}$, then
sets $u_j = t_j(i_j)$. Next, it submits $u_1, \ldots, u_k$ to $B$
and obtains the answers $l_1, \ldots, l_k$ from $B$. Finally, it
outputs $l_q$ as its answer.
It is easy to see that $A$ simulates the test for $B$ perfectly,
since every image submitted to $B$ is distributed exactly as in a
real test. Also, $A$ runs in the time taken by $B$ plus time linear
in $k$. Finally, if $B$ has probability $\eta$ of getting all of its
answers right, then $A$'s probability of success is at least $\eta$.
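
The reduction in the proof can be sketched as a small wrapper. This
is an illustration only: the names `make_solver`, `B`, and
`sample_challenge` are hypothetical, `B` is assumed to be any
function mapping a list of challenge images to a list of labels of
the same length, and `sample_challenge` is assumed to return one
freshly transformed reference image.

```python
import random


def make_solver(B, k, sample_challenge, rng=random):
    """Wrap a CAPTCHA-passing algorithm B into a solver for a single
    transformed image, following the construction in the proof."""
    def A(target_image):
        # Hide the real input at a uniformly random position q.
        q = rng.randrange(k)
        # Fill the other k - 1 positions with freshly sampled challenges.
        batch = [target_image if j == q else sample_challenge()
                 for j in range(k)]
        labels = B(batch)
        # B's answer at position q is the answer to the original problem.
        return labels[q]
    return A


# Toy usage: "images" are lowercase letters and B "recognizes" them by
# upper-casing, so the wrapped solver labels a single image correctly.
solver = make_solver(lambda imgs: [s.upper() for s in imgs], 8, lambda: "x")
print(solver("z"))
```

The wrapper performs only a constant amount of work per position in
addition to the single call to `B`, which corresponds to the
"time linear in the number of characters" overhead in the proof.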
5. EXPERIMENTAL RESULTS
We performed two experiments to assess the difficulty of the TGC
CAPTCHA for humans (Experiment 1) and machines (Experiment 2).
EXPERIMENT 1. Twelve naive subjects were recruited from the faculty
and staff of Thammasat University. Each subject, in a session
lasting approximately 5 minutes, responded to 12 TGC CAPTCHAs (two
practice trials and 10 test trials) using the same parameters as the
SSH prototype ($k = 8$, $n_c = 80$, $n_r = 24$). The CAPTCHAs were
displayed in standard terminal windows on PCs running Linux or
Mac OS X.
The subjects' average per-character accuracy $p_h$ on the test
trials was 0.948. Their average word-level accuracy (the fraction of
8-letter TGC CAPTCHAs answered with 100% accuracy) was 0.708.
(Assuming independence and $p_h = 0.948$, we would expect a
word-level accuracy of $(p_h)^k = 0.62$.)
The fact that naive users achieve such high accuracy rates justifies
the use of TGC CAPTCHAs in live systems. Frequent users would very
rapidly adapt to the statistics of the character set, achieving much
higher accuracy rates.
EXPERIMENT 2. We tested the open source Optical Character
Recognition program GOCR [6] on our TGC CAPTCHA. Using the same
parameters as Experiment 1, we generated 100 TGC CAPTCHAs and
converted each of the resulting text-graphics screens to bitmaps. We
then ran GOCR 0.39 in its default configuration on the resulting
800 images. We call this the Naive GOCR adversary to emphasize that
different configurations could in principle yield better
adversaries.
Unlike our human subjects, Naive GOCR is not constrained to respond
with exactly one English letter for each English letter of the
CAPTCHA. We therefore graded the OCR system using two criteria. As a
conservative criterion, we judged a response correct if it contained
the expected letter and no other letters. As a less conservative
criterion, we judged a response correct as long as it contained the
expected letter.
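
The two grading criteria can be stated precisely as a short function.
The sketch below is an illustration, not the script used in the
experiment; the name `grade` is hypothetical, and two details are
assumptions: letters are compared case-insensitively, and repeated
occurrences of the expected letter still count as correct under the
conservative criterion.

```python
def grade(response, expected):
    """Return (conservative, loose) correctness of an OCR response.

    loose:        the response contains the expected letter.
    conservative: it contains the expected letter and no other letters
                  (non-letter characters are ignored).
    """
    # Case-insensitive comparison is an assumption of this sketch.
    letters = [ch.upper() for ch in response if ch.isalpha()]
    loose = expected in letters
    conservative = loose and all(ch == expected for ch in letters)
    return conservative, loose


print(grade("_A.", "A"))   # (True, True)
print(grade("AB", "A"))    # (False, True)
print(grade("B", "A"))     # (False, False)
```

By construction, any response accepted by the conservative criterion
is also accepted by the loose one, so the loose accuracy can never be
lower than the conservative accuracy.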
Naive GOCR had a per-character accuracy $p_n$ of 0.241 and 0.329 by
the first (conservative) and second (loose) criterion, respectively.
The word-level accuracy was 0 by both criteria.
We believe $p_n$ could be significantly improved by incorporating
problem-specific knowledge of the TGC CAPTCHA algorithm. We leave
this as an exercise for interested readers. Clearly, however, the
results demonstrate that Naive GOCR is unsuitable for mounting
dictionary attacks against TGC CAPTCHA-enabled, password
authenticated systems.
6. CONCLUSION
The TGC CAPTCHA shows promise as a construct for improving the
security of password authenticated systems like SSH. We have shown
that the test is relatively easy for humans but would be difficult
for "Naive GOCR" adversaries. Of course, this only puts an upper
bound on the difficulty of the problem. We have not proven that no
adversary can do better. (After all, certain AI pundits believe that
ALL tasks humans can perform today will be performed equally well by
machines in the not-so-distant future!)
In any case, clearly, the security of a CAPTCHA password
authenticated system increases as the gap between human and machine
performance on the test widens. The TGC CAPTCHA can thus serve as a
modest cross-disciplinary challenge in the fields of pattern
recognition, system security, computer graphics, and even
psychology. This is the approach championed by [2]. Through friendly
competition, we hope to encourage not only new OCR algorithms, but
also a better understanding of the strengths and weaknesses of the
human visual system relative to the best present-day machine vision
systems.
Acknowledgments
We thank the faculty and staff of the Faculty of Engineering and
Sirindhorn International Institute of Technology for their help with
Experiment 1.
7. REFERENCES
[1] B. Pinkas and T. Sander, "Securing passwords against
dictionary attacks," in Proc. of the 9th CCS, V. Atluri,
Ed. ACM Press, Nov. 2002, pp. 161-170.
[2] L. von Ahn, M. Blum, N. J. Hopper, and J. Langford,
"CAPTCHA: Using hard AI problems for security," in
EUROCRYPT 2003, ser. LNCS, E. Biham, Ed., vol.
2656. Springer-Verlag, May 2003, pp. 294-311.
[3] CAPTCHA Project, http://www.captcha.net.
[4] M. Bellare, "Practice-oriented provable security," in
Lectures on Data Security: Modern Cryptology in Theory
and Practice, ser. LNCS, I. Damgård, Ed., vol.
1561. Springer-Verlag, 1998, pp. 1-15.
[5] M. Dailey and C. Namprempre,
http://www.siit.tu.ac.th/mdailey/captcha/.
[6] J. Schulenburg et al., "GOCR: open-source character
recognition," version 0.39, 2004, available at
http://jocr.sourceforge.net/index.html.
[7] T. Ylonen, "SSH authentication protocol," 2002, draft
18, available at
http://www.ietf.org/html.charters/secsh-charter.html.
[8] C. Dwork and M. Naor, "Pricing via processing or
combatting junk mail," in CRYPTO 1992, ser. LNCS,
E. Brickell, Ed., vol. 740. Springer-Verlag, Aug. 1992,
pp. 139-147.
[9] J. Xu, R. Lipton, I. Essa, M. Sung, and Y. Zhu, "Mandatory
human participation: A new authentication scheme
for building secure systems," in Twelfth International
Conference on Computer Communications and Networks,
IEEE, Ed. IEEE Press, Oct. 2003.
[10] T. Ylonen, T. Kivinen, M. Saarinen, T. Rinne, and
S. Lehtinen, "SSH transport layer protocol," 2003, draft
17, available at
http://www.ietf.org/html.charters/secsh-charter.html.