Ronald L. Rivest
MIT CSAIL
Cambridge, MA 02139
rivest@mit.edu
May 2, 2013
Version 2.0
Abstract
Introduction
Passwords are a notoriously weak authentication mechanism. Users frequently choose poor passwords. An adversary who has stolen a file of hashed passwords can often use brute-force search to find a password p whose hash value H(p) is equal to the hash value stored for a given user's password, thus allowing the adversary to impersonate the user.
A recent report by Mandiant¹ illustrates the significance of cracking hashed passwords in the current threat environment. Password cracking was instrumental, for instance, in a recent cyberespionage campaign against the New York Times [32]. The past year has also seen numerous high-profile thefts of files containing consumers'
1 http://intelreport.mandiant.com/
2 https://password-hashing.net/index.html
2 Technical Description

2.2 Attack scenarios

Visible passwords: A user's password is compromised when an adversary views it being entered (shoulder-surfing), or when an adversary sees it on a yellow sticky note on a monitor. A one-time password generator³ such as RSA's SecurID token provides good protection against this threat.

2.1 Context
$$(u_i, H(p_i))$$
for $i = 1, 2, \ldots, n$. On Unix systems the file F might be /etc/passwd or /etc/shadow.
The system stores password hashes rather than raw passwords so that an adversary with access to F does not find out the passwords directly; he must invert the hash function (compute $p_i$ from $H(p_i)$) to find out the password for user $u_i$ (see Evans et al. [1] and Purdy [33]).
The computation of the hash function H may (should!) involve the use of system-specific or user-specific parameters (salts); these details don't matter to us here. When a user attempts to log in, the file F is checked for the presence of the hash of the proffered password in the user's entry (see Morris and Thompson [26]).
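To make this baseline concrete, here is a minimal Python sketch of a salted password file F and the login check just described. The text leaves H unspecified; using the standard library's PBKDF2-HMAC-SHA256 as H, a 16-byte salt, and the iteration count shown are our assumptions for illustration only.

```python
import hashlib
import hmac
import os

ITERATIONS = 10_000  # illustrative work factor, not a recommendation

def H(password: str, salt: bytes) -> bytes:
    """Salted password hash; PBKDF2-HMAC-SHA256 stands in for the unspecified H."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)

def make_entry(password: str) -> tuple:
    """Return the (salt, H(p)) pair stored in F for one user."""
    salt = os.urandom(16)
    return salt, H(password, salt)

def check_login(F: dict, user: str, guess: str) -> bool:
    """Accept iff the hash of the proffered password matches the user's entry."""
    if user not in F:
        return False
    salt, stored = F[user]
    # compare_digest avoids leaking, via timing, how much of the hash matched
    return hmac.compare_digest(H(guess, salt), stored)

F = {"alice": make_entry("correct horse")}
print(check_login(F, "alice", "correct horse"))  # True
print(check_login(F, "alice", "Tr0ub4dor"))      # False
```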
We focus on the first attack scenario where an adversary has obtained a copy of the file F of usernames and
associated hashed passwords, and has obtained the values
of the salt or other parameters required to compute the
hash function H.
In this scenario, the adversary can perform a brute-force search over short or likely passwords, hashing each one (with salting if necessary) until the adversary determines the passwords for one or more users. (See, for example, Weir et al. [40].) Assuming that passwords are the
3 http://en.wikipedia.org/wiki/One-time_password
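The brute-force search of the first attack scenario can be sketched as follows. The fast salted SHA-256 hash, the tiny wordlist, and the lowercase-plus-digits alphabet are hypothetical choices that keep the example small; a real attack would use far larger dictionaries and a cracking model such as that of Weir et al. [40].

```python
import hashlib
import itertools
import string
from typing import Iterable, Iterator, Optional

def H(password: str, salt: bytes) -> bytes:
    """Hypothetical fast salted hash; speed like this is what makes guessing cheap."""
    return hashlib.sha256(salt + password.encode()).digest()

def candidates(wordlist: Iterable[str], max_len: int = 3) -> Iterator[str]:
    """Likely passwords first, then every short string over a small alphabet."""
    yield from wordlist
    alphabet = string.ascii_lowercase + string.digits
    for n in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            yield "".join(chars)

def brute_force(salt: bytes, target: bytes, wordlist: Iterable[str]) -> Optional[str]:
    """Return the first candidate p whose hash equals the stolen hash, if any."""
    for p in candidates(wordlist):
        if H(p, salt) == target:
            return p
    return None

salt = b"alice-salt"
stolen = H("a7z", salt)  # entry copied from the stolen file F
print(brute_force(salt, stolen, ["password", "123456"]))  # a7z
```

Note that a salt does not slow down the search against a single user's entry; it only prevents one hash computation from being reused across many users.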
2.3 Honeychecker
Set: i, j
Sets c(i) to have value j.
Check: i, j
Checks that c(i) = j. May return the result of the check to the requesting computer system. May raise an alarm if the check fails.
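A minimal Python sketch of this two-message interface follows. Message authentication, persistence, and the actual alarm response are left open by the text, so here an alarm is simply recorded in a list; the class and attribute names are hypothetical.

```python
class Honeychecker:
    """Stores only the index c(i) for each user i; never passwords or hashes."""

    def __init__(self) -> None:
        self._c = {}      # user index i -> index c(i) of the sugarword
        self.alarms = []  # (i, j) pairs for failed checks

    def set(self, i: int, j: int) -> None:
        """Set message (i, j): make c(i) equal to j."""
        self._c[i] = j

    def check(self, i: int, j: int) -> bool:
        """Check message (i, j): report whether c(i) = j, recording an alarm if not."""
        ok = self._c.get(i) == j
        if not ok:
            self.alarms.append((i, j))  # placeholder for the policy-defined response
        return ok

hc = Honeychecker()
hc.set(7, 3)
print(hc.check(7, 3))  # True
print(hc.check(7, 5))  # False
print(hc.alarms)       # [(7, 5)]
```

Keeping the state to a table of small integers reflects the design goal that the honeychecker's database, disclosed on its own, gives an adversary nothing with which to impersonate a user.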
Design principles. The computer system and honeychecker together provide a basic form of distributed security. A distributed security system aims to protect secrets even when an adversary compromises some of its systems or software. Diversifying the resources in the system (for example, placing the computer system and honeychecker in separate administrative domains or running their software on different operating systems) makes it harder to compromise the system as a whole.
We have designed the protocol so that compromise of
the honeychecker database by itself does not allow an adversary to impersonate a user. In fact, the honeychecker
only stores randomly selected integers (the index $c(i)$ for each $u_i$).
Indeed, one of our design principles is that compromise (i.e., disclosure) of the honeychecker database at worst only reduces security to the level it was at before the introduction of honeywords and the honeychecker. If the file F is then also disclosed, the adversary will no longer be fooled by the presence of the honeywords; he will just need to crack the users' actual passwords, since he now knows which hash values are for the real passwords and which are for the honeywords.
As we discuss in Section 8, other distributed approaches to password protection are possible. Distributed cryptographic protocols, for instance, can completely prevent disclosure of passwords, and even of password hashes, when the computer system is compromised. Unlike such schemes, though, honeywords can be incorporated into existing password systems with few system changes and little overhead in computation and communication.
We also design the honeychecker interface to be extremely simple, so that building a hardened honeychecker
should be realistic. Importantly, the honeychecker need
2.4 Approach: Setup
Here Gen is typically randomized and must involve interaction with the user (otherwise the user cannot create or know the password). We may represent this user interaction in some cases by allowing an additional argument in the form of a user-supplied password $p_i$ to Gen, so that $\mathrm{Gen}(k; p_i)$ ensures that $p_i$ is the password in $W_i$; that is, $p_i = w_{i,c(i)}$.
The table c is maintained in a secure manner; in the
proposal of this note it is stored on the honeychecker.
$$w_{i,c(i)} = p_i.$$
Although we call the $w_{i,j}$ entries potential passwords, they could be phrases or other strings; a potential password could be a potential passphrase or a sweetphrase.
The correct password is also called the sugarword.
The other $(k-1)$ words $w_{i,j}$ are called honeywords, chaff, decoys, or just incorrect passwords.
The list $W_i$ of sweetwords thus contains one sugarword (the password) and $(k-1)$ honeywords (the chaff).
We also allow a sweetword to be what we call a tough nut, that is, a very strong password whose hash the adversary is unable to invert. We represent a tough nut by the symbol ?. A honeyword, or the password itself, may be a tough nut.
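Under these definitions, the setup step $\mathrm{Gen}(k; p_i)$ can be sketched in Python as below. The function name `gen`, its pluggable `make_honeyword` argument, the toy two-digit tail generator, and the unsalted SHA-256 used for H are all assumptions for illustration; any honeyword-generation method could be supplied as `make_honeyword`.

```python
import hashlib
import secrets
from typing import Callable, List, Tuple

def H(word: str) -> str:
    # Unsalted SHA-256 keeps the sketch short; a real system should salt.
    return hashlib.sha256(word.encode()).hexdigest()

def gen(k: int, password: str, make_honeyword: Callable[[str], str]) -> Tuple[List[str], int]:
    """Return the sweetword list W_i and the 1-based index c(i) with w_{i,c(i)} = p_i.

    make_honeyword must be able to produce at least k - 1 distinct honeywords,
    otherwise this loop never terminates.
    """
    sweetwords = {password}
    while len(sweetwords) < k:
        sweetwords.add(make_honeyword(password))  # the set keeps sweetwords distinct
    W = list(sweetwords)
    secrets.SystemRandom().shuffle(W)  # the position of p_i must reveal nothing
    return W, W.index(password) + 1

def tweak_tail(p: str) -> str:
    """Toy honeyword generator: replace the last two characters by random digits."""
    return p[:-2] + f"{secrets.randbelow(100):02d}"

W, c_i = gen(20, "blue42", tweak_tail)
hashes = [H(w) for w in W]  # v_{i,j} = H(w_{i,j}), kept in F
# c_i is sent to the honeychecker (Set: i, c_i); F stores only `hashes`.
print(len(W), W[c_i - 1])  # 20 blue42
```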
The definition of the file F is changed so that it now contains an extended entry for each user $u_i$, of the form:
$$(u_i, H_i),$$
where $H_i = (v_{i,1}, v_{i,2}, \ldots, v_{i,k})$ and
$$v_{i,j} = H(w_{i,j})$$
for $j = 1, 2, \ldots, k$.

2.5 Approach: Login
If the adversary has entered one of the user's honeywords, obtained for example by brute-forcing the password file F, then an appropriate action takes place (determined by policy), such as
2.6
Check: i, j
meaning: Someone has requested to log in as user $u_i$ and has supplied sweetword j (that is, $w_{i,j}$) in response to the login password prompt. Please determine whether j = c(i), and take the appropriate action according to policy.
The honeychecker determines whether j = c(i); if not,
an alarm is raised and other actions may be taken. The
honeychecker may (or may not, depending on policy) then
respond with a (signed) message indicating whether login
should be allowed.
It may be desirable for a Check message to be sent
to the honeychecker, even when the proffered password g
is not on the list Wi of sweetwords; in this case the check
message could specify j = 0. In this variant the honeychecker is notified of every login attempt, and can observe
when a password guessing attack is in progress.
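Putting the login procedure and the Check message together, a minimal sketch (including the j = 0 variant just described) might look like this. The three outcome strings, the in-memory structures, and the unsalted hash are illustrative assumptions; in the text, both the honeychecker's reply and the action taken on a honeyword hit are matters of policy.

```python
import hashlib

def H(word: str) -> str:
    return hashlib.sha256(word.encode()).hexdigest()  # unsalted, for brevity only

class Honeychecker:
    def __init__(self, c: dict) -> None:
        self.c = c      # user index i -> c(i)
        self.log = []   # every Check message, so guessing attacks are visible

    def check(self, i: int, j: int) -> bool:
        self.log.append((i, j))
        return j != 0 and self.c.get(i) == j

def login(F: dict, hc: Honeychecker, i: int, g: str) -> str:
    """Handle a login attempt by user i with proffered password g."""
    hashes = F[i]                                   # H_i = (v_{i,1}, ..., v_{i,k})
    v = H(g)
    j = hashes.index(v) + 1 if v in hashes else 0   # j = 0: g is not a sweetword
    if hc.check(i, j):
        return "accept"
    return "reject" if j == 0 else "alarm"          # honeyword entered: F likely stolen

F = {1: [H(w) for w in ["gold5rings", "mice3blind", "kebrton1"]]}
hc = Honeychecker({1: 2})                           # the sugarword is "mice3blind"
print(login(F, hc, 1, "mice3blind"))  # accept
print(login(F, hc, 1, "gold5rings"))  # alarm
print(login(F, hc, 1, "letmein"))     # reject
print(hc.log)                         # [(1, 2), (1, 1), (1, 0)]
```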
Set: i, j
meaning: User $u_i$ has changed or initialized her password; the new value of c(i) is now j. (This message should, of course, be authenticated by the system to the honeychecker.)
Security definitions
We define the security of a honeyword generation algorithm Gen using an adversarial game: an algorithm or thought experiment that models the capabilities of the adversary.
Flatness. Let z denote the adversary's expected probability of winning the game, given that the adversary does not pass. This probability is taken over the user's choice of password $p_i$, the generation procedure $\mathrm{Gen}(k; p_i)$, and any randomization used by the adversary to produce its guess j. Observe that $z \ge 1/k$, since an adversary can win with probability $1/k$ merely by guessing j at random.
We say a honeyword generation method is $\epsilon$-flat (epsilon-flat) for a parameter $\epsilon$ if the maximum value, over all adversaries, of the adversary's winning probability z is $\epsilon$.
If the generation procedure is as flat as possible (i.e., $1/k$-flat), we say it is perfectly flat (for a given distribution U). If it is $\epsilon$-flat for $\epsilon$ not much greater than $1/k$, we say that it is approximately flat.
Our recommended value of k = 20 means that an adversary who has compromised F and successfully inverted H k times to obtain all 20 sweetwords has a chance of at most 5% of picking the correct password $p_i$ from this list, if Gen is perfectly flat. In this ideal case, $\epsilon = 1/20$.

4 Honeyword Generation

4.1
With a legacy-UI method, the password-change procedure asks the user for the new password (and perhaps
4.1.1 Chaffing by tweaking
Our first method is to tweak selected character positions of the password to obtain the honeywords. Let t denote the desired number of positions to tweak (such as t = 2 or t = 3). For example, with chaffing-by-tail-tweaking the last t positions of the password are chosen.
The honeywords are then obtained by tweaking the characters in the selected t positions: each character in a selected position is replaced by a randomly chosen character of the same type. Digits are replaced by digits, letters by letters, and special characters (anything other than a letter or a digit) by special characters.
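Chaffing-by-tail-tweaking can be sketched in Python as follows. The text does not fix the alphabets, so the pool of special characters and the restriction to ASCII letters are assumptions; the sketch also assumes k does not exceed the number of distinct tweaks available (for two tweaked digits, at most 100).

```python
import secrets
import string
from typing import List

SPECIALS = "!@#$%^&*()-_=+[]{};:,.<>/?~"  # hypothetical pool of special characters

def same_type_char(ch: str) -> str:
    """A random character of the same type: digit, letter, or special character."""
    if ch.isdigit():
        return secrets.choice(string.digits)
    if ch.isalpha():
        return secrets.choice(string.ascii_letters)
    return secrets.choice(SPECIALS)

def tail_tweak(password: str, k: int, t: int) -> List[str]:
    """Return k distinct sweetwords: the password plus k - 1 tail-tweaked honeywords."""
    if not 1 <= t <= len(password):
        raise ValueError("t must be between 1 and len(password)")
    head, tail = password[:-t], password[-t:]
    sweetwords = {password}
    while len(sweetwords) < k:
        sweetwords.add(head + "".join(same_type_char(ch) for ch in tail))
    W = list(sweetwords)
    secrets.SystemRandom().shuffle(W)
    return W

# Six strings of the form BG+7<letter><digit><digit>, in random order.
print(tail_tweak("BG+7y45", k=6, t=3))
```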
For example, if the user-supplied password is BG+7y45, then the list $W_i$ might be (for tail-tweaking
4.1.2 Chaffing-with-a-password-model
Our second method generates honeywords using a probabilistic model of real passwords; this model might be based on a given list L of thousands or millions of passwords and perhaps some other parameters. (Note that generating honeywords solely from a published list L is not in general a good idea: such a list may also be available to the adversary, who could use it to help identify honeywords.) Unlike the previous chaffing methods, this method does not necessarily need the password in order to generate the honeywords, and it can generate honeywords of widely varying strength.
Here is a list of 19 honeywords generated by one simple model (see Appendix for details):
xyqi3tbato
#NDYRODD_!!
pizzhemix01
sveniresly
mobopy
a3915
venlorhan
dfdhusZ2
Sb123
WORFmgthness
kebrton1
02123dia
a71ger
forlinux
1erapc
sbgo864959
aiwkme523
aj1aob12
9,50PEe]KV.0?RIOtc&L-:IJ"b+Wol<*[!NWT/pb
Modeling syntax. Bojinov et al. [6] propose an interesting approach (based on [40]) to chaffing-with-a-password-model in which honeywords are generated using the same syntax as the password. (Note that with this method, unlike the one above, honeywords do depend on the password.) In their scheme, the password is parsed into a sequence of tokens, each representing a distinct syntactic element: a word, number, or set of special characters. For example, the password
mice3blind
might be decomposed into the token sequence $W_4 \mid D_1 \mid W_5$, meaning a 4-letter word followed by a 1-digit number and then a 5-letter word. Honeywords are then generated by replacing tokens with randomly selected values that match the tokens. For example, the choice $W_4 \to$ gold, $D_1 \to$ 5, $W_5 \to$ rings would yield the honeyword
gold5rings.
Replacements for word tokens are selected from a dictionary provided as input to the generation algorithm. Further details are given in [6].

4.1.3
possibly among them. For example, what should the adversary do with the following list?

4.2
We now propose another method, take-a-tail, which utilizes a modified UI for password changes; the password-change UI is just a slight variant of the standard one. The take-a-tail method is identical to the chaffing-by-tail-tweaking method, except that the tail of the new password is now randomly chosen by the system, and required in the user-entered new password.
That is, the password-change UI is changed from:

4.3 Comparison of methods
We now consider a few other ways of generating honeywords and some practical deployment considerations.

5.1
4Tniners
sin(pi/2)
all41&14all
{1,2,3}
i8apickle
AB12:YZ90
and the system could then inform the user that password
c(i) = 6 (the last one) is her password.
The random pick method is perfectly flat, no matter
how the list Wi of sweetwords was generated, since the
given procedure is equivalent to choosing c(i) uniformly
at random from {1, 2, . . . , k} independent of the actual
sweetwords; there is thus no information in Wi that can
aid in determining c(i).
It is probably a bad idea, however, to ask the user for k
sweetwords. Not only is this burdensome on the user, but
the user may remember and mistakenly enter a sweetword
supplied by her and used by the system as a honeyword.
The random pick approach is probably better applied
to a set of k sweetwords generated by an algorithmic password generator.
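The random-pick step, and why it is perfectly flat, can be illustrated with a short simulation over the six sweetwords listed above. `random_pick` is a hypothetical helper; the adversary strategy shown (always guess the longest sweetword) is an arbitrary example, and any strategy that sees only $W_i$ fares the same.

```python
import secrets

def random_pick(sweetwords: list) -> int:
    """Choose c(i) uniformly from {1, ..., k}, ignoring the sweetwords' content."""
    return secrets.randbelow(len(sweetwords)) + 1

W = ["4Tniners", "sin(pi/2)", "all41&14all", "{1,2,3}", "i8apickle", "AB12:YZ90"]
c_i = random_pick(W)
print(f"Your password is {W[c_i - 1]}")

# A fixed adversary strategy: guess the longest sweetword.
guess = max(range(len(W)), key=lambda j: len(W[j])) + 1
trials = 60_000
wins = sum(random_pick(W) == guess for _ in range(trials))
print(wins / trials)  # close to 1/6
```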
5.2
Typo-safety
Honeyword method | Flatness (3) | DoS resistance (7.5) | Storage cost, # of hashes (5.4) | Legacy-UI? (4) | Multiple-system protection (7.6)
--- | --- | --- | --- | --- | ---
Tweaking (4.1.1) | (1/k) if U constant over T(p) | weak | 1 | yes | no
Password-model (4.1.2) | (1/k) if U ≈ G | strong | k | yes | no
Tough nuts (4.1.3) | N/A | strong | k | yes | no
Take-a-tail (4.2) | (1/k) unconditionally | weak | k | no | yes
Hybrid (5.5) | (1/k) if U ≈ G and U constant over T(p) | strong | k | yes | no
Table 1: Comparison of honeyword-generation methods. All methods can achieve excellent (1/k)-flatness under some conditions. By weak DoS (denial-of-service) resistance, we mean that an adversary can, with non-negligible probability, submit a honeyword given knowledge of the password; by strong DoS resistance we mean that such an attack is improbable. Multiple-system protection is the property that compromise of the same user's account in different systems will not immediately reveal $p_i$. Finally, G denotes the probability distribution of honeywords generated by chaffing-with-a-password-model. (Thus U ≈ G means that these honeywords are distributed like user passwords in the view of the adversary.) The N/A entry for tough nuts reflects that tough nuts are not useful on their own; they are best used together with other methods. The storage costs assume generation of $k-1$ honeywords. For further details, see the indicated sections.
5.3

5.4 Storage optimization

5.5
It is possible to combine the benefits of different honeyword generation strategies by composing them into a hybrid scheme.
As an example, we show how to construct a hybrid legacy-UI scheme that combines chaffing-by-tweaking-digits with chaffing-with-a-password-model. We assume a password-composition policy that requires at least one digit, so that tweaking digits is always possible.
Here is a simple hybrid scheme:
1. Use chaffing-with-a-password-model on the user-supplied password p to generate a set $W'$ of a (≥ 2) seed sweetwords, one of which is the password. Some seeds may be tough nuts.
2. Apply chaffing-by-tweaking-digits to each seed sweetword in $W'$ to generate b (≥ 2) tweaks (including the seed sweetword itself). This yields a full set W of $k = ab$ sweetwords.
abacad513
abacad941
abacad004
abacad752
snurfle672
snurfle806
snurfle772
snurfle091
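The two-step hybrid scheme can be sketched in Python as follows, with a = 2 and b = 4 to mirror the example list above. The password model itself is not specified here, so `model_seeds` is a hypothetical stand-in that returns fixed seeds; tough-nut seeds are not handled, and b must not exceed the number of distinct digit tweaks a seed allows.

```python
import secrets
import string
from typing import Callable, List, Tuple

def tweak_digits(word: str) -> str:
    """Chaffing-by-tweaking-digits: replace every digit by a random digit."""
    return "".join(secrets.choice(string.digits) if ch.isdigit() else ch for ch in word)

def hybrid(password: str, model_seeds: Callable[[int], List[str]],
           a: int, b: int) -> Tuple[List[str], int]:
    """Return k = a*b distinct sweetwords and the 1-based index c(i) of the password."""
    seeds = [password] + model_seeds(a - 1)            # step 1: a seed sweetwords
    if len(set(seeds)) != a or not all(any(ch.isdigit() for ch in s) for s in seeds):
        raise ValueError("need a distinct seeds, each containing a digit")
    seen = set(seeds)
    W = []
    for seed in seeds:                                 # step 2: b tweaks per seed
        group = [seed]                                 # the seed counts as one tweak
        while len(group) < b:
            w = tweak_digits(seed)
            if w not in seen:
                seen.add(w)
                group.append(w)
        W.extend(group)
    secrets.SystemRandom().shuffle(W)
    return W, W.index(password) + 1

W, c_i = hybrid("snurfle672", lambda n: ["abacad513"][:n], a=2, b=4)
print(sorted(W))  # four abacad### and four snurfle### sweetwords
```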
the honeychecker. The cost in terms of increased password guessability is small. Temporary communication
failures can be addressed by buffering messages on the
computer system for later delivery to and processing by
the honeychecker.
6.3
Per-user policies
We can have policies that vary per-user; this is not uncommon already.
Policy choices
6.1
Password Eligibility
6.4
6.2
Failover
Per-sweetword policies

7 Attacks
This section reviews more carefully various attacks possible against the methods proposed here.
7.1
Legacy-UI methods don't affect how users choose passwords, so they have no beneficial effect against adversaries who try common passwords in an online guessing attack.
7.2
Personal information about a user could help an adversary distinguish the user's password from her honeywords. It is often feasible to deanonymize users, that is, ascertain their real-world identities, based on their social network graphs [27] or just their usernames [31]. Given a user's identity, there are then many ways to find demographic or biographical data about her online, by exploiting information published on social networks, for example [5].
Knowing a user's basic demographic information, specifically his/her gender, age, or nationality, is known to enable slightly more effective cracking of the user's hashed password [7, 8]. Similarly, attackers often successfully exploit biographical knowledge to guess answers to personal questions in password recovery systems and compromise victims' accounts [35]. (The hacking of Governor Sarah Palin's Yahoo! account is a well-known example.) As chaffing-with-a-password-model creates honeywords independently of the user's password, this method of honeyword generation may enable adversaries to target data-mining attacks against users and gain some advantage in distinguishing their passwords from their honeywords.
7.3
allowing login based on a honeyword and buffering messages for later processing by the honeychecker.
While our intention is that the honeychecker should be hardened and of minimalist design, the deployment of the computer system and the honeychecker as two distinct systems itself brings the usual benefits of separation of duties in enhancing security. The two systems may be placed in different administrative domains, run different operating systems, and so forth.

7.4 Likelihood Attack
$$U(w_{i,j}) \prod_{j' \neq j} G(w_{i,j'}) = C \cdot R(w_{i,j}),$$
where
$$C = \prod_{j'} G(w_{i,j'})$$
and where
$$R(x) = U(x)/G(x)$$
is the relative likelihood that the user picks x compared
to the honeyword generator picking x. Note that it is
desirable that for all eligible x, G(x) > 0 (that is, the
honeyword generator is capable of generating all possible words); otherwise the password may be recognizable
as one the honeyword generator could not possibly have
produced.
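If the adversary can evaluate both U and G, the attack reduces to computing R for each sweetword. The sketch below assumes c(i) is placed uniformly at random among the k positions, so the common factor C cancels when the weights are normalized; the probability tables are invented toy numbers, not measurements.

```python
from typing import Dict, List, Tuple

def likelihood_attack(W: List[str], U: Dict[str, float],
                      G: Dict[str, float]) -> Tuple[int, List[float]]:
    """Return the 1-based guess maximizing R(w) = U(w)/G(w), plus the posterior over W."""
    R = [U[w] / G[w] for w in W]        # needs G(w) > 0 for every sweetword
    total = sum(R)
    posterior = [r / total for r in R]  # C * R(w_ij) normalized over j; C cancels
    best = max(range(len(W)), key=lambda j: R[j]) + 1
    return best, posterior

W = ["kebrton1", "NewtonSaid:F=ma", "a71ger"]
U = {"kebrton1": 1e-8, "NewtonSaid:F=ma": 1e-6, "a71ger": 1e-8}   # toy user model
G = {"kebrton1": 1e-7, "NewtonSaid:F=ma": 1e-12, "a71ger": 1e-7}  # toy generator
best, post = likelihood_attack(W, U, G)
print(best, post[best - 1] > 0.999)  # 2 True
```

A generator with G close to U everywhere keeps every R near 1, which is exactly the flatness the defender wants.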
The adversary wants to maximize his likelihood of picking the password, so he will pick the one maximizing $R(w_{i,j})$. This is the password that is maximally more likely to be picked by the user than to be generated by the honeyword generator. As an example, a password like
NewtonSaid:F=ma
7.6 Multiple systems
As users commonly employ the same password across different systems, an adversary might seek an advantage in password guessing by attacking two distinct systems, system A and system B (or multiple systems, for that matter). We consider two such forms of attack, an intersection attack and a sweetword-submission attack.
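As an illustration, and assuming the intersection attack works the way its name suggests: if a user reuses one password on systems A and B, and each system generates its honeywords independently, then intersecting the two cracked sweetword lists tends to leave only the password. The lists below are invented for illustration.

```python
from typing import Iterable, Set

def intersection_attack(W_A: Iterable[str], W_B: Iterable[str]) -> Set[str]:
    """Sweetwords that appear on both systems: likely candidates for a reused password."""
    return set(W_A) & set(W_B)

W_A = ["BG+7q03", "BG+7y45", "BG+7k18"]   # cracked from system A's file
W_B = ["BG+7y45", "BG+7m90", "BG+7a27"]   # cracked from system B's file
print(intersection_attack(W_A, W_B))     # {'BG+7y45'}
```

Table 1 lists take-a-tail as the only method with multiple-system protection, consistent with the fact that its system-chosen tails make the password itself differ from one system to the next.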
7.5 Denial-of-service
We briefly discuss denial-of-service (DoS) attacks, a potential problem for methods such as chaffing-by-tweaking that generate honeywords by predictably modifying user-supplied passwords. (In contrast, chaffing-with-a-password-model and the hybrid scheme of Section 5.5 offer strong DoS resistance.)
The concern is that an adversary who has not compromised the password file F, but who nonetheless knows a user's password (e.g., a malicious user or an adversary mounting phishing attacks), can feasibly submit one of the user's honeywords. For example, with chaffing-by-tweaking-digits with t = 2, such an adversary can guess a valid honeyword with probability $(k-1)/99$. A false appearance of theft of the password file F results.
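The $(k-1)/99$ figure can be checked directly. With t = 2 tweaked digits there are 100 possible tails; excluding the password's own tail leaves 99, of which $k-1$ belong to honeywords. The sketch computes the exact value and confirms it by simulation, assuming the adversary guesses uniformly among the 99 other tails; the password tail 42 is arbitrary.

```python
import random
from fractions import Fraction

def exact_hit_probability(k: int) -> Fraction:
    """Chance that one guessed tail (other than the password's) belongs to a honeyword."""
    return Fraction(k - 1, 99)

def simulated_hit_probability(k: int, trials: int = 50_000, seed: int = 1) -> float:
    rng = random.Random(seed)
    others = [tail for tail in range(100) if tail != 42]  # the password's tail is 42
    hits = 0
    for _ in range(trials):
        honey = set(rng.sample(others, k - 1))  # tails of the k - 1 honeywords
        hits += rng.choice(others) in honey     # adversary submits one tweak
    return hits / trials

print(exact_hit_probability(20))                # 19/99
print(round(simulated_hit_probability(20), 3))  # close to 19/99, roughly 0.19
```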
An overly sensitive system can turn such honeyword
hits into a DoS vulnerability. One (drastic) example is
a policy that forces a global password reset in response
to a single honeyword hit. Conversely, in a system inadequately sensitive to DoS attacks, an adversary that has
stolen F can guess passwords while simulating a DoS attack to avoid triggering a strong response. So a policy of
appropriately calibrated response is important. Reducing
the potency of DoS attacks can help.
Related Work
Password strength.
The current, state-of-the-art
heuristic password cracking algorithm, due to Weir et al.,
is based on probabilistic, context-free grammars [40]. In a
recent study, Kelley et al. [23] characterize the vulnerability of user-generated passwords to Weir-style cracking attacks under various password-composition policies. One
such policy is a common, weak one dubbed "basic8," in which users are instructed, "Password must have at least 8 characters." One billion guesses suffice to crack 40.3% of such passwords. Recent work shows that cracking speeds for some hash functions (e.g., MD5) can approach three billion guesses per second on a single graphics processing unit (GPU); see, e.g., Table 15 of [3]. Also in recent work,
Bonneau develops a framework to assess the strength of
passwords (and other user secrets). Based on study of
published password corpora, including one representing
70 million Yahoo! users, he estimates that a majority of
passwords have little more than 20 bits of effective entropy against an optimal attacker [7, 8].
Together, these results underscore the weakness of current password protections even with the use of sound
practices, such as salting. There is good reason to believe that many systems don't even make use of salt [29].
While the reason for this lapse is unclear, we emphasize
that honeywords may be used with or without salt (and
even in principle with or without hashing).
Bonneau and Preibusch [9] offer an excellent survey of
current password management practices on popular web
sites, including password composition requirements and
advice to users, account lockout policies, and update and
recovery procedures. Herley and van Oorschot [21] argue
that use of passwords will persist for many years, and
highlight key research questions on how to create strong
passwords and manage them effectively.
Password strengthening. The take-a-tail method may
be viewed as a variant on previously proposed password
strengthening schemes. Forget et al. [18] randomly interleave system-generated characters into a password. The
user may request a reshuffling of these characters until she
obtains a password she regards as memorable. The extra
characters here are essentially sugar. (Rejected or unpresented interleavings could serve as honeywords.) Houshmand and Aggarwal [22] recently proposed a related system that applies small tweaks to user-supplied passwords
to preserve memorability while adding strength against
cracking, specifically via [40]. Various schemes, e.g., PwdHash [34], have also been proposed to strengthen passwords within password managers.
Open Problems
This paper is just an initial stab at the issues surrounding the use of honeywords to protect password hash files; many open questions remain, such as:
• How should an adversary act optimally when some tough nuts are included among the honeywords?
• What is the best way to enforce password-reuse policies?
• Can the password models underlying cracking algorithms (e.g., [40]) be easily adapted for use in chaffing-with-a-password-model?
• How effective is targeted password guessing in distinguishing passwords from honeywords?
• How can a honeyword system best be designed to withstand active attacks, e.g., code modification, of the computer system (or the honeychecker)?
• How well can targeted attacks help identify users' passwords for particular honeyword-generation methods?
• How user-friendly in practice is take-a-tail?
10
reverts to current practice if auxiliary server files are compromised, and is even robust against auxiliary server failure (if one allows logins with honeywords).
Honeywords also provide another benefit. Published
password files (e.g., one stolen from LinkedIn [30]) provide attackers with insight into how users compose their
passwords. Attackers can then refine their models of user
password selection and design faster password cracking algorithms [23]. Thus every breach of a password server has
the potential to improve future attacks. Some honeyword
generation strategies, particularly chaffing ones, obscure
actual user password choices, and thus complicate model
building for would-be hash crackers. It may even be useful to muddy attacker knowledge of users' composition choices intentionally by drawing some honeywords from slightly perturbed probability distributions.
Despite their benefits over common methods for password management, honeywords aren't a wholly satisfactory approach to user authentication. They inherit many of the well-known drawbacks of passwords and something-you-know authentication more generally. Eventually,
passwords should be supplemented with stronger and
more convenient authentication methods, e.g., [16], or
give way to better authentication methods completely,
as recently predicted by the media [24, 39].
In the meantime, honeywords are a simple-to-deploy
and powerful new line of defense for existing password
systems. We hope that the security community will benefit from their use. (See our note below on IP.)
References
[1] A. Evans, Jr., W. Kantrowitz, and E. Weiss. A user authentication scheme not requiring secrecy in the computer. Commun. ACM, 17(8):437-442, August 1974.
[2] R. J. Anderson and T. M. A. Lomas. On fortifying key negotiation schemes with poorly chosen passwords. Electronics Letters, 30(13):1040-1041, 1994.
[3] M. Bakker and R. van der Jagt. GPU-based password cracking. Technical report, Univ. of Amsterdam, 2010.
[4] T. A. Berson, L. Gong, and T. M. A. Lomas. Secure, keyed, and collisionful hash functions. Technical Report SRI-CSL-94-08, SRI International Laboratory, 1993 (revised 2 Sept. 1994).
[5] L. Bilge, T. Strufe, D. Balzarotti, and E. Kirda. All your contacts are belong to us: automated identity theft attacks on social networks. In WWW, pages 551-560, 2009.
[6] H. Bojinov, E. Bursztein, X. Boyen, and D. Boneh. Kamouflage: loss-resistant password management. In ESORICS, pages 286-302, 2010.
[14] F. Cohen. The use of deception techniques: Honeypots and decoys. In H. Bidgoli, editor, Handbook of Information Security, volume 3, pages 646-655. Wiley and Sons, 2006.
[28] U.S. House of Representatives. H.R. 624: The Cyber Intelligence Sharing and Protection Act of 2013.
113th Cong., 2013.
[41] J. Yuill, M. Zappe, D. Denning, and F. Feer. Honeyfiles: deceptive files for intrusion detection. In Information Assurance Workshop, pages 116-122, 2004.
[42] Y. Zhang, F. Monrose, and M. K. Reiter. The security of modern password expiration: an algorithmic framework and empirical analysis. In ACM CCS, pages 176-186, 2010.
A simple model for generating a single honeyword: The password list L is initialized to a list of
many thousands of real passwords, as well as some truly
random passwords of varying lengths.
A tough nut is generated with some fixed probability
(e.g. 8%).
Otherwise a honeyword is generated as follows. A target length d is first determined by picking a random password w from L and measuring its length.
Let the characters of the new password be denoted $c_1, c_2, \ldots, c_d$. These are determined sequentially. The first character $c_1$ is just the first character $w_1$ of $w$, where $w = w_1 w_2 \ldots w_d$.
To determine the $j$th character of $c$, for $j = 2, 3, \ldots, d$:
• With probability 0.1, replace $w$ by a randomly chosen password in $L$ of length $t$. Then let $c_j = w_j$.
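A Python sketch of this simple model follows. The excerpt stops after the first per-character rule, and its phrase "of length t" leaves t undefined here; we therefore assume (1) that a replacement password has the target length d, so its jth character exists, and (2) that otherwise w is kept unchanged. Both assumptions are marked in the code. The tough nut is modeled as a 40-character random string, and the small list L is a made-up stand-in for a real corpus.

```python
import random
import string
from typing import List, Optional

TOUGH_NUT_PROB = 0.08  # "some fixed probability (e.g., 8%)"
SWITCH_PROB = 0.1      # first per-character rule in the text
PRINTABLE = string.ascii_letters + string.digits + string.punctuation

def random_password(rng: random.Random, n: int) -> str:
    return "".join(rng.choice(PRINTABLE) for _ in range(n))

def simple_honeyword(L: List[str], rng: Optional[random.Random] = None) -> str:
    if rng is None:
        rng = random.SystemRandom()              # use a CSPRNG unless testing
    if rng.random() < TOUGH_NUT_PROB:
        return random_password(rng, 40)          # a tough nut (the symbol ? in the text)
    w = rng.choice(L)
    d = len(w)                                   # target length d
    same_length = [x for x in L if len(x) == d]  # ASSUMPTION: "length t" read as d
    c = [w[0]]                                   # c_1 = w_1
    for j in range(1, d):                        # 0-based j covers c_2 .. c_d
        if rng.random() < SWITCH_PROB:
            w = rng.choice(same_length)
        # ASSUMPTION: the truncated remaining rules are replaced by "keep w".
        c.append(w[j])                           # c_j = w_j
    return "".join(c)

setup_rng = random.Random(0)
L = ["password1", "iloveyou", "monkey12", "dragon99", "sunshine"]
L += [random_password(setup_rng, n) for n in (8, 12)]  # some truly random entries
print([simple_honeyword(L, random.Random(s)) for s in range(3)])
```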