Beruflich Dokumente
Kultur Dokumente
Abstract. MD5 is one of the most widely used cryptographic hash func-
tions nowadays. It was designed in 1992 as an improvement of MD4, and
its security was widely studied since then by several authors. The best
known result so far was a semi free-start collision, in which the initial
value of the hash function is replaced by a non-standard value, which is
the result of the attack. In this paper we present a new powerful attack
on MD5 which allows us to find collisions efficiently. We used this attack
to find collisions of MD5 in about 15 minutes up to an hour computation
time. The attack is a differential attack, which unlike most differential
attacks, does not use the exclusive-or as a measure of difference, but
instead uses modular integer subtraction as the measure. We call this
kind of differential a modular differential. An application of this attack
to MD4 can find a collision in less than a fraction of a second. This attack
is also applicable to other hash functions, such as RIPEMD and HAVAL.
1 Introduction
People know that digital signatures are very important in information security.
The security of digital signatures depends on the cryptographic strength of the
underlying hash functions. Hash functions also have many other applications
in cryptography such as data integrity, group signature, e-cash and many other
cryptographic protocols. The use of hash functions in these applications not only
ensure the security, but also greatly improve the efficiency. Nowadays, there are
two widely used hash functions – MD5 [18] and SHA-1 [12].
MD5 is a hash function designed by Ron Rivest as a strengthened version of
MD4 [17]. Since its publication, some weaknesses has been found. In 1993, B.
den Boer and A. Bosselaers [3] found a kind of pseudo-collision for MD5 which
consists of the same message with two different sets of initial values. This attack
discloses the weak avalanche in the most significant bit for all the chaining vari-
ables in MD5. In the rump session of Eurocrypt’96, H. Dobbertin [8] presented
a semi free-start collision which consists of two different 512-bit messages with
a chosen initial value IV0 .
a0 = 0x12ac2375, b0 = 0x3b341042, c0 = 0x5f62b97c, d0 = 0x4ba763ed
A general description of this attack was published in [9].
Although H. Dobbertin cannot provide a real collision of MD5, his attack
reveals the weak avalanche for the full MD5. This provides a possibility to find
a special differential with one iteration.
In this paper we present a new powerful attack that can efficiently find a col-
lision of MD5. From H. Dobbertin’s attack, we were motivated to study whether
it is possible to find a pair of messages, each consists of two blocks, that pro-
duce collisions after the second block. More specifically, we want to find a pair
(M0 , M1 ) and (M0 , M1 ) such that
(a, b, c, d) = MD5(a0 , b0 , c0 , d0 , M0 ),
(a , b , c , d ) = MD5(a0 , b0 , c0 , d0 , M0 ),
MD5(a, b, c, d, M1 ) = MD5(a , b , c , d , M1 ),
where a0 , b0 , c0 , d0 are the initial values for MD5. We show that such collisions
of MD5 can be found efficiently, where finding the first blocks (M0 , M0 ) takes
about 239 MD5 operations, and finding the second blocks (M1 , M1 ) takes about
232 MD5 operations. The application of this attack on IBM P690 takes about
an hour to find M0 and M0 , where in the fastest cases it takes only 15 minutes.
Then, it takes only between 15 seconds to 5 minutes to find the second blocks
M1 and M1 . Two such collisions of MD5 were made public in the Crypto’04
rump session [19].
This attack is applicable to many other hash functions as well, including
MD4, HAVAL-128 and RIPEMD ([17], [20], [15]). In the case of MD4, the attack
can find a collision within less than a second, and can also find second pre-images
for many messages.
In Crypto’04 Eli Biham and Rafi Chen presented a near-collision attack on
SHA-0 [2], which follows the lines of the technique of [4]. In the rump session they
described their new (and improved) results on SHA-0 and SHA-1 (including a
multi-block technique and collisions of reduced SHA-1). Then, A.J̃oux presented
a 4-block full collision of SHA-0 [14], which is a further improvement of these
results. Both these works were made independently of this paper.
This paper is organized as follows: In Section 2 we briefly describe MD5.
Then in Section 3 we give the main ideas of our attack, and in Section 4 we
give a detailed description of the attack. Finally, in Section 5 we summarize the
paper, and discuss the applicability of this attack to other hash functions.
2 Description of MD5
In order to conveniently describe the general structure of MD5, we first recall
the iteration process for hash functions.
Generally a hash function is iterated by a compression function X = f (Z)
which compresses l-bit message block Z to s-bit hash value X where l > s. For
MD5, l = 512, and s = 128. The iterating method is usually called the Merkle-
Damgard meta-method (see [6], [16]). For a padded message M with multiples
of l-bit length, the iterating process is as follows:
Hi+1 = f (Hi , Mi ), 0 ≤ i ≤ t − 1.
Here M = (M0 , M2 , · · · , Mt−1 ), and H0 = IV0 is the initial value for the hash
function.
In the above iterating process, we omit the padding method because it has
no influence on our attack.
The following is to describe the compression function for MD5. For each
512-bit block Mi of the padded message M , divide Mi into 32-bit words, Mi =
(m0 , m1 , ...., m15 ). The compression algorithm for Mi has four rounds, and each
round has 16 operations. Four successive step operations are as follows:
a = b + ((a + φi (b, c, d) + wi + ti ) ≪ si ),
d = a + ((d + φi+1 (a, b, c) + wi+1 + ti+1 ) ≪ si+1 ),
c = d + ((c + φi+2 (d, a, b) + wi+2 + ti+2 ) ≪ si+2 ),
b = c + ((b + φi+3 (c, d, a) + wi+3 + ti+3 ) ≪ si+3 ),
where the operation + means ADD modulo 232 . ti+j and si+j (j = 0, 1, 2, 3)
are step-dependent constants. wi+j is a message word. ≪ si+j is circularly left-
shift by si+j bit positions. The details of the message order and shift positions
can be seen in Table 3.
Each round employs one nonlinear round function, which is given below.
Φi (X, Y, Z) = (X ∧ Y ) ∨ (¬X ∧ Z), 0 ≤ i ≤ 15,
Φi (X, Y, Z) = (X ∧ Z) ∨ (Y ∧ ¬Z), 16 ≤ i ≤ 31,
Φi (X, Y, Z) = X ⊕ Y ⊕ Z, 32 ≤ i ≤ 47,
Φi (X, Y, Z) = Y ⊕ (X ∨ ¬Z), 48 ≤ i ≤ 63,
where ∆H0 is the initial value difference which equals to zero. ∆H is the output
difference for the two messages. ∆Hi = ∆IVi is the output difference for the
i-th iteration, and also is the initial difference for the next iteration.
It is clear that if ∆H = 0, there is a collision for M and M . We call the
differential that produces a collision a collision differential.
Provided that the hash function has 4 rounds, and each round has 16 step
operations. For more details, we can represent the i-th iteration differential
(Mi ,Mi )
∆Hi −→ ∆Hi+1 as follows:
1 P 2 P3 4 P P
∆Hi −→ ∆Ri+1,1 −→ ∆Ri+1,2 −→ ∆Ri+1,3 −→ ∆Ri+1,4 = ∆Hi+1 .
where
b1 = b1
a2 = a2 [7, ..., 22, −23]
d2 = d2 [−7, 24, 32]
c2 = c2 [7, 8, 9, 10, 11, −12, −24, −25, −26, 27, 28, 29, 30, 31, 32, 1, 2, 3, 4, 5, −6]
b2 = b2 [1, 16, −17, 18, 19, 20, −21, −24]
b2 = c2 + ((b1 + F (c2 , d2 , a2 ) + m7 + t7 ) ≪ 22
b2 = c2 + ((b1 + F (c2 , d2 , a2 ) + m7 + t7 ) ≪ 22
φ7 = F (c2 , d2 , a2 ) = (c2 ∧ d2 ) ∨ (¬c2 ∧ a2 )
In the above operations, c2 occurs twice in the right hand side of the equation.
In order to distinguish the two, let cF NF
2 denote the c2 inside F , and c2 denote
the c2 outside F .
The derivation is based on the following two facts:
(b) The conditions d2,i = a2,i ensure that the changed i-th bit in cF 2 result
in no change in b2 , where i ∈ {1, 2, 4, 5, 25, 27, 29, 30, 31}.
(c) The conditions c2,i = 1 ensure that the changed i-th bit in a2 result in
no change in b2 , where i ∈ {13, 14, 15, 16, 18, 19, 20, 21, 22, 23}.
(d) The condition d2,6 = a2,6 = 0 ensures that the 6-th bit in cF 2 result in
no change in b2 .
(e) The condition a2,32 = 1 ensures that the changed 32-th bit in cF 2 and
the 32-th bit in d2 result in no change in b2 .
(f) The condition d2,i = 0 ensures that the changed i-th bit in a2 and the
2 result in no change in b2 , where i ∈ {8, 9, 10}.
i-th bit in cF
(g) The condition d2,12 = 1 ensures that the changed 12-th bit in a2 and the
12-th bit in cF2 result in no change in b2 .
(h) The condition a2,24 = 0 ensures that the changed 24-th bit in cF 2 and
the 24-th bit in d2 result in no change in b2 .
(i) The changed 7-th bits in cF 2 , d2 and a2 result in no change in b2 .
By the similar method, we can derive a set of sufficient conditions (see Table 4
and Table 6) which guarantee all the differential characteristics in the collision
differential to hold.
mnew
2 ← ((cnew
1 − cold
1 ) ≫ 17) + m2 .
old
From the above description, it is very easy to show our attack on MD5.
The following is to describe how to find a two-block collision, of the following
form
(M0 ,M0 ),2−37 (M1 ,M1 ),2−30
H0 −→ ∆H1 −→ ∆H = 0.
(∆H1 , ∆M1 ) −→ ∆H = 0
The complexity of finding (M0 , M0 ) doesn’t exceed the time of running 239 MD5
operations. To select another message M0 is only to change the last two words
from the previous selected message M0 . So, finding (M0 , M0 ) only needs about
one-time single-message modification for the first 14 words. This time can be ne-
glected. For each selected message M0 , it is only needs two-time single-message
modifications for the last two words and 7-time multi-message modifications for
correcting 7 conditions in the second round, and each multi-message modification
only needs about a few step operations, so the total time for both kinds of mod-
ifications is not exceeds about two MD5 operations for each selected message.
According to the probability of the first iteration differential, it is easy to know
that the complexity of finding (M0 , M0 ) is not exceeds 239 MD5 operations.
Similarly, we can show that the complexity of finding (M1 , M1 ) is not exceeds
32
2 MD5 operations.
Two collisions of MD5 are given in Table 2. It is noted that the two collisions
Table 2. Two pairs of collision for MD5. H is the hash value with little-endian and
no message padding, and H ∗ is the hash value with big-endian and message padding.
start with the same 1-st 512-bit block, and that given a first block that satisfies
all the required criteria, it is easy to find many second blocks M1 , M1 which lead
to collisions.
5 Summary
Acknowledgements
It is a pleasure to acknowledge Dengguo Feng for the conversations that led to
this research on MD5. We would like to thank Eli Biham, Andrew C. Yao, and
Yiqun Lisa Yin for their important advice, corrections, and suggestions, and
for spending their precious time on our research. We would also like to thank
Xuejia Lai, Hans Dobbertin, Magnus Daum for various discussions on this paper.
The research is supported by the National Natural Science Foundation of China
(Grant No. 90304009).
References
1. E. Biham, A. Shamir. Differential Cryptanalysis of the Data Encryption Standard,
Springer-Verlag, 1993.
2. E. Biham, R. Chen, Near collision for SHA-0, Advances in Cryptology, Crypto’04,
2004, LNCS 3152, pp. 290-305.
3. B. den. Boer, A. Bosselaers. Collisions for the compression function of MD5, Ad-
vances in Cryptology, Eurocrypt’93 Proceedings, Springer-Verlag, 1994.
4. F. Chabaud, A. Joux. Differential collisions in SHA-0, Advances in Cryptology,
Crypto’98 Proceedings, Springer-Verlag, 1998.
5. S. Cotini, R.L. Rivest, M.J.B. Robshaw, Y. Lisa Yin. Security of the RC6T M Block
Cipher, http://www.rsasecurity.com/rsalabs/rc6/.
6. I. B. Damgard. A design principle for hash functions, Advances in Cryptology,
Crypto’89 Proceedings, Springer-Verlag, 1990.
7. H. Dobbertin. Cryptanalysis of MD4, Fast Software Encryption, LNCS 1039,
Springer-Verlag, 1996, 53-69.
8. H. Dobbertin. Cryptanalysis of MD5 compress, presented at the rump session of
Eurocrypt’96.
9. H. Dobbertin. The status of MD5 after a recent attack, CryptoBytes 2 (2), 1996,
ftp://ftp.rsasecurity.com/pub/cryptobytes/crypto2n2.pdf.
10. H. Dobbertin. RIPEMD with two round compress function is not collision-free,
Journal of Cryptology, 10:51-69, 1997.
11. H. Dobbertin, A. Bosselaers, B. Preneel. RIPEMD-160: A strengthened version of
RIPEMD, Fast Software Encryption, LNCS 1039, Springer-Verlag, 1996.
12. FIPS 180-1. Secure hash standard, NIST, US Department of Commerce, Washing-
ton D.C., Springer-Verlag, 1996.
13. FIPS 180-2. Secure Hash Standard, http://csrc.nist.gov/publications/, 2002.
14. A. Joux. Collisions for SHA-0, rump session of Crypto’04, 2004.
15. RIPE. Integrity Primitives for Secure Information Systems. Final Report of RACE
Integrity Primitives Evaluation (RIPE-RACE 1040), LNCS 1007, Springer-Verlag,
1995.
16. R.C. Merkle. One way hash function and DES, Advances in Cryptology, Crypto’89
Proceedings, Springer-Verlag, 1990.
17. R.L. Rivest. The MD4 message digest algorithm, Advances in Cryptology,
Crypto’90, Springer-Verlag, 1991, 303-311.
18. R.L. Rivest. The MD5 message-digest algorithm, Request for Comments (RFC
1320), Internet Activities Board, Internet Privacy Task Force, 1992.
19. X.Y. Wang, F.D. Guo, X.J. Lai, H.B. Yu, Collisions for hash functions MD4, MD5,
HAVAL-128 and RIPEMD, rump session of Crypto’04, E-print, 2004.
20. Y.L. Zheng, J. Pieprzyk, J. Seberry. HAVAL–A one-way hashing algorithm with
variable length of output, Advances in Cryptology, Auscrypt’92 Proceedings,
Springer-Verlag.
Table 3. The Differential Characteristics in the First Iteration Differential
Step The output wi si ∆wi The output difference The output in i-th step for M0
in i-th step in i-th step
for M0
4 b1 m3 22
5 a2 m4 7 231 −26 a2 [7, . . . , 22, −23]
6 d2 m5 12 −26 + 223 + 231 d2 [−7, 24, 32]
7 c2 m6 17 −1 − 26 + 223 − 227 c2 [7, 8, 9, 10, 11, −12, −24, −25, −26,
27, 28, 29, 30, 31, 32, 1, 2, 3, 4, 5, −6]
8 b2 m7 22 1 − 215 − 217 − 223 b2 [1, 16, −17, 18, 19, 20, −21, −24],
9 a3 m8 7 1 − 26 + 231 a3 [−1, 2, 7, 8, −9, −32]
10 d3 m9 12 212 + 231 d3 [−13, 14, 32]
11 c3 m10 17 230 + 231 c3 [31, 32]
12 b3 m11 22 215 −27 − 213 + 231 b3 [8, −9, 14, . . . , 19, −20, 32]
13 a4 m12 7 224 + 231 a4 [−25, 26, 32]
14 d4 m13 12 231 d4 [32]
15 c4 m14 17 231 23 − 215 + 231 c4 [4, −16, 32]
16 b4 m15 22 −229 + 231 b4 [−30, 32]
17 a5 m1 5 231 a5 [32]
18 d5 m6 9 231 d5 [32]
19 c5 m11 14 215 217 + 231 c5 [18, 32]
20 b5 m0 20 231 b5 [32]
21 a6 m5 5 231 a6 [32]
22 d6 m10 9 231 d6 [32]
23 c6 m15 14 c6
24 b6 m4 20 231 b6
25 a7 m9 5 a7
26 d7 m14 9 231 d7
27 c7 m3 14 c7
... ... ... ... ... ... ...
34 d9 m8 11 d9
35 c9 m11 16 215 231 c9 [∗32]
36 b9 m14 23 231 231 b9 [∗32]
37 a10 m1 4 231 a10 [∗32]
38 d10 m4 11 2 231
31
d10 [∗32]
39 c10 m7 16 231 c10 [∗32]
... ... ... ... ... ... ...
45 a12 m9 4 231 a12 [∗32]
46 d12 m12 11 231 d12 [32]
47 c12 m15 16 231 c12 [32]
48 b12 m2 23 231 b12 [32]
49 a13 m0 6 231 a13 [32]
50 d13 m7 10 231 d13 [−32]
51 c13 m14 15 231 231 c13 [32]
52 b13 m5 21 231 b13 [−32]
... ... ... ... ... ... ...
58 d15 m15 10 231 d15 [−32]
59 c15 m6 15 231 c15 [32]
60 b15 m13 21 231 b15 [32]
61 aa0 = a16 + a0 m4 6 2 231
31
aa0 = aa0 [32]
62 dd0 = d16 + d0 m11 10 215 231 dd0 = dd0 [26, 32]
63 cc0 = c16 + c0 m2 15 231 cc0 = cc0 [−26, 27, 32]
64 bb0 = b16 + b0 m9 21 231 bb0 = bb0 [26, −32]
Table 4. A Set of Sufficient Conditions for the First Iteration Differential
Step The output wi si ∆wi The output Difference The output in i-th step for M1
in i-th step in i-th step
for M1
IV aa0 , dd0 aa0 [32], dd0 [26, 32]
cc0 , bb0 cc0 [−26, 27, 32], bb0 [26, −32]
1 a1 m0 7 225 + 231 a1 [26, −32]
2 d1 m1 12 25 + 225 + 231 d1 [6, 26, −32]
3 c1 m2 17 25 + 211 + 216 c1 [−6, −7, 8, −12, 13,
+225 + 231 -17,. . . ,-21,22,-26,. . . ,-30,31,-32]
4 b1 m3 22 −2 + 25 + 225 + 231 b1 [2, 3, 4, −5, 6, −26, 27, −32]
5 a2 m4 7 231 1 + 26 + 28 + 29 + 231 a2 [1, −7, 8, 9, −10, −11, −12, 13, 32]
6 d2 m5 12 −216 − 220 + 231 d2 [17, −18, 21, −22, 32]
7 c2 m6 17 −26 − 227 + 231 c2 [7, 8, 9, −10, 28, −29, −32]
8 b2 m7 22 215 − 217 − 223 + 231 b2 [−16, 17, −18, 24, 25, 26, −27, −32]
9 a3 m8 7 1 + 26 + 231 a3 [−1, 2, −7, −8, −9, 10, −32]
10 d3 m9 12 212 + 231 d3 [13, −32]
11 c3 m10 17 231 c3 [−32]
12 b3 m11 22 −215 −27 − 213 + 231 b3 [−8, 14, 15, 16, 17, 18, 19, −20, −32]
13 a4 m12 7 224 + 231 a4 [−25, . . . , −30, 31, 32]
14 d4 m13 12 231 d4 [32]
15 c4 m14 17 231 23 + 215 + 231 c4 [4, 16, 32]
16 b4 m15 22 −229 + 231 b4 [−30, 32]
17 a5 m1 5 231 a5 [32]
18 d5 m6 9 231 d5 [32]
19 c5 m11 14 −215 217 + 231 c5 [18, 32]
20 b5 m0 20 231 b5 [32]
21 a6 m5 5 231 a6 [32]
22 d6 m10 9 231 d6 [32]
23 c6 m15 14 c6 [32]
24 b6 m4 20 231 b6 [32]
25 a7 m9 5 a7
26 d7 m14 9 231 d7
27 c7 m3 14 c7
... ... ... ... ... ... ...
34 d9 m8 11 d9
35 c9 m11 16 −215 231 c9 [∗32]
36 b9 m14 23 231 231 d9 [∗32]
37 a10 m1 4 231 a10 [∗32]
31
38 d10 m4 11 2 231 d10 [∗32]
39 c10 m7 16 231 c10 [∗32]
... ... ... ... ... ... ...
49 a13 m0 6 231 a13 [32]
50 d13 m7 10 231 d13 [−32]
31
51 c13 m14 15 2 231 c13 [32]
52 b13 m5 21 231 b13 [−32]
... ... ... ... ... ... ...
59 c15 m6 15 231 c15 [32]
60 b15 m13 21 231 b15 [32]
61 a16 + aa0 m4 6 231 a16 + aa0 = a16 + aa0
62 d16 + dd0 m11 10 −215 d16 + dd0 = d16 + dd0
63 c16 + cc0 m2 15 c16 + cc0 = c16 + cc0
64 b16 + bb0 m9 21 b16 + bb0 = b16 + bb0
Table 6. A Set of Sufficient Conditions for the Second Iteration Differential