
Statistical and Performance Analysis of

SHA-3 Hash Candidates


by
student
A Project Report Submitted
in
Partial Fulfillment of the
Requirements for the Degree of
Master of Science
in
Computer Science

Supervised by

Department of Computer Science


B. Thomas Golisano College of Computing and Information Sciences
Rochester Institute of Technology
Rochester, New York
08-15-2011

Project Report Release Permission Form


Rochester Institute of Technology
B. Thomas Golisano College of Computing and Information Sciences

Title: Statistical and Performance Analysis of SHA-3 Hash Candidates

I, student, hereby grant permission to the Wallace Memorial Library to reproduce my project in whole or part.

student

Date


The project Statistical and Performance Analysis of SHA-3 Hash Candidates by student has been examined and approved by the following Examination Committee:

Project Committee Chair


Abstract
Statistical and Performance Analysis of
SHA-3 Hash Candidates
student
Supervising Professor:

A hash function takes input data, called the message, and produces a condensed representation, called the message digest. Security flaws have been detected in some of the most commonly used hash functions, such as MD5 (Message Digest) and SHA-1 (Secure Hash Algorithm). Therefore, NIST started a design competition for a new hash standard, to be called SHA-3. The SHA-3 competition is currently in its final round with five candidates remaining. The following tasks were carried out for this project:
Randomness - A good hash function should behave as much like a random function as possible. Statistical tests help in determining the randomness of a hash function, and NIST recommends a series of tests in a statistical test suite [?] for this purpose. This tool has been used to analyze the randomness of the final five hash functions. The test results showed no significant deviation from randomness for any of the candidates.
Performance - Performance is another critical factor in judging a hash function. The performance of all fourteen Round 2 candidates was measured for small messages, using Java as the programming language on Sun platform machines. No such tests have previously been carried out with this combination.
Security - Security is the most important criterion for a hash function. Grøstl is one of the final five candidates, and its architecture, design and security features have been studied in detail. Some of the successful attacks on reduced versions are explained. The lesser-known Round 2 candidates Fugue and ECHO have also been studied.


Contents
Abstract
1 Introduction
    1.1 Hash Functions
    1.2 Hash Function Applications
    1.3 SHA-3 Competition
    1.4 Background
        1.4.1 Why are MD5, SHA-0 and SHA-1 no longer secure?
        1.4.2 Security of Hash Functions
        1.4.3 Random Oracle Model
        1.4.4 Weaker Notions of Security for Hash Functions
        1.4.5 Birthday Attacks
        1.4.6 The Merkle-Damgård Transform
        1.4.7 Overview of the SHA Family
    1.5 SHA-3 Competition
        1.5.1 Round 1
        1.5.2 Round 2
        1.5.3 Final Round
2 Implementation
    2.1 Randomness Tests
        2.1.1 Statistical Tests
        2.1.2 Statistical Tests and Computing
        2.1.3 ENT
        2.1.4 NIST Statistical Test Suite
    2.2 Performance Analysis
    2.3 Theoretical Security Analysis
        2.3.1 Fugue
        2.3.2 ECHO
        2.3.3 Grøstl
3 Analysis
    3.1 Statistical Analysis
    3.2 Performance
    3.3 Security
        3.3.1 Cryptanalysis of Grøstl
        3.3.2 Rebound Attacks on Reduced Grøstl
        3.3.3 Internal Differential Attack on Grøstl
4 Conclusions
    4.1 Frontrunners among the final five
    4.2 Future work
A Matrices
    A.1 Fugue
        A.1.1 SuperMix - Matrix N
        A.1.2 Initialization Value - 256
    A.2 Grøstl
        A.2.1 Number of rounds
        A.2.2 Initial Values
B Code Listing
    B.1 Code for computing hashes for Statistical Testing
        B.1.1 StatisticalHash.java

Chapter 1
Introduction
1.1 Hash Functions

Hashing is a procedure that takes a string as input and produces another string of predetermined length as output. For a good hash function the output is, for practical purposes, unique to the input and is analogous to a fingerprint. In other words, when the input is modified, the fingerprint of that input also changes. This property helps in verifying the integrity of a message. Even if the message is stored or sent unsecured, its integrity can be verified from time to time by recomputing its fingerprint to check whether it has been altered. This fingerprint of the message is technically referred to as the message digest.
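The fingerprint behaviour described above can be illustrated with a minimal Java sketch. It assumes the standard `java.security.MessageDigest` API and uses SHA-256 purely as an example algorithm; the class name `FingerprintDemo` and the sample messages are hypothetical and not taken from the project code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class FingerprintDemo {
    // Returns the SHA-256 message digest of the input as a lowercase hex string.
    static String sha256Hex(String message) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(message.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        // Two messages that differ in a single character (hypothetical examples).
        System.out.println(sha256Hex("transfer 100 to Alice"));
        System.out.println(sha256Hex("transfer 900 to Alice"));
    }
}
```

The two printed digests have the same fixed length but should look unrelated, which is what makes a stored digest usable as an integrity check.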

1.2 Hash Function Applications

Cryptographic hash functions have a wide variety of security applications because of the property mentioned above: when the input changes, so does the hash of the input. Significant applications are digital signatures and MACs (Message Authentication Codes). Suppose two people want to exchange a message over an unsecured channel and be sure that it arrives unaltered. To achieve this they first need to share a key, and this key determines the hash function they will use. The first person then sends the message x together with the hash y = hash(x). The second person (receiving the message) can verify the integrity of the message by computing the hash of x and comparing it with y. One real-world application is password storage and verification. Passwords are never stored in plaintext, for obvious security reasons. The user's password is first hashed and then stored. During authentication, the hash of the password that is keyed in is compared against the stored hash, and if they match the user is authenticated. Other applications include identifying files on P2P file sharing networks and generating pseudorandom bits [?].
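The keyed-hash exchange described above can be sketched in Java as follows. The sketch uses HMAC-SHA256 from the standard `javax.crypto.Mac` API as one concrete keyed hash; this choice, the class name `MacDemo`, the key and the messages are assumptions made for illustration, not the scheme used in the project.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class MacDemo {
    // Sender side: computes the tag y for message x under the shared key.
    static byte[] tag(byte[] key, String message) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Receiver side: recomputes the tag and compares it with the received one.
    static boolean verify(byte[] key, String message, byte[] receivedTag) {
        // MessageDigest.isEqual compares the two byte arrays for equality.
        return MessageDigest.isEqual(tag(key, message), receivedTag);
    }

    public static void main(String[] args) {
        byte[] key = "shared-secret-key".getBytes(StandardCharsets.UTF_8); // hypothetical key
        byte[] y = tag(key, "meet at noon");
        System.out.println(verify(key, "meet at noon", y));     // unmodified message
        System.out.println(verify(key, "meet at midnight", y)); // altered message
    }
}
```

A receiver who holds the same key accepts the first message and rejects the altered one, while someone without the key cannot produce a matching tag for a message of their choice.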

1.3 SHA-3 Competition

As of 2004, SHA-1 and MD5 were among the most commonly used message digest algorithms. In August 2004, the security of these hash functions came under threat when vulnerabilities were discovered in MD5, SHA-0 and RIPEMD. Even though no weaknesses in SHA-1 were revealed at that time, there were major doubts about its security because of its similarity to SHA-0 [?]. In addition, the growth of hardware power and the boom in parallel computing have made the need for a new hash standard all the more pressing. The new hash function had to be designed so that it remains secure in the future in spite of growing hardware power. With this in mind, NIST started the competition to find the new hash standard, which was to be named SHA-3 [?].
The competition is in its final round with five potential winners.

1.4 Background

The need for a new function is clear, since the most commonly used hash functions like
SHA-1 and MD5 are no longer deemed to be secure. So, what constitutes the security
of a hash function? What does a hash function need to have or do to make it secure?
This section tries to answer these questions and also explains the basic steps involved in
constructing a good hash function.

1.4.1 Why are MD5, SHA-0 and SHA-1 no longer secure?

Researchers have broken MD5 and SHA-0, so these functions are no longer secure. This is because collisions have been discovered in both. Simply put, a collision occurs when two or more distinct inputs produce the same output.
Security experts have known about collisions in MD5 at least since 2004. Since then there have been more successful attacks. In one such attack, the attacker can give himself Intermediate Certificate Authority (CA) credentials and then generate trusted certificates that can be used to legitimize phishing sites [?].
Similar to MD5, full-length SHA-0 collisions were found in August 2004. Finding the collisions had a complexity of about $2^{51}$ and took around 80,000 CPU hours. With this in mind, and considering the similar architectural structure of SHA-0 and SHA-1, the continued use of SHA-1 was being seriously reconsidered. Even though SHA-1 has still not been broken, NIST wanted to discontinue the use of SHA-1 before 2011 [?].

1.4.2 Security of Hash Functions

In the previous section, we came across the term collision. This section defines it more precisely and describes the features that make a hash function secure. For a cryptographic hash function h, the only way to obtain a valid hash pair should be to pick an x and compute $y = h(x)$. Ideally, the hash function should be resistant to any other way of generating a valid pair (x, y). For this reason the three problems below should be difficult to solve; in other words, the hash function should be resistant to (able to withstand) the following:
Preimage
The preimage problem is solved if, given y, an x can be found such that $h(x) = y$. Then (x, y) is a valid hash pair; for a secure hash function this should not be feasible.
Second preimage
Given x, the second preimage problem is solved when some y can be found with $x \neq y$ and $h(x) = h(y)$. For a function to be considered safe this should not be practically computable.
Collisions
A collision is found when two messages, x and y, are identified such that $x \neq y$ and $h(x) = h(y)$. Only if a function resists all three of the above problems can it be considered a secure hash [?].
When a hash function is preimage, second preimage and collision resistant, a message cannot feasibly be modified without modifying its resulting hash value. This is the minimum requirement for a hash candidate. These properties also ensure that, for all practical purposes, when two input messages produce the same hash output the messages are identical. However, satisfying these requirements does not imply that the hash function is ideal and will not be broken by other attacks. One such attack is the length extension attack. When the attacker knows the length of a message m and its hash value $h(m)$, he can choose another message $m'$ and compute $h(m \| m')$ without knowing m itself. This can break some simple authentication systems.
Ideally, we would prefer a hash function to have even stronger properties. Given a message digest, it should not be viable to obtain a couple of inputs that hash to it. Also, given the message digest of an input, it should not be possible to obtain any useful data about the input from the hash. Having these properties makes a hash function exhibit statistical randomness while still being a deterministic function that is easy to compute.
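To make "feasible" concrete, the following Java sketch solves the preimage problem by brute force for a deliberately weakened hash: SHA-256 truncated to its first 16 bits. The truncation, the class name `TruncatedPreimage` and the candidate inputs ("0", "1", "2", ...) are assumptions chosen for illustration; the same search against the full 256-bit digest is hopeless, which is the point of the definitions above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class TruncatedPreimage {
    // First `bits` bits (1 to 24) of SHA-256(message), returned as a non-negative int.
    static int truncatedHash(String message, int bits) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] d = md.digest(message.getBytes(StandardCharsets.UTF_8));
            int first24 = ((d[0] & 0xff) << 16) | ((d[1] & 0xff) << 8) | (d[2] & 0xff);
            return first24 >>> (24 - bits);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Tries the candidates "0", "1", "2", ... until one hashes to the target value.
    static long findPreimage(int target, int bits) {
        for (long i = 0; ; i++) {
            if (truncatedHash(Long.toString(i), bits) == target) {
                return i;
            }
        }
    }

    public static void main(String[] args) {
        int bits = 16;
        int y = truncatedHash("some unknown message", bits); // the digest given to the attacker
        long x = findPreimage(y, bits);
        System.out.println("found preimage \"" + x + "\" for digest " + y);
    }
}
```

With a 16-bit digest the search is expected to need on the order of $2^{16}$ trials; every additional digest bit doubles that effort.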

1.4.3 Random Oracle Model

The random oracle model is an idealized model for a hash function. A simple way to think about the oracle is as a box that takes a binary string as input and returns another binary string as output. The internal operations of the box are unknown to everyone, honest parties and adversaries alike. They can query the oracle by entering a binary string x as input and receiving another binary string y as output [?]. The box is guaranteed to be consistent: if the output from the box for an input x is y, then the box will always output the same y whenever the input is x. In mathematical terms, the box can be viewed as implementing a function H, and our knowledge of the function is limited to the values of H on the input strings that have been queried explicitly.
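One common way to picture such a box is "lazy sampling": the box draws a fresh random output the first time an input is queried and remembers it afterwards. The Java sketch below follows that idea; the class name `LazyRandomOracle` and the fixed output length are assumptions for illustration, and a real random oracle is a theoretical object rather than something that can be shared between parties this way.

```java
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

class LazyRandomOracle {
    private final Map<String, byte[]> answers = new HashMap<>(); // queries seen so far
    private final SecureRandom rng = new SecureRandom();
    private final int outputBytes;

    LazyRandomOracle(int outputBytes) {
        this.outputBytes = outputBytes;
    }

    // Returns H(x): a fresh random string on the first query, the same string afterwards.
    byte[] query(String x) {
        byte[] y = answers.computeIfAbsent(x, key -> {
            byte[] fresh = new byte[outputBytes];
            rng.nextBytes(fresh);
            return fresh;
        });
        return y.clone(); // defensive copy so callers cannot alter the stored answer
    }

    public static void main(String[] args) {
        LazyRandomOracle h = new LazyRandomOracle(32);
        byte[] first = h.query("0101");
        byte[] second = h.query("0101");
        System.out.println(java.util.Arrays.equals(first, second)); // consistency of the box
    }
}
```

Nothing about H can be learned except by querying it, which matches the statement that our knowledge is limited to the values on explicitly queried inputs.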
Even though the oracle model introduced by Bellare and Rogaway is abstract, it serves well in the quest for designing a close-to-ideal hash function [?]. The model provides a formal methodology that can be used to design and validate cryptographic schemes via the following two steps:
1. Design a scheme that is proven secure in the random oracle model. We assume that a random oracle exists in the world, then construct and analyze a cryptographic scheme.
2. When the scheme needs to be implemented in the real world, a random oracle is not available. We replace the random oracle in the scheme with an instance of a cryptographic hash function, such as SHA-1, that has been modified appropriately. That is, whenever the scheme calls for querying the random oracle on input X, the instantiated hash function is queried instead.
We would hope that the instantiated hash function emulates the random oracle well enough that the security proof from the first step carries over to the real-world instantiation of the scheme. However, there is little theoretical justification for this hope. There are schemes that can be proven secure in the random oracle model but are not secure once instantiated as in the second step. As a result, a security proof in the random oracle model should not be taken as proof that any real-world instantiation of the scheme is secure, but only as evidence that the scheme has no intrinsic design flaws.

1.4.4 Weaker Notions of Security for Hash Functions

Building a hash function that matches the random oracle model and provides absolute security is rather difficult. Collision resistance is a strong security requirement and is quite difficult to achieve. However, in some applications it suffices to rely on relaxed requirements. Second-preimage resistance is a weaker security condition than collision resistance and, similarly, preimage resistance is weaker than second-preimage resistance. That is, if a hash function is resistant under the stronger security notion then it is automatically resistant under the weaker notion. In the first case, if given an input x it is possible to find y such that $x \neq y$ and $h(x) = h(y)$, then it is feasible to find two colliding inputs x and y. In the second case, suppose it is possible, given y, to invert y and find x such that $h(x) = y$. Then, given an input x, compute $y = h(x)$ and invert y to find $x'$ with $h(x') = y$; whenever $x' \neq x$, which is likely when many inputs map to each digest, a second preimage has been found. So we can conclude that collision resistance is the most important security feature, and that in the order collision, second preimage, preimage, resistance to the former implies resistance to the latter [?].

1.4.5 Birthday Attacks

The most important security feature of a hash function is collision resistance. To guard against collision attacks we can use the birthday paradox, which determines a lower bound on the length of the hash output. The following is an algorithm to find a collision [?]:
Find-Collision(h, Q)
    select $Y_0 \subseteq Y$, $|Y_0| = Q$
    for each $y \in Y_0$
        do $x_y \leftarrow h(y)$
    if $x_y = x_{y'}$ for some $y' \neq y$
        then return $(y, y')$
        else return (failure)
The above algorithm can be analyzed using a probability argument similar to the birthday paradox. The birthday paradox states that among 23 randomly chosen people there is at least a 50% probability that two of them have the same birthday. Reformulating this problem gives a better idea of its relevance to hash functions. Suppose that a function h has as its domain the set of all living human beings, and for all y, h(y) denotes the birthday of person y. Then the range of h consists of the 365 days in a year (366 days if we include February 29). For this function h, finding a collision is analogous to finding two people with the same birthday. Therefore, according to the birthday paradox, the success probability of the above algorithm is at least one half when Q = 23 and M = 365, where M is the size of the range of h.
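Theorem [?]: For any $Y_0 \subseteq Y$ with $|Y_0| = Q$, the success probability of the algorithm is

\[
\epsilon = 1 - \left(\frac{M-1}{M}\right)\left(\frac{M-2}{M}\right)\cdots\left(\frac{M-Q+1}{M}\right) \tag{1.1}
\]

If we take $\epsilon = 0.5$, solving the equation for Q we get

\[
Q \approx 1.17\sqrt{M} \tag{1.2}
\]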

The detailed proof can be found in [?].
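For orientation, the step from (1.1) to (1.2) can be sketched as follows. This is the usual approximation argument, assuming $Q$ is small compared with $M$ so that $1 - x \approx e^{-x}$ applies to each factor; it is an outline, not a replacement for the detailed proof.

```latex
\begin{align*}
1 - \epsilon &= \prod_{i=1}^{Q-1}\left(1 - \frac{i}{M}\right)
   \approx \prod_{i=1}^{Q-1} e^{-i/M}
   = e^{-Q(Q-1)/(2M)} \\
Q(Q-1) &\approx 2M \ln\frac{1}{1-\epsilon} \\
Q &\approx \sqrt{2M \ln\frac{1}{1-\epsilon}}
   \qquad \text{(ignoring the lower-order term } Q \text{)} \\
\epsilon = 0.5: \quad Q &\approx \sqrt{2 \ln 2}\,\sqrt{M} \approx 1.17\sqrt{M}
\end{align*}
```

For $M = 365$ this gives $Q \approx 1.17 \times 19.1 \approx 22.4$, which agrees with the 23 people of the birthday paradox.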


So we can conclude that in order to obtain a collision with a probability of 50% we need to hash a little over $\sqrt{M}$ values of Y. With this theorem, we can determine the minimum size of the message digest required for it to be secure. For a 40-bit message digest, a collision can be found with probability 1/2 with just over $2^{20}$ random hashes, making it insecure. The lower bound on the size of the hash output is 128 bits, which requires over $2^{64}$ random hashes according to the birthday paradox. To be well on the safe side, a message digest of 160 bits or larger is generally recommended [?].
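The Find-Collision procedure of this section can be tried out on a small digest. The Java sketch below assumes, for illustration only, a hash made by truncating SHA-256 to a chosen number of bits and a set $Y_0$ consisting of the strings "msg-0", "msg-1", and so on; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

class FindCollision {
    // First `bits` bits (1 to 24) of SHA-256(y); the range has M = 2^bits values.
    static int truncatedHash(String y, int bits) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256")
                    .digest(y.getBytes(StandardCharsets.UTF_8));
            int first24 = ((d[0] & 0xff) << 16) | ((d[1] & 0xff) << 8) | (d[2] & 0xff);
            return first24 >>> (24 - bits);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hashes q distinct inputs; returns a colliding pair (y, y') or null for "failure".
    static String[] findCollision(int bits, int q) {
        Map<Integer, String> seen = new HashMap<>(); // hash value -> input that produced it
        for (int i = 0; i < q; i++) {
            String y = "msg-" + i;
            String earlier = seen.putIfAbsent(truncatedHash(y, bits), y);
            if (earlier != null) {
                return new String[] {earlier, y};
            }
        }
        return null;
    }

    public static void main(String[] args) {
        int bits = 16;                                          // M = 65536
        int q = (int) Math.ceil(1.17 * Math.sqrt(1 << bits));   // about 300 trials
        String[] pair = findCollision(bits, q);
        System.out.println(pair == null ? "failure" : pair[0] + " collides with " + pair[1]);
    }
}
```

With $Q \approx 1.17\sqrt{M}$ the run succeeds only about half the time, as the theorem predicts; increasing `q` to a few thousand makes failure very unlikely for a 16-bit digest, while the same approach on a 160-bit digest would need roughly $2^{80}$ hashes.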

1.4.6 The Merkle-Damgård Transform

The Merkle-Damgård (MD) transform is one of the most popular and commonly used techniques for designing hash functions. The basic principle is to use a collision-resistant compression function as the core of the construction. Ralph Merkle first explained this principle in his Ph.D. thesis [?]. The structure was also proven secure by Ivan Damgård in independent work. They both showed that if a proper padding mechanism is used along with a collision-resistant compression function, then the resulting hash function is also collision resistant. This construction principle has been widely adopted and is used in popular hash functions such as MD5 and the SHA family [?].
The compression function handles input in blocks of fixed size, typically 512 or 1024 bits. However, input messages can be of arbitrary size, so the input messages are padded to a multiple of the block size. The padding includes the length of the original input message at the end [?]; this strengthening is what the collision-resistance proof relies on, although it does not by itself prevent length extension attacks. After padding is complete, the input is divided into blocks of equal size, which are processed one by one. Processing means passing each block through the compression function and combining the output for one block with the next input block.

[Figure: Merkle-Damgård hash construction]

In the above diagram, taken from [?], f represents the compression function. This function takes two inputs of fixed length and processes them to produce an output of fixed length. In every step, the function f combines the current input block with the output produced so far to produce an intermediate output. To start the process, a fixed value called the initialization value (IV) is needed. This value is specific to a particular algorithm. The last block is padded with zeros as required and the real input length is then appended.
Finally, to further harden the result, the final output of the compression function is fed through a finalisation function. This function is generally constructed from the compression function. The finalisation function serves different purposes in different algorithms. Commonly, it is used to truncate the output to the required size or to mix the output bits better [?].
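The iteration described in this section can be sketched in Java as below. Everything specific in the sketch is an assumption made for illustration: the 64-byte block size, the all-zero IV, the SHA-1-style padding rule and, in particular, the stand-in compression function (one SHA-256 call over chaining value and block), which is not the compression function of any candidate. No finalisation function is applied.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class ToyMerkleDamgard {
    static final int BLOCK = 64; // block size in bytes (512 bits)
    static final int STATE = 32; // chaining value size in bytes

    // Stand-in compression function f(h, m): fixed-length inputs, fixed-length output.
    static byte[] compress(byte[] h, byte[] block) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(h);
            md.update(block);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Padding: a 0x80 byte, zeros, then the message length in bits as a 64-bit value.
    static byte[] pad(byte[] msg) {
        int padded = ((msg.length + 8) / BLOCK + 1) * BLOCK;
        ByteBuffer buf = ByteBuffer.allocate(padded); // zero-filled, big-endian
        buf.put(msg).put((byte) 0x80);
        buf.putLong(padded - 8, (long) msg.length * 8);
        return buf.array();
    }

    // Chains the compression function over all padded blocks, starting from the IV.
    static byte[] hash(byte[] msg) {
        byte[] h = new byte[STATE]; // IV: all zeros, chosen arbitrarily for this sketch
        byte[] padded = pad(msg);
        for (int off = 0; off < padded.length; off += BLOCK) {
            h = compress(h, Arrays.copyOfRange(padded, off, off + BLOCK));
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hash("hello".getBytes()).length + "-byte digest");
    }
}
```

The chaining value `h` plays the role of "the output produced so far": each call to `compress` combines it with the next block, and the last chaining value is the digest.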

1.4.7 Overview of the SHA Family

SHA-0 and SHA-1

SHA-0 and SHA-1 were developed by the NSA in 1993 and 1995, respectively. SHA-1 was published by NIST as a U.S. Federal Information Processing Standard in 1995. It is still the current hash standard and the most widely used hash function. It is an iterated hash function that has 80 iterations and follows the Merkle-Damgård construction. The input can be of any length less than $2^{64}$ bits, but the output is always a 160-bit value. The compression function, however, can only handle inputs whose length is a multiple of 512 bits. This requires the input to be padded, and the following values are appended to the original input in the order specified: a single 1 bit, a variable number of zeros bringing the length to 448 modulo 512, and the real input length as a 64-bit value.

Each block of 512 bits is split into 32-bit words and sent through the 80 rounds of iteration. Different constants and operations are defined for the SHA-1 process. Bitwise AND, OR, XOR and circular left rotation are some of the operations used. The constants vary depending on which iteration is currently being carried out. After all 80 iterations are completed, the outputs are five 32-bit words that are concatenated to produce the 160-bit output [?]. In 2005, mathematical weaknesses were identified in SHA-1. Due to these security flaws, there was a need for an even stronger hash function [?].
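The description of SHA-1 above (padding, 512-bit blocks split into 32-bit words, 80 iterations with iteration-dependent constants, five output words) can be followed step by step in the Java sketch below. It is written from the published SHA-1 specification for illustration; the class name `Sha1Sketch` is hypothetical, and production code should use `java.security.MessageDigest` instead.

```java
import java.nio.ByteBuffer;

class Sha1Sketch {
    // Appends a 1 bit, zeros up to 448 mod 512 bits, and the 64-bit message length.
    static byte[] pad(byte[] msg) {
        int padded = ((msg.length + 8) / 64 + 1) * 64;
        ByteBuffer buf = ByteBuffer.allocate(padded); // zero-filled, big-endian
        buf.put(msg).put((byte) 0x80);
        buf.putLong(padded - 8, (long) msg.length * 8);
        return buf.array();
    }

    static byte[] digest(byte[] msg) {
        int[] h = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};
        ByteBuffer in = ByteBuffer.wrap(pad(msg));
        int[] w = new int[80];
        while (in.hasRemaining()) {
            // Split the 512-bit block into sixteen 32-bit words and expand them to 80.
            for (int t = 0; t < 16; t++) w[t] = in.getInt();
            for (int t = 16; t < 80; t++)
                w[t] = Integer.rotateLeft(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);
            int a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
            for (int t = 0; t < 80; t++) {
                int f, k; // the Boolean function and constant depend on the iteration
                if (t < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
                else if (t < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
                else if (t < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
                else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
                int tmp = Integer.rotateLeft(a, 5) + f + e + k + w[t];
                e = d; d = c; c = Integer.rotateLeft(b, 30); b = a; a = tmp;
            }
            h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
        }
        ByteBuffer out = ByteBuffer.allocate(20); // five 32-bit words = 160 bits
        for (int word : h) out.putInt(word);
        return out.array();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(digest("abc".getBytes(java.nio.charset.StandardCharsets.US_ASCII))));
    }
}
```

For the three-byte message "abc" the padded input is exactly one 512-bit block, so the loop over blocks runs once.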
SHA-2
As weaknesses were identified in SHA-1, SHA-2 was introduced as an extension of SHA-1 with a longer message digest. It can produce outputs of four different bit lengths: 224, 256, 384 and 512. Each variant is named after its output length, SHA-<length>, and together they are called the SHA-2 family. SHA-2 differs from SHA-1 in that it uses different shift amounts and additive constants.
SHA-256 is computed with a 32-bit word size, a 256-bit internal state, a 512-bit block size and 64 rounds. SHA-512 is computed with a 64-bit word size, a 512-bit internal state, a 1024-bit block size and 80 rounds. SHA-224 and SHA-384 are truncated versions of the 256-bit and 512-bit variants, respectively. SHA-2 provides more security according to the birthday paradox, which implies that a message digest of n bits yields a collision with a work factor of approximately $2^{n/2}$; this raises the collision security from $2^{80}$ for SHA-1 to as much as $2^{256}$ for SHA-512. Even with much better security, SHA-2 is not as widely used, since its algorithm is similar to that of SHA-1 and it also has implementation difficulties. With these difficulties and the SHA-3 competition currently in progress, the world is waiting for SHA-3.
SHA-256 is used to authenticate Debian Linux software packages [?] and in the DKIM message signing standard; SHA-512 is part of a system to authenticate archival video from the International Criminal Tribunal of the Rwandan genocide [?].
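The relation between digest length and birthday-bound work factor can be tabulated with a short Java sketch. It assumes a Java runtime whose default providers include SHA-224 (Java 8 or later); the class name `Sha2Sizes` is hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Sha2Sizes {
    // Digest length of the named algorithm in bits, as reported by the JCA provider.
    static int digestBits(String algorithm) {
        try {
            return MessageDigest.getInstance(algorithm).getDigestLength() * 8;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(algorithm + " not available", e);
        }
    }

    public static void main(String[] args) {
        String[] algorithms = {"SHA-1", "SHA-224", "SHA-256", "SHA-384", "SHA-512"};
        for (String algorithm : algorithms) {
            int n = digestBits(algorithm);
            // Birthday bound: about 2^(n/2) hashes are needed for a collision.
            System.out.println(algorithm + ": " + n + "-bit digest, collision work about 2^" + (n / 2));
        }
    }
}
```

The listing only reflects generic birthday bounds; it says nothing about attacks that exploit the internal structure of a particular function.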
SHA-3
Even though SHA-2 has not been broken yet, there is a need for a new hash standard because there are many parallels between SHA-1 and SHA-2 in terms of construction. The new hash function will join the existing hash functions and become part of the Federal Information Processing Standard (FIPS) 180-3, Secure Hash Standard. The winning candidate of the NIST competition will be named SHA-3.

1.5 SHA-3 Competition

The competition was officially announced by NIST on November 2, 2007 as an open contest [?], in an effort to find a substitute for SHA-1 and SHA-2. It was similar to the effort made for the development of the Advanced Encryption Standard (AES), which is the current encryption standard. Being an open contest meant that anyone could make a submission as long as the requirements described in [?] were met. The deadline for submissions was set for October 31, 2008.

1.5.1 Round 1

The first round began on Nov. 1, 2008, the day after the official submission deadline. There were a total of 64 submissions, of which 56 are publicly available. NIST had set some prerequisites for the candidates; on this basis some submissions were rejected and the final number was down to 51. This announcement was made on Dec. 9, 2008 [?].
The first conference took place at Katholieke Universiteit Leuven, Belgium, between the 25th and 28th of February, 2009. There were two items on the agenda. The first was for the authors of the 51 hash functions to talk about their respective contenders. The second was to discuss the conditions based on which the contenders would be evaluated and the list eventually trimmed down.
The first round of the competition was officially completed when NIST announced 14 candidates for the second round on July 24, 2009.

1.5.2 Round 2

An entire year was allotted to assessing the second round candidates. The following is a list of the fourteen candidates selected for the second round:
BLAKE
Grøstl
JH
Skein
Keccak
Blue Midnight Wish (BMW)
CubeHash
ECHO
Fugue
Hamsi
Luffa
Shabal
SHAvite-3
SIMD

1.5.3 Final Round

The second candidate conference took place at the University of California, Santa Barbara between the 23rd and 24th of August, 2010. The agenda was to confer on important aspects, such as security and performance, of the remaining fourteen submissions, based on NIST's own review and on the feedback received from public reviews. With all this information NIST selected the final five candidates on Dec. 9, 2010. This effectively started the final round of the competition. It is expected that the final SHA-3 conference will take place early in 2012 and the final winner will be decided afterwards.
NIST released a document after the completion of Round 1, containing an analysis of each of the fourteen candidates selected for Round 2 [?]. It contained information about the basic design principles used, performance and any known attacks. The information on attacks was summarized from the analysis done by external cryptanalysts, and based on this analysis NIST encouraged more detailed analysis of some of the lesser-known candidates.
The information on the fourteen hash functions from this report [?] is provided in Appendix A. The following section contains a brief description of the five candidates selected as finalists from Round 2. This analysis is based on the final document submitted to NIST by each of the candidates and also contains information about the tweaks made by each candidate for the final submission. This short description of the candidates is not a report issued by NIST and hence differs from the report in the appendix. The analysis of each candidate can be broadly divided into three parts: the hash function's design principles and the components used, a description of the compression function and the measures taken to ensure security, and finally the process of hashing a message.


BLAKE
The authors of BLAKE start their documentation by saying that they have not reinvented the wheel in building BLAKE, meaning that they have used components that have been widely used before. The three main structural components of BLAKE are the HAIFA iteration mode, the local wide-pipe internal structure used in the LAKE hash function [?] and a modified version of the ChaCha compression algorithm.
Modifications have been made to these tried and tested components to overcome some of their drawbacks. For example, HAIFA is a modified version of the Merkle-Damgård construction that provides more resistance to long-message second preimage attacks. The security of the ChaCha compression function has been analyzed intensively, and it also provides good performance because it parallelizes well.
The message to be hashed is first padded and then processed block by block using the compression function. The compression of each block depends on the salt and on a counter of the total number of bits processed so far. The initial state is set using the salt and the value of the counter; it is then processed using the compression function to produce the chaining value for the next block.
Modifications made for final round: The number of rounds is changed from 10 to 14 for the 224/256-bit digest versions and from 14 to 16 for the 384/512-bit digest versions.
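As background on the ChaCha component mentioned above, the Java sketch below shows the quarter-round of the original ChaCha design, which uses only 32-bit addition, XOR and rotation. It is not BLAKE's own round function: BLAKE modifies this primitive, for instance by mixing in message words and constants, so the sketch should be read only as an illustration of the kind of operation involved. The class name is hypothetical.

```java
class ChaChaQuarterRound {
    // Applies the ChaCha quarter-round in place to the state words s[a], s[b], s[c], s[d].
    static void quarterRound(int[] s, int a, int b, int c, int d) {
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 16);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 12);
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 8);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 7);
    }

    public static void main(String[] args) {
        int[] s = {0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567};
        quarterRound(s, 0, 1, 2, 3);
        for (int word : s) {
            System.out.println(String.format("%08x", word));
        }
    }
}
```

Each of the four steps is independent of table lookups and works on whole words, which is why several quarter-rounds can be computed in parallel on different columns of the state.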
Grøstl
Grøstl differs from other common and popular hash functions, such as those in the SHA family, in that it is based on a small number of individual permutations. However, the differences end there: like many SHA-3 candidates, it borrows many components from AES.
The compression function is used in a Merkle-Damgård-style iteration and is composed of two permutations, P and Q. Both P and Q were inspired by Rijndael, but they have a much larger internal state. P and Q each consist of a number of rounds, with each round comprising a number of round transformations. The round transformations are adding round constants, substituting bytes using the AES S-box, shifting bytes and mixing bytes.
Hashing a message using Grøstl starts by padding the message to the required length, passing it through the compression function block by block and finally through the output transformation. The output transformation XORs the compressed value X with the permutation P of X and finally truncates this value to the required number of output bits.
Modifications made for final round: Certain tweaks have been made to make the permutations P and Q inside Grøstl differ more from each other. This has been achieved in two ways: first, by changing the shift values in permutation Q, and second, by using larger round constants in P and Q.
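In symbols, the structure described above can be summarized as follows. The output transformation is a direct transcription of the text; the formula for the compression function is recalled from the Grøstl submission rather than derived from the description here, so it should be checked against the specification. The symbols $h$ (chaining value), $m$ (message block), $f$, $\Omega$ and $\mathrm{trunc}_n$ (truncation to $n$ bits) are notation introduced for this summary.

```latex
\begin{align*}
f(h, m) &= P(h \oplus m) \oplus Q(m) \oplus h
  && \text{compression function built from the permutations } P \text{ and } Q \\
\Omega(x) &= \mathrm{trunc}_n\bigl(P(x) \oplus x\bigr)
  && \text{output transformation applied to the final chaining value } x
\end{align*}
```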
JH
The design of JH rests on two main techniques: the first is to construct a large block cipher (LBC) from small components (SC), and the second is to construct a compression function f from that cipher. The block cipher is used with a fixed key to build the function f. The input is processed in blocks and a large state is maintained throughout the process. JH ensures that the internal state is at least twice the size of the hash output. In other words, if the required hash size is n bits then the state size is at least 2n bits. The 2n-bit block is processed with the n-bit input message to generate the next block of size 2n bits. The LBCs are derived from SCs, and JH uses the components used in AES for this purpose. The authors believe this AES-based design mechanism is the best way to build LBCs from SCs.
Hashing a message with JH can be broken down into three major steps: padding the original input, setting the initial value and processing the padded message through the compression function, and finally computing the output hash value H(N) (truncation is carried out if necessary).
Modifications made for final round: The number of rounds in JH has been increased from 35.5 to 42.
Keccak
Keccak is best described as a family of sponge functions that use a set of seven permutations as the building block. The sponge construction iterates a function f over an input of arbitrary size and produces a digest of variable size. The iterated function f is a transformation or permutation that operates on a fixed number of bits. The sponge construction is composed of two stages: an absorbing stage and a squeezing stage. In the absorbing stage, each input block is XORed with part of the state bits and the result is fed to the function f. The squeezing stage also applies the function f, but its purpose is to extract the number of output bits required by the user.
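The two stages can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: `toy_f` is an arbitrary mixing function standing in for the real permutation, and the byte-wise state, the `RATE` and `WIDTH` values and the simplified padding are chosen for brevity. None of them are Keccak's actual parameters, and the result has no security value.

```python
RATE = 4      # r: bytes absorbed or squeezed per call to f (toy value)
WIDTH = 12    # b: total state size in bytes; the capacity is b - r


def toy_f(state):
    """Toy invertible mixing step. NOT Keccak-f and not secure."""
    s = list(state)
    for rnd in range(8):
        for i in range(len(s)):
            # s[i - 1] wraps around to the last byte when i == 0.
            s[i] = (s[i] + s[i - 1] + rnd + 1) % 256
        s = s[1:] + s[:1]  # rotate the state by one byte
    return s


def toy_sponge(message: bytes, out_len: int) -> bytes:
    # Simplified padding (an assumption): one 0x80 byte, then zeros up to
    # a multiple of the rate.
    padded = message + b"\x80"
    padded += b"\x00" * (-len(padded) % RATE)

    state = [0] * WIDTH
    # Absorbing stage: XOR each r-byte block into the state, then apply f.
    for start in range(0, len(padded), RATE):
        for i, byte in enumerate(padded[start:start + RATE]):
            state[i] ^= byte
        state = toy_f(state)

    # Squeezing stage: read r bytes, apply f, and repeat until enough
    # output has been collected; then truncate to the requested length.
    out = bytearray()
    while len(out) < out_len:
        out.extend(state[:RATE])
        state = toy_f(state)
    return bytes(out[:out_len])


digest = toy_sponge(b"abc", 10)
```

Because the output is read block by block from the same evolving state, a shorter digest of the same message is simply a prefix of a longer one.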


The permutation of Keccak is a repeated application of a round function without a key schedule. There are seven versions of the permutation Keccak-f, as it can operate on seven state sizes, the default being the 1600-bit state. The operations used are bitwise XOR, AND, NOT and rotations. It involves no S-box substitutions or arithmetic operations, except on bit indices.
Modifications made for final round: The padding rule has been shortened and modified. The restriction on the supported values of the rate r has been removed. Previously r could only take values that are a multiple of 8 bits. Now all values 0 < r < b are supported, where b is the state size.
Skein
Unlike most of the hash functions described above, Skein doesn't incorporate already known components like the AES S-box. Its central idea is to build a hash function out of a tweakable block cipher, Threefish, which was designed specifically for Skein. Every instance of the compression function can be made unique, as Threefish allows configuration data to be hashed with every input block. Along with Threefish, the other two important components are Unique Block Iteration (UBI) and the Optional Argument System.
The most interesting component is the Threefish block cipher, which is based on the principle that a large number of simple rounds is better than a small number of complex rounds. Accordingly, the non-linear mixing function MIX, central to Threefish, is composed of XOR, addition and rotation by a constant on 64-bit words. UBI is a chaining mode that uses Threefish to build a compression function that maps an arbitrary input size to a fixed output size.
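As a rough illustration, one MIX step on a pair of 64-bit words can be written as below. The function names are ours, and the rotation amount `r` is passed in as a parameter; the real cipher prescribes a table of rotation constants, which is not reproduced here.

```python
MASK64 = (1 << 64) - 1


def rotl64(x, r):
    """Rotate a 64-bit word left by r bits (0 < r < 64)."""
    return ((x << r) | (x >> (64 - r))) & MASK64


def mix(x0, x1, r):
    """One MIX step: addition modulo 2^64, rotation by a constant, XOR."""
    y0 = (x0 + x1) & MASK64
    y1 = rotl64(x1, r) ^ y0
    return y0, y1
```

Each step uses only three cheap word operations, which is why many such rounds can be stacked at low cost.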
Skein hashing is built on multiple invocations of UBI. Three calls to UBI are made: one for the configuration block, one for the message and one for the output transformation. The configuration block provides flexibility by supporting tree hashing and the use of a key for MAC computation. The output transformation allows Skein to produce output of any size.
Modifications made for final round: The only change to the Skein hash function is in
the key schedule parity constant.

Chapter 2
Implementation
FRN-Nov07 [?] recognized three important criteria for evaluating the hash functions throughout the competition. They are listed below:
Security.
Cost and performance.
Algorithm and its application.
Much research has been done on the first two aspects mentioned above, but many issues remain open.
When it comes to performance, most of the work done so far has targeted Intel machines with C as the programming language. As for security, some candidates have been studied extensively while others haven't been investigated much. The following sections explain how some of these issues have been addressed in the project, with a detailed account of the choices made and the tools used for implementation.

2.1 Randomness Tests

Hash functions are typically designed with the explicit criteria that they should be resistant to collision, preimage and second preimage attacks. Also, the bits in the output should be uniformly distributed. This is often extended by a requirement that a hash function behave indistinguishably from a random function. Hence, a hash function must act like a random function while being deterministic and having good performance.
Randomness is important because the more diverse and random the results are, the lower the chances of finding patterns that lead to collisions. Statistical randomness tests are used to determine the randomness of a hash function. A sequence is said to exhibit statistical randomness when it has no recurring patterns. The throw of a die is a good example of statistical randomness: the outcome is random and the next result cannot be determined from the previous outcomes. However, statistical randomness does not necessarily imply objective unpredictability, which is what true randomness means. In the case of hash functions pseudorandomness is sufficient. Sequences that exhibit statistically random behavior but are produced by a completely deterministic process are called pseudorandom sequences. Their advantage is that they can be generated much more easily than a genuinely random sequence, and that the same initial values always reproduce exactly the same numbers. The algorithm used for generating such numbers is called a pseudorandom number generator (PRNG) or deterministic random bit generator (DRBG). The sequence is not truly random because it is determined entirely by a small set of initial values called the PRNG's state. Even though PRNGs are not ideal and may even be insecure, certain features can make them cryptographically secure.
The design of cryptographically secure pseudorandom number generators (CSPRNG) can be classified into three broad categories:

A CSPRNG can be obtained by running a secure block cipher in counter mode. A random key is chosen and first a zero is encrypted, followed by 1, 2 and so on. The counter need not start at 0; it can start at an arbitrary number. However good the CSPRNG construction is, the attacker must never learn the key and plaintext values, or else all security is compromised.

A CSPRNG can also be derived from the secure hash of a counter. As in the previous case, the initial value of the counter should be random and should not be divulged. However, this method is not trusted by some authors as it hasn't been studied extensively.

One good CSPRNG can be used to generate another good CSPRNG. The pseudorandom sequence of bits is combined (commonly by XOR) with the bits of the input text, and if this procedure is run on a counter it produces a new sequence. Understandably, the new sequence is only as good as the old sequence. Finally, as in the previous cases, the initial state should not be divulged.
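The second category, hashing a counter, is simple enough to sketch. The example below is illustrative only: SHA-256 from Python's standard library stands in for "a secure hash", the 128-bit counter width is our assumption, and the function name is hypothetical. It is not a vetted generator.

```python
import hashlib
import os


def hash_counter_stream(start: int, n_bytes: int) -> bytes:
    """Return n_bytes of output by hashing successive counter values."""
    out = bytearray()
    counter = start
    while len(out) < n_bytes:
        out.extend(hashlib.sha256(counter.to_bytes(16, "big")).digest())
        counter = (counter + 1) % (1 << 128)   # wrap around at 128 bits
    return bytes(out[:n_bytes])


# The initial counter value must be random and kept secret.
secret_start = int.from_bytes(os.urandom(16), "big")
stream = hash_counter_stream(secret_start, 64)
```

Anyone who learns the starting value can regenerate the whole stream, which is why the text insists that it must not be divulged.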
The design principles of each of the SHA-3 candidates are based on one of the above three methods. NIST has therefore provided a publicly available set of tools that statistically test random number generators [?]. Before we delve into the tool for randomness tests we need to understand what statistical testing is.

2.1.1 Statistical Tests

The aim of a test of hypothesis is to determine whether a certain statement is true or false. To explain it better we will follow an example provided in [?]. A company claims that the mean value of a certain pollutant it releases into the atmosphere is 3 parts per million. An investigator from the Environmental Protection Agency (EPA) wants to verify this claim. If 3 parts per million is the average limit allowed by the EPA, the investigator can use sample data (daily pollution measurements) to decide whether the company is violating the law. That is, he needs to decide if $\mu > 3$.
The method used in the test of hypothesis is similar to proof by contradiction. The claim we want to support is that $\mu > 3$, where $\mu$ is the true mean level of pollution, in parts per million. This is the alternative (or research) hypothesis $H_a$. The contradicting theory is the null hypothesis $H_0$, which states that $\mu$ is at most 3. By obtaining sample evidence against the null hypothesis, $H_0: \mu \le 3$, we hope to show support for the alternative hypothesis, $H_a: \mu > 3$.
We decide whether to reject the null hypothesis based on the test statistic, which is computed from the sample data. Let's suppose we plan to base our decision on a sample of $n = 30$ daily pollution readings. If the sample mean $\bar{y}$ of the $n$ readings is much larger than 3, we conclude that $\mu > 3$ and reject the null hypothesis. There may also be situations where there is insufficient evidence to reject the null hypothesis, for example when the sample mean is less than 3, say $\bar{y} = 2.7$. Thus, in this example the sample mean $\bar{y}$ serves as the test statistic. We can divide the values that $\bar{y}$ can assume into two sets. One set of these values, say $\bar{y} \ge 3.1$, forms what is called the rejection region for the test, because such values lead to the rejection of the null hypothesis and acceptance of the alternative hypothesis. This particular rejection region, $\bar{y} \ge 3.1$, combined with the null and alternative hypotheses, represents one particular test possessing specific properties. We can obtain a different test with different properties if the rejection region is changed to $\bar{y} \ge 3.5$ [?].
A statistical test consists of four elements, summarized as follows:
Null hypothesis, H0, about one or more population parameters.
Alternative hypothesis, Ha, that will be accepted if the null hypothesis is rejected.
Test statistic, computed from sample data.
Rejection region, indicating the test statistic values that lead to rejecting the null hypothesis.
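The four elements map directly onto a few lines of code. The sketch below applies the pollution example with the rejection region of 3.1 used in the text; the readings are hypothetical values invented for illustration, and the function name is ours.

```python
from statistics import mean


def reject_null(readings, threshold=3.1):
    """Return True if the sample mean falls in the rejection region.

    H0: true mean <= 3 ppm; Ha: true mean > 3 ppm.
    Test statistic: the sample mean of the readings.
    Rejection region: sample mean >= threshold.
    """
    return mean(readings) >= threshold


# Hypothetical daily readings in parts per million, n = 30 each.
high_sample = [3.2] * 15 + [3.4] * 15   # sample mean 3.3
low_sample = [2.6] * 15 + [2.8] * 15    # sample mean 2.7
```

With these inputs, `reject_null(high_sample)` rejects the null hypothesis while `reject_null(low_sample)` does not. Passing `threshold=3.5` gives the different test mentioned in the text.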

2.1.2 Statistical Tests and Computing

In this section we describe some of the statistical concepts that will be used.

Entropy
When it comes to predicting the outcome of a random variable we may lack certainty, but the degree of uncertainty is not the same in all cases. The uncertainty depends upon the random variable in question. Let's consider an example to illustrate this point: a random variable X that can take two values, zero and one, with probabilities p and q. We would feel more certain about the value X would take if the probabilities were p = 0.99 and q = 0.01 than if they were p = 0.6 and q = 0.4. It can be inferred that the uncertainty is at its maximum when the probabilities are equal, while there is certainty when one of the probabilities is zero.
Consider a random variable X with probability mass function $f(x)$ defined on n values $x_1, x_2, \ldots, x_n$, with $f(x_i) = p_i$ for $i = 1, 2, \ldots, n$. As mentioned above, the uncertainty is greater when the probabilities $p_i$ are close to equal than when one or two of them are much larger than the rest. A measure of uncertainty should satisfy the following criteria:
Uncertainty is nonnegative, and equal to zero if and only if there exists i such that $p_i = 1$.
Uncertainty is maximum when the outcomes of X are equally likely.
If two random variables X and Y have n and m equally likely outcomes, respectively, with n < m, then the uncertainty of X is less than the uncertainty of Y.
Uncertainty is a continuous function of $p_1, p_2, \ldots, p_n$, meaning that a minor variation in the probabilities results in only a minor variation in the uncertainty.
We can define entropy by providing a formalization of uncertainty that satisfies the four preceding criteria. Let X be a discrete random variable possessing codomain $\{x_1, x_2, \ldots, x_n\}$ and probability mass function

$$f(x_i) = P(X = x_i) = p_i \tag{2.1}$$

for $i = 1, 2, \ldots, n$. Then the entropy of X is

$$H[X] = -\sum_{i=1}^{n} p_i \log_2 p_i = \sum_{i=1}^{n} p_i \log_2 (p_i^{-1}) \tag{2.2}$$

where $p_i \log_2 p_i = 0$ if $p_i = 0$. $H[X]$ is measured in bits.


When the definition of entropy is considered for a variable X, it only involves the
probability point masses but ignores the manner in which they are distributed on the x axis.
As a result, from the perspective of entropy any two random variables having the same
probability masses cannot be distinguished.
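The entropy formula translates almost literally into code. The following is a minimal sketch; the function name is ours, and it assumes the probabilities passed in already sum to one.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits; terms with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)
```

For the two-valued example discussed earlier, `entropy([0.99, 0.01])` is about 0.08 bits while `entropy([0.6, 0.4])` is about 0.97 bits, and the maximum of 1 bit is reached at `entropy([0.5, 0.5])`. Since only the probabilities are passed in, the function cannot tell apart two variables with the same probability masses, as noted above.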

Monte Carlo Simulation


A system's characteristics are determined by the characteristics of its components. In principle, analytical methods can be applied to explain system behavior in terms of component behavior, but in practice analytic methods have severe limitations. The mathematics can become complicated, especially when evaluating the performance of large systems, where the system's performance depends on many independent small components. One way of handling such problems is to employ a technique known as Monte Carlo simulation.
Instead of deriving a system's distribution analytically, we can take measurements by observing the system in action and then use these measurements to arrive at estimates of important characteristics. For example, rather than using a mathematical model and probabilistic analysis to estimate the time to failure of a computer network, we can collect failure data by observing the system in action and then estimate the failure time. However, the disadvantage of this approach is that it can be time consuming and expensive, even more so for large complex systems. One way around the problem is to build a computer model of the system and simulate its behavior, using knowledge of the component distributions to generate synthetic system data. Compared to observing the system in action, generating synthetic data is much faster and cheaper. We can simulate the performance of the system and analyze the resulting statistics. This solution raises another problem: how the computer is to simulate observations governed by a particular random variable.
The key to the computer generation of random values is the simulation of the uniform distribution U defined on (0,1). In practice, we use pseudorandom generators to simulate actual randomness. Given a method to generate random values corresponding to U, we can employ these values to generate values of other distributions. One way of doing so is shown below. Suppose F is a continuous probability distribution function which is strictly increasing and U is the uniform random variable over (0,1). Then

$$X = F^{-1}(U) \tag{2.3}$$

is a random variable possessing the probability distribution function F.
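As a concrete instance of this inversion, the sketch below draws exponentially distributed values. The exponential distribution is our choice of example, not one made in the text: its distribution function is F(x) = 1 - exp(-rate * x), so its inverse is -ln(1 - u) / rate. The seed and rate are arbitrary.

```python
import math
import random


def sample_exponential(rate, rng):
    """Draw one value with F(x) = 1 - exp(-rate * x) via X = F^{-1}(U)."""
    u = rng.random()                     # U, uniform on [0, 1)
    return -math.log(1.0 - u) / rate     # F^{-1}(u)


rng = random.Random(2011)                # fixed seed so the run is repeatable
samples = [sample_exponential(2.0, rng) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)   # should be near 1 / rate = 0.5
```

The only source of randomness is the uniform generator; every other distribution with an invertible F can be obtained from it in the same way.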


Monte Carlo Value for π
One of the simplest techniques used to explain the Monte Carlo method is hit and miss. A simple experiment based on this technique, used to estimate the value of π, is explained below. To understand the Monte Carlo value of π, let's consider a circle of radius r inscribed in a square. Consider one quadrant of this circle, shaded in the diagram, together with the square of side r that encloses it.
Imagine darts thrown at random at this square. Among the darts hitting the region inside the square, the proportion that hit the shaded region equals the ratio of the area of that region to the area of the square. That is,

$$\frac{\text{no. of darts hitting shaded area}}{\text{no. of darts hitting inside square}} = \frac{\frac{1}{4}\pi r^2}{r^2} = \frac{1}{4}\pi \tag{2.4}$$

$$\pi = 4 \times \frac{\text{no. of darts hitting shaded area}}{\text{no. of darts hitting inside square}} \tag{2.5}$$

Any dart throw that falls in the shaded area is called a hit. From the above formula we can see that the value of π is four times the ratio of hits to throws. If this experiment is carried out physically, it would take a lot of throws (at least a thousand) to obtain a good value of π.
Instead of carrying out the experiment physically we can use a computer to generate throws and use them to determine the value of π. For each throw, we generate an X and a Y coordinate and determine whether this point lies inside or outside the shaded area. To do so we compute the distance between the point and the center; if it is greater than one, the throw is a miss. Generating 1000 or more values results in a decent estimate of π. The quality of the estimate depends on the total number of throws and on the quality of the generated random values [?].
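The simulated dart experiment is short enough to show in full. This is a minimal sketch using Python's built-in pseudorandom generator; the function name, the seed and the number of throws are arbitrary choices.

```python
import random


def estimate_pi(throws, rng):
    """Estimate pi as four times the fraction of hits."""
    hits = 0
    for _ in range(throws):
        x, y = rng.random(), rng.random()   # a point in the unit square
        if x * x + y * y <= 1.0:            # inside the quarter circle
            hits += 1
    return 4.0 * hits / throws


pi_estimate = estimate_pi(100_000, random.Random(42))
```

Comparing the squared distance with 1 avoids a square root without changing which throws count as hits.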
Chi-Square Distribution
To define the chi-square statistic, consider a random sample of size n drawn from a normal population with standard deviation (SD) σ. Let the SD of the selected sample be s. The chi-square statistic is then calculated using the formula

$$\chi^2 = \frac{(n - 1)\, s^2}{\sigma^2} \tag{2.6}$$

The sampling distribution of the chi-square statistic can be obtained by repeating this experiment an infinite number of times. It is described by

$$Y = Y_0 \, (\chi^2)^{(v/2 - 1)} \, e^{-\chi^2 / 2} \tag{2.7}$$

$Y_0$ is a constant, depending on the number of degrees of freedom v, chosen so that the area under the curve is one. e is the base of the natural logarithm, approximately 2.718281828.
The figure taken from [?] shows the distributions for different sample sizes and degrees of freedom. The red, green and blue curves have sample sizes of 3, 5 and 11 with 2, 4 and 10 degrees of freedom, respectively.

The four important properties of the distribution are given below:
$\mu = v$, where $\mu$ is the mean of the distribution and v is the number of degrees of freedom.
$\sigma^2 = 2v$, where $\sigma^2$ is the variance of the distribution.
Y has its maximum value at $\chi^2 = v - 2$ when the degrees of freedom $v \ge 2$.
With increasing degrees of freedom, the curve approaches a normal distribution.
Procedure for goodness-of-fit test
The hypotheses for the goodness-of-fit test concern how well the observed data fit the hypothesized distribution. This is assessed by analyzing the degree to which the observed frequencies match the expected frequencies. Thus, the null hypothesis states that there is no considerable variation between the two sets of frequencies, while the alternative hypothesis states otherwise.
The next step is to compute the goodness-of-fit value using the formula

$$\chi^2 = \sum \frac{(O - E)^2}{E} \tag{2.8}$$

In the above equation, O and E represent the observed and expected values, respectively, and the sum runs over all categories.
To determine the degrees of freedom of the test we need to know the sample distribution. Table 2.1 shows various distributions and their degrees of freedom.
Type of distribution     No. of constraints    Degrees of freedom
Binomial distribution    1                     n - 1
Poisson distribution     2                     n - 2
Normal distribution      3                     n - 3

Table 2.1: Degrees of freedom for different distributions.

The final step is to evaluate the hypothesis test by comparing the calculated goodness-of-fit value with the value from the chi-square table. We accept the null hypothesis if the test value is smaller than the table value. Otherwise, the null hypothesis is rejected.
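The procedure can be sketched as follows. The die-roll counts are hypothetical data made up for illustration, with six categories and one constraint giving 5 degrees of freedom. The value 11.070 is the standard chi-square table entry for 5 degrees of freedom at the 5% significance level; both the significance level and the names are our choices.

```python
def chi_square(observed, expected):
    """Goodness-of-fit value: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a die, 10 expected per face.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6
statistic = chi_square(observed, expected)   # (4 + 4 + 1 + 1 + 0 + 0) / 10

# Table value for 5 degrees of freedom at the 5% level (assumed level).
CRITICAL_5_DOF = 11.070
accept_null = statistic < CRITICAL_5_DOF
```

Here the test value of about 1.0 is far below the table value, so the null hypothesis that the die is fair is accepted.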
Correlation Coefficient
The correlation coefficient measures how closely two variables are related. The most common variant is the Pearson product-moment correlation, which measures only the linear relationship. The correlation coefficients of a sample and of a population are denoted by r and ρ, respectively. The following hints show how to interpret a correlation coefficient [?]:
The value of the correlation coefficient lies between -1 and 1.
The sign describes the direction and the absolute value describes the magnitude.
A stronger relationship is indicated by a higher absolute value.
Since the range is between -1 and 1, those endpoints represent the strongest relationships while 0 represents the weakest.
A positive relationship means that one value gets larger as the other gets larger, while a negative relationship means that one value gets smaller as the other gets larger.
Another point to stress is that when the Pearson correlation coefficient between two variables is 0, it doesn't mean that there is no association between them; it only means that there is no linear association. The formula to calculate the product-moment correlation coefficient (r) is

$$r = \frac{\sum (xy)}{\sqrt{\left(\sum x^2\right)\left(\sum y^2\right)}} \tag{2.9}$$

where $x = x_i - \bar{x}$ and $y = y_i - \bar{y}$. $x_i$ and $y_i$ represent the ith observed values of x and y, respectively, and $\bar{x}$ and $\bar{y}$ represent their mean values.
Example: This example is taken from the tutorial in [?]. We need to find the correlation of X and Y for the values shown in Table 2.2.

X values    Y values
60          3.1
61          3.6
62          3.8
63          4
65          4.1

Table 2.2: Observed input values.

Step 1: Total number of values, N = 5.
Find XY, X^2 and Y^2 for each pair of values (Table 2.3).

X values    Y values    X * Y               X * X             Y * Y
60          3.1         60 * 3.1 = 186      60 * 60 = 3600    3.1 * 3.1 = 9.61
61          3.6         61 * 3.6 = 219.6    61 * 61 = 3721    3.6 * 3.6 = 12.96
62          3.8         62 * 3.8 = 235.6    62 * 62 = 3844    3.8 * 3.8 = 14.44
63          4           63 * 4 = 252        63 * 63 = 3969    4 * 4 = 16
65          4.1         65 * 4.1 = 266.5    65 * 65 = 4225    4.1 * 4.1 = 16.81

Table 2.3: Compute product and squares of the values.

Find $\sum X$, $\sum Y$, $\sum XY$, $\sum X^2$ and $\sum Y^2$:
$\sum X = 311$
$\sum Y = 18.6$
$\sum XY = 1159.7$
$\sum X^2 = 19359$
$\sum Y^2 = 69.82$
Now, substitute in the formula

$$
\begin{aligned}
r &= \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2]\,[N\sum Y^2 - (\sum Y)^2]}} \\
  &= \frac{(5)(1159.7) - (311)(18.6)}{\sqrt{[(5)(19359) - (311)^2]\,[(5)(69.82) - (18.6)^2]}} \\
  &= \frac{5798.5 - 5784.6}{\sqrt{[96795 - 96721]\,[349.1 - 345.96]}} \\
  &= \frac{13.9}{\sqrt{74 \times 3.14}} \\
  &= \frac{13.9}{15.24336} \\
  &= 0.9119
\end{aligned}
$$
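The hand calculation above can be checked with a short script. The function below uses the same sums-based formula as the substitution step and the X and Y values from the example; the function name is ours.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation computed from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


xs = [60, 61, 62, 63, 65]
ys = [3.1, 3.6, 3.8, 4, 4.1]
r = pearson_r(xs, ys)   # agrees with the hand-computed 0.9119 to four places
```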

2.1.3 ENT

ENT, a pseudorandom number sequence test program, is one of the tools recommended by NIST for statistical analysis [?]. It applies a total of five tests to the input provided in a file. ENT is useful in situations that place importance on the information density of the input, such as compression algorithms. ENT can treat data either as bits or as bytes, but the input file is always read as bytes. It has a -b option, which treats the byte stream as a sequence of bits. The results are written to the standard output. The following is an example of such output, obtained from the hashed value of a small Java program.
Entropy = 5.625000 bits per byte.
Optimum compression would reduce the size of this 64 byte file by 29 percent.
Chi square distribution for 64 samples is 288.00,
and randomly would exceed this value 7.61 percent of the times.
Arithmetic mean value of data bytes is 119.9531 (127.5 = random).
Monte Carlo value for Pi is 3.600000000 (error 14.59 percent).
Serial correlation coefficient is -0.036720 (totally uncorrelated = 0.0).
Entropy
In computing, entropy can be defined as the randomness collected by an application for use in cryptography or other places that need random data. In data compression, entropy may refer to the randomness of the data being input to the compression function. The higher the entropy, the less the data can be compressed. That means the more random the text is, the less you can compress it.
In ENT's statistical analysis, the entropy test measures the information density of the input and reports it as the number of bits per byte of the file. A large value indicates that the file is densely packed with information. The output for the sample file indicates that the file could be reduced by a further 29 percent by optimal compression. An ideal output should be so dense in information that compression cannot reduce its size further.
Chi-Square Test
In ENT, the chi-square result is expressed as two values: an absolute value and a percentage. The percentage indicates how frequently a genuinely random sequence would exceed the calculated value, and so shows how close the input data is to a random sequence. If the percentage f is < 1 or > 99 then the data can be considered non-random. Similarly, if 95 < f < 99 or 1 < f < 5 then the data is considered to be on the borderline.
The value of 7.61% in the above sample output is dangerously close to being classified as suspect in randomness.
Arithmetic Mean
To calculate the mean we simply add the values of all the bytes and divide the sum by the file length. Random input would produce a value close to 127.5. The bigger the deviation from this value, the less random the data is.
In the sample output above, the arithmetic mean of nearly 120 indicates that the data is reasonably random, as it is quite close to the ideal value of 127.5.
Monte Carlo Value for π
The Monte Carlo principle explained in section 2.1.2 is used to calculate the result. Each run of six consecutive bytes is treated as a pair of horizontal and vertical coordinates that defines a point on the square. Based on their position with respect to the radius of the circle, the points are classified as hits or misses. The percentage of hits is used to calculate the value of π; for large streams that are close to random, the result approaches the true value of π.
The value generated in the output is 3.6, which has an error of 14.6%. The error indicates how far the output value is from the value of π, 3.14159. This suggests the output is reasonably random, as the error percentage is not very high.
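A byte-driven version of the experiment might look like the sketch below. It follows our reading of the ENT documentation rather than ENT's source: the split of each six-byte group into two 24-bit coordinates, the big-endian byte order, the handling of leftover bytes and the function name are all assumptions.

```python
def monte_carlo_pi_from_bytes(data: bytes) -> float:
    """Estimate pi from a byte stream, six bytes per point."""
    radius_sq = (256 ** 3 - 1) ** 2      # squared radius for 24-bit coordinates
    hits = tries = 0
    # Leftover bytes that do not fill a six-byte group are ignored.
    for start in range(0, len(data) - 5, 6):
        x = int.from_bytes(data[start:start + 3], "big")
        y = int.from_bytes(data[start + 3:start + 6], "big")
        tries += 1
        if x * x + y * y <= radius_sq:   # the point lies inside the circle
            hits += 1
    if tries == 0:
        raise ValueError("need at least six bytes")
    return 4.0 * hits / tries
```

A ten-byte input yields only a single point, so the estimate can only be 0.0 or 4.0. This is consistent with the value 4.000000000 (error 27.32 percent) that ENT reports for the ten-byte inputs in this section, and it shows why the test is only meaningful for long streams.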

Entropy                          0.543564 bits per bit
Optimum compression              Would reduce the size of this 80 bit file by 45 percent.
Chi square distribution          For 80 samples is 45.00, and randomly would exceed this value less than 0.01 percent of the times.
Arithmetic mean                  0.1250 (0.5 = random).
Monte Carlo value for π          4.000000000 (error 27.32 percent).
Serial correlation coefficient   -0.142857 (totally uncorrelated = 0.0).

Table 2.4: ENT output for two sequences, 0123456789 and 2222222222.

Serial Correlation Coefficient


The serial correlation coefficient measures how much each byte depends on the previous one. As described before, this value can fall between -1 and 1, but for random sequences it is expected to be close to zero. The value would be zero if there were no linear association between successive bytes.
The serial correlation value for the sample output is about -0.037; being very close to zero, it indicates that the data can be deemed random. All the above results, calculated for each function, help in determining whether its output is random.
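A lag-one serial correlation can be sketched as follows. The sequence is treated as circular, so the last value is paired with the first. This mirrors our understanding of how ENT arrives at its figure, but it is an assumption and not code taken from ENT; the function name is ours.

```python
def serial_correlation(values):
    """Correlation between each value and its successor (circular)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(v * v for v in values)
    # Each value is paired with its successor; the last wraps to the first.
    sum_lag = sum(values[i] * values[(i + 1) % n] for i in range(n))
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all values are identical; coefficient undefined")
    return (n * sum_lag - sum_x ** 2) / denominator


# Bits of ten bytes of value 10 (00001010), as used with the -b option.
constant_bits = [0, 0, 0, 0, 1, 0, 1, 0] * 10
coefficient = serial_correlation(constant_bits)
```

For this bit pattern the sketch gives -1/3, which agrees with the -0.333333 that ENT reports for the constant-byte input in Table 2.8.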
ENT Analysis
Before using ENT to statistically analyze the SHA-3 candidates, there was a need to understand ENT itself. We have seen the theoretical aspects related to ENT above, but we also need to understand it in practice: how its output is influenced by its input and, more importantly, how accurate (or reliable) the results provided by ENT are. To this end, certain simple inputs were given to ENT and the output is analyzed here. Along with this experimentation, results and material available online were also used in coming to a conclusion.
To establish a baseline for the tests, two inputs were fed to ENT. The first input was a sequence of 2's (e.g. 22222222) and the second input was the range of numbers from 0-9 (e.g. 0123456789). The output produced for both these inputs was the same and is shown in Table 2.4.
It is clear that the input data is not random in any way, and the output from ENT reflects this. The chi-square percentage of less than 0.01% indicates that the data is totally non-random. Other values like the compression percentage and arithmetic mean add weight to this conclusion. With that established, the second experiment used input that was meant to be more random. The built-in random function of the Java programming language was used to generate ten values, which were written to a file. This file of random numbers was fed to ENT, which produced the output in Table 2.5.

Entropy                          0.992774 bits per bit.
Optimum compression              Would reduce the size of this 80 bit file by 0 percent.
Chi square distribution          For 80 samples is 0.80, and randomly would exceed this value 37.11 percent of the times.
Arithmetic mean                  0.5500 (0.5 = random).
Monte Carlo value for π          4.000000000 (error 27.32 percent).
Serial correlation coefficient   -0.111111 (totally uncorrelated = 0.0).

Table 2.5: ENT output for input generated using the Java random function.

The chi-square test gives a percentage of 37.11%, which places it well in the safe zone and is consistent with random data (see above). The file cannot be compressed further, indicating high entropy, which also suggests the data is random. The ideal mean value for random data is 0.5, and this input produced 0.55. The serial correlation coefficient, which measures the extent to which one bit depends on another, is also closer to the ideal value of 0. The only discouraging value is the Monte Carlo error percentage, which is quite high. We know that the random function is not ideally random and this is backed up by the results.
HotBits is a free online source that generates true random numbers. The numbers are produced from the inherent uncertainty in the quantum mechanical laws of nature: the time interval between two radioactive decays is measured to produce the numbers. The bits generated this way are more random than those produced by the random function in Java. Statistical tests were carried out on the HotBits data to determine its randomness, and ENT was one of the tools used. Complete information can be found in [?], but we will concentrate only on the ENT results. A data set of 11,468,800 bytes was used and the results are shown in Table 2.6.
Except for the chi-square percentage, the results are very close to ideal, meaning the data is highly random. Based on these tests and other comprehensive randomness tests, the authors conclude that while ENT can recognize patterns that indicate significant deviations from randomness, it may miss minute deviations that more thorough test packages can recognize. The test results for the input containing the numbers 0-9 also add weight to this conclusion (shown above).
Conclusion: Based on the results and information gathered, we can conclude that ENT does not produce perfect results and can give misleading results for certain inputs (data with subtle differences). With this in mind, it is imperative to consider all five test results produced by an ENT run with equal weight before judging the randomness of the given function or data. If all or most of the tests yield poor values then the input can safely be categorized as not random, or the function as a poor hash function.
Dealing with bits and the bit option
A small set of experiments was conducted to see how ENT deals with bits and with the bit option it provides (-b, under which each byte is treated as 8 bits). In the first set of experiments, the data is written to a file in the form of bits. The input is obtained as bytes and each byte is converted to bits using the BitSet class in Java. The resulting bits are converted to an int array consisting of 1's and 0's. This int array is finally written to a file and fed to ENT as the input. The following three inputs were given to ENT:
Constant values: Ten bytes of value 10. Converted to bits the value is
0000101000001010000010100000101000001010
0000101000001010000010100000101000001010.
Java Random function: Ten bytes produced using the random function in Java. One run of the function produced the following values, which were used in the test: 49, -98, -16, -103, -62, 36, 27, -99, 126, 37. In bits the value is
0011000110011110111100001001100111000010
0010010000011011100111010111111000100101.
HotBits (see above): Ten bytes from HotBits produced -83, -72, -80, 4, -128, 54, -46, -34, 126, 20. The bit equivalent is
1010110110111000101100000000010010000000
0011011011010010110111100111111000010100.
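The byte-to-bit conversion can be reproduced with a few lines of Python instead of Java's BitSet. The sketch assumes the listed values are signed Java bytes, with signs chosen so that each value matches the printed bit string in two's complement (for example, 10011110 is read as -98); the function name is ours.

```python
def to_bit_string(values):
    """Render signed byte values (-128..127) as 8-bit two's complement."""
    return "".join(format(v & 0xFF, "08b") for v in values)


constant_bits = to_bit_string([10] * 10)
java_random_bits = to_bit_string([49, -98, -16, -103, -62, 36, 27, -99, 126, 37])
```

Masking with `0xFF` maps each negative value to its unsigned equivalent before formatting, which is what makes -98 come out as 10011110.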

Entropy                          7.999975 bits per byte.
Optimum compression              Would reduce the size of this 11468800 byte file by 0 percent.
Chi square distribution          For 11468800 samples is 402.53, and randomly would exceed this value 0.01 percent of the times.
Arithmetic mean                  127.5423 (127.5 = random).
Monte Carlo value for π          3.141486168 (error 0.00 percent).
Serial correlation coefficient   -0.000053 (totally uncorrelated = 0.0).

Table 2.6: ENT output for input from HotBits.

As mentioned above, the bit representations of the three inputs were written to the file. Surprisingly, when these were run through ENT they all produced the same result, shown in Table 2.7.
Entropy                          0.954434 bits per bit.
Optimum compression              Would reduce the size of this 640 bit file by 4 percent.
Chi square distribution          For 640 samples is 40.00, and randomly would exceed this value less than 0.01 percent of the times.
Arithmetic mean                  0.3750 (0.5 = random).
Monte Carlo value for π          4.000000000 (error 27.32 percent).
Serial correlation coefficient   0.013333 (totally uncorrelated = 0.0).

Table 2.7: ENT output with input in bits format.

Clearly, the three inputs have varying degrees of randomness, ranging from constant
values to values generated from radioactive decay. ENT is unable to distinguish between
these inputs and produces the same statistical analysis for all of them. Because every input
value is just a 0 or a 1, the inputs differ only in the order in which the 0s and
1s appear, and ENT is unable to detect such small differences.
In the second set of experiments, the same three inputs were used, but they were
written to the file as bytes rather than as bits. These bytes were fed to ENT with
the -b option. The three outputs are shown in Tables 2.8, 2.9 and 2.10.

Input                            Constant bytes
Entropy                          0.811278 bits per bit.
Optimum compression              Would reduce the size of this 80 bit file by 18 percent.
Chi square distribution          For 80 samples is 20.00, and randomly would exceed this value less than 0.01 percent of the times.
Arithmetic mean                  0.2500 (0.5 = random).
Monte Carlo value for π          4.000000000 (error 27.32 percent).
Serial correlation coefficient   -0.333333 (totally uncorrelated = 0.0).
Table 2.8: ENT output for constant values with -b option.

These results are a more accurate reflection of the inputs. An entropy value reported as x
bits per bit indicates that the input is treated as a sequence of bits and not as bytes. The
Chi-Square results clearly reflect the randomness of the inputs. The first value, less than 0.01%,
indicates that the values are not random at all, while the other percentages of 50.23 and 37.11

Input                            Java Random function
Entropy                          0.995939 bits per bit.
Optimum compression              Would reduce the size of this 80 bit file by 0 percent.
Chi square distribution          For 80 samples is 0.45, and randomly would exceed this value 50.23 percent of the times.
Arithmetic mean                  0.4625 (0.5 = random).
Monte Carlo value for π          4.000000000 (error 27.32 percent).
Serial correlation coefficient   0.044626 (totally uncorrelated = 0.0).
Table 2.9: ENT output for Java random function with -b option.

Input                            HotBits
Entropy                          0.992774 bits per bit.
Optimum compression              Would reduce the size of this 80 bit file by 0 percent.
Chi square distribution          For 80 samples is 0.80, and randomly would exceed this value 37.11 percent of the times.
Arithmetic mean                  0.4500 (0.5 = random).
Monte Carlo value for π          4.000000000 (error 27.32 percent).
Serial correlation coefficient   0.040404 (totally uncorrelated = 0.0).
Table 2.10: ENT output for HotBits with -b option.

indicate a good level of randomness. Similarly, the file for the first input can be compressed
further (by 18 percent), indicating that the data is non-random. On
the other hand, compression cannot reduce the size of the other two inputs, which is a good
indicator of randomness.
Conclusion: In both experiments the aim was to treat the input as bits. The
outputs show that ENT treats the two input formats differently. Supplying the data as bytes and having ENT treat them as bits through the -b option produces
proper results from the test. Supplying the data as individual 0s and 1s does not. One possible
reason is that, since the default input type is bytes, ENT treats each 0/1 as a separate byte
(for example, it treats 0 as the byte 0 rather than as a single 0 bit that is part of a byte). So, even if the -b
option is used, each 0 and 1 is treated as a byte and is expanded into a stream of 8 bits.
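
To make the bit-level figures concrete, the sketch below recomputes three of the statistics reported in Table 2.8 (entropy, Chi-Square and arithmetic mean) for the constant input. It is an independent Python re-derivation from the textbook definitions, not ENT's own code, and the function name `bit_statistics` is hypothetical.

```python
import math


def bit_statistics(data: bytes):
    """Return (entropy in bits per bit, chi-square, arithmetic mean) over the bits of data."""
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    n = len(bits)
    ones = sum(bits)
    zeros = n - ones

    # Shannon entropy of the two-symbol distribution.
    entropy = 0.0
    for count in (zeros, ones):
        if count:
            p = count / n
            entropy -= p * math.log2(p)

    # Chi-square against the expectation of n/2 zeros and n/2 ones.
    expected = n / 2
    chi_square = ((zeros - expected) ** 2 + (ones - expected) ** 2) / expected

    mean = ones / n
    return entropy, chi_square, mean


entropy, chi_square, mean = bit_statistics(bytes([10] * 10))
print(f"{entropy:.6f} {chi_square:.2f} {mean:.4f}")
```

For ten bytes of value 10 there are 20 ones among 80 bits, which gives the entropy of 0.811278, the Chi-Square value of 20.00 and the mean of 0.2500 listed in Table 2.8.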

2.1.4 NIST Statistical Test Suite

ENT is a simple tool with five standard tests for randomness, all of them straightforward mathematical metrics. The tests can identify patterns that deviate greatly
from random behavior but cannot recognize subtle deviations that more comprehensive tests may detect. NIST has provided one such comprehensive tool,
a statistical test suite for random and pseudorandom number generators for cryptographic
applications [?]. NIST considers this set of tests useful for detecting deviations of a sequence from randomness. Even though this test suite is not specifically designed
to deal with hash functions, the tests are very relevant in assessing the randomness of
hash functions, as an ideal hash function should not be distinguishable from a random function.
The test suite has a total of 15 tests. Each test computes a test statistic and, from it, a value called the P-value.
The P-value summarizes the overall test result and helps in deciding whether the null hypothesis (that the
sequence is random) is accepted or rejected. The P-value is the probability
that a perfect random number generator would produce a sequence less random
than the sequence currently being tested. P-values of 1 and 0 indicate perfect randomness and
complete non-randomness, respectively. A significance level (α) is also used to
decide the result of a test. The significance level is the probability of a Type I error (rejecting a sequence that is in fact random), and
its value is typically chosen in the range [0.001, 0.01]. The test is considered passed if the P-value
is greater than or equal to the significance level. If it is not, the null hypothesis is rejected
and the sequence is considered non-random.
If α = 0.001, about 1 in 1000 sequences that are in fact random
would be rejected. A P-value greater than or equal to 0.001 lets us
categorize the sequence as random with 99.9% confidence. For a P-value less
than 0.001, we would categorize the sequence as non-random with the same
confidence.
If α = 0.01, the confidence is 99% and everything else
remains the same as above.
Description of tests
There are a total of fifteen tests in the test suite, and a brief description of each is
provided below. The Frequency and Longest Run tests are explained in detail, with worked examples and the procedure to compute P-values. The test descriptions and the
examples are taken from [?].
Frequency (Monobit) Test: In a truly random sequence, the numbers of ones and zeros
are expected to be approximately equal, meaning that about half of the bits in the sequence are ones and the other
half are zeros. This test determines the proportion for the input sequence and compares it
to that expected of a random sequence.
Test Description: The bits of the input sequence ($\varepsilon$) are summed after converting the zeros
to $-1$ and the ones to $+1$.
$$S_n = X_1 + X_2 + \cdots + X_n, \quad \text{where } X_i = 2\varepsilon_i - 1. \tag{2.10}$$
For example, if $\varepsilon = 1011010101$, then $n = 10$ and
$$S_n = 1 + (-1) + 1 + 1 + (-1) + 1 + (-1) + 1 + (-1) + 1 = 2. \tag{2.11}$$
Compute the test statistic
$$S_{obs} = \frac{|S_n|}{\sqrt{n}}. \tag{2.12}$$
In this case,
$$S_{obs} = \frac{|2|}{\sqrt{10}} = 0.632455532. \tag{2.13}$$
Finally, compute
$$P\text{-value} = \mathrm{erfc}\left(\frac{S_{obs}}{\sqrt{2}}\right), \tag{2.14}$$
where erfc is the complementary error function.
$$P\text{-value} = 0.527089. \tag{2.15}$$
With the P-value > 0.01, the sequence is considered random.
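
The frequency test computation above can be reproduced with a few lines of Python. This is a minimal sketch that follows equations (2.10) to (2.14) directly; the function name `monobit_p_value` is hypothetical and the code is not taken from the NIST test suite.

```python
import math


def monobit_p_value(bits: str) -> float:
    """Frequency (monobit) test P-value for a string of '0' and '1' characters."""
    n = len(bits)
    s_n = sum(1 if bit == "1" else -1 for bit in bits)  # equation (2.10)
    s_obs = abs(s_n) / math.sqrt(n)                      # equation (2.12)
    return math.erfc(s_obs / math.sqrt(2))               # equation (2.14)


p_value = monobit_p_value("1011010101")
print(round(p_value, 6))
```

For the ten-bit example sequence this yields the P-value of 0.527089 given in equation (2.15).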


Test for Frequency within a Block: The previous test is extended to N-bit blocks.
The test determines whether the number of ones in each block of N bits is approximately
N/2, as expected of a random sequence.
Runs Test: A run is an uninterrupted sequence of identical bits, either all 0s or
all 1s. A run of length l consists of exactly l identical bits that are
preceded and followed by a bit of the opposite value. This test determines the total
number of such runs in the input sequence and compares it to that expected of a random sequence. It
verifies that the alternation between runs of zeros and runs of ones is neither too fast nor too
slow.
Fast alternation produces too many runs. For example, in the sequence 101010
the value changes at every bit. With slow alternation, the runs are long
and there are fewer runs than anticipated from a random sequence.
For example, consider a sequence of fifty ones followed by eighty zeros and
twenty ones. With the bit count totaling 150, the expected run count is about 75, but
there are just 3 runs.
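
Counting runs is straightforward to illustrate. The Python sketch below counts the runs in the two example sequences just described; the helper name `count_runs` is hypothetical, and the code only counts runs without computing the P-value of the test.

```python
from itertools import groupby


def count_runs(bits: str) -> int:
    """Number of maximal blocks of identical consecutive bits."""
    return sum(1 for _ in groupby(bits))


fast = "101010"
slow = "1" * 50 + "0" * 80 + "1" * 20  # 150 bits in total

print(count_runs(fast), count_runs(slow))
```

The fast-alternating sequence has one run per bit (6 runs), while the 150-bit slow sequence has only 3.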
Longest Run of Ones in a Block: Similar to the previous test, but the test is applied
to M-bit blocks and the focus is on runs of ones only. The test determines whether the
length of the longest run of ones within the blocks is consistent with what is expected of
a random sequence. Testing only the longest run of ones suffices, because
an irregularity in the longest run of ones implies an irregularity
in the longest run of zeros as well.
Test Description: Initially, divide the sequence into M-bit blocks.
Let the input be $\varepsilon$ = 11001100000101010110110001001100111000000000001001
00110101010001000100111101011010000000110101111100110011100110
1101100010110010.
The sequence length is n = 128.
If M = 8, the input is processed into 16 sub-blocks as shown in Table 2.11.

Subblock    Max-Run    Subblock    Max-Run
11001100    (2)        00010101    (1)
01101100    (2)        01001100    (2)
11100000    (3)        00000010    (1)
01001101    (2)        01010001    (1)
00010011    (2)        11010110    (2)
10000000    (1)        11010111    (3)
11001100    (2)        11100110    (3)
11011000    (2)        10110010    (2)
Table 2.11: Number of continuous 1s in a subblock.

Next, calculate the frequencies $v_i$ of the longest runs of ones in the blocks.
$$v_0 = 4;\quad v_1 = 9;\quad v_2 = 3;\quad v_3 = 0. \tag{2.16}$$
Here $v_0$ is the number of blocks whose longest run is at most 1, $v_1$ is the number of blocks whose longest run is 2,
and so on; $v_3$ counts the blocks whose longest run is 4 or more (the classes in Table 2.13).
Calculate $\chi^2(obs)$. For M-bit blocks, it measures how closely the observed
longest-run lengths match the expected longest-run lengths.
$$\chi^2(obs) = \sum_{i=0}^{K} \frac{(v_i - N\pi_i)^2}{N\pi_i} \tag{2.17}$$
where $\pi_i$ are the theoretical probabilities, whose values are specified below.
The values of N and K are obtained from Table 2.12 based on the value of M.
If K = 3, the $\pi_i$ values are as shown in Table 2.13.

M         K    N
8         3    16
128       5    49
$10^4$    6    75
Table 2.12: K and N values based on M.

Classes      Probabilities
$v \le 1$    $\pi_0 = 0.2148$
$v = 2$      $\pi_1 = 0.3672$
$v = 3$      $\pi_2 = 0.2305$
$v \ge 4$    $\pi_3 = 0.1875$
Table 2.13: Different probability values.

Using the formula,
$$\chi^2(obs) = 4.882457. \tag{2.18}$$
Finally,
$$P\text{-value} = \mathrm{igamc}\left(\frac{K}{2}, \frac{\chi^2(obs)}{2}\right), \tag{2.19}$$
where igamc is the incomplete gamma function. Using this formula we get a P-value of 0.180609.

With P-value > 0.01, the sequence is considered as random.
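
The worked example for the longest run test can be checked with the Python sketch below. It is an illustration under stated assumptions, not the NIST implementation: the names `longest_run_test` and `PI` are hypothetical, the probabilities are the rounded values of Table 2.13, and igamc is evaluated with a closed form that holds only for a first argument of 3/2 (that is, for K = 3).

```python
import math

# Theoretical class probabilities for M = 8, K = 3 (Table 2.13).
PI = [0.2148, 0.3672, 0.2305, 0.1875]

# The 128-bit example sequence, written as its sixteen 8-bit blocks.
EPSILON = (
    "11001100" "00010101" "01101100" "01001100"
    "11100000" "00000010" "01001101" "01010001"
    "00010011" "11010110" "10000000" "11010111"
    "11001100" "11100110" "11011000" "10110010"
)


def longest_run_test(bits: str, m: int = 8):
    """Return (class counts v, chi-square, P-value) for M = 8 and K = 3."""
    blocks = [bits[i:i + m] for i in range(0, len(bits) - m + 1, m)]
    n_blocks = len(blocks)

    v = [0, 0, 0, 0]
    for block in blocks:
        longest = max(len(run) for run in block.split("0"))
        # Classes: longest run <= 1, = 2, = 3, >= 4.
        v[min(max(longest, 1), 4) - 1] += 1

    chi_sq = sum((v[i] - n_blocks * PI[i]) ** 2 / (n_blocks * PI[i]) for i in range(4))

    # Closed form of igamc(3/2, x); valid only because K / 2 = 3/2 here.
    x = chi_sq / 2
    p_value = math.erfc(math.sqrt(x)) + 2 * math.sqrt(x / math.pi) * math.exp(-x)
    return v, chi_sq, p_value


v, chi_sq, p_value = longest_run_test(EPSILON)
print(v, round(chi_sq, 4), round(p_value, 4))
```

With the rounded probabilities the class counts match equation (2.16), and the Chi-Square value and P-value agree with (2.18) and the reported 0.180609 to about three decimal places.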


Binary Matrix Rank Test: Checks for linear dependence among fixed-length substrings
of the sequence. The focus is the rank of disjoint sub-matrices formed from
the entire sequence.
Discrete Fourier Transform (Spectral) Test: The focus is on the peak heights in the
discrete Fourier transform of the sequence. The test detects non-randomness in the form of
periodic features, that is, repeating patterns that are near one another.
Overlapping (Periodic) Template Matching Test: The number of occurrences of a given pattern of ones
determines whether the input data is rejected as non-random. The sequence is
accepted when the number of occurrences is close to that anticipated from a random sequence.
The test is carried out by sliding an m-bit window over the sequence and looking for the
pattern in the window. The search continues by shifting the window by one bit,
irrespective of whether the pattern is found or not.
Non-overlapping (Aperiodic) Template Matching Test: Similar to the previous
test, except that when the pattern is found, the window is reset to start at
the bit following the pattern.
Maurer's Universal Statistical Test: The purpose of this test is to determine whether
the sequence can be compressed significantly. Any sequence that can be significantly compressed is considered non-random.
Linear Complexity Test: To evaluate the randomness of the sequence, the test measures the length of the linear feedback shift register that generates it. Longer and shorter feedback
registers imply random and non-random behavior, respectively.
Serial Test: To make a decision on the randomness of the sequence, the occurrences of every
possible overlapping m-bit pattern are counted and compared to what is anticipated of a
truly random sequence.

Approximate Entropy Test: Concentrates on the frequency of each and every overlapping
m-bit pattern. This value is compared to that predicted for a random sequence.
Cumulative Sum Test: A random walk is described as the cumulative sum of the sequence, where
a zero is equated to -1 and a one is equated to +1. For a truly random sequence, this
sum is expected to stay near zero, and the test determines how far the sum
for the input sequence deviates from that of a random sequence. For non-random sequences
the deviation will be large.
Random Excursions Test: The random walk is described in the Cumulative Sum Test. In a random walk, a series of steps that starts and finishes at the same
point is called a random excursion. Each value the walk takes is called a state, and the random excursions test verifies whether the number of times a state is visited in one random
walk matches the value predicted for a random sequence. If they are almost
the same, the sequence is accepted as random.
Random Excursions Variant Test: This test also deals with the states in a random walk. It
counts the occurrences of each state in a walk and determines whether the count deviates from what is
anticipated of a random sequence.
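
The random walk underlying the cumulative sum and random excursions tests can be illustrated with a short Python sketch. It only builds the walk and reports its largest distance from zero and the number of returns to zero; it does not compute any P-value, and the name `random_walk` is hypothetical.

```python
from itertools import accumulate


def random_walk(bits: str):
    """Partial sums of the sequence with 0 mapped to -1 and 1 mapped to +1."""
    steps = [1 if bit == "1" else -1 for bit in bits]
    return list(accumulate(steps))


walk = random_walk("1011010101")
max_excursion = max(abs(value) for value in walk)  # largest distance from zero
returns_to_zero = walk.count(0)                    # each return closes one excursion

print(walk, max_excursion, returns_to_zero)
```

For the example sequence used in the frequency test, the walk stays within a distance of 2 from zero and returns to zero once.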
Using the Tool:
A step-by-step guide on installing the tool is provided in the documentation [?]. For convenience, we will assume the tool is installed and ready to run. The next thing we need to
know is the set of input parameters to be provided.
Data - The data to be tested is presented in a file. The data can either be in bits or as
a binary string in hex format.
Sequence Length - The length of the sequence to be assessed from the data file must
also be specified. It is generally recommended that the sequence length be greater
than $10^6$ bits.
Bit Stream - Determines the sample size. For example, if 10 is selected, then ten
sequences will be parsed from the input file. For the experiments carried out for the
project, the number of bit streams is 1.
The tool also has the option to select a subset of the available statistical tests to run. Once
all these options are selected, the tests are run and the results can be analyzed.
Let's consider an example with the following input parameters:
Data - Hash of numbers 0-3999 computed using the BLAKE-256 hash function. The
file consists of 1,024,000 bits.

Sequence Length - The entire length, 1024000, is chosen.


Number of Bit Streams - 1.
With the input parameters chosen, the process of running the tool is described below
To invoke the test suite, type assess 1024000 [assess <seq. length>].
The first screen appears as follows

Since input is being selected from a file 0 is entered.


Then type the path of the file, for example NumBlake256.txt if the file is in
the current directory.
With the input selected the following screen appears

In this example, all fifteen tests are run, so choice 1 is entered. If only a
subset of the tests needs to be run, enter 0 and select the masks as needed.
With the tests selected, the following screen appears

This displays the current block sizes for the tests to be run. If no changes need to be
made, as is the case with the example, press 0 to continue.
The next screen requests the number of bit streams

Only 1 bit stream is chosen.


Next screen requests the format of data in the file. The data can either be in bits or as
a binary string in hex format.

Since the file consists of bits, option 0 is chosen.


With all the parameters selected, the test suite proceeds to analyze the sequences.

When the tests finish, the results are generated under the experiments/ subdirectory.
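
An input file of the kind used in this example can be prepared along the following lines. This Python sketch rests on several assumptions that are not stated in the text: SHA-256 from the standard library stands in for BLAKE-256 (which is not available in hashlib), each number is hashed as its decimal ASCII string, and the function name `build_bit_string` is hypothetical. It illustrates only the file layout, 4000 digests of 256 bits written as ASCII 0s and 1s.

```python
import hashlib


def build_bit_string(count: int = 4000) -> str:
    """Concatenate the 256-bit digests of the numbers 0..count-1 as ASCII '0'/'1' characters."""
    chunks = []
    for number in range(count):
        # Assumption: SHA-256 stands in for BLAKE-256, and the number is hashed
        # as its decimal ASCII representation.
        digest = hashlib.sha256(str(number).encode("ascii")).digest()
        chunks.append("".join(f"{byte:08b}" for byte in digest))
    return "".join(chunks)


bit_string = build_bit_string()
print(len(bit_string))
```

The resulting string has 4000 x 256 = 1,024,000 characters, matching the sequence length passed to assess, and could be written to a text file for the test suite to read.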

Interpretation of Results:
When one run of the test suite completes, it generates output files with the test
statistics and P-values for each of the fifteen tests. The results for the above sample test
are found under the /experiments/AlgorithmTesting/ subdirectory. Under this subdirectory,
two text files are generated: finalAnalysisReport.txt and freq.txt. The finalAnalysisReport file contains a summary of the empirical results in tabulated
form. The frequency text file contains the total number of bits and the distribution of 1s
and 0s in those bits.
Each of the fifteen tests has a separate folder, and each folder contains two text
files: results.txt and stat.txt. The stat file contains the statistics of that particular test and
whether the test was successful or not. The results file contains the P-values for that test. It
is these P-values that contribute towards a decision on the randomness of the sequence.

Test                     P-value(s)
Approx. Entropy          0.531403
Block Freq.              0.550332
Cum Sums                 0.324573, 0.201009
FFT                      0.204233
Freq.                    0.187412
Linear Complexity        0.867403
Longest Run              0.095483
Overlapping Template     0.099496
Rank                     0.077948
Runs                     0.753526
Serial                   0.876547, 0.838931
Universal                0.861028
Non-overlapping Temp.    0.272553, 0.156433
Random Excursions        0.560459, 0.148643
Rand Excur. Variant      0.612882, 0.582494
Total Bits               1024000
No. of 0s                511333
No. of 1s                512667

Table 2.14: P-values from the example test run.

The non-overlapping template, random excursions and random excursions variant tests generate more than two P-values. The random excursions variant test, for example, produces
more than fifteen P-values. To save space, only two values are shown
in the table.

The significance level (α) is set to 0.01 by default. As displayed in the table, the P-values for all the tests are greater than 0.01, so every test is passed and
the null hypothesis that the sequence is random is not rejected.
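
The pass/fail decision can be expressed as a simple comparison of each P-value against the significance level. The Python sketch below applies that rule to the values shown in Table 2.14; the dictionary layout and variable names are hypothetical, and only the two P-values listed per multi-valued test are included.

```python
ALPHA = 0.01  # default significance level

# P-values as listed in Table 2.14.
p_values = {
    "Approx. Entropy": [0.531403],
    "Block Freq.": [0.550332],
    "Cum Sums": [0.324573, 0.201009],
    "FFT": [0.204233],
    "Freq.": [0.187412],
    "Linear Complexity": [0.867403],
    "Longest Run": [0.095483],
    "Overlapping Template": [0.099496],
    "Rank": [0.077948],
    "Runs": [0.753526],
    "Serial": [0.876547, 0.838931],
    "Universal": [0.861028],
    "Non-overlapping Temp.": [0.272553, 0.156433],
    "Random Excursions": [0.560459, 0.148643],
    "Rand Excur. Variant": [0.612882, 0.582494],
}

# A test fails if any of its P-values falls below the significance level.
failed = [name for name, values in p_values.items() if any(p < ALPHA for p in values)]
smallest = min(p for values in p_values.values() for p in values)

print(failed, smallest)
```

No test fails; the smallest listed P-value is 0.077948 (Rank), which is still well above 0.01.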

2.2 Performance Analysis

One of the most important characteristics of a good hash function is performance. It is
one of the factors considered by NIST during the selection process for the hash functions
submitted. The SHA-3 candidates are expected to offer improved performance
compared to SHA-2 at a given security strength.
Each submission was required to include a performance estimate of the candidate on
the platform specified by NIST. The platform was an Intel Core 2 Duo machine, with performance measurements required in 32-bit and 64-bit modes. In addition, estimates
for 8-bit processors were required. Apart from that, performance figures for the
hash functions have also been documented in [?] and [?]. The authors of [?] used the
performance estimates provided in the submission packages and carried out a relative study
with respect to SHA-2. SHA-2 performance forms the benchmark for the comparison,
and each of the SHA-3 candidates is given a classification from AA to E based on how
it fares against SHA-2's performance. Each version of SHA-2 falls under
classification C for 32-bit performance and B for 64-bit performance. Accordingly,
candidates with a classification of A or AA are faster than SHA-2, and those with a
classification of D or E are slower than SHA-2. eBASH [?], on the other hand, is an
initiative to measure the performance of hash functions. The benchmarking is done on
a wide variety of machines, mainly Intel and AMD machines.
A tool called SUPERCOP is used for benchmarking all the hash functions, and many
consider these results the unofficial standard.
For the project, the performance of these candidates has been measured in the following setting, which has not been studied comprehensively before this work:
Language - Java
Platform - Sun
Message Size - Small messages
Though the performance of these candidates has been measured extensively, the
above combination has rarely been used. The authors of [?] did not carry out any software performance measurements of their own but used the results submitted along
with the initial hash function submissions. These measurements were done on Intel
Core 2 Duo machines, as requested for the submission. Most of the eBASH benchmarking
is done on Intel/AMD machines with C as the programming language.
With the results obtained, the candidates are analyzed based on their performance, and
a comparative study similar to that in [?] is made. It is interesting to see how
the hash functions perform when coded in a modern programming language like
Java.

2.3 Theoretical Security Analysis

Hash functions were developed to keep information secure, so it should come as no surprise
that security is at the top of the list when evaluating hash functions. Hash functions
were originally developed for digital signatures, but these days they have
a varied range of applications such as password protection, pseudorandom number generation
and authentication codes. With such varied uses come varied requirements, and a modern
hash function should cater to all these needs.
The random oracle model captures the desired properties of a hash function. As of now,
no real-world algorithm behaves like a random oracle in all situations, and it is extremely
difficult to design a hash function that does. So, the most that can be expected of the SHA-3
candidates is that they resemble a random oracle as closely as possible. NIST set this as the
security definition to follow. However, members of the cryptographic community criticized this
security definition, as they considered it too theoretical to act as a benchmark. Therefore,
NIST later announced a set of better-defined security properties expected from the SHA-3
candidates. These were not hard and fast rules for the candidates, but they were provided
to act as guidelines for the submitters.
Each of the fourteen hash function algorithms has a unique architecture. This architecture is composed of components that provide the non-linearity, and hence the
security, required of a hash function. These components define the performance and, most importantly, the security of the hash function, and ultimately determine how
weak or strong a particular hash function is.
Among the fourteen candidates in Round 2, there are a number of hash functions that
have been widely studied, such as CubeHash and Skein. On the other hand, there are also
a few candidates, such as ECHO and Fugue, that have not been studied extensively or, in some
cases, have been studied very little. NIST has made an open request that the cryptographic community
study these candidates more in the future.

2.3.1 Fugue

Overview
Most hash functions today are based on the Merkle-Damgård paradigm. There are
many issues with the MD paradigm, mainly because MD is wasteful by design.
After each round of the compression function the entire internal state is thrown away, and
only the output is kept as the chaining variable for the next round. Because of these
problems, the designers of Fugue decided on another approach to designing
hash functions, one used in the Grindahl hash function. In this approach a large evolving
internal state is maintained at all times. In each compression round, a new block of
the message is added to the large state, which is then processed while keeping the state size unchanged.
When the processing is completed, the state is passed through a finalization function. The
output from this step is truncated if necessary and forms the hash output.
Grindahl maintains a large evolving state and also uses AES-like primitives. Fugue
adopts the same principle but with some changes, which help keep the hash function
secure against newer attacks such as message modification and auxiliary differentials. The
main tools used in Fugue are the following:
Fugue-256 has an internal state that is a 4x30 matrix of bytes.
The column-mix step of AES, which operates on a single column, is modified to
operate on all four columns at once. This function is called the Super-Mix.
The round function is not applied uniformly to the entire state but only to selected areas
of the state. Of the parts that do not undergo the transformation, some have XOR operations
applied to them while the others are left unchanged.
Design
The important components in Fugue have been mentioned above. In this section we will
study them in detail. Initially, every 16 bytes of input data are mapped to a 4x4 matrix.
This step is similar to that of the round functions in AES.
Substitution Box
The substitution box (S-box) of Fugue is the same as the one used in AES. The
transformation is non-linear and takes place in
two steps. First, the 8 bits of each byte are treated as an element of GF($2^8$). If the
element is non-zero, it is replaced by its multiplicative inverse; otherwise it is left as it is. The
next step treats the resulting GF($2^8$) element as a bit vector $(x_0, \ldots, x_7)$, with $x_0$ the least significant bit, and performs
an affine transformation on it over GF(2):
$$
\begin{pmatrix}
1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \\
1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 1
\end{pmatrix}
\begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{pmatrix}
+
\begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \end{pmatrix}
$$
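
The two S-box steps can be checked with the Python sketch below, which computes the standard AES S-box from the field inverse and the affine map. It is an illustration, not code from the Fugue submission: the function names are hypothetical, the reduction polynomial $x^8 + x^4 + x^3 + x + 1$ and the constant 0x63 are those of AES, and bit 0 is taken as the least significant bit.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inverse(a: int) -> int:
    """Multiplicative inverse in GF(2^8); zero is left as zero."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):  # a^254 equals a^-1 because a^255 = 1
        result = gf_mul(result, a)
    return result


def sbox(byte: int) -> int:
    """AES S-box: field inversion followed by the affine transformation over GF(2)."""
    x = gf_inverse(byte)
    y = 0
    for i in range(8):
        bit = (
            (x >> i)
            ^ (x >> ((i + 4) % 8))
            ^ (x >> ((i + 5) % 8))
            ^ (x >> ((i + 6) % 8))
            ^ (x >> ((i + 7) % 8))
            ^ (0x63 >> i)
        ) & 1
        y |= bit << i
    return y


print(hex(sbox(0x00)), hex(sbox(0x01)), hex(sbox(0x53)))
```

The three printed values, 0x63, 0x7c and 0xed, agree with the published AES S-box table.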

Super-Mix (SMIX)
SMIX is the linear transformation in the algorithm. The elements of the 4x4 input
matrix are arranged as a single column vector, and this vector is multiplied by a
16x16 matrix N specified in appendix B. The matrix N is built from a 4x4
matrix, say M. The matrix M, as in AES, is a circulant matrix, but its values
differ from those used in AES. More specifically, the matrix M used in Fugue
is shown below
$$
M = \begin{pmatrix}
1 & 4 & 7 & 1 \\
1 & 1 & 4 & 7 \\
7 & 1 & 1 & 4 \\
4 & 7 & 1 & 1
\end{pmatrix}
$$

To differentiate between the transformations in AES and Fugue, first let's consider the AES Column Mix transformation. In AES, the 4x4 input matrix U and the
circulant matrix M are multiplied, $V = M \cdot U$. In Fugue there is an
additional step: for every $i \neq j$, the column $U^i_j \cdot M_i$ is transposed and the resulting row is added to
$V^j$. In AES and in Fugue, every row is shifted to the left by an amount that depends on its position in the
resulting matrix.
Note: $M^i$ represents the i-th row of matrix M and $M_j$ represents the j-th column of
matrix M.
The Hash Function F
Central to Fugue is a hash function that the authors call 'F', followed by the
version of the hash function. For example, underlying Fugue-256 is the hash function F-256, and from this point onwards we will be talking about F-256 in particular. However, the
same principles and steps are followed for F-512 as well, with some minor changes.
F-256 takes as input strings whose length is a multiple of 4 bytes and produces a
32-byte output. The compression function has an internal state that is a 30-column matrix, with
each column being four bytes in size. To get the compression function started, an initialization
vector (IV) of size 32 bytes is used. Broadly speaking, once the input is fed in, it undergoes
two transformations: a round transformation R and a final round G.
The Round Transformation R
A word is defined as four bytes of data, say $I = I^0 I^1 I^2 I^3$. Each round of the transformation function R takes a word as input, combines it with the current state and processes it.
The output is another state of the same size. The transformation consists of the following
sequence of steps:
TIX(I); ROR3; CMIX; SMIX; ROR3; CMIX; SMIX;

TIX - Stands for XOR, truncate, insert and XOR. It has the following steps:
S10 += S0;
S0 = I (i.e., $S_0^i = I^i$ for i = 0, 1, 2, 3);
S8 += S0;
S1 += S24.
ROR3 - ROR stands for rotation to the right and 3 stands for rotation by 3 columns,
meaning $S_i = S_{i-3}$. As the state has 30 columns, all index calculations are done modulo 30.
CMIX - Stands for column mix and is performed as follows:
S0 += S4; S1 += S5; S2 += S6;
S15 += S4; S16 += S5; S17 += S6;
SMIX - SMIX works on only the first 4 columns of the 30-column matrix, which are
viewed as a square matrix of size 4x4. Before the matrix undergoes the Super-Mix
linear transformation (described above), each byte undergoes an S-box substitution.
If the matrix is denoted by W and S-box[W] represents the matrix after each value
in W is substituted with its S-box value, then SMIX stands for
$S_{0 \ldots 3} = \text{Super-Mix}(\text{S-box}[S_{0 \ldots 3}])$.
The sequence of steps ROR3; CMIX; SMIX is called a sub-round.
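
The data movement in one round of R can be sketched in Python as follows. This is an illustration under stated assumptions, not the reference implementation: the state is modelled as a list of 30 integers (one 4-byte column each), "+=" is read as XOR, and SMIX is passed in by the caller as a hypothetical callable `smix`, since the S-box and Super-Mix steps are not reproduced here.

```python
def tix(state, word):
    """TIX: XOR, truncate, insert, XOR (the '+=' of the text is read as XOR)."""
    state[10] ^= state[0]
    state[0] = word
    state[8] ^= state[0]
    state[1] ^= state[24]


def ror3(state):
    """Rotate the 30 columns right by 3: new S_i = old S_(i-3), indices modulo 30."""
    state[:] = state[-3:] + state[:-3]


def cmix(state):
    """Column mix on the columns named in the text."""
    state[0] ^= state[4]
    state[1] ^= state[5]
    state[2] ^= state[6]
    state[15] ^= state[4]
    state[16] ^= state[5]
    state[17] ^= state[6]


def round_r(state, word, smix):
    """One round R: TIX(I); ROR3; CMIX; SMIX; ROR3; CMIX; SMIX."""
    tix(state, word)
    for _ in range(2):  # two sub-rounds
        ror3(state)
        cmix(state)
        smix(state)


# Usage with a placeholder SMIX that does nothing, to show only the data movement.
demo_state = [0] * 30
round_r(demo_state, 0x01020304, smix=lambda s: None)
print([i for i, column in enumerate(demo_state) if column])
```

With the placeholder SMIX, the printed list shows which columns the input word has reached after one round; a real implementation would substitute and mix the first four columns at each SMIX step.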
The Final Round G
The final round G takes as input the output of the round transformation, which is a 30-column
state. The output is a different state of the same size. It can be
described as follows:
repeat 5 times
{
    ROR3; CMIX; SMIX;
    ROR3; CMIX; SMIX;
}
repeat 13 times
{
    S4 += S0; S15 += S0; ROR15; SMIX;
    S4 += S0; S16 += S0; ROR14; SMIX;
}
S4 += S0; S15 += S0;
Initial State
The round transformation starts from an initial state in which the last eight columns are set to the initial value
IV and all the other columns are set to zero. The initialization vector is treated
as eight 4-byte words (a word is a four-byte entity) $IV_0, \ldots, IV_7$, and for $j = 0, \ldots, 7$ we set
$S_{22+j} = IV_j$.
Hash Output
Final round G produces a 30-column state as output, say words S0 ...S29 . The hash value is
composed of the following eight words
S1 S2 S3 S4 S15 S16 S17 S18
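For example, if the final state begins with the five columns below
$$
\begin{pmatrix}
\texttt{00} & \texttt{04} & \texttt{08} & \texttt{0c} & \texttt{10} & \cdots \\
\texttt{01} & \texttt{05} & \texttt{09} & \texttt{0d} & \texttt{11} & \cdots \\
\texttt{02} & \texttt{06} & \texttt{0a} & \texttt{0e} & \texttt{12} & \cdots \\
\texttt{03} & \texttt{07} & \texttt{0b} & \texttt{0f} & \texttt{13} & \cdots
\end{pmatrix}
$$
then the output ignores the first column and the first 16 bytes would be 04 05 06 07 08
09 ... 12 13.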
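
The placement of the IV and the selection of the output words can be sketched in Python. As before, the state is modelled as a list of 30 column words; the function names `initial_state` and `hash_output` are hypothetical, and the byte order within a word is left out of the sketch.

```python
OUTPUT_COLUMNS = (1, 2, 3, 4, 15, 16, 17, 18)


def initial_state(iv_words):
    """Place the eight IV words in columns 22..29 and zero the other columns."""
    if len(iv_words) != 8:
        raise ValueError("the IV consists of eight 4-byte words")
    state = [0] * 30
    state[22:30] = iv_words
    return state


def hash_output(state):
    """Select the eight state columns that form the 32-byte hash value."""
    return [state[i] for i in OUTPUT_COLUMNS]


start = initial_state([1, 2, 3, 4, 5, 6, 7, 8])  # placeholder IV words
print(start[22:], hash_output(list(range(30))))
```

With a state whose columns are simply numbered 0 to 29, the selected columns are 1 to 4 and 15 to 18, so column 0 is skipped as in the example above.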
The Hash Function Fugue-256
The transformations in Fugue-256 can only deal with inputs that are a multiple of 4-byte
words. However, the hash function accepts data of arbitrary size, at most $2^{64} - 1$ bits,
and pads it if necessary. Padding is done by first appending the required number of zero bits
and then appending the length of the original message as an 8-byte value. Once the input is
padded, it is fed along with the initialization vector (IV) to the transformation function.
The value of the initialization vector, shown in appendix B, is derived by executing the
Fugue-256 transformation on one word representing the number 256 (00 00 01 00), with the
rest of the words initialized to zero.
Let's take an example to better understand how the padding works. Consider a
35-bit input
X = 10101001 10111000 11000111 11010110 010
To make it a multiple of 4 bytes (32 bits), twenty-nine 0-bits are appended
to X, resulting in a 64-bit value X'. Finally, the original message length 35 (23 in hex) is
appended in the form of an 8-byte integer, forming the final padded value X''.
X'' = a9 b8 c7 d6 40 00 00 00 00 00 00 00 00 00 00 23.
Finally, with the input in the necessary format, the hash function is applied to it.
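
The padding rule can be reproduced with the Python sketch below. It takes the message as a string of 0s and 1s; the function name `fugue_pad` is hypothetical, and the big-endian encoding of the 8-byte length is an assumption chosen to match the worked example above.

```python
def fugue_pad(bits: str) -> bytes:
    """Zero-pad a bit string to a multiple of 32 bits, then append its length as 8 bytes."""
    length = len(bits)
    padded = bits + "0" * (-length % 32)
    body = int(padded, 2).to_bytes(len(padded) // 8, "big") if padded else b""
    return body + length.to_bytes(8, "big")  # assumption: big-endian length field


x = "10101001" "10111000" "11000111" "11010110" "010"  # the 35-bit example input
print(fugue_pad(x).hex())
```

For the 35-bit input this prints the sixteen bytes a9 b8 c7 d6 40 00 00 00 00 00 00 00 00 00 00 23 as a single hex string.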

2.3.2 ECHO

Overview
ECHO is another of the hash functions to which NIST asked cryptanalysts to pay more
attention. The hash function takes a message and a salt as input and
produces an output of any length from 128 to 512 bits. ECHO follows
the Merkle-Damgård paradigm, consisting of the serial application of a compression function. The deficiencies of the MD model are avoided by maintaining a large internal state
during and between iterations and by adopting features from the HAIFA model.
Compression Function
The compression function has two versions, differentiated by the size of the chaining
variable, $C_{size}$. They are called COMPRESS512 and COMPRESS1024. One of them is
chosen based on the size of the hash output required. The compression function,
irrespective of the version, takes the following inputs:
$V_{i-1}$, the chaining variable of length $C_{size}$.
$M_i$, the message block of length $M_{size} = 2048 - C_{size}$ bits.
$C_i$, the total number of unpadded message bits processed by the end of the current iteration.
SALT.
COMPRESS512 is used for any hash output of length less than or equal to 256 bits and
COMPRESS1024 for hash outputs of length between 257 and 512 bits. Outputs that are not
256 or 512 bits long are obtained by truncating the original hash output. The compression
function needs a starting value for the chaining variable, represented by $V_0$. At iteration
i, for $128 \le H_{size} \le 256$, we compute
$$V_i = \mathrm{COMPRESS}_{512}(V_{i-1}, M_i, C_i, \mathrm{SALT}) \tag{2.20}$$
while for $257 \le H_{size} \le 512$ we compute
$$V_i = \mathrm{COMPRESS}_{1024}(V_{i-1}, M_i, C_i, \mathrm{SALT}) \tag{2.21}$$

Initialization
When the hashing starts, the counter C is set to zero. It keeps track of the total
number of bits processed.
As mentioned before, the chaining variable needs to be initialized. The value set depends on the size of the hash output required: it is a 128-bit representation of the digest size. The following are the values for the four hash sizes required by
NIST:
Hsize = 224 : E0000000 00000000 00000000 00000000
Hsize = 256 : 00010000 00000000 00000000 00000000
Hsize = 384 : 80010000 00000000 00000000 00000000
Hsize = 512 : 00020000 00000000 00000000 00000000
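The four values listed above are consistent with writing the digest size as a 128-bit integer in little-endian byte order. The sketch below assumes that reading; the function name `echo_initial_word` is hypothetical.

```python
def echo_initial_word(hsize: int) -> str:
    """Return the digest size as a 128-bit little-endian value,
    printed as four groups of eight hex digits (assumed encoding)."""
    raw = hsize.to_bytes(16, "little").hex().upper()
    return " ".join(raw[i:i + 8] for i in range(0, 32, 8))


for size in (224, 256, 384, 512):
    print(size, ":", echo_initial_word(size))
```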
Message Padding
An input message M is always padded. To bring the message to the desired length,
the following values are appended in the order specified below:
First, the bit 1 is appended to the end of message M.
Then, x 0-bits (x can be zero) are appended, with $x = M_{size} - ((L + 144) \bmod M_{size}) - 1$,
where L is the length of the original message in bits.
A 16-bit representation of the hash output size is then appended. The four values are
mentioned above.
Finally, the length of the original message as a 128-bit value is appended.
COMPRESS512 and COMPRESS1024 are nearly identical except for a few small changes,
so we will concentrate on one compression function.
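As a quick check of the padding rule, the sketch below computes the number of zero bits x and confirms that the padded length is always a multiple of the block size. The function names are hypothetical; only the formula for x comes from the text.

```python
def echo_zero_bits(msg_len: int, msize: int) -> int:
    """Number of 0-bits appended after the single 1-bit."""
    return msize - ((msg_len + 144) % msize) - 1


def echo_padded_length(msg_len: int, msize: int) -> int:
    """Message bits + one 1-bit + x zero bits + 16-bit size + 128-bit length."""
    return msg_len + 1 + echo_zero_bits(msg_len, msize) + 16 + 128


# Msize = 2048 - Csize, i.e. 1536 bits for COMPRESS512, 1024 for COMPRESS1024
for msize in (1536, 1024):
    for length in (0, 1, 1391, 1392, 5000):
        assert echo_padded_length(length, msize) % msize == 0
print(echo_zero_bits(0, 1536))
```

The constant 144 accounts for the 16-bit size field and the 128-bit length field, while the trailing "- 1" accounts for the single 1-bit.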


COMPRESS512
After input padding, the input is broken down into blocks of equal size, 1536 bits each.
Then each block is passed through the compression function along with the chaining variable. The function is composed of 8 rounds of BIG.ROUND. In turn, BIG.ROUND
consists of three smaller functions:
BIG.SUBWORDS(S, SALT, K)
As the name suggests, this is a substitution step using S-boxes. Two rounds
of AES are used without any modifications. A single round can be described as
shown below, with $w$ being a word of size 128 bits and $k$ the subkey.
$w' = \mathrm{AES}(w, k)$
It consists of one application of SubBytes, ShiftRows, MixColumns and AddRoundKey, in the order specified.
The subkeys for the rounds are obtained using the SALT and an internal
counter $k$, which is initialized with the value of $C_i$. During the compression function,
only the value of the internal counter is updated; $C_i$ only keeps track of the total bits
processed at the completion of every round.
The length of the internal counter is the same as that of the counter $C_i$, which is 64 bits.
The subkeys are generated as follows:
$k_1 = k \,\|\, 0 \ldots 0$ and $k_2 = \mathrm{SALT}$.
Putting it all together, BIG.SUBWORDS(S, SALT, K) applies two AES rounds to each word of the state in turn:
$w'_0 = \mathrm{AES}(\mathrm{AES}(w_0, k_1), k_2)$, followed by adding 1 to $k$.
$w'_1 = \mathrm{AES}(\mathrm{AES}(w_1, k_1), k_2)$, followed by adding 1 to $k$.
...
Each increment of the counter $k$ is performed modulo $2^{64}$. Since the value of
the counter changes after every update, the value of the subkey $k_1$ generated from the
counter also changes.
BIG.SHIFTROWS(S)
The state is represented as a 4x4 array and is processed in a very similar fashion to the
one in AES. The process involves entire rows being shifted and it can be described
as follows
$$w'_{i+4j} = w_{i+4((i+j) \bmod 4)} \quad \text{for } 0 \le i, j \le 3. \tag{2.22}$$
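Equation (2.22) can be read as an index permutation on the 16 words of the state. The following sketch applies it to a list of placeholder values; the function name `big_shift_rows` is hypothetical, and the words are represented by plain integers rather than 128-bit values.

```python
def big_shift_rows(words):
    """Apply w'[i + 4j] = w[i + 4((i + j) mod 4)] for 0 <= i, j <= 3.
    Index i is the row and j the column of the 4x4 state."""
    shifted = [None] * 16
    for i in range(4):
        for j in range(4):
            shifted[i + 4 * j] = words[i + 4 * ((i + j) % 4)]
    return shifted


state = list(range(16))
print(big_shift_rows(state))
```

Row 0 is left unchanged, and row i is rotated left by i columns, which is the same pattern as the AES ShiftRows step.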

BIG.MIXCOLUMNS(S)
Again, the operations here and in AES are very similar. In AES, this process involves
MDS-based mixing on the 4-column state. Since ECHO has a much larger internal
state, it is viewed as 64 columns. Then, the AES mix operations are applied on every
column of the state.
A column in a state is composed of four 128-bit values, say $w_i, \ldots, w_{i+3}$ for $i \in \{0, 4, 8, 12\}$.
Each value of B below represents 8 bits.
$w_i = (B_{16i}, B_{16i+1}, \ldots, B_{16i+15})$
$w_{i+1} = (B_{16i+16}, B_{16i+17}, \ldots, B_{16i+31})$
$w_{i+2} = (B_{16i+32}, B_{16i+33}, \ldots, B_{16i+47})$
$w_{i+3} = (B_{16i+48}, B_{16i+49}, \ldots, B_{16i+63})$
From this we can calculate the following for $0 \le j \le 15$
$$\begin{pmatrix} B'_{16i+j} \\ B'_{16i+16+j} \\ B'_{16i+32+j} \\ B'_{16i+48+j} \end{pmatrix} = \begin{pmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{pmatrix} \begin{pmatrix} B_{16i+j} \\ B_{16i+16+j} \\ B_{16i+32+j} \\ B_{16i+48+j} \end{pmatrix}$$
Above is the application of the AES MixColumns operation with the field arithmetic
defined by the polynomial $x^8 + x^4 + x^3 + x + 1$.
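The matrix multiplication above acts on one 4-byte column at a time. A minimal sketch of that single-column operation is given below; the helper names `xtime` and `mix_column` are hypothetical, and the sketch covers only one column, not the full 64-column ECHO state.

```python
def xtime(b: int) -> int:
    """Multiply a byte by 02 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


def mix_column(col):
    """Multiply one 4-byte column by the circulant matrix (02 03 01 01)."""
    a0, a1, a2, a3 = col
    mul2 = xtime
    mul3 = lambda x: xtime(x) ^ x   # 03 * x = 02 * x XOR x
    return [
        mul2(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ mul2(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ mul2(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ mul2(a3),
    ]


print([hex(v) for v in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

The input column db 13 53 45 is a commonly quoted AES MixColumns test vector.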
Finalizing Compression
The complete COMPRESS512 operation can be described as the following:
repeat
BIG.SUBWORDS(S,SALT,K)
BIG.SHIFTROWS(S)
BIG.MIXCOLUMNS(S)
eight times
BIG.FINAL
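The eight rounds are followed by a single application of the operation BIG.FINAL. It incorporates the feedforward of the input values to derive the chaining variable output. If the last state can be
viewed as words $w_0, \ldots, w_{15}$ then the operation can be explained as follows
$$\begin{aligned}
v_i^0 &= v_{i-1}^0 \oplus m_i^0 \oplus m_i^4 \oplus m_i^8 \oplus w_0 \oplus w_4 \oplus w_8 \oplus w_{12} \\
v_i^1 &= v_{i-1}^1 \oplus m_i^1 \oplus m_i^5 \oplus m_i^9 \oplus w_1 \oplus w_5 \oplus w_9 \oplus w_{13} \\
v_i^2 &= v_{i-1}^2 \oplus m_i^2 \oplus m_i^6 \oplus m_i^{10} \oplus w_2 \oplus w_6 \oplus w_{10} \oplus w_{14} \\
v_i^3 &= v_{i-1}^3 \oplus m_i^3 \oplus m_i^7 \oplus m_i^{11} \oplus w_3 \oplus w_7 \oplus w_{11} \oplus w_{15}
\end{aligned}$$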
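The feedforward can be sketched directly from the four equations above. In this sketch the 128-bit words are modelled as Python integers and the function name `big_final_512` is hypothetical.

```python
def big_final_512(v_prev, m, w):
    """XOR feedforward for COMPRESS512.
    v_prev: 4 chaining words, m: 12 message words, w: 16 state words."""
    return [
        v_prev[k] ^ m[k] ^ m[k + 4] ^ m[k + 8]
        ^ w[k] ^ w[k + 4] ^ w[k + 8] ^ w[k + 12]
        for k in range(4)
    ]


# Toy values: only the chaining words are non-zero
print(big_final_512([1, 2, 3, 4], [0] * 12, [0] * 16))
```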

Hash Output
The output from the last application of the BIG.FINAL operation is a string of 512 bits,
$v_t^0 \,\|\, v_t^1 \,\|\, v_t^2 \,\|\, v_t^3$.
If the required hash output is smaller than 512 bits then the rightmost
bits are discarded. For example, if $H_{size} = 256$ then the rightmost 256 bits are dropped to
give the output $h = v_t^0 \,\|\, v_t^1$.
In a similar way, the hash outputs for other sizes can also be derived.

2.3.3

Grøstl

Overview
Grøstl is one of the candidates that has made it to the final round of the hash
competition. It was designed by cryptographers from the Technical University of Denmark (DTU) and TU Graz.
One of the interesting facts about this hash function is that the name is derived
from an Austrian dish that is very similar to the American dish hash; hence the relevance
to the hashing competition and the name Grøstl.
Apart from the interesting name and its being a final-five candidate, what makes Grøstl
interesting is that it differs from other common and popular hash functions like the
MD and the SHA families. It does not make use of a block cipher in its compression function;
rather, it is based on a few individual permutations. The authors have stated that although block
ciphers have been studied extensively, there is very little knowledge on what constitutes a
good key schedule. The key schedule becomes critical when the attacker has total
control of the input. Like many other SHA-3 candidates, the hash
function has borrowed many components from AES.
Design
The design is based on the Merkle-Damgård construction. In the MD construction, the input is first padded to form a message whose length is a multiple of the block length. The padded message
is then broken down into blocks of equal length and processed one at a time with the compression function. Each time, the block of input is combined with the block of output from
the previous round.
The message is padded so that its length becomes a multiple of l, which can take two values
based on the size of the message digest required. For digests of at most 256 bits, l = 512 bits, and
for digests longer than 256 bits, l = 1024 bits. This ensures that l is always at least 2n (n is the
size of the message digest), which is done to prevent common attacks like birthday
attacks.
The message is then broken into blocks and passed through the compression function,
which is defined as follows:
$$h_i = f(h_{i-1}, m_i). \tag{2.23}$$
To start the iteration, the initial value $h_0$ has to be predefined. The different values of $h_0$
are listed in Appendix B. After all the iterations of the compression function
are finished, the output is sent to the output transformation function, denoted by $\Omega$. The
output of $\Omega$ is n bits and, this being the final step of the algorithm, n is
the size of the message digest.
Compression Function
The compression function mainly consists of two permutations P and Q. As with l, there
are two variations of P and Q. $P_{512}$ and $Q_{512}$ are defined for l = 512 and $P_{1024}$ and $Q_{1024}$
are defined for l = 1024. The function f can be expanded as shown below
$$f(h, m) = P(h \oplus m) \oplus Q(m) \oplus h \tag{2.24}$$
where both h and m are l bits.


Design of P and Q
The design of P and Q was inspired by AES. As with AES, P and Q consist of a number of rounds, with each round comprising a number of round transformations.
However, AES was defined only for a 128-bit state, whereas P and Q have to deal with much
larger states, so the transformations had to be redesigned.
The four round transformations are the following
AddRoundConstant
SubBytes
ShiftBytes
MixBytes


They are executed in the order listed above, starting with AddRoundConstant and ending with MixBytes.
Grøstl operates on bytes, and these bytes have to be arranged in a suitable form before
they can be processed by these transformations. So, the 64-byte value represented as 00 01 02 ...
3f is mapped column by column to an 8x8 matrix
$$\begin{bmatrix}
00 & 08 & 10 & 18 & 20 & 28 & 30 & 38 \\
01 & 09 & 11 & 19 & 21 & 29 & 31 & 39 \\
02 & 0a & 12 & 1a & 22 & 2a & 32 & 3a \\
03 & 0b & 13 & 1b & 23 & 2b & 33 & 3b \\
04 & 0c & 14 & 1c & 24 & 2c & 34 & 3c \\
05 & 0d & 15 & 1d & 25 & 2d & 35 & 3d \\
06 & 0e & 16 & 1e & 26 & 2e & 36 & 3e \\
07 & 0f & 17 & 1f & 27 & 2f & 37 & 3f
\end{bmatrix}$$
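The column-by-column mapping shown in the matrix can be expressed as a small helper. The name `bytes_to_matrix` is hypothetical; the sketch only illustrates the layout, assuming 8 rows as stated in the text.

```python
def bytes_to_matrix(data: bytes, rows: int = 8):
    """Arrange a byte sequence column by column into a matrix with
    `rows` rows: byte number r + rows * c lands at row r, column c."""
    cols = len(data) // rows
    return [[data[r + rows * c] for c in range(cols)] for r in range(rows)]


matrix = bytes_to_matrix(bytes(range(64)))
for row in matrix:
    print(" ".join(f"{value:02x}" for value in row))
```

With a 128-byte input the same helper produces the 8x16 layout used by the 1024-bit permutations.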

AddRoundConstant
With the input now in the desired form, the first step to be executed is AddRoundConstant.
It adds (XORs) a constant, whose value depends on the current round, to the input
matrix. The transformation is as follows
$$A \leftarrow A \oplus C[i] \tag{2.25}$$
where A is the input matrix and C[i] is the round-dependent constant matrix. All the values in
the matrix are zero except for a single byte, which depends on the round and is shown below
$$C_P[i] = \begin{bmatrix}
i & 00 & \cdots & 00 \\
00 & 00 & \cdots & 00 \\
00 & 00 & \cdots & 00 \\
00 & 00 & \cdots & 00 \\
00 & 00 & \cdots & 00 \\
00 & 00 & \cdots & 00 \\
00 & 00 & \cdots & 00 \\
00 & 00 & \cdots & 00
\end{bmatrix}
\quad \text{and} \quad
C_Q[i] = \begin{bmatrix}
00 & 00 & \cdots & 00 \\
00 & 00 & \cdots & 00 \\
00 & 00 & \cdots & 00 \\
00 & 00 & \cdots & 00 \\
00 & 00 & \cdots & 00 \\
00 & 00 & \cdots & 00 \\
00 & 00 & \cdots & 00 \\
i \oplus \mathrm{ff} & 00 & \cdots & 00
\end{bmatrix}$$

For the permutation P, the non-zero value is in the first row and first column and it is equal
to the value of round i. For Q, it is in the last row and first column and the value i is XORed
with the constant ff.
There are two important points to be noted here. The first is that the 8x8 matrix shown
above is for the P512 and Q512 versions. For the 1024 version, the matrix is 8x16. The other
point is that the difference in the non-zero value is the only difference between the two
permutations P and Q. All other functionalities remain the same.

SubBytes
In this step all the elements in the matrix are replaced with a corresponding value from the
S-box. The S-box is the same as the one used in Rijndael (AES). The transformation is as follows, with
$a_{i,j}$ being the element at row i and column j and v the number of columns
$$a_{i,j} \leftarrow S(a_{i,j}), \quad 0 \le i < 8, \; 0 \le j < v \tag{2.26}$$
For example, if the input is D4 then it is first ANDed with f0, giving D0. Then it is
ANDed with 0f, which gives 04. The value at which the D0 row and the 04 column of the AES S-box meet is the
value that is substituted, and in this case it is 48. So, the value D4 is replaced with the value 48 in
the matrix. The same process is carried out for all the values in the initial matrix.
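The lookup in the example can be checked without typing in the full table, because the AES S-box is defined as a multiplicative inverse in GF(2^8) followed by a fixed affine transformation. The sketch below computes entries that way; the function names are hypothetical, and a real implementation would normally use a precomputed 256-entry table instead.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse as a^254 (0 maps to 0 by convention)."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result if a else 0


def rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF


def aes_sbox(x: int) -> int:
    """AES S-box entry: field inverse followed by the affine map."""
    inv = gf_inv(x)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


value = 0xD4
row, col = value & 0xF0, value & 0x0F      # D0 and 04, as in the example
print(hex(row), hex(col), hex(aes_sbox(value)))
```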
ShiftBytes
The third step in the transformation is ShiftBytes, where all the values in a particular row
are shifted to the left by a predefined number of positions. The vector that defines the shift
values is denoted by $\sigma$.
$\sigma_{512} = [0, 1, 2, 3, 4, 5, 6, 7]$
$\sigma_{1024} = [0, 1, 2, 3, 4, 5, 6, 11]$
[Figure: the ShiftBytes transformation]
In the diagram above, the bytes in the first row are shifted by 0 positions, the bytes
in the second row are shifted by 1 position to the left and so on.
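The row shifts can be sketched as cyclic left rotations driven by the two shift vectors given above. The function name `shift_bytes` is hypothetical, and the matrix is modelled as a list of rows.

```python
SIGMA_512 = [0, 1, 2, 3, 4, 5, 6, 7]
SIGMA_1024 = [0, 1, 2, 3, 4, 5, 6, 11]


def shift_bytes(matrix, sigma):
    """Rotate row i of the matrix cyclically to the left by sigma[i]."""
    shifted = []
    for row, amount in zip(matrix, sigma):
        amount %= len(row)
        shifted.append(row[amount:] + row[:amount])
    return shifted


small = [list(range(8)) for _ in range(8)]
for row in shift_bytes(small, SIGMA_512):
    print(row)
```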
MixBytes
This is the final step in the round, and in it each column is transformed independently.
The bytes of the matrix are treated as elements of the finite field defined by the irreducible polynomial
$x^8 \oplus x^4 \oplus x^3 \oplus x \oplus 1$. The least significant bit of the byte forms the coefficient of $x^0$ and the
most significant bit forms the coefficient of $x^7$.
After the values have been mapped, each column in the matrix is multiplied by a
constant 8x8 matrix and the transformation is defined as
$$A \leftarrow B \times A \tag{2.27}$$
The matrix B is the constant matrix and it is circulant, meaning that the second
row is the first row shifted right by one position, the third row is the second row shifted right by one position, and
so on. The first row values of the matrix B are shown in the diagram.
[Figure: first row of the MixBytes matrix B]
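A sketch of the circulant multiplication is given below. The diagram with the first row of B did not survive in this copy, so the values used here, circ(02, 02, 03, 04, 05, 03, 05, 07), are an assumption recalled from the Grøstl specification and should be checked against it. The function names are hypothetical.

```python
# Assumed first row of B; verify against the Grøstl specification.
FIRST_ROW = [0x02, 0x02, 0x03, 0x04, 0x05, 0x03, 0x05, 0x07]


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def circulant(first_row):
    """Row r is the first row shifted right cyclically by r positions."""
    n = len(first_row)
    return [[first_row[(c - r) % n] for c in range(n)] for r in range(n)]


def mix_bytes_column(column):
    """Multiply one 8-byte column by the circulant matrix B."""
    result = []
    for row in circulant(FIRST_ROW):
        acc = 0
        for coeff, byte in zip(row, column):
            acc ^= gf_mul(coeff, byte)
        result.append(acc)
    return result


print(mix_bytes_column([1, 0, 0, 0, 0, 0, 0, 0]))
```

Feeding in a unit vector returns the first column of B, which makes the circulant structure easy to inspect.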

Output Transformation
The output of the compression function x forms the input of the output transformation
function, which is defined as follows
$$\Omega(x) = \mathrm{trunc}_n(P(x) \oplus x) \tag{2.28}$$
Here n is the size of the message digest. $\mathrm{trunc}_n(x)$ discards all but the
trailing n bits of x, and this forms the output of the entire Grøstl function.
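The structure of equations (2.24) and (2.28) can be sketched with the permutations passed in as parameters. The toy permutations used here are placeholders chosen only to make the sketch runnable; they are not the Grøstl permutations, and the function names are hypothetical. States are modelled as Python integers.

```python
def compress(h: int, m: int, P, Q) -> int:
    """f(h, m) = P(h XOR m) XOR Q(m) XOR h."""
    return P(h ^ m) ^ Q(m) ^ h


def output_transform(x: int, n: int, P) -> int:
    """Omega(x) = trunc_n(P(x) XOR x): keep only the trailing n bits."""
    return (P(x) ^ x) & ((1 << n) - 1)


def toy_rotate(bits: int, width: int = 512):
    """Placeholder permutation: cyclic rotation of a width-bit value."""
    mask = (1 << width) - 1
    return lambda x: ((x << bits) | (x >> (width - bits))) & mask


toy_p, toy_q = toy_rotate(1), toy_rotate(3)
h = compress(0, 0x1234, toy_p, toy_q)
print(hex(output_transform(h, 256, toy_p)))
```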
Number of rounds
The authors have recommended a certain number of rounds for the permutations based on
the size of the message digest required and these values are shown in the table below
[Table: recommended number of rounds for each digest size]


Chapter 3
Analysis
3.1

Statistical Analysis

For statistical testing, the tool chosen initially was ENT. However, as described in section
2.2.3, it turned out to be a poor choice: it is a very simple tool with just five tests
and it is unable to distinguish between minor changes in values. The NIST Statistical
Test Suite was a better choice as it has a comprehensive set of tests and also computes
P-values, which help in deciding whether to reject the null hypothesis. The null
hypothesis is that a hash function produces a random hash output irrespective of what
the input is. The tests were carried out with four different inputs. For each hash candidate,
these four inputs were hashed and the hash output was analyzed using the NIST tool. The
four different inputs were obtained as follows.
Numbers - The first input consisted of the numbers from 0 to 3999.
Since some of the tests require a sequence length of at least 10^6 bits, it was calculated
that for the output size to be greater than 10^6 bits at least 4000 numbers had to be
hashed. For example, hashing 0 gives a 256-bit or 512-bit output, so for the 256-bit variant 256 x 4000 =
1,024,000 bits. So consecutive numbers from 0 to 3999 were hashed, and their
outputs were concatenated and written to a file to form the first input.
KAT Inputs - In each of the official candidate documentation packages there is a set of 2048
hexadecimal inputs that are given in order to verify the functionality of the code.
These hex inputs were hashed, and their outputs concatenated and written to a file. These
inputs were used as they are well known in the cryptographic community.
From file (every 10Kb) - In this case, the input blocks were obtained from a file. The
file used was the official NIST document for the Statistical Test Suite [?]. The size
of this file is around 6.5Mb. Each input block has size n = 10Kb. The first input is the
first 10Kb; the second input skips the first m = 1Kb and takes the next n = 10Kb;
the third input skips the first 2m = 2Kb, and so on. Each input block is then hashed, appended to
the previous hash output and written to a file.


From file (every 100Kb) - Similar to the previous input, but in this case we take
n = 100Kb and advance by m = 100 bytes each time. This means that the first input is
the first 100Kb, the second input skips the first m = 100 bytes and takes the next
n = 100Kb. The inputs in these cases are chosen so that consecutive blocks partly
overlap and partly differ. In the current case, there
is more overlap as only 100 bytes are skipped. The process the input
blocks go through to obtain the final output is the same as in the previous case.
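The way the test inputs are built can be sketched as follows. This is not the Java code from Appendix C: `hashlib.sha3_256` is used purely as a stand-in 256-bit hash, the decimal ASCII encoding of the numbers is an assumption, and only full-size blocks are produced. All function names are hypothetical.

```python
import hashlib


def bits_of(data: bytes) -> str:
    """Render bytes as a string of 0s and 1s, as fed to the NIST tool."""
    return "".join(f"{byte:08b}" for byte in data)


def hash_numbers(count: int, hash_fn=hashlib.sha3_256) -> str:
    """Hash 0 .. count-1 (assumed decimal ASCII) and concatenate the digests."""
    return "".join(
        bits_of(hash_fn(str(i).encode()).digest()) for i in range(count)
    )


def sliding_blocks(data: bytes, block_size: int, step: int):
    """Yield overlapping blocks; block k starts at offset k * step."""
    for start in range(0, len(data) - block_size + 1, step):
        yield data[start:start + block_size]


def hash_blocks(data: bytes, block_size: int, step: int,
                hash_fn=hashlib.sha3_256) -> str:
    """Hash every overlapping block and concatenate the digests as bits."""
    return "".join(
        bits_of(hash_fn(block).digest())
        for block in sliding_blocks(data, block_size, step)
    )


print(len(hash_numbers(4000)))   # total number of bits for the Numbers input
```

For the file-based inputs, `hash_blocks(data, 10 * 1024, 1024)` would correspond to the 10Kb case, assuming Kb denotes 1024 bytes.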
The code written to compute these hashes is provided in appendix C. Since the smallest
unit in Java is a byte and also because the hash functions provided by sphlib [?] take a
byte array as input for the hash computation, the different input formats (integers in case
of numbers, hex in case of KAT inputs etc.) were converted to a byte array. The results
obtained were also in byte format. These bytes were converted to bits before they were
written to the file as a binary string. It is in this format that the input is fed to the NIST
statistical tool. Every input to the NIST tool is in the form of bits, i.e., a sequence of 0s
and 1s.
The other input parameters (discussed in the previous section) need to be set to run
the tool. The sequence length was set to the total number of bits present in each file. The
'Total Bits' row in the tables shown below represents the sequence length set for each input. The number of bit streams was set to 1. The following tables show the P-values for
each of the inputs and are categorized based on the hash function. Some of the tests, like the
non-overlapping template test, generate more than two P-values, but considering the space requirements only two values are shown in the tables.
The last three rows in the tables indicate the total number of bits in the file and the
distribution of 0s and 1s in those bits. The results are obtained from the freq.txt file.
This simple test is probably the most important test in the suite and if this test fails then
there is little point in carrying out the other tests. For a perfectly random function, the numbers of
0s and 1s in the file should be nearly equal. In other words, the number of 1s (or 0s) should be
about half the total number of bits in the file. In all the 40 results obtained, there are no big
deviations in the distribution of 1s and 0s to indicate any non-random behavior.
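The link between the bit counts and the Freq. row can be illustrated with the monobit formula from the NIST test suite, P = erfc(|#1s - #0s| / sqrt(2n)). The sketch below applies it to the counts reported for BLAKE-256 with the Numbers and KAT inputs; the function name `frequency_p_value` is hypothetical.

```python
import math


def frequency_p_value(zeros: int, ones: int) -> float:
    """Monobit frequency test: P = erfc(|ones - zeros| / sqrt(2 n))."""
    n = zeros + ones
    s_obs = abs(ones - zeros) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


# Counts taken from the BLAKE-256 table (Numbers and KAT Inputs columns)
print(round(frequency_p_value(511333, 512667), 6))
print(round(frequency_p_value(262036, 262252), 6))
```

Both results agree, up to rounding, with the Freq. values listed for those two inputs in Table 3.1.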
Coming to the other results, there are up to 60 P-values calculated for each hash function. Some of the tests and their corresponding P-values have fallen below the 0.01 or 0.001
significance level, α, like the cumulative sums test for Grøstl-256 with the 10Kb file input. However, considering the overall result these become less significant and can be categorized as
outliers. There simply aren't enough failed tests or enough P-values below 0.01 or 0.001
to categorize any hash function as non-random. Based on the results obtained, all the hash
functions in both variants, 256 and 512, can be considered random.


Tests | Numbers | KAT Inputs | 10Kb Inputs | 100Kb Inputs
App. Entropy | 0.531403 | 0.132928 | 0.365077 | 0.476437
Block Freq. | 0.550332 | 0.999349 | 0.105159 | 0.634999
Cum Sums | 0.324573, 0.201009 | 0.988702, 0.943249 | 0.000432, 0.001383 | 0.129711, 0.221312
FFT | 0.204233 | 0.655976 | 0.255107 | 0.617123
Freq. | 0.187412 | 0.765466 | 0.000966 | 0.127740
Linear Complex | 0.867403 | 0.312439 | 0.551978 | 0.693519
Longest Run | 0.095483 | 0.382246 | 0.697027 | 0.936944
Overlap. Temp. | 0.099496 | 0.718846 | 0.180799 | 0.214866
Rank | 0.077948 | 0.162680 | 0.946797 | 0.843130
Runs | 0.753526 | 0.978062 | 0.863215 | 0.048920
Serial | 0.876547, 0.838931 | 0.252703, 0.520978 | 0.625307, 0.854685 | 0.988346, 0.986553
Universal | 0.861028 | 0.057151 | 0.382927 | 0.833105
Non-overlapping | 0.272553, 0.156433 | 0.748985, 0.001491 | 0.013372, 0.593525 | 0.376109, 0.329376
Rand Excur. | 0.560459, 0.148643 | 0.997930, 0.945050 | 0.000000, 0.000 | 0.381784, 0.935452
RE Variant | 0.612882, 0.582494 | 0.163078, 0.205123 | 0.000000, 0.000 | 0.219435, 0.393705
Total Bits | 1024000 | 524288 | 1677056 | 16936192
No. of 0s | 511333 | 262036 | 840665 | 8464962
No. of 1s | 512667 | 262252 | 836391 | 8471230

Table 3.1: P-values for BLAKE-256.


Tests | Numbers | KAT Inputs | 10Kb Inputs | 100Kb Inputs
App. Entropy | 0.515296 | 0.732793 | 0.741010 | 0.613062
Block Freq. | 0.835213 | 0.972337 | 0.337351 | 0.817322
Cum Sums | 0.007317, 0.024581 | 0.997606, 0.999442 | 0.194303, 0.330217 | 0.680462, 0.727874
FFT | 0.772921 | 0.160504 | 0.294533 | 0.503187
Freq. | 0.014457 | 0.916004 | 0.360694 | 0.959712
Linear Complex | 0.470496 | 0.508970 | 0.941661 | 0.216147
Longest Run | 0.520203 | 0.517883 | 0.762814 | 0.765893
Overlap. Temp. | 0.729607 | 0.018138 | 0.325537 | 0.004400
Rank | 0.110234 | 0.273873 | 0.861378 | 0.708287
Runs | 0.519393 | 0.771031 | 0.366703 | 0.944385
Serial | 0.103883, 0.177322 | 0.973604, 0.723422 | 0.940209, 0.448019 | 0.145848, 0.428061
Universal | 0.891749 | 0.157230 | 0.270848 | 0.403062
Non-overlapping | 0.604758, 0.205241 | 0.011224, 0.256405 | 0.415333, 0.421968 | 0.834705, 0.946844
Rand Excur. | 0.000000, 0.000000 | 0.393777, 0.037591 | 0.180242, 0.650077 | 0.346430, 0.206817
RE Variant | 0.000000, 0.000000 | 0.083564, 0.073177 | 0.726111, 0.819205 | 0.097023, 0.137331
Total Bits | 2048000 | 1048576 | 3354112 | 33872384
No. of 0s | 1022250 | 524234 | 1677893 | 16936045
No. of 1s | 1025750 | 524342 | 1676219 | 16936339

Table 3.2: P-values for BLAKE-512.


Tests | Numbers | KAT Inputs | 10Kb Inputs | 100Kb Inputs
App. Entropy | 0.000000 | 0.902424 | 0.114307 | 0.585805
Block Freq. | 0.607410 | 0.771330 | 0.793780 | 0.311399
Cum Sums | 0.061970, 0.139550 | 0.544834, 0.862596 | 0.186100, 0.043826 | 0.552145, 0.246244
FFT | 0.336423 | 0.182655 | 0.806317 | 0.498984
Freq. | 0.089555 | 0.642620 | 0.115193 | 0.281724
Linear Complex | 0.459433 | 0.169959 | 0.997292 | 0.405977
Longest Run | 0.281872 | 0.294630 | 0.005352 | 0.310346
Overlap. Temp. | 0.364500 | 0.648863 | 0.673425 | 0.995357
Rank | 0.653952 | 0.588357 | 0.008650 | 0.869860
Runs | 0.612105 | 0.516465 | 0.577970 | 0.088083
Serial | 0.340566, 0.445552 | 0.171929, 0.244215 | 0.760024, 0.567917 | 0.021053, 0.005627
Universal | 0.500352 | 0.255437 | 0.153408 | 0.073878
Non-overlapping | 0.025038, 0.389087 | 0.162687, 0.277945 | 0.273675, 0.017486 | 0.283347, 0.715961
Rand Excur. | 0.121197, 0.323005 | 0.194306, 0.237082 | 0.823015, 0.526915 | 0.014780, 0.211473
RE Variant | 0.581922, 0.516856 | 0.104283, 0.089435 | 0.731558, 0.831729 | 0.363900, 0.316665
Total Bits | 1024000 | 524288 | 1677056 | 16936192
No. of 0s | 512859 | 262312 | 839548 | 8470311
No. of 1s | 511141 | 261976 | 837508 | 8465881

Table 3.3: P-values for Grøstl-256.


Tests | Numbers | KAT Inputs | 10Kb Inputs | 100Kb Inputs
App. Entropy | 0.809044 | 0.378732 | 0.397207 | 0.189365
Block Freq. | 0.860336 | 0.593022 | 0.655565 | 0.919318
Cum Sums | 0.927044, 0.700048 | 0.344275, 0.722019 | 0.962614, 0.817745 | 0.349985, 0.365956
FFT | 0.714735 | 0.556614 | 0.844283 | 0.234742
Freq. | 0.789523 | 0.642043 | 0.682160 | 0.262450
Linear Complex | 0.755744 | 0.375607 | 0.481904 | 0.898111
Longest Run | 0.745555 | 0.255356 | 0.073481 | 0.135982
Overlap. Temp. | 0.316661 | 0.568919 | 0.140392 | 0.231059
Rank | 0.178015 | 0.945688 | 0.773375 | 0.351537
Runs | 0.998925 | 0.985807 | 0.799923 | 0.062122
Serial | 0.769917, 0.391783 | 0.722030, 0.703363 | 0.347699, 0.166863 | 0.262039, 0.422393
Universal | 0.671094 | 0.093836 | 0.206326 | 0.767386
Non-overlapping | 0.949443, 0.761595 | 0.155646, 0.265522 | 0.490371, 0.292017 | 0.824742, 0.242845
Rand Excur. | 0.827312, 0.926649 | 0.391332, 0.830677 | 0.728787, 0.876474 | 0.660208, 0.349309
RE Variant | 0.970345, 0.965562 | 0.882686, 0.906210 | 0.623877, 0.775601 | 0.582362, 0.985701
Total Bits | 2048000 | 1048576 | 3354112 | 33872384
No. of 0s | 1024191 | 524050 | 1676681 | 16939453
No. of 1s | 1023809 | 524526 | 1677431 | 16932931

Table 3.4: P-values for Grøstl-512.


Tests | Numbers | KAT Inputs | 10Kb Inputs | 100Kb Inputs
App. Entropy | 0.133227 | 0.334743 | 0.697418 | 0.085054
Block Freq. | 0.353875 | 0.810537 | 0.285979 | 0.593754
Cum Sums | 0.533296, 0.970217 | 0.698603, 0.940108 | 0.090838, 0.251783 | 0.082814, 0.052841
FFT | 0.696578 | 0.677171 | 0.895141 | 0.769861
Freq. | 0.592228 | 0.771798 | 0.224201 | 0.127376
Linear Complex | 0.441044 | 0.217102 | 0.857476 | 0.101835
Longest Run | 0.286764 | 0.912564 | 0.749475 | 0.536185
Overlap. Temp. | 0.084172 | 0.141390 | 0.340796 | 0.236505
Rank | 0.944470 | 0.901392 | 0.485696 | 0.805645
Runs | 0.642115 | 0.450742 | 0.056732 | 0.984428
Serial | 0.578575, 0.342775 | 0.115251, 0.056550 | 0.446947, 0.227868 | 0.476576, 0.214390
Universal | 0.222565 | 0.082687 | 0.889890 | 0.970547
Non-overlapping | 0.578066, 0.672288 | 0.988037, 0.125891 | 0.619259, 0.133347 | 0.086713, 0.462511
Rand Excur. | 0.000000, 0.000000 | 0.000000, 0.000000 | 0.164306, 0.816652 | 0.966055, 0.234042
RE Variant | 0.000000, 0.000000 | 0.000000, 0.000000 | 0.652193, 0.760105 | 0.208658, 0.187941
Total Bits | 1024000 | 524288 | 1677056 | 16936192
No. of 0s | 512271 | 262249 | 839315 | 8471233
No. of 1s | 511729 | 262039 | 837741 | 8464959

Table 3.5: P-values for JH-256.


Tests | Numbers | KAT Inputs | 10Kb Inputs | 100Kb Inputs
App. Entropy | 0.272178 | 0.634913 | 0.703221 | 0.603310
Block Freq. | 0.630425 | 0.348473 | 0.550133 | 0.636771
Cum Sums | 0.626083, 0.734182 | 0.251407, 0.409611 | 0.075187, 0.284938 | 0.630008, 0.949623
FFT | 0.386672 | 0.771545 | 0.407806 | 0.758010
Freq. | 0.907656 | 0.792033 | 0.142777 | 0.703129
Linear Complex | 0.654735 | 0.840722 | 0.333867 | 0.275888
Longest Run | 0.580373 | 0.050951 | 0.276858 | 0.263241
Overlap. Temp. | 0.988408 | 0.154539 | 0.505037 | 0.907840
Rank | 0.280102 | 0.148698 | 0.418844 | 0.558893
Runs | 0.833948 | 0.196045 | 0.280091 | 0.392927
Serial | 0.278196, 0.469091 | 0.576637, 0.870482 | 0.611288, 0.637523 | 0.282664, 0.127601
Universal | 0.669160 | 0.083475 | 0.375929 | 0.687501
Non-overlapping | 0.345100, 0.858230 | 0.811576, 0.203011 | 0.976195, 0.508864 | 0.126974, 0.811156
Rand Excur. | 0.387523, 0.708049 | 0.336768, 0.965301 | 0.000000, 0.000000 | 0.561506, 0.001082
RE Variant | 0.248590, 0.261115 | 0.770650, 0.910145 | 0.000000, 0.000000 | 0.902613, 0.862113
Total Bits | 2048000 | 1048576 | 3354112 | 33872384
No. of 0s | 1024083 | 524153 | 1675714 | 16935083
No. of 1s | 1023917 | 524423 | 1678398 | 16937301

Table 3.6: P-values for JH-512.


Tests | Numbers | KAT Inputs | 10Kb Inputs | 100Kb Inputs
App. Entropy | 0.608314 | 0.126450 | 0.257946 | 0.667913
Block Freq. | 0.860717 | 0.551654 | 0.738236 | 0.865577
Cum Sums | 0.960553, 0.917797 | 0.273797, 0.496581 | 0.554300, 0.816830 | 0.564558, 0.325818
FFT | 0.074021 | 0.801373 | 0.045993 | 0.816266
Freq. | 0.940132 | 0.738215 | 0.455704 | 0.367839
Linear Complex | 0.481772 | 0.891648 | 0.303311 | 0.618002
Longest Run | 0.155896 | 0.986450 | 0.273283 | 0.181952
Overlap. Temp. | 0.316476 | 0.258356 | 0.256325 | 0.949869
Rank | 0.172085 | 0.132806 | 0.126583 | 0.801426
Runs | 0.405370 | 0.795022 | 0.653441 | 0.174761
Serial | 0.675784, 0.564788 | 0.784533, 0.637085 | 0.134204, 0.060668 | 0.552831, 0.747801
Universal | 0.981722 | 0.419960 | 0.005296 | 0.967773
Non-overlapping | 0.371193, 0.865667 | 0.988026, 0.357031 | 0.937471, 0.794026 | 0.208176, 0.380679
Rand Excur. | 0.340793, 0.994795 | 0.924502, 0.722291 | 0.000000, 0.000000 | 0.892466, 0.737737
RE Variant | 0.093939, 0.063052 | 0.247583, 0.192466 | 0.000000, 0.000000 | 0.391986, 0.365599
Total Bits | 1024000 | 524288 | 1677056 | 16936192
No. of 0s | 511962 | 262023 | 839011 | 8466243
No. of 1s | 512038 | 262265 | 838045 | 8469949

Table 3.7: P-values for Keccak-256.


Tests | Numbers | KAT Inputs | 10Kb Inputs | 100Kb Inputs
App. Entropy | 0.551033 | 0.219662 | 0.753048 | 0.859796
Block Freq. | 0.523038 | 0.480014 | 0.183050 | 0.676615
Cum Sums | 0.638890, 0.556425 | 0.621202, 0.603527 | 0.859128, 0.454915 | 0.143775, 0.161247
FFT | 0.529734 | 0.975693 | 0.404974 | 0.057561
Freq. | 0.411216 | 0.984417 | 0.647263 | 0.095239
Linear Complex | 0.899223 | 0.903551 | 0.689938 | 0.285786
Longest Run | 0.961663 | 0.394820 | 0.964572 | 0.869153
Overlap. Temp. | 0.441640 | 0.405849 | 0.704253 | 0.404861
Rank | 0.552244 | 0.749262 | 0.078716 | 0.462529
Runs | 0.891436 | 0.744295 | 0.709689 | 0.316047
Serial | 0.583562, 0.572475 | 0.366762, 0.484418 | 0.881262, 0.966809 | 0.224569, 0.051195
Universal | 0.502347 | 0.308217 | 0.450477 | 0.658268
Non-overlapping | 0.608468, 0.428246 | 0.617063, 0.708614 | 0.370839, 0.250427 | 0.581159, 0.656636
Rand Excur. | 0.401251, 0.902533 | 0.955329, 0.649854 | 0.594314, 0.412416 | 0.398785, 0.898589
RE Variant | 0.581321, 0.442108 | 0.548264, 0.547118 | 0.494973, 0.453991 | 0.533600, 0.545967
Total Bits | 2048000 | 1048576 | 3354112 | 33872384
No. of 0s | 1023412 | 524298 | 1676637 | 16931337
No. of 1s | 1024588 | 524278 | 1677475 | 16941047

Table 3.8: P-values for Keccak-512.


Tests | Numbers | KAT Inputs | 10Kb Inputs | 100Kb Inputs
App. Entropy | 0.702849 | 0.515636 | 0.841105 | 0.532866
Block Freq. | 0.285814 | 0.583631 | 0.676158 | 0.050728
Cum Sums | 0.801912, 0.443351 | 0.860320, 0.583292 | 0.641280, 0.765392 | 0.253894, 0.392174
FFT | 0.136955 | 0.665158 | 0.730556 | 0.858068
Freq. | 0.512985 | 0.558163 | 0.894338 | 0.815173
Linear Complex | 0.307576 | 0.791893 | 0.486535 | 0.317608
Longest Run | 0.841608 | 0.796315 | 0.885194 | 0.293454
Overlap. Temp. | 0.901018 | 0.985926 | 0.086319 | 0.179649
Rank | 0.944513 | 0.651178 | 0.265871 | 0.210747
Runs | 0.366630 | 0.574670 | 0.161753 | 0.820827
Serial | 0.039158, 0.146043 | 0.819160, 0.947398 | 0.046122, 0.398758 | 0.393217, 0.325803
Universal | 0.054983 | 0.233798 | 0.082936 | 0.626271
Non-overlapping | 0.144961, 0.674816 | 0.790900, 0.691908 | 0.167726, 0.858838 | 0.041179, 0.324888
Rand Excur. | 0.170573, 0.790013 | 0.575754, 0.558847 | 0.000000, 0.000000 | 0.245920, 0.363176
RE Variant | 0.511000, 0.750440 | 0.551764, 0.804986 | 0.000000, 0.000000 | 0.669454, 0.744401
Total Bits | 1024000 | 524288 | 1677056 | 16936192
No. of 0s | 511669 | 261932 | 838614 | 8468577
No. of 1s | 512331 | 262356 | 838442 | 8467615

Table 3.9: P-values for Skein-256.


Tests | Numbers | KAT Inputs | 10Kb Inputs | 100Kb Inputs
App. Entropy | 0.547688 | 0.144157 | 0.238073 | 0.708568
Block Freq. | 0.460054 | 0.880225 | 0.554294 | 0.416355
Cum Sums | 0.071985, 0.244694 | 0.285905, 0.225946 | 0.557392, 0.626440 | 0.339627, 0.279806
FFT | 0.281356 | 0.276625 | 0.856065 | 0.897625
Freq. | 0.125940 | 0.180932 | 0.678958 | 0.220543
Linear Complex | 0.942337 | 0.028749 | 0.057951 | 0.500068
Longest Run | 0.347026 | 0.294517 | 0.954462 | 0.936727
Overlap. Temp. | 0.192887 | 0.178866 | 0.700186 | 0.492899
Rank | 0.094501 | 0.952731 | 0.267496 | 0.921687
Runs | 0.945594 | 0.823962 | 0.966974 | 0.642645
Serial | 0.138071, 0.159005 | 0.577880, 0.286753 | 0.258276, 0.758090 | 0.933964, 0.784878
Universal | 0.086292 | 0.926317 | 0.885233 | 0.066699
Non-overlapping | 0.470345, 0.649987 | 0.587348, 0.917516 | 0.637835, 0.872021 | 0.051694, 0.448591
Rand Excur. | 0.000000, 0.000000 | 0.579648, 0.066502 | 0.092348, 0.794479 | 0.565203, 0.243285
RE Variant | 0.000000, 0.000000 | 0.389206, 0.265923 | 0.299321, 0.355857 | 0.281860, 0.216138
Total Bits | 2048000 | 1048576 | 3354112 | 33872384
No. of 0s | 1022905 | 523603 | 1676677 | 16932627
No. of 1s | 1025095 | 524973 | 1677435 | 16939757

Table 3.10: P-values for Skein-512.

For some of the inputs, the random excursions and the random excursions variant tests
have produced 0.0 as the P-values. These random excursion tests, as described previously,
count the number of cycles in the sequence. If the number of cycles is insufficient, the test becomes inapplicable and the P-value is reported as 0.0. This has been the case for all the
candidate hash functions where the results show 0.0 as the P-value. It is not that
the tests have failed; the test was simply inapplicable for that particular input.
Another point worth noting about the random excursion tests is that none of
the hash function variants produced P-values of 0.0 for the file containing 100Kb chunks.
These files had input sizes of more than 16,000,000 bits for the 256 variant and more than
33,000,000 bits for the 512 variant. Inputs this large give the tests enough cycles
to be applicable.
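
The applicability check described above can be sketched in a few lines of Java. This is only an illustration, not the code of the NIST tool: the class and method names are hypothetical, and the threshold of 500 cycles is an assumption recalled from NIST SP 800-22 rather than a value stated in this report.

```java
/** Sketch of the cycle count that decides whether the random excursions tests apply. */
final class ExcursionCycles {
    /** Assumed minimum number of cycles (recalled from NIST SP 800-22, not from this report). */
    static final int MIN_CYCLES = 500;

    /** Counts the cycles of the +1/-1 random walk built from a sequence of 0/1 bits. */
    static int countCycles(int[] bits) {
        int sum = 0;     // running partial sum of the walk
        int cycles = 0;  // number of returns to zero seen so far
        for (int bit : bits) {
            sum += (bit == 1) ? 1 : -1;
            if (sum == 0) {
                cycles++;
            }
        }
        // The walk is closed with a final zero, so an unfinished excursion counts as one more cycle.
        return (sum == 0) ? cycles : cycles + 1;
    }

    /** A test run is treated as inapplicable when too few cycles are found. */
    static boolean isApplicable(int[] bits) {
        return countCycles(bits) >= MIN_CYCLES;
    }

    public static void main(String[] args) {
        int[] example = {0, 1, 1, 0, 1, 1, 0, 1, 0, 1};
        System.out.println("cycles = " + countCycles(example));
        System.out.println("applicable = " + isApplicable(example));
    }
}
```

For the ten-bit example the walk returns to zero twice and ends away from zero, which gives three cycles, far too few for the test to apply.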

3.2 Performance

Performance is one of the important criteria in selecting a good hash function. Although the performance of the fourteen candidates has been benchmarked before
in eBASH [?] and [?], for this project we measure the performance of these hash functions
when run on a Sun Solaris platform with Java as the programming language.
The machine used is a Sun Microsystems Ultra 20 with a 2.2 GHz AMD64 processor. The operating system is Solaris 10 (SunOS 5.10). The Java code for all
fourteen candidates was obtained from [?]. The Sphlib package contains software
implementations of the SHA-3 candidates in C and Java. The authors are part of a research
project sponsored by the French government, but Thomas Pornin wrote all the actual code.
The Java code used was tested with the sample KAT inputs provided in the candidate documentation, and it conformed to the specifications by producing correct outputs.
The package has a Speed.java program, which was modified slightly to suit the needs
of the project. This class was used to measure the performance of the candidates with the
input size set to 1024 bytes. The results are measured in Mbytes per second, but the most
common performance indicator for cryptographic applications is cycles/byte. One can convert Mbytes/s to cycles/byte if the clock frequency of the processor is known.
$$\frac{\text{Cycles}}{\text{Byte}} = \frac{\text{No. of Processors} \times \text{Processor Clock Frequency}}{\text{Throughput in bytes/second}} \qquad (3.1)$$

With the clock frequency of the Sun machine known to be 2.2 GHz (2200 MHz), we can calculate cycles/byte for 57.90 Mbytes/s as

$$\frac{\text{Cycles}}{\text{Byte}} = \frac{2200}{57.90} \approx 38. \qquad (3.2)$$
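
As a small worked example of equation (3.1), the conversion can be written as a one-line Java method. The class and method names are hypothetical; like equation (3.2), the sketch expresses the clock in MHz and the throughput in Mbytes/s so that the factor of one million cancels.

```java
/** Sketch of the Mbytes/s to cycles/byte conversion of equation (3.1). */
final class CyclesPerByte {
    /**
     * @param processors      number of processors used
     * @param clockMHz        processor clock frequency in MHz
     * @param mbytesPerSecond measured throughput in Mbytes/s
     * @return cycles spent per byte hashed
     */
    static double cyclesPerByte(int processors, double clockMHz, double mbytesPerSecond) {
        // MHz divided by Mbytes/s: the factor 10^6 cancels on both sides.
        return processors * clockMHz / mbytesPerSecond;
    }

    public static void main(String[] args) {
        // The 57.90 Mbytes/s example from the text, on one 2.2 GHz processor.
        System.out.println(cyclesPerByte(1, 2200.0, 57.90));
        // The SHA-2 (256-bit) measurement of 58.05 Mbytes/s.
        System.out.println(cyclesPerByte(1, 2200.0, 58.05));
    }
}
```

The two calls give roughly 38.00 and 37.90 cycles/byte, in line with equation (3.2) and the SHA-2 entry in the results table.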

For the project, only small messages were considered. Any message with a length of at
least 8192 bytes is considered a long message. Long messages are a separate problem, since their performance often depends on other features of the hash function or the available architecture. For the experiments, all the
messages are shorter than 8192 bytes. The following table shows the results (average of
two runs of the test) obtained for SHA-2 and all the 14 candidates with input size = 1024 bytes.
Candidates   256 output bits            512 output bits
             Mbytes/s    Cycles/byte    Mbytes/s    Cycles/byte
SHA-2        58.05       37.90          19.925      110.414
BLAKE        45.505      48.35          26.045      84.47
Grøstl       11.565      190.23         6.87        320.23
JH           8.37        262.84         8.325       264.26
Keccak       12.665      173.71         6.91        318.38
Skein        38.305      57.43          30.165      72.93
Hamsi        18.445      119.27         7.115       309.21
BMW          42.85       51.34          37.125      59.26
CubeHash     23.85       92.24          23.675      92.93
ECHO         10.88       202.21         5.785       380.29
Fugue        22.745      96.72          11.705      187.95
Luffa        33.73       65.22          19.04       115.55
Shabal       104.33      21.09          103.765     21.20
SHAvite      24.225      90.82          14.08       156.25
SIMD         12.215      180.11         0.755       2913.91

Table 3.11: Performance results.

It is easier to compare the performance with SHA-2 when viewed as a graph. The documentation of most of the hash functions claims that they are faster than SHA-2, but as seen
above this is not always the case. With the 256 versions, only Shabal performs better.
For 512, Shabal, BMW, Skein, BLAKE and CubeHash perform better than SHA-2. The
other three of the final five candidates (Grøstl, JH and Keccak) are slower than SHA-2 irrespective of whether it is the 256
or 512 version. To be clear, these are results for this particular combination: software
implementations of the hash functions in Java, small messages, and a Sun platform. They do
not necessarily reflect the performance of these candidates in other environments.

Remembering the claim made by most of the candidates that they are faster than SHA-2, it would be interesting to see how the message length affects the performance. Will
the performance of the candidates compared to SHA-2 improve with increasing message
sizes? Will this justify their claim? To test this, messages of 16 bytes and
4096 bytes were hashed with the final five candidates and SHA-2, and the results are shown
below. The table shows the average values obtained from two separate runs in Mbytes/s.
Candidates   16-256    4096-256   16-512   4096-512
SHA-2        11.91     61.67      2.39     22.055
BLAKE        10.942    47.57      3.466    30.165
Grøstl       2.74      12.37      0.67     7.765
JH           1.75      8.79       1.65     8.67
Keccak       1.526     13.43      1.557    7.285
Skein        9.183     38.79      3.775    31.79

Table 3.12: Performance of final five candidates.

The performance improves with increased message sizes, but for most candidates the rate
at which their performance improves is not better than that of SHA-2. Going from 16 to 4096 bytes, the throughput
of Keccak-256 grows by a larger factor than that of SHA-256, and the same holds for Grøstl-512 when compared
to SHA-512. In all other cases, the improvement remains below that of SHA-2. Again, this is the
case for smaller messages.
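
The comparison of improvement rates can be checked directly from the table of 16-byte and 4096-byte results. The following Java sketch uses hypothetical names; it divides the 4096-byte throughput by the 16-byte throughput for each function and lists the candidates whose factor exceeds that of SHA-2.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: growth in throughput between 16-byte and 4096-byte messages. */
final class SpeedupRatio {
    static final String[] NAMES = {"SHA-2", "BLAKE", "Gr\u00f8stl", "JH", "Keccak", "Skein"};

    // Columns: 16-256, 4096-256, 16-512, 4096-512 (Mbytes/s, from the table above).
    static final double[][] MBPS = {
        {11.91, 61.67, 2.39, 22.055},
        {10.942, 47.57, 3.466, 30.165},
        {2.74, 12.37, 0.67, 7.765},
        {1.75, 8.79, 1.65, 8.67},
        {1.526, 13.43, 1.557, 7.285},
        {9.183, 38.79, 3.775, 31.79},
    };

    /** Throughput at 4096 bytes divided by throughput at 16 bytes; smallColumn is 0 (256) or 2 (512). */
    static double speedup(int row, int smallColumn) {
        return MBPS[row][smallColumn + 1] / MBPS[row][smallColumn];
    }

    /** Names of the candidates whose throughput grows by a larger factor than SHA-2's. */
    static List<String> improvesFasterThanSha2(int smallColumn) {
        List<String> result = new ArrayList<>();
        double reference = speedup(0, smallColumn); // row 0 is SHA-2
        for (int row = 1; row < NAMES.length; row++) {
            if (speedup(row, smallColumn) > reference) {
                result.add(NAMES[row]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("256: " + improvesFasterThanSha2(0));
        System.out.println("512: " + improvesFasterThanSha2(2));
    }
}
```

With the measured values, SHA-256 improves by a factor of about 5.2 and SHA-512 by about 9.2; only Keccak-256 (about 8.8) and Grøstl-512 (about 11.6) exceed these factors.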

Another interesting point to note from the results above is that for certain candidates,
the performance of the 256 and 512 variations remains the same. Among the final five candidates, JH
is one such case. On the other hand, for candidates like BLAKE and Keccak the 256 variation
is almost twice as fast as the 512 variation. In BLAKE, the two main variations,
256 and 512, deal with different word sizes. For the 224 and 256 versions, BLAKE operates on
32-bit words and produces a message digest of up to 32 bytes (256 bits). For the 384 and 512
versions, BLAKE operates on 64-bit words and produces a message digest of up to 64 bytes (512 bits).
All the other values like chaining values, message blocks, salt and counter are
doubled in size compared to the 256 variation. This goes some way in explaining why
the performance differs by a factor of almost two. JH, on the other hand, irrespective
of whether it is the 256 or 512 variation, has only one large internal state of 1024 bits
and always deals with 512-bit input blocks. The variations of the output are obtained by
truncating the 1024-bit final hash value. For JH-256, the rightmost 256 bits in sequence
form the message digest and for JH-512, the last 512 bits become the hash output. With
the entire process including the compression function operation being the same except for
the number of bits truncated, the performance of JH remains almost the same irrespective
of 256 or 512 variation.
The SHA-3 candidates have software as well as hardware implementations. Let us compare the results obtained above with the hardware implementation results. In the paper
Visualizing area-time tradeoffs for SHA-3 [?], the performance of the hardware implementations
of the SHA-3 candidates is studied and compared. The author comments
on a graph that was presented in an FPGA-benchmarking talk at the second SHA-3 conference and discusses the different ways this graph can be viewed, each leading to
different results. The initial graph that was presented was plotted with throughput on the
Y axis and area on the X axis. With respect to this project, throughput is of interest
to us; in communications, it is defined as the mean rate at which data is transmitted successfully over a network. The general unit of measurement is bits per second (bps). In this
particular case, throughput is measured in gigabits per second. The first graph, however,
has the issue that the Y axis and the X axis have been scaled separately. For this
reason, we will be considering the second graph (shown below), which has been plotted on
square axes (both axes are scaled equally).

[Figure: excerpt from [?]. The diagram plots throughput (Y axis) against area (X axis) on square axes for Grøstl, Luffa, Keccak, JH, CubeHash, Shabal, Skein, Hamsi, Fugue, SHAvite-3 and BLAKE, with opposite corners marked "best" and "worst". The text of the excerpt notes that with these axes the candidates closest to "best" are (1) Keccak, (2) CubeHash and (3) JH, followed by a tough fight between Shabal, Skein, Fugue, Luffa and BLAKE, instead of (1) Luffa, (2) Grøstl, (3) Keccak, (4) JH and (5) CubeHash. The excerpt then continues with the author's discussion of how the data was scaled.]

The graph shows only 11 SHA-3 candidates and for this reason we will be considering
only the performance of these candidates. The candidates with the best performance can
be arranged in the following order

Keccak
CubeHash
JH
Shabal
Skein
Fugue
Luffa
BLAKE
Hamsi
SHAvite-3
Grøstl


The 512 variation of the candidates was used for the above comparison, and we will consider the same 512 variation for the software implementation as well. The throughput of
our software implementation is in megabytes per second, but a comparison can still be made
because the conversion from megabytes to gigabits is a constant factor (1 megabyte = 0.0078125 gigabits). Ranking the above-mentioned candidates based on their software performance, we
obtain the following order

Shabal
Skein
BLAKE
CubeHash
Luffa
SHAvite-3
Fugue
JH
Hamsi
Keccak
Grøstl
Considering the final five candidates, Grøstl remains last in both implementations,
while Keccak shows the biggest difference in position. JH and BLAKE
switch positions (even in the overall list), with BLAKE performing better in software while
JH performs better in hardware. Skein seems to be the only candidate that performs reasonably
well in both hardware and software.
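
The position changes between the two rankings can be tabulated with a short Java sketch. The names are hypothetical; the software order is the list above, and the hardware order is the one read off the excerpt from [?] earlier in this section, so it inherits any uncertainty in that reading.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: how far each candidate moves between the hardware and software rankings. */
final class RankShift {
    // Hardware order as read from the area-time excerpt (best first).
    static final List<String> HARDWARE = Arrays.asList(
        "Keccak", "CubeHash", "JH", "Shabal", "Skein", "Fugue",
        "Luffa", "BLAKE", "Hamsi", "SHAvite-3", "Gr\u00f8stl");

    // Software order from the Java measurements for the 512 variation (best first).
    static final List<String> SOFTWARE = Arrays.asList(
        "Shabal", "Skein", "BLAKE", "CubeHash", "Luffa", "SHAvite-3",
        "Fugue", "JH", "Hamsi", "Keccak", "Gr\u00f8stl");

    /** Positive values mean the candidate ranks lower in software than in hardware. */
    static int shift(String name) {
        return SOFTWARE.indexOf(name) - HARDWARE.indexOf(name);
    }

    /** Candidate with the largest absolute change in position. */
    static String largestShift() {
        String best = HARDWARE.get(0);
        for (String name : HARDWARE) {
            if (Math.abs(shift(name)) > Math.abs(shift(best))) {
                best = name;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        for (String name : HARDWARE) {
            System.out.println(name + ": " + shift(name));
        }
        System.out.println("largest shift: " + largestShift());
    }
}
```

Keccak drops nine places from hardware to software, JH drops five and BLAKE gains five, while Grøstl stays in last place in both lists.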

3.3 Security

3.3.1 Cryptanalysis of Grøstl

The following cryptanalysis information is obtained from the documentation and is provided by the authors.

Differential Cryptanalysis
The wide trail design in Grøstl ensures that there is diffusion in both permutations, P and Q.
In the compression function, the MixBytes transformation is based on a maximum
distance separable (MDS) code. This guarantees that the branch number for both linear and
differential trails is 9. In any four-round differential trail there are a minimum of $9^2 = 81$
active s-boxes. Similarly, for a higher number of rounds, say eight and twelve, the number of active
s-boxes becomes at least 162 and 243, respectively. Since the maximum difference propagation probability of the s-box is
$2^{-6}$, the propagation probability of a four-round trail is at most $2^{-6 \times 81} = 2^{-486}$.
No. of active s-boxes   Rounds   Propagation probability
81                      4        $2^{-486}$
162                     8        $2^{-972}$
243                     12       $2^{-1458}$
Table 3.13: Complexity of best-known attacks.

The table contains the propagation probability for the respective numbers of rounds: the
probability of any differential trail over eight and twelve rounds is at most $2^{-972}$ and $2^{-1458}$, respectively. Therefore, a classical differential attack
that specifies a differential trail for each of P and Q has very
little chance of success.
In Grindahl-256, the information of the state is irrelevant since the message block overwrites it. Now the probabilistic behavior of the hash function reveals the MixColumns/MixBytes
transformation. Two factors played a part in leading to collisions: one is the slow diffusion
and the other is the state being constantly affected. The same approach in Grøstl may only
lead to an attack well above the complexity of the birthday attack, for the following
reasons
High diffusion.
The state cannot be controlled constantly by the adversary.
Part of the state is discarded only in the output transformation; it remains unchanged otherwise.
Linear Cryptanalysis
The branch number of linear and differential trails is the same, 9. Hence the number of
active s-boxes for a given number of rounds is the same, but the s-box has a maximum correlation of
$2^{-3}$. Here we deal with linear trails, and for four rounds the correlation is at most $2^{-3 \times 81} = 2^{-243}$.
Similarly, the correlation of the complete linear trail is at most $2^{-486}$ and $2^{-729}$ for eight-round and twelve-round
trails, respectively.
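
The arithmetic behind the differential and linear bounds is the same: every block of four rounds contributes at least $9^2 = 81$ active s-boxes, and each active s-box contributes a factor of $2^{-6}$ (differential) or $2^{-3}$ (linear). A minimal Java sketch with hypothetical names, assuming the round count is a multiple of four as in the cases discussed above:

```java
/** Sketch of the wide-trail bound arithmetic for differential and linear trails. */
final class TrailBound {
    static final int BRANCH_NUMBER = 9;            // branch number guaranteed by MixBytes
    static final int LOG2_DIFFERENTIAL_SBOX = -6;  // max. difference propagation probability 2^-6
    static final int LOG2_LINEAR_SBOX = -3;        // max. correlation 2^-3

    /** Minimum number of active s-boxes; rounds must be a multiple of four. */
    static int activeSboxes(int rounds) {
        if (rounds % 4 != 0) {
            throw new IllegalArgumentException("rounds must be a multiple of 4");
        }
        return (rounds / 4) * BRANCH_NUMBER * BRANCH_NUMBER;
    }

    /** Exponent e such that the bound on a trail over the given rounds is 2^e. */
    static int log2Bound(int rounds, int log2PerSbox) {
        return activeSboxes(rounds) * log2PerSbox;
    }

    public static void main(String[] args) {
        for (int rounds = 4; rounds <= 12; rounds += 4) {
            System.out.println(rounds + " rounds: " + activeSboxes(rounds)
                + " active s-boxes, differential 2^" + log2Bound(rounds, LOG2_DIFFERENTIAL_SBOX)
                + ", linear 2^" + log2Bound(rounds, LOG2_LINEAR_SBOX));
        }
    }
}
```

This reproduces the exponents -486, -972 and -1458 for differential trails and -243, -486 and -729 for linear trails.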
Integrals
Grøstl is based on AES, and a well-known attack on AES is the integral attack [?]. Grøstl
was tested for integral patterns, even though it has not yet been shown how to utilize integrals
on hash functions. Various multi-sets were taken over the chosen rounds, and the
authors conclude that integrals cannot be used on Grøstl since the tests do not reveal any
non-random behavior of Grøstl.
Algebraic Cryptanalysis
For AES, one can establish 40 quadratic equations for a single s-box application, and
the same holds for Grøstl. A single encryption has 200 s-box applications, which lead
to a massive 8000 quadratic equations in 1600 variables. Theoretically, the key can be
obtained from these equations, but the difficulty of this problem is unknown. The
authors claim that, to the best of their knowledge, even the brute-force attack is faster than
the algebraic attack. If an algebraic attack were successfully established against Grøstl, then
it is reasonable to say that the same attack would break AES as well.
Generic Collision attacks
The table below compares the claimed complexities with the complexities of the best-known attacks; none of the best-known attacks falls below the claimed complexity

Attack type        Claimed complexity        Best-known attack
Collision          $2^{n/2}$                 $2^{n/2}$
d-collision        $\lg(d) \cdot 2^{n/2}$    $(d!)^{1/d} \cdot 2^{n(d-1)/d}$
Preimage           $2^{n}$                   $2^{n}$
Second preimage    $2^{n-k}$                 $2^{n}$
Table 3.14: Complexity of best-known attacks.

Observation by Barreto
Barreto sees the Grøstl function as a Davies-Meyer compression function over an Even-Mansour block
cipher [?]. He also sheds light on Daemen's attack on the one-round Even-Mansour block
cipher.
The complexity: To retrieve l bits in $O(2^{l/2})$ steps using $O(2^{l/2})$ storage units.
Barreto infers that the wide pipe construction is essential to push the effort to compute a
pre-image to $O(2^{n})$. Other important inferences from the paper are given below
Grøstl does not reveal any weaknesses directly.
Illustrates the importance of double wide pipe construction.
Does not lead to an effective attack.

3.3.2 Rebound Attacks on Reduced Grøstl

The paper [?] presents an extension of the rebound attack that has been used to identify
collisions on reduced versions of Grøstl. Collisions have been identified for four rounds
(out of ten) and five rounds (out of fourteen) of the 256 and 512 variations, respectively.
The idea behind a rebound attack is to break the attack into two steps,
the inbound phase and the outbound phase. The inbound phase is basically an implementation of the meet-in-the-middle attack. In a differential path, it utilizes the available degrees
of freedom to ensure that the costliest part of the path holds. The outbound phase
makes use of the results obtained in the inbound phase. It starts from the middle and works
in both directions to derive an attack on the hash.

We know that the meet-in-the-middle attack operates by computing values from both
directions; it works on both the input and the output. In Grøstl, the rebound attack is
applied by using only one of the directions in the meet-in-the-middle phase on both the
permutations. We compute the values from the input at each step, and then the differences
between the values from P and Q are identified and matched. For the attack, the
only variable parameter is the input itself. The chaining parameter is defined as a constant.
The authors have provided three methods to extend this rebound attack on Grøstl. One of
them is to use a 64-bit S-box rather than the typical 8-bit S-box, which is used in matching
the variations between the two permutations. Another improvement that can be made in
the inbound phase is using the same input parameters for both P and Q. The final
improvement is identifying new differential trails for executing the rebound attack. This
suggestion holds good only for the 512 variation of the hash function. By applying these
extensions, the attack has been extended by one more round for the 256 variation. For
the 512 variation, this paper has provided the only rebound attack results so far on the hash
function's compression function.

3.3.3 Internal Differential Attack on Grøstl

In the paper [?], a new method called the internal differential attack is explained. It shows that the differential trails between parallel computations can be exploited and that they
should be seriously considered while designing a hash function.
The new technique, internal differential attack, may apply when a function is built
upon parallel computation branches that are not distinct enough. The trick is to devise a
differential path representing the differences between the branches and not between two
inputs of the function. Usually this is avoided by forcing a strong separation between the
two parallel branches. For example, for all steps of the hash function RIPEMD-160, very
distinct constants are used in the left and right branches. However, in the case of Grøstl,
this separation is thin between permutations P and Q. The only difference between the
permutations is the constant addition phase. Even in that step, the distinction is really thin:
a different constant is added on only two different bytes. Hence, this property is exploited to
mount a distinguishing attack (an attack where we can extract information from encrypted
data that is distinguishable from random data) against the Grøstl compression function.
Using this technique, the paper concludes that a colliding pair of messages for the
Grøstl-256 hash function limited to 5 rounds can be found with $2^{79}$ computations and $2^{64}$
memory. In the case of Grøstl-512, a colliding pair of messages for the hash function
reduced to 6 rounds can be found with $2^{177}$ computations and $2^{64}$ memory. The authors are
also able to derive a distinguisher with a time complexity of $2^{192}$ and $2^{64}$ memory for the full
(10 rounds) 256-bit version of the Grøstl compression function for internal permutations.

This work also shows that designers must be careful when building a function with
parallel computation branches, as internal differential paths may lead to unexpected
attacks.

Chapter 4
Conclusions

4.1 Frontrunners among the final five

It is difficult to predict a winner, especially without a thorough knowledge of all the
final five candidates. However, based on the results from the two major categories, security
and performance, and using the process of elimination, it might be possible to narrow down
the field by making some educated guesses.
When it comes to performance, the hands-down favorites are BLAKE and Skein. They
are much faster than the other candidates. Quite a way back is Keccak, followed
by Grøstl and JH. Skein and Keccak have a totally different architecture from the other candidates. Skein has some novel components like Threefish, while Keccak has a sponge construction. The results from the randomness tests indicate that all the hash functions behave
randomly. If there are any doubts at all, they have to be about BLAKE, as it has the largest
number of P-values that have failed or are close to failing.
Based on all this, it comes down to two candidates: Skein and Keccak.

4.2 Future work

As far as the statistical analysis goes, this is just the beginning. Even though the tests have
not produced any groundbreaking results, there are some interesting results where more
work can be carried out. The areas where certain test cases have failed can be explored in
further detail to verify whether the test fails for more similar inputs. If so, the internal
architecture can be studied and modified to see what impact the changes have on these tests.

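Appendix A
Matrices

A.1 Fugue

A.1.1 SuperMix - Matrix N

$$N = \left( \begin{array}{*{16}{c}}
1 & 4 & 7 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 1 & 4 & 7 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 7 & 1 & 1 & 4 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 4 & 7 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 4 & 7 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 4 & 7 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 7 & 1 & 0 & 4 \\
4 & 7 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 7 & 0 & 0 & 0 & 6 & 4 & 7 & 1 & 7 & 0 & 0 & 0 \\
0 & 7 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 7 & 0 & 0 & 1 & 6 & 4 & 7 \\
7 & 1 & 6 & 4 & 0 & 0 & 7 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 7 & 0 \\
0 & 0 & 0 & 7 & 4 & 7 & 1 & 6 & 0 & 0 & 0 & 7 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 4 & 0 & 0 & 0 & 4 & 0 & 0 & 0 & 5 & 4 & 7 & 1 \\
1 & 5 & 4 & 7 & 0 & 0 & 0 & 0 & 0 & 4 & 0 & 0 & 0 & 4 & 0 & 0 \\
0 & 0 & 4 & 0 & 7 & 1 & 5 & 4 & 0 & 0 & 0 & 0 & 0 & 0 & 4 & 0 \\
0 & 0 & 0 & 4 & 0 & 0 & 0 & 4 & 4 & 7 & 1 & 5 & 0 & 0 & 0 & 0
\end{array} \right)$$

A.1.2 Initialization Value - 256

$$IV_{256} = \left( \begin{array}{*{8}{c}}
\mathtt{e9} & \mathtt{66} & \mathtt{e0} & \mathtt{d2} & \mathtt{f9} & \mathtt{fb} & \mathtt{91} & \mathtt{34} \\
\mathtt{52} & \mathtt{71} & \mathtt{d4} & \mathtt{b0} & \mathtt{6c} & \mathtt{f9} & \mathtt{49} & \mathtt{f8} \\
\mathtt{bd} & \mathtt{13} & \mathtt{f6} & \mathtt{b5} & \mathtt{62} & \mathtt{29} & \mathtt{e8} & \mathtt{c2} \\
\mathtt{de} & \mathtt{5f} & \mathtt{68} & \mathtt{94} & \mathtt{1d} & \mathtt{de} & \mathtt{99} & \mathtt{48}
\end{array} \right)$$
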

A.2 Grøstl

A.2.1 Number of rounds

Permutations                     Digest sizes   Recommended value of r
$P_{512}$ and $Q_{512}$          8-256          10
$P_{1024}$ and $Q_{1024}$        264-512        14

A.2.2 Initial Values

n     $iv_n$
224   00 ... 00 00 e0
256   00 ... 00 01 00
384   00 ... 00 01 80
512   00 ... 00 02 00

Appendix B
Code Listing

B.1 Code for computing hashes for Statistical Testing

B.1.1 StatisticalHash.java

A brief description of all the important functionalities in the code is given below
getBytesFromFile - Reads the data from a file and converts it to a byte array.
hexStringToByteArray - Takes a hexadecimal string as an input and converts it to a byte array.
toHex - The input is a byte array and this array is converted to a hexadecimal string.
byteToBits - Uses the BitSet class in Java to convert each byte to a stream of bits. Returns the bits as an int array.
byteArrayToString - Uses the byteToBits method to convert the byte array to an int array, which is then converted to a string.
writeToFile - Takes two strings as input, one being the file path with the file name and the other being the content to be written to the file.
hashKAT - This method is used for hashing the KAT inputs obtained from the documentation. Reads all the inputs from the file, converts each to a byte array, hashes it and then appends the output to a string that is written to a file.
hashFile - The 10Kb and 100Kb files are obtained from this functionality. Reads the data from the file, goes through a loop that hashes certain blocks of data while leaving the others (information on which blocks are used is mentioned in section 3.1) and finally writes the hashed data to a file.
hashNumbers - As the name suggests, the hashed output for the numbers 0-3999 is obtained from this functionality. Similar to the above functionalities, the hashed output is written to a file using the writeToFile method.
filetoByteFile - Reads data from a file, converts it to a byte array and writes it to a file. Helpful when dealing with large files, as it improves performance.
main - Loops through all the hash candidates, calling the four important hash methods that produce the four different outputs that serve as input to the NIST tool.
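
To make the conversion helpers more concrete, the following is a minimal sketch of how hexStringToByteArray, toHex, byteToBits and byteArrayToString could look. It is not the project's actual StatisticalHash.java: the class name HashTextUtil is hypothetical, bit shifts are used in place of the BitSet class mentioned above, and the most-significant-bit-first order is an assumption.

```java
/** Sketch of the hex and bit conversion helpers described above (not the original code). */
final class HashTextUtil {
    /** Converts a hexadecimal string such as "00ff10" to a byte array. */
    static byte[] hexStringToByteArray(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("invalid hex digit at byte " + i);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    /** Converts a byte array to a lowercase hexadecimal string. */
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Returns the eight bits of a byte, most significant bit first (assumed order). */
    static int[] byteToBits(byte b) {
        int[] bits = new int[8];
        for (int i = 0; i < 8; i++) {
            bits[i] = (b >> (7 - i)) & 1;
        }
        return bits;
    }

    /** Converts a byte array to a string of '0' and '1' characters, as fed to the test tool. */
    static String byteArrayToString(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 8);
        for (byte b : data) {
            for (int bit : byteToBits(b)) {
                sb.append(bit);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] digest = hexStringToByteArray("a5ff");
        System.out.println(toHex(digest));
        System.out.println(byteArrayToString(digest));
    }
}
```

Keeping the bit order fixed matters here because the statistical tests operate on the exact bit sequence, so a different order would produce a different input file for the same digests.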