AES

CHAPTER 1
INTRODUCTION
1.1 EVOLUTION OF ENCRYPTION:
Encryption is an ancient art. Julius Caesar protected his written messages with a
simple code; he just shifted his letters 3 spaces to the left. (In his honor, this substitution
method is still called the Caesar Cipher.) Later, the 9th century Arab scholar Al-Kindi
produced a pioneering text entitled, A Manuscript on Deciphering Cryptographic
Messages.
Today, with the staggering amount of sensitive information that we digitally store
and transmit, encryption plays a vital role in protecting private information. The
elementary Caesar Cipher may have served its purpose in a relatively illiterate age, but
the intricate security matrix in which we live requires more advanced tools.
In 1975, IBM submitted a proposal for a secure standard that businesses such as banks
could use to communicate electronically. The standard, called the Data Encryption
Standard (DES), uses a 56-bit key, which was considered weak even then. Despite being
broken on numerous occasions, it is still used today.
In 1976, Whitfield Diffie and Martin Hellman published their asymmetric key system.
An asymmetric system is different because it allows two users to communicate securely
without first sharing a secret key. The public key acts like a key that can only lock a
lock, while the private key is the only key that can unlock it. The two keys are related
mathematically, but knowing the public key should not reveal the private key. This work
led directly to modern encryption methods.
In 1998, the Electronic Frontier Foundation (EFF) showed that the DES standard from 1975
was insecure by cracking its 56-bit encryption. The Foundation then raised awareness
about the government's involvement in limiting standards to DES levels.
In 1991, Phil Zimmermann published the Pretty Good Privacy (PGP) program, which led to
the first widely publicized case of government regulation of encryption methods. The
program was initially intended to let people use bulletin board systems (BBS) securely
and store files. Its source code was openly distributed and no charge was made for its
use. Zimmermann became the target of a criminal investigation when the program began to
be distributed internationally. At the time, exporting systems that used more than 40
bits of encryption was illegal, and PGP used keys of 128 bits. The investigation was
eventually closed without charges following the public response, but the regulations are
still in place.
In 1996, after the investigation of Zimmermann over the PGP program was closed, the
government relaxed its laws on domestic encryption ciphers. However, international
export laws still apply.
In 2001, the Advanced Encryption Standard (AES) replaced DES as the encryption
standard, using keys of 128 to 256 bits. However, most DES systems have not been
upgraded.
1.2 ADVANCED ENCRYPTION STANDARD-RIJNDAEL CIPHER
AES stands for Advanced Encryption Standard. AES is a symmetric key encryption
algorithm that replaces the commonly used Data Encryption Standard (DES). AES
provides strong encryption and was selected by NIST as a Federal Information
Processing Standard in November 2001 (FIPS-197). The AES algorithm supports three key
sizes: 128, 192 or 256 bits. Each key size causes the algorithm to behave slightly
differently: a larger key not only offers more bits with which to scramble the data, but
also increases the number of rounds and so the complexity of the cipher. AES was
developed by two Belgian cryptologists, Vincent Rijmen and Joan Daemen.
BLOCK DIAGRAM:
[Figure placeholder. Blocks: Encryption/decryption, Key schedule generation. Signals:
128-bit plaintext/ciphertext input, 128-bit output, 128-bit round key, secret key (128,
192 or 256 bits), Clk2.] [1]
FIGURE 1.1 BLOCK DIAGRAM OF AES ALGORITHM
AES is an algorithm for performing encryption (and the reverse, decryption): a
series of well-defined steps that can be followed as a procedure. The original
information is known as plain text, and the encrypted form as cipher text.
The AES algorithm is a symmetric block cipher that can encrypt (encipher) and
decrypt (decipher) information. Encryption converts data to an unintelligible form called
cipher text; decrypting the cipher text converts the data back to its original form, called
plain text. AES can be used to protect electronic data.
The cipher text message contains all the information of the plaintext message,
but is not in a format readable by a human or computer without the proper mechanism
to decrypt it.
The encrypting procedure is varied depending on the key which changes the
detailed operation of the algorithm. Without the key, the cipher cannot be used to
encrypt or decrypt.
1.3 ENCRYPTION:
Encryption is the transformation of plain text into cipher text through a
mathematical process.
[Figure placeholder: the plain text and the cipher key enter the ENCRYPTION block, which
outputs the cipher text.]
FIGURE 1.2 BLOCK DIAGRAM OF ENCRYPTION PROCESS
1.4 DECRYPTION:
Decryption is the process of converting cipher text back into plain text.
[Figure placeholder: the cipher text and the cipher key enter the DECRYPTION block, which
outputs the plain text.]
FIGURE 1.3 BLOCK DIAGRAM OF DECRYPTION PROCESS
1.5 APPLICATION OF CRYPTOGRAPHY:
• Cryptography has helped ensure secrecy in important communications, such as
those of government covert operations.
• It is helpful in wireless security, such as military communication and mobile
telephony, where there is a greater emphasis on the speed of communication
(military leaders and diplomats).
• Cryptography has come into widespread use by many civilians who do not
have extraordinary needs for secrecy, although typically it is built transparently into
the infrastructure for computing and telecommunications.
CHAPTER 2
THEORY
2.1 BACKGROUND OF THE ALGORITHM:
Many algorithms were originally presented by researchers from twelve different
nations. Fifteen (15) algorithms were selected from the first set of submittals. After a
study and selection process, five (5) were chosen as finalists. The five algorithms
selected were MARS, RC6, RIJNDAEL, SERPENT and TWOFISH. The conclusion was
that the five competitors showed similar characteristics. On October 2nd 2000, NIST
announced that the Rijndael algorithm was the winner of the contest. The Rijndael
algorithm was chosen because it had the best overall scores in security, performance,
efficiency, implementation ability and flexibility [NIS00b]. The Rijndael algorithm was
developed by Joan Daemen of Proton World International and Vincent Rijmen of
Katholieke Universiteit Leuven.
The Rijndael algorithm is a symmetric block cipher that can process data blocks
of 128 bits through the use of cipher keys with lengths of 128, 192, and 256 bits. The
Rijndael algorithm was also designed to handle additional block sizes and key lengths.
However, the additional features were not adopted in the AES. A hardware
implementation of the Rijndael algorithm can provide either high performance or low
cost for specific applications. On backbone communication channels or heavily loaded
servers, the loss of processing speed incurred by running cryptographic algorithms in
software is not acceptable, because it lowers the efficiency of the overall system. At
the other extreme, a low-cost, small design can be used in smart card applications,
which allows a wide range of equipment to operate securely.
2.1.1 AES OVERVIEW:
1976-2000: The Data Encryption Standard (DES) is considered the standard for
block ciphers by NIST.
1997-2001: With DES becoming outdated, NIST announces a competition to
design a successor.
2001: Rijndael, designed by Joan Daemen and Vincent Rijmen, is selected by
NIST as the AES.
2.1.2 PROPERTIES OF AES:
Based on finite field mathematics; widely analysed and considered secure,
Used for US government top secret data,
Supports 128-, 192- and 256-bit keys,
Unpatented; expected to be the standard for 20+ years.
2.2 TYPES OF CIPHERS:
There are two classes of encryption algorithm: asymmetric key and symmetric key.
The following subsections describe both classes, together with a brief discussion of
what an algorithm is.
2.2.1 ALGORITHM:
An algorithm is a process for completing a task. An encryption algorithm is a
mathematical process (a mathematical formula) for encrypting and decrypting messages. It
typically has two elements: data (for example, the plain text or email message that you
want to encrypt or decrypt) and a key.
2.2.2 SYMMETRIC ENCRYPTION:
Symmetric encryption uses a secret key value to encrypt and decrypt
data. Both the sender and the receiver need the same key to encrypt or decrypt. There are
two types of symmetric algorithm: stream algorithms and block algorithms. A stream
algorithm works on one bit or byte at a time, whereas a block algorithm works on
larger blocks of data (typically 64 bits). The drawback of this type of system is that if
the key is discovered, all the messages can be decrypted. The symmetric key is the key
that is used for both encrypting and decrypting a file or a message.
Examples of symmetric encryption are DES/3DES, AES, IDEA, RC6 and Blowfish.
2.2.3 SYMMETRIC KEY OR PRIVATE KEY:
In a symmetric or private key algorithm, the communication ordinarily uses only
one key. User A sends the secret private key Kc to user B before the communication
between them starts. Both sides use the same private key to encrypt and decrypt the
exchanged information. The Data Encryption Standard (DES) and CAST-128 are examples of
symmetric algorithms.
A symmetric algorithm is much faster than an asymmetric key algorithm,
which needs a bigger key and more complex computation. To encrypt a large amount of
data, a symmetric key algorithm is used with one secret key. A public key algorithm is
then used to encrypt and transmit the symmetric key. At the recipient, the symmetric key
is decrypted. After that, all communication uses the symmetric algorithm. Two classes of
private key cryptography scheme are commonly distinguished: block ciphers and stream
ciphers.
[Figure placeholder: User A encrypts the plain text with private key Kc; the cipher text
crosses an unsecured channel; User B decrypts it with the same private key Kc to recover
the plain text.]
Figure 2.1 Private Key Cryptography
The term private key is also used for the secret key of a public-private key
cryptography system (that is, in asymmetric cryptography). The private key is normally
known only to the key owner. Messages are encrypted using a public key and can be
decrypted only by the owner of the corresponding private key. For digital signatures,
however, a document is signed with a private key and authenticated with the
corresponding public key. A private key should not be distributed.
2.2.4 ASYMMETRIC ENCRYPTION:
Asymmetric encryption (an asymmetric cipher) uses separate keys for encryption and
decryption, and the two keys are not trivially related. The decryption key is very hard
or even impossible to derive from the encryption key. The encryption key is public, so
that anyone can encrypt a message. The decryption key, however, is private, so that only
the receiver is able to decrypt the message. It is common to set up a pair of keys within
a network so that each user has a public key and a private key. The public key is made
available to everyone so that they can send messages, but the private key is only made
available to the person it belongs to.
Examples of asymmetric encryption are RSA, ELGAMAL.
2.2.5 ASYMMETRIC KEY OR PUBLIC KEY:
In an asymmetric key algorithm there are two keys. One is public and is used to
encrypt the data. The other is private and is used to decrypt the information. In a
communication between A and B, A uses the public key Ke of B to encrypt the message, in
such a way that only B (not even A) can decrypt it, using his private key Kd. The system
is also used to sign messages digitally. Rivest-Shamir-Adleman (RSA) is a widely used
asymmetric key algorithm. Elliptic curve cryptography (ECC) is an alternative to RSA
that offers high security with a small key length.
[Figure placeholder: User A encrypts the plain text with the public key Ke; the cipher
text crosses an unsecured channel; User B decrypts it with the private key Kd to recover
the plain text.]
Figure 2.2 Public Key Cryptography
A public key is the public half of a public-private key cryptography system and is
used in asymmetric cryptography. Public keys enable someone to encrypt messages intended
for the owner of the public key. Public keys are meant for distribution, so anyone who
wants to send an encrypted message to the owner of the public key can do so, but only the
owner of the corresponding private key can decrypt the message. This is cryptography
based on methods involving a public key and a private key.
2.2.6 CIPHER TEXT:
This is the encrypted message produced by applying the algorithm to the
plaintext message using the secret key.
2.2.7 BLOCK CIPHER:
A block cipher is a type of symmetric-key encryption algorithm that transforms
a fixed-length block of plaintext data into a block of cipher text data of the same length.
This transformation takes place under the action of a user-provided secret key.
Decryption is performed by applying the reverse transformation to the cipher text block
using the same secret key. The fixed length is called the block size. For many block
ciphers the block size is 64 bits; block sizes are increasing to 128, 192 or 256 bits as
processors become more sophisticated. The figure below illustrates the block cipher
transformation. Ciphers such as DES, Triple-DES and Blowfish are examples of block
ciphers.
[Figure placeholder: block cipher encryption takes the plaintext and the key and outputs
the ciphertext; block cipher decryption takes the ciphertext and the key and outputs the
plaintext.]
FIGURE 2.3 BLOCK CIPHER
Since different plaintext blocks are mapped to different cipher text blocks (to
allow unique decryption), a block cipher effectively provides a permutation of the set of
all possible messages. The permutation effected during any particular encryption is a
secret, since it is a function of the secret key.
2.3 THE AES CIPHER:
The block length is fixed at 128 bits.
The key size can be independently specified as 128, 192 or 256 bits.

                                   AES-128    AES-192    AES-256
Key size (words/bytes/bits)        4/16/128   6/24/192   8/32/256
Number of rounds                   10         12         14
Expanded key size (words/bytes)    44/176     52/208     60/240

TABLE 2.1: KEY SIZE, NUMBER OF ROUNDS AND EXPANDED KEY SIZE
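The rows of Table 2.1 follow from the key length alone. The sketch below is illustrative, not part of the standard's text: the function name `aes_parameters` and the dictionary keys are hypothetical, and it assumes the relations rounds = key words + 6 and expanded key = 4 x (rounds + 1) words, which reproduce the table.

```python
def aes_parameters(key_bits):
    """Derive the Table 2.1 values from the key length in bits (illustrative helper)."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES keys must be 128, 192 or 256 bits long")
    state_words = 4                      # the AES block is always four 32-bit words
    key_words = key_bits // 32           # key length in 32-bit words
    rounds = key_words + 6               # assumed relation; matches 10, 12, 14
    expanded_words = state_words * (rounds + 1)  # one round key per round plus the initial one
    return {
        "key_words": key_words,
        "key_bytes": key_bits // 8,
        "rounds": rounds,
        "expanded_key_words": expanded_words,
        "expanded_key_bytes": 4 * expanded_words,
    }


for bits in (128, 192, 256):
    print(bits, aes_parameters(bits))
```

Each round consumes one 128-bit round key, which is why the expanded key grows with the number of rounds.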
2.4 NOTATION AND CONVENTIONS:
2.4.1. INPUTS AND OUTPUTS:
The input and output for the AES algorithm each consist of sequences of 128 bits.
These sequences are referred to as blocks, and the number of bits they contain is
referred to as their length. The Cipher Key for the AES algorithm is a sequence of
128, 192 or 256 bits. Other input, output and Cipher Key lengths are not permitted by
this standard. The bits within such sequences are numbered starting at zero and ending
at one less than the sequence length, which is also termed the block length or key
length. The number i attached to a bit is known as its index and will be in one of the
ranges 0 ≤ i < 128, 0 ≤ i < 192 or 0 ≤ i < 256, depending on the block length or key
length specified.
2.4.2. BYTES:
The basic unit of processing in the AES algorithm is a byte, which is a sequence
of eight bits treated as a single entity. The input, output and Cipher Key bit sequences
described in Section 2.4.1 are processed as arrays of bytes that are formed by dividing
these sequences into groups of eight contiguous bits. For an input, output or Cipher Key
denoted by a, the bytes in the resulting array are referenced using one of the two forms,
$a_n$ or a[n], where n will be in a range that depends on the key length. For a key length
of 128 bits, n lies in the range 0 ≤ n < 16. For a key length of 192 bits, n lies in the
range 0 ≤ n < 24. For a key length of 256 bits, n lies in the range 0 ≤ n < 32.
All byte values in the AES algorithm are presented as the concatenation of the
individual bit values (0 or 1) between braces in the order {b7, b6, b5, b4, b3, b2, b1,
b0}. These bytes are interpreted as finite field elements using a polynomial
representation
$$b_7x^7 + b_6x^6 + b_5x^5 + b_4x^4 + b_3x^3 + b_2x^2 + b_1x + b_0 = \sum_{i=0}^{7} b_i x^i$$
For example, {01100011} identifies the specific finite field element $x^6 + x^5 + x + 1$.
It is also convenient to denote byte values using hexadecimal notation, with each of two
groups of four bits being denoted by a single hexadecimal character. The hexadecimal
notation scheme is depicted in Table 2.2.
[1]
TABLE 2.2 HEXADECIMAL REPRESENTATION OF BIT PATTERNS
Hence the element {01100011} can be represented as {63}, where the character
denoting the four-bit group containing the higher numbered bits is again to the left.
Some finite field operations involve one additional bit {b8} to the left of an 8-bit byte.
When the b8 bit is present, it appears as {01} immediately preceding the 8-bit byte. For
example, a 9-bit sequence is presented as {01} {1b}.
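The three notations for a byte (bit string, hexadecimal, polynomial) can be converted mechanically. The following is a minimal sketch; the helper name `byte_to_polynomial` is hypothetical and the output format is only one possible way of printing the polynomial.

```python
def byte_to_polynomial(value):
    """Return the polynomial form of a byte, highest power first."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a value in the range 0..255")
    terms = []
    for i in range(7, -1, -1):          # walk from bit b7 down to bit b0
        if (value >> i) & 1:
            if i == 0:
                terms.append("1")
            elif i == 1:
                terms.append("x")
            else:
                terms.append(f"x^{i}")
    return " + ".join(terms) if terms else "0"


value = 0b01100011
print(format(value, "08b"))             # bit string {b7 ... b0}
print(format(value, "02x"))             # hexadecimal form
print(byte_to_polynomial(value))        # polynomial form
```

For {01100011} this prints the bit string, the hexadecimal form 63 and the polynomial x^6 + x^5 + x + 1, matching the example in the text.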
2.4.3. ARRAYS OF BYTES:
Arrays of bytes are represented in the form $a_0 a_1 a_2 \cdots a_{15}$. The bytes and
the bit ordering within bytes are derived from the 128-bit input sequence
$input_0\, input_1\, input_2 \cdots input_{126}\, input_{127}$ as
$$a_0 = \{input_0, input_1, \ldots, input_7\}, \quad a_1 = \{input_8, input_9, \ldots, input_{15}\}$$
with the pattern continuing up to
$$a_{15} = \{input_{120}, input_{121}, \ldots, input_{127}\}.$$
The pattern can be extended to longer sequences associated with 192 and 256
bit keys. In general,
$$a_n = \{input_{8n}, input_{8n+1}, \ldots, input_{8n+7}\}.$$
An example of byte designation and numbering within bytes for a given input sequence
is presented in Figure 2.4.
[1]
Figure 2.4: Indices for Bytes and Bits
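The grouping of an input bit sequence into bytes can be sketched as follows. This is an illustrative helper (the name `bits_to_bytes` is hypothetical); it assumes the bits arrive as a list of 0/1 integers and that the first bit of each group of eight is the most significant bit b7, as in the byte notation of Section 2.4.2.

```python
def bits_to_bytes(bits):
    """Group a bit sequence into bytes: byte n takes bits 8n .. 8n+7, first bit as b7."""
    if len(bits) % 8 != 0:
        raise ValueError("the bit sequence length must be a multiple of 8")
    result = []
    for n in range(len(bits) // 8):
        value = 0
        for bit in bits[8 * n: 8 * n + 8]:
            value = (value << 1) | bit   # earlier bits end up in higher positions
        result.append(value)
    return result


# A 128-bit input whose first eight bits are 01100011 and the rest zero.
sequence = [0, 1, 1, 0, 0, 0, 1, 1] + [0] * 120
array = bits_to_bytes(sequence)
print(len(array), format(array[0], "02x"))
```

The same function handles 192-bit and 256-bit key sequences, giving 24 and 32 bytes.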
2.4.4. THE STATE:
Internally, the AES algorithm's operations are performed on a two-dimensional
array of bytes called the State. The State consists of four rows of bytes. Each row of
the State contains Nb bytes, where Nb is the block length divided by 32. In the
State array, which is denoted by the symbol S, each individual byte has two indices.
The first byte index is the row number r, which lies in the range 0 ≤ r ≤ 3, and the second
byte index is the column number c, which lies in the range 0 ≤ c ≤ Nb−1. Such indexing
allows an individual byte of the State to be referred to as $S_{r,c}$ or S[r,c]. For the
AES, Nb = 4, which means that 0 ≤ c ≤ 3. At the beginning of the Encryption and
Decryption the input, which is the array of bytes symbolized by
$in_0 in_1 \cdots in_{15}$, is copied into the State array. This activity is illustrated
in Figure 2.5. The Encryption or Decryption operations are conducted on the State array.
After manipulation of the State array has completed, its final value is copied to the
output, which is an array of bytes symbolized by $out_0 out_1 \cdots out_{15}$.
[Figure placeholder: input bytes, State array, output bytes.]
[1]
FIGURE 2.5: STATE ARRAY INPUT AND OUTPUT
At the start of the Encryption or Decryption the input array is copied to the State array
with
S[r, c] = in[r + 4c]
where 0 ≤ r ≤ 3 and 0 ≤ c ≤ Nb−1. At the end of the Encryption and Decryption the State is
copied to the output array with
out[r + 4c] = S[r, c]
where 0 ≤ r ≤ 3 and 0 ≤ c ≤ Nb−1.
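The two copy rules can be sketched directly. This is a minimal illustration with Nb = 4; the function names `bytes_to_state` and `state_to_bytes` are hypothetical, and the State is modelled as a list of four rows.

```python
def bytes_to_state(block):
    """Copy a 16-byte input into the State using S[r][c] = in[r + 4c]."""
    if len(block) != 16:
        raise ValueError("an AES block is exactly 16 bytes")
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def state_to_bytes(state):
    """Copy the State to the output using out[r + 4c] = S[r][c]."""
    # Iterating columns in the outer loop visits index r + 4c in increasing order.
    return bytes(state[r][c] for c in range(4) for r in range(4))


block = bytes(range(16))
state = bytes_to_state(block)
for row in state:
    print(row)
```

The input bytes fill the State column by column, so the first row of the State holds input bytes 0, 4, 8 and 12.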
2.4.5. THE STATE AS AN ARRAY OF COLUMNS:
The four bytes in each column of the State form 32-bit words, where the row
number r provides an index for the four bytes within each word. Therefore, the State
can be interpreted as a one-dimensional array of 32-bit words, which is symbolized by
$w_0 \ldots w_3$. The column number c provides an index into this linear State array.
Considering the State depicted in Figure 2.5, the State can be considered as an array of
four words where
$$w_0 = S_{0,0}\, S_{1,0}\, S_{2,0}\, S_{3,0},$$
$$w_1 = S_{0,1}\, S_{1,1}\, S_{2,1}\, S_{3,1},$$
$$w_2 = S_{0,2}\, S_{1,2}\, S_{2,2}\, S_{3,2},$$
$$w_3 = S_{0,3}\, S_{1,3}\, S_{2,3}\, S_{3,3}.$$
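Reading the State as four column words can be sketched as below. The helper names are hypothetical. Packing a column into a single integer with the row 0 byte as the most significant byte is an assumption made here for illustration; the text only fixes the order of the bytes within the word.

```python
def state_columns(state):
    """Return the four column words w0..w3, each as a list of four bytes."""
    return [[state[r][c] for r in range(4)] for c in range(4)]


def column_as_int(column):
    """Pack a column into one 32-bit integer (row 0 byte first; an assumed convention)."""
    return int.from_bytes(bytes(column), "big")


# State whose byte at row r, column c equals r + 4c (input bytes 0..15).
state = [[r + 4 * c for c in range(4)] for r in range(4)]
words = state_columns(state)
print(words)
print([format(column_as_int(w), "08x") for w in words])
```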
2.4.6 MATHEMATICAL BACKGROUND:
Every byte in the AES algorithm is interpreted as a finite field element using the
notation introduced in Section 2.4.2. All finite field elements can be added and
multiplied. However, these operations differ from those used for ordinary numbers, and
their use requires some explanation.
2.4.6(A) ADDITION:
The addition of two elements in a finite field is achieved by "adding" the
coefficients for the corresponding powers in the polynomials for the two elements. The
addition is performed through use of the XOR operation, which is denoted by the
operator symbol ⊕. Such addition is performed modulo 2. In modulo-2 addition
1 ⊕ 1 = 0,
1 ⊕ 0 = 1,
0 ⊕ 1 = 1
and
0 ⊕ 0 = 0.
Consequently, subtraction of polynomials is identical to addition of polynomials.
Alternatively, addition of finite field elements can be described as the modulo-2 addition
of corresponding bits in the byte. For two bytes $\{a_7a_6a_5a_4a_3a_2a_1a_0\}$
and $\{b_7b_6b_5b_4b_3b_2b_1b_0\}$, the sum is $\{c_7c_6c_5c_4c_3c_2c_1c_0\}$, where each
$c_i = a_i \oplus b_i$ and i indexes corresponding bits. For example, the following
expressions are equivalent to one another.
$(x^6 + x^4 + x^2 + x + 1) + (x^7 + x + 1) = x^7 + x^6 + x^4 + x^2$ (Polynomial notation)
{01010111} ⊕ {10000011} = {11010100} (Binary notation)
{57} ⊕ {83} = {d4} (Hexadecimal notation)
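Because addition is a bitwise XOR, it maps onto a single operator. A minimal sketch, with the hypothetical helper name `gf_add`, reproducing the example above:

```python
def gf_add(a, b):
    """Add two GF(2^8) elements: bitwise XOR of the two bytes."""
    return a ^ b


total = gf_add(0x57, 0x83)
print(format(0x57, "08b"), "xor", format(0x83, "08b"), "=", format(total, "08b"))
print(format(total, "02x"))

# Subtraction is the same operation, so adding {83} again recovers {57}.
print(format(gf_add(total, 0x83), "02x"))
```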
2.4.6(B) MULTIPLICATION:
In the polynomial representation, multiplication in the Galois Field $GF(2^8)$ (denoted
by •) corresponds with the multiplication of polynomials modulo an irreducible polynomial
of degree 8. A polynomial is irreducible if its only divisors are one and itself. For the
AES algorithm, this irreducible polynomial is
$$m(x) = x^8 + x^4 + x^3 + x + 1$$
For example, {57} • {83} = {c1} because
$$(x^6 + x^4 + x^2 + x + 1)(x^7 + x + 1) = x^{13} + x^{11} + x^9 + x^8 + x^7 + x^7 + x^5 + x^3 + x^2 + x + x^6 + x^4 + x^2 + x + 1$$
$$= x^{13} + x^{11} + x^9 + x^8 + x^6 + x^5 + x^4 + x^3 + 1$$
and
$$x^{13} + x^{11} + x^9 + x^8 + x^6 + x^5 + x^4 + x^3 + 1 \bmod (x^8 + x^4 + x^3 + x + 1) = x^7 + x^6 + 1.$$
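The two stages of this example (expand the product, then reduce modulo m(x)) can be sketched with integers standing in for binary polynomials, bit i being the coefficient of x^i. The function names `poly_mul` and `poly_mod` are hypothetical, and 0x11B is the bit pattern of m(x).

```python
def poly_mul(a, b):
    """Multiply two binary polynomials without reduction (carry-less product)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add the current shifted copy of a
        a <<= 1                  # multiply a by x
        b >>= 1
    return result


def poly_mod(value, modulus=0x11B):
    """Reduce a binary polynomial modulo m(x) = x^8 + x^4 + x^3 + x + 1."""
    modulus_degree = modulus.bit_length() - 1
    while value.bit_length() - 1 >= modulus_degree:
        shift = value.bit_length() - 1 - modulus_degree
        value ^= modulus << shift    # cancel the current leading term
    return value


product = poly_mul(0x57, 0x83)
print(format(product, "014b"))       # coefficients of x^13 .. x^0
print(format(poly_mod(product), "02x"))
```

The unreduced product has its bits set at positions 13, 11, 9, 8, 6, 5, 4, 3 and 0, matching the expanded polynomial, and the reduction gives {c1}.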
The modular reduction by m(x) ensures that the result will be a binary polynomial
of degree less than 8, which can be represented by a byte. Unlike addition, there is no
simple operation at the byte level that corresponds to this multiplication. The
multiplication defined above is associative and the element {01} is the multiplicative
identity. For any non-zero binary polynomial b(x) of degree less than 8, the multiplicative
inverse of b(x), denoted $b^{-1}(x)$, can be found. The inverse is found through use of the
extended Euclidean algorithm to compute polynomials a(x) and c(x) such that
$$b(x)a(x) + m(x)c(x) = 1.$$
Hence, $a(x) \bullet b(x) \bmod m(x) = 1$, which means
$$b^{-1}(x) = a(x) \bmod m(x)$$
Moreover, for any a(x), b(x) and c(x) in the field, it holds that
$$a(x) \bullet (b(x) + c(x)) = a(x) \bullet b(x) + a(x) \bullet c(x)$$
It follows that the set of 256 possible byte values, with XOR used as addition and
multiplication defined as above, has the structure of the finite field $GF(2^8)$.
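The existence of inverses can be checked numerically. The sketch below does not implement the extended Euclidean algorithm named in the text; it swaps in an exhaustive search over the 255 non-zero bytes, which is simpler to follow and adequate for a field this small. The names `gf_mul` and `gf_inverse` are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo m(x) = x^8 + x^4 + x^3 + x + 1."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:            # degree reached 8: subtract (XOR) m(x)
            a ^= 0x11B
    return product


def gf_inverse(b):
    """Find the multiplicative inverse by exhaustive search (not extended Euclid)."""
    if b == 0:
        raise ZeroDivisionError("{00} has no multiplicative inverse")
    for candidate in range(1, 256):
        if gf_mul(b, candidate) == 1:
            return candidate
    raise ArithmeticError("no inverse found; the modulus would not be irreducible")


inverse = gf_inverse(0x53)
print(format(inverse, "02x"), format(gf_mul(0x53, inverse), "02x"))
```

Running the search for every non-zero byte succeeds, which is consistent with the byte values forming a field under these two operations.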
2.4.6(C) MULTIPLICATION BY X:
Multiplying the binary polynomial b(x) defined in Section 2.4.2 by the polynomial x
results in
$$b_7x^8 + b_6x^7 + b_5x^6 + b_4x^5 + b_3x^4 + b_2x^3 + b_1x^2 + b_0x$$
The result x • b(x) is obtained by reducing the above result modulo m(x). If b7
equals zero, the result is already in reduced form. If b7 equals one, the reduction is
accomplished by subtracting the polynomial m(x). It follows that multiplication by x,
which is represented by {00000010} or {02}, can be implemented at the byte level as a
left shift and a subsequent conditional bitwise XOR with {1b}. This operation on bytes is
denoted by xtime(). Multiplication by higher powers of x can be implemented by
repeated application of xtime(). Through the addition of intermediate results,
multiplication by any constant can be implemented.
For example, {57} • {13} = {fe} because
{57} • {02} = xtime({57}) = {ae}
{57} • {04} = xtime({ae}) = {47}
{57} • {08} = xtime({47}) = {8e}
{57} • {10} = xtime({8e}) = {07},
Thus,
{57} • {13} = {57} • ({01} ⊕ {02} ⊕ {10})
= {57} ⊕ {ae} ⊕ {07}
= {fe}.
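The xtime() operation and the way it builds up multiplication by a constant can be sketched as follows. The name `xtime` comes from the text; `multiply_by_constant` is a hypothetical helper added for illustration.

```python
def xtime(value):
    """Multiply a byte by x ({02}): left shift, then XOR with {1b} if b7 was set."""
    shifted = (value << 1) & 0xFF
    if value & 0x80:
        shifted ^= 0x1B
    return shifted


def multiply_by_constant(value, constant):
    """Multiply by a constant by XOR-ing the xtime() multiples selected by its bits."""
    result = 0
    multiple = value                 # value * x^0
    while constant:
        if constant & 1:
            result ^= multiple
        multiple = xtime(multiple)   # next power of x
        constant >>= 1
    return result


step = 0x57
for power in ("02", "04", "08", "10"):
    step = xtime(step)
    print("{57} * {" + power + "} =", format(step, "02x"))
print("{57} * {13} =", format(multiply_by_constant(0x57, 0x13), "02x"))
```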
2.5 POLYNOMIALS WITH COEFFICIENTS IN GF(2^8):
Four-term polynomials can be defined with coefficients that are finite field
elements:
$$a(x) = a_3x^3 + a_2x^2 + a_1x + a_0$$
which will be denoted as a word in the form $[a_0, a_1, a_2, a_3]$. Note that the
polynomials in this section behave somewhat differently than the polynomials used in
the definition of finite field elements, even though both types of polynomials use the
same indeterminate, x. The coefficients in this section are themselves finite field
elements, i.e., bytes, instead of bits; also, the multiplication of four-term polynomials
uses a different reduction polynomial, defined below. To illustrate the addition and
multiplication operations, let
$$b(x) = b_3x^3 + b_2x^2 + b_1x + b_0$$
define a second four-term polynomial. Addition is performed by adding the finite field
coefficients of like powers of x. This addition corresponds to an XOR operation between
the corresponding bytes in each of the words, that is, the XOR of the complete
word values. Thus, using the two polynomials above,
$$a(x) + b(x) = (a_3 \oplus b_3)x^3 + (a_2 \oplus b_2)x^2 + (a_1 \oplus b_1)x + (a_0 \oplus b_0)$$
Multiplication is achieved in two steps. In the first step, the polynomial product
c(x) = a(x) • b(x) is algebraically expanded, and like powers are collected to give
$$c(x) = c_6x^6 + c_5x^5 + c_4x^4 + c_3x^3 + c_2x^2 + c_1x + c_0$$
where
$$c_0 = a_0 \bullet b_0$$
$$c_1 = a_1 \bullet b_0 \oplus a_0 \bullet b_1$$
$$c_2 = a_2 \bullet b_0 \oplus a_1 \bullet b_1 \oplus a_0 \bullet b_2$$
$$c_3 = a_3 \bullet b_0 \oplus a_2 \bullet b_1 \oplus a_1 \bullet b_2 \oplus a_0 \bullet b_3$$
$$c_4 = a_3 \bullet b_1 \oplus a_2 \bullet b_2 \oplus a_1 \bullet b_3$$
$$c_5 = a_3 \bullet b_2 \oplus a_2 \bullet b_3$$
$$c_6 = a_3 \bullet b_3$$
The result, c(x), does not represent a four-byte word. Therefore, the second step
of the multiplication is to reduce c(x) modulo a polynomial of degree 4, so that the
result is a polynomial of degree less than 4. For the AES algorithm, this is
accomplished with the polynomial $x^4 + 1$, so that
$$x^i \bmod (x^4 + 1) = x^{i \bmod 4}.$$
The modular product of a(x) and b(x), denoted by a(x) ⊗ b(x), is given by the four-
term polynomial d(x), defined as follows
$$d(x) = d_3x^3 + d_2x^2 + d_1x + d_0$$
with
$$d_0 = (a_0 \bullet b_0) \oplus (a_3 \bullet b_1) \oplus (a_2 \bullet b_2) \oplus (a_1 \bullet b_3)$$
$$d_1 = (a_1 \bullet b_0) \oplus (a_0 \bullet b_1) \oplus (a_3 \bullet b_2) \oplus (a_2 \bullet b_3)$$
$$d_2 = (a_2 \bullet b_0) \oplus (a_1 \bullet b_1) \oplus (a_0 \bullet b_2) \oplus (a_3 \bullet b_3)$$
$$d_3 = (a_3 \bullet b_0) \oplus (a_2 \bullet b_1) \oplus (a_1 \bullet b_2) \oplus (a_0 \bullet b_3)$$
When a(x) is a fixed polynomial, the operation defined above can be written in
matrix form as
$$\begin{pmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \end{pmatrix} = \begin{pmatrix} a_0 & a_3 & a_2 & a_1 \\ a_1 & a_0 & a_3 & a_2 \\ a_2 & a_1 & a_0 & a_3 \\ a_3 & a_2 & a_1 & a_0 \end{pmatrix} \begin{pmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{pmatrix}$$
Because $x^4 + 1$ is not an irreducible polynomial over $GF(2^8)$, multiplication by a
fixed four-term polynomial is not necessarily invertible. However, the AES algorithm
specifies a fixed four-term polynomial that does have an inverse:
$$a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}$$
$$a^{-1}(x) = \{0b\}x^3 + \{0d\}x^2 + \{09\}x + \{0e\}$$
Another polynomial used in the AES algorithm has $a_0 = a_1 = a_2 = \{00\}$ and
$a_3 = \{01\}$, which is the polynomial $x^3$. Inspection of the matrix equation above
will show that its effect is to form the output word by rotating bytes in the input word.
This means that $[b_0, b_1, b_2, b_3]$ is transformed into $[b_1, b_2, b_3, b_0]$.
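The modular product of four-term polynomials can be sketched compactly, because reducing modulo x^4 + 1 simply folds the power i + j onto (i + j) mod 4. The function names `gf_mul` and `word_mul` are hypothetical; words are lists in the order [a0, a1, a2, a3] used in the text.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return product


def word_mul(a, b):
    """Multiply two four-term polynomials modulo x^4 + 1."""
    d = [0, 0, 0, 0]
    for i in range(4):
        for j in range(4):
            # x^i * x^j = x^(i+j), and x^(i+j) mod (x^4 + 1) = x^((i+j) mod 4)
            d[(i + j) % 4] ^= gf_mul(a[i], b[j])
    return d


a = [0x02, 0x01, 0x01, 0x03]          # a(x) = {03}x^3 + {01}x^2 + {01}x + {02}
a_inv = [0x0E, 0x09, 0x0D, 0x0B]      # a^-1(x) as given in the text
print(word_mul(a, a_inv))             # the identity word [1, 0, 0, 0]

x_cubed = [0x00, 0x00, 0x00, 0x01]    # the polynomial x^3
print(word_mul(x_cubed, [0x10, 0x20, 0x30, 0x40]))
```

The first product confirms that the two fixed polynomials are inverses of each other, and the second shows the byte rotation produced by multiplying by x^3.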
2.6. ENCRYPTION PROCESS:
The block diagram in Figure 2.6 is generic for AES specifications. It consists of a
number of different transformations applied consecutively over the data block bits, in a
fixed number of iterations, called rounds. The number of rounds depends on the length of
the key used for the encryption process.
A 128-bit input or output block of AES is mapped to an AES state by putting
the first byte of the block in the upper left corner of the matrix and by filling in the
remaining bytes column by column. A round consists of a fixed sequence of
transformations: SubBytes, ShiftRows, MixColumns and AddRoundKey.
[Figure placeholder: the plain text first passes through AddRoundKey with a round key
(1st round); SubBytes, ShiftRows, MixColumns and AddRoundKey, each round with its own
round key, are then repeated for Nr-1 rounds; the last round applies SubBytes, ShiftRows
and AddRoundKey with the final round key.]
FIGURE 2.6: BLOCK DIAGRAM OF ENCRYPTION
Except for the first round and the last round, the rounds are identical and consist of
these four transformations. The four transformations are invertible, hence the round
itself is invertible.
[Figure placeholder: a data block passes in sequence through SUBBYTES, SHIFTROWS,
MIXCOLUMNS and ADDROUNDKEY (which also takes the key) to produce the next data block.]
FIGURE 2.7: STRUCTURE OF ONE ROUND
2.6.1. BYTES SUBSTITUTION TRANSFORMATION:
The byte substitution transformation SubBytes(State) is a non-linear substitution
of bytes that operates independently on each byte of the State using a substitution table
(S-box) presented in Figure 2.10. This S-box, which is invertible, is constructed by
composing two transformations:
1. Take the multiplicative inverse in the finite field $GF(2^8)$, described in Section
2.4.6(B). The element {00} is mapped to itself.
2. Apply the following affine transformation (over GF(2)):
$$b'_i = b_i \oplus b_{(i+4) \bmod 8} \oplus b_{(i+5) \bmod 8} \oplus b_{(i+6) \bmod 8} \oplus b_{(i+7) \bmod 8} \oplus c_i$$
for 0 ≤ i < 8, where $b_i$ is the ith bit of the byte, and $c_i$ is the ith bit of a byte
c with the value {63} or {01100011}. Here and elsewhere, a prime on a variable (e.g., b′)
indicates that the variable is to be updated with the value on the right. In matrix form,
the affine transformation element of the S-box can be expressed as
$$\begin{pmatrix} b'_0 \\ b'_1 \\ b'_2 \\ b'_3 \\ b'_4 \\ b'_5 \\ b'_6 \\ b'_7 \end{pmatrix} = \begin{pmatrix} 1&0&0&0&1&1&1&1 \\ 1&1&0&0&0&1&1&1 \\ 1&1&1&0&0&0&1&1 \\ 1&1&1&1&0&0&0&1 \\ 1&1&1&1&1&0&0&0 \\ 0&1&1&1&1&1&0&0 \\ 0&0&1&1&1&1&1&0 \\ 0&0&0&1&1&1&1&1 \end{pmatrix} \begin{pmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \\ b_6 \\ b_7 \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \end{pmatrix}$$
FIGURE 2.8 MATRIX NOTATION OF S-BOX
[1]
FIGURE 2.9. APPLICATION OF S-BOX TO THE EACH BYTE OF THE STATE
The S-box used in the SubBytes transformation is presented in hexadecimal
form in Figure 2.10. For example, if $S_{1,1}$ = {53}, then the substitution value would
be determined by the intersection of the row with index '5' and the column with index '3'
in Figure 2.10. This would result in $S'_{1,1}$ having a value of {ed}.
[1]
FIGURE 2.10. S-BOX VALUES FOR ALL 256 COMBINATIONS IN HEXADECIMAL FORMAT
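The S-box entries can be regenerated from the two construction steps rather than read from the table. The sketch below is illustrative: the function names are hypothetical, and the multiplicative inverse is found by exhaustive search rather than by the extended Euclidean algorithm.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return product


def gf_inverse_or_zero(b):
    """Step 1: multiplicative inverse, with {00} mapped to itself."""
    for candidate in range(1, 256):
        if gf_mul(b, candidate) == 1:
            return candidate
    return 0                            # only reached for b == 0


def affine(b, c=0x63):
    """Step 2: b'_i = b_i ^ b_(i+4) ^ b_(i+5) ^ b_(i+6) ^ b_(i+7) ^ c_i (indices mod 8)."""
    result = 0
    for i in range(8):
        bit = (b >> i) & 1
        for offset in (4, 5, 6, 7):
            bit ^= (b >> ((i + offset) % 8)) & 1
        bit ^= (c >> i) & 1
        result |= bit << i
    return result


def sbox(b):
    return affine(gf_inverse_or_zero(b))


print(format(sbox(0x53), "02x"))        # row '5', column '3' of the table
table = [sbox(b) for b in range(256)]
print(len(set(table)))                  # 256 distinct outputs: the S-box is invertible
```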
2.6.2. SHIFT ROWS TRANSFORMATION:
In the ShiftRows transformation ShiftRows(), the bytes in the last three rows of
the State are cyclically shifted over different numbers of bytes (offsets). The first row,
r = 0, is not shifted. Specifically, the ShiftRows() transformation proceeds as follows
$$S'_{r,c} = S_{r,\,(c + shift(r, Nb)) \bmod Nb} \quad \text{for } 0 < r < 4 \text{ and } 0 \le c < Nb,$$
where the shift value shift(r, Nb) depends on the row number, r, as follows (Nb = 4):
shift(1,4) = 1; shift(2,4) = 2; shift(3,4) = 3.
This has the effect of moving bytes to "lower" positions in the row (i.e., lower
values of c in a given row), while the "lowest" bytes wrap around into the "top" of the row
(i.e., higher values of c in a given row). Figure 2.11 illustrates the ShiftRows()
transformation.
[1]
Figure 2.11. Cyclic Shift of the Last Three Rows of the State
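A minimal sketch of the cyclic shift, assuming Nb = 4 so that shift(r, 4) = r. The function name `shift_rows` is hypothetical and the State is modelled as a list of four rows; the sample values are chosen only so that each byte shows its original row and column.

```python
def shift_rows(state):
    """Return a new State with row r rotated left by r positions (Nb = 4)."""
    nb = 4
    return [[state[r][(c + r) % nb] for c in range(nb)] for r in range(4)]


# Byte at row r, column c is written as 10*r + c to make the movement visible.
state = [[10 * r + c for c in range(4)] for r in range(4)]
shifted = shift_rows(state)
for before, after in zip(state, shifted):
    print(before, "->", after)
```

Row 0 is unchanged, while in row 1 the byte from column 1 moves to column 0 and the byte from column 0 wraps around to column 3.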
2.6.3. MIXING OF COLUMNS TRANSFORMATION:
This transformation is based on Galois Field multiplication. Each byte of a
column is replaced with another value that is a function of all four bytes in the given
column. The MixColumns() transformation operates on the State column-by-column,
treating each column as a four-term polynomial as described in Section 2.5. The
columns are considered as polynomials over $GF(2^8)$ and multiplied modulo $x^4 + 1$
with a fixed polynomial a(x), given by
$$a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}.$$
As described in Section 2.5, this can be written as a matrix multiplication. Let
$$S'(x) = a(x) \otimes S(x)$$
$$\begin{pmatrix} S'_{0,c} \\ S'_{1,c} \\ S'_{2,c} \\ S'_{3,c} \end{pmatrix} = \begin{pmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{pmatrix} \begin{pmatrix} S_{0,c} \\ S_{1,c} \\ S_{2,c} \\ S_{3,c} \end{pmatrix} \quad \text{for } 0 \le c < Nb.$$
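As a result of this multiplication, the four bytes in a column are replaced by the following
$$S'_{0,c} = (\{02\} \bullet S_{0,c}) \oplus (\{03\} \bullet S_{1,c}) \oplus (\{01\} \bullet S_{2,c}) \oplus (\{01\} \bullet S_{3,c})$$
$$S'_{1,c} = (\{01\} \bullet S_{0,c}) \oplus (\{02\} \bullet S_{1,c}) \oplus (\{03\} \bullet S_{2,c}) \oplus (\{01\} \bullet S_{3,c})$$
$$S'_{2,c} = (\{01\} \bullet S_{0,c}) \oplus (\{01\} \bullet S_{1,c}) \oplus (\{02\} \bullet S_{2,c}) \oplus (\{03\} \bullet S_{3,c})$$
$$S'_{3,c} = (\{03\} \bullet S_{0,c}) \oplus (\{01\} \bullet S_{1,c}) \oplus (\{01\} \bullet S_{2,c}) \oplus (\{02\} \bullet S_{3,c})$$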
[1]
FIGURE 2.12. MIXING OF COLUMNS OF THE STATE
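The four column equations can be sketched using only the shift-and-XOR rule for multiplying by {02}, since {03} • v = ({02} • v) ⊕ v. The function names are hypothetical, and the sample column d4 bf 5d 30 is an illustrative input.

```python
def xtime(value):
    """Multiply a byte by {02} in GF(2^8)."""
    shifted = (value << 1) & 0xFF
    return shifted ^ 0x1B if value & 0x80 else shifted


def mul3(value):
    """Multiply a byte by {03}: ({02} * value) XOR value."""
    return xtime(value) ^ value


def mix_column(column):
    """Apply the MixColumns matrix to one column [s0, s1, s2, s3]."""
    s0, s1, s2, s3 = column
    return [
        xtime(s0) ^ mul3(s1) ^ s2 ^ s3,
        s0 ^ xtime(s1) ^ mul3(s2) ^ s3,
        s0 ^ s1 ^ xtime(s2) ^ mul3(s3),
        mul3(s0) ^ s1 ^ s2 ^ xtime(s3),
    ]


result = mix_column([0xD4, 0xBF, 0x5D, 0x30])
print([format(byte, "02x") for byte in result])
```

For this column the sketch gives 04 66 81 e5. Applying it to all four columns of the State gives the full MixColumns() transformation.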
Understanding of Calculations for MixColumns
The MixColumns calculation is worked through in detail below.
The MixColumns result is calculated using this formula [1]:
$$\begin{pmatrix} r_0 \\ r_1 \\ r_2 \\ r_3 \end{pmatrix} = \begin{pmatrix} 2 & 3 & 1 & 1 \\ 1 & 2 & 3 & 1 \\ 1 & 1 & 2 & 3 \\ 3 & 1 & 1 & 2 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{pmatrix}$$
where r0, r1, r2 and r3 are the results after the transformation. a0 to a3 are obtained
from the matrix after the data has undergone the substitution process in the S-boxes.
Let's take this example:
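$$\underbrace{\begin{pmatrix} d4 \\ bf \\ 5d \\ 30 \end{pmatrix}}_{a_0 \ldots a_3} \begin{pmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{pmatrix} = \underbrace{\begin{pmatrix} 04 \\ 66 \\ 81 \\ e5 \end{pmatrix}}_{r_0 \ldots r_3}$$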
In this example, a0 to a3 are equal to d4 to 30 and r0 to r3 are equal to 04 to e5.
Note that the calculation still follows the rules of matrix multiplication: row times
column. As written, however, the matrix sizes are
[4 x 1] . [4 x 4] ≠ [4 x 1]
which is not a valid product. To obtain a [4 x 1] result, the product must have the form
[4 x 4] . [4 x 1] = [4 x 1]
Therefore the matrices are switched over:
d4 04
02 03 01 01 bf 66
01 02 03 01 x =
5d 81
01 01 02 03
03 01 01 02 30 e5
To calculate the results, multiply the rows of the matrix with the column. First, take the
first row of the matrix and multiply its values with the values of a.
To get the r0 value, the formula is:
r0 = {02.d4} + {03.bf} + {01.5d} + {01.30}
The calculation is carried out one term at a time.
1. {02.d4}
First convert d4 to binary. d4 is a byte, so when using the Calculator, set it to
byte size under Hex mode.
d4 = 1101 0100
Here d4 is exactly 8 bits. Where the converted value is shorter than 8 bits,
such as 25 in Hex (converted: 100101), pad it with 0s at the front until it has 8
binary digits (25 ends up as 0010 0101).
The multiplication follows a rule stated in the book Cryptography and Network
Security [2]: multiplication of a value by x (i.e. by 02) can be implemented as a 1-bit
left shift followed by a conditional bitwise XOR with (0001 1011) if the leftmost bit of
the original value (before the shift) is 1. Now apply the rule in the calculation.
{d4}.{02} = 1101 0100 << 1 (<< is left shift, 1 is the number of shifts done, pad on with
0's)
= 1010 1000 XOR 0001 1011 (because the leftmost bit is 1 before the shift)
= 1011 0011 (ans)
Calculation:
1010 1000
0001 1011 (XOR)
1011 0011
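The multiply-by-02 rule can be sketched in a few lines of Python. This is only an illustration of the shift and conditional XOR described above; the function name xtime is an assumption borrowed from common AES usage and does not appear in the original design.

```python
def xtime(value: int) -> int:
    """Multiply one byte by {02} in GF(2^8): left shift, then conditional XOR."""
    shifted = (value << 1) & 0xFF   # 1-bit left shift, keeping only 8 bits
    if value & 0x80:                # leftmost bit was 1 before the shift
        shifted ^= 0x1B             # XOR with 0001 1011
    return shifted


print(format(xtime(0xD4), "08b"))  # 10110011, the value obtained for {02.d4}
```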
Now do the same for the next term, {03.bf}.
2. {03.bf}
Similarly, convert bf into binary:
bf = 1011 1111
In this case bf is multiplied by 03, so split 03 up in its binary form.
03 = 11
= 10 XOR 01
The result can now be calculated.
{03} . {bf} = {10 XOR 01} . {1011 1111}
= {1011 1111 . 10} XOR {1011 1111 . 01}
= {1011 1111 . 10} XOR {1011 1111}
(Because {1011 1111} x 1 [in decimal] = 1011 1111)
= 0111 1110 XOR 0001 1011 XOR 1011 1111
= 1101 1010 (ans)
{01.5d} and {01.30} simply multiply 5d and 30 by 1 (in decimal), which leaves the
original values unchanged. There is no need to calculate them using the above method,
but the values still have to be converted to binary form.
5d = 0101 1101
30 = 0011 0000
Now add those values together. As the values are in binary form, addition is done
using XOR.
r0 = {02.d4} + {03.bf} + {01.5d} + {01.30}

= 1011 0011 XOR 1101 1010 XOR 0101 1101 XOR 0011 0000
= 0000 0100
= 04 (in Hex)
Now for the next row.
r1 = {01.d4} + {02.bf} + {03.5d} + {01.30}
1. {02.bf}
{bf} . {02} = 1011 1111 << 1

= 0111 1110 XOR 0001 1011
= 0110 0101
2. {03.5d}
{5d} . {03} = {0101 1101. 02} XOR { 0101 1101}

= 1011 1010 XOR 0101 1101
= 1110 0111
Therefore,
r1 = {01.d4} + {02.bf} + {03.5d} + {01.30}

= 1101 0100 XOR 0110 0101 XOR 1110 0111 XOR 0011 0000
= 0110 0110
= 66 (in Hex)
This gives the second value, 66. Do the same for the remaining rows to obtain all the results.
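The worked example can be checked with a short Python sketch. It is a minimal illustration under the assumption that one column is held as a list of four bytes; the helper names (xtime, mul3, mix_single_column) are hypothetical and not taken from the original design.

```python
def xtime(value: int) -> int:
    """Multiply a byte by {02} in GF(2^8)."""
    shifted = (value << 1) & 0xFF
    return shifted ^ 0x1B if value & 0x80 else shifted


def mul3(value: int) -> int:
    """Multiply a byte by {03}: ({02} . value) XOR value."""
    return xtime(value) ^ value


def mix_single_column(a):
    """Apply the MixColumns matrix to one column [a0, a1, a2, a3]."""
    a0, a1, a2, a3 = a
    return [
        xtime(a0) ^ mul3(a1) ^ a2 ^ a3,   # row 02 03 01 01
        a0 ^ xtime(a1) ^ mul3(a2) ^ a3,   # row 01 02 03 01
        a0 ^ a1 ^ xtime(a2) ^ mul3(a3),   # row 01 01 02 03
        mul3(a0) ^ a1 ^ a2 ^ xtime(a3),   # row 03 01 01 02
    ]


column = [0xD4, 0xBF, 0x5D, 0x30]
print([format(b, "02x") for b in mix_single_column(column)])
# ['04', '66', '81', 'e5']
```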
2.6.4 Addition of Round Key Transformation
In the Addition of Round Key transformation AddRoundKey(), a Round Key is
added to the State by a simple bitwise XOR operation. Each Round Key consists of Nb
words from the key schedule generation (described in Section 2.6.5). Those Nb
words are each added into the columns of the State, such that
\[
[S'_{0,c}, S'_{1,c}, S'_{2,c}, S'_{3,c}] = [S_{0,c}, S_{1,c}, S_{2,c}, S_{3,c}] \oplus [w_{round \cdot Nb + c}]
\quad \text{for } 0 \le c < Nb,
\]
where [wi] are the key generation words described in chapter 3, and round is a value in
the range 0 ≤ round ≤ Nr. In the encryption, the initial Round Key addition occurs when
round = 0, prior to the first application of the round function. The application of the
AddRoundKey() transformation to the Nr rounds of the encryption occurs when
1 ≤ round ≤ Nr. The action of this transformation is illustrated in Figure 2.13, where
l = round * Nb. The byte addressing within words of the key schedule was described in
Section 1.2.1.
FIGURE 2.13. EXCLUSIVE-OR OPERATION OF STATE AND CIPHER KEY WORDS [1]
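The column-wise XOR can be sketched in Python as follows. This is an illustration only: it assumes the State is stored as state[r][c] (row r, column c) and the key schedule as a list of 4-byte words, and the byte values are sample data chosen for the example.

```python
def add_round_key(state, key_schedule, round_number, nb=4):
    """XOR column c of the State with key word w[round_number * nb + c]."""
    new_state = [row[:] for row in state]
    for c in range(nb):
        word = key_schedule[round_number * nb + c]
        for r in range(4):
            new_state[r][c] = state[r][c] ^ word[r]
    return new_state


# Sample data: four State columns and the first four key words, four bytes each.
columns = [[0x32, 0x43, 0xF6, 0xA8], [0x88, 0x5A, 0x30, 0x8D],
           [0x31, 0x31, 0x98, 0xA2], [0xE0, 0x37, 0x07, 0x34]]
key_schedule = [[0x2B, 0x7E, 0x15, 0x16], [0x28, 0xAE, 0xD2, 0xA6],
                [0xAB, 0xF7, 0x15, 0x88], [0x09, 0xCF, 0x4F, 0x3C]]
state = [[columns[c][r] for c in range(4)] for r in range(4)]

result = add_round_key(state, key_schedule, round_number=0)
print([format(result[r][0], "02x") for r in range(4)])  # first column: ['19', '3d', 'e3', 'be']
```

Because XOR is its own inverse, applying the same round key a second time restores the original State, which is why decryption can reuse this transformation unchanged.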
2.6.5 Key Schedule Generation:
Each round key is a 4-word (128-bit) array generated from the previous round key,
a constant that changes each round, and a series of S-Box (figure 6) lookups for
each 32-bit word of the key. The first round key is the same as the original user input.
Each word (w0 - w3) of the initial key is XORed with a constant that depends on the current
round, and with the result of the S-Box lookup for wi, to form the next round key. The
number of rounds required for the three different key lengths is presented in Table 2.3.
            Key length    Block size    Number of
            (Nk words)    (Nb words)    rounds (Nr)
AES-128         4             4             10
AES-192         6             4             12
AES-256         8             4             14
[1]
TABLE 2.3. KEY-BLOCK-ROUND COMBINATIONS
The Key schedule Expansion generates a total of Nb(Nr + 1) words: the

algorithm requires an initial set of Nb words, and each of the Nr rounds requires Nb
words of key data. The resulting key schedule consists of a linear array of 4-byte words,
denoted [wi], with i in the range 0 ≤ i < Nb(Nr + 1).
Prior to encryption or decryption the key must be expanded. The expanded key is
used in the add round key function. Each time the add round key function is called, a
different part of the expanded key is XORed against the state. For this to work,
the expanded key must be large enough to provide key material for every
time the add round key function is executed. The add round key function is called once
in each round, plus one extra time at the beginning of the algorithm.
Therefore the size of the expanded key will always be equal to:
16 * (number of rounds + 1) bytes
The 16 in the above formula is the size of the block in bytes. This provides key
material for every byte in the block during every round, plus one.
Since the key size is much smaller than the total size of the sub keys, the key is
"stretched out" to provide enough key space for the algorithm.
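The expanded key size (block size in bytes times the number of rounds plus one) can be checked with a small Python sketch. The names below are illustrative assumptions; the round counts are those of Table 2.3.

```python
BLOCK_BYTES = 16                       # block size in bytes (Nb = 4 words)
ROUNDS = {16: 10, 24: 12, 32: 14}      # key size in bytes -> number of rounds Nr


def expanded_key_bytes(key_bytes: int) -> int:
    """One block of key material per round, plus one for the initial round key."""
    return BLOCK_BYTES * (ROUNDS[key_bytes] + 1)


for key_bytes in (16, 24, 32):
    print(key_bytes, expanded_key_bytes(key_bytes))
# 16 176
# 24 208
# 32 240
```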
The key expansion routine executes a maximum of 4 consecutive functions. These
functions are:
ROT WORD
SUB WORD
RCON
An iteration of the above steps is called a round. The number of rounds of the key
expansion depends on the key size.
Key size   Block size   Expansion          Expanded bytes   Rounds     Rounds key   Expanded key
(bytes)    (bytes)      algorithm rounds   / round          key copy   expansion    (bytes)
16         16           44                 4                4          40           176
24         16           52                 4                6          46           208
32         16           60                 4                8          52           240
TABLE 2.4 REPRESENTATION OF AES-128, AES-192, AES-256 BIT BLOCK SIZE, EXPANSION
ALGORITHM, ROUND KEY COPY, ROUND KEY
The first bytes of the expanded key are always equal to the key. If the key is 16
bytes long, the first 16 bytes of the expanded key will be the same as the original key. If
the key size is 32 bytes, then the first 32 bytes of the expanded key will be the same as
the original key.
Each round adds 4 bytes to the expanded key. With the exception of the first
rounds, each round also takes the previous round's 4 bytes as input, operates on them,
and returns 4 bytes.
One more important note is that not all of the 4 functions are called in every
round. The algorithm only calls all 4 of the functions every
4 rounds for a 16-byte key
6 rounds for a 24-byte key
8 rounds for a 32-byte key
In the rest of the rounds, only the result of the K function is XORed with the result of
the EK function. There is one exception to this rule: if the key is 32 bytes long, an
additional call to the sub word function is made every 8 rounds, starting on the 13th
round.
Rijndael's key schedule utilizes a number of operations, which will be described
before describing the key schedule.
ROTATE:
The rotate operation takes a 32-bit word like this (in hexadecimal):
1d2c3a4f
And rotates it eight bits to the left:
2c3a4f1d
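The rotate operation can be sketched in Python as follows, assuming the word is held as a 32-bit integer; the function name rot_word is illustrative.

```python
def rot_word(word: int) -> int:
    """Rotate a 32-bit word eight bits to the left."""
    return ((word << 8) & 0xFFFFFFFF) | (word >> 24)


print(format(rot_word(0x1D2C3A4F), "08x"))  # 2c3a4f1d
```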
RCON:
Rcon is what the Rijndael documentation calls the exponentiation of 2 to a
user-specified value. Note that this operation is not performed with regular integers, but
in Rijndael's finite field. In polynomial form, 2 is
\[
2 = 00000010 = 0x^7 + 0x^6 + 0x^5 + 0x^4 + 0x^3 + 0x^2 + 1x^1 + 0x^0 = x,
\]
and we compute
\[
rcon(i) = x^{(254+i)}
\]
in $F_{2^8}$ or, equivalently,
\[
rcon(i) = x^{(254+i)} \bmod (x^8 + x^4 + x^3 + x + 1)
\]
in $F_2[x]$.
For example, rcon(1) = 1, rcon(2) = 2, rcon(3) = 4, and rcon(9) is the
hexadecimal number 0x1b (27 decimal).
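The Rcon values can be reproduced with a short Python sketch. It relies on the fact that x^255 = 1 in this field, so x^(254+i) equals x^(i-1) and can be obtained by repeated multiplication by x. The function names are illustrative assumptions.

```python
def xtime(value: int) -> int:
    """Multiply a byte by x (i.e. {02}) modulo x^8 + x^4 + x^3 + x + 1."""
    shifted = (value << 1) & 0xFF
    return shifted ^ 0x1B if value & 0x80 else shifted


def rcon(i: int) -> int:
    """Return x^(i-1), which equals x^(254+i) because x^255 = 1 in GF(2^8)."""
    value = 1
    for _ in range(i - 1):
        value = xtime(value)
    return value


print([format(rcon(i), "02x") for i in range(1, 11)])
# ['01', '02', '04', '08', '10', '20', '40', '80', '1b', '36']
```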
The Rcon table for the encryption process is given below.
01 02 04 08 10 20 40 80 1B 36
00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00
TABLE 2.5 RCON TABLE IN ENCRYPTION PROCESS
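The operations above can be combined into a key expansion sketch. The Python below is only an illustration: it follows the word-oriented formulation of the AES standard rather than the byte-oriented round description given earlier, it computes the S-Box from its mathematical definition instead of using a lookup table, and all function names are assumptions. The 16-byte key is sample data.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = ((a << 1) & 0xFF) ^ (0x1B if a & 0x80 else 0)
        b >>= 1
    return result


def sbox_entry(x: int) -> int:
    """S-Box value: multiplicative inverse followed by the affine transformation."""
    inv = 1
    for _ in range(254):            # x^254 is the inverse of x; 0 maps to 0
        inv = gf_mul(inv, x)
    result = inv
    for shift in range(1, 5):       # XOR with the four left rotations of inv
        result ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return result ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]


def expand_key(key: bytes):
    """Return the key schedule as a list of 4-byte words."""
    nk = len(key) // 4              # key length in words: 4, 6 or 8
    nr = nk + 6                     # number of rounds: 10, 12 or 14
    words = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 1
    for i in range(nk, 4 * (nr + 1)):
        temp = words[i - 1]
        if i % nk == 0:
            temp = temp[1:] + temp[:1]                # rotate word
            temp = [SBOX[b] for b in temp]            # substitute word
            temp = [temp[0] ^ rcon] + temp[1:]        # add round constant
            rcon = gf_mul(rcon, 2)
        elif nk > 6 and i % nk == 4:
            temp = [SBOX[b] for b in temp]            # extra substitution, 32-byte keys
        words.append([a ^ b for a, b in zip(words[i - nk], temp)])
    return words


schedule = expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(len(schedule) * 4, bytes(schedule[4]).hex())  # 176 a0fafe17
```

The first Nk words are a copy of the key, and the total length matches the expanded key sizes of 176, 208 and 240 bytes listed in Table 2.4.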
2.7. Decryption Process:
This process is the direct inverse of the encryption process. All the transformations
applied in the encryption process are inversely applied in this process. Hence the last
round values of both the data and the key are the first round inputs for the decryption
process, and the rounds follow in decreasing order.
[Figure: Cipher text -> AddRoundKey with Roundkey* (1st round) -> InvShiftRows,
InvSubBytes, AddRoundKey with Roundkey*, InvMixColumns (repeated for Nr-1 rounds) ->
InvShiftRows, InvSubBytes, AddRoundKey with Roundkey* (last round) -> Plain text]
FIGURE 2.14: BLOCK DIAGRAM OF DECRYPTION PROCESS
2.7.1. INVERSE BYTES SUBSTITUTION TRANSFORMATION:
Inverse Byte Substitution Transformation InvSubBytes() is the inverse of the
byte substitution transformation, in which the inverse S-Box (Figure 2.16) is applied to
each byte of the State. This is obtained by applying the inverse of the affine
transformation in equation (16), followed by taking the multiplicative inverse in $GF(2^8)$.
[1]
FIGURE 2.15. APPLICATION OF THE INVERSE S-BOX TO EACH BYTE OF THE STATE
FIGURE 2.16. INVERSE S-BOX VALUES FOR ALL 256 COMBINATIONS IN HEXADECIMAL FORMAT
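One way to obtain the inverse S-Box values is to build the forward S-Box and invert the table. The Python sketch below does this; it is an illustration under that assumption, not the method used in the original design, and the function names are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = ((a << 1) & 0xFF) ^ (0x1B if a & 0x80 else 0)
        b >>= 1
    return result


def sbox_entry(x: int) -> int:
    """Forward S-Box: multiplicative inverse, then the affine transformation."""
    inv = 1
    for _ in range(254):            # x^254 is the inverse of x; 0 maps to 0
        inv = gf_mul(inv, x)
    result = inv
    for shift in range(1, 5):
        result ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return result ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]

# The inverse table maps every S-Box output back to its input.
INV_SBOX = [0] * 256
for value, substituted in enumerate(SBOX):
    INV_SBOX[substituted] = value


def inv_sub_bytes(state):
    """Apply the inverse S-Box to each byte of a 4 x 4 State."""
    return [[INV_SBOX[b] for b in row] for row in state]


print(format(INV_SBOX[0x63], "02x"), format(INV_SBOX[0xED], "02x"))  # 00 53
```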
2.7.2. INVERSE SHIFT ROWS TRANSFORMATION:
Inverse Shift Rows Transformation InvShiftRows() is the inverse of the
ShiftRows() transformation presented in Chapter 2. The bytes in the last three rows of
the State are cyclically shifted over different numbers of bytes. The first row, r = 0, is not
shifted. The bottom three rows are cyclically shifted by Nb - shift(r, Nb) bytes, where the
shift value shift(r, Nb) depends on the row number, and is explained in Section 2.3.
Specifically, the InvShiftRows() transformation proceeds as follows:
\[
S'_{r,\,(c + shift(r,Nb)) \bmod Nb} = S_{r,c} \quad \text{for } 0 \le r < 4 \text{ and } 0 \le c < Nb
\]
[1]
FIGURE 2.17. INVERSE CYCLIC SHIFT OF THE LAST THREE ROWS OF THE STATE
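The index mapping can be sketched in Python as follows. The sketch assumes Nb = 4, for which shift(r, Nb) = r, and stores the State as state[r][c]; the function names and the sample State are illustrative only.

```python
def shift_rows(state):
    """Forward ShiftRows for Nb = 4: rotate row r to the left by r bytes."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def inv_shift_rows(state, nb=4):
    """InvShiftRows: s'[r][(c + shift(r, Nb)) mod Nb] = s[r][c], with shift(r, 4) = r."""
    new_state = [[0] * nb for _ in range(4)]
    for r in range(4):
        for c in range(nb):
            new_state[r][(c + r) % nb] = state[r][c]
    return new_state


# Sample State: entry r*16 + c makes the row and column easy to read off.
state = [[r * 16 + c for c in range(4)] for r in range(4)]
print(inv_shift_rows(state)[1])  # [19, 16, 17, 18]: row 1 rotated right by one byte
```

Applying inv_shift_rows to the output of shift_rows returns the original State, which is the defining property of the inverse transformation.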
2.7.3. INVERSE MIXING OF COLUMNS TRANSFORMATION:
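Inverse Mixing of Columns Transformation InvMixColumns() is the inverse of the
MixColumns() transformation presented in Chapter 2. InvMixColumns() operates on
the State column by column, treating each column as a four-term polynomial as
described in Section 1.3.4. The columns are considered as polynomials over $GF(2^8)$
and multiplied modulo $x^4 + 1$ with a fixed polynomial $a^{-1}(x)$, given by
\[
a^{-1}(x) = \{0b\}x^3 + \{0d\}x^2 + \{09\}x + \{0e\}
\]
As described in Section 1.3.4, this can be written as a matrix multiplication. Let
\[
S'(x) = a^{-1}(x) \otimes S(x)
\]
\[
\begin{bmatrix} S'_{0,c} \\ S'_{1,c} \\ S'_{2,c} \\ S'_{3,c} \end{bmatrix}
=
\begin{bmatrix}
0e & 0b & 0d & 09 \\
09 & 0e & 0b & 0d \\
0d & 09 & 0e & 0b \\
0b & 0d & 09 & 0e
\end{bmatrix}
\begin{bmatrix} S_{0,c} \\ S_{1,c} \\ S_{2,c} \\ S_{3,c} \end{bmatrix}
\quad \text{for } 0 \le c < Nb.
\]
As a result of this multiplication, the four bytes in a column are replaced by the following
equations.
\[
\begin{aligned}
S'_{0,c} &= (\{0e\} \cdot S_{0,c}) \oplus (\{0b\} \cdot S_{1,c}) \oplus (\{0d\} \cdot S_{2,c}) \oplus (\{09\} \cdot S_{3,c}) \\
S'_{1,c} &= (\{09\} \cdot S_{0,c}) \oplus (\{0e\} \cdot S_{1,c}) \oplus (\{0b\} \cdot S_{2,c}) \oplus (\{0d\} \cdot S_{3,c}) \\
S'_{2,c} &= (\{0d\} \cdot S_{0,c}) \oplus (\{09\} \cdot S_{1,c}) \oplus (\{0e\} \cdot S_{2,c}) \oplus (\{0b\} \cdot S_{3,c}) \\
S'_{3,c} &= (\{0b\} \cdot S_{0,c}) \oplus (\{0d\} \cdot S_{1,c}) \oplus (\{09\} \cdot S_{2,c}) \oplus (\{0e\} \cdot S_{3,c})
\end{aligned}
\]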
Hence, State can be represented as
[1]
FIGURE 2.18. INVERSE MIX COLUMN OPERATION ON STATE
The calculation for the inverse mix-columns operation is worked through in detail below.
Understanding of Calculations for Inverse Mix-Columns:
Multiplication of Bits
The Mix-Columns calculation in the encryption process [2] used a shift of 1 bit to the
left, followed by an XOR if the leftmost bit before the shift is 1. In Inverse
Mix-Columns this idea still works, with one thing in mind: the multipliers are no longer
numbers like 1, 2 or 3, but rather 9, 11, 13 and 14. As a first illustration of multiplying
by a larger value, the value to compute is [1]:
0101 0111×1000 0011
To find the answer, we go through it step by step:
Step 1: Split 1000 0011 into single-bit values.
A single-bit value is one with only one 1 bit among its 8 bits (e.g. 1000 0000). The value
1000 0011 is then obtained as:
1000 0011 = 1000 0000 XOR 0000 0010 XOR 0000 0001
Step 2: Determine the results of multiplication with 0101 0111
This part needs extra care, since a single mistake here makes the rest of the
calculation wrong.
0101 0111 x 0000 0001 = 0101 0111
This is the same as multiplying by 1, since 0000 0001 = 1 in decimal.
0101 0111 x 0000 0010 = 0101 0111 << 1
= 1010 1110
Remember this is a shift of the bits to the left, appending a 0 at the end.
0101 0111 x 0000 0100 = (0101 0111 x 0000 0010) << 1 XOR 0001 1011
= (1010 1110 << 1) XOR 0001 1011
= 0101 1100 XOR 0001 1011
= 0100 0111
Notice that there is an XOR in this calculation. This is because the leftmost bit of the
value being shifted (1010 1110) is 1. In the previous calculation there was only a shift,
as the value being shifted (0101 0111) has 0 as its leftmost bit. This is the conditional
XOR that accompanies the shift. Let's continue.
0101 0111 x 0000 1000 = (0101 0111 x 0000 0100) << 1
= 1000 1110
0101 0111 x 0001 0000 = (0101 0111 x 0000 1000) << 1 XOR 0001 1011
= 0001 1100 XOR 0001 1011
= 0000 0111
0101 0111 x 0010 0000 = (0101 0111 x 0001 0000) << 1

= 0000 1110
0101 0111 x 0100 0000 = (0101 0111 x 0010 0000) << 1
= 0001 1100
0101 0111 x 1000 0000 = (0101 0111 x 0100 0000) << 1

= 0011 1000
Once we are done with this, we are ready to proceed on.
Step 3: Get the final value, 0101 0111×1000 0011
In step 1 we have already split 1000 0011 into smaller parts. We can now use that in
our calculation
0101 0111 x 1000 0011 = 0101 0111 x (1000 0000 XOR 0000 0010 XOR 00000001)
= 0101 0111 x 1000 0000 XOR 0101 0111 x 0000 0010 XOR
0101 0111 x 0000 0001
= 0011 1000 XOR 1010 1110 XOR 0101 0111
= 1100 0001 (Ans)
And that's how we get the answer.
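The three steps above amount to a general shift-and-XOR multiplication, which can be sketched in Python as follows. The function names xtime and gf_mul are assumptions introduced for this illustration.

```python
def xtime(value: int) -> int:
    """Multiply a byte by 0000 0010: left shift, then conditional XOR with 0001 1011."""
    shifted = (value << 1) & 0xFF
    return shifted ^ 0x1B if value & 0x80 else shifted


def gf_mul(a: int, b: int) -> int:
    """Multiply a by b by XORing the multiples of a selected by the 1 bits of b."""
    result = 0
    while b:
        if b & 1:            # this bit of b is set: include the current multiple
            result ^= a
        a = xtime(a)         # next multiple: a x 0000 0010, a x 0000 0100, ...
        b >>= 1
    return result


print(format(gf_mul(0b01010111, 0b10000011), "08b"))  # 11000001
```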
AES Inverse Mix Column Calculation Example:
This example follows the same approach as the Mix-Columns transformation
calculation above [2], except that this time it is the exact opposite operation. In the
inverse mix column transformation, the 4x4 matrix is no longer
02 03 01 01
01 02 03 01
01 01 02 03
03 01 01 02
In the inverse mix column transformation, this matrix is used instead:
0E 0B 0D 09
09 0E 0B 0D
0D 09 0E 0B
0B 0D 09 0E
Therefore our formula will be:
\[
\begin{bmatrix}
0E & 0B & 0D & 09 \\
09 & 0E & 0B & 0D \\
0D & 09 & 0E & 0B \\
0B & 0D & 09 & 0E
\end{bmatrix}
\begin{bmatrix} 04 \\ 66 \\ 81 \\ E5 \end{bmatrix}
=
\begin{bmatrix} D4 \\ BF \\ 5D \\ 30 \end{bmatrix}
\]
Remember that matrix multiplication is always row x column. Therefore, we have the
following results:
1. {0E.04} + {0B.66} + {0D.81} + {09.E5} = D4
2. {09.04} + {0E.66} + {0B.81} + {0D.E5} = BF
3. {0D.04} + {09.66} + {0E.81} + {0B.E5} = 5D
4. {0B.04} + {0D.66} + {09.81} + {0E.E5} = 30
The first two formulas are computed here; the last two are left for the reader to try.
Let's start with (1).
1. {0E.04} + {0B.66} + {0D.81} + {09.E5} = D4
As in the Mix-Columns transformation, we work in parts to arrive at the answer.
➢0E.04
First of all, convert the hexadecimal values to binary.
0E = 0000 1110
04 = 0000 0100
We now use the same method as in the multiplication of bits described earlier.
For ease of computation, choose 0E as the value to split.
0E = 0000 1110 = 0000 1000 XOR 0000 0100 XOR 0000 0010
0000 0100 x 0000 0001 = 0000 0100

0000 0100 x 0000 0010 = 0000 1000
0000 0100 x 0000 0100 = 0001 0000
0000 0100 x 0000 1000 = 0010 0000
Therefore, we can now compute 0E.04. Notice that the list of multiples was not
continued beyond 0000 1000: multiples that are not used need not be computed, which
saves time.
0E.04 = {0000 0100 x 0000 1000} XOR {0000 0100 x 0000 0100}
XOR {0000 0100 x 0000 0010}
= 0010 0000 XOR 0001 0000 XOR 0000 1000
= 0011 1000
Do the same for the rest.
➢0B.66
0B = 0000 1011
66 = 0110 0110
0B = 0000 1000 XOR 0000 0010 XOR 0000 0001
0110 0110 x 0000 0001 = 0110 0110

0110 0110 x 0000 0010 = 1100 1100
0110 0110 x 0000 0100 = 1001 1000 XOR 0001 1011 = 1000 0011
0110 0110 x 0000 1000 = 0000 0110 XOR 0001 1011 = 0001 1101
0B.66 = {0000 1000 x 0110 0110} XOR {0000 0010 x 0110 0110}
XOR {0000 0001 x 0110 0110}
= 0001 1101 XOR 1100 1100 XOR 0110 0110
= 1011 0111
➢0D.81
0D = 0000 1101 = 0000 1000 XOR 0000 0100 XOR 0000 0001
81 = 1000 0001
1000 0001 x 0000 0001 = 1000 0001

1000 0001 x 0000 0010 = 0000 0010 XOR 0001 1011 = 0001 1001
1000 0001 x 0000 0100 = 0011 0010
1000 0001 x 0000 1000 = 0110 0100
0D.81 = {0000 1000 x 1000 0001 } XOR {0000 0100 x 1000 0001}
XOR {0000 0001 x 1000 0001}
= 0110 0100 XOR 0011 0010 XOR 1000 0001
= 1101 0111
➢09.E5
09 = 0000 1001 = 0000 1000 XOR 0000 0001
E5 = 1110 0101
1110 0101 x 0000 0001 = 1110 0101

1110 0101 x 0000 0010 = 1100 1010 XOR 0001 1011 = 1101 0001
1110 0101 x 0000 0100 = 1010 0010 XOR 0001 1011 = 1011 1001
1110 0101 x 0000 1000 = 0111 0010 XOR 0001 1011 = 0110 1001
09.E5 = {0000 1000 x 1110 0101} XOR {0000 0001 x 1110 0101}
= {0110 1001} XOR 1110 0101
= 1000 1100
Thus,
{0E.04} + {0B.66} + {0D.81} + {09.E5} = 0011 1000 XOR 1011 0111 XOR 1101 0111
XOR 1000 1100
= 1101 0100
= D4 (Shown)
2. {09.04} + {0E.66} + {0B.81} + {0D.E5} = BF
Do the exact same thing as the above.
➢09.04
09 = 0000 1001 = 0000 1000 XOR 0000 0001

04 = 0000 0100
0000 0100 x 0000 0001 = 0000 0100

0000 0100 x 0000 0010 = 0000 1000
0000 0100 x 0000 0100 = 0001 0000
0000 0100 x 0000 1000 = 0010 0000
09.04 = {0000 0100 x 0000 1000} XOR {0000 0100 x 0000 0001}
= 0010 0000 XOR 0000 0100
= 0010 0100
➢0E.66
0E = 0000 1110 = 0000 1000 XOR 0000 0100 XOR 0000 0010
66 = 0110 0110
0110 0110 x 0000 0001 = 0110 0110

0110 0110 x 0000 0010 = 1100 1100
0110 0110 x 0000 0100 = 1001 1000 XOR 0001 1011 = 1000 0011
0110 0110 x 0000 1000 = 0000 0110 XOR 0001 1011 = 0001 1101
0E.66 = {0110 0110 x 0000 1000} XOR {0110 0110 x 0000 0100}
XOR {0110 0110 x 0000 0010}
= 0001 1101 XOR 1000 0011 XOR 1100 1100
= 0101 0010
➢0B.81
0B = 0000 1011 = 0000 1000 XOR 0000 0010 XOR 0000 0001
81 = 1000 0001
1000 0001 x 0000 0001 = 1000 0001

1000 0001 x 0000 0010 = 0000 0010 XOR 0001 1011 = 0001 1001
1000 0001 x 0000 0100 = 0011 0010
1000 0001 x 0000 1000 = 0110 0100
0B.81 = {1000 0001 x 0000 1000} XOR {1000 0001 x 0000 0010}
XOR {1000 0001 x 0000 0001}
= 0110 0100 XOR 0001 1001 XOR 1000 0001
= 1111 1100
➢0D.E5
0D = 0000 1101 = 0000 1000 XOR 0000 0100 XOR 0000 0001
E5 = 1110 0101
1110 0101 x 0000 0001 = 1110 0101

1110 0101 x 0000 0010 = 1100 1010 XOR 0001 1011 = 1101 0001
1110 0101 x 0000 0100 = 1010 0010 XOR 0001 1011 = 1011 1001
1110 0101 x 0000 1000 = 0111 0010 XOR 0001 1011 = 0110 1001
0D.E5 = {1110 0101 x 0000 1000} XOR {1110 0101 x 0000 0100}
XOR {1110 0101 x 0000 0001}
= 0110 1001 XOR 1011 1001 XOR 1110 0101
= 0011 0101
Thus,
{09.04} + {0E.66} + {0B.81} + {0D.E5} = 0010 0100 XOR 0101 0010 XOR 1111 1100
XOR 0011 0101
= 1011 1111
= BF (Shown)
That completes the example. Do the same for the remaining two rows to obtain all the values.
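The whole example, including the two rows left to the reader, can be checked with a short Python sketch. It is a minimal illustration that holds one column as a list of four bytes; the helper names gf_mul and inv_mix_single_column are assumptions.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) using shift and conditional XOR with 0001 1011."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = ((a << 1) & 0xFF) ^ (0x1B if a & 0x80 else 0)
        b >>= 1
    return result


INV_MIX_MATRIX = [
    [0x0E, 0x0B, 0x0D, 0x09],
    [0x09, 0x0E, 0x0B, 0x0D],
    [0x0D, 0x09, 0x0E, 0x0B],
    [0x0B, 0x0D, 0x09, 0x0E],
]


def inv_mix_single_column(column):
    """Multiply the inverse matrix by one column: row x column, with XOR as addition."""
    result = []
    for row in INV_MIX_MATRIX:
        value = 0
        for coefficient, byte in zip(row, column):
            value ^= gf_mul(coefficient, byte)
        result.append(value)
    return result


print([format(b, "02x") for b in inv_mix_single_column([0x04, 0x66, 0x81, 0xE5])])
# ['d4', 'bf', '5d', '30']
```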
[Figure: encryption path (left) from plaintext to ciphertext: Add round key with w[0,3];
rounds 1 to 9 of Sub bytes, Shift rows, Mix columns, Add round key; round 10 of Sub bytes,
Shift rows, Add round key with w[40,43]. Decryption path (right) from ciphertext to
plaintext: Add round key with w[40,43]; rounds 1 to 9 of Inverse shift rows, Inverse sub
bytes, Add round key, Inverse mix columns; round 10 of Inverse shift rows, Inverse sub
bytes, Add round key with w[0,3]. An Expand key block derives the round key words
w[0,3], w[4,7], ..., w[36,39], w[40,43] from the key.]
[5]
FIGURE 2.19 OVERALL STRUCTURE OF AES
2.8 FPGA INTRODUCTION:
A field-programmable gate array (FPGA) is a semiconductor device that can be
configured by the customer or designer after manufacturing, hence the name
"field-programmable". To program an FPGA you specify how you want the chip to work with a
logic circuit diagram or source code in a hardware description language (HDL).
FPGAs can be used to implement any logical function that an application-specific
integrated circuit (ASIC) could perform, but the ability to update the functionality after
shipping offers advantages for many applications.
FPGAs contain programmable logic components called "logic blocks", and a

hierarchy of reconfigurable interconnects that allow the blocks to be "wired together"—
somewhat like a one-chip programmable breadboard. Logic blocks can be configured to
perform complex combinational functions, or merely simple logic gates like AND and
XOR. In most FPGAs, the logic blocks also include memory elements, which may be
simple flip-flops or more complete blocks of memory.
For any given semiconductor process, FPGAs are usually slower than their fixed
ASIC counterparts. They also draw more power, and generally achieve less functionality
using a given amount of circuit complexity. But their advantages include a shorter time
to market, ability to re-program in the field to fix bugs, and lower non-recurring
engineering costs. Vendors can also take a middle road by developing their hardware
on ordinary FPGAs, but manufacture their final version so it can no longer be modified
after the design has been committed.
The historical roots of FPGAs are in complex programmable logic devices

(CPLDs) of the early to mid 1980s. A Xilinx co-founder, Ross Freeman, invented the
field programmable gate array in 1984. CPLDs and FPGAs include a relatively large
number of programmable logic elements. CPLD logic gate densities range from the
equivalent of several thousand to tens of thousands of logic gates, while FPGAs
typically range from tens of thousands to several million.
The primary differences between CPLDs and FPGAs are architectural. A CPLD
has a somewhat restrictive structure consisting of one or more programmable sum-of-
products logic arrays feeding a relatively small number of clocked registers. The result
of this is less flexibility, with the advantage of more predictable timing delays and a
higher logic-to-interconnect ratio. The FPGA architectures, on the other hand, are
dominated by interconnect. This makes them far more flexible (in terms of the range of
designs that are practical for implementation within them) but also far more complex to
design for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers) and
embedded memories, as well as logic blocks that implement decoders or
mathematical functions.
Some FPGAs have the capability of partial re-configuration that lets one portion
of the device be re-programmed while other portions continue running.
A recent trend has been to take the coarse-grained architectural approach a step
further by combining the logic blocks and interconnects of traditional FPGAs with
embedded microprocessors and related peripherals to form a complete "system on a
programmable chip". This work mirrors the architecture by Ron Perlof and Hana Potash
of Burroughs Advanced Systems Group which combined a reconfigurable CPU
architecture on a single chip called the SB24. That work was done in 1982. Examples of
such hybrid technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's logic
fabric. The Atmel FPSLIC is another such device, which uses an AVR processor in
combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic.
As previously mentioned, many modern FPGAs have the ability to be reprogrammed at

"run time," and this is leading to the idea of reconfigurable computing or reconfigurable
systems — CPUs that reconfigure themselves to suit the task at hand. The Mitrion
Virtual Processor from Mitrionics is an example of a reconfigurable soft processor that is
implemented on FPGAs. It does not however support dynamic reconfiguration at
runtime, but instead adapts itself to a specific program.
Additionally, new, non-FPGA architectures are beginning to emerge. Software-

configurable microprocessors such as the Stretch S5000 adopt a hybrid approach by
providing an array of processor cores and FPGA-like programmable cores on the same
chip.
Applications of FPGAs include digital signal processing, software-defined radio,

aerospace and defense systems, ASIC prototyping, medical imaging, computer vision,
speech recognition, cryptography, bioinformatics, computer hardware emulation and a
growing range of other areas. FPGAs originally began as competitors to CPLDs and
competed in a similar space, that of glue logic for PCBs. As their size, capabilities, and
speed increased, they began to take over larger and larger functions to the state where
some are now marketed as full systems on chips (SOC).
FPGAs especially find applications in any area or algorithm that can make use of
the massive parallelism offered by their architecture. One such area is code breaking, in
particular brute-force attack, of cryptographic algorithms.
FPGAs are increasingly used in conventional High Performance Computing

applications where computational kernels such as FFT or Convolution are performed on
the FPGA instead of a microprocessor. The use of FPGAs for computing tasks is known
as reconfigurable computing.
The inherent parallelism of the logic resources on the FPGA allows for
considerable compute throughput even at a sub-500 MHz clock rate. For example, the
current (2007) generation of FPGAs can implement around 100 single precision floating
point units, all of which can compute a result every single clock cycle. The flexibility of
the FPGA allows for even higher performance by trading off precision and range in the
number format for an increased number of parallel arithmetic units. This has driven a
new type of processing called reconfigurable computing, where time intensive tasks are
offloaded from software to FPGAs.
The adoption of FPGAs in high performance computing is currently limited by the
complexity of FPGA design compared to conventional software and the extremely long
turn-around times of current design tools, where a wait of 4-8 hours is necessary after
even minor changes to the source code.
2.8.1 FPGA ARCHITECTURE:
The typical basic architecture consists of an array of configurable logic blocks

(CLBs) and routing channels. Multiple I/O pads may fit into the height of one row or the
width of one column in the array. Generally, all the routing channels have the same
width (number of wires).
An application circuit must be mapped into an FPGA with adequate resources.
A classic FPGA logic block consists of a 4-input lookup table (LUT), and a flip-
flop, as shown below. In recent years, manufacturers have started moving to 6-input
LUTs in their high performance parts, claiming increased performance.
FIGURE 2.20 TYPICAL FPGA LOGIC BLOCK
There is only one output, which can be either the registered or the unregistered LUT
output. The logic block has four inputs for the LUT and a clock input. Since clock signals
(and often other high-fanout signals) are normally routed via special-purpose dedicated
routing networks in commercial FPGAs, they and other signals are separately managed.
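The behaviour of such a logic block can be modelled in a few lines of Python. This is a toy software model for illustration, not a hardware description; the class name, method names and the parity truth table are assumptions made for the example.

```python
class LogicBlock:
    """Toy model of a 4-input LUT followed by an optional D flip-flop."""

    def __init__(self, truth_table, registered=False):
        if len(truth_table) != 16:
            raise ValueError("a 4-input LUT needs 16 truth table entries")
        self.truth_table = list(truth_table)
        self.registered = registered    # selects registered or unregistered output
        self.flip_flop = 0              # value stored at the last clock edge

    def lut(self, a, b, c, d):
        """Look up the combinational output for the four input bits."""
        return self.truth_table[(d << 3) | (c << 2) | (b << 1) | a]

    def clock_edge(self, a, b, c, d):
        """Store the current LUT output in the flip-flop."""
        self.flip_flop = self.lut(a, b, c, d)

    def output(self, a, b, c, d):
        """Single output: either the stored value or the LUT value."""
        return self.flip_flop if self.registered else self.lut(a, b, c, d)


# Configure the LUT as a 4-input XOR (parity) function.
parity_table = [bin(i).count("1") % 2 for i in range(16)]
block = LogicBlock(parity_table, registered=True)
print(block.output(1, 0, 0, 0))   # 0: nothing has been clocked in yet
block.clock_edge(1, 0, 0, 0)
print(block.output(1, 0, 0, 0))   # 1: the LUT value was captured at the clock edge
```

Changing the 16 truth table entries changes the logic function without changing the structure, which is the sense in which the block is configurable.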
For this example architecture, the locations of the FPGA logic block pins are shown
below.
FIGURE2.21 FPGA LOGIC BLOCK PIN LOCATION
Each input is accessible from one side of the logic block, while the output pin can
connect to routing wires in both the channel to the right and the channel below the logic
block. Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can connect to any
of the W wires (where W is the channel width) in the horizontal channel immediately
below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on some of
the programmable switches within a switch box, longer paths can be constructed. For
higher speed interconnect, some FPGA architectures use longer routing lines that span
multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In

this architecture, when a wire enters a switch box, there are three programmable
switches that allow it to connect to three other wires in adjacent channel segments. The
pattern, or topology, of switches used in this architecture is the planar or domain-based
switch box topology. In this switch box topology, a wire in track number one connects
only to wires in track number one in adjacent channel segments, wires in track number
2 connect only to other wires in track number 2 and so on. The figure below illustrates
the connections in a switch box.
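The track-preserving rule of the planar switch box can also be written as a small Python sketch. The side names and the function name are assumptions made for illustration; the sketch only lists which wires a given wire may be connected to.

```python
SIDES = ("top", "right", "bottom", "left")


def planar_connections(side: str, track: int):
    """Return the three wires reachable from a wire entering on `side` at `track`.

    In the planar (domain-based) topology a wire connects only to wires with the
    same track number in the other three adjacent channel segments.
    """
    if side not in SIDES:
        raise ValueError("unknown side: " + side)
    return [(other, track) for other in SIDES if other != side]


print(planar_connections("left", 1))
# [('top', 1), ('right', 1), ('bottom', 1)]
```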
Modern FPGA families expand upon the above capabilities to include higher level
functionality fixed into the silicon. Having these common functions embedded into the
silicon reduces the area required and gives those functions increased speed compared
to building them from primitives. Examples of these include multipliers, generic DSP
blocks, embedded processors, high speed IO logic and embedded memories.
FIGURE 2.22 FPGA SWITCH BOX TOPOLOGY
FPGAs are also widely used for systems validation including pre-silicon
validation, post-silicon validation, and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory, reducing
the time to market.
2.8.2 FPGA DESIGN AND PROGRAMMING:
To define the behaviour of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work with
when handling large structures because it's possible to just specify them numerically
rather than having to draw every piece by hand. On the other hand, schematic entry can
allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped net list is
generated. The net list can then be fitted to the actual FPGA architecture using a
process called place-and-route, usually performed by the FPGA company's proprietary
place-and-route software. The user will validate the map, place and route results via
timing analysis, simulation, and other verification methodologies. Once the design and
validation process is complete, the binary file generated (also using the FPGA
company's proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: The source files
are fed to a software suite from the FPGA/CPLD vendor that through different steps will
produce a file. This file is then transferred to the FPGA/CPLD via a serial interface
(JTAG) or to an external memory device like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to reduce
the complexity of designing in HDLs, which have been compared to the equivalent of
assembly languages, there are moves to raise the abstraction level through the
introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of predefined
complex functions and circuits that have been tested and optimized to speed up the
design process. These predefined circuits are commonly called IP cores, and are
available from FPGA vendors and third-party IP suppliers (rarely free, and typically
released under proprietary licenses). Other predefined circuits are available from
developer communities such as Open Cores (typically free, and released under the
GPL, BSD or similar license), and other sources.
In a typical design flow, an FPGA application developer will simulate the design
at multiple stages throughout the design process. Initially the RTL description in VHDL
or Verilog is simulated by creating test benches to simulate the system and observe
results. Then, after the synthesis engine has mapped the design to a net list, the net list
is translated to a gate level description where simulation is repeated to confirm the
synthesis proceeded without errors. Finally the design is laid out in the FPGA, at which
point propagation delays can be added and the simulation run again with these values
back-annotated onto the net list.
Field-programmable gate arrays (FPGAs) arrived in 1984 as an alternative to
programmable logic devices (PLDs) and ASICs. As their name implies, FPGAs offer the
significant benefit of being readily programmable. Unlike their forebears in the PLD
category, FPGAs can (in most cases) be programmed again and again, giving
designers multiple opportunities to tweak their circuits.
There is no large non-recurring engineering (NRE) cost associated with FPGAs.
In addition, lengthy, nerve-wracking waits for mask-making operations are eliminated.
Often, with FPGA development, logic design begins to resemble software design because
a given design goes through many iterations. Innovative design often happens with FPGAs as
an implementation platform.
But there are some downsides to FPGAs as well. The economics of FPGAs force
designers to balance their relatively high piece-part pricing compared to ASICs against the
absence of high NREs and long development cycles. They are also available only in fixed
sizes, which matters when you are determined to avoid unused silicon area.
TABLE 2.6 DO’S AND DON’TS FOR THE FPGA DESIGNER
FPGAs fill a gap between discrete logic and the smaller PLDs on the low end of
the complexity scale and costly custom ASICs on the high end. They consist of an array
of logic blocks that are configured using software. Programmable I/O blocks surround
these logic blocks. Both are connected by programmable interconnects.
The programming technology in an FPGA determines the type of basic logic cell
and the interconnect scheme. In turn, the logic cells and interconnection scheme
determine the design of the input and output circuits as well as the programming
scheme. Just a few years ago, the largest FPGA was measured in tens of thousands of
system gates and operated at 40 MHz. Older FPGAs often cost more than $150 for the
most advanced parts at the time. Today, however, FPGAs offer millions of gates of logic
capacity, operate at 300 MHz, can cost less than $10, and offer integrated functions
like processors and memory.
2.8.3 FPGA USAGE THEN AND NOW:
Fifteen years ago, FPGAs were designed into systems primarily to reduce system
component costs by consolidating board-level logic into fewer devices. A few hundred
logic gates were replaced by a single FPGA that implemented the same functionality.
The role of an FPGA was the consolidation of board-level logic into fewer
components. Availability of tools was a major bottleneck then.
The complexity of today's FPGAs allows system architects to replace a broad
range of ASICs with FPGAs and further consolidate and integrate the system logic into
fewer and fewer components. Today's FPGAs provide unprecedented flexibility at
attractive costs.
FIGURE 2.23 FPGA DEVICE COMPLEXITY
The advantages of:
• No NRE costs,
• Easy design modification,
• In-system re-programmability,
• Easy off-the-shelf availability in small volumes, and
• Fast time to market
make FPGAs a very attractive alternative to ASICs.
In the earlier days of FPGAs, users were primarily board design engineers, and
their applications were limited to using FPGAs to integrate and consolidate hundreds of
gates of board-level logic. To do so, they used their time-tested board design
methodology anchored by schematic capture. They would do the logic minimization
manually and capture the design at the Boolean logic level.
Hence the only additional tool needed was a fitter (FPGA place-and-route tool) to
map the logic design into the FPGA with correct routing. The fitter then generated a
chip-programming file. Verification often consisted of actual prototyping.
New FPGA designers and former ASIC designers are moving to fully exploit the new generation of
FPGA devices by replacing more ASICs with FPGAs. As FPGAs replace ASICs, FPGA
design is moving from board engineers into chip design teams. Adopting HDL design
methods helps meet their tight time-to-market requirements while implementing designs
in ever larger and more complex FPGAs. The combination of these FPGA technology
and design trends increases the need for FPGA design solutions that provide tools
powerful enough to handle ASIC designs while also delivering the productivity of an
integrated FPGA design flow.
Fifteen years later, the thought of designing a million-gate FPGA using a schematic
design methodology defies rational thought.
Today's FPGA designers are adopting HDL-based design methods at
astonishing rates. HDL-based design has increased productivity by allowing the
designer to work at a higher level of abstraction: the register-transfer level (RTL) instead of
the gate level.
Central to the shift to HDL-based design, coupled with the increased size of FPGAs, are
two strategically important tools:
• Simulation for design verification.
• Synthesis for automatic implementation of the RTL design at the gate level.
Breadboard prototyping has fallen apart as a practical design verification method
because of the cost of debugging functionality after layout.
Simulation allows design problems to be discovered earlier, when it is more cost-
effective to fix them.
Schematics and block diagrams still have a limited role in FPGA design. Their
role has been limited to manual implementation of tightly constrained functional blocks
or to help manage complexity by graphically partitioning the design into smaller blocks.
FPGAs are also known as:
• LCA (Logic Cell Array)
• pASIC (programmable ASIC)
• FLEX, APEX (Altera)
• ACT (Actel)
• ORCA (Lucent)
• Virtex (Xilinx)
• pASIC (QuickLogic)
A generic description of an FPGA is a programmable device with an internal
array of logic blocks, surrounded by a ring of programmable input/output blocks,
connected together via programmable interconnect. There are a wide variety of sub-
architectures within this group. The secret to density and performance in these devices
lies in the logic contained in their logic blocks and on the performance and efficiency of
their routing architecture.
FPGAs are distinct from SPLDs and CPLDs and typically offer the highest logic
capacity. A typical FPGA contains from 64 to tens of thousands of logic blocks and an
even greater number of flip-flops. Most FPGAs do not provide 100% interconnect
between logic blocks; to do so would be prohibitively expensive in terms of area.
FIGURE 2.24 ARCHITECTURE OF FPGA
CHAPTER 3
IMPLEMENTATION
This chapter discusses the implementation of the Advanced Encryption Standard (AES)
process, based on the Rijndael algorithm. All of the implementation uses a 128-bit key.
3.1.AES ENCRYPTION PROCESS:
3.1.1 ENCRYPTION IMPLEMENTATION:
VHDL is used as the hardware description language because of its portability
across environments. The code is pure VHDL that could easily be
implemented on other devices without changing the design. The software used for this
work is Xilinx ISE 8.1i. It is used for writing, debugging, and optimizing the design, and
also for fitting, simulating, and checking the performance results using the simulation
tools available in the Xilinx ISE design software.
3.1.2 STEPS FOLLOWED IN ENCRYPTION PROCESS:
Addroundkey(state, roundkey)
i = 1
Repeat:
    Subbytes(state)
    Shiftrows(state)
    Mixcolumns(state)
    Addroundkey(state, roundkey)    (round key from the Key Schedule)
    i = i + 1
until i > Nr - 1
Subbytes(state)
Shiftrows(state)
Addroundkey(state, roundkey)
FIGURE 3.1: FLOW CHART REPRESENTATION FOR AES ENCRYPTION PROCESS
3.1.3. ADD ROUND KEY:
Add round key is an XOR between the state and the round key. This
transformation is its own inverse.
AES operation: add round key
• Each byte of the round key is XORed with the corresponding byte in the state
table. The inverse operation is identical, since XORing a second time returns the original
values.
# XOR each byte of the round key into the state table
def addroundkey(state, roundkey):
    for i in range(len(state)):
        state[i] ^= roundkey[i]
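As a small worked check of the self-inverse property, the sketch below applies the operation to the example input block and cipher key from FIPS-197 Appendix B. The function name `add_round_key` and the flat 16-byte list layout are illustrative choices, not part of the VHDL design.

```python
def add_round_key(state, round_key):
    """Return a new state with each byte XORed with the matching round-key byte."""
    return [s ^ k for s, k in zip(state, round_key)]

# Example input block and cipher key from FIPS-197 Appendix B.
plaintext = list(bytes.fromhex("3243f6a8885a308d313198a2e0370734"))
key = list(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))

mixed = add_round_key(plaintext, key)
restored = add_round_key(mixed, key)  # a second XOR undoes the first
```

Because the operation is a plain XOR, the same hardware block can serve both the encryption and the decryption data path.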
3.1.4.SUB BYTES:
Sub bytes substitutes each byte in the block, independent of its
position in the state, using an S-box. This is the non-linear transformation. The S-box
used is proved to be optimal with regard to non-linearity. The S-box is based on
arithmetic in GF(2^8).
AES operation: sub bytes
• Each byte of the state table is substituted with the value in the S-box whose
index is the value of the state table byte. This provides non-linearity (the algorithm is not equal to
the sum of its parts).
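The S-box can be derived rather than typed in as a table. The following is a minimal sketch, assuming the standard AES construction (multiplicative inverse in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1, followed by the affine transformation with constant 0x63). The helper names are hypothetical, and a hardware design would normally store the resulting 256 entries in a ROM or LUTs.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def gf_inv(a):
    """Multiplicative inverse in GF(2^8); 0 is mapped to 0 by convention."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):  # a^254 equals a^-1 because the group order is 255
        result = gf_mul(result, a)
    return result

def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF

def build_sbox():
    """Build the 256-entry S-box: field inverse, then the affine transformation."""
    sbox = []
    for x in range(256):
        inv = gf_inv(x)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox

SBOX = build_sbox()

def sub_bytes(state):
    """Replace every state byte with the S-box entry it indexes."""
    return [SBOX[b] for b in state]
```

For instance, `sub_bytes([0x19, 0x3D])` yields the bytes 0xD4 and 0x27, matching the first two state bytes after SubBytes in round 1 of the FIPS-197 example.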
3.1.5. SHIFT ROWS:
Shift rows is a cyclic shift of the bytes in the rows in the state and is clearly
invertible (by a shift in the opposite direction by the same amount).
AES operation: shift rows
• Each row in the state table is shifted left by the number of bytes given by
the row number (row 0 is not shifted).
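A minimal sketch of the row shift, assuming the state is held as four rows of four bytes (the list-of-rows layout and the name `shift_rows` are illustrative):

```python
def shift_rows(rows):
    """Rotate row r of a 4x4 state left by r bytes (row 0 is unchanged)."""
    return [row[r:] + row[:r] for r, row in enumerate(rows)]

state = [
    [0x00, 0x01, 0x02, 0x03],
    [0x10, 0x11, 0x12, 0x13],
    [0x20, 0x21, 0x22, 0x23],
    [0x30, 0x31, 0x32, 0x33],
]
shifted = shift_rows(state)
```

In hardware this step needs no logic at all; it is only a fixed rewiring of the byte positions.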
3.1.6. MIX COLUMNS:
Mix columns, together with shift rows, forms the linear mixing layer, which guarantees high diffusion.
The non-linear S-boxes protect against linear and differential cryptanalysis.
AES operation: mix columns
• Mix columns is performed by multiplying each column by a fixed matrix (within the Galois finite
field).
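The column multiplication can be sketched with only shifts and XORs. The code below assumes the standard AES MixColumns matrix, whose rows are rotations of (02, 03, 01, 01); the helper names `xtime` and `mix_column` are illustrative.

```python
def xtime(a):
    """Multiply a byte by 0x02 in GF(2^8) modulo 0x11B."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B
    return a & 0xFF

def mul3(a):
    """Multiply a byte by 0x03, i.e. (0x02 * a) XOR a."""
    return xtime(a) ^ a

def mix_column(col):
    """Multiply one 4-byte column by the fixed MixColumns matrix."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ xtime(a3),
    ]

mixed = mix_column([0xDB, 0x13, 0x53, 0x45])
```

Since only multiplication by 02 and 03 is needed, each output byte reduces to a handful of XOR gates, which suits LUT-based FPGA logic.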
3.1.7. KEY EXPANSION:
AES key expansion operations
AES key expansion consists of several primitive operations:
• Rotate: takes a 4-byte word and rotates everything one byte to the left,
e.g. rotate([1,2,3,4]) → [2,3,4,1]
• Sub bytes: each byte of the word is substituted with the value in the S-box whose
index is the value of the original byte
• Rcon: the first byte of a word is XORed with the round constant. Each
value of the Rcon table is a member of the Rijndael finite field.
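The rotate and Rcon primitives can be sketched as follows. The function names are illustrative, and the round constant is computed as repeated doubling in GF(2^8) with the AES reduction polynomial 0x11B, which is an assumption consistent with the "2 to the power of (i-1)" description in the next section.

```python
def rotate(word):
    """Rotate a 4-byte word one byte to the left."""
    return word[1:] + word[:1]

def rcon(i):
    """Round constant for iteration i (starting at 1): 2^(i-1) in GF(2^8)."""
    value = 1
    for _ in range(i - 1):
        value <<= 1
        if value & 0x100:
            value ^= 0x11B  # reduce modulo x^8 + x^4 + x^3 + x + 1
    return value

rcon_table = [rcon(i) for i in range(1, 11)]  # the ten constants used by AES-128
```

The reduction only matters from the ninth constant onward, where plain doubling would overflow a byte.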
3.1.8.KEY SCHEDULE CORE:
This operation is used as an inner loop in the key schedule, and is done as follows:
• The input is a 32-bit word and an iteration number i. The output is a 32-bit word.
• Copy the input over to the output.
• Use the rotate operation described above to rotate the output eight bits to the left.
• Apply Rijndael's S-box to all four individual bytes in the output word.
• On just the first (leftmost) byte of the output word, exclusive-or the byte with 2 to
the power of (i-1), computed in the Rijndael finite field. In other words, perform the Rcon operation with i as the input
and exclusive-or the Rcon output with the first byte of the output word.
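The steps above can be sketched as a single function. This is an illustrative software model, not the VHDL module: the S-box is rebuilt here from the standard AES definition so that the block is self-contained, and all names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def build_sbox():
    """Standard AES S-box: field inverse followed by the affine transformation."""
    sbox = [0x63] * 256  # entry 0: the inverse of 0 is taken as 0
    for x in range(1, 256):
        inv = next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox[x] = s ^ 0x63
    return sbox

SBOX = build_sbox()

def key_schedule_core(word, i):
    """Rotate left by one byte, substitute each byte, XOR Rcon(i) into the first byte."""
    out = word[1:] + word[:1]
    out = [SBOX[b] for b in out]
    rcon = 1
    for _ in range(i - 1):
        rcon = gf_mul(rcon, 2)  # 2^(i-1) in the Rijndael finite field
    out[0] ^= rcon
    return out

result = key_schedule_core([0x09, 0xCF, 0x4F, 0x3C], 1)
```

The input word used here is the last word of the FIPS-197 example key, so the result can be compared against the key expansion walkthrough in that standard.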
3.2. DECRYPTION IMPLEMENTATION:
The decryption implementation is similar to the encryption
implementation. The key schedule generation module is used in reverse order:
the last round key is treated as the first, and the remaining round keys follow in decreasing order.
Addroundkey(state, roundkey)
i = Nr
Repeat:
    Invsubbytes(state)
    Invshiftrows(state)
    Invmixcolumns(state)
    Addroundkey(state, roundkey)    (round key from the Key Schedule)
    i = i - 1
while i > 1
Invsubbytes(state)
Invshiftrows(state)
Addroundkey(state, roundkey)
FIGURE 3.2: FLOW CHART REPRESENTATION FOR AES DECRYPTION PROCESS
3.2.1.INVERSE SHIFT ROWS:
The inverse operation simply shifts each row to the right by the number of bytes given by
the row number.
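A minimal sketch of the inverse shift, again assuming a list-of-rows state; `shift_rows` is repeated only so that the round trip can be checked, and both names are illustrative.

```python
def shift_rows(rows):
    """Forward operation: rotate row r left by r bytes."""
    return [row[r:] + row[:r] for r, row in enumerate(rows)]

def inv_shift_rows(rows):
    """Inverse operation: rotate row r right by r bytes."""
    return [row[len(row) - r:] + row[:len(row) - r] for r, row in enumerate(rows)]

state = [
    [0x00, 0x01, 0x02, 0x03],
    [0x10, 0x11, 0x12, 0x13],
    [0x20, 0x21, 0x22, 0x23],
    [0x30, 0x31, 0x32, 0x33],
]
round_trip = inv_shift_rows(shift_rows(state))
```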
3.2.2.INVERSE SUB BYTES:
The inverse operation is performed using the inverted S-box.
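The inverted S-box is simply the forward table read backwards. The sketch below assumes the standard AES S-box construction and hypothetical helper names; it builds the forward table first and then swaps index and value.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def build_sbox():
    """Standard AES S-box: field inverse followed by the affine transformation."""
    sbox = [0x63] * 256
    for x in range(1, 256):
        inv = next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox[x] = s ^ 0x63
    return sbox

SBOX = build_sbox()

# Invert the table: if SBOX[x] == y then INV_SBOX[y] == x.
INV_SBOX = [0] * 256
for x, y in enumerate(SBOX):
    INV_SBOX[y] = x

def inv_sub_bytes(state):
    """Replace every state byte with the inverse S-box entry it indexes."""
    return [INV_SBOX[b] for b in state]
```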
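3.2.3.INVERSE MIX COLUMNS:
The inverse operation is performed by multiplying each column by the following
inverse matrix (hexadecimal coefficients in the Galois finite field):
0E 0B 0D 09
09 0E 0B 0D
0D 09 0E 0B
0B 0D 09 0E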
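A minimal sketch of the inverse column multiplication, assuming the standard AES inverse MixColumns coefficients (0E, 0B, 0D, 09, rotated one position per row). The names `gf_mul` and `inv_mix_column` are illustrative.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

# First row of the inverse matrix; row r is this row rotated right by r positions.
INV_COEFFS = [0x0E, 0x0B, 0x0D, 0x09]

def inv_mix_column(col):
    """Multiply one 4-byte column by the inverse MixColumns matrix."""
    out = []
    for r in range(4):
        acc = 0
        for c in range(4):
            acc ^= gf_mul(INV_COEFFS[(c - r) % 4], col[c])
        out.append(acc)
    return out

restored = inv_mix_column([0x8E, 0x4D, 0xA1, 0xBC])
```

The larger coefficients make this step more expensive in logic than the forward operation, which is one reason decryption data paths tend to use more LUTs.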
3.2.4. KEY SCHEDULE DESCRIPTION:
Rijndael's key schedule is done as follows:
1. The first n bytes of the expanded key are simply the encryption key.
2. The Rcon iteration value i is set to 1.
3. Until we have b bytes of expanded key, we do the following to generate n
more bytes of expanded key:
We do the following to create 4 bytes of expanded key:
1. We create a 4-byte temporary variable, t.
2. We assign the value of the previous four bytes in the expanded key to t.
3. We perform the key schedule core (see above) on t, with i as the Rcon iteration
value.
4. We increment i by 1.
5. We exclusive-or t with the four-byte block n bytes before the new expanded key.
This becomes the next 4 bytes in the expanded key.
We then do the following three times to create the next twelve bytes of expanded key:
1. We assign the value of the previous 4 bytes in the expanded key to t.
2. We exclusive-or t with the four-byte block n bytes before the new expanded
key. This becomes the next 4 bytes in the expanded key.
If we are processing a 256-bit key, we do the following to generate the next 4 bytes of
expanded key:
1. We assign the value of the previous four bytes in the expanded key to t.
2. We run each of the 4 bytes in t through Rijndael's S-box.
3. We exclusive-or t with the 4-byte block 32 bytes before the new expanded key.
This becomes the next 4 bytes in the expanded key.
If we are processing a 128-bit key, we do not perform the following steps. If we
are processing a 192-bit key, we run the following steps twice. If we are processing a
256-bit key, we run the following steps three times:
1. We assign the value of the previous 4 bytes in the expanded key to t.
2. We exclusive-or t with the four-byte block n bytes before the new expanded key.
This becomes the next 4 bytes in the expanded key.
3.3 CONSTANTS:
Since the key schedules for 128-bit, 192-bit, and 256-bit encryption are very
similar, with only some constants changed, the following key size constants are defined
here:
n has a value of 16 for 128-bit keys, 24 for 192-bit keys, and 32 for 256-bit keys.
b has a value of 176 for 128-bit keys, 208 for 192-bit keys, and 240 for 256-bit keys.
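The key schedule description and the constants n and b can be put together in one software model. The sketch below is an illustrative reference in Python, not the VHDL key schedule module; the helper names are hypothetical and the S-box is rebuilt from the standard AES definition to keep the block self-contained.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def build_sbox():
    """Standard AES S-box: field inverse followed by the affine transformation."""
    sbox = [0x63] * 256
    for x in range(1, 256):
        inv = next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox[x] = s ^ 0x63
    return sbox

SBOX = build_sbox()

def key_schedule_core(word, i):
    """Rotate, substitute, then XOR Rcon(i) into the first byte."""
    out = [SBOX[b] for b in word[1:] + word[:1]]
    rcon = 1
    for _ in range(i - 1):
        rcon = gf_mul(rcon, 2)
    out[0] ^= rcon
    return out

def expand_key(key):
    """Expand a 16-, 24- or 32-byte key into b bytes of round-key material."""
    n = len(key)
    b = {16: 176, 24: 208, 32: 240}[n]
    expanded = list(key)
    i = 1

    def append_xor(t):
        # XOR t with the 4-byte block n bytes before the new position and append it.
        start = len(expanded) - n
        block = expanded[start:start + 4]
        expanded.extend(x ^ y for x, y in zip(t, block))

    while len(expanded) < b:
        append_xor(key_schedule_core(expanded[-4:], i))
        i += 1
        for _ in range(3):
            append_xor(expanded[-4:])
        if n == 32:  # extra S-box step for 256-bit keys
            append_xor([SBOX[x] for x in expanded[-4:]])
        for _ in range({16: 0, 24: 2, 32: 3}[n]):
            append_xor(expanded[-4:])
    return expanded[:b]  # drop any bytes generated beyond b

round_keys = expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
```

For the 128-bit case used in this project, `round_keys` holds eleven 16-byte round keys; decryption reads the same array from the last round key back to the first.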
3.4 HARDWARE IMPLEMENTATION:
In this project, the hardware implementation is done on the Spartan-3E FPGA Starter
Kit.
3.4.1 SPARTAN-3E FPGA FEATURES AND EMBEDDED PROCESSING FUNCTIONS:
The Spartan-3E Starter Kit board highlights the unique features of the Spartan-
3E FPGA family and provides a convenient development board for embedded
processing applications. The board highlights these features:
• Spartan-3E specific features
   - Parallel NOR Flash configuration
   - MultiBoot FPGA configuration from Parallel NOR Flash PROM
   - SPI serial Flash configuration
• Embedded development
   - MicroBlaze™ 32-bit embedded RISC processor
   - PicoBlaze™ 8-bit embedded controller
   - DDR memory interfaces
Key Components and Features:
The key features of the Spartan-3E Starter Kit board are:

• Xilinx XC3S500E Spartan-3E FPGA
• Up to 232 user-I/O pins
• 320-pin FBGA package
• Over 10,000 logic cells
• Xilinx 4 Mbit Platform Flash configuration PROM
• Xilinx 64-macrocell XC2C64A CoolRunner CPLD
• 64 MByte (512 Mbit) of DDR SDRAM, x16 data interface, 100+ MHz
• 16 MByte (128 Mbit) of parallel NOR Flash (Intel StrataFlash)
   - FPGA configuration storage
   - MicroBlaze code storage/shadowing
• 16 Mbits of SPI serial Flash (STMicro)
   - FPGA configuration storage
   - MicroBlaze code shadowing
• 2-line, 16-character LCD screen
• PS/2 mouse or keyboard port
• VGA display port
• 10/100 Ethernet PHY (requires Ethernet MAC in FPGA)
• Two 9-pin RS-232 ports (DTE- and DCE-style)
• On-board USB-based FPGA/CPLD download/debug interface
• 50 MHz clock oscillator
• SHA-1 1-wire serial EEPROM for bitstream copy protection
• Hirose FX2 expansion connector
• Three Digilent 6-pin expansion connectors
• Four-output, SPI-based Digital-to-Analog Converter (DAC)
• Two-input, SPI-based Analog-to-Digital Converter (ADC) with programmable-
gain pre-amplifier
• ChipScope™ SoftTouch debugging port
• Rotary-encoder with push-button shaft
• Eight discrete LEDs
3.4.2 CHARACTER LCD SCREEN :
The Spartan-3E Starter Kit board prominently features a 2-line by 16-character liquid
crystal display (LCD). The FPGA controls the LCD via the 4-bit data interface shown in Figure 3.4.
Although the LCD supports an 8-bit data interface, the Starter Kit board uses a 4-bit data
interface to remain compatible with other Xilinx development boards and to minimize total
pin count.
FIGURE 3.4: CHARACTER LCD INTERFACE
Once mastered, the LCD is a practical way to display a variety of information using
standard ASCII and custom characters. However, these displays are not fast. Scrolling the
display at half-second intervals tests the practical limit for clarity. Compared with the 50
MHz clock available on the board, the display is slow. A PicoBlaze processor efficiently
controls display timing plus the actual content of the display.
3.4.3 CHARACTER LCD INTERFACE SIGNALS :
Table 3.4 shows the character LCD interface signals.
TABLE 3.4: CHARACTER LCD INTERFACE
3.4.4 VOLTAGE COMPATIBILITY :
The character LCD is powered by +5V. The FPGA I/O signals are powered by
3.3V. However, the FPGA's output levels are recognized as valid low or high logic levels by
the LCD. The LCD controller accepts 5V TTL signal levels, and the 3.3V LVCMOS outputs
provided by the FPGA meet the 5V TTL voltage level requirements.
The 390Ω series resistors on the data lines prevent overstressing of the FPGA and
StrataFlash I/O pins when the character LCD drives a high logic value. The character
LCD drives the data lines when LCD_RW is high. Most applications treat the LCD as a
write-only peripheral and never read from the display.
INTERACTION WITH INTEL STRATAFLASH :
As shown in Figure 3.4, the four LCD data signals are also shared with
StrataFlash data lines SF_D<11:8>. As shown in Table 3-2, the LCD/StrataFlash
interaction depends on the application usage in the design. When the StrataFlash
memory is disabled (SF_CE0 = high), then the FPGA application has full read/write access
to the LCD. Conversely, when LCD read operations are disabled (LCD_RW = low), then
the FPGA application has full read/write access to the StrataFlash memory.
TABLE 3-2: LCD/STRATAFLASH CONTROL INTERACTION
SF_CE0   SF_BYTE   LCD_RW   OPERATION
1        X         X        StrataFlash disabled. Full read/write access to LCD.
X        X         0        LCD write access only. Full access to StrataFlash.
X        0         X        StrataFlash in byte-wide (x8) mode. Upper data lines are not
                            used. Full access to both LCD and StrataFlash.
Note: 'X' indicates a don't care; it can be either 0 or 1.
If the StrataFlash memory is in byte-wide (x8) mode (SF_BYTE = low), the
FPGA application has full simultaneous read/write access to both the LCD and the
StrataFlash memory. In byte-wide mode, the StrataFlash memory does not use
the SF_D<15:8> data lines.
UCF Location Constraints :
The following listing provides the UCF constraints for the character LCD, including the
I/O pin assignment and the I/O standard used.
NET "LCD_E" LOC = "M18" | IOSTANDARD = LVCMOS33 | DRIVE = 4 | SLEW = SLOW ;
NET "LCD_RS" LOC = "L18" | IOSTANDARD = LVCMOS33 | DRIVE = 4 | SLEW = SLOW ;
NET "LCD_RW" LOC = "L17" | IOSTANDARD = LVCMOS33 | DRIVE = 4 | SLEW = SLOW ;
# The LCD 4-bit data interface is shared with the StrataFlash
NET "SF_D<8>" LOC = "R15" | IOSTANDARD = LVCMOS33 | DRIVE = 4 | SLEW = SLOW ;
NET "SF_D<9>" LOC = "R16" | IOSTANDARD = LVCMOS33 | DRIVE = 4 | SLEW = SLOW ;
NET "SF_D<10>" LOC = "P17" | IOSTANDARD = LVCMOS33 | DRIVE = 4 | SLEW = SLOW ;
NET "SF_D<11>" LOC = "M15" | IOSTANDARD = LVCMOS33 | DRIVE = 4 | SLEW = SLOW ;
3.5 LCD CONTROLLER :
The 2 x 16 character LCD has an internal Sitronix ST7066U graphics controller that is
functionally equivalent to the following devices:
• Samsung S6A0069X or KS0066U
• Hitachi HD44780
• SMOS SED1278
3.5.1 MEMORY MAP :
The controller has three internal memory regions, each with a specific purpose. The
display must be initialized before accessing any of these memory regions.
3.5.2 DD RAM :
The display data RAM (DD RAM) stores the character code to be displayed on the
screen. Most applications interact primarily with DD RAM. The character code stored in a
DD RAM location references a specific character bitmap stored either in the predefined
CG ROM character set or in the user-defined CG RAM character set.
Figure 3.5 shows the default address for the 32 character locations on the display.
The upper line of characters is stored between addresses 0x00 and 0x0F. The second line
of characters is stored between addresses 0x40 and 0x4F.
FIGURE 3.5: DD RAM HEXADECIMAL ADDRESSES (NO DISPLAY SHIFTING)
Physically, there are 80 total character locations in DD RAM, with 40 characters
available per line. Locations 0x10 through 0x27 and 0x50 through 0x67 can be used to
store other non-display data. Alternatively, these locations can also store characters that
can only be displayed using the controller's display shifting functions.
The Set DD RAM Address command initializes the address counter before reading
or writing to DD RAM. Write DD RAM data using the Write Data to CG RAM or DD RAM
command, and read DD RAM using the Read Data from CG RAM or DD RAM command.
The DD RAM address counter either remains constant after read or write
operations, or auto-increments or auto-decrements by one location, as defined by the I/D
bit set by the Entry Mode Set command.
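The address layout described above can be captured in a few lines. This is a minimal sketch with hypothetical helper names; the command encoding (DB7 set, followed by the 7-bit address) is the usual HD44780-style Set DD RAM Address format and is stated here as an assumption, since the command table itself is not reproduced in this text.

```python
LINE_BASE = (0x00, 0x40)   # DD RAM start address of the first and second line
CHARS_PER_LINE = 40        # locations stored per line
VISIBLE_CHARS = 16         # locations shown without display shifting

def dd_ram_address(line, column):
    """Return the DD RAM address of a character position (line 0 or 1)."""
    if line not in (0, 1):
        raise ValueError("line must be 0 or 1")
    if not 0 <= column < CHARS_PER_LINE:
        raise ValueError("column must be in 0..39")
    return LINE_BASE[line] + column

def is_visible(column):
    """True if the column is on screen when the display is not shifted."""
    return 0 <= column < VISIBLE_CHARS

def set_dd_ram_address_command(address):
    """Encode a Set DD RAM Address command byte for a 7-bit address."""
    return 0x80 | (address & 0x7F)
```

For example, the first character of the second line sits at address 0x40, which gives the command byte 0xC0.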
3.5.3 CG ROM:
The character generator ROM (CG ROM) contains the font bitmap for each of the
predefined characters that the LCD screen can display, shown in Figure 3.5.1. The character
code stored in DD RAM for each character location subsequently references a position
within the CG ROM. For example, a hexadecimal character code of 0x53 stored in a DD RAM
location displays the character 'S'. The upper nibble of 0x53 equates to
DB[7:4] = "0101" binary and the lower nibble equates to DB[3:0] = "0011" binary. As shown
in Figure 3.5.1, the character 'S' appears on the screen. English/Roman characters are stored
in CG ROM at their equivalent ASCII code address.
The character ROM contains the ASCII English character set and Japanese kana
characters. The controller also provides for eight custom character bitmaps, stored in CG
RAM. These eight custom characters are displayed by storing character codes 0x00
through 0x07 in a DD RAM location.
FIGURE 3.5.1: LCD CHARACTER SET
3.5.4 CG RAM :
The character generator RAM (CG RAM) provides space to create eight custom
character bitmaps. Each custom character location consists of a 5-dot by 8-line bitmap, as
shown in Figure 3.4.
The Set CG RAM Address command initializes the address counter before reading or
writing to CG RAM. Write CG RAM data using the Write Data to CG RAM or DD RAM
command, and read CG RAM using the Read Data from CG RAM or DD RAM command.
The CG RAM address counter can either remain constant after read or write
operations, or auto-increment or auto-decrement by one location, as defined by the I/D
bit set by the Entry Mode Set command.
Figure 3.4 provides an example, creating a special checkerboard character. The
custom character is stored in the fourth CG RAM character location, which is displayed
when a DD RAM location is 0x03. To write the custom character, the CG RAM address is
first initialized using the Set CG RAM Address command. The upper three address bits point
to the custom character location.
The lower three address bits point to the row address for the character bitmap. The
Write Data to CG RAM or DD RAM command is used to write each character bitmap row. A
'1' lights a bit on the display. A '0' leaves the bit unlit. Only the lower five data bits are
used; the upper three data bits are don’t care positions. The eighth row of bitmap data is
usually left as all zeros to accommodate the cursor.
Figure 3.4: Example Custom Checkerboard Character With Character Code 0x03
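The addressing scheme for custom characters can be sketched as below. The command encoding (DB6 set, followed by a 6-bit address) is the usual HD44780-style Set CG RAM Address format and is an assumption here, as is the exact dot pattern, which only illustrates a checkerboard and is not copied from the figure. All names are hypothetical.

```python
def set_cg_ram_address_command(char_index, row):
    """Encode Set CG RAM Address: upper 3 address bits = character, lower 3 = row."""
    if not 0 <= char_index < 8 or not 0 <= row < 8:
        raise ValueError("char_index and row must be in 0..7")
    return 0x40 | (char_index << 3) | row

def checkerboard_rows():
    """Seven alternating 5-bit rows, with the eighth row left blank for the cursor."""
    rows = [0b10101 if r % 2 == 0 else 0b01010 for r in range(7)]
    rows.append(0b00000)
    return rows

def custom_char_writes(char_index, rows):
    """Pair each bitmap row with the CG RAM address command that selects it."""
    return [
        (set_cg_ram_address_command(char_index, r), bits & 0x1F)
        for r, bits in enumerate(rows)
    ]

# Fourth custom character location, displayed by character code 0x03.
writes = custom_char_writes(3, checkerboard_rows())
```

With the address counter in auto-increment mode only the first address command is strictly needed; pairing every row with its address keeps the sketch independent of that setting.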
3.6 COMMAND SET :
Table 3.5 summarizes the available LCD controller commands and bit
definitions. Because the display is set up for 4-bit operation, each 8-bit command is
sent as two 4-bit nibbles. The upper nibble is transferred first, followed by the lower
nibble.
TABLE 3.5: LCD CHARACTER DISPLAY COMMAND SET
3.6.1 DISABLED :
If the LCD_E enable signal is low, all other inputs to the LCD are ignored.
3.6.2 CLEAR DISPLAY :
Clears the display and returns the cursor to the home position, the top-left corner.
This command writes a blank space (ASCII/ANSI character code 0x20) into all DD RAM
addresses. The address counter is reset to 0, location 0x00 in DD RAM. It also clears all option
settings. The I/D control bit is set to 1 (increment address counter mode) in the Entry Mode
Set command. Execution time: 82µs – 1.64 ms.
3.6.3 RETURN CURSOR HOME :
Returns the cursor to the home position, the top-left corner. DD RAM contents are
unaffected. If the display was shifted, it is also returned to its original position, shown in
Figure 3.5. The address counter is reset to 0, location 0x00 in DD RAM. The cursor or
blink moves to the top-left character location. Execution time: 40µs – 1.6 ms.
3.7 ENTRY MODE SET :
Sets the cursor move direction and specifies whether or not to shift the display.
These operations are performed during data reads and writes. Execution time: 40µs.
3.7.1 Bit DB1: (I/D) Increment/Decrement
0 Auto-decrement address counter. Cursor/blink moves to left.
1 Auto-increment address counter. Cursor/blink moves to right.
This bit either auto-increments or auto-decrements the DD RAM and CG RAM
address counter by one location after each Write Data to CG RAM or DD RAM or Read Data
from CG RAM or DD RAM command. The cursor or blink position moves accordingly.
3.7.2 Bit DB0: (S) Shift
0 Shifting disabled.
1 During a DD RAM write operation, shift the entire display value in the
  direction controlled by bit DB1 (I/D). It appears as though the cursor
  position remains constant and the display moves.
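The two control bits combine into a single command byte. The sketch below assumes the usual HD44780-style Entry Mode Set base value of 0x04 (DB2 set), which is consistent with the 0x06 value used later for auto-increment without shifting; the function name is illustrative.

```python
def entry_mode_set(increment=True, shift=False):
    """Encode Entry Mode Set: DB1 = I/D (increment/decrement), DB0 = S (shift)."""
    return 0x04 | (int(increment) << 1) | int(shift)

default_mode = entry_mode_set(increment=True, shift=False)
```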
3.8 DISPLAY ON/OFF
Display is turned on or off, controlling all characters, the cursor, and the cursor position
character (underscore) blink. Execution time: 40µs.
3.8.1 Bit DB2: (D) Display On/Off
0 No characters displayed. However, data stored in DD RAM is retained.
1 Display characters stored in DD RAM.
3.8.2 Bit DB1: (C) Cursor On/Off :
The cursor uses the five dots on the bottom line of the character. The cursor
appears as a line under the displayed character.
0 No cursor.
1 Display cursor.
3.8.3 Bit DB0: (B) Cursor Blink On/Off :
0 No cursor blinking.
1 Cursor blinks on and off approximately every half second.
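The three bits above form the low bits of the Display On/Off command. This minimal sketch assumes the usual HD44780-style base value of 0x08 (DB3 set), which matches the 0x0C value used during display configuration; the function name is illustrative.

```python
def display_on_off(display=True, cursor=False, blink=False):
    """Encode Display On/Off: DB2 = D (display), DB1 = C (cursor), DB0 = B (blink)."""
    return 0x08 | (int(display) << 2) | (int(cursor) << 1) | int(blink)

display_only = display_on_off(display=True, cursor=False, blink=False)
```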
3.9 CURSOR AND DISPLAY SHIFT
Moves the cursor and shifts the display without changing DD RAM contents. Shifts the
cursor position or display to the right or left without writing or reading display data.
This function positions the cursor in order to modify an individual character, or to
scroll the display window left or right to reveal additional data stored in the DD RAM,
beyond the 16th character on a line. The cursor automatically moves to the second line
when it shifts beyond the 40th character location of the first line. The first and second line
displays shift at the same time. When the displayed data is shifted repeatedly, both lines
move horizontally. The second display line does not shift into the first display line.
Execution time: 40µs.
DB3 (S/C)   DB2 (R/L)   Operation
0           0           Shift the cursor position to the left. The address counter is
                        decremented by one.
0           1           Shift the cursor position to the right. The address counter is
                        incremented by one.
1           0           Shift the entire display to the left. The cursor follows the
                        display shift. The address counter is unchanged.
1           1           Shift the entire display to the right. The cursor follows the
                        display shift. The address counter is unchanged.
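The table rows map directly onto command bytes. The sketch below assumes the usual HD44780-style base value of 0x10 (DB4 set) for the Cursor and Display Shift command; the function name is illustrative.

```python
def cursor_display_shift(shift_display, to_right):
    """Encode Cursor and Display Shift: DB3 = S/C, DB2 = R/L."""
    return 0x10 | (int(shift_display) << 3) | (int(to_right) << 2)

# Scrolling the whole display one position to the left.
scroll_left = cursor_display_shift(shift_display=True, to_right=False)
```

Issuing `scroll_left` repeatedly at a fixed interval is one way to reveal text stored beyond the 16th character of a line.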
3.9.1 FUNCTION SET :
Sets interface data length, number of display lines, and character font. The Starter
Kit board supports a single function set with value 0x28. Execution time: 40µs.
3.9.2 SET CG RAM ADDRESS :
Sets the initial CG RAM address. After this command, all subsequent read or write
operations to the display are to or from CG RAM. Execution time: 40µs.
3.9.3 SET DD RAM ADDRESS :
Sets the initial DD RAM address. After this command, all subsequent read or write
operations to the display are to or from DD RAM. The addresses for displayed characters
appear in Figure 3.5. Execution time: 40µs.
3.9.4 READ BUSY FLAG AND ADDRESS
Reads the busy flag (BF) to determine if an internal operation is in progress, and reads
the current address counter contents.
BF = 1 indicates that an internal operation is in progress. The next instruction is not
accepted until BF is cleared or until the current instruction is allowed the maximum time to
execute.
This command also returns the present value of the address counter. The address
counter is used for both CG RAM and DD RAM addresses. The specific context depends on
the most recent Set CG RAM Address or Set DD RAM Address command issued.
Execution time: 1µs.
3.9.5 WRITE DATA TO CG RAM OR DD RAM :
Writes data into DD RAM if the command follows a previous Set DD RAM Address
command, or writes data into CG RAM if the command follows a previous Set CG RAM Address
command.
After the write operation, the address is automatically incremented or
decremented by 1 according to the Entry Mode Set command. The entry mode also
determines display shift.
Execution time: 40µs.
3.9.6 READ DATA FROM CG RAM OR DD RAM:
Reads data from DD RAM if the command follows a previous Set DD RAM Address
command, or reads data from CG RAM if the command follows a previous Set CG RAM
Address command. After the read operation, the address is automatically incremented or
decremented by 1 according to the Entry Mode Set command. However, a display shift is
not executed during read operations. Execution time: 40µs.
3.10 OPERATION:
3.10.1 FOUR BIT DATA INTERFACE :
The board uses a 4-bit data interface to the character LCD.
Figure 3.6 illustrates a write operation to the LCD, showing the minimum times
allowed for setup, hold, and enable pulse length relative to the 50 MHz clock (20 ns
period) provided on the board.
FIGURE 3.6 : CHARACTER LCD INTERFACE TIMING
The data values on SF_D<11:8>, the register select (LCD_RS), and the read/write
(LCD_RW) control signals must be set up and stable at least 40 ns before the enable
LCD_E goes high. The enable signal must remain high for 230 ns or longer, the
equivalent of 12 or more clock cycles at 50 MHz.
In many applications, the LCD_RW signal can be tied low permanently because the
FPGA generally has no reason to read information from the display.
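The cycle counts quoted in this chapter all follow from the 20 ns clock period. As a quick check, the sketch below converts each minimum time to a whole number of 50 MHz clock cycles, rounding up; the names are illustrative.

```python
CLOCK_PERIOD_NS = 20  # 50 MHz board clock

def cycles_for(duration_ns):
    """Smallest whole number of clock cycles covering at least duration_ns."""
    return -(-duration_ns // CLOCK_PERIOD_NS)  # integer ceiling division

timing_cycles = {
    "setup (40 ns)": cycles_for(40),
    "enable pulse (230 ns)": cycles_for(230),
    "nibble gap (1 us)": cycles_for(1_000),
    "command gap (40 us)": cycles_for(40_000),
    "clear display (1.64 ms)": cycles_for(1_640_000),
}
```

The 230 ns enable pulse is 11.5 clock periods, which is why it is rounded up to 12 cycles.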
3.10.2 TRANSFERRING 8-BIT DATA OVER THE 4-BIT INTERFACE :
After initializing the display and establishing communication, all commands and
data transfers to the character display are 8 bits wide, transferred using two sequential 4-bit
operations. Each 8-bit transfer must be decomposed into two 4-bit transfers, spaced apart
by at least 1 µs, as shown in Figure 3.6. The upper nibble is transferred first, followed by
the lower nibble. An 8-bit write operation must be spaced at least 40 µs before the next
communication. This delay must be increased to 1.64 ms following a Clear Display
command.
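The decomposition of one byte into two timed nibble writes can be sketched as a simple schedule. The step names and the tuple format are hypothetical; a hardware controller would implement the same order with a small state machine and counters.

```python
def split_nibbles(byte):
    """Return (upper nibble, lower nibble) of an 8-bit value."""
    return (byte >> 4) & 0xF, byte & 0xF

def transfer_schedule(byte, clear_display=False):
    """List the nibble writes and minimum waits (in ns) for one 8-bit transfer."""
    upper, lower = split_nibbles(byte)
    final_wait_ns = 1_640_000 if clear_display else 40_000
    return [
        ("write_nibble", upper),      # upper nibble goes first
        ("wait_ns", 1_000),           # at least 1 us between the two nibbles
        ("write_nibble", lower),
        ("wait_ns", final_wait_ns),   # 40 us, or 1.64 ms after Clear Display
    ]

schedule = transfer_schedule(0x53)
```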
3.10.3 INITIALIZING THE DISPLAY
After power-on, the display must be initialized to establish the required
communication protocol. The initialization sequence is simple and ideally suited to the
highly efficient 8-bit PicoBlaze embedded controller. After initialization, the PicoBlaze
controller is available for more complex control or computation beyond simply driving the
display.
3.10.4 POWER-ON INITIALIZATION:
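The initialization sequence first establishes that the FPGA application wishes to use
the four-bit data interface to the LCD, as follows:
• Wait 15 ms or longer, although the display is generally ready when the FPGA
finishes configuration. The 15 ms interval is 750,000 clock cycles at 50 MHz.
• Write SF_D<11:8> = 0x3, pulse LCD_E high for 12 clock cycles.
• Wait 4.1 ms or longer, which is 205,000 clock cycles at 50 MHz.
• Write SF_D<11:8> = 0x3, pulse LCD_E high for 12 clock cycles.
• Wait 100 µs or longer, which is 5,000 clock cycles at 50 MHz.
• Write SF_D<11:8> = 0x3, pulse LCD_E high for 12 clock cycles.
• Wait 40 µs or longer, which is 2,000 clock cycles at 50 MHz.
• Write SF_D<11:8> = 0x2, pulse LCD_E high for 12 clock cycles.
• Wait 40 µs or longer, which is 2,000 clock cycles at 50 MHz.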
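The power-on sequence can be written as data and converted to clock-cycle counts. The sketch below uses the usual HD44780-style four-bit start-up pattern (three writes of 0x3 followed by one write of 0x2, each followed by a wait); where that goes beyond the steps listed above it should be read as an assumption. All names are illustrative.

```python
CLOCK_PERIOD_NS = 20      # 50 MHz board clock
ENABLE_PULSE_CYCLES = 12  # LCD_E high time, covering the 230 ns minimum

# (kind, value): waits are in nanoseconds, writes are the nibble on SF_D<11:8>.
POWER_ON_SEQUENCE = [
    ("wait", 15_000_000),
    ("write", 0x3),
    ("wait", 4_100_000),
    ("write", 0x3),
    ("wait", 100_000),
    ("write", 0x3),
    ("wait", 40_000),
    ("write", 0x2),
    ("wait", 40_000),
]

def to_clock_steps(sequence):
    """Convert the sequence into wait counts and enable pulses in clock cycles."""
    steps = []
    for kind, value in sequence:
        if kind == "wait":
            steps.append(("wait_cycles", -(-value // CLOCK_PERIOD_NS)))
        else:
            steps.append(("pulse_nibble", value, ENABLE_PULSE_CYCLES))
    return steps

steps = to_clock_steps(POWER_ON_SEQUENCE)
```

Holding the sequence as a table keeps the timing values in one place, whether the controller is a PicoBlaze program or a dedicated state machine.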
3.10.5 DISPLAY CONFIGURATION :
After the power-on initialization is completed, the four-bit interface is
established. The next part of the sequence configures the display:
• Issue a Function Set command, 0x28, to configure the display for operation on
the Spartan-3E Starter Kit board.
• Issue an Entry Mode Set command, 0x06, to set the display to automatically
increment the address pointer.
• Issue a Display On/Off command, 0x0C, to turn the display on and disable the
cursor and blinking.
• Finally, issue a Clear Display command. Allow at least 1.64 ms (82,000 clock
cycles) after issuing this command.
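The configuration commands can be listed with their minimum waits and split into nibble pairs. The Clear Display opcode 0x01 is the usual HD44780-style value and is an assumption here, since the text does not quote it; the other command values come from the list above. Names are illustrative.

```python
CLOCK_PERIOD_NS = 20  # 50 MHz board clock

# (command name, command byte, minimum wait afterwards in ns)
CONFIG_SEQUENCE = [
    ("Function Set", 0x28, 40_000),
    ("Entry Mode Set", 0x06, 40_000),
    ("Display On/Off", 0x0C, 40_000),
    ("Clear Display", 0x01, 1_640_000),  # opcode assumed, see lead-in
]

def to_nibble_writes(sequence):
    """Return (name, upper nibble, lower nibble, wait in clock cycles) per command."""
    writes = []
    for name, byte, wait_ns in sequence:
        writes.append((name, (byte >> 4) & 0xF, byte & 0xF, wait_ns // CLOCK_PERIOD_NS))
    return writes

config_writes = to_nibble_writes(CONFIG_SEQUENCE)
```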
3.10.6 WRITING DATA TO THE DISPLAY :
To write data to the display, specify the start address, followed by one or more data
values. Before writing any data, issue a Set DD RAM Address command to specify the initial
7-bit address in the DD RAM. See Figure 3.3 for DD RAM locations.
Write data to the display using a Write Data to CG RAM or DD RAM command. The 8-bit
data value represents the look-up address into the CG ROM or CG RAM, shown in
Figure 3.4. The stored bitmap in the CG ROM or CG RAM drives the 5 x 8 dot matrix to
represent the associated character.
If the address counter is configured to auto-increment, as described earlier, the
application can sequentially write multiple character codes, and each character is
automatically stored and displayed in the next available location.
Continuing to write characters, however, eventually falls off the end of the first
display line. The additional characters do not automatically appear on the second line
because the DD RAM map is not consecutive from the first line to the second.
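The gap in the DD RAM map means the application has to issue a new Set DD RAM Address command before writing to the second line. The sketch below shows one way to compute that command byte. It assumes the layout commonly used by HD44780-style controllers: line 1 starts at address 0x00, line 2 starts at 0x40, the command is formed by setting the top bit above the 7-bit address, and the display shows 16 characters per line. These constants are assumptions; Figure 3.3 remains the reference for the actual locations.

```python
LINE_BASE = (0x00, 0x40)  # assumed start addresses of display lines 1 and 2
SET_DDRAM_FLAG = 0x80     # assumed: top bit marks a Set DD RAM Address command
VISIBLE_COLUMNS = 16      # assumed 2 x 16 character display


def set_ddram_command(line: int, column: int) -> int:
    """Build the command byte that moves the address counter to (line, column)."""
    if line not in (0, 1):
        raise ValueError("line must be 0 (first) or 1 (second)")
    if not 0 <= column < VISIBLE_COLUMNS:
        raise ValueError("column is outside the visible area")
    address = LINE_BASE[line] + column  # 7-bit DD RAM address
    return SET_DDRAM_FLAG | address


if __name__ == "__main__":
    print(hex(set_ddram_command(0, 0)))  # start of the first line
    print(hex(set_ddram_command(1, 0)))  # start of the second line
```

With auto-increment enabled, one such command per line is enough; the characters that follow are placed in consecutive locations.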
3.10.7 DISABLING THE UNUSED LCD :
If the FPGA application does not use the character LCD screen, drive the LCD_E
pin low to disable it. Also drive the LCD_RW pin low to prevent the LCD screen from
presenting data.
CHAPTER 4
RESULTS AND DISCUSSIONS
4.1 RTL SCHEMATIC FOR ENCRYPTION:
4.2 SIMULATION WAVE FORM FOR AES ENCRYPTION:
4.4 SYNTHESIS REPORT FOR AES ENCRYPTION:
====================================================================
* Final Report *
====================================================================
Final Results
RTL Top Level Output File Name : encryption.ngr
Top Level Output File Name : encryption
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
# IOs : 258
Cell Usage :
# BELS : 47282
# GND :1
# INV : 182
# LUT1 : 164
# LUT2 : 1776
# LUT2_L : 111
# LUT3 : 231
# LUT3_D :8
# LUT3_L : 609
# LUT4 : 22881
# LUT4_D : 14
# LUT4_L : 273
# MUXF5 : 11338
# MUXF6 : 5698
# MUXF7 : 2839
# MUXF8 : 1156
# VCC :1
# FlipFlops/Latches : 2784
# FD : 2626
# FDE : 129
# FDR : 26
# FDS :3
# Shift Registers : 11
# SRL16 : 11
# Clock Buffers :1
# BUFGP :1
# IO Buffers : 256
# OBUF : 256
====================================================================
Device utilization summary:
---------------------------
Selected Device : 3s500efg320-5
Number of Slices: 14517 out of 4656 311% (*)

Number of Slice Flip Flops: 2784 out of 9312 29%
Number of 4 input LUTs: 26260 out of 9312 282% (*)
Number used as logic: 26249
Number used as Shift registers: 11
Number of IOs: 258
Number of bonded IOBs: 257 out of 232 110% (*)
Number of GCLKs: 1 out of 24 4%
WARNING:Xst:1336 - (*) More than 100% of Device resources are used
---------------------------
Partition Resource Summary:
---------------------------
No Partitions were found in this design.
---------------------------
====================================================================
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE.

FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT
GENERATED AFTER PLACE-and-ROUTE.
Clock Information:
------------------
-----------------------------------+------------------------+-------+
Clock Signal | Clock buffer(FF name) | Load |
-----------------------------------+------------------------+-------+
clk | BUFGP | 2795 |
-----------------------------------+------------------------+-------+
Asynchronous Control Signals Information:

----------------------------------------
No asynchronous control signals found in this design
Timing Summary:
---------------
Speed Grade: -5
Minimum period: 6.654ns (Maximum Frequency: 150.280MHz)

Minimum input arrival time before clock: No path found
Maximum output required time after clock: 4.040ns
Maximum combinational path delay: No path found
Timing Detail:
--------------
All values displayed in nanoseconds (ns)
====================================================================
Timing constraint: Default period analysis for Clock 'clk'
Clock period: 6.654ns (frequency: 150.280MHz)
Total number of paths / destination ports: 591072 / 2926
-------------------------------------------------------------------------
Delay: 6.654ns (Levels of Logic = 7)
Source: a35/x05_15 (FF)
Destination: a39/x05_109 (FF)
Source Clock: clk rising
Destination Clock: clk rising
Data Path: a35/x05_15 to a39/x05_109

Gate Net
Cell:in->out fanout Delay Delay Logical Name (Net Name)
---------------------------------------- ------------
FD:C->Q 11 0.514 0.796 a35/x05_15 (a35/x05_15)
LUT4:I3->O 3 0.612 0.481 a36/h1<12>_SW0 (N1659)
LUT4:I2->O 125 0.612 1.128 Mxor_data9_Result<12>1 (data9<12>)
LUT4:I2->O 1 0.612 0.000 a38/s2/Mrom_sbout1 (a38/s2/Mrom_sbout)
MUXF5:I1->O 1 0.278 0.000 a38/s2/Mrom_sbout_f5 (a38/s2/Mrom_sbout_f5)
MUXF8:I1->O 1 0.451 0.000 a38/s2/Mrom_sbout_f8 (sb9<8>)
FD:D 0.268 a39/x05_104
----------------------------------------
Total 6.654ns (4.249ns logic, 2.405ns route)
(63.9% logic, 36.1% route)
====================================================================
Timing constraint: Default OFFSET OUT AFTER for Clock 'clk'
-------------------------------------------------------------------------
Offset: 4.040ns (Levels of Logic = 1)
Source: eout_127 (FF)
Destination: eout<127> (PAD)
Data Path: eout_127 to eout<127>

Gate Net
---------------------------------------- ------------
FDE:C->Q 1 0.514 0.357 eout_127 (eout_127)
OBUF:I->O 3.169 eout_127_OBUF (eout<127>)
----------------------------------------
(91.2% logic, 8.8% route)
====================================================================
Total REAL time to Xst completion: 359.00 secs

Total CPU time to Xst completion: 359.06 secs
-->
Total memory usage is 302972 kilobytes
Number of errors : 0 ( 0 filtered)

Number of warnings : 376 ( 0 filtered)
Number of infos : 34 ( 0 filtered)
4.5 RTL SCHEMATIC FOR AES DECRYPTION
4.7 SIMULATION WAVE FORM FOR AES DECRYPTION:
4.8 SYNTHESIS REPORT FOR AES DECRYPTION:
====================================================================
* Final Report *
====================================================================
Final Results
RTL Top Level Output File Name : decryption.ngr
Top Level Output File Name : decryption
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
# IOs : 258
Cell Usage :
# BELS : 56417
# GND :1
# INV : 27
# LUT1 : 143
# LUT2 : 1461
# LUT2_D : 788
# LUT2_L : 101
# LUT3 : 506
# LUT3_D :5
# LUT3_L : 13
# LUT4 : 28454
# LUT4_D : 604
# LUT4_L : 967
# MUXF5 : 12541
# MUXF6 : 6272
# MUXF7 : 3136
# MUXF8 : 1397
# VCC :1
# FlipFlops/Latches : 3118
# FD : 2815
# FDE : 129
# FDR : 99
# FDS :3
# LDCP : 72
# Shift Registers : 11
# SRL16 : 11
# Clock Buffers :1
# BUFGP :1
# IO Buffers : 256
# OBUF : 256
====================================================================
Device utilization summary:

---------------------------
Selected Device : 3s100evq100-5
Number of Slices: 16884 out of 960 1758% (*)

Number of Slice Flip Flops: 3118 out of 1920 162% (*)
Number of 4 input LUTs: 33080 out of 1920 1722% (*)
Number used as logic: 33069
Number used as Shift registers: 11
Number of IOs: 258
Number of bonded IOBs: 257 out of 66 389% (*)
Number of GCLKs: 1 out of 24 4%
WARNING:Xst:1336 - (*) More than 100% of Device resources are used
---------------------------
Partition Resource Summary:
---------------------------
No Partitions were found in this design.
---------------------------
====================================================================
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE.

FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT
GENERATED AFTER PLACE-and-ROUTE.
Clock Information:
------------------
---------------------------------------------+------------------------+-------+
Clock Signal | Clock buffer(FF name) | Load |
---------------------------------------------+------------------------+-------+
clk | BUFGP | 3057 |
d37/x40_8_cmp_lt0000(d37/x40_8_cmp_lt00001:O)| NONE(*)(d37/x40_8) |8 |
d25/x40_8_cmp_lt0000(d25/x40_8_cmp_lt00001:O)| NONE(*)(d25/x40_8) | 8 |
---------------------------------------------+------------------------+-------+
(*) These 9 clock signal(s) are generated by combinatorial logic,
and XST is not able to identify which are the primary clock signals.
Please use the CLOCK_SIGNAL constraint to specify the clock signal(s) generated by
combinatorial logic.
INFO:Xst:2169 - HDL ADVISOR - Some clock signals were not automatically buffered by XST
with BUFG/BUFR resources. Please use the buffer_type constraint in order to insert these
buffers to the clock signals to help prevent skew problems.
Timing Summary:
---------------
Speed Grade: -5
Minimum period: 9.510ns (Maximum Frequency: 105.149MHz)

Minimum input arrival time before clock: No path found
Maximum output required time after clock: 4.040ns
Maximum combinational path delay: No path found
Timing Detail:
--------------
All values displayed in nanoseconds (ns)
====================================================================
Timing constraint: Default period analysis for Clock 'clk'
Clock period: 9.510ns (frequency: 105.149MHz)
-------------------------------------------------------------------------
Delay: 9.510ns (Levels of Logic = 9)
Source: d35/x09_66 (FF)
Destination: d38/x09_19 (FF)
Destination Clock: clk rising
Data Path: d35/x09_66 to d38/x09_19

Gate Net
---------------------------------------- ------------
FD:C->Q 127 0.514 1.250 d35/x09_66 (d35/x09_66)
LUT1:I0->O 1 0.612 0.000 d36/is9/Mrom_sibout14_f5_4_rt
(d36/is9/Mrom_sibout14_f5_4_rt)
MUXF5:I1->O 1 0.278 0.000 d36/is9/Mrom_sibout14_f5_4
(d36/is9/Mrom_sibout14_f55)
MUXF8:I0->O 23 0.451 1.091 d36/is9/Mrom_sibout14_f8 (isb9<71>)
LUT2:I1->O 16 0.612 0.882 Mxor_round9_Result<71>1 (round9<71>)
LUT4_D:I3->O 3 0.612 0.454 d37/x40_15_mux00011 (d37/x40_15_mux0001)
LUT4_D:I3->O 1 0.612 0.360 d37/x40_19_mux00001 (d37/x40_19_mux0000)
LUT4:I3->O 1 0.612 0.000 d37/Mxor_rout<83>_xo<0>39 (imout9<83>)
FD:D 0.268 d38/x09_19
----------------------------------------
(57.5% logic, 42.5% route)
====================================================================
Timing constraint: Default OFFSET OUT AFTER for Clock 'clk'
-------------------------------------------------------------------------
Offset: 4.040ns (Levels of Logic = 1)
Source: deout_127 (FF)
Destination: deout<127> (PAD)
Data Path: deout_127 to deout<127>

Gate Net
---------------------------------------- ------------
FDE:C->Q 1 0.514 0.357 deout_127 (deout_127)
OBUF:I->O 3.169 deout_127_OBUF (deout<127>)
----------------------------------------
(91.2% logic, 8.8% route)
====================================================================
Total REAL time to Xst completion: 302.00 secs

Total CPU time to Xst completion: 301.67 secs
Total memory usage is 339772 kilobytes
Number of errors : 0 ( 0 filtered)
Number of warnings : 126 ( 0 filtered)
Number of infos : 32 ( 0 filtered)
CHAPTER 5
SUMMARY
5.1 PROJECT SUMMARY:
The Advanced Encryption Standard (AES) is a security standard from NIST that became
effective on May 26, 2002, replacing DES. The cryptographic scheme is a
symmetric block cipher that encrypts and decrypts 128-bit blocks of data. Key lengths of
128, 192 and 256 bits are the standard key lengths used by AES.
Plain text refers to the data to be encrypted. Cipher text refers to the data after
it has gone through the cipher, as well as the data that goes into the decipher. The
state is an intermediate form of the cipher or decipher result, usually displayed as a
rectangular table of bytes with 4 rows and 4 columns.
The first stage, the "SubBytes" transformation, is a non-linear byte substitution applied
to each byte of the block. The second stage, the "ShiftRows" transformation, cyclically
shifts (permutes) the bytes within the block. The third stage, the "MixColumns"
transformation, groups 4 bytes together to form 4-term polynomials and multiplies each
polynomial by a fixed polynomial modulo (x^4 + 1). The fourth stage, the "AddRoundKey"
transformation, adds the round key to the block of data. The decipher is simply the
inverse of the cipher.
The algorithm consists of four stages that make up a round which is iterated 10
times for a 128-bit length key, 12 times for 192-bit key and 14 times for a 256-bit key.
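The round counts above follow the relation Nr = Nk + 6 from FIPS 197, where Nk is the key length in 32-bit words. A short Python sketch of that relation (the function name `aes_rounds` is hypothetical):

```python
def aes_rounds(key_bits: int) -> int:
    """Return the number of AES rounds for a given key length in bits."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key length must be 128, 192 or 256 bits")
    nk = key_bits // 32  # key length in 32-bit words
    return nk + 6


if __name__ == "__main__":
    for bits in (128, 192, 256):
        print(f"{bits}-bit key: {aes_rounds(bits)} rounds")
```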
5.2 CONCLUSION:
The main advantage of the Advanced Encryption Standard is that it keeps the
communication between the encrypting and decrypting parties secret. It is a symmetric-key
encryption algorithm: the same cipher key is used for both the encryption and the
decryption process, which reduces the complexity of encrypting and decrypting the data.
VHDL code is used to develop the implementation of the encryption and decryption
process. Each program is tested with some of the sample vectors provided by NIST,
and the output results match the expected values with minimal delay. In the case of the
192- and 256-bit key variants, the algorithm requires a 128-bit plain text block and a
192- or 256-bit cipher key.
It is important to understand AES, because using the algorithm can greatly increase the
reliability and safety of software systems. The results show that AES can indeed be
implemented with reasonable efficiency on an FPGA, with encryption and decryption taking
an average of 320 ns and 340 ns respectively (for every 128 bits). The time varies from
chip to chip, and the calculated delay time can only be regarded as approximate. Adding
data pipelines and some parallel combinational logic in the key scheduler and round
calculator can further optimize this design.
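The per-block times quoted above can be turned into an approximate throughput figure. The sketch below assumes one 128-bit block is processed at a time with no overlap between blocks, which is an assumption of this example and not a measured result. The function name `throughput_mbit_per_s` is hypothetical.

```python
BLOCK_BITS = 128  # AES block size in bits


def throughput_mbit_per_s(block_time_ns: float) -> float:
    """Approximate throughput in Mbit/s for one block every block_time_ns."""
    # bits per nanosecond equals Gbit/s, so scale by 1000 to get Mbit/s
    return BLOCK_BITS * 1000 / block_time_ns


if __name__ == "__main__":
    print(f"encryption: {throughput_mbit_per_s(320):.1f} Mbit/s")
    print(f"decryption: {throughput_mbit_per_s(340):.1f} Mbit/s")
```

Under that assumption the figures come to roughly 400 Mbit/s for encryption and about 376 Mbit/s for decryption.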
There is currently no evidence that AES has any weakness that would make an attack
other than exhaustive search practical. Even AES-128 offers a sufficiently large number
of possible keys to make an exhaustive search impractical for many decades, provided no
technological breakthrough causes the available computational power to increase
dramatically and theoretical research does not find a shortcut that bypasses the need
for exhaustive search. There are many pitfalls to avoid when encryption is implemented
and keys are generated.
It is necessary to ensure the security of each and every implementation. A
correctly implemented AES-128 is likely to protect against a million-dollar budget for at
least 50-60 years and against individual budgets for at least another ten years.
5.3 SCOPE OF EXPANSION :
• This algorithm can also be implemented with 192- and 256-bit keys.
• This design can also be implemented as a crypto-processor for secret
communication.
• This algorithm can also be used to implement a crypto-processor for smart cards.
APPENDIX
REFERENCES:
The following diagram shows the values in the state array as the encryption
progresses for a block length and a key length of 16 bytes each (i.e. Nb = 4 and Nk = 4) [1].
Input = 32 43 f6 a8 88 5a 30 8d 31 31 98 a2 e0 37 07 34
Cipher key = 2b 7e 15 16 28 ae d2 a6 ab f7 15 88 09 cf 4f 3c
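As a small worked step on these sample vectors, the Python sketch below performs only the initial AddRoundKey, that is, the XOR of the input block with the cipher key, and arranges the result as a 4 x 4 state using S[r, c] = in[r + 4c]. The byte values are those published in FIPS 197, Appendix B. The helper names are hypothetical, and the remaining rounds are not modelled.

```python
# Sample vectors as published in FIPS 197, Appendix B.
PLAINTEXT = bytes.fromhex("3243f6a8885a308d313198a2e0370734")
CIPHER_KEY = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")


def add_round_key(block: bytes, round_key: bytes) -> bytes:
    """XOR a 16-byte block with a 16-byte round key."""
    return bytes(b ^ k for b, k in zip(block, round_key))


def to_state(block: bytes):
    """Arrange 16 bytes column by column: state[r][c] = block[r + 4*c]."""
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


if __name__ == "__main__":
    start_of_round_1 = add_round_key(PLAINTEXT, CIPHER_KEY)
    for row in to_state(start_of_round_1):
        print(" ".join(f"{byte:02x}" for byte in row))
```

The first row of the resulting state is 19 a0 9a e9, which is the state at the start of round 1 in the FIPS 197 example.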
[1] FIPS 197, "Advanced Encryption Standard (AES)", November 26, 2001
http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf
[2] J. Daemen and V. Rijmen, "AES Proposal: Rijndael", AES Algorithm Submission,
September 3, 1999
http://www.esat.kuleuven.ac.be/~rijmen/rijndael/rijndaeldocV2.zip
[3] "FPGA Simulations of Round 2 Advanced Encryption Standards"
http://csrc.nist.gov/CryptoToolkit/aes/round2/conf3/presentations/elbirt.pdf
[4] http://en.wikipedia.org/wiki/Extended_Euclidean_algorithm
[5] Tilborg, Henk C. A. van, "Fundamentals of Cryptology: A Professional Reference
and Interactive Tutorial", New York, Kluwer Academic Publishers, 2002
[6] Peter J. Ashenden, "The Designer's Guide to VHDL", 2nd Edition, San Francisco,
CA, Morgan Kaufmann, 2002
References: understanding mix columns
[7] Wikipedia, Rijndael mix columns, [Online]
Available: http://en.wikipedia.org/wiki/Rijndael_mix_columns
[8] William Stallings (2006), Chapter 4.6 Finite Fields of the Form GF(2^n): Multiplication,
in Cryptography and Network Security: Principles and Practices, pp. 125-126.
References: understanding inverse mix columns
[9] William Stallings (2006), Chapter 4.6 Finite Fields of the Form GF(2^n): Multiplication,
in Cryptography and Network Security: Principles and Practices, pp. 125-126.
[10] Kit Choy Xintong (2009), Understanding AES Mix-Columns Transformation
Calculation, [Online]
Available: http://sites.google.com/site/kitworldoftheory/Home/mixcolumns.pdf?attredirects=0
FPGA REFERENCES:
• Initial Design for Spartan-3E Starter Kit (Reference Design)
http://www.xilinx.com/s3estarter
• Powertip PC1602-D Character LCD (Basic Electrical and Mechanical Data)
http://www.powertipusa.com/pdf/pc1602d.pdf
• Sitronix ST7066U Character LCD Controller
http://www.sitronix.com.tw/sitronix/product.nsf/doc/st7066u?opendocument
• Detailed Data Sheet on Powertip Character LCD
http://www.rapidelectronics.co.uk/images/siteimg/57-0910e.pdf
• Samsung S6A0069X Character LCD Controller
http://www.samsung.com/products/semiconductor/displaydriveric/mobileddi/bwstn/s6a0069x/s6a0069x.htm