
CHAPTER -1

INTRODUCTION
1.1 Introduction
In this research work, we propose the design and implementation of a real-time FPGA
based application, which demonstrates the creation of real-time process tasks in FPGA
systems for successful real-time communication between multiple FPGA systems. We have
chosen the RSA based encryption and decryption algorithm for this implementation, as
security is one of the most important needs in data communication. The recent development
of Field-Programmable Gate Array (FPGA) architectures, with soft core (MicroBlaze) and
hard core (PowerPC) processors, embedded memories and IP cores, offers the potential for
high computing power. At present, FPGAs are considered a major platform for high
performance embedded applications, as they provide the opportunity for reconfiguration as well
as good clock speed and design resources.
As the complexity of embedded applications increases, the use of an operating system
brings many advantages. In present day application scenarios, most embedded systems
have real-time requirements that demand the use of a Real-Time Operating System (RTOS),
which creates a suitable environment for real-time applications to be designed and expanded
easily. In an RTOS the design process is simplified by splitting the application code into
separate tasks; the scheduler then executes them according to a specific schedule, meeting
the real-time deadline. At first we demonstrate
the real-time execution of multiple process tasks in a single FPGA system for the encryption
and decryption of data. Next we describe the most challenging part of our work, where we
establish real-time communication between two FPGA systems, running the
encryption engine and the decryption engine respectively and communicating with one another
via an RS232 communication link. The results show that our design is better in terms of
execution speed in comparison with the existing research works.
The design achieves real-time secured information exchange between systems implemented on
multiple FPGAs by using an RTOS (Real Time Operating System). This information sharing is
based on the RSA algorithm (encryption and decryption). Network security plays a vital role
in recent trends of Very Large Scale Integration (VLSI) design.
Keywords: FPGA, logic circuits, operating systems (computers), MicroBlaze, FPGA architectures,
embedded memory, multiple FPGA systems and soft core processors.
The existing system is designed using a microcontroller with an RTOS, so its
operating speed is lower than that of an FPGA, and the information sent
between the systems is not secured. The proposed system
is based on the RSA algorithm (encryption and decryption). The encrypted data is communicated
between multiple FPGAs, with multitasking provided by the RTOS,
at a higher execution speed than the existing system.
We also demonstrate the 128-bit Advanced Encryption Standard (AES), a symmetric key
encryption and decryption algorithm, by developing a suitable hardware and software design on a
Xilinx Spartan-3EDK (XC3S200) device. The implementation has been tested successfully.
The system is optimized in terms of execution speed and hardware utilization.
Applications of the design include security systems, the medical field, network security and online
bank security. Using a similar approach to the AES implementation, we can
implement double AES for more security, at the cost of lower encryption speed.
In today's world most communication is done using electronic media. Data security
plays a vital role in such communication. Hence, there is a need to protect data from
malicious attacks. Cryptography is the science of secret codes, enabling the confidentiality of
communication through an insecure channel. It protects against unauthorized parties by
preventing unauthorized alteration or use. Generally speaking, it uses a cryptographic system
to transform a plaintext into a cipher text, in most cases using a key.
1.2 Aim of the project
The aim of this project is to communicate data secretly using the AES algorithm. We
first feed the data (plain text), which is 128 bits long, and the key, which can be 128, 192
or 256 bits long, into the encryption process. The output of this process is the cipher text. This
cipher text is then fed into the decryption process, and the data (plain text) is obtained as
output. Since we add the key and shuffle the data, it is very hard for an unknown person to
find out the original data. Each key produces a different cipher text, so
a person has to know the key in order to find out the original data.
This project provides security for the data that is transmitted and
received by the sender and receiver. While the data is being encrypted and decrypted, a
hacker may try to intercept it. To avoid these types of problems we use security
algorithms such as AES, SHA-0, SHA-1, SHA-2, and RSA.
To achieve this aim the following tasks are carried out.
• Achieve real-time secured information exchange between the systems implemented on
multiple FPGAs by using an RTOS (Real Time Operating System).
• Base this information sharing on the RSA algorithm (encryption and decryption).
• Develop a design that is low power, secure, fast and inexpensive.
• Generate SystemC and synthesis results for observation in the Xilinx Platform
Studio.
• Write each block of the design in SystemC code, one module per block.
• Download the code onto an FPGA kit in order to see the output. Because the kit displays only 8 bits,
the output is observed on the red LEDs, which are active high.
1.3 Motivation of the project
• Message authentication codes (MACs) are much like cryptographic hash functions,
except that a secret key is used to authenticate the hash value.
• The keys are the public key and the private key. Applications of cryptography include
ATM cards, computer applications and electronic commerce.
• Cryptography is the study of hiding information. Modern cryptography intersects the
disciplines of mathematics, computer science and engineering.
• Security often requires that data be kept safe from unauthorized access, and the best
line of defense is physical security (placing the machine to be protected behind
physical walls).
• However, physical security is not always an option (due to cost and/or efficiency
considerations).
• Instead, most computers are interconnected with each other openly, thereby exposing
them and the communication channels that they use.

1.4 Literature Survey
For real‐time applications, there are several factors (time, cost, power) that are moving
security considerations from a function centric perspective into a system architecture
(hardware/software) design issue. Advanced Encryption Standard (AES) is used nowadays
extensively in many network and multimedia applications to address security issues. The
AES algorithm specifies three key sizes: 128, 192 and 256 bits offering different levels of
security. To deal with the amount of application and intensive computation given by security
mechanisms, we define and develop a QoSS (Quality of Security Service) model for
reconfigurable AES processor. QoSS has been designed and implemented to achieve a
flexible trade‐off between overheads caused by security services and system performance.
The proposed architecture can provide up to 12 AES block cipher schemes within a
reasonable hardware cost. We envisage a security vector in a fully functional QoSS request to
include levels of service for the range of security service and mechanisms. Our unified
hardware can run both the original AES algorithm and the extended AES algorithm (QoSS‐
AES). A novel on‐the‐fly AES encryption/ decryption design is also proposed for 128‐, 192‐,
and 256‐bit keys.
1.5. Design and Implementation
Hardware implementations reported in the literature include an implementation of the AES algorithm on a
single-chip FPGA using a pipelined approach, an area-throughput trade-off for an ASIC
implementation in a 0.18 µm CMOS technology, a crypto-memory and SRAM architecture, a
high speed non-pipelined FPGA design, a fully sub-pipelined encryptor that achieves a throughput of
21.56 Gbps on a Xilinx device, and a prototype chip implemented using 0.35 µm CMOS
technology. Software implementations include fast implementations of the algorithm in smart
cards, PDA secure communication with Java, optimum construction of composite fields for
the AES, evaluation of different implementations for high end servers, implementation
approaches for the AES algorithm in C, C++ and MATLAB, and a security protocol for an automobile
remote keyless system.
The algorithm is composed of three main parts: Cipher, Inverse Cipher and Key
Expansion. Cipher converts data, commonly known as plaintext, to an unintelligible form
called ciphertext. Key Expansion generates a key schedule that is used in the Cipher and the
Inverse Cipher procedures. Cipher and Inverse Cipher are composed of a specific number of
rounds. For the AES algorithm, the number of rounds to be performed during the execution of
the algorithm depends on the key length. AES operates on a 4x4 array of bytes (referred
to as the "state"). The algorithm consists of four different simple operations. These operations
are:
o Sub Bytes
o Shift Rows
o Mix Columns
o Add Round Key

The encryption and decryption processes consist of a number of different transformations
applied consecutively over the data block bits, in a fixed number of iterations, called rounds.
The number of rounds depends on the length of the key used for the encryption process.
AES is a block cipher with a block size of 128 bits, organized as a 4x4 byte matrix (the State).
The key size is 128, 192, or 256 bits, giving 10, 12, or 14 similar rounds respectively, each
built from the 4 round transformations.
1.6. Application of the project
This section presents the Application experimental results that are carried out to evaluate
the performance of the QoSS‐AES processor in the case of an MPEG4 decoder. A
comparative study has been done between the proposed QoSS‐AES processor and the
conventional video encryption schemes (Sub band Shuffle, Block Shuffle). The results
demonstrate that the QoSS‐AES processor is well suited to provide high security with very
low latency.
The application works in following way:
1. The user opens the application and authenticates using pattern lock.
2. User can either type new message or reply to an existing message.
3. If new message is selected, user enters the message and presses encrypt button after
inserting the recipient’s name. The user has to enter a cipher key before the message is sent.
The cipher key is auto-generated if the user does not enter one.
4. If the user selects to reply to an existing message, he first decrypts the message by long
pressing the message and then types in the reply. The user is asked to enter cipher key before
the message is sent.
5. Once the cipher key is entered, the message is successfully sent and is shown in encrypted
form in the thread.
6. All messages in thread are displayed in encrypted format to both sender and receiver.
7. Long pressing the thread will pop up an action box wherein the user can delete, view
contact details or call the recipient.
8. Long pressing any message in the thread will pop up an action box wherein the user can
delete, forward or decrypt the message.
9. The cipher key is randomly generated if the user does not enter it.
10. Various settings such as notification settings, display settings, encryption settings, tone
settings and personalization settings are available for the user's convenience.
11. This application is developed on the Android platform. Like
other operating systems for mobile devices, Android OS supports
connectivity, messaging, language support, media support, Bluetooth etc. The main features of
Android are its open source technology and Java support. It also supports multitasking,
multi-touch, Wi-Fi, tethering, 3G services, and very importantly security and privacy.
1.7 Organization of thesis
The complete dissertation work is divided into seven chapters.
The second chapter deals with the description of the project and ends with a conclusion.
The third chapter presents the design analysis and ends with a conclusion.
The fourth chapter deals with the hardware implementation of the project and ends with a
conclusion.
The fifth chapter presents the mathematical analysis of the project and ends with a conclusion.
The sixth chapter presents the output verification of the project and ends with a conclusion.
The seventh chapter deals with the conclusion of the project, followed by the future scope.
After the seven chapters, the thesis lists the references required for the design
and implementation of the project.
Finally, the appendices contain the code.

CHAPTER -2
DESCRIPTION OF THE PROJECT
2.1 Introduction
The secret key is assumed to be in safe hands and to be known only to the two
communicating parties, namely the Sender and the Receiver. If we further assume that
the data communication is duplex, then each side needs to have its own FPGA based AES
processor for encryption and decryption. The process of data communication from the Sender
to the Receiver is as follows:
1. The Sender configures his FPGA processor by the Encryption Module with the known
Plain-text and the Cipher-key using the Configuration tool (say SANDS Software v1.1)
and encrypts the Plain-text to obtain the Cipher-text.
2. Then the output buffer collects and sends the Cipher-text over the communication
channel. Every client in between can see the cipher-text, but none other than the Receiver
having the Secret-key can make use of it.
3. Then the Receiver having configured his FPGA processor with the Decryption Module
can decrypt the Cipher-text to obtain the Inverse Cipher i.e., the original Plain-text.
Alternatively, if the current Receiver wants to send sensitive data to the initial Sender,
then the above process repeats with the roles of the Sender and the Receiver
interchanged. Thus, the aim of the project, the FPGA implementation of secure data
communication using the AES algorithm, can be realized in practice
by employing two FPGA processors, one at each side of the data
transfer, with the condition that both parties must know the Cipher-key used.

Fig 2.1. General Block diagram of AES

2.2. Preface
The following document provides a detailed and easy to understand explanation of the
implementation of the AES (RIJNDAEL) encryption algorithm. The purpose of this paper is
to give developers with little or no knowledge of cryptography the ability to implement AES.
2.3. Terminology
There are terms that are frequently used throughout this paper that need to be clarified.
Block: AES is a block cipher. This means that the number of bytes that it encrypts is fixed.
AES can currently encrypt blocks of 16 bytes at a time; no other block sizes are presently a
part of the AES standard. If the data being encrypted is larger than the specified block then
AES is executed repeatedly, once per 16 byte block. This also means that AES has to encrypt a
minimum of 16 bytes. If the plain text is smaller than 16 bytes then it must be padded. Simply
said, the block is a reference to the bytes that are processed by the algorithm.
State: Defines the current condition (state) of the block, that is, the block of bytes that is
currently being worked on. The state starts off being equal to the block; however, it changes
as each round of the algorithm executes. Plainly said, this is the block in progress.
XOR: Refers to the bitwise operator Exclusive Or. XOR operates on the individual bits in a
byte in the following way:
Table 2.1: XOR truth table
0 XOR 0 = 0
1 XOR 0 = 1
1 XOR 1 = 0
0 XOR 1 = 1
For example, the Hex digits D4 XOR FF:
    11010100
XOR 11111111
  = 00101011 (Hex 2B)
Another interesting property of the XOR operator is that it is reversible,
so Hex 2B XOR FF = D4.
Most programming languages have the XOR operator built in.
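As a small illustration of the XOR example and its reversibility, the following Python sketch repeats the D4 XOR FF calculation. The variable names are chosen here for illustration only.

```python
# XOR of two bytes, using the values from the worked example.
a = 0xD4
mask = 0xFF

result = a ^ mask          # bitwise XOR of the two bytes
restored = result ^ mask   # XORing again with the same value undoes it

print(f"{a:08b} XOR {mask:08b} = {result:08b} (hex {result:02X})")
print(f"{result:02X} XOR {mask:02X} = {restored:02X}")
```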

HEX: Defines a notation of numbers in base 16. This simply means that the highest number
that can be represented in a single digit is 15, rather than the usual 9 in the decimal (base 10)
system.
Table 2.2: Hex to Decimal table
0 1 2 3 4 5 6 7 8 9 A B C D E F
0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
2 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
3 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
4 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
5 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
6 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111
7 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
8 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
9 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159
A 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175
B 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191
C 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207
D 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223
E 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239
F 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255
For example, using the above table, HEX D4 = DEC 212. All of the tables and examples in
this paper are written in HEX. The reason for this is that a single digit of HEX represents
exactly 4 bits. This means that a single byte can always be represented by 2 HEX digits. This
also makes it very useful in creating lookup tables where each HEX digit can represent a
table index.
2.4. AES Brief History
Effective May 26, 2002 the National Institute of Standards and Technology (NIST) has
selected a block cipher called RIJNDAEL (named after its creators Vincent Rijmen and Joan
Daemen) as the symmetric key encryption algorithm to be used to encrypt sensitive but
unclassified American federal information. RIJNDAEL was originally a variable block (16,
24, 32 bytes) and variable key size (16, 24, 32 bytes) encryption algorithm. NIST has
however decided to define AES with a block size of 16 bytes while keeping their options
open to future changes.

2.5. AES Algorithm
AES is an iterated symmetric block cipher, which means that:
o AES works by repeating the same defined steps multiple times.
o AES is a secret key encryption algorithm.
o AES operates on a fixed number of bytes.

[Figure: AES encryption and decryption flow. Encryption: plain text, Add Round Key with W(0,3), then rounds 1 to 10 of Substitute Bytes, Shift Rows, Mix Columns and Add Round Key (round 10 omits Mix Columns and uses W(40,43)), giving the cipher text. Decryption: cipher text, Add Round Key with W(40,43), then rounds 1 to 10 of Inverse Shift Row, Inverse Sub Bytes, Add Round Key and Inverse Mix Column, ending with W(0,3) and giving the plain text. The Expand Key block supplies the round keys to both paths.]

Fig 2.1: Advanced Encryption Algorithm flow


AES, like most encryption algorithms, is reversible. This means that almost the same
steps are performed for both encryption and decryption, with decryption applying them in
reverse order. The AES algorithm operates on bytes, which makes it simpler to implement and
explain. The key is expanded into individual sub keys, one sub key for each operation round.
This process is called KEY EXPANSION, which is described in section 2.11.
For both its Cipher and Inverse Cipher, the AES algorithm uses a round function that is
composed of four different byte-oriented transformations: 1) Byte substitution using a
substitution table (S-box), 2) Shifting rows of the State array by different offsets, 3) Mixing
the data within each column of the State array, and 4) Adding a Round Key to the State. As
mentioned before AES is an iterated block cipher. All that means is that the same operations
are performed many times on a fixed number of bytes. These operations can easily be broken
down to the following functions:

• ADD ROUND KEY
• SUB BYTE
• SHIFT ROW
• MIX COLUMN
An iteration of the above steps is called a round. The number of rounds of the algorithm
depends on the key size.
Table 2.3: Key size, block size and number of rounds
Key Size (bytes)   Block Size (bytes)   Rounds
16                 16                   10
24                 16                   12
32                 16                   14

The only exception is that in the last round the Mix Column step is not performed, to
make the algorithm reversible during decryption.
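The relationship between key size and number of rounds in Table 2.3 can be sketched as a simple lookup. The function name `aes_rounds` is an assumption made for this illustration, not part of the design described here.

```python
def aes_rounds(key_bytes: int) -> int:
    """Return the number of AES rounds for a key of the given size in bytes."""
    rounds_by_key_size = {16: 10, 24: 12, 32: 14}  # values from Table 2.3
    if key_bytes not in rounds_by_key_size:
        raise ValueError("AES keys are 16, 24 or 32 bytes long")
    return rounds_by_key_size[key_bytes]


for size in (16, 24, 32):
    print(size, "byte key:", aes_rounds(size), "rounds")
```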
2.6. Encryption and Decryption
Data that can be read and understood without any special measures is called plaintext or
clear text. The method of disguising plaintext in such a way as to hide its substance is called
encryption. Encrypting plaintext results in unreadable gibberish called cipher text. You use
encryption to ensure that information is hidden from anyone for whom it is not intended, even
those who can see the encrypted data. The process of reverting cipher text to its original
plaintext is called decryption.
2.7. Encryption
Table 2.4 : AES encryption cipher using a 32 byte key.
Round Function
- Add Round Key(State)
0 Add Round Key(Mix Column(Shift Row(Byte Sub(State))))
1 Add Round Key(Mix Column(Shift Row(Byte Sub(State))))
2 Add Round Key(Mix Column(Shift Row(Byte Sub(State))))
3 Add Round Key(Mix Column(Shift Row(Byte Sub(State))))
4 Add Round Key(Mix Column(Shift Row(Byte Sub(State))))
5 Add Round Key(Mix Column(Shift Row(Byte Sub(State))))
6 Add Round Key(Mix Column(Shift Row(Byte Sub(State))))
7 Add Round Key(Mix Column(Shift Row(Byte Sub(State))))
8 Add Round Key(Mix Column(Shift Row(Byte Sub(State))))
9 Add Round Key(Mix Column(Shift Row(Byte Sub(State))))
10 Add Round Key(Mix Column(Shift Row(Byte Sub(State))))
11 Add Round Key(Mix Column(Shift Row(Byte Sub(State))))
12 Add Round Key(Mix Column(Shift Row(Byte Sub(State))))
13 Add Round Key(Shift Row(Byte Sub(State)))

2.8. Decryption
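Table 2.5: AES decryption cipher using a 32 byte key. Each function name in this table denotes the inverse of the corresponding encryption transformation.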
Round Function
- Add Round Key(State)
0 Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
1 Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
2 Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
3 Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
4 Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
5 Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
6 Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
7 Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
8 Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
9 Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
10 Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
11 Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
12 Mix Column(Add Round Key(Byte Sub(Shift Row(State))))
13 Add Round Key(Byte Sub(Shift Row(State)))

2.9. AES Cipher Functions


2.9.1. Add Round Key
Each of the 16 bytes of the state is XORed against the corresponding byte of a 16 byte portion
of the expanded key for the current round. The Expanded Key bytes are never reused. So once the
first 16 bytes of the state are XORed against the first 16 bytes of the expanded key, expanded
key bytes 1-16 are never used again. The next time the Add Round Key function is called, bytes
17-32 are XORed against the state.
The first time Add Round Key gets executed
State 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
XOR XOR XOR XOR XOR XOR XOR XOR XOR XOR XOR XOR XOR XOR XOR XOR

Exp Key 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

The second time Add Round Key is executed

State 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
XOR XOR XOR XOR XOR XOR XOR XOR XOR XOR XOR XOR XOR XOR XOR XOR

Exp Key 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32

And so on for each round of execution. During decryption this procedure is reversed.
Therefore the state is first XORed against the last 16 bytes of the expanded key, then the
second last 16 bytes and so on. The method for deriving the expanded key is described in
section 2.11.
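The Add Round Key step can be sketched in a few lines of Python. This is a minimal illustration that assumes the state is a list of 16 byte values and the expanded key is a flat list of bytes; the function name, the `round_index` parameter and the placeholder key bytes are hypothetical.

```python
def add_round_key(state, expanded_key, round_index):
    """XOR the 16 state bytes with the 16 expanded-key bytes of one round.

    round_index 0 selects expanded key bytes 1-16, round_index 1 selects
    bytes 17-32, and so on (the text counts bytes from 1).
    """
    start = 16 * round_index
    round_key = expanded_key[start:start + 16]
    return [s ^ k for s, k in zip(state, round_key)]


# Placeholder data for illustration only, not a real key schedule.
state = list(range(16))
expanded_key = [(7 * i + 3) % 256 for i in range(176)]

after = add_round_key(state, expanded_key, 1)
# Because XOR is reversible, applying the same round key again restores the state.
print(add_round_key(after, expanded_key, 1) == state)
```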

2.9.2. Sub Byte
During encryption each value of the state is replaced with the corresponding SBOX value.
Table 2.6. AES S-Box encryption Lookup Table

For example HEX 19 would get replaced with HEX D4


During decryption each value in the state is replaced with the corresponding inverse of the
SBOX
Table 2.7. AES S-Box decryption Lookup Table

For example HEX D4 would get replaced with HEX 19
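Since the lookup tables themselves are figures, the substitution can be illustrated by generating the S-Box in code. The sketch below uses the standard construction of the AES S-Box (multiplicative inverse in the Galois Field followed by a fixed affine transformation); this construction is not described in the text above and is brought in here only to make the example self-contained. All function names are assumptions.

```python
def gf_mul(a, b):
    """Multiply two bytes in the AES Galois Field (reduction polynomial 0x11B)."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


def build_sbox():
    """Build the 256 entry AES S-Box."""
    sbox = [0] * 256
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):      # x^254 is the multiplicative inverse of x
                inv = gf_mul(inv, x)
        s = inv
        for shift in range(1, 5):     # affine transformation: XOR of four rotations
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox[x] = s ^ 0x63
    return sbox


SBOX = build_sbox()
INV_SBOX = [0] * 256
for value, substituted in enumerate(SBOX):
    INV_SBOX[substituted] = value     # the decryption table reverses the lookup

print(f"{SBOX[0x19]:02X}")      # substitution used during encryption
print(f"{INV_SBOX[0xD4]:02X}")  # substitution used during decryption
```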

2.9.3. Shift Row
Arranges the state in a matrix and then performs a circular shift for each row. This is not a
bit wise shift. The circular shift moves whole bytes along a row. With a shift of one space, a
byte that was in the second position ends up in the first position after the shift. The
circular part of it specifies that the byte in the first position, shifted one space, ends up
in the last position in the same row. In detail: The state is arranged in a 4x4 matrix (square).
The confusing part is that the matrix is formed vertically but shifted horizontally. So the first
4 bytes of the state form the first column, that is, the first byte of each row.
So bytes 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
will form a matrix:
1 5 9 13
2 6 10 14
3 7 11 15
4 8 12 16
Each row is then moved over (shifted) 0, 1, 2 or 3 spaces to the left, depending on the
row of the state. The first row is never shifted.
Row1 0
Row2 1
Row3 2
Row4 3
The following table shows how the individual bytes are first arranged in the table and then
moved over (shifted). Blocks 16 bytes long:
From            To
1 5 9 13        1 5 9 13
2 6 10 14       6 10 14 2
3 7 11 15       11 15 3 7
4 8 12 16       16 4 8 12
During decryption the same process is reversed and all rows are shifted to the right:
From            To
1 5 9 13        1 5 9 13
2 6 10 14       14 2 6 10
3 7 11 15       11 15 3 7
4 8 12 16       8 12 16 4
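The Shift Row tables can be reproduced with a short Python sketch. It assumes the state is stored as a flat list of 16 bytes in the column-by-column order described in this section, so the byte at row r and column c sits at index r + 4*c. The function names are chosen for illustration.

```python
def shift_rows(state):
    """Encryption: rotate row r of the 4x4 state left by r positions."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[r + 4 * c] = state[r + 4 * ((c + r) % 4)]
    return out


def inv_shift_rows(state):
    """Decryption: rotate row r of the 4x4 state right by r positions."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            out[r + 4 * ((c + r) % 4)] = state[r + 4 * c]
    return out


def rows(state):
    """Return the state as four rows, for printing."""
    return [[state[r + 4 * c] for c in range(4)] for r in range(4)]


state = list(range(1, 17))          # bytes 1..16 as in the tables
for row in rows(shift_rows(state)):
    print(row)
```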

2.9.4. Mix Column
This is perhaps the hardest step to both understand and explain. There are two parts to this
step. The first will explain which parts of the state are multiplied against which parts of the
matrix. The second will explain how this multiplication is implemented over what’s called a
Galois Field.
2.10. Matrix Multiplication
The state is arranged into a 4 row table (as described in the Shift Row function). The
multiplication is performed one column at a time (4 bytes). Each value in the column is
eventually multiplied against every value of the matrix (16 total multiplications). The results
of these multiplications are XORed together to produce only 4 result bytes for the next state.
Therefore 4 bytes input, 16 multiplications 12 XORs and 4 bytes output. The multiplication is
performed one matrix row at a time against each value of a state column.
Multiplication Matrix
2 3 1 1
1 2 3 1
1 1 2 3
3 1 1 2

The first result byte is calculated by multiplying 4 values of the state column against 4
values of the first row of the matrix. The result of each multiplication is then XORed to
produce 1 Byte.
b1 = (b1 * 2) XOR (b2*3) XOR (b3*1) XOR (b4*1)
The second result byte is calculated by multiplying the same 4 values of the state column
against 4 values of the second row of the matrix. The result of each multiplication is then
XORed to produce 1 Byte.
b2 = (b1 * 1) XOR (b2*2) XOR (b3*3) XOR (b4*1)
The third result byte is calculated by multiplying the same 4 values of the state column
against 4 values of the third row of the matrix. The result of each multiplication is then
XORed to produce 1 Byte.
b3 = (b1 * 1) XOR (b2*1) XOR (b3*2) XOR (b4*3)
The fourth result byte is calculated by multiplying the same 4 values of the state column
against 4 values of the fourth row of the matrix. The result of each multiplication is then
XORed to produce 1 Byte.
b4 = (b1 * 3) XOR (b2*1) XOR (b3*1) XOR (b4*2)
This procedure is repeated again with the next column of the state, until there are no more
state columns. Putting it all together: The first column will include state bytes 1-4 and will be
multiplied against the matrix in the following manner:
b1 = (b1 * 2) XOR (b2*3) XOR (b3*1) XOR (b4*1)
b2 = (b1 * 1) XOR (b2*2) XOR (b3*3) XOR (b4*1)
b3 = (b1 * 1) XOR (b2*1) XOR (b3*2) XOR (b4*3)
b4 = (b1 * 3) XOR (b2*1) XOR (b3*1) XOR (b4*2)
(b1= specifies the first byte of the state)
The second column will include state bytes 5-8 and will be multiplied against the same
matrix in the following manner.
b5 = (b5 * 2) XOR (b6*3) XOR (b7*1) XOR (b8*1)
b6 = (b5 * 1) XOR (b6*2) XOR (b7*3) XOR (b8*1)
b7 = (b5 * 1) XOR (b6*1) XOR (b7*2) XOR (b8*3)
b8 = (b5 * 3) XOR (b6*1) XOR (b7*1) XOR (b8*2)
And so on until all columns of the state are exhausted.
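The four equations for one column can be sketched in Python as follows. Multiplication by 2 and 3 is done here by shifting and conditionally XORing with the AES reduction polynomial (0x11B), which is one common way to implement the Galois Field arithmetic; the helper names are assumptions made for this illustration.

```python
def mul2(b):
    """Multiply a byte by 2 in the AES Galois Field."""
    b <<= 1
    if b & 0x100:        # overflow past 8 bits: reduce by the AES polynomial
        b ^= 0x11B
    return b


def mul3(b):
    """Multiply a byte by 3, i.e. (b * 2) XOR b."""
    return mul2(b) ^ b


def mix_column(column):
    """Apply the Mix Column equations to one 4 byte column of the state."""
    b1, b2, b3, b4 = column
    return [
        mul2(b1) ^ mul3(b2) ^ b3 ^ b4,
        b1 ^ mul2(b2) ^ mul3(b3) ^ b4,
        b1 ^ b2 ^ mul2(b3) ^ mul3(b4),
        mul3(b1) ^ b2 ^ b3 ^ mul2(b4),
    ]


def mix_columns(state):
    """Apply mix_column to each of the four columns of a 16 byte state."""
    out = []
    for c in range(4):
        out.extend(mix_column(state[4 * c:4 * c + 4]))
    return out


print([f"{b:02X}" for b in mix_column([0xD4, 0xBF, 0x5D, 0x30])])
```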
2.10.1 Mix Column Inverse
During decryption the Mix Column multiplication matrix is changed to:
0E 0B 0D 09
09 0E 0B 0D
0D 09 0E 0B
0B 0D 09 0E

Other than the change to the matrix table the function performs the same steps as during
encryption.
2.10.2 Mix Column Example During Encryption
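The following examples are denoted in HEX. L() and E() denote lookups in the logarithm and exponentiation (anti-logarithm) tables used for Galois Field multiplication; a sum of logarithms that exceeds FF is reduced by FF.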
Input = D4 BF 5D 30
Output(0) = (D4 * 2) XOR (BF*3) XOR (5D*1) XOR (30*1)
= E(L(D4) + L(02)) XOR E(L(BF) + L(03)) XOR 5D XOR 30
= E(41 + 19) XOR E(9D + 01) XOR 5D XOR 30
= E(5A) XOR E(9E) XOR 5D XOR 30
= B3 XOR DA XOR 5D XOR 30
= 04

Output(1) = (D4 * 1) XOR (BF*2) XOR (5D*3) XOR (30*1)
= D4 XOR E(L(BF)+L(02)) XOR E(L(5D)+L(03)) XOR 30
= D4 XOR E(9D+19) XOR E(88+01) XOR 30
= D4 XOR E(B6) XOR E(89) XOR 30
= D4 XOR 65 XOR E7 XOR 30
= 66
Output(2) = (D4 * 1) XOR (BF*1) XOR (5D*2) XOR (30*3)
= D4 XOR BF XOR E(L(5D)+L(02)) XOR E(L(30)+L(03))
= D4 XOR BF XOR E(88+19) XOR E(65+01)
= D4 XOR BF XOR E(A1) XOR E(66)
= D4 XOR BF XOR BA XOR 50
= 81
Output(3) = (D4 * 3) XOR (BF*1) XOR (5D*1) XOR (30*2)
= E(L(D4)+L(03)) XOR BF XOR 5D XOR E(L(30)+L(02))
= E(41+01) XOR BF XOR 5D XOR E(65+19)
= E(42) XOR BF XOR 5D XOR E(7E)
= 67 XOR BF XOR 5D XOR 60
= E5
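The table-based multiplication used in this worked example can be sketched as below. The sketch assumes the usual construction of the tables, in which E holds successive powers of the generator 03 and L is its inverse; the names `EXP`, `LOG` and `gf_mul_tables` are chosen here for illustration.

```python
def build_tables():
    """Build the exponentiation (E) and logarithm (L) tables for generator 03."""
    exp = [0] * 256
    log = [0] * 256
    x = 1
    for i in range(255):
        exp[i] = x
        log[x] = i
        doubled = x << 1             # x * 2 in the Galois Field
        if doubled & 0x100:
            doubled ^= 0x11B
        x = doubled ^ x              # x * 3 = (x * 2) XOR x
    exp[255] = exp[0]                # E(FF) = 01, the powers repeat after 255 steps
    return exp, log


EXP, LOG = build_tables()


def gf_mul_tables(a, b):
    """Multiply two bytes as E(L(a) + L(b)), reducing the sum by FF if needed."""
    if a == 0 or b == 0:
        return 0
    total = LOG[a] + LOG[b]
    if total > 0xFF:
        total -= 0xFF
    return EXP[total]


print(f"{gf_mul_tables(0xD4, 0x02):02X}")   # first term of Output(0)
print(f"{gf_mul_tables(0xBF, 0x03):02X}")   # second term of Output(0)
```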
2.10.3. Mix Column Example During Decryption
Input 04 66 81 E5
Output(0) = (04 * 0E) XOR (66*0B) XOR (81*0D) XOR (E5*09)
= E(L(04)+L(0E)) XOR E(L(66)+L(0B)) XOR E(L(81)+L(0D)) XOR
E(L(E5)+L(09))
= E(32+DF) XOR E(1E+68) XOR E(58+EE) XOR E(20+C7)
= E(111-FF) XOR E(86) XOR E(146-FF) XOR E(E7)
= E(12) XOR E(86) XOR E(47) XOR E(E7)
= 38 XOR B7 XOR D7 XOR 8C
= D4
Output(1) = (04 * 09) XOR (66*0E) XOR (81*0B) XOR (E5*0D)
= E(L(04)+L(09)) XOR E(L(66)+L(0E)) XOR E(L(81)+L(0B)) XOR
E(L(E5)+L(0D))
= E(32+C7) XOR E(1E+DF) XOR E(58+68) XOR E(20+ EE)
= E(F9) XOR E(FD) XOR E(C0) XOR E(10E-FF)
= E(F9) XOR E(FD) XOR E(C0) XOR E(0F)
= 24 XOR 52 XOR FC XOR 35
= BF

Output(2) = (04 * 0D) XOR (66*09) XOR (81*0E) XOR (E5*0B)
= E(L(04)+L(0D)) XOR E(L(66)+L(09)) XOR E(L(81)+L(0E)) XOR
E(L(E5)+L(0B))
= E(32+EE) XOR E(1E+C7) XOR E(58+DF) XOR E(20+68)
= E(120-FF) XOR E(E5) XOR E(137-FF) XOR E(88)
= E(21) XOR E(E5) XOR E(38) XOR E(88)
= 34 XOR 7B XOR 4F XOR 5D
= 5D
Output(3) = (04 * 0B) XOR (66*0D) XOR (81*09) XOR (E5*0E)
= E(L(04)+L(0B)) XOR E(L(66)+L(0D)) XOR E(L(81)+L(09)) XOR
E(L(E5)+L(0E))
= E(32+68) XOR E(1E+EE) XOR E(58+C7) XOR E(20+DF)
= E(9A) XOR E(10C-FF) XOR E(11F-FF) XOR E(FF)
= E(9A) XOR E(0D) XOR E(20) XOR E(FF)
= 2C XOR F8 XOR E5 XOR 01
= 30
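The decryption example can be checked with a short sketch of the inverse Mix Column step. It uses a generic shift-and-XOR Galois Field multiplication instead of the E and L tables, which gives the same products; the function names are assumptions.

```python
def gf_mul(a, b):
    """Multiply two bytes in the AES Galois Field (reduction polynomial 0x11B)."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


# Rows of the decryption matrix, as used in Output(0) to Output(3).
INV_MATRIX = [
    [0x0E, 0x0B, 0x0D, 0x09],
    [0x09, 0x0E, 0x0B, 0x0D],
    [0x0D, 0x09, 0x0E, 0x0B],
    [0x0B, 0x0D, 0x09, 0x0E],
]


def inv_mix_column(column):
    """Multiply one 4 byte state column by the decryption matrix."""
    result = []
    for matrix_row in INV_MATRIX:
        value = 0
        for factor, byte in zip(matrix_row, column):
            value ^= gf_mul(byte, factor)
        result.append(value)
    return result


print([f"{b:02X}" for b in inv_mix_column([0x04, 0x66, 0x81, 0xE5])])
```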
2.11. AES Key Expansion
Prior to encryption or decryption the key must be expanded. The expanded key is used in
the Add Round Key function defined above. Each time the Add Round Key function is called
a different part of the expanded key is XORed against the state. In order for this to work the
Expanded Key must be large enough so that it can provide key material for every time the
Add Round Key function is executed. The Add Round Key function gets called for each
round as well as one extra time at the beginning of the algorithm. Therefore the size of the
expanded key will always be equal to: 16 * (number of rounds + 1).
The 16 in the above formula is the size of the block in bytes. This provides key
material for every byte in the block for every round, plus one extra block for the initial
Add Round Key.
Table 2.8. Key size, block size, and expanded key size
Key Size   Block Size   Expanded Key
(bytes)    (bytes)      (bytes)
16         16           176
24         16           208
32         16           240
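The sizes in Table 2.8 follow directly from the formula 16 * (number of rounds + 1). A small sketch, assuming the standard AES relation "number of rounds = key size in 4-byte words + 6" (this relation is not stated in this section; the function name is illustrative):

```python
def expanded_key_size(key_size, block_size=16):
    """Return the expanded key size in bytes for a key of key_size bytes."""
    # Assumption: AES uses 10, 12 or 14 rounds for 16, 24 or 32 byte keys,
    # i.e. (key size in 4-byte words) + 6.
    rounds = key_size // 4 + 6
    return block_size * (rounds + 1)


for size in (16, 24, 32):
    print(size, expanded_key_size(size))
```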

Since the key size is much smaller than the size of the sub keys, the key is actually
“stretched out” to provide enough key space for the algorithm. The key expansion routine
executes a maximum of 4 consecutive functions in any one round. The available functions are:
• ROT WORD
• SUB WORD
• RCON
• EK
• K
An iteration of the above steps is called a round. The number of rounds of the key
expansion algorithm depends on the key size.
Table 2.9. Key expansion rounds by key size
Key Size  Block Size  Expansion Algorithm  Expanded Bytes  Rounds    Rounds Key  Expanded Key
(bytes)   (bytes)     Rounds               / Round         Key Copy  Expansion   (bytes)
16        16          44                   4               4         40          176
24        16          52                   4               6         46          208
32        16          60                   4               8         52          240

The first bytes of the expanded key are always equal to the key. If the key is 16 bytes long
the first 16 bytes of the expanded key will be the same as the original key. If the key size is
32 bytes then the first 32 bytes of the expanded key will be the same as the original key. Each
round adds 4 bytes to the Expanded Key. With the exception of the first rounds, which only copy
the key, each round takes the previous round's 4 bytes as input, operates on them, and returns
4 bytes. One more important note is that not all of the 4 functions are called in every round.
The algorithm only calls all 4 of the functions every:
4 Rounds for a 16 byte Key
6 Rounds for a 24 byte Key
8 Rounds for a 32 byte Key
In the rest of the rounds, the result of one EK function is simply XORed with the result of
another EK function. There is one exception to this rule: if the key is 32 bytes long, an
additional call to the Sub Word function is made every 8 rounds, starting on the 13th round.
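The rule for which functions run in a given round can be written as a small classifier. This is a sketch only: rounds are counted from 0 as in the expansion tables (so the "13th round" is round 12), and the labels returned are hypothetical names chosen for illustration.

```python
def round_kind(round_number, key_size):
    """Classify a key expansion round for a key of key_size bytes."""
    nk = key_size // 4  # number of rounds that only copy the key
    if round_number < nk:
        return "copy"       # K(offset) only
    if round_number % nk == 0:
        return "full"       # Sub Word, Rot Word, Rcon and EK
    if key_size == 32 and round_number % nk == 4:
        return "sub_word"   # extra Sub Word call for 32 byte keys
    return "xor"            # EK XOR EK only


print([round_kind(r, 32) for r in range(7, 14)])
```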
2.12. AES Key Expansion Functions
• Rot Word (4 bytes): This does a circular shift on 4 bytes similar to the Shift Row function,
for example 1,2,3,4 to 2,3,4,1.
• Sub Word (4 bytes): This step applies the S-box value substitution, as described in the
Bytes Sub function, to each of the 4 bytes in the argument.
• Rcon((Round/(Key Size/4))-1): This function returns a 4 byte value based on the
following table:

Rcon(0) = 01000000
Rcon(1) = 02000000
Rcon(2) = 04000000
Rcon(3) = 08000000
Rcon(4) = 10000000
Rcon(5) = 20000000
Rcon(6) = 40000000
Rcon(7) = 80000000
Rcon(8) = 1B000000
Rcon(9) = 36000000
Rcon(10) = 6C000000
Rcon(11) = D8000000
Rcon(12) = AB000000
Rcon(13) = 4D000000
Rcon(14) = 9A000000

For example, for a 16 byte key Rcon is first called in the 4th round: (4/(16/4))-1=0.
In this case Rcon will return 01000000.
For a 24 byte key Rcon is first called in the 6th round: (6/(24/4))-1=0.
In this case Rcon will also return 01000000.
• EK(Offset): The EK function returns 4 bytes of the Expanded Key after the specified
offset. For example, if offset is 0 then EK will return bytes 0,1,2,3 of the Expanded
Key.
• K(Offset): The K function returns 4 bytes of the Key after the specified offset. For
example, if offset is 0 then K will return bytes 0,1,2,3 of the Key.
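Rot Word and Rcon are simple enough to sketch directly. The code below is an illustration under stated assumptions: the Rcon values are generated by repeated doubling in GF(2^8) with the AES polynomial 0x11B rather than read from the table, and the function names are hypothetical.

```python
def rot_word(word):
    """Circular left shift of a 4-byte word: 1,2,3,4 -> 2,3,4,1."""
    return word[1:] + word[:1]


def rcon(index):
    """Return Rcon(index) as 4 bytes; only the first byte is non-zero."""
    value = 1
    for _ in range(index):
        value <<= 1          # multiply by 02 in GF(2^8)
        if value & 0x100:
            value ^= 0x11B   # reduce by the AES polynomial
    return bytes([value, 0, 0, 0])


def rcon_index(round_number, key_size):
    """Index passed to Rcon: (Round / (Key Size / 4)) - 1."""
    return round_number // (key_size // 4) - 1


print(rot_word(bytes([1, 2, 3, 4])).hex())  # 02030401
print(rcon(rcon_index(4, 16)).hex())        # 01000000
print(rcon(8).hex())                        # 1b000000
```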
2.13. AES Key Expansion Algorithm
Since the expansion algorithm changes depending on the length of the key, it is extremely
difficult to explain in writing. This is why the explanation of the Key Expansion Algorithm is
provided in a table format. There are 3 tables, one for each AES key size (16, 24, and 32 bytes).
Each table has 3 fields:
Table 2.10. Fields of the key expansion tables
Field               Description
Round               A counter representing the current step in the key expansion
                    algorithm; think of this as a loop counter
Expanded Key Bytes  Expanded key bytes affected by the result of the function(s)
Function            The function(s) that will return the 4 bytes written to the
                    affected expanded key bytes
2.13.1. 32 byte Key Expansion
Each round (except rounds 0, 1, 2, 3, 4, 5, 6 and 7) will take the result of the previous
round and produce a 4 byte result for the current round. Notice the first 8 rounds simply
copy the total of 32 bytes of the key.

Table 2.11: 32 byte key Expansion
Round Expanded Key Bytes Function
0 0 1 2 3 K(0)
1 4 5 6 7 K(4)
2 8 9 10 11 K(8)
3 12 13 14 15 K(12)
4 16 17 18 19 K(16)
5 20 21 22 23 K(20)
6 24 25 26 27 K(24)
7 28 29 30 31 K(28)
8 32 33 34 35 Sub Word(Rot Word(EK((8-1)*4))) XOR Rcon((8/8)-1) XOR EK((8-8)*4)
9 36 37 38 39 EK((9-1)*4)XOR EK((9-8)*4)
10 40 41 42 43 EK((10-1)*4)XOR EK((10-8)*4)
11 44 45 46 47 EK((11-1)*4)XOR EK((11-8)*4)
12 48 49 50 51 Sub Word(EK((12-1)*4))XOR EK((12-8)*4)
13 52 53 54 55 EK((13-1)*4)XOR EK((13-8)*4)
14 56 57 58 59 EK((14-1)*4)XOR EK((14-8)*4)
15 60 61 62 63 EK((15-1)*4)XOR EK((15-8)*4)
16 64 65 66 67 Sub Word(Rot Word(EK((16-1)*4))) XOR Rcon((16/8)-1) XOR EK((16-8)*4)
17 68 69 70 71 EK((17-1)*4)XOR EK((17-8)*4)
18 72 73 74 75 EK((18-1)*4)XOR EK((18-8)*4)
19 76 77 78 79 EK((19-1)*4)XOR EK((19-8)*4)
20 80 81 82 83 Sub Word(EK((20-1)*4))XOR EK((20-8)*4)
21 84 85 86 87 EK((21-1)*4)XOR EK((21-8)*4)
22 88 89 90 91 EK((22-1)*4)XOR EK((22-8)*4)
23 92 93 94 95 EK((23-1)*4)XOR EK((23-8)*4)
24 96 97 98 99 Sub Word(Rot Word(EK((24-1)*4))) XOR Rcon((24/8)-1) XOR EK((24-8)*4)
25 100 101 102 103 EK((25-1)*4)XOR EK((25-8)*4)
26 104 105 106 107 EK((26-1)*4)XOR EK((26-8)*4)
27 108 109 110 111 EK((27-1)*4)XOR EK((27-8)*4)
28 112 113 114 115 Sub Word(EK((28-1)*4))XOR EK((28-8)*4)
29 116 117 118 119 EK((29-1)*4)XOR EK((29-8)*4)
30 120 121 122 123 EK((30-1)*4)XOR EK((30-8)*4)
31 124 125 126 127 EK((31-1)*4)XOR EK((31-8)*4)
32 128 129 130 131 Sub Word(Rot Word(EK((32-1)*4))) XOR Rcon((32/8)-1) XOR EK((32-8)*4)
33 132 133 134 135 EK((33-1)*4)XOR EK((33-8)*4)
34 136 137 138 139 EK((34-1)*4)XOR EK((34-8)*4)
35 140 141 142 143 EK((35-1)*4)XOR EK((35-8)*4)
36 144 145 146 147 Sub Word(EK((36-1)*4))XOR EK((36-8)*4)
37 148 149 150 151 EK((37-1)*4)XOR EK((37-8)*4)
38 152 153 154 155 EK((38-1)*4)XOR EK((38-8)*4)
39 156 157 158 159 EK((39-1)*4)XOR EK((39-8)*4)
40 160 161 162 163 Sub Word(Rot Word(EK((40-1)*4))) XOR Rcon((40/8)-1) XOR EK((40-8)*4)
41 164 165 166 167 EK((41-1)*4)XOR EK((41-8)*4)
42 168 169 170 171 EK((42-1)*4)XOR EK((42-8)*4)
43 172 173 174 175 EK((43-1)*4)XOR EK((43-8)*4)
44 176 177 178 179 Sub Word(EK((44-1)*4))XOR EK((44-8)*4)
45 180 181 182 183 EK((45-1)*4)XOR EK((45-8)*4)
46 184 185 186 187 EK((46-1)*4)XOR EK((46-8)*4)
47 188 189 190 191 EK((47-1)*4)XOR EK((47-8)*4)
48 192 193 194 195 Sub Word(Rot Word(EK((48-1)*4))) XOR Rcon((48/8)-1) XOR EK((48-8)*4)
49 196 197 198 199 EK((49-1)*4)XOR EK((49-8)*4)
50 200 201 202 203 EK((50-1)*4)XOR EK((50-8)*4)
51 204 205 206 207 EK((51-1)*4)XOR EK((51-8)*4)
52 208 209 210 211 Sub Word(EK((52-1)*4))XOR EK((52-8)*4)
53 212 213 214 215 EK((53-1)*4)XOR EK((53-8)*4)
54 216 217 218 219 EK((54-1)*4)XOR EK((54-8)*4)
55 220 221 222 223 EK((55-1)*4)XOR EK((55-8)*4)
56 224 225 226 227 Sub Word(Rot Word(EK((56-1)*4))) XOR Rcon((56/8)-1) XOR EK((56-8)*4)
57 228 229 230 231 EK((57-1)*4)XOR EK((57-8)*4)
58 232 233 234 235 EK((58-1)*4)XOR EK((58-8)*4)
59 236 237 238 239 EK((59-1)*4)XOR EK((59-8)*4)
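The round-by-round table can be condensed into a loop. The following is a minimal sketch, not the project's implementation: it follows the Round / Function columns of the table for any of the three key sizes, and for self-containment it computes the S-box used by Sub Word (multiplicative inverse followed by the AES affine transform) instead of using a lookup table. All function names are illustrative.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


def build_sbox():
    """Compute the S-box: inverse in GF(2^8), then the affine transform."""
    sbox = [0] * 256
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):  # x^254 is the inverse of x
                inv = gmul(inv, x)
        result = 0x63
        for shift in range(5):    # inv XOR its rotations by 1..4 bits
            result ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox[x] = result
    return sbox


SBOX = build_sbox()


def sub_word(word):
    return bytes(SBOX[b] for b in word)


def rot_word(word):
    return word[1:] + word[:1]


def rcon(index):
    value = 1
    for _ in range(index):
        value = gmul(value, 2)
    return bytes([value, 0, 0, 0])


def xor_words(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def expand_key(key):
    """Expand a 16, 24 or 32 byte key, one 4-byte word per round."""
    nk = len(key) // 4                # rounds that only copy the key
    total_rounds = 4 * (nk + 6 + 1)   # 44, 52 or 60 rounds
    ek = bytearray(key)               # rounds 0..nk-1: K(round * 4)

    def EK(offset):
        return bytes(ek[offset:offset + 4])

    for r in range(nk, total_rounds):
        temp = EK((r - 1) * 4)
        if r % nk == 0:
            temp = xor_words(sub_word(rot_word(temp)), rcon(r // nk - 1))
        elif nk == 8 and r % nk == 4:
            temp = sub_word(temp)     # extra Sub Word for 32 byte keys
        ek += xor_words(temp, EK((r - nk) * 4))
    return bytes(ek)


expanded = expand_key(bytes(32))      # all-zero 32 byte key
print(len(expanded))                  # 240
print(expanded[32:36].hex())          # round 8:  62636363
print(expanded[48:52].hex())          # round 12: aafbfbfb
```

A table-driven S-box would be faster; it is computed here only to keep the sketch short and free of a 256-entry literal.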

2.14. Project Overview


Our project first aims at understanding the Advanced Encryption Standard (AES), a widely
used secret-key encryption algorithm, in its various forms, namely simple AES (128 bit key
and plain-text), APES (512 bit data using parallel 128 bit AES), ADES (512 bit data using
64 bit DES and 128 bit AES), etc. The Advanced Encryption Standard is then discussed
together with its mathematical preliminaries and the approved algorithm. After this, the
limitations of software implementation are analyzed and the various hardware approaches
are studied.
A highly parallelized and low cost hardware architecture is proposed based on the
relative merits of the FPGA architecture. The architectural details and the functionality are
described in a top-to-bottom modular hierarchy. The design is simulated using the Mentor
Graphics VHDL simulator, ModelSim XE II v5.8c, synthesized using the Xilinx synthesis tool,
Xilinx ISE 6 (Integrated Software Environment), and finally configured and implemented in
an FPGA using the SANDS FPGA/CPLD development platform.
Thus, in this project we ultimately aim at developing a cost-effective but highly secure
and parallelized solution for implementing the AES algorithm in hardware, using the
capabilities and compactness of VLSI for secure data communication over computer
networks, which is nowadays a major concern not only for large federal organizations
but also for private individuals.
 Project Requirements - Summary

1. Design Entry: SystemC / VHDL
2. Simulator: Xilinx Platform Studio (XPS)
Synthesis and Implementation by Xilinx’s WebPACK

Where:
o XC: Xilinx Commercial,
o SPARTAN-III FPGA,
o 1,50,000 Gate Count,
o plastic quad package,
o Speed Grade: -5.

2.15. Conclusion
This chapter provides only the basic information needed to implement the AES encryption
algorithm. The mathematics and design reasons behind AES were purposely left out. For more
information on these topics, refer to the Rijndael literature.

CHAPTER -3
DESIGN ANALYSIS
3.1 Introduction
The MicroBlaze soft-core processor system is synthesized using EDK 10.1 on a Spartan-3E
FPGA. The Embedded Development Kit (EDK) from Xilinx allows the designer to build a
complete processor system on Xilinx FPGAs. The systems that can be produced using EDK
range from a simple single-processor architecture to a complex multi-processor system with
multiple hardware accelerators. The tool mainly supports two types of processors:
i) MicroBlaze, which is a reconfigurable soft-core processor, and
ii) PowerPC, which is a hard-core processor implemented in some FPGAs from
Xilinx.
Depending on the FPGA chip in use, multiple MicroBlaze and PowerPC processors can be
integrated together in a single design. EDK provides C/C++ compilers for both MicroBlaze
and PowerPC, along with several tools for debugging and profiling the applications running
on each processor. In addition, ISE can be used to perform several types of simulation on the
generated architectures, which allows both the performance and the power consumption of
the architecture to be estimated. This chapter demonstrates the process of creating and
testing a MicroBlaze system design using the Embedded Development Kit (EDK) and the
Spartan-3E Starter board from Xilinx.
3.2 Objectives
The project contains these sections:
• System Requirements
• MicroBlaze System Description
• Steps
The following steps are described in this project:
• Starting XPS
• Using the Base System Builder Wizard
• Create or Import IP Peripheral
• Design Modification using Platform Studio
• Implementing the Design
• Defining the Software Design

3.3 System Requirements:
You need the following to complete this project:
• Windows 2000 SP2/Windows XP
• EDK 10.1i
• ISE 10.1i
• Familiarity with the Xilinx ISE 10.1 design flow
• Spartan-3E Starter Kit and Xilinx USB download cable
• Update pin assignments in the system.ucf file
• Update board JTAG chain specified in the download.cmd file

Fig. 3.1. Secure data communication controlled by an FPGA processor


3.4. FPGA Trainer Kit:
This section covers the requirements and specifications needed to get started with the FPGA
trainer kit. The kit is delivered as an FPGA board in a box with the following built-in units:
3.4.1. Key component features:
The key features of the Spartan-3E Starter Kit board are:
• Xilinx XC3S500E Spartan-3E FPGA
  o Up to 232 user-I/O pins
  o 320-pin FBGA package
  o Over 10,000 logic cells
• Xilinx 4 Mbit Platform Flash configuration PROM
• Xilinx 64-macrocell XC2C64A CoolRunner CPLD
• 64 MByte (512 Mbit) of DDR SDRAM, x16 data interface, 100+ MHz
• 16 MByte (128 Mbit) of parallel NOR Flash (Intel StrataFlash)
  o FPGA configuration storage
  o MicroBlaze code storage/shadowing
• 16 Mbits of SPI serial Flash (STMicro)
  o FPGA configuration storage
  o MicroBlaze code shadowing
• 2-line, 16-character LCD screen
• PS/2 mouse or keyboard port
• VGA display port
• 10/100 Ethernet PHY (requires Ethernet MAC in FPGA)
• Two 9-pin RS-232 ports (DTE- and DCE-style)
• On-board USB-based FPGA/CPLD download/debug interface
• 50 MHz clock oscillator
• SHA-1 1-wire serial EEPROM for bit stream copy protection
• Hirose FX2 expansion connector
• Three Digilent 6-pin expansion connectors
• Four-output, SPI-based Digital-to-Analog Converter (DAC)
• Two-input, SPI-based Analog-to-Digital Converter (ADC) with programmable-gain
pre-amplifier
• ChipScope™ SoftTouch debugging port
• Rotary-encoder with push-button shaft
• Eight discrete LEDs
• Four slide switches

Fig 3.2: Xilinx Spartan 3E FPGA kit

3.5. Spartan-3E starter kit
The Spartan-3E Starter Kit board, shown in Figure 3.2, highlights the unique features of the
Spartan-3E FPGA family and provides a convenient development board for
embedded processing applications.
The board highlights these features:
• Spartan-3E specific features
  o Parallel NOR Flash configuration
  o MultiBoot FPGA configuration from Parallel NOR Flash PROM
  o SPI serial Flash configuration
• Embedded development
  o MicroBlaze™ 32-bit embedded RISC processor
  o PicoBlaze™ 8-bit embedded controller
  o DDR memory interfaces
The main blocks here include the micro-controller, the CPLD, and the Spartan-II FPGA.
The micro-controller and the CPLD can tolerate 5 V, whereas the FPGA operates at 2.5 V.
The micro-controller acts as the booting interface between the kit and the
SANDS software. It converts the serial data it receives into the parallel data needed for
processing in the FPGA. It also controls the FPGA, acting as master over the slave
until configuration is complete. Once it has successfully configured the FPGA, it releases
its hold so that the FPGA functions independently based on the inputs provided. The
function of the CPLD is to coordinate and provide separate access to the address and data
bus values obtained from a common bus. It also acts as a voltage controller, providing
the FPGA with the necessary 2.5 V from its input supply of 5 V. From the programmable
port, the hex file is driven into the micro-controller, from there to the CPLD, and then to
the target device. Although this is a roundabout process compared with programming the
chip directly from the JTAG port, it eliminates the need for costlier cables and high speed
configuration software at the cost of some configuration speed, which is acceptable in
many situations.

Now, with the necessary inputs and the clock, we can run the configured gate-level
circuit to achieve the functionality that we have designed and downloaded, which
may be either encryption or decryption. The generated output goes through the UART
module into the transmitter state machine, where the data is converted from parallel to
serial and made available at the serial communication port. The data can then be
communicated to the other side over an RS-232 serial cable connected directly to the
COM port of the other PC, where the enciphered or deciphered data can be seen. Thus, the
FPGA based processor implements the desired algorithm effectively.
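The parallel-to-serial conversion performed by the transmitter can be modelled in a few lines. This is a sketch under assumptions, not a description of the UART core used in the design: it assumes the common 8N1 frame (one start bit, 8 data bits sent least significant bit first, no parity, one stop bit) and the 57600 baud rate used for the terminal later in this chapter. The function names are illustrative.

```python
def uart_frame(byte):
    """Serialize one byte as an 8N1 frame: start bit, 8 data bits, stop bit."""
    data_bits = [(byte >> i) & 1 for i in range(8)]  # LSB first
    return [0] + data_bits + [1]                     # start = 0, stop = 1


def transfer_time(num_bytes, baud=57600):
    """Seconds needed to send num_bytes, at 10 bits per byte on the wire."""
    return num_bytes * 10 / baud


print(uart_frame(0x41))   # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
print(transfer_time(16))  # time to send one 16 byte AES block
```

At 10 bits per byte, the serial link rather than the cipher is likely to dominate the end-to-end time for a single block.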
3.6 Xilinx platform studio (XPS)
The Xilinx Platform Studio (XPS) is the development environment or GUI used for
designing the hardware portion of your embedded processor system, as shown in figure 6.1.
The Xilinx Embedded Development Kit (EDK) is an integrated software tool
suite for developing embedded systems with Xilinx MicroBlaze and PowerPC CPUs. EDK
includes a variety of tools and applications to assist the designer in developing an embedded
system, from hardware creation to final implementation of the system on an FPGA.
System design consists of the creation of the hardware and software components of the
embedded processor system; the creation of a verification component is optional. A
typical embedded system design project involves: hardware platform creation, hardware
platform verification (simulation), software platform creation, software application creation,
and software verification.
Base System Builder is the wizard used to automatically generate a hardware
platform according to the user specifications, which are recorded in the MHS
(Microprocessor Hardware Specification) file. The MHS file defines the system architecture,
peripherals and embedded processors. The Platform Generation tool creates the hardware
platform using the MHS file as input. The software platform is defined by the MSS
(Microprocessor Software Specification) file, which defines driver and library customization
parameters for peripherals, processor customization parameters, standard I/O devices,
interrupt handler routines, and other software related routines. The MSS file is an input to
the Library Generator tool for customization of drivers, libraries and interrupt handlers.

Figure 3.3.: Xilinx Platform Studio Set up.
XPS includes a graphical user interface (GUI), along with a set of tools that aid in project
design. From the XPS GUI, you can design a complete embedded processor system for
implementation within a Xilinx FPGA device. The XPS main window is shown in the figure
below.
Note that the XPS main window is divided into three areas:
i. The Project Information Panel
ii. The System Assembly Panel
iii. The Connectivity Panel

Figure 3.4.: Xilinx Platform Studio GUI


XPS Features Include
• Base System Builder allows creation of a fully functional processor system in minutes
• System Assembly View allows user to quickly customize and configure design details
• IP configuration dialogs open automatically when new IP is added to a design
• Auto bus connectivity on AXI based designs
• Extensive catalog of AXI and PLB based processors, peripherals, and utility IP
• Tightly integrated with ISE Project Navigator, ISim, and ChipScope
• Create / Import IP wizard automates creation of custom IP templates, and provides
a mechanism to import user IP into XPS, and Bus Functional Model simulation support
for custom IP
• Debug Wizard automates hardware / software cross triggering and ChipScope
inclusion
• Hardware project export to the Software Development Kit (SDK)
3.7. Project information panel
The Project Information panel offers control over and information about your
project. The Project Information panel provides Project, Applications, and IP Catalog tabs,
shown in figure 6.3.
• The Project tab:
The Project tab lists references to project related files. Information is grouped in the
following general categories:
1. Project Files: All project-specific files such as the Microprocessor Hardware
Specification (MHS) files, Microprocessor Software Specification (MSS) files,
User Constraints File (UCF) files, iMPACT Command files, Implementation Option
files, and BitGen Option files.
2. Project Options: All project specific options, such as Device, Net,
Implementation, Hardware Description Language (HDL), and Sim Model options.
3. Reference Files: All log and output files produced by the XPS implementation
processes.

Figure 3.5. Project Information Area:Project Tab

• Applications tab:

The Applications tab lists all software application option settings, header files, and source
files associated with each application project. With this tab selected, you can:
  o Create and add a software application project, build the project, and load it to the
block RAM.
  o Set compiler options.
  o Add source and header files to the project.
• IP Catalog tab:
The IP Catalog tab lists all the EDK IP cores and any custom IP cores you created as
shown in figure 6.4. If a project is open, only the IP cores compatible with the target Xilinx
device architecture are displayed. The catalog lists information about the IP cores, including
release version, status (active, early access or deprecated), lock (not licensed, locked, or
unlocked), processor support, and a short description. Additional details about the IP core,
including the version change history, data sheet, and Microprocessor Peripheral Description
(MPD) file, are available in the right-click menu. By default, the IP cores are grouped
hierarchically by function.

Figure 3.6. Project Information Area: IP Catalog Tab


i. The system assembly panel:
The System Assembly Panel is where you view and configure system block elements.
If the System Assembly Panel is not already maximized in the main window, click the
System Assembly tab at the bottom of the pane to open it.
• Bus Interface, Ports, and Addresses filters: XPS provides Bus Interface, Ports, and
Addresses radio buttons in the System Assembly Panel (shown in the figure below), which
organize information about your design and allow you to edit your hardware platform more
easily.

Fig 3.7. System Assembly Panel Views


ii. The connectivity panel
With the Bus Interface filter selected, you’ll see the Connectivity Panel, highlighted by the
dashed line. The Connectivity Panel is a graphical representation of the
hardware platform interconnects.
• A vertical line represents a bus, and a horizontal line represents a bus interface to an
IP core.
• If a compatible connection can be made, a connector is displayed at the intersection
between the bus and IP core bus interface.
• The lines and connectors are color-coded to show the compatibility.
• Differently shaped connection symbols indicate mastership of the IP core bus
interface.
• A hollow connector represents a connection that you can make, and a filled connector
represents a connection made. To create or disable a connection, click the connector
symbol.
3.8. Integrated software environment (ISE)
ISE is the foundation for Xilinx FPGA logic design. Because FPGA design can be an
involved process, Xilinx has provided software development tools that allow the designer to
circumvent some of this complexity. Various utilities such as constraints entry, timing
analysis, logic placement and routing, and device programming have all been integrated into
ISE.

3.8.1. Steps for Setup
The setup uses a Spartan-3E Starter board with an RS-232 terminal connected to the serial
port and configured for 57600 baud, with 8 data bits, no parity and no handshaking.
• Creating the Project File in XPS
The first step in this tutorial is using the Xilinx Platform Studio (XPS) to create a project file.
XPS allows you to control the hardware and software development of the MicroBlaze system,
and includes the following:
  o An editor and a project management interface for creating and editing source code
  o Software tool flow configuration options
You can use XPS to create the following:
(i) A Project Navigator project file that allows you to control the hardware implementation
flow
(ii) A Microprocessor Hardware Specification (MHS) file
(iii) A Microprocessor Software Specification (MSS) file
XPS supports the software tool flow associated with these software specifications.
Additionally, you can use XPS to customize software libraries, drivers, and interrupt
handlers, and to compile your programs.
1 Starting XPS
(a) To open XPS, select Start → All Programs → Development → Xilinx ISE Design
Suite 10.1 → EDK → Xilinx Platform Studio.
(b) Select Base System Builder Wizard (BSB) to open the “Create New XPS Project
Using BSB Wizard” dialogue box shown in Figure 6.1.

Fig 3.8: starting window of XPS

(c) Click Ok.
(d) Use the Project File Browse button to browse to the folder you want as your
project directory.
(e) Click Open to create the system.xmp file then Save.

Fig 3.9: Create New XPS Project Using Base System Builder Wizard
(f) Click Ok to start the BSB wizard. The wizard window will appear, which will be
used to build the design as will be discussed in following sections.
3.9. Defining the system hardware
3.9.1 MHS and MPD Files
The next step is defining the embedded system hardware with the Microprocessor
Hardware Specification (MHS) and Microprocessor Peripheral Description (MPD) files.
MHS File: The Microprocessor Hardware Specification (MHS) file describes the following:
• Embedded processor: either the soft core MicroBlaze processor or the hard core
PowerPC (only available in Virtex-II Pro and Virtex-4 FX devices)
• Peripherals and associated address spaces
• Buses
• Overall connectivity of the system
The MHS file is a readable text file that is an input to the Platform Generator (the
hardware system building tool). Conceptually, the MHS file is a textual schematic of the
embedded system. To instantiate a component in the MHS file, you must include information
specific to the component.
MPD File: Each system peripheral has a corresponding MPD file. If the MHS file is the
schematic of the embedded system, the MPD file is the symbol of a peripheral within it.
The MPD file contains all of the available ports and hardware parameters for a peripheral.
The MPD file is located in the following directory:
$XILINX_EDK/hw/XilinxProcessorIPLib/pcores/<peripheral_name>/data

EDK provides two methods for creating the MHS file: the Base System Builder Wizard
and the Add/Edit Cores dialog. Both assist you in building the processor system that is
defined in the MHS file. This project uses the Base System Builder.
3.9.2 Using the Base System Builder Wizard
Use the following steps to create the processor system:
• In the Base System Builder, select “I would like to create a new design” then click
Next.
• In the Base System Builder - Select Board dialog select the following, as shown in
Figure 6.8:
  o Board Vendor: Xilinx
  o Board Name: Spartan-3E Starter Board
  o Board Revision: C
• Click Next. Select the MicroBlaze processor.
• Click Next. You will now specify several processor options as shown in Figure 6.8.
The following is an explanation of the settings specified in the figure:
• System Wide Setting:
  o Reference clock frequency: This is the frequency of the on-board clock.
  o Processor-Bus clock frequency: This is the frequency of the clock driving the
processor system.
• Processor Configuration:
  o Debug I/F:
    o On-Chip H/W Debug module: When the H/W debug module is selected, a PLB MDM
module is included in the hardware system. This introduces hardware intrusive
debugging with no software stub required. This is the recommended way of
debugging a MicroBlaze system.
    o XMD with S/W Debug stub: Selecting this debugging interface introduces
software intrusive debugging. There is a 1200-byte stub that is located at 0x00000000.
This stub communicates with the debugger on the host through the JTAG interface of
the PLB MDM module.
    o No Debug: Debugging is disabled.

Fig 3.10: BSB: Select a Board
• Users can specify the size of the local instruction and data memory.
• Cache setup:
  o No Cache: No caching will be used
  o Enable cache link: Caching will be used through the FSL bus
• You can also specify the use of the floating point unit (FPU).
• Click Next.
Select the peripheral subset (Configure IO Interfaces wizard) as shown in Figure 6.5. Note
that the number of peripherals shown on each dialogue box is dynamic, based upon your
computer's screen resolution.

Fig 3.11: configure processor


In the first page of the “Configure IO Interfaces” wizard, Figure 6.10:
• RS232_DTE → deselect
• RS232_DCE → select
• XPS UARTLITE baud-rate → 57600, data bits → 8 and Parity → NONE
• LEDs 8Bit → select
• Click Next
In the second page of the “Configure IO Interfaces” wizard, Figure 6.11:
• DIP Switch 4Bit → select
• Buttons 4Bit → deselect
• FLASH → deselect
• SPI FLASH → deselect
• Click Next

Fig 3.12: Configure I/O Interfaces – 1


In the third page of the “Configure IO Interfaces” wizard, Figure 6.11:
• DDR SDRAM → select
• Ethernet MAC → deselect
• Click Next through the Add Internal Peripherals page as we will not add any in this
project.
• Click Next
• This completes the hardware specification and we will now configure the software
settings. Using the Software Setup dialogue box as shown in Figure 6.13, specify the
following software settings:
  o Standard Input (STDIN) → RS232
  o Standard Output (STDOUT) → RS232
  o Boot Memory → ilmb_cntlr
  o Sample Application Selection → Memory Test
• Click Next.

Fig 3.13: Configure I/O Interfaces – 2

Fig 3.14: Configure I/O Interfaces – 3


Using the Configure Memory Test Application dialogue box as shown in Figure 6.8, specify
the following software settings:
• Instructions → ilmb_cntlr
• Data → dlmb_cntlr
• Stack/Heap → dlmb_cntlr
• Click Next.

Fig 3.15: Software Setup
The simple memory test application will illustrate system aliveness and perform a basic
read/write to your memory devices.
The completed system including the memory map will be displayed as shown in Figure 6.9.
Currently the memory map cannot be changed or updated in the BSB. If you want to change
the memory map you can do this in XPS.
• Click Generate and then Finish, to complete the design.
• Select “Start Using Platform Studio” and click OK.
3.10. Review
The Base System Builder Wizard has created the hardware and software specification files
that define the processor system. When we look at the project directory, shown in Figure
6.10, we see these as system.mhs and system.mss. There are also some directories created:
• data - contains the UCF (user constraints file) for the target board.
• etc - contains system settings for JTAG configuration on the board that are used when
downloading the bit file, and the default parameters that are passed to the ISE tools.
• pcores - is empty right now, but is used for custom peripherals.
• TestApp_Memory - contains a user application in C source code, for testing the
memory in the system.
3.10.1. Project Options
To see the project options that Base System Builder has configured, select
Project → Project Options, where the device information is specified. Then select Hierarchy
and Flow. This window provides the opportunity to export the processor system into an ISE
project as either the top level system or a sub-module design.

Fig 3.16: Configure Memory Test Application

Fig 3.16: Generated Processor System


• Click Finish to build the project

Fig 3.17: BSB Finish Setup


• To continue with the XPS project, click Start Using Platform Studio

Fig. 3.18: start using XPS


3.10.2 Implementing the Design
Now that the hardware has been completely specified in the MHS file, you can run the
Platform Generator. Platform Generator elaborates the MHS file into a hardware system
consisting of NGC files that represent the processor system. Then the Xilinx ISE tools will be
called to implement the design for the target board. To generate a netlist and create the bit
file, follow these steps:
3.10.3 Defining the Hardware Design
• Start generating the netlist and bitstream from the Microprocessor Hardware
Specification file, which describes the hardware.

Fig 3.19: Starting MHS Netlist and Bitstream Generation


• Select Hardware → Generate Netlist. This will elaborate the MHS file and generate a
netlist for the complete system (this will take a while).
• Select Hardware → Generate Bitstream. This will call the ISE tools to implement the
design and generate a bit file that can be downloaded into the FPGA.

At the end of this step the XPS output screen should look like Figure 6.14. The bit file
that is generated is called system.bit, and it contains all the information required to configure
the FPGA except the contents of the block RAM (application/data). The bit file will be updated
with the application code after defining the software design.
3.10.4 Defining the Software Design
Now that the hardware design is completed, the next step is defining the software design.
There are two major parts to software design, configuring the Board Support Package (BSP)
and writing the software applications. The configuration of the BSP includes the selection of
device drivers and libraries.

Fig 3.20: after H/W and S/W Specification netlist generated the block diagram
3.11. Generating the linker script file
 From the System Assembly view, copy the starting address of DDR_SDRAM.
 In the project information area, under Applications, right-click the project and select
Compiler Options.
 In the compiler options, paste the starting address.
 Generate the linker script by selecting the Generate Linker Script option from the same
menu.
3.12. Building the User Application
In EDK 10.1, XPS provides the user with the ability to create multiple software
projects. These projects can include source files, header files, and linker scripts. Unique
software projects allow the designer to specify the following options for each software
project:
 Specify compiler options
 Specify which projects to compile
 Specify which projects to download
 Build entire projects
Software application code development can be managed by selecting the Applications tab
as shown. The Base System Builder (BSB) generates a sample application which tests a
subset of the peripherals included in the design.
 Compiling the Code
Using the GNU GCC compiler, compile the application code as follows:
 Select Software → Build All User Applications to run mb-gcc, which compiles the
source files.
3.13. Downloading the Design
Now that the hardware and software designs are completed, the device can be configured.
Follow these steps to download and configure the FPGA:
 Connect the host computer to the target board, including the Xilinx USB download cable
and the serial cable.
 Start a HyperTerminal session with the following settings:
- COM1 (this depends on the COM port your serial cable is connected to)
- Bits per second: 57600
 Connect the board power.
 In EDK, select Device Configuration → Update Bitstream. This updates the bit file with
the compiled application code. Repeat this step each time the application changes.
 Select Device Configuration → Download Bitstream. This starts the device configuration
software (iMPACT) within EDK and executes the download command file etc/download.cmd.
 iMPACT downloads the file download.bit to the FPGA.

Fig 3.21. FPGA Physical diagram

Fig 3.22. FPGA hardware output diagram
3.14. .ELF file generation
After the hardware and software have been downloaded and the .bit file generated, the .elf
file is generated as follows:
 Select the Debug option in the menu.
 Before debugging, set the options for JTAG activation.
 Select Debug in the menu and launch XMD.
 The ELF file window will be shown, as in fig 3.20.

3.15. Conclusion

 The implementation requirements, which include the primary inputs and primary outputs
of the design and the proper notation and conventions, were discussed.
 The general implementation flow of the design was presented and explained so that the
flow can be followed properly.
 Implementation details were discussed, including the implementation style of
each process.
 Finally, the synthesis process was discussed, which identifies the FPGA family in which
the design has been implemented.

CHAPTER-4
HARDWARE IMPLEMENTATION
4.1. Introduction
The purpose of this design is to walk you through a complete hardware and software
processor system design. In this process, you will use the BSB of the XPS system to
automatically create a processor system and then add a custom OPB peripheral (an adder
circuit) to that processor system, which will consist of the following items:

Fig.4.1: FPGA Internal Diagram


• Micro Blaze Processor
• Local Memory Bus (LMB) Bus
• LMB BRAM controllers for BRAM
• BRAM BLOCK (On-chip memory)
• On-chip Peripheral Bus (OPB) BUS
• Debug Module (OPB_MDM)
• UART (OPB_UARTLITE)
• 2 - General Purpose Input / Output Peripherals (OPB_GPIOs)
• Push Buttons
• Dip Switches
• Custom peripheral (32-bit adder circuit)

4.2 Micro Blaze Processor Design
Field-programmable gate arrays (FPGAs) are flexible and reusable high-density circuits
that can be easily reconfigured by the designer, enabling the VLSI design, validation and
simulation cycle to be performed more quickly and at lower cost. Increasing device
densities have prompted FPGA manufacturers, such as Xilinx and Altera, to incorporate
larger embedded components, including multipliers, DSP blocks and even embedded
processors. One of the recent architectural enhancements in the Xilinx Spartan and Virtex
families is the introduction of the MicroBlaze soft-core (soft IP) and PowerPC 405 hard-core
embedded processors. The MicroBlaze processor is a 32-bit Harvard Reduced Instruction Set
Computer (RISC) architecture optimized for implementation in Xilinx FPGAs, with separate
32-bit instruction and data buses running at full speed to execute programs and access data
from both on-chip and external memory at the same time.
4.3. Micro Blaze System Description
In general, to design an embedded processor system, you need the following:
 Hardware components
 Memory map
 Software application
4.3.1. Design Hardware
The Micro Blaze (MB) tutorial design includes the following hardware components:
 Micro Blaze
 Local Memory Bus (LMB) Bus
 LMB BRAM_IF_CNTLR
 BRAM BLOCK
 PLB Bus
 Multi-Port Memory Controller (MPMC)
 MDM
 XPS UARTLITE
 2 - XPS GPIOs

4.3.2. Design Memory Map
The following table shows the memory map for the tutorial design as created by Base
System Builder.
Device         Address Min   Address Max   Size        Comment
LMB_BRAM       0x0000_0000   0x0000_3FFF   16K bytes   LMB Memory
MDM            0x8440_0000   0x8440_FFFF   64K bytes   Debug_module
XPS_UARTLITE   0x8400_0000   0x8400_FFFF   64K bytes   RS232_DCE
XPS_GPIO       0x8140_0000   0x8140_FFFF   64K bytes   LED
XPS_GPIO       0x8142_0000   0x8142_FFFF   64K bytes   DIP_Switches_4Bit
MPMC           0x8600_0000   0x87FF_FFFF   32M bytes   DDR_SDRAM
Table 4.1: Design memory map
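The memory map in Table 4.1 can be checked with a short script. The following is only a sketch in Python (not part of the EDK tool flow); the names MEMORY_MAP, decode and size_in_bytes are hypothetical helpers introduced for illustration, and the address ranges are copied from the table.

```python
# Address ranges copied from Table 4.1 (inclusive bounds).
# The labels combine the device and comment columns; they are illustrative only.
MEMORY_MAP = [
    ("LMB_BRAM", 0x0000_0000, 0x0000_3FFF),
    ("MDM", 0x8440_0000, 0x8440_FFFF),
    ("XPS_UARTLITE", 0x8400_0000, 0x8400_FFFF),
    ("XPS_GPIO/LED", 0x8140_0000, 0x8140_FFFF),
    ("XPS_GPIO/DIP_Switches_4Bit", 0x8142_0000, 0x8142_FFFF),
    ("MPMC/DDR_SDRAM", 0x8600_0000, 0x87FF_FFFF),
]


def decode(address):
    """Return the label of the device that owns `address`, or None if unmapped."""
    for name, low, high in MEMORY_MAP:
        if low <= address <= high:
            return name
    return None


def size_in_bytes(name):
    """Return the size of the named region, computed from its bounds."""
    for region, low, high in MEMORY_MAP:
        if region == name:
            return high - low + 1
    raise KeyError(name)


if __name__ == "__main__":
    print(decode(0x8400_0004))                       # address inside the UART range
    print(size_in_bytes("LMB_BRAM") // 1024, "KB")   # size of the local BRAM
```

Computing each size from its bounds is a quick consistency check on the Size column: the LMB_BRAM range works out to 16 KB and the MPMC range to 32 MB.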
4.4. Background
The backbone of the architecture is a single-issue, 3-stage pipeline with 32
general-purpose registers (it has no address registers, unlike the Motorola 68000 processor),
an Arithmetic Logic Unit (ALU), a shift unit, and two levels of interrupt. This basic design
can then be configured with more advanced features to meet the exact needs of the target
embedded application, such as a barrel shifter, divider, multiplier, single-precision
floating-point unit (FPU), instruction and data caches, exception handling, debug logic, Fast
Simplex Link (FSL) interfaces and others.
This flexibility allows the user to balance the required performance of the target
application against the logic area cost of the soft processor. MicroBlaze also supports reset,
interrupt, user exception, and break hardware exceptions. For interrupts, MicroBlaze
supports only one external interrupt source (connected to the Interrupt input port). If
multiple interrupts are needed, an interrupt controller must be used to handle the multiple
interrupt requests to MicroBlaze, as shown in Figure 4.2. An interrupt controller is available
for use with the Xilinx Embedded Development Kit (EDK) software tools. The processor reacts
to interrupts only if the Interrupt Enable (IE) bit in the Machine Status Register (MSR) is
set to 1. On an interrupt, the instruction in the execute stage completes, while the
instruction in the decode stage is replaced by a branch to the interrupt vector (address
0x10). The interrupt return address (the PC associated with the instruction in the decode
stage at the time of the interrupt) is automatically loaded into a general-purpose register.
In addition, the processor disables further interrupts by clearing the IE bit in the MSR. The
IE bit is automatically set again when the RTID instruction is executed.
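The interrupt sequence described above can be summarised in a small behavioural model. This is a sketch in Python and not the processor's actual implementation: the class CpuModel and its attribute names are hypothetical, the MSR is reduced to a single boolean for the IE bit, and the register that receives the return address is represented by a plain attribute.

```python
INTERRUPT_VECTOR = 0x10  # interrupt vector address given in the text


class CpuModel:
    """Behavioural model of interrupt entry and return (illustrative only)."""

    def __init__(self):
        self.pc = 0
        self.ie = False             # Interrupt Enable bit of the MSR
        self.return_address = None  # stands in for the general-purpose register

    def interrupt(self, decode_stage_pc):
        """Request an interrupt; return True if the processor takes it."""
        if not self.ie:
            return False                        # IE = 0: the request is ignored
        self.return_address = decode_stage_pc   # save PC of the decode-stage instruction
        self.ie = False                         # further interrupts are disabled
        self.pc = INTERRUPT_VECTOR              # branch to the interrupt vector
        return True

    def rtid(self):
        """Return from interrupt: resume at the saved PC and set IE again."""
        self.pc = self.return_address
        self.ie = True


if __name__ == "__main__":
    cpu = CpuModel()
    cpu.ie = True
    print(cpu.interrupt(0x100), hex(cpu.pc), cpu.ie)
    cpu.rtid()
    print(hex(cpu.pc), cpu.ie)
```

The model makes one consequence of the text explicit: a second request that arrives while the handler is running is not taken, because IE stays cleared until RTID executes.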

Fig 4.2: Micro Blaze architecture block diagram


Owing to advances in fabrication technology and the increasing density of logic blocks on
FPGAs, the use of FPGAs is no longer limited to debugging and prototyping digital electronic
circuits. Because of the enormous parallelism achievable on an FPGA, it is now used as a
replacement for ASIC solutions in a few applications where time to market is critical, and
entire embedded processor systems are implemented on these devices with soft-core processors
embedded in the system. Soft cores are technology independent and require only simulation and
timing verification after being synthesized to a target technology. This reduces the design
cycle time by a major factor compared to the development cycle for a hard-core processor, and
it has the advantage that the soft-core design can be customized for a specific application.
4.5. Features
The MicroBlaze soft core processor is highly configurable, allowing you to select a
specific set of features required by your design.
The fixed feature set of the processor includes:
• Thirty-two 32-bit general purpose registers
• 32-bit instruction word with three operands and two addressing modes
• 32-bit address bus
• Single issue pipeline

In addition to these fixed features, the MicroBlaze processor is parameterized to allow
selective enabling of additional functionality. Older (deprecated) versions of MicroBlaze
support a subset of the optional features described here. Only the latest (preferred) version of
MicroBlaze (v7.00) supports all options. Xilinx recommends that all new designs use the
latest preferred version of the MicroBlaze processor.
4.6 Pipeline Architecture
MicroBlaze instruction execution is pipelined. For most instructions, each stage takes one
clock cycle to complete. Consequently, the number of clock cycles necessary for a specific
instruction to complete is equal to the number of pipeline stages, and one instruction is
completed in every cycle. A few instructions require multiple clock cycles in the execute
stage to complete. This is achieved by stalling the pipeline. When executing from slower
memory, instruction fetches may take multiple cycles. This additional latency directly affects
the efficiency of the pipeline. MicroBlaze implements an instruction prefetch buffer that
reduces the impact of such multi-cycle instruction memory latency. While the pipeline is
stalled by a multi-cycle instruction in the execution stage, the prefetch buffer continues to
load sequential instructions. When the pipeline resumes execution, the fetch stage can load
new instructions directly from the prefetch buffer instead of waiting for the instruction
memory access to complete.
4.7 Three Stage Pipeline
When area optimization is enabled, the pipeline is divided into three stages to minimize
hardware cost: Fetch, Decode, and Execute.
               cycle 1  cycle 2  cycle 3  cycle 4  cycle 5  cycle 6  cycle 7
Instruction 1  Fetch    Decode   Execute
Instruction 2           Fetch    Decode   Execute  Execute  Execute
Instruction 3                    Fetch    Decode   Stall    Stall    Execute
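The timing in the three-stage diagram above can be reproduced with a simple in-order pipeline model. The sketch below is written in Python under simplifying assumptions that are ours and not stated in the text: one instruction is fetched per cycle, only the execute stage can take more than one cycle, and a waiting instruction is labelled "Stall". The function name schedule is hypothetical.

```python
def schedule(execute_cycles):
    """Return, for each instruction, a dict mapping cycle number to stage label.

    execute_cycles[i] is the number of cycles instruction i spends in Execute.
    """
    rows = []
    prev_exec_start = 0
    prev_exec_end = 0
    for index, cycles in enumerate(execute_cycles):
        fetch = index + 1
        # Decode can start once the previous instruction has left the decode stage.
        decode = max(fetch + 1, prev_exec_start)
        # Execute can start once the previous instruction has finished executing.
        exec_start = max(decode + 1, prev_exec_end + 1)
        exec_end = exec_start + cycles - 1

        row = {fetch: "Fetch", decode: "Decode"}
        for cycle in range(fetch + 1, decode):
            row[cycle] = "Stall"      # held before decode
        for cycle in range(decode + 1, exec_start):
            row[cycle] = "Stall"      # held before execute
        for cycle in range(exec_start, exec_end + 1):
            row[cycle] = "Execute"
        rows.append(row)
        prev_exec_start, prev_exec_end = exec_start, exec_end
    return rows


if __name__ == "__main__":
    # Instruction 2 needs three execute cycles, as in the diagram.
    for number, row in enumerate(schedule([1, 3, 1]), start=1):
        print("Instruction", number, [row[c] for c in sorted(row)])
```

With execute times of 1, 3 and 1 cycles, the third instruction is stalled in cycles 5 and 6 and executes in cycle 7, which matches the diagram.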

4.8 Five Stage Pipeline


When area optimization is disabled, the pipeline is divided into five stages to maximize
performance: Fetch (IF), Decode (OF), Execute (EX), Access Memory (MEM), and Write
back (WB).

4.8.1. Memory Architecture
MicroBlaze is implemented with a Harvard memory architecture; instruction and data
accesses are done in separate address spaces. Each address space has a 32-bit range (that is,
it handles up to 4 GB of instruction and data memory respectively). The instruction and data
memory ranges can be made to overlap by mapping them both to the same physical memory.
The latter is useful for software debugging. Both the instruction and data interfaces of
MicroBlaze are 32 bits wide and use big-endian, bit-reversed format. MicroBlaze supports word,
half-word, and byte accesses to data memory. Data accesses must be aligned (word accesses
must be on word boundaries, half-word accesses on half-word boundaries), unless the processor
is configured to support unaligned exceptions. All instruction accesses must be word aligned.
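The alignment rule and the big-endian byte order described above can be illustrated with a few lines of Python. This is a host-side sketch only; is_aligned and word_to_big_endian_bytes are hypothetical helper names, and the bit-reversed bit numbering mentioned in the text is not modelled.

```python
import struct


def is_aligned(address, size_bytes):
    """True if an access of `size_bytes` (1, 2 or 4) at `address` is aligned."""
    return address % size_bytes == 0


def word_to_big_endian_bytes(value):
    """Lay out a 32-bit word in memory order, most significant byte first."""
    return struct.pack(">I", value)


if __name__ == "__main__":
    print(is_aligned(0x1004, 4))   # word access on a word boundary
    print(is_aligned(0x1002, 4))   # word access on a half-word boundary
    print(word_to_big_endian_bytes(0x12345678).hex())
```

A word access at 0x1002 fails the check while a half-word access at the same address passes it, which is the distinction the text draws between word and half-word boundaries.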
MicroBlaze does not separate data accesses to I/O and memory (it uses memory-mapped
I/O). The processor has up to three interfaces for memory accesses. The LMB memory address
range must not overlap with the PLB, OPB or XCL ranges. MicroBlaze has a single-cycle latency
for accesses to local memory (LMB) and for cache read hits, except with area optimization
enabled, when data-side accesses and data cache read hits require two clock cycles. A data
cache write normally has two cycles of latency (more if the posted-write buffer in the memory
controller is full).
The Micro Blaze instruction and data caches can be configured to use 4 or 8 word
cache lines. When using a longer cache line, more bytes are pre-fetched, which generally
improves performance for software with sequential access patterns. However, for software
with a more random access pattern the performance can instead decrease for a given cache
size. This is caused by a reduced cache hit rate due to fewer available cache lines.
 Local Memory Bus (LMB)
 Processor Local Bus (PLB)
 On-chip Peripheral Bus (OPB)
 Xilinx Cache Link (XCL).
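The cache-line trade-off described above (longer lines help sequential code but leave fewer lines for scattered accesses) can be made concrete with a toy cache model. The sketch below is in Python and rests on assumptions of ours: a direct-mapped organisation, a hypothetical cache size of 256 words, and a hand-picked scattered pattern with a stride of 260 words chosen only to make the effect visible. It does not model the real cache controller.

```python
def hit_rate(addresses, cache_words, line_words):
    """Hit rate of a direct-mapped cache model over a list of word addresses."""
    num_lines = cache_words // line_words
    tags = [None] * num_lines          # line number currently held in each slot
    hits = 0
    for address in addresses:
        line_number = address // line_words
        index = line_number % num_lines
        if tags[index] == line_number:
            hits += 1
        else:
            tags[index] = line_number  # miss: the whole line is fetched
    return hits / len(addresses)


if __name__ == "__main__":
    sequential = list(range(1024))
    scattered = [k * 260 for k in range(64)] * 4   # 64 scattered words, 4 passes
    for line_words in (4, 8):
        print(line_words,
              hit_rate(sequential, 256, line_words),
              hit_rate(scattered, 256, line_words))
```

In this model the sequential pattern improves from a 0.75 to a 0.875 hit rate when the line grows from 4 to 8 words, while the scattered pattern drops from 0.75 to 0.0 because pairs of addresses now compete for the same line.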

4.9 Micro Blaze I/O Overview
The core interfaces shown in Figure 1-1 are defined as follows:
DPLB: Data interface, Processor Local Bus
DOPB: Data interface, On-chip Peripheral Bus
DLMB: Data interface, Local Memory Bus (BRAM only)
IPLB: Instruction interface, Processor Local Bus
IOPB: Instruction interface, On-chip Peripheral Bus
ILMB: Instruction interface, Local Memory Bus (BRAM only)
MFSL 0-15: FSL master interfaces
SFSL 0-15: FSL slave interfaces
IXCL: Instruction side Xilinx Cache Link interface (FSL master/slave pair)
DXCL: Data side Xilinx Cache Link interface (FSL master/slave pair)
Core: Miscellaneous signals for: clock, reset, debug, and trace.
 Processor Local Bus (PLB) Interface Description: The MicroBlaze PLB interfaces are
implemented as byte-enable capable 32-bit masters.
 On-Chip Peripheral Bus (OPB) Interface Description: The MicroBlaze OPB interfaces are
implemented as byte-enable capable masters.
 Local Memory Bus (LMB) Interface Description: The LMB is a synchronous bus used
primarily to access on-chip block RAM. It uses a minimum number of control signals and a
simple protocol to ensure that local block RAM is accessed in a single clock cycle. LMB
signals and definitions are shown in the following table. All LMB signals are active high.

4.10. Experimental setup


4.10.1 Xilinx Platform Studio
The Xilinx Platform Studio (XPS) is the development environment, or GUI, used for
designing the hardware portion of your embedded processor system.
Embedded Development Kit: The Xilinx Embedded Development Kit (EDK) is an integrated
software tool suite for developing embedded systems with Xilinx MicroBlaze and PowerPC CPUs.
EDK includes a variety of tools and applications that help the designer develop an embedded
system, from hardware creation to final implementation of the system on an FPGA. System
design consists of creating the hardware and software components of the embedded processor
system; creating a verification component is optional. A typical embedded system design
project involves hardware platform creation, hardware platform verification (simulation),
software platform creation, software application creation, and software verification. Base
System Builder is the wizard used to automatically generate a hardware platform, defined by
the MHS (Microprocessor Hardware Specification) file, according to the user's
specifications. The MHS file defines the system architecture, peripherals and embedded
processors. The Platform Generation tool creates the hardware platform using the MHS file
as input.

Fig 4.3. Embedded Development Kit Design Flow


The creation of the verification platform is optional and is based on the hardware
platform. The MHS file is taken as input by the Sim-gen tool to create simulation files for
a specific simulator. Three types of simulation models can be generated by the Sim-gen tool:
behavioral, structural and timing models. Other useful tools available in EDK include
Platform Studio, which provides the GUI for creating the MHS and MSS files; the Create /
Import IP Wizard, which allows designers to create their own peripherals and import them into
EDK projects; the Bitstream Initializer tool, which initializes the instruction memory of
processors on the FPGA; and the GNU Compiler tools, which are used for compiling and linking
application executables for each processor in the system [8]. There are two options for
debugging an application created using EDK: Xilinx Microprocessor Debug (XMD), for debugging
the application software using a Microprocessor Debug Module (MDM) in the embedded
processor system, and the Software Debugger, which invokes the software debugger
corresponding to the compiler being used for the processor.
Software Development Kit: The Xilinx Platform Studio Software Development Kit (SDK) is an
integrated development environment, complementary to XPS, that is used for C/C++ embedded
software application creation and verification. The software application can be written in
C or C++. Once it is built, the embedded processor system for the user application is
complete, and the bit file can be debugged and downloaded into the FPGA. The Xilinx Field
Programmable Gate Array (FPGA) device then behaves like the processor implemented on it.

Fig 4.4: Hardware and Software flow


4.11. Design Flow
To build an embedded system on Xilinx FPGAs, the Embedded Development Kit (EDK) is
used to complete the reconfigurable design. Fig 4.5 shows the design flow. Unlike the
design flow in traditional software design using C/C++ or hardware design using hardware
description languages, the EDK enables the integration of both the hardware and software
components of an embedded system. On the hardware side, the design entry in VHDL/Verilog
is first synthesized into a gate-level netlist, then translated into primitives and mapped
onto specific device resources such as look-up tables, flip-flops, and block memories. These
device resources are then placed, and their interconnections routed, to meet the timing
constraints. A downloadable .bit file is created for the whole hardware platform. The
software side follows the standard embedded software flow to compile the source code into
an Executable and Linkable Format (ELF) file. Meanwhile, a microprocessor software
specification (MSS) file and a microprocessor hardware specification (MHS) file are used to
define the software structure and hardware connections of the system. The EDK uses these
files to control the design flow and eventually merge the system into a single downloadable
file. The whole design runs on a real-time operating system (RTOS).

Fig 4.5: Design flow
4.12. FPGA Design flow
The FPGA-based design flow is used extensively today because of advantages such as short
design time and short time to market. It allows one to implement a VLSI design in a very
short time, cater to customer needs and make last-minute changes. The FPGA-based design flow
consists of the stages shown in Fig. 4.6.

Fig. 4.6. FPGA Design Flow

 Design Entry
 Synthesis
 Simulation
 Implementation
 Configuring or Programming the target device.
4.13. Design Entry and Simulation
The major drawback of traditional design methods is the manual translation of the design
description into a set of logical equations. This step can be entirely eliminated with hardware
description languages (HDLs). For example, most HDL tools, such as those for VHDL and
Verilog HDL, allow the use of finite state machines for sequential systems and truth tables
for combinatorial modules. Such design descriptions can be automatically converted into HDL
code that can be implemented by synthesis tools. Hardware description languages have found
their principal application in programmable logic devices (PLDs) of various complexities,
from simple PLDs up to complex CPLDs and FPGAs. Several HDLs are in use today. The most
popular ones are VHDL (Very High Speed Integrated Circuit HDL), Verilog HDL and ABEL.
4.13.1. Hardware implementation ISE/XPS Flow
 The ISE/XPS flow provides integration of a processor system as a component in an FPGA
design at two levels:
• The processor system is the top-level design
• The processor system is a sub-module
 Once the processor system is added to the ISE project, XPS can be invoked from ISE
by selecting the .xmp file in the Sources window and double-clicking Manage Processor
System in the Processes window.
 Add the user constraint file in ISE.

4.13.2. Software implementation ISE/XPS Flow


 Four stages to perform software flow:
• Pre-processor: Replaces all macros with definitions as defined in the .c or .h
files
• Machine-specific and language-specific compiler: Compiles C/C++ code
• Assembler: Converts code to machine language and generates the object file
• Linker: Links all the object files using user-defined or default linker script

4.14 Spartan-3E Starter KIT
The Spartan-3E Starter Kit board, shown below in Fig 4.7, highlights the unique features of
the Spartan-3E FPGA family and provides a convenient development board for embedded
processing applications.
The board highlights these features:
• Spartan-3E specific features
 Parallel NOR Flash configuration
 Multi Boot FPGA configuration from Parallel NOR Flash PROM
 SPI serial Flash configuration
• Embedded development
 MicroBlaze™ 32-bit embedded RISC processor
 PicoBlaze™ 8-bit embedded controller
 DDR memory interfaces

Fig 4.7: Xilinx Spartan 3E FPGA kit

4.14.1. Key component features
The key features of the Spartan-3E Starter Kit board are:
1) Xilinx XC3S200E Spartan-3E FPGA
 Up to 232 user-I/O pins
 320-pin FBGA package
 Over 10,000 logic cells
2) Xilinx 4 Mbit Platform Flash configuration PROM
3) Xilinx 64-macrocell XC2C64A CoolRunner CPLD
4) 64 MByte (512 Mbit) of DDR SDRAM, x16 data interface, 100+ MHz
5) 16 MByte (128 Mbit) of parallel NOR Flash (Intel StrataFlash)
 FPGA configuration storage
 MicroBlaze code storage/shadowing
6) 16 Mbits of SPI serial Flash (STMicro)
 FPGA configuration storage
 MicroBlaze code shadowing
o 2-line, 16-character LCD screen
o PS/2 mouse or keyboard port
o VGA display port
o 10/100 Ethernet PHY (requires Ethernet MAC in FPGA)
o Two 9-pin RS-232 ports (DTE- and DCE-style)
o On-board USB-based FPGA/CPLD download/debug interface
o 50 MHz clock oscillator
o SHA-1 1-wire serial EEPROM for bit stream copy protection
o Hirose FX2 expansion connector
o Three Digilent 6-pin expansion connectors
o Four-output, SPI-based Digital-to-Analog Converter (DAC)
o Two-input, SPI-based Analog-to-Digital Converter (ADC) with programmable-gain
pre-amplifier
o ChipScope™ SoftTouch debugging port
o Rotary-encoder with push-button shaft
o Eight discrete LEDs
o Four slide switches

4.15. Configuration Methods
A typical FPGA application uses a single non-volatile memory to store configuration
images. To demonstrate new Spartan-3E capabilities, the starter kit board has three different
configuration memory sources that all need to function well together. The extra configuration
functions make the starter kit board more complex than typical Spartan-3E applications.
The starter kit board also includes an on-board USB-based JTAG programming interface.
The on-board circuitry simplifies the device programming experience. In typical applications,
the JTAG programming hardware resides off-board or in a separate programming module,
such as the Xilinx Platform USB cable.
4.16. Voltages for all Applications
The Spartan-3E Starter Kit board showcases a triple-output regulator developed by Texas
Instruments, the TPS75003, designed specifically to power Spartan-3 and Spartan-3E FPGAs.
This regulator is sufficient for most stand-alone FPGA applications. However, the starter kit
board includes DDR SDRAM, which requires its own high-current supply. Similarly, the
USB-based JTAG download solution requires a separate 1.8V supply.
4.17. JTAG
JTAG's primary purpose is to allow a computer to take control of the state of all the I/O
pins on a board. In turn, this allows each device's connectivity to other devices on the board
to be tested. Standard JTAG commands can be used for this purpose.
FPGAs are JTAG-aware, so all the FPGA I/O pins can be controlled from the
JTAG interface. FPGAs add the ability to be configured through JTAG (using proprietary
JTAG commands).
JTAG consists of four signals: TDI, TDO, TMS and TCK. A fifth pin, TRST, is optional.
A single JTAG port can connect to one or multiple devices (as long as they are all
JTAG-aware parts). With multiple devices, you create what is called a "JTAG chain". TMS and
TCK are tied to all the devices directly, but TDI and TDO form a chain: TDO from one
device goes to TDI of the next one in the chain. The master controlling the chain (usually a
computer) closes the chain.
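The way TDI and TDO form a chain can be sketched with a small shift-register model. The Python below is an illustration under assumptions of ours: every device is taken to be in the standard BYPASS mode, where it presents a single-bit register between TDI and TDO, and TMS and the TAP state machine are not modelled. The names BypassDevice and shift_through_chain are hypothetical.

```python
class BypassDevice:
    """A JTAG-aware part modelled only by a 1-bit register between TDI and TDO."""

    def __init__(self):
        self.bit = 0

    def clock(self, tdi):
        """One TCK pulse: output the stored bit on TDO, then capture TDI."""
        tdo = self.bit
        self.bit = tdi
        return tdo


def shift_through_chain(chain, bits):
    """Clock `bits` into the first device and collect the last device's TDO."""
    captured = []
    for bit in bits:
        for device in chain:
            bit = device.clock(bit)  # TDO of this device feeds TDI of the next
        captured.append(bit)
    return captured


if __name__ == "__main__":
    chain = [BypassDevice() for _ in range(3)]
    print(shift_through_chain(chain, [1, 0, 1, 1, 0, 0, 0]))
```

With three devices in the chain, the pattern shifted in by the master reappears at the end of the chain three TCK cycles later, which is how the master closing the chain can tell how many devices it contains.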
4.18. RS232
As shown in Figure 4.7, the Spartan-3E Starter Kit board has two RS-232 serial ports: a
female DB9 DCE connector and a male DTE connector. The DCE-style port connects
directly to the serial port connector available on most personal computers and workstations
via a standard straight-through serial cable. Null modems, gender changers, or crossover
cables are not required.
Use the DTE-style connector to control other RS-232 peripherals, such as modems or
printers, or to perform simple loopback testing with the DCE connector.

Fig 4.8: RS 232 Serial ports


The FPGA supplies serial output data using LVTTL or LVCMOS levels to the Maxim
device, which in turn, converts the logic value to the appropriate RS-232 voltage level.
Likewise, the Maxim device converts the RS-232 serial input data to LVTTL levels for the
FPGA. A series resistor between the Maxim output pin and the FPGA’s RXD pin protects
against accidental logic conflicts.
Hardware flow control is not supported on the connector. The port’s DCD, DTR, and DSR
signals connect together, as shown in Figure 5.4. Similarly, the port’s RTS and CTS signals
connect together.
4.19. Universal Asynchronous Receiver/Transmitter (UART)
4.19.1. Introduction
The Universal Asynchronous Receiver/Transmitter (UART) is a popular and widely used
device for data communication in the field of telecommunication. There are different versions
of UARTs in the industry. Some of them contain FIFOs for receiver/transmitter data
buffering, and some have a 9-data-bit mode (Start bit + 9 Data bits + Parity + Stop
bits). This application note describes a fully configurable UART optimized for and
implemented in a variety of Lattice devices, which have superior performance and
architecture compared to existing semiconductor ASSPs (application-specific standard
products). This UART reference design contains a receiver and a transmitter. The receiver
performs serial-to-parallel conversion on the asynchronous data frame received from the
serial data input SIN. The transmitter performs parallel-to-serial conversion on the 8-bit
data received from the CPU. In order to synchronize the asynchronous serial data and to
ensure data integrity, Start, Parity and Stop bits are added to the serial data. An example
of the UART frame format is shown in Figure 4.9 below.

Figure 4.9. UART Frame Format: (1 Start Bit, 8 Data Bits, 1 Parity Bit, 1 Stop Bit)
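The frame format of Figure 4.9 can be written out bit by bit. The following Python sketch builds one frame with 1 Start bit, 8 Data bits, 1 Parity bit and 1 Stop bit. It assumes the common UART conventions of an idle-high line, a low Start bit and least-significant-bit-first data, which the text does not state explicitly; uart_frame is a hypothetical helper name.

```python
def uart_frame(data_byte, parity="even"):
    """Return the bits of one UART frame in transmission order."""
    data_bits = [(data_byte >> i) & 1 for i in range(8)]  # LSB first (assumed)
    ones = sum(data_bits)
    if parity == "even":
        parity_bit = ones % 2        # makes the total count of 1s even
    elif parity == "odd":
        parity_bit = 1 - ones % 2    # makes the total count of 1s odd
    else:
        raise ValueError("parity must be 'even' or 'odd'")
    start_bit, stop_bit = 0, 1
    return [start_bit] + data_bits + [parity_bit] + [stop_bit]


if __name__ == "__main__":
    print(uart_frame(0x41))          # the character 'A' with even parity
    print(len(uart_frame(0x41)))     # 11 bits per frame in this format
```

Each 8-bit character therefore occupies 11 bit times on the line in this format, which should be kept in mind when estimating the effective data rate at a given baud rate.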
This design can also be instantiated several times to obtain multiple UARTs in the same
device. To make the design easy to embed in a larger implementation, the bi-directional data
bus is separated into two buses, DIN and DOUT, instead of using tri-state buffers. The
transmitter and receiver share a common internal Clk16X clock. This internal clock,
which must run at 16 times the desired baud rate, is obtained directly from the on-board
clock through the MCLK input.
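The relation between the baud rate and the Clk16X clock is simple arithmetic, sketched below in Python. The 57600 baud setting and the 50 MHz oscillator are the values quoted elsewhere in this report; the integer divider between the board clock and Clk16X is our assumption for illustration, since the text only states that Clk16X is derived from the on-board clock. The function names are hypothetical.

```python
def clk16x_frequency(baud_rate):
    """Frequency in Hz that the internal Clk16X clock must have."""
    return 16 * baud_rate


def divider_for(clock_hz, baud_rate):
    """Nearest integer divider from the board clock to Clk16X.

    Returns (divider, actual_baud, error_percent).
    """
    divider = max(1, round(clock_hz / clk16x_frequency(baud_rate)))
    actual_baud = clock_hz / (16 * divider)
    error_percent = 100.0 * (actual_baud - baud_rate) / baud_rate
    return divider, actual_baud, error_percent


if __name__ == "__main__":
    print(clk16x_frequency(57600))
    divider, actual, error = divider_for(50_000_000, 57600)
    print(divider, round(actual, 2), round(error, 2))
```

For 57600 baud the required Clk16X frequency is 921600 Hz; with a 50 MHz source the nearest integer divider is 54, which leaves a baud rate error of under 1 percent.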
4.19.2. Features
 Functionally compatible with the NS16450 UART.
 Faster performance than industry standard hardwired devices.
 Inserts or extracts standard asynchronous communication bits (Start, Stop and Parity)
to or from the serial data.
 Holding and shifting registers eliminate the need for precise synchronization between
the CPU and serial data.
 Standard CPU Interface.
 Fully prioritized interrupt system control.
 MODEM interface functions (CTS, RTS, DSR, DTR, RI and DCD)
 Fully programmable serial interface characteristics:
a) 5, 6, 7 or 8-bit characters
b) Even, odd, or no-parity bit generation and detection
c) 1, 1.5 or 2-stop bit generation and detection
 False Start bit detection
 Interactive control signaling and status reporting capabilities
 Separate input and output data buses for use as an embedded module in a larger
design
 Receiver synchronizes off the Start bit
 Receiver samples all incoming bits at the center of each bit.
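Three of the receiver features listed above (synchronizing off the Start bit, false Start bit detection and sampling at the center of each bit) can be illustrated with a 16x-oversampled software model. The Python below is a sketch under assumptions of ours: 8 data bits, no parity, one Stop bit, LSB-first data and an idle-high line. The function names oversample and receive_byte are hypothetical and do not come from the reference design.

```python
def oversample(frame_bits, idle_ticks=20, ticks_per_bit=16):
    """Expand frame bits into a 16x-oversampled line with idle (1) around it."""
    line = [1] * idle_ticks
    for bit in frame_bits:
        line.extend([bit] * ticks_per_bit)
    line.extend([1] * idle_ticks)
    return line


def receive_byte(line, ticks_per_bit=16):
    """Recover one 8-bit character from an oversampled line, or None."""
    half = ticks_per_bit // 2
    tick = 1
    while tick < len(line):
        falling_edge = line[tick - 1] == 1 and line[tick] == 0
        if falling_edge:
            center = tick + half                     # middle of the Start bit
            if center + 9 * ticks_per_bit >= len(line):
                return None                          # not enough samples left
            if line[center] == 0:                    # still low: a real Start bit
                data = 0
                for i in range(8):
                    sample = line[center + (i + 1) * ticks_per_bit]
                    data |= sample << i              # LSB first (assumed)
                stop = line[center + 9 * ticks_per_bit]
                return data if stop == 1 else None   # None on a framing error
            # Line returned high before mid-bit: false Start bit, keep looking.
        tick += 1
    return None


if __name__ == "__main__":
    frame = [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]           # Start, 0x41 LSB first, Stop
    print(hex(receive_byte(oversample(frame))))
    glitchy = [1] * 10 + [0] * 3 + oversample(frame)  # short low glitch first
    print(hex(receive_byte(glitchy)))
```

In the second case the three-tick glitch is rejected because the line is high again at the mid-bit check, and the receiver then locks onto the real Start bit.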
4.19.3. Operations Overview

Fig.4.10. UART General Block Diagram


Thus, from the general overview of the UART, we will extract out the desired
functionalities mainly of the transmitter, receiver and the baud-rate generator to develop the
Software Implementation of the UART as a serial data communication protocol required for
interfacing the FPGA-based AES processor with the PC.
4.20. Conclusion
This chapter discussed the hardware implementation of the project and described each of
the blocks in the block diagram.

CHAPTER -5
MATHEMATICAL ANALYSIS
5.1. Introduction
Any discussion of AES must begin with DES, the original Data Encryption Standard.
DES was selected as a Federal Information Processing Standard (FIPS) for the United States
in 1976. In 1977 the National Bureau of Standards (now the National Institute of Standards
and Technology, or NIST) adopted an IBM-designed cipher that encrypted 64-bit blocks
under 56-bit keys as the Data Encryption Standard (DES). It became widely used
internationally in many commercial applications, including financial transactions. The
algorithm remained controversial because of suspicions that the National Security Agency
had introduced deliberate weaknesses. But with only 56 bits of key, DES is now obsolete. In
its place many people are now using triple-DES, a multiple version of an algorithm that does
not perform particularly well. In 1997, NIST announced its intention to choose a successor
to DES, which could no longer be considered secure because of its small key size and the
increased availability of computing power. NIST therefore announced a competition for an
Advanced Encryption Standard (AES), an algorithm with 128-bit blocks and 128-, 192-, and
256-bit keys to replace DES. NIST sought a symmetric-key algorithm for “sensitive,
unclassified” information. The chosen algorithm would have to be available royalty-free
worldwide. Winners would get fame and glory—and probably a lot of consulting. And AES
would undoubtedly become one of the most widely used cryptographic algorithms in the
world. In 1998 twenty-one industry and academic groups offered candidates; fifteen met
NIST’s submission criteria.

On October 2, 2000, NIST announced its choice for the Advanced Encryption Standard:
Rijndael (pronounced “Rhine Dahl”), an algorithm developed by two Belgian cryptographers,
Joan Daemen and Vincent Rijmen. Rijndael should appeal to mathematicians; the
cryptosystem is quite algebraic. Rijndael repeats rounds, with the number of rounds
determined by key size. In the 128-bit key version, Rijndael runs for 10 rounds. As specified
in the call for algorithms, Rijndael operates on a 128-bit block of data. It divides the block
into sixteen 8-bit bytes and treats these as elements of $GF(2^8)$, defined by the polynomial
$x^8 + x^4 + x^3 + x + 1$, which is irreducible over $\mathbb{Z}/2\mathbb{Z}$. The data are
placed in a 4 x 4 array, and all operations occur on the bytes of the array. Each round
consists of four operations: one transforms the bytes, one transforms the rows, one
transforms the columns, and one adds in the key. First, each of the bytes is modified by maps
easily described in the arithmetic of $GF(2^8)$: inversion (with zero mapped to itself) and an
affine transformation; then the rows of the array are shifted circularly, with the bytes of
row i moving i - 1 locations to the left. Next the bytes in each column are mixed by
multiplication: view the column elements as coefficients of a polynomial of degree 3, and
multiply this polynomial by $3x^3 + x^2 + x + 2$ modulo $x^4 + 1$. The last operation is an
XOR of the key bits with the elements of the array.
The polynomials used for the field arithmetic were determined by two criteria: (a)
arithmetic efficiency and (b) resistance to cryptanalytic attack. Though DES was first cracked
by brute-force attack that searched the entire key space, linear and differential cryptanalysis
and weak keys are serious attacks on the security of the algorithm. Rijndael’s multiplicative
map and affine transformation were chosen for their ability to resist these. The polynomial
3x^3 + x^2 + x + 2 was picked for its combination of fast multiplication and diffusion power.
(Diffusion is spreading changes in key or text bits into the cipher text.) NIST’s evaluation
used published research from academic and industry experts and private advice from the
National Security Agency (NSA). NIST based its decision on security, efficiency, and
“algorithm and implementation” characteristics (including hardware and software suitability
and simplicity). Security is difficult to assess. The breaking of an algorithm is clear, but there
are no proofs of security, only proofs that an algorithm passes the tests we currently know to
perform. By contrast, results of efficiency tests, even though only using current technology,
provide more definitive information. Efficiency tests were conducted in a variety of venues,
including fast implementations in C++, Java, assembler code, FPGAs (Field Programmable
Gate Arrays) and ASICs (Application Specific Integrated Circuits). All finalists were fine on
these measures, but some were finer than others. Why did NIST pick Rijndael? NIST judged
the submission to be “the best overall algorithm for the AES…Rijndael’s combination of
security, performance, efficiency, implementability, and flexibility make it an appropriate
selection for the AES.” Rijndael’s cryptographic complexity rests on several well-studied
cryptographic transformations, and the algorithm is easy to describe. The algorithm performs
efficiently on a variety of platforms (NIST noted that it was a “good performer in hardware
and software across a wide range of computing environments”), and the algorithm is
relatively easy to defend against power and timing attacks. There were some comments that
the polynomials chosen for Rijndael’s primitives might lead to breaks. But GF(2^n) is a field
that NSA knows well, and it is fair to assume that Rijndael passed NSA’s tests. Many of the
finest minds in the field submitted candidates, and the candidate algorithms were widely
reviewed, criticized, and discussed by experts around the world. As a result, AES is
considered to be a high quality and trustworthy solution for data encryption. AES became a
government standard in 2002. In 2003, the U.S. Government approved AES for use with
classified information. Today, it is one of the most popular algorithms used in symmetric key
cryptography.
5.2. AES FIPS-197-Algorithm
5.2.1. Introduction
This standard specifies the Rijndael algorithm ([3] and [4]), a symmetric block cipher that
can process data blocks of 128 bits, using cipher keys with lengths of 128, 192, and 256 bits.
Rijndael was designed to handle additional block sizes and key lengths, however they are not
adopted in this standard. Throughout the remainder of this standard, the algorithm specified
herein will be referred to as “the AES algorithm.” The algorithm may be used with the three
different key lengths indicated above, and therefore these different “flavors” may be referred
to as “AES-128”, “AES-192”, and “AES-256”. This specification includes the following
sections:
• Definitions of terms, acronyms, and algorithm parameters, symbols, and functions;
• Notation and conventions used in the algorithm specification, including the ordering
and numbering of bits, bytes, and words;
• Mathematical properties that are useful in understanding the algorithm;
• Algorithm specification, covering the key expansion, encryption, and decryption
routines;
• Implementation issues, such as key length support, keying restrictions, and additional
block/key/round sizes.

5.2.2. Definitions
1) Glossary of Terms and Acronyms
The following definitions are used throughout this standard:
• AES: Advanced Encryption Standard.
• Affine Transformation: A transformation consisting of multiplication by a matrix
followed by the addition of a vector.
• Array: An enumerated collection of identical entities (e.g., an array of bytes).
• Bit: A binary digit having a value of 0 or 1.
• Block: Sequence of binary bits that comprise the input, output, State, and Round
Key. The length of a sequence is the number of bits it contains. Blocks are also
interpreted as arrays of bytes.
• Byte: A group of eight bits that is treated either as a single entity or as an array of 8
individual bits.
• Cipher: Series of transformations that converts plaintext to ciphertext using the
Cipher Key.
• Cipher Key: Secret, cryptographic key that is used by the Key Expansion
routine to generate a set of Round Keys; can be pictured as a rectangular array of
bytes, having four rows and Nk columns.
• Ciphertext: Data output from the Cipher or input to the Inverse Cipher.
• Inverse Cipher: Series of transformations that converts ciphertext to plaintext using
the Cipher Key.
• Key Expansion: Routine used to generate a series of Round Keys from the Cipher Key.
• Plaintext: Data input to the Cipher or output from the Inverse Cipher.
• Rijndael: Cryptographic algorithm specified in this Advanced Encryption Standard.
• Round Key: Round keys are values derived from the Cipher Key using the Key
Expansion routine; they are applied to the State in the Cipher and Inverse Cipher.
• State: Intermediate Cipher result that can be pictured as a rectangular array of bytes,
having four rows and Nb columns.
• S-box: Non-linear substitution table used in several byte substitution transformations
and in the Key Expansion routine to perform a one-for-one substitution of a byte
value.
• Word: A group of 32 bits that is treated either as a single entity or as an array of 4
bytes.

2) Mathematical Preliminaries: All bytes in the AES algorithm are interpreted as finite
field elements using the notation introduced in Sec. 2.2.3.2. Finite field elements can be
added and multiplied, but these operations are different from those used for numbers. The
following subsections introduce the basic mathematical concepts needed for Sec. 2.2.5.
3) Addition: The addition of two elements in a finite field is achieved by “adding” the
coefficients for the corresponding powers in the polynomials for the two elements.
For example, the following expressions are equivalent to one another:
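As a minimal sketch of this equivalence, the snippet below adds two bytes in polynomial, binary, and hexadecimal notation. The operands {57} and {83} are an assumption chosen for illustration (the same pair appears in the multiplication example of this section), and the helper name gf_add is hypothetical.

```python
# Illustrative operands (assumed for this sketch).
A = 0x57  # x^6 + x^4 + x^2 + x + 1  -> {01010111}
B = 0x83  # x^7 + x + 1              -> {10000011}


def gf_add(p, q):
    """Add two GF(2^8) elements: adding coefficients modulo 2 is a bitwise XOR."""
    return p ^ q


total = gf_add(A, B)
# Polynomial form of the sum: x^7 + x^6 + x^4 + x^2
print(f"binary {{{total:08b}}}  hex {{{total:02x}}}")
```

Because every element is its own additive inverse, subtraction in this field is the same XOR operation.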

4) Multiplication: In the polynomial representation, multiplication in GF(2^8) corresponds
with the multiplication of polynomials modulo an irreducible polynomial of degree 8. A
polynomial is irreducible if its only divisors are one and itself. For the AES algorithm, this
irreducible polynomial is given by

m(x) = x^8 + x^4 + x^3 + x + 1    (Eq. 5.1)

It is also represented by {01}{1b} in hexadecimal notation. For example, {57} • {83} = {c1},
because the product (x^6 + x^4 + x^2 + x + 1)(x^7 + x + 1), reduced modulo m(x), is x^7 + x^6 + 1.
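The two steps described here (polynomial multiplication, then reduction modulo m(x)) can be sketched as follows. The function names poly_mul, poly_mod and gf_mul are hypothetical, and the bit-by-bit long division is only one possible way to perform the reduction.

```python
M = 0x11B  # m(x) = x^8 + x^4 + x^3 + x + 1, i.e. {01}{1b}


def poly_mul(a, b):
    """Carry-less product of two binary polynomials (coefficients in GF(2))."""
    result = 0
    shift = 0
    while b:
        if b & 1:
            result ^= a << shift
        b >>= 1
        shift += 1
    return result


def poly_mod(p, m=M):
    """Remainder of p modulo m, computed by binary long division."""
    deg_m = m.bit_length() - 1
    while p.bit_length() - 1 >= deg_m:
        p ^= m << (p.bit_length() - 1 - deg_m)
    return p


def gf_mul(a, b):
    """Multiply two GF(2^8) elements: multiply as polynomials, then reduce."""
    return poly_mod(poly_mul(a, b))


product = poly_mul(0x57, 0x83)   # x^13 + x^11 + x^9 + x^8 + x^6 + x^5 + x^4 + x^3 + 1
print(hex(product), hex(gf_mul(0x57, 0x83)))
```

The unreduced product has degree 13 and does not fit in a byte; the reduction step brings it back to degree less than 8.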

The modular reduction by m(x) ensures that the result will be a binary polynomial of
degree less than 8, and thus can be represented by a byte. Unlike addition, there is no simple
operation at the byte level that corresponds to this multiplication. The multiplication defined
above is associative, and the element {01} is the multiplicative identity. For any non-zero
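binary polynomial b(x) of degree less than 8, the multiplicative inverse of b(x), denoted
b^-1(x), can be found as follows: the extended Euclidean algorithm is used to compute
polynomials a(x) and c(x) such that

b(x)a(x) + m(x)c(x) = 1    (Eq. 5.2)

Hence a(x) • b(x) mod m(x) = 1, which means

b^-1(x) = a(x) mod m(x)    (Eq. 5.3)
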
It follows that the set of 256 possible byte values, with XOR used as addition and the
multiplication defined as above, has the structure of the finite field GF(2^8).
5) Multiplication by x: Multiplying the binary polynomial
b(x) = b_7x^7 + b_6x^6 + ... + b_1x + b_0 by the polynomial x results in

b_7x^8 + b_6x^7 + b_5x^6 + b_4x^5 + b_3x^4 + b_2x^3 + b_1x^2 + b_0x    (Eq. 5.4)

The result x • b(x) is obtained by reducing the above result modulo m(x), as defined in
Eq. 5.1. If b7 = 0, the result is already in reduced form. If b7 = 1, the reduction is
accomplished by subtracting (i.e., XORing) the polynomial m(x). It follows that
multiplication by x (i.e., {00000010} or {02}) can be implemented at the byte level as a left
shift and a subsequent conditional bitwise XOR with {1b}. This operation on bytes is denoted
by xtime(). Multiplication by higher powers of x can be implemented by repeated application
of xtime(). By adding intermediate results, multiplication by any constant can be
implemented.
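A minimal sketch of xtime() and of constant multiplication built from it is given below. The helper name gf_mul_const and the operands {57} and {13} are assumptions made for illustration only.

```python
def xtime(b):
    """Multiply a GF(2^8) element by x ({02}): left shift, then conditional XOR."""
    b <<= 1
    if b & 0x100:      # b7 was 1, so the shifted value has degree 8
        b ^= 0x11B     # subtract m(x); clears bit 8 and XORs {1b} into the low byte
    return b


def gf_mul_const(b, c):
    """Multiply b by the constant c by adding (XORing) selected xtime() results."""
    result = 0
    while c:
        if c & 1:
            result ^= b
        b = xtime(b)   # b now holds the next higher power-of-x multiple
        c >>= 1
    return result


# {57} • {13} = {57} • ({01} XOR {02} XOR {10}) = {57} XOR {ae} XOR {07}
print(hex(gf_mul_const(0x57, 0x13)))
```

Only shifts and XORs are needed, which is why this form maps well onto simple hardware.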
6) Polynomials with Coefficients in GF(2^8)
Four-term polynomials can be defined - with coefficients that are finite field elements - as:

a(x) = a_3x^3 + a_2x^2 + a_1x + a_0    (Eq. 5.5)

which will be denoted as a word in the form [a0, a1, a2, a3]. Note that the polynomials in this
section behave somewhat differently than the polynomials used in the definition of finite field
elements, even though both types of polynomials use the same indeterminate, x. The
coefficients in this section are themselves finite field elements, i.e., bytes, instead of bits;
also, the multiplication of four-term polynomials uses a different reduction polynomial,
defined below. The distinction should always be clear from the context.
To illustrate the addition and multiplication operations, let

b(x) = b_3x^3 + b_2x^2 + b_1x + b_0    (Eq. 5.6)
define a second four-term polynomial. Addition is performed by adding the finite field
coefficients of like powers of x. This addition corresponds to an XOR operation between the
corresponding bytes in each of the words – in other words, the XOR of the complete word
values. Thus, using the equations of (5.5) and (5.6),

a(x) + b(x) = (a_3 ⊕ b_3)x^3 + (a_2 ⊕ b_2)x^2 + (a_1 ⊕ b_1)x + (a_0 ⊕ b_0)    (Eq. 5.7)
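Multiplication is achieved in two steps. In the first step, the polynomial product c(x) = a(x)
• b(x) is algebraically expanded, and like powers are collected to give

c(x) = c_6x^6 + c_5x^5 + c_4x^4 + c_3x^3 + c_2x^2 + c_1x + c_0    (Eq. 5.8)

where

c_0 = a_0 • b_0
c_1 = a_1 • b_0 ⊕ a_0 • b_1
c_2 = a_2 • b_0 ⊕ a_1 • b_1 ⊕ a_0 • b_2
c_3 = a_3 • b_0 ⊕ a_2 • b_1 ⊕ a_1 • b_2 ⊕ a_0 • b_3
c_4 = a_3 • b_1 ⊕ a_2 • b_2 ⊕ a_1 • b_3
c_5 = a_3 • b_2 ⊕ a_2 • b_3
c_6 = a_3 • b_3    (Eq. 5.9)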
The result, c(x), does not represent a four-byte word. Therefore, the second step of the
multiplication is to reduce c(x) modulo a polynomial of degree 4, so that the result is
a polynomial of degree less than 4. For the AES algorithm, this is accomplished with the
polynomial x^4 + 1, so that

x^i mod (x^4 + 1) = x^(i mod 4)    (Eq. 5.10)
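The modular product of a(x) and b(x), denoted by a(x) ⊗ b(x), is given by the four-term
polynomial d(x), defined as follows:

d(x) = d_3x^3 + d_2x^2 + d_1x + d_0    (Eq. 5.11)

with

d_0 = (a_0 • b_0) ⊕ (a_3 • b_1) ⊕ (a_2 • b_2) ⊕ (a_1 • b_3)
d_1 = (a_1 • b_0) ⊕ (a_0 • b_1) ⊕ (a_3 • b_2) ⊕ (a_2 • b_3)
d_2 = (a_2 • b_0) ⊕ (a_1 • b_1) ⊕ (a_0 • b_2) ⊕ (a_3 • b_3)
d_3 = (a_3 • b_0) ⊕ (a_2 • b_1) ⊕ (a_1 • b_2) ⊕ (a_0 • b_3)    (Eq. 5.12)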
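When a(x) is a fixed polynomial, the operation in Eq. 5.11 can be written in matrix form
as:

[d_0]   [a_0  a_3  a_2  a_1] [b_0]
[d_1] = [a_1  a_0  a_3  a_2] [b_1]
[d_2]   [a_2  a_1  a_0  a_3] [b_2]
[d_3]   [a_3  a_2  a_1  a_0] [b_3]    (Eq. 5.13)

Because x^4 + 1 is not an irreducible polynomial over GF(2^8), multiplication by a fixed
four-term polynomial is not necessarily invertible.
However, the AES algorithm specifies a fixed four-term polynomial that does have an
inverse:

a(x) = {03}x^3 + {01}x^2 + {01}x + {02}    (Eq. 5.14)
a^-1(x) = {0b}x^3 + {0d}x^2 + {09}x + {0e}    (Eq. 5.15)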


Another polynomial used in the AES algorithm (see the RotWord() function) has a0 = a1 =
a2 = {00} and a3 = {01}, which is the polynomial x^3. Inspection of Eq. 5.13 above
will show that its effect is to form the output word by rotating bytes in the input word. This
means that [b0, b1, b2, b3] is transformed into [b1, b2, b3, b0].
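The modular product of four-term polynomials can be sketched as below. Words are held as lists [a0, a1, a2, a3]; the function names gf_mul and word_mul and the sample input word are assumptions for illustration. The sketch checks two statements from this item: the fixed polynomial {03}x^3 + {01}x^2 + {01}x + {02} has the inverse {0b}x^3 + {0d}x^2 + {09}x + {0e}, and multiplication by x^3 rotates the bytes of a word.

```python
def gf_mul(a, b):
    """Multiply two GF(2^8) elements modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return p


def word_mul(a, b):
    """Return d(x) = a(x) (x) b(x) mod (x^4 + 1) for words [a0, a1, a2, a3]."""
    d = [0, 0, 0, 0]
    for i in range(4):
        for j in range(4):
            # x^i * x^j reduces to x^((i + j) mod 4) modulo x^4 + 1
            d[(i + j) % 4] ^= gf_mul(a[i], b[j])
    return d


a_poly = [0x02, 0x01, 0x01, 0x03]      # a(x), lowest coefficient first
a_inv = [0x0E, 0x09, 0x0D, 0x0B]       # a^-1(x), lowest coefficient first
print(word_mul(a_poly, a_inv))         # the constant polynomial {01}
print(word_mul([0x00, 0x00, 0x00, 0x01], [0x10, 0x20, 0x30, 0x40]))
```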
7) AES Algorithm-Block Overview and Specification:

For the AES algorithm, the length of the input block, the output block and the State is 128
bits. This is represented by Nb = 4, which reflects the number of 32-bit words (number of
columns) in the State. For the AES algorithm, the length of the Cipher Key, K, is 128, 192, or
256 bits. The key length is represented by Nk = 4, 6, or 8, which reflects the number of 32-bit
words (number of columns) in the Cipher Key. For the AES algorithm, the number of rounds
to be performed during the execution of the algorithm is dependent on the key size. The
number of rounds is represented by Nr, where Nr = 10 when Nk = 4, Nr = 12 when Nk = 6,
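and Nr = 14 when Nk = 8. The only Key-Block-Round combinations that conform to this
standard are given in Table 5.1. For implementation issues relating to the key length, block
size and number of rounds, see Sec. 5.3.

Table 5.1: Key-Block-Round combinations

            Key length (Nk words)   Block size (Nb words)   Number of rounds (Nr)
AES-128              4                       4                       10
AES-192              6                       4                       12
AES-256              8                       4                       14
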
The various operational blocks required and the state flow in our design of
the AES-128 algorithm are shown in Fig 5.1:

Fig 5.1: Pseudo code for the Cipher


8) SubBytes() Transformation: The SubBytes() transformation is a non-linear byte
substitution that operates independently on each byte of the State using a substitution table
(S-box). This S-box (Fig 5.2), which is invertible, is constructed by composing two
transformations:
1. Take the multiplicative inverse in the finite field GF(2^8); the element {00} is
mapped to itself.
2. Apply the following affine transformation (over GF(2)):

b'_i = b_i ⊕ b_((i+4) mod 8) ⊕ b_((i+5) mod 8) ⊕ b_((i+6) mod 8) ⊕ b_((i+7) mod 8) ⊕ c_i    (Eq. 5.17)

for 0 <= i < 8, where b_i is the ith bit of the byte, and c_i is the ith bit of a byte c with the
value {63} or {01100011}. Here and elsewhere, a prime on a variable indicates that the
variable is to be updated with the value on the right.

In matrix form, the affine transformation element of the S-box can be expressed as:

[b'_0]   [1 0 0 0 1 1 1 1] [b_0]   [1]
[b'_1]   [1 1 0 0 0 1 1 1] [b_1]   [1]
[b'_2]   [1 1 1 0 0 0 1 1] [b_2]   [0]
[b'_3] = [1 1 1 1 0 0 0 1] [b_3] + [0]
[b'_4]   [1 1 1 1 1 0 0 0] [b_4]   [0]
[b'_5]   [0 1 1 1 1 1 0 0] [b_5]   [1]
[b'_6]   [0 0 1 1 1 1 1 0] [b_6]   [1]
[b'_7]   [0 0 0 1 1 1 1 1] [b_7]   [0]

The S-box used in the SubBytes() transformation is presented in hexadecimal form in Fig 5.2.
For example, if s_1,1 = {53}, then the substitution value would be determined by the
intersection of the row with index ‘5’ and the column with index ‘3’.
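The two-step S-box construction can be sketched as follows. This is an illustration only: the brute-force search for the multiplicative inverse and the names gf_mul, gf_inv, rotl8 and sbox_entry are assumptions of this sketch, not the method used in the hardware design. The affine step is written with byte rotations, which is equivalent to Eq. 5.17.

```python
def gf_mul(a, b):
    """Multiply two GF(2^8) elements modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return p


def gf_inv(b):
    """Multiplicative inverse by exhaustive search; {00} is mapped to itself."""
    return next((c for c in range(1, 256) if gf_mul(b, c) == 1), 0)


def rotl8(b, k):
    """Rotate an 8-bit value left by k positions."""
    return ((b << k) | (b >> (8 - k))) & 0xFF


def sbox_entry(b):
    """Step 1: field inverse. Step 2: affine transformation with c = {63}."""
    inv = gf_inv(b)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


SBOX = [sbox_entry(b) for b in range(256)]
print(f"S-box[{{53}}] = {{{SBOX[0x53]:02x}}}")   # row 5, column 3 of the table
```

A table lookup in a precomputed SBOX list is what a software implementation would normally use at run time; the construction above only shows where the table values come from.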

Fig 5.2: Substitution Values for the byte xy (in hexadecimal format)
9) ShiftRows() Transformation: In the ShiftRows() transformation, the bytes in the last
three rows of the State are cyclically shifted over different numbers of bytes (offsets). The
first row, r = 0, is not shifted. Specifically, the ShiftRows() transformation proceeds as
follows:

s'_r,c = s_r,(c + shift(r, Nb)) mod Nb    for 0 < r < 4 and 0 <= c < Nb    (Eq. 5.18)

where the shift value shift(r, Nb) depends on the row number, r, as follows
(recall that Nb = 4):
shift(1,4) = 1; shift(2,4) = 2; shift(3,4) = 3.    (Eq. 5.19)
This has the effect of moving bytes to “lower” positions in the row (i.e., lower values of c in a
given row), while the “lowest” bytes wrap around into the “top” of the row (i.e., higher
values of c in a given row). Fig 5.3 illustrates the ShiftRows() transformation.

Fig 5.3: ShiftRows() cyclically shifts the last three rows in the State
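A minimal sketch of the row shift is shown below, assuming the State is stored as a list of four rows, state[r][c]; the function name shift_rows and the sample values are hypothetical.

```python
def shift_rows(state):
    """Rotate row r of the State left by shift(r, 4) = r byte positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


state = [
    [0, 1, 2, 3],      # r = 0: not shifted
    [4, 5, 6, 7],      # r = 1: shifted by 1
    [8, 9, 10, 11],    # r = 2: shifted by 2
    [12, 13, 14, 15],  # r = 3: shifted by 3
]
for row in shift_rows(state):
    print(row)
```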

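10) MixColumns() Transformation: The MixColumns() transformation operates on the
State column-by-column, treating each column as a four-term polynomial as described in
item 6. The columns are considered as polynomials over GF(2^8) and multiplied modulo
x^4 + 1 with a fixed polynomial a(x), given by

a(x) = {03}x^3 + {01}x^2 + {01}x + {02}    (Eq. 5.20)

This can be written as a matrix multiplication. Let s'(x) = a(x) ⊗ s(x):

[s'_0,c]   [02 03 01 01] [s_0,c]
[s'_1,c] = [01 02 03 01] [s_1,c]
[s'_2,c]   [01 01 02 03] [s_2,c]
[s'_3,c]   [03 01 01 02] [s_3,c]    for 0 <= c < Nb    (Eq. 5.21)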
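The column mixing can be sketched for a single column as below. The function names and the sample column [db, 13, 53, 45] are assumptions for illustration; the four output expressions are the rows of the matrix for a(x) = {03}x^3 + {01}x^2 + {01}x + {02}.

```python
def gf_mul(a, b):
    """Multiply two GF(2^8) elements modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return p


def mix_column(col):
    """Apply the MixColumns() matrix to one column [s0, s1, s2, s3]."""
    s0, s1, s2, s3 = col
    return [
        gf_mul(2, s0) ^ gf_mul(3, s1) ^ s2 ^ s3,
        s0 ^ gf_mul(2, s1) ^ gf_mul(3, s2) ^ s3,
        s0 ^ s1 ^ gf_mul(2, s2) ^ gf_mul(3, s3),
        gf_mul(3, s0) ^ s1 ^ s2 ^ gf_mul(2, s3),
    ]


print([f"{b:02x}" for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

Since only the constants {01}, {02} and {03} occur, each product needs at most one xtime() step and one XOR.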
11) AddRoundKey() Transformation: In the AddRoundKey() transformation, a Round
Key is added to the State by a simple bitwise XOR operation. Each Round Key consists of
Nb words from the key schedule. Those Nb words are each added into the columns of the
State, such that:

[s'_0,c, s'_1,c, s'_2,c, s'_3,c] = [s_0,c, s_1,c, s_2,c, s_3,c] ⊕ [w_(round*Nb+c)]    for 0 <= c < Nb    (Eq. 5.22)

where [wi] are the key schedule words described under Key Expansion (item 12), and round
is a value in the range 0 <= round <= Nr. In the Cipher, the initial Round Key addition occurs
when round = 0, prior to the first application of the round function (see Fig 5.1). The
application of the AddRoundKey() transformation to the Nr rounds of the Cipher occurs when
1 <= round <= Nr.
The action of this transformation is illustrated in Fig. 10, where l = round * Nb. The byte
address within words of the key schedule was described in.
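As a small sketch of the round key addition, assume the State is held as a list of Nb columns of four bytes each and the key schedule as a list of four-byte words; the name add_round_key and the sample data are hypothetical.

```python
def add_round_key(state_cols, w, rnd, nb=4):
    """XOR column c of the State with key schedule word w[rnd * nb + c]."""
    return [
        [s ^ k for s, k in zip(state_cols[c], w[rnd * nb + c])]
        for c in range(nb)
    ]


state = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
# Made-up key schedule: four words for round 0, four for round 1.
schedule = [[0xFF] * 4 for _ in range(4)] + [[0x0F, 0, 0, 0] for _ in range(4)]

once = add_round_key(state, schedule, 1)
twice = add_round_key(once, schedule, 1)
print(once[2], twice == state)
```

Applying the same round key twice restores the State, which is why the transformation is its own inverse.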
12) Key Expansion: The AES algorithm takes the Cipher Key, K, and performs a Key
Expansion routine to generate a key schedule. The Key Expansion generates a total of Nb (Nr
+ 1) words: the algorithm requires an initial set of Nb words, and each of the Nr rounds
requires Nb words of key data. The resulting key schedule consists of a linear array of 4-byte
words, denoted [wi ], with i in the range 0 <= i < Nb(Nr + 1). The expansion of the input key
into the key schedule proceeds according to the pseudo code. SubWord() is a function that
takes a four-byte input word and applies the S-box to each of the four bytes to produce an
output word. The function RotWord() takes a word [a0,a1,a2,a3] as input, performs a cyclic
permutation, and returns the word [a1,a2,a3,a0]. The round constant word array, Rcon[i],
contains the values given by [x^(i-1),{00},{00},{00}], with x^(i-1) being powers of x (x is
denoted as {02}) in the field GF(2^8), as discussed (note that i starts at 1, not 0). From the
pseudo code,
it can be seen that the first Nk words of the expanded key are filled with the Cipher Key.
Every following word, w[i], is equal to the XOR of the previous word, w[i-1], and the word
Nk positions earlier, w[i-Nk]. For words in positions that are a multiple of Nk, a
transformation is applied to w[i-1] prior to the XOR, followed by an XOR with a round
constant, Rcon[i]. This transformation consists of a cyclic shift of the bytes in a word
(RotWord()), followed by the application of a table lookup to all four bytes of the word
(SubWord()). It is important to note that the Key Expansion routine for 256-bit Cipher Keys
(Nk = 8) is slightly different than for 128- and 192-bit Cipher Keys. If Nk = 8 and i-4 is a
multiple of Nk, then SubWord() is applied to w[i-1] prior to the XOR.
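The expansion described above can be sketched as follows. This is a software illustration under stated assumptions, not the pseudo code of the standard: the S-box is rebuilt by brute force, the round constant is advanced once per Nk words, and the names key_expansion, sbox_entry and the sample 128-bit key are chosen for this sketch.

```python
def gf_mul(a, b):
    """Multiply two GF(2^8) elements modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return p


def sbox_entry(b):
    """Field inverse ({00} maps to itself) followed by the affine map with {63}."""
    inv = next((c for c in range(1, 256) if gf_mul(b, c) == 1), 0)
    s = inv
    for k in range(1, 5):
        s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
    return s ^ 0x63


SBOX = [sbox_entry(b) for b in range(256)]


def key_expansion(key):
    """Expand a 16-, 24- or 32-byte Cipher Key into Nb * (Nr + 1) four-byte words."""
    nk = len(key) // 4
    nb, nr = 4, nk + 6
    w = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 0x01                                   # x^0; doubled after each use
    for i in range(nk, nb * (nr + 1)):
        temp = list(w[i - 1])
        if i % nk == 0:
            temp = temp[1:] + temp[:1]            # RotWord()
            temp = [SBOX[b] for b in temp]        # SubWord()
            temp[0] ^= rcon                       # XOR with [x^(i/Nk - 1), 0, 0, 0]
            rcon = gf_mul(rcon, 0x02)
        elif nk > 6 and i % nk == 4:
            temp = [SBOX[b] for b in temp]        # extra SubWord() for 256-bit keys
        w.append([a ^ b for a, b in zip(w[i - nk], temp)])
    return w


words = key_expansion(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(len(words), bytes(words[4]).hex())
```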
13) Decryption (Inverse Cipher Generation): The Cipher transformations described above can
be inverted and then implemented in reverse order to produce a straightforward Inverse
Cipher for the AES algorithm. The individual transformations used in the Inverse Cipher –
InvShiftRows(), InvSubBytes(), InvMixColumns(), and AddRoundKey() – process the State
and are described in the following subsections. The Inverse Cipher is described in the pseudo
code in Fig. 12. In Fig. 12, the array w[] contains the key schedule, which was described
previously under Key Expansion (item 12).
14.a) InvShiftRows() Transformation: InvShiftRows() is the inverse of the ShiftRows()
transformation. The bytes in the last three rows of the State are cyclically shifted over
different numbers of bytes (offsets). The first row, r = 0, is not shifted. The bottom three rows
are cyclically shifted by Nb - shift(r, Nb) bytes, where the shift value shift(r, Nb) depends on
the row number, and is given in Eq. (5.19).
Specifically, the InvShiftRows() transformation proceeds as follows:

s'_r,(c + shift(r, Nb)) mod Nb = s_r,c    for 0 < r < 4 and 0 <= c < Nb    (Eq. 5.23)

Figure 13 illustrates the InvShiftRows() transformation.
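A minimal sketch of the inverse shift, again assuming the State is a list of four rows state[r][c] (the function names are hypothetical). Each row is rotated left by Nb - shift(r, Nb) positions, which undoes the forward rotation by shift(r, Nb).

```python
NB = 4


def shift_rows(state):
    """Forward transformation: rotate row r left by r positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def inv_shift_rows(state):
    """Inverse transformation: rotate row r left by (NB - r) mod NB positions."""
    out = []
    for r, row in enumerate(state):
        k = (NB - r) % NB
        out.append(row[k:] + row[:k])
    return out


state = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
print(inv_shift_rows(state))
print(inv_shift_rows(shift_rows(state)) == state)
```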

14.b) InvSubBytes() Transformation: InvSubBytes() is the inverse of the byte substitution
transformation, in which the inverse S-box is applied to each byte of the State. This is
obtained by applying the inverse of the affine transformation followed by taking the
multiplicative inverse in GF(2^8).
The inverse S-box used in the InvSubBytes() transformation is presented in Fig 5.4:

Fig 5.4: Inverse S-box: Substitution values
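The inverse S-box can be sketched from the two steps named above. The inverse affine map is not written out in this text; the form used here, b_i = s_((i+2) mod 8) ⊕ s_((i+5) mod 8) ⊕ s_((i+7) mod 8) ⊕ d_i with d = {05}, is stated as an assumption of this sketch, as are the function names and the brute-force inverse search.

```python
def gf_mul(a, b):
    """Multiply two GF(2^8) elements modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return p


def rotl8(b, k):
    """Rotate an 8-bit value left by k positions."""
    return ((b << k) | (b >> (8 - k))) & 0xFF


def inv_sbox_entry(s):
    """Inverse affine transformation, then multiplicative inverse in GF(2^8)."""
    b = rotl8(s, 1) ^ rotl8(s, 3) ^ rotl8(s, 6) ^ 0x05
    return next((c for c in range(1, 256) if gf_mul(b, c) == 1), 0)


INV_SBOX = [inv_sbox_entry(s) for s in range(256)]
print(f"InvS-box[{{ed}}] = {{{INV_SBOX[0xED]:02x}}}")
```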


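15) InvMixColumns() Transformation: InvMixColumns() is the inverse of the
MixColumns() transformation. InvMixColumns() operates on the State column-by-column,
treating each column as a four-term polynomial as described in item 6. The columns are
considered as polynomials over GF(2^8) and multiplied modulo x^4 + 1 with a fixed polynomial
a^-1(x), given by

a^-1(x) = {0b}x^3 + {0d}x^2 + {09}x + {0e}    (Eq. 5.24)

As described in item 6, this can be written as a matrix multiplication.

[s'_0,c]   [0e 0b 0d 09] [s_0,c]
[s'_1,c] = [09 0e 0b 0d] [s_1,c]
[s'_2,c]   [0d 09 0e 0b] [s_2,c]
[s'_3,c]   [0b 0d 09 0e] [s_3,c]    for 0 <= c < Nb    (Eq. 5.25)

As a result of this multiplication, the four bytes in a column are replaced by the following:

s'_0,c = ({0e} • s_0,c) ⊕ ({0b} • s_1,c) ⊕ ({0d} • s_2,c) ⊕ ({09} • s_3,c)
s'_1,c = ({09} • s_0,c) ⊕ ({0e} • s_1,c) ⊕ ({0b} • s_2,c) ⊕ ({0d} • s_3,c)
s'_2,c = ({0d} • s_0,c) ⊕ ({09} • s_1,c) ⊕ ({0e} • s_2,c) ⊕ ({0b} • s_3,c)
s'_3,c = ({0b} • s_0,c) ⊕ ({0d} • s_1,c) ⊕ ({09} • s_2,c) ⊕ ({0e} • s_3,c)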
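The inverse column mixing can be sketched for one column as below. The matrix is generated from its first row [0e, 0b, 0d, 09], each later row being the previous one rotated right by one position; the function names and the sample column are assumptions for illustration.

```python
def gf_mul(a, b):
    """Multiply two GF(2^8) elements modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return p


FIRST_ROW = [0x0E, 0x0B, 0x0D, 0x09]


def inv_mix_column(col):
    """Apply the InvMixColumns() matrix to one column [s0, s1, s2, s3]."""
    out = []
    for r in range(4):
        acc = 0
        for c in range(4):
            # entry (r, c) of the circulant matrix is FIRST_ROW[(c - r) mod 4]
            acc ^= gf_mul(FIRST_ROW[(c - r) % 4], col[c])
        out.append(acc)
    return out


print([f"{b:02x}" for b in inv_mix_column([0x8E, 0x4D, 0xA1, 0xBC])])
```

The constants {09}, {0b}, {0d} and {0e} need more xtime() steps than the forward constants, so this step typically costs more than MixColumns().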

16) Inverse of the AddRoundKey() Transformation: AddRoundKey(), which was
described in item 11, is its own inverse, since it only involves an application of the
XOR operation.
17) Equivalent Inverse Cipher (the adopted method to improve speed of operation):
In the straightforward Inverse Cipher presented in item 13 and Fig. 12, the sequence of
the transformations differs from that of the Cipher, while the form of the key schedules for
encryption and decryption remains the same. However, several properties of the AES
algorithm allow for an Equivalent Inverse Cipher that has the same sequence of
transformations as the Cipher (with the transformations replaced by their inverses). This is
accomplished with a change in the key schedule.
The two properties that allow for this Equivalent Inverse Cipher are as follows:
1. The SubBytes() and ShiftRows() transformations commute; that is, a SubBytes()
transformation immediately followed by a ShiftRows() transformation is equivalent
to a ShiftRows() transformation immediately followed by a SubBytes()
transformation. The same is true for their inverses, InvSubBytes() and InvShiftRows().
2. The column mixing operations – MixColumns() and InvMixColumns() – are linear
with respect to the column input, which means
InvMixColumns(state XOR Round Key) = InvMixColumns(state) XOR
InvMixColumns(Round Key).
These properties allow the order of InvSubBytes() and InvShiftRows() transformations to
be reversed. The order of the AddRoundKey() and InvMixColumns() transformations can
also be reversed, provided that the columns (words) of the decryption key schedule are
modified using the InvMixColumns() transformation.
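The linearity property that justifies this reordering can be checked numerically on single columns. The sketch below uses hypothetical helper names and arbitrary sample columns; it is a spot check on a few inputs, not a proof.

```python
def gf_mul(a, b):
    """Multiply two GF(2^8) elements modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return p


def inv_mix_column(col):
    """InvMixColumns() on one column, using the circulant matrix row [0e 0b 0d 09]."""
    first_row = [0x0E, 0x0B, 0x0D, 0x09]
    out = []
    for r in range(4):
        acc = 0
        for c in range(4):
            acc ^= gf_mul(first_row[(c - r) % 4], col[c])
        out.append(acc)
    return out


def is_linear(state_col, key_col):
    """Check InvMixColumns(s XOR k) == InvMixColumns(s) XOR InvMixColumns(k)."""
    lhs = inv_mix_column([s ^ k for s, k in zip(state_col, key_col)])
    rhs = [a ^ b for a, b in zip(inv_mix_column(state_col), inv_mix_column(key_col))]
    return lhs == rhs


print(is_linear([0xDB, 0x13, 0x53, 0x45], [0x2B, 0x7E, 0x15, 0x16]))
```

This is why the decryption key schedule words can be passed through InvMixColumns() once, in advance, instead of transforming the State and key together in every round.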
5.3. Implementation Issues
• Key Length Requirements: An implementation of the AES algorithm shall support at
least one of the three key lengths specified in Sec. 5: 128, 192, or 256 bits (i.e., Nk = 4, 6,
or 8, respectively). Implementations may optionally support two or three key lengths,
which may promote the interoperability of algorithm implementations.
• Keying Restrictions: No weak or semi-weak keys have been identified for the AES
algorithm, and there is no restriction on key selection.
• Parameterization of Key Length, Block Size, and Round Number: This standard
explicitly defines the allowed values for the key length (Nk), block size (Nb), and number
of rounds (Nr) – see Table 5.1. However, future reaffirmations of this standard could include
changes or additions to the allowed values for those parameters. Therefore, implementers
may choose to design their AES implementations with future flexibility in mind.
• Implementation Suggestions Regarding Various Platforms: Implementation variations
are possible that may, in many cases, offer performance or other advantages. However,
given the same input key and data (plaintext or ciphertext), any implementation that
produces the same output (ciphertext or plaintext) as the algorithm specified in this
standard is an acceptable implementation of the AES. Thus, in this project, instead of the
usual Inverse Cipher, we have adopted the Equivalent Inverse Cipher because it allows
reuse in a gate-level implementation. We also propose to merge, at least to some extent,
the SubBytes() and ShiftRows() transformations at encryption, as well as InvSubBytes()
and InvShiftRows() at decryption. This eliminates the two steps otherwise required, first
to convert the byte operation into a word operation and next to apply the shift operation;
a single operation, BtoWconvandShift(), is used instead. Also, in the MixColumns() and
InvMixColumns() transformations, designing the algorithm using a component
reuse/calling technique avoids the complex and tedious mathematical operations that
would otherwise be required.
5.4 Advantages and Limitations of AES algorithm
In most ciphers, the iterated transform (or round) has a Feistel structure. Typically
in this structure, some of the bits of the intermediate state are transposed unchanged to
another position (permutation). The major advantage of the AES algorithm is that it does not
have a Feistel structure but is composed of three distinct invertible transforms based on the
Wide Trail Strategy design method. The Wide Trail Strategy design method provides
resistance against linear and differential cryptanalysis. In the Wide Trail Strategy, every layer
has its own function:
• The linear mixing layer: guarantees high diffusion over multiple rounds.
• The non-linear layer: parallel application of S-boxes that have the optimum
worst-case non-linearity properties.
• The key addition layer: a simple XOR of the round key to the intermediate state.
5.4.1. Advantageous Features
• Key lengths of 128, 192, and 256 bits are supported. Each step in key size requires
only two additional rounds. Decryption is simply the inverse of the cipher.
• Effective and easy implementation in both software and hardware.
• Easier design, fewer additional instructions, and an efficient utilization rate, because
similar algorithms are used for both encryption and decryption, with only a small
timing overhead.
• There are no weak or semi-weak keys in the Advanced Encryption Algorithm.
• By using a true low-level bit-serial approach, a minimum cost AES co-processor
architecture can be achieved. This architecture can be used in many military,
industrial, and commercial applications that require compactness and low cost.
• For a given key length, it offers much higher security than asymmetric key
cryptographic methods such as RSA and Elliptic Curve Cryptography (see Fig.5.6).
• It is more resistant to theoretical attacks such as linear and differential cryptanalysis
and weak keys, and it can also be defended against attacks on implementations such as
timing and power attacks.
• It occupies little space owing to its inherent modularity, regularity and
availability, which also offer good potential for instruction-level parallelism.

Fig.5.5. Features of AES candidate algorithm

Fig.5.6. Comparison of key sizes in conventional and public key cryptography
5.4.2. Cryptanalytic progress against AES: No effective break of the AES algorithm is
known so far, because finding a secret key is computationally infeasible. This may be largely
attributed to the following considerations:
• The complexity of the sequence of operations, and of the operations themselves,
performed in the algorithm over a large number of iterations, and
• The size of the key space.
If the key length is n bits, the number of operations required to find a secret key by
exhaustive search is O(2^n). One can hardly imagine an exhaustive search that would find
the secret key in the 128-, 192-, or 256-bit key spaces. For a 128-bit key space, the effort
required would be 2^128, which is about 3.4 x 10^38 operations. Even with approximately a
trillion chips operating at 1000 GHz, and assuming each tests one key per clock cycle, it
would take at least a million years to exhaustively search a 128-bit key space, to say nothing
of the 192-bit or 256-bit key spaces. Table 5.2 below gives a rough estimate of the effort
needed to find a secret key of the AES algorithm.
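The order of magnitude of this estimate can be reproduced with a short calculation. The sketch assumes one key test per clock cycle per chip, a trillion chips, and a 1000 GHz clock, as in the paragraph above; the function name brute_force_years is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def brute_force_years(key_bits, chips=10**12, keys_per_second_per_chip=10**12):
    """Years needed to try every key, assuming one key test per cycle per chip."""
    total_keys = 2 ** key_bits
    rate = chips * keys_per_second_per_chip      # keys tested per second overall
    return total_keys / rate / SECONDS_PER_YEAR


for bits in (128, 192, 256):
    print(f"{bits}-bit key: about {brute_force_years(bits):.2e} years")
```

Under these assumptions the full 128-bit search takes on the order of ten million years, which is consistent with the lower bound of a million years quoted above.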

Table 5.2: Analysis of the effort needed to break the AES algorithm

The storage required to hold such a huge number of encryption and decryption
operations (to construct the two tables that assist in searching for the required secret key)
for the key space would be analogously large. Thus, only if one is ready to afford
these enormous costs and the correspondingly large electricity bills, under the above
conditions and continuously for at least a million years, might the secret key be broken.
5.4.3. Limitations and the possible attacks: The main limitation of the Advanced
Encryption Algorithm, a major development in symmetric key algorithms, is the same
as the major drawback of conventional cryptography: the distribution of the secret key
between the two communicating parties without third-party intervention is the major weak
link. No matter how strong a cryptosystem is, if an intruder can steal the key while it is
communicated through a weak channel, the whole system is rendered useless. So, it has to
take advantage of public key algorithms, at least for the purpose of safe key distribution
through the channel.
Another major drawback is that AES is quite susceptible to the new type of attack on
cache behavior if implemented on a microprocessor or DSP-based processor. If the attacker
can access the machine where AES runs, the secret key can be retrieved in a fraction of a
second. This type of attack can perhaps be minimized in our present implementation
approach through hardware devices such as FPGAs, CPLDs, and ASICs, which act as virtual
processors and reduce the burden on the actual processors.
5.5. The Security of AES and the future trend
Some cryptographers still have concerns about the security of AES. A common attack on
block ciphers is to attack the algorithm with a reduced number of rounds. At the time of this
writing, attacks on AES exist for seven rounds with 128-bit keys, eight rounds with 192-bit
keys, and nine rounds with 256-bit keys. Recall that the full implementation of AES uses 10,
12, and 14 rounds with 128-, 192-, and 256-bit keys, respectively. There is concern that there
is not enough distance between the attack for a seven-round encryption and the actual ten-
round implementation and that there is a risk these attacks could be improved to break the
cipher. Another worry results from the mathematical structure of AES. In contrast to most
ciphers, AES has a concise and elegant algebraic structure. There is concern among some
cryptographers that an attack based on new insights into this formulation could be successful.
AES appears to be secure as of late 2006, when our project work was carried out.
The largest well-known brute force attack occurred in 2002 against a 64-bit RC5 key.
With a key size of at least 128 bits, AES is well out of reach of brute force attacks by normal
adversaries for years if not decades. AES is efficient, elegant, and secure. It will be a top
choice for data security in the next decade and beyond. With the security of
AES assured for many years, there can be tremendous growth in the development of low
cost and highly pipelined processors with minimum size requirements that would suit
even common consumer applications and smaller organizational
needs, ranging from smart cards to internal database locking and regulated distribution
within organizations. Advanced implementations combining AES and DES, such as the
Advanced Parallel Encryption Standard (APES), the Advanced Data Encryption Standard
(ADES), etc., would be a practical possibility for the security demands and network
applications of secret key cryptography.
5.6. Applications
Vendors of both hardware and software have enthusiastically adopted AES. Because AES
uses a simple and efficient algorithm, using it as an encryption specification decreases system
complexity, lowers costs, and promotes interoperability. There are many areas where AES is
now in commercial use.
1. Most high-end VPN software contains implementations of AES, including offerings
from Checkpoint, Cisco, and Symantec.
2. AES is now commonly found in Network Appliances.
3. Voice Over IP vendors are using AES for telephone security.
4. Vendors now use AES to provide security for process control (SCADA) systems.
5. AES has even been added to common file compression programs, such as WinZip.
6. Dozens of hardware implementations are available that use both FPGAs and ASICs.
7. There are multiple implementations in software in the public domain, such as
Smart-Card security systems.
Thus, it seems that there is no security system or sensitive data transfer that cannot use
and rely upon the efficient and highly structured Advanced Encryption Standard
algorithm.
5.7. Conclusion
This chapter deals with the mathematical preliminaries and an overview of the project.

CHAPTER-6
OUTPUT VERIFICATION
6.1. Introduction
The functional verification was carried out for all the test cases, and the Xilinx
Platform Studio (XPS) design was then taken through the synthesis process using the Xilinx tool.
6.2. Synthesis Process
The synthesis process is carried out by giving the XPS model as the input to the tool.
This XPS model requires a Spartan 3 board for the implementation. Hence the Spartan
board is selected, the whole process flow is carried out in the Xilinx tool, and finally
the output file is generated, which is used for downloading onto the board.
6.3. Xilinx Platform Studio Outputs
ENCRYPTION:

Fig 6.1 : HyperTerminal Encryption Output

DECRYPTION:

Fig 6.2 : HyperTerminal Decryption Output

Fig 6.3 : XPS Synthesis report

CHAPTER – 7
CONCLUSION AND FUTURE SCOPE
7.1 Introduction
The main aim of the project is to secure data through encryption and decryption.
These algorithms can be used in many applications, which are listed below.
7.2 Applications
1. This standard may be used by Federal departments and agencies when an agency
determines that sensitive (unclassified) information (as defined in P.L. 100-235)
requires cryptographic protection.
2. Security purposes.
3. Medical field.
4. Network security.
5. Online bank security.
6. Secure video teleconferencing.
7. Routers and remote access servers.
8. High-speed ATM/Ethernet/Fibre Channel switches.
9. In addition, this standard may be adopted and used by non-Federal Government
organizations. Such use is encouraged when it provides the desired security for
commercial and private organizations.
7.3. Advantages
1. AES encrypts input blocks of 128 bits, twice the 64-bit block size of DES and
Triple DES.
2. AES supports secret key lengths of 128 bits, 192 bits and 256 bits, whereas DES has a
fixed 64-bit key (56 effective bits) and Triple DES uses two or three such DES keys.
3. The cipher key is expanded into a larger key, which is later used for the actual
operation.
4. The expanded key shall always be derived from the cipher key and never be specified
directly.
5. AES is much harder to attack or crack than DES.
6. AES is faster than Triple DES.
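The relationship between key length, number of rounds and expanded-key size referred to in the list above can be sketched in C. This is a minimal illustration for standard AES with its 128-bit block, following FIPS-197; it is not the code used in this project (the appendix code also accepts the larger Rijndael block sizes, which this sketch does not cover), and the function names aes_rounds and aes_expanded_key_bytes are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: number of rounds Nr for a standard AES key size.
 * FIPS-197 defines Nr = Nk + 6, where Nk is the key length in 32-bit words.
 * Returns -1 for unsupported key sizes. */
int aes_rounds(int key_bits)
{
    switch (key_bits) {
    case 128: return 10;   /* Nk = 4 */
    case 192: return 12;   /* Nk = 6 */
    case 256: return 14;   /* Nk = 8 */
    default:  return -1;
    }
}

/* Hypothetical helper: size in bytes of the expanded key for the 128-bit
 * AES block. One 16-byte round key is needed for the initial key addition
 * plus one for each round, i.e. 16 * (Nr + 1). Returns 0 on error. */
size_t aes_expanded_key_bytes(int key_bits)
{
    int nr = aes_rounds(key_bits);
    return (nr < 0) ? 0 : (size_t)16 * (size_t)(nr + 1);
}
```

For example, a 128-bit cipher key is expanded to 176 bytes of round keys, and a 256-bit cipher key to 240 bytes.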

7.4 Conclusion
The project work aims at implementing secure data communication between any two users,
based on the realization of the symmetric-key cryptographic algorithm called Advanced
Encryption Standard (AES) on an FPGA-based processor. We started by selecting the highly
structured and secure AES algorithm and made suitable modifications to it to improve the
speed and the parallelism of instruction execution. The design was written in System C,
simulated and debugged through HyperTerminal on the Spartan 3 EDK kit, and then
synthesized in Xilinx Platform Studio with speed as the optimization goal, aimed at reducing
unrelated logic and improving the maximum clock rate. It was targeted at the low-cost,
high-speed FPGA chip SPARTAN-III-EDK, using the low-cost, GUI-based configuration tool
from SANDS, FPGA/CPLD Development Platform Software v 1.1. In this way we achieved a
hardware implementation of AES whose performance and cost-effectiveness suit the security
demands of a wide variety of users and applications.
In future, there is definite hope for wide use of improved versions of AES processors such as
APES and ADES, which may offer much greater security through increased key length and
bit length, and very high speeds even for bulk encryption/decryption, achieved by employing
sophisticated parallel execution schemes.
7.5. Future scenario and suggestions
The future scope in this domain and the implementation suggestions can be stated as
follows:
1. New Algorithms and Improved Speed: The existing algorithms AES and DES could be
modified into the Advanced Parallel Encryption Standard (APES) and the Advanced Data
Encryption Standard (ADES) respectively by increasing the key length and bit length.
This would make predicting the data much more difficult, even compared with the AES
algorithm. The speed of bulk encryption/decryption could also be improved because of
the parallel schemes employed.
2. Improvement in security: The probability of cracking the key becomes much smaller and
hence the transmitted data will be more secure. Further improvement in security may be
possible by eliminating not only precise timing attacks but also all other side-channel
attacks.

3. Improvements in FPGA and EDA tools: Modified algorithms would increasingly demand
implementation in FPGAs rather than in the DSP domain, owing to further possible gains in
processing speed, lower power consumption and reduced VLSI size, and to the evolution of
powerful EDA tools for implementation.
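The effect of key length on the chance of guessing a key, mentioned in items 1 and 2, can be made concrete with a small calculation. The sketch below is only an illustration of brute-force key-space size (a k-bit key has 2^k possible values, so a single random guess succeeds with probability 2^-k); it says nothing about other kinds of attack, and the function name keyspace_log10 is hypothetical.

```c
/* log10(2), used to convert a number of key bits into decimal digits */
#define LOG10_OF_2 0.30102999566398120

/* Hypothetical helper: base-10 logarithm of the number of possible keys
 * for a key of key_bits bits, i.e. log10(2^key_bits). */
double keyspace_log10(int key_bits)
{
    return key_bits * LOG10_OF_2;
}
```

Under this measure a 56-bit key has roughly 10^17 possible values, a 128-bit key roughly 10^38.5 and a 256-bit key roughly 10^77, so each additional key bit doubles the work of an exhaustive search.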

CHAPTER-8
REFERENCES

[1] S. Sau, C. Pal and A. Chakrabarti, "Design and Implementation of Real Time Secured RS232 Link for Multiple FPGA Communication", Proc. of International Conference on Communication, Computing & Security, 2011, ISBN 978-1-4503-0464-1.

[2] C. D. Walter, "Montgomery's Multiplication Technique: How to Make It Smaller and Faster", Cryptographic Hardware and Embedded Systems, Lecture Notes in Computer Science, No. 1717, Springer, August 1999, pp. 80-93.

[3] A. Mazzeo, L. Romano, G. P. Saggese and N. Mazzocca, "FPGA-Based Implementation of a Serial RSA Processor", Proceedings of the Conference on Design, Automation and Test in Europe, Volume 1, 2003, ISBN 0-7695-1870-2.

[4] xilkernel_v3.00.pdf, available at www.xilinx.com.

[5] R. L. Rivest et al., "A Method for Obtaining Digital Signatures and Public-Key Cryptosystems", Communications of the ACM, Vol. 21, 1978, pp. 120-126.

[6] Behrouz A. Forouzan, "Cryptography & Network Security".

[7] D. J. Guan, "Montgomery Algorithm for Modular Multiplication", August 25, 2003.

[8] John Fry and Martin Langhammer, "RSA & Public Key Cryptography in FPGAs", Altera Corporation - Europe.

[9] A. Tenca and C. Koc, "A Scalable Architecture for Montgomery Multiplication", Cryptographic Hardware and Embedded Systems, Lecture Notes in Computer Science, No. 1717, 1999, pp. 94-108.

[10] A. Tenca, G. Todorov and C. Koc, "High-radix design of a scalable modular multiplier", Cryptographic Hardware and Embedded Systems, Lecture Notes in Computer Science, No. 2162, Springer, May 2001, pp. 185-201.

[11] Cetin Kaya Koc, "High-Speed RSA Implementation", Version 2.0, November 1994, ftp://ftp.rsa.com/pub/pdfs/tr201.pdf.

[12] http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf.

[13] http://www.design-reuse.com/articles/13981/fpga-implementation-ofaes-encryption-and-decryption.html.

[14] B. Schneier, "Applied Cryptography: Protocols, Algorithms, and Source Code in C", 2nd Edition, John Wiley and Sons Inc., New York, U.S.A., 1996.

[15] G. B. Arfken, D. F. Griffing, D. C. Kelly and J. Priest, "University Physics", Harcourt Brace Jovanovich Publishers, San Diego, CA, 1989.

[16] http://www.techmaish.com/maximum-internet-speed-available-in-theworld/.

[17] D. E. Knuth, "The Art of Computer Programming, Volume 2: Seminumerical Algorithms", Second Edition, Addison-Wesley, Reading, MA, 1981.

[18] Qing Li and Caroline Yao, "Real-Time Concepts for Embedded Systems".

[19] Tran Nguyen Bao Anh and Su-Lim Tan, "Survey and performance evaluation of real-time operating systems (RTOS) for small microcontrollers", Renesas Technology Singapore, Singapore Engineering Centre, Singapore 098632, and School of Computer Engineering, Nanyang Technological University, Singapore 639708.

[20] Awais M. Kamboh, Adithya H. Krishnamurthy and Jaya Krishna K. Vallabhaneni, "Demonstration of Multitasking using ThreadX RTOS on Microblaze and PowerPC".

[21] "Operating system for Xilinx embedded processor", available at http://www.em.avnet.com.

[22] Sarat Yoowattana, Chinnapat Nantajiwakornchai and Manas Sangworasil, "A Design of Embedded DMX512 Controller using FPGA and XILKernel", 2009 IEEE Symposium on Industrial Electronics and Applications (ISIEA 2009), October 4-6, 2009, Kuala Lumpur, Malaysia. http://www.xilinx.com

[23] M. Ibrahimy, M. B. Reaz, K. Asaduzzaman and S. Hussain, "FPGA Implementation of RSA Encryption Engine with Flexible Key Size", International Journal of Communications, Issue 3, Volume 1, 2007.

CHAPTER – 9
BIBLIOGRAPHY
9.1. Book References:
1. Computer Networks, Third Edition, By Andrew S. Tanenbaum;
2. Cryptography and Network Security- Principles and Practices, Third Edition, By
William Stallings;
3. Digital Design- Principles and Practices, Third Edition Updated, By John F. Wakerly;
4. A VHDL Primer, Third Edition, By J. Bhasker;
5. Basic VLSI Design, Third Edition, By Pucknell & Eshraghian;
9.2. Web References:
1. AES FIPS-197 page available via http://www.nist.gov/CryptoToolkit;
2. Computer Security Objects Register (CSOR): http://csrc.nist.gov/csor/;
3. J. Daemen and V. Rijmen, AES Proposal: Rijndael, AES Algorithm Submission,
September 3, 1999, available at [1].
4. Xilinx tools and datasheets on FPGAs from www.xilinx.com;
5. www.giac.org;
6. www.tldp.org;
7. e-books on VHDL from www.aldec.com/downloads;
8. www.latticesemiconductor.com;

APPENDIX – A
SYSTEM C CODE
#include <stdio.h>
#include <string.h>
#define MAXBC (256/32)
#define MAXKC (256/32)
#define MAXROUNDS 14
#define SC ((BC - 4) >> 1)
typedef unsigned int uint ;
typedef unsigned char word8;
typedef unsigned short word16;
typedef unsigned long word32;

void main_aes(uint first_key[], uint datain[], int key_bits, int block_bits,
              int enc_dec, uint dataout[]) ;

int rijndaelKeySched (word8 k[4][MAXKC], int keyBits, int blockBits,
word8 rk[MAXROUNDS+1][4][MAXBC]);
int rijndaelEncrypt (word8 a[4][MAXBC], int keyBits, int blockBits,
word8 rk[MAXROUNDS+1][4][MAXBC]);
int rijndaelDecrypt (word8 a[4][MAXBC], int keyBits, int blockBits,
word8 rk[MAXROUNDS+1][4][MAXBC]);
void print_result(uint temp[], int len) ;

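/* Row shift offsets, indexed as shifts[SC][row][d]:
   SC selects the block size (0: 128 bits, 1: 192 bits, 2: 256 bits),
   d = 0 is used for encryption and d = 1 for decryption. */
static word8 shifts[3][4][2] = {
    { {0, 0}, {1, 3}, {2, 2}, {3, 1} },
    { {0, 0}, {1, 5}, {2, 4}, {3, 3} },
    { {0, 0}, {1, 7}, {3, 5}, {4, 4} }
};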

word8 Logtable[256] = {
0, 0, 25, 1, 50, 2, 26, 198, 75, 199, 27, 104, 51, 238, 223, 3,
100, 4, 224, 14, 52, 141, 129, 239, 76, 113, 8, 200, 248, 105, 28, 193,
125, 194, 29, 181, 249, 185, 39, 106, 77, 228, 166, 114, 154, 201, 9, 120,
101, 47, 138, 5, 33, 15, 225, 36, 18, 240, 130, 69, 53, 147, 218, 142,
150, 143, 219, 189, 54, 208, 206, 148, 19, 92, 210, 241, 64, 70, 131, 56,
102, 221, 253, 48, 191, 6, 139, 98, 179, 37, 226, 152, 34, 136, 145, 16,
126, 110, 72, 195, 163, 182, 30, 66, 58, 107, 40, 84, 250, 133, 61, 186,
43, 121, 10, 21, 155, 159, 94, 202, 78, 212, 172, 229, 243, 115, 167, 87,
175, 88, 168, 80, 244, 234, 214, 116, 79, 174, 233, 213, 231, 230, 173, 232,
44, 215, 117, 122, 235, 22, 11, 245, 89, 203, 95, 176, 156, 169, 81, 160,
127, 12, 246, 111, 23, 196, 73, 236, 216, 67, 31, 45, 164, 118, 123, 183,
204, 187, 62, 90, 251, 96, 177, 134, 59, 82, 161, 108, 170, 85, 41, 157,
151, 178, 135, 144, 97, 190, 220, 252, 188, 149, 207, 205, 55, 63, 91, 209,
83, 57, 132, 60, 65, 162, 109, 71, 20, 42, 158, 93, 86, 242, 211, 171,
68, 17, 146, 217, 35, 32, 46, 137, 180, 124, 184, 38, 119, 153, 227, 165,
103, 74, 237, 222, 197, 49, 254, 24, 13, 99, 140, 128, 192, 247, 112, 7,
};

word8 Alogtable[256] = {
1, 3, 5, 15, 17, 51, 85, 255, 26, 46, 114, 150, 161, 248, 19, 53,
95, 225, 56, 72, 216, 115, 149, 164, 247, 2, 6, 10, 30, 34, 102, 170,
229, 52, 92, 228, 55, 89, 235, 38, 106, 190, 217, 112, 144, 171, 230, 49,
83, 245, 4, 12, 20, 60, 68, 204, 79, 209, 104, 184, 211, 110, 178, 205,
76, 212, 103, 169, 224, 59, 77, 215, 98, 166, 241, 8, 24, 40, 120, 136,
131, 158, 185, 208, 107, 189, 220, 127, 129, 152, 179, 206, 73, 219, 118, 154,
181, 196, 87, 249, 16, 48, 80, 240, 11, 29, 39, 105, 187, 214, 97, 163,
254, 25, 43, 125, 135, 146, 173, 236, 47, 113, 147, 174, 233, 32, 96, 160,
251, 22, 58, 78, 210, 109, 183, 194, 93, 231, 50, 86, 250, 21, 63, 65,
195, 94, 226, 61, 71, 201, 64, 192, 91, 237, 44, 116, 156, 191, 218, 117,
159, 186, 213, 100, 172, 239, 42, 126, 130, 157, 188, 223, 122, 142, 137, 128,
155, 182, 193, 88, 232, 35, 101, 175, 234, 37, 111, 177, 200, 67, 197, 84,
252, 31, 33, 99, 165, 244, 7, 9, 27, 45, 119, 153, 176, 203, 70, 202,
69, 207, 74, 222, 121, 139, 134, 145, 168, 227, 62, 66, 198, 81, 243, 14,
18, 54, 90, 238, 41, 123, 141, 140, 143, 138, 133, 148, 167, 242, 13, 23,
57, 75, 221, 124, 132, 151, 162, 253, 28, 36, 108, 180, 199, 82, 246, 1,
};

word8 S[256] = {
99, 124, 119, 123, 242, 107, 111, 197, 48, 1, 103, 43, 254, 215, 171, 118,
202, 130, 201, 125, 250, 89, 71, 240, 173, 212, 162, 175, 156, 164, 114, 192,
183, 253, 147, 38, 54, 63, 247, 204, 52, 165, 229, 241, 113, 216, 49, 21,
4, 199, 35, 195, 24, 150, 5, 154, 7, 18, 128, 226, 235, 39, 178, 117,
9, 131, 44, 26, 27, 110, 90, 160, 82, 59, 214, 179, 41, 227, 47, 132,
83, 209, 0, 237, 32, 252, 177, 91, 106, 203, 190, 57, 74, 76, 88, 207,
208, 239, 170, 251, 67, 77, 51, 133, 69, 249, 2, 127, 80, 60, 159, 168,
81, 163, 64, 143, 146, 157, 56, 245, 188, 182, 218, 33, 16, 255, 243, 210,
205, 12, 19, 236, 95, 151, 68, 23, 196, 167, 126, 61, 100, 93, 25, 115,
96, 129, 79, 220, 34, 42, 144, 136, 70, 238, 184, 20, 222, 94, 11, 219,
224, 50, 58, 10, 73, 6, 36, 92, 194, 211, 172, 98, 145, 149, 228, 121,
231, 200, 55, 109, 141, 213, 78, 169, 108, 86, 244, 234, 101, 122, 174, 8,
186, 120, 37, 46, 28, 166, 180, 198, 232, 221, 116, 31, 75, 189, 139, 138,
112, 62, 181, 102, 72, 3, 246, 14, 97, 53, 87, 185, 134, 193, 29, 158,
225, 248, 152, 17, 105, 217, 142, 148, 155, 30, 135, 233, 206, 85, 40, 223,
140, 161, 137, 13, 191, 230, 66, 104, 65, 153, 45, 15, 176, 84, 187, 22,
};

word8 Si[256] = {
82, 9, 106, 213, 48, 54, 165, 56, 191, 64, 163, 158, 129, 243, 215, 251,
124, 227, 57, 130, 155, 47, 255, 135, 52, 142, 67, 68, 196, 222, 233, 203,
84, 123, 148, 50, 166, 194, 35, 61, 238, 76, 149, 11, 66, 250, 195, 78,
8, 46, 161, 102, 40, 217, 36, 178, 118, 91, 162, 73, 109, 139, 209, 37,
114, 248, 246, 100, 134, 104, 152, 22, 212, 164, 92, 204, 93, 101, 182, 146,
108, 112, 72, 80, 253, 237, 185, 218, 94, 21, 70, 87, 167, 141, 157, 132,
144, 216, 171, 0, 140, 188, 211, 10, 247, 228, 88, 5, 184, 179, 69, 6,
208, 44, 30, 143, 202, 63, 15, 2, 193, 175, 189, 3, 1, 19, 138, 107,
58, 145, 17, 65, 79, 103, 220, 234, 151, 242, 207, 206, 240, 180, 230, 115,
150, 172, 116, 34, 231, 173, 53, 133, 226, 249, 55, 232, 28, 117, 223, 110,
71, 241, 26, 113, 29, 41, 197, 137, 111, 183, 98, 14, 170, 24, 190, 27,
252, 86, 62, 75, 198, 210, 121, 32, 154, 219, 192, 254, 120, 205, 90, 244,
31, 221, 168, 51, 136, 7, 199, 49, 177, 18, 16, 89, 39, 128, 236, 95,
96, 81, 127, 169, 25, 181, 74, 13, 45, 229, 122, 159, 147, 201, 156, 239,
160, 224, 59, 77, 174, 42, 245, 176, 200, 235, 187, 60, 131, 83, 153, 97,
23, 43, 4, 126, 186, 119, 214, 38, 225, 105, 20, 99, 85, 33, 12, 125,
};

word32 rcon[30] = {
0x01,0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36, 0x6c, 0xd8, 0xab,
0x4d, 0x9a, 0x2f, 0x5e, 0xbc, 0x63, 0xc6, 0x97, 0x35, 0x6a, 0xd4, 0xb3, 0x7d,
0xfa, 0xef, 0xc5, 0x91, };

uint initial_key[] = {0xd5d0d92a,0xd3a90372,0x9089018b,0x9fca4c3b,0x53198a16,0x561ce01f} ;
uint initial_data[] = {0x12121212,0x22334455,0x00000000,0x00000000,
                       0x00000000,0x00000000,0x00000000,0x00000000} ;
uint last_data[] = {0x00000000,0x00000000,0x00000000,0x00000000,
                    0x00000000,0x00000000,0x00000000,0x00000000} ;

int main()
{
    int data_num=256 ;
    int key_num=192 ;

    xil_printf("\n**** Key length is : %d\n",key_num) ;
    xil_printf("\n**** Data length is : %d\n",data_num) ;

    xil_printf("This is Encryption") ;
    main_aes(initial_key,initial_data,key_num,data_num,1,last_data) ;

    xil_printf("\nThis is Decryption") ;
    main_aes(initial_key,last_data,key_num,data_num,0,initial_data) ;

    return 0 ;
}

//////////////////////////////////////////////////////////////////////////
void main_aes(uint first_key[], uint datain[], int key_bits, int block_bits,
              int enc_dec, uint dataout[])
{
    int i,j ;
    uint temp_byte ;

    uint temp_data[MAXBC] ;   /* sized for blocks of up to 256 bits */
    uint temp_key[MAXKC] ;    /* sized for keys of up to 256 bits */

    word8 data[4][MAXBC] = {{0}} ;
    word8 initial_key[4][MAXKC] = {{0}} ;   /* local copy, hides the global initial_key */
    word8 keys[MAXROUNDS+1][4][MAXBC] ;

    xil_printf("\nFirst_key is : \n") ;
    print_result(first_key,key_bits/32) ;
    xil_printf("\nDatain is : \n") ;
    print_result(datain,block_bits/32) ;

    for(i=0 ; i < (key_bits/32) ; i++)
        temp_key[i]=first_key[i] ;

    /* Unpack each 32-bit key word into one column, most significant byte in row 0 */
    for(i=0 ; i < (key_bits/32) ; i++)
        for (j=0 ; j < 4 ; j++)
        {
            temp_byte = temp_key[i] ;
            temp_byte = temp_byte << (j*8) ;
            initial_key[j][i] = ((temp_byte & 0xff000000) >> 24 ) ;
        }

    for(i=0 ; i < (block_bits/32) ; i++)
        temp_data[i]=datain[i] ;

    /* Unpack each 32-bit data word into one column of the state in the same way */
    for(i=0 ; i < (block_bits/32) ; i++)
        for (j=0 ; j < 4 ; j++)
        {
            temp_byte = temp_data[i] ;
            temp_byte = temp_byte << (j*8);
            data[j][i] = ((temp_byte & 0xff000000) >> 24 );
        }

    /* xil_printf("key\n") ;
    for(i=0 ; i < 4 ; i++)
    {
        for ( j=0 ; j < (key_bits/32) ; j++)
            xil_printf(" %x ",initial_key[i][j]) ;
        xil_printf("\n");
    }*/

    xil_printf("Data is : \n") ;
    for(i=0 ; i < 4 ; i++)
    {
        for ( j=0 ; j < (block_bits/32) ; j++)
            xil_printf(" %x ",data[i][j]) ;
        xil_printf("\n");
    }

    rijndaelKeySched ( initial_key , key_bits , block_bits , keys ) ;

    if ( enc_dec == 1 )
        rijndaelEncrypt ( data , key_bits , block_bits , keys ) ;
    else
        rijndaelDecrypt ( data , key_bits , block_bits , keys ) ;

    xil_printf("Data after encry_decry is \n") ;
    for(i=0 ; i < 4 ; i++)
    {
        for ( j=0 ; j < (block_bits/32) ; j++)
            xil_printf(" %x ",data[i][j]) ;
        xil_printf("\n");
    }

    /* Pack each state column back into one 32-bit word, row 0 as the most significant byte */
    for ( i=0 ; i< (block_bits/32) ; i++ )
    {
        temp_data[i] = 0 ;
        for (j=0 ; j < 4 ; j++)
        {
            temp_byte = data[j][i] ;
            temp_byte = temp_byte << (24-j*8) ;
            temp_data[i] = temp_data[i] | temp_byte ;
        }
    }

    for(i=0 ; i < (block_bits/32) ; i++)
        dataout[i]=temp_data[i] ;

    xil_printf("\nDataout is : \n") ;
    print_result(dataout,block_bits/32) ;
    xil_printf("\n") ;
}
/************************************************************************/
word8 mul(word8 a, word8 b) {
    /* multiply two elements of GF(2^8) using the log and antilog tables */
    if (a && b) return Alogtable[(Logtable[a] + Logtable[b])%255];
    else return 0;
}
/************************************************************************/
void KeyAddition(word8 a[4][MAXBC], word8 rk[4][MAXBC], word8 BC) {
    /* XOR the round key rk into the state a */
    int i, j;

    for(i = 0; i < 4; i++)
        for(j = 0; j < BC; j++) a[i][j] ^= rk[i][j];
}
/************************************************************************/
void ShiftRow(word8 a[4][MAXBC], word8 d, word8 BC) {
    /* Row 0 remains unchanged; the other three rows are shifted
       cyclically by an offset that depends on the block size and on d */
    word8 tmp[MAXBC];
    int i, j;

    for(i = 1; i < 4; i++) {
        for(j = 0; j < BC; j++) tmp[j] = a[i][(j + shifts[SC][i][d]) % BC];
        for(j = 0; j < BC; j++) a[i][j] = tmp[j];
    }
}
/************************************************************************/
void Substitution(word8 a[4][MAXBC], word8 box[256], word8 BC) {
    /* replace every byte of the state by its entry in the given S-box */
    int i, j;

    for(i = 0; i < 4; i++)
        for(j = 0; j < BC; j++) a[i][j] = box[a[i][j]] ;
}
/************************************************************************/
void MixColumn(word8 a[4][MAXBC], word8 BC) {
    /* mix the four bytes of every column in a linear way */
    word8 b[4][MAXBC];
    int i, j;

    for(j = 0; j < BC; j++)
        for(i = 0; i < 4; i++)
            b[i][j] = mul(2,a[i][j])
                ^ mul(3,a[(i + 1) % 4][j])
                ^ a[(i + 2) % 4][j]
                ^ a[(i + 3) % 4][j];
    for(i = 0; i < 4; i++)
        for(j = 0; j < BC; j++) a[i][j] = b[i][j];
}
/************************************************************************/
void InvMixColumn(word8 a[4][MAXBC], word8 BC) {
    /* inverse of MixColumn, used during decryption */
    word8 b[4][MAXBC];
    int i, j;

    for(j = 0; j < BC; j++)
        for(i = 0; i < 4; i++)
            b[i][j] = mul(0xe,a[i][j])
                ^ mul(0xb,a[(i + 1) % 4][j])
                ^ mul(0xd,a[(i + 2) % 4][j])
                ^ mul(0x9,a[(i + 3) % 4][j]);
    for(i = 0; i < 4; i++)
        for(j = 0; j < BC; j++) a[i][j] = b[i][j];
}
/************************************************************************/
int rijndaelKeySched (word8 k[4][MAXKC], int keyBits, int blockBits,
                      word8 W[MAXROUNDS+1][4][MAXBC]) {
    /* Calculate the round keys W from the cipher key k.
       Returns 0 on success and a negative value for unsupported sizes. */
    int KC, BC, ROUNDS;
    int i, j, t, rconpointer = 0;
    word8 tk[4][MAXKC];

    switch (keyBits) {
    case 128: KC = 4; break;
    case 192: KC = 6; break;
    case 256: KC = 8; break;
    default : return (-1);
    }
    switch (blockBits) {
    case 128: BC = 4; break;
    case 192: BC = 6; break;
    case 256: BC = 8; break;
    default : return (-2);
    }
    switch (keyBits >= blockBits ? keyBits : blockBits) {
    case 128: ROUNDS = 10; break;
    case 192: ROUNDS = 12; break;
    case 256: ROUNDS = 14; break;
    default : return (-3);
    }

    /* copy the cipher key into the working key tk */
    for(j = 0; j < KC; j++)
        for(i = 0; i < 4; i++)
            tk[i][j] = k[i][j];
    t = 0;

    /* copy the working key into the round key array */
    for(j = 0; (j < KC) && (t < (ROUNDS+1)*BC); j++, t++)
        for(i = 0; i < 4; i++) W[t / BC][i][t % BC] = tk[i][j];

    while (t < (ROUNDS+1)*BC) {
        /* calculate the next KC columns of the working key */
        for(i = 0; i < 4; i++)
            tk[i][0] ^= S[tk[(i+1)%4][KC-1]];
        tk[0][0] ^= rcon[rconpointer++];

        if (KC != 8) {
            for(j = 1; j < KC; j++)
                for(i = 0; i < 4; i++) tk[i][j] ^= tk[i][j-1];
        } else {
            for(j = 1; j < KC/2; j++)
                for(i = 0; i < 4; i++) tk[i][j] ^= tk[i][j-1];
            for(i = 0; i < 4; i++) tk[i][KC/2] ^= S[tk[i][KC/2 - 1]];
            for(j = KC/2 + 1; j < KC; j++)
                for(i = 0; i < 4; i++) tk[i][j] ^= tk[i][j-1];
        }

        /* copy the working key into the round key array */
        for(j = 0; (j < KC) && (t < (ROUNDS+1)*BC); j++, t++)
            for(i = 0; i < 4; i++) W[t / BC][i][t % BC] = tk[i][j];
    }

    return 0;
}
int rijndaelEncrypt (word8 a[4][MAXBC], int keyBits, int blockBits,
                     word8 rk[MAXROUNDS+1][4][MAXBC])
{
    /* encrypt the state a in place using the round keys rk */
    int r, BC, ROUNDS;

    switch (blockBits) {
    case 128: BC = 4; break;
    case 192: BC = 6; break;
    case 256: BC = 8; break;
    default : return (-2);
    }

    switch (keyBits >= blockBits ? keyBits : blockBits) {
    case 128: ROUNDS = 10; break;
    case 192: ROUNDS = 12; break;
    case 256: ROUNDS = 14; break;
    default : return (-3);
    }

    /* initial key addition */
    KeyAddition(a,rk[0],BC);

    /* ROUNDS-1 ordinary rounds */
    for(r = 1; r < ROUNDS; r++) {
        Substitution(a,S,BC);
        ShiftRow(a,0,BC);
        MixColumn(a,BC);
        KeyAddition(a,rk[r],BC);
    }

    /* the last round has no MixColumn */
    Substitution(a,S,BC);
    ShiftRow(a,0,BC);
    KeyAddition(a,rk[ROUNDS],BC);

    return 0;
}
int rijndaelDecrypt (word8 a[4][MAXBC], int keyBits, int blockBits,
                     word8 rk[MAXROUNDS+1][4][MAXBC])
{
    /* decrypt the state a in place by undoing the encryption steps in reverse order */
    int r, BC, ROUNDS;

    switch (blockBits) {
    case 128: BC = 4; break;
    case 192: BC = 6; break;
    case 256: BC = 8; break;
    default : return (-2);
    }
    switch (keyBits >= blockBits ? keyBits : blockBits) {
    case 128: ROUNDS = 10; break;
    case 192: ROUNDS = 12; break;
    case 256: ROUNDS = 14; break;
    default : return (-3);
    }

    /* undo the last encryption round, which has no MixColumn */
    KeyAddition(a,rk[ROUNDS],BC);
    Substitution(a,Si,BC);
    ShiftRow(a,1,BC);

    /* undo the ROUNDS-1 ordinary rounds */
    for(r = ROUNDS-1; r > 0; r--) {
        KeyAddition(a,rk[r],BC);
        InvMixColumn(a,BC);
        Substitution(a,Si,BC);
        ShiftRow(a,1,BC);
    }

    /* undo the initial key addition */
    KeyAddition(a,rk[0],BC);

    return 0;
}

void print_result(uint temp[], int len)
{
    int i ;
    for ( i = 0 ; i<len ; i++)
        xil_printf ("%x ",temp[i]) ;
    xil_printf ("\n") ;
}
/**************************************************************************
*****/
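The mul() routine in the listing above multiplies in GF(2^8) by adding logarithms taken from Logtable and looking the sum up in Alogtable. The following stand-alone sketch shows why this works, under the assumption that the tables are the discrete logarithm and antilogarithm for the generator {03} modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. It rebuilds such tables at run time and checks table-based multiplication against a plain shift-and-XOR multiplier. The names xtime, gf_mul_bitwise, gf_build_tables, gf_mul_table and gf_tables_agree are hypothetical and are not part of the project code.

```c
typedef unsigned char word8;

/* Multiply by {02} in GF(2^8), reducing modulo x^8 + x^4 + x^3 + x + 1 */
static word8 xtime(word8 a)
{
    return (word8)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

/* Hypothetical reference multiplier: shift-and-XOR over GF(2^8) */
word8 gf_mul_bitwise(word8 a, word8 b)
{
    word8 p = 0;
    while (b) {
        if (b & 1) p ^= a;   /* add a for every set bit of b */
        a = xtime(a);
        b >>= 1;
    }
    return p;
}

static word8 log_tab[256];
static word8 alog_tab[256];

/* Build log/antilog tables for the generator {03}:
   alog_tab[i] = {03}^i and log_tab[{03}^i] = i */
void gf_build_tables(void)
{
    int i;
    word8 v = 1;
    for (i = 0; i < 255; i++) {
        alog_tab[i] = v;
        log_tab[v] = (word8)i;
        v = gf_mul_bitwise(v, 3);
    }
    alog_tab[255] = alog_tab[0];   /* {03}^255 = {01} */
    log_tab[0] = 0;                /* log of zero is undefined, never used */
}

/* Same formula as mul() in the listing, applied to the rebuilt tables */
word8 gf_mul_table(word8 a, word8 b)
{
    if (a && b) return alog_tab[(log_tab[a] + log_tab[b]) % 255];
    return 0;
}

/* Returns 1 if both multipliers agree on all 65536 operand pairs */
int gf_tables_agree(void)
{
    int a, b;
    gf_build_tables();
    for (a = 0; a < 256; a++)
        for (b = 0; b < 256; b++)
            if (gf_mul_table((word8)a, (word8)b) !=
                gf_mul_bitwise((word8)a, (word8)b))
                return 0;
    return 1;
}
```

Calling gf_tables_agree() is one way to sanity-check tables of this kind after they have been retyped or extracted from a printed listing; the worked example {57} x {83} = {c1} from FIPS-197 can serve as a further spot check.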
