Aes Final

CHAPTER 1
INTRODUCTION:
The Advanced Encryption Standard, referred to below as AES, is the outcome of a competition opened in
1997 by the US Government after the Data Encryption Standard was found to be too weak because of its small key size
and advances in processor power. Fifteen candidates were accepted in 1998 and, based on public
comments, the pool was reduced to five finalists in 1999. In October 2000, one of these five algorithms was selected as
the forthcoming standard: a slightly modified version of Rijndael.
Rijndael, whose name is based on the names of its two Belgian inventors, Joan Daemen and Vincent
Rijmen, is a block cipher, which means that it works on fixed-length groups of bits called blocks. It takes an
input block of a certain size, usually 128 bits, and produces a corresponding output block of the same size. The
transformation requires a second input, which is the secret key. The permitted key size depends on the cipher
used; AES uses three different key sizes: 128, 192 and 256 bits.
To encrypt messages longer than the block size, a mode of operation is chosen, which I will explain at the very end of
this tutorial, after the implementation of AES. While AES supports only block sizes of 128 bits and key sizes of 128,
192 and 256 bits, the original Rijndael supports key and block sizes in any multiple of 32, with a minimum of 128 and
a maximum of 256 bits.
The need for privacy has become a high priority for both governments and civilians desiring protection
from signal and data interception. Widespread use of personal communications devices has only increased demand for
a level of security on previously insecure communications. Both DES (Data Encryption Standard) and AES are defined
as symmetric key block ciphers, with the main difference being the bit length of the key (56 bit for DES).
These symmetric-key encryption schemes use the same key for both the sender and receiver, and as a result
eliminate the need for the verification server required in public-key schemes. Symmetric keying can therefore work
independently of an open network, which in turn allows a higher level of system interoperability.
Ever since DES was phased out in 2001 and its successor, the Advanced Encryption Standard (also known as
Rijndael), took its place, various AES implementations have been proposed in both software and hardware. This paper
presents a low-cost, low-power hardware architecture for the Advanced Encryption Standard (AES). In 1997, the
National Institute of Standards and Technology promoted worldwide research into a replacement for the
widely accepted Data Encryption Standard (DES). In this brief, we present an efficient and cost-effective AES co-processor
design. To minimize cost, the design focuses on efficiency in order to reduce overall hardware complexity. By incorporating most of the
algorithm complexity into the controller, components are reused and efficiency is increased. A VHDL hardware
implementation is also presented, utilizing a field programmable gate array (FPGA) as a prototyping platform. Thus,
the design can be easily migrated to an ASIC implementation in a SoC. In this architecture, the main priority was not to
increase throughput or decrease processing time but to balance these factors in order to minimize cost. A focus on low
power and cost allows for scaling of the architecture towards vulnerable portable communications devices in consumer
and military applications such as cellular phones, PDAs, digital radios, pagers, and similar lower speed communication
embedded systems.
CHAPTER 2
Literature Survey
2.1 Modern Cryptography
The modern field of cryptography can be divided into several areas of study. The chief ones are discussed here.
2.2 Symmetric-key cryptography
Symmetric-key cryptography refers to encryption methods in which both the sender and receiver share the same key
(or, less commonly, in which their keys are different, but related in an easily computable way). This was the only kind
of encryption publicly known until June 1976.
Figure: Symmetric-key cryptography
The modern study of symmetric-key ciphers relates mainly to the study of block ciphers and stream ciphers and to their
applications. A block cipher is, in a sense, a modern embodiment of Alberti's polyalphabetic cipher: block ciphers take
as input a block of plaintext and a key, and output a block of ciphertext of the same size. Since messages are almost
always longer than a single block, some method of knitting together successive blocks is required. Several have been
developed, some with better security in one aspect or another than others. They are the modes of operation and must be
carefully considered when using a block cipher in a cryptosystem.
The Data Encryption Standard (DES) and the Advanced Encryption Standard (AES) are block cipher designs which
have been designated cryptography standards by the US government (though DES's designation was finally withdrawn
after the AES was adopted). Despite its deprecation as an official standard, DES (especially its still-approved and much
more secure triple-DES variant) remains quite popular; it is used across a wide range of applications, from ATM
encryption to e-mail privacy and secure remote access. Many other block ciphers have been designed and released, with
considerable variation in quality. Many have been thoroughly broken.
Stream ciphers, in contrast to the 'block' type, create an arbitrarily long stream of key material, which is combined with
the plaintext bit-by-bit or character-by-character, somewhat like the one-time pad. In a stream cipher, the output stream is
created based on a hidden internal state which changes as the cipher operates. That internal state is initially set up using
the secret key material. RC4 is a widely used stream cipher. Block ciphers can also be used as
stream ciphers by choosing a suitable mode of operation.
Cryptographic hash functions are a third type of cryptographic algorithm. They take a message of any length as input,
and output a short, fixed length hash which can be used in (for example) a digital signature. For good hash functions, an
attacker cannot find two messages that produce the same hash. MD4 is a long-used hash function which is now broken;
MD5, a strengthened variant of MD4, is also widely used but broken in practice. The U.S. National Security Agency
developed the Secure Hash Algorithm series of MD5-like hash functions: SHA-0 was a flawed algorithm that the agency
withdrew; SHA-1 is widely deployed and more secure than MD5, but cryptanalysts have identified attacks against it; the
SHA-2 family improves on SHA-1, but it isn't yet widely deployed, and the U.S. standards authority thought it "prudent"
from a security perspective to develop a new standard to "significantly improve the robustness of NIST's overall hash
algorithm toolkit." Thus, a hash function design competition is underway and meant to select a new U.S. national
standard, to be called SHA-3, by 2012.
Message authentication codes (MACs) are much like cryptographic hash functions, except that a secret key
is used to authenticate the hash value on receipt.
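The keyed check described above can be sketched with HMAC-SHA256 from the Python standard library. HMAC is only one MAC construction and is not named in the text; the key, the messages, and the helper names make_tag and verify_tag are illustrative assumptions.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Return an HMAC-SHA256 tag computed over the message with the shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag on receipt and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


# Illustrative values only; a real key would come from a secure random source.
key = b"illustrative shared secret"
message = b"transfer 100 units to account 42"
tag = make_tag(key, message)

print(verify_tag(key, message, tag))                               # genuine message
print(verify_tag(key, b"transfer 900 units to account 42", tag))   # altered message
```

Only a party holding the same secret key can produce a tag that verifies, which is what distinguishes a MAC from an unkeyed hash.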
2.3 Public-key cryptography
Symmetric-key cryptosystems use the same key for encryption and decryption of a message, though a message or group
of messages may have a different key than others. A significant disadvantage of symmetric ciphers is the key
management necessary to use them securely. Each distinct pair of communicating parties must, ideally, share a different
key, and perhaps each cipher text exchanged as well. The number of keys required increases as the square of the number
of network members, which very quickly requires complex key management schemes to keep them all straight and
secret. The difficulty of securely establishing a secret key between two communicating parties, when a secure channel
doesn't already exist between them, also presents a chicken-and-egg problem which is a considerable practical obstacle
for cryptography users in the real world.
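The quadratic growth mentioned above can be made concrete with a small calculation. The sketch assumes the ideal case described in the text, in which every distinct pair of members shares its own key; the function name pairwise_keys is hypothetical.

```python
def pairwise_keys(members: int) -> int:
    """Number of distinct pairs, i.e. secret keys if every pair shares its own key."""
    return members * (members - 1) // 2


for n in (10, 100, 1000):
    # The count grows roughly as n squared divided by two.
    print(n, pairwise_keys(n))
```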
Public-key cryptography refers to a cryptographic system requiring two separate keys, one of which is secret
and one of which is public. Although different, the two parts of the key pair are mathematically linked. One key locks
or encrypts the plaintext, and the other unlocks or decrypts the ciphertext. Neither key can perform both functions by
itself. The public key may be published without compromising security, while the private key must not be revealed to
anyone not authorized to read the messages.
Public-key cryptography uses asymmetric key algorithms and can also be referred to by the more generic term
"asymmetric key cryptography." The algorithms used for public key cryptography are based on mathematical
relationships (the most notable ones being the integer factorization and discrete logarithm problems) that presumably
have no efficient solution. Although it is computationally easy for the intended recipient to generate the public and
private keys, to decrypt the message using the private key, and easy for the sender to encrypt the message using the
public key, it is extremely difficult (or effectively impossible) for anyone to derive the private key, based only on their
knowledge of the public key. This is why, unlike symmetric key algorithms, a public key algorithm does not require
a secure initial exchange of one (or more) secret keys between the sender and receiver. The use of these algorithms also
allows the authenticity of a message to be checked by creating a digital signature of the message using the private key,
which can then be verified by using the public key. In practice, only a hash of the message is typically encrypted for
signature verification purposes.
The distinguishing technique used in public-key cryptography is the use of asymmetric key algorithms, where
the key used to encrypt a message is not the same as the key used to decrypt it. Each user has a pair of cryptographic
keys – a public encryption key and a private decryption key. The publicly available encrypting-key is widely
distributed, while the private decrypting-key is known only to its proprietor. The keys are related mathematically, but
the parameters are chosen so that calculating the private key from the public key is either impossible or prohibitively
expensive.
In contrast, symmetric-key algorithms – variations of which have been used for thousands of years – use
a single secret key, which must be shared and kept private by both the sender and the receiver, for both encryption and
decryption. To use a symmetric encryption scheme, the sender and receiver must securely share a key in advance.
Because symmetric key algorithms are nearly always much less computationally intensive than asymmetric
ones, it is common to exchange a key using a key-exchange algorithm, then transmit data using that key and a
symmetric key algorithm. PGP and the SSL/TLS family of schemes use this procedure, and are thus called hybrid
cryptosystems.
The two main uses for public-key cryptography are:

• Public-key encryption: a message encrypted with a recipient's public key cannot be decrypted by anyone except a
  possessor of the matching private key. It is presumed that this will be the owner of that key and the person
  associated with the public key used. This is used to attempt to ensure confidentiality.
• Digital signatures: a message signed with a sender's private key can be verified by anyone who has access to the
  sender's public key, thereby proving that the sender had access to the private key and, therefore, is likely to be the
  person associated with the public key used. This also ensures that the message has not been tampered with (on the
  question of authenticity, see also message digest).
Public-key cryptography is widely used. It is an approach used by many cryptographic algorithms and
cryptosystems. It underpins such Internet standards as Transport Layer Security (TLS), PGP, and GPG. There are three
primary kinds of public key systems: public key distribution systems, digital signature systems, and public key
cryptosystems, which can perform both public key distribution and digital signature services. Diffie–Hellman key
exchange is the most widely used public key distribution system, while the Digital Signature Algorithm is the most
widely used digital signature system.
In a groundbreaking 1976
paper, Whitfield Diffie and Martin Hellman proposed the notion of public-key (also, more generally, called asymmetric
key) cryptography in which two different but mathematically related keys are used — a public key and a private key. A
public key system is so constructed that calculation of one key (the 'private key') is computationally infeasible from the
other (the 'public key'), even though they are necessarily related. Instead, both keys are generated secretly, as an
interrelated pair. The historian David Kahn described public-key cryptography as "the most revolutionary new concept in
the field since polyalphabetic substitution emerged in the Renaissance".
In public-key cryptosystems, the public key may be freely distributed, while its paired private key must remain secret.
The public key is typically used for encryption, while the private or secret key is used for decryption. Diffie and Hellman
showed that public-key cryptography was possible by presenting the Diffie-Hellman key exchange protocol.
In 1978, Ronald Rivest, Adi Shamir, and Len Adleman invented RSA, another public-key system. In 1997, it finally
became publicly known that asymmetric key cryptography had been invented by James H. Ellis at GCHQ, a British
intelligence organization, and that, in the early 1970s, both the Diffie-Hellman and RSA algorithms had been previously
developed (by Malcolm J. Williamson and Clifford Cocks, respectively).
The Diffie-Hellman and RSA algorithms, in addition to being the first publicly known examples of high quality public-
key algorithms, have been among the most widely used. Others include the Cramer-Shoup cryptosystem, ElGamal
encryption, and various elliptic curve techniques.
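The Diffie-Hellman exchange mentioned above can be sketched with toy-sized numbers. The modulus p = 23, generator g = 5 and the two secret exponents are illustrative assumptions chosen for readability; real deployments use parameters thousands of bits long, and the function names are hypothetical.

```python
# Public parameters, known to everyone (toy-sized, for illustration only).
p = 23  # prime modulus
g = 5   # generator


def public_value(secret: int) -> int:
    """Value a party sends over the open channel: g^secret mod p."""
    return pow(g, secret, p)


def shared_secret(their_public: int, my_secret: int) -> int:
    """Key a party derives from the other side's public value."""
    return pow(their_public, my_secret, p)


alice_secret, bob_secret = 6, 15          # never transmitted
alice_public = public_value(alice_secret)
bob_public = public_value(bob_secret)

alice_key = shared_secret(bob_public, alice_secret)
bob_key = shared_secret(alice_public, bob_secret)
print(alice_public, bob_public, alice_key, bob_key)
```

Both parties arrive at the same key because (g^a)^b and (g^b)^a are equal modulo p, while an eavesdropper sees only the two public values.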
Figure: Public key cryptography
In addition to encryption, public-key cryptography can be used to implement digital signature schemes. A digital
signature is reminiscent of an ordinary signature; they both have the characteristic that they are easy for a user to
produce, but difficult for anyone else to forge. Digital signatures can also be permanently tied to the content of the
message being signed; they cannot then be 'moved' from one document to another, for any attempt will be detectable. In
digital signature schemes, there are two algorithms: one for signing, in which a secret key is used to process the message
(or a hash of the message, or both), and one for verification, in which the matching public key is used with the message
to check the validity of the signature. RSA and DSA are two of the most popular digital signature schemes. Digital
signatures are central to the operation of public key infrastructures and many network security schemes (e.g., SSL/TLS,
many VPNs, etc.).
Public-key algorithms are most often based on the computational complexity of "hard" problems, often from number
theory. For example, the hardness of RSA is related to the integer factorization problem, while Diffie-Hellman and DSA
are related to the discrete logarithm problem. More recently, elliptic curve cryptography has developed in which security
is based on number theoretic problems involving elliptic curves. Because of the difficulty of the underlying problems,
most public-key algorithms involve operations such as modular multiplication and exponentiation, which are much more
computationally expensive than the techniques used in most block ciphers, especially with typical key sizes. As a result,
public-key cryptosystems are commonly hybrid cryptosystems, in which a fast high-quality symmetric-key encryption
algorithm is used for the message itself, while the relevant symmetric key is sent with the message, but encrypted using a
public-key algorithm. Similarly, hybrid signature schemes are often used, in which a cryptographic hash function is
computed, and only the resulting hash is digitally signed.
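The hash-then-sign idea described above can be illustrated with textbook RSA on toy primes. This is a sketch under loud assumptions: the primes 61 and 53, the exponent 17, the reduction of the SHA-256 digest modulo n, and the function names are all illustrative, and the scheme omits the padding that any real signature scheme requires. It needs Python 3.8 or later for the modular inverse via pow.

```python
import hashlib

# Toy RSA key pair (insecure sizes, for illustration only).
P, Q = 61, 53
N = P * Q                              # public modulus
E = 17                                 # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent


def digest_mod_n(message: bytes) -> int:
    """Hash the message and reduce the digest so that it fits below N."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Only the hash is processed with the private key, not the whole message."""
    return pow(digest_mod_n(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Anyone holding the public key (N, E) can check the signature."""
    return pow(signature, E, N) == digest_mod_n(message)


msg = b"example message"
sig = sign(msg)
print(verify(msg, sig))
```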
An analogy that can be used to understand the advantages of an asymmetric system is to imagine two
people, Alice and Bob, who are sending a secret message through the public mail. In this example, Alice wants to send
a secret message to Bob, and expects a secret reply from Bob.
With a symmetric key system, Alice first puts the secret message in a box, and locks the box using a padlock to which
she has a key. She then sends the box to Bob through regular mail. When Bob receives the box, he uses an identical
copy of Alice's key (which he has somehow obtained previously, maybe by a face-to-face meeting) to open the box,
and reads the message. Bob can then use the same padlock to send his secret reply.
In an asymmetric key system, Bob and Alice have separate padlocks. First, Alice asks Bob to send his open padlock to
her through regular mail, keeping his key to himself. When Alice receives it she uses it to lock a box containing her
message, and sends the locked box to Bob. Bob can then unlock the box with his key and read the message from Alice.
To reply, Bob must similarly get Alice's open padlock to lock the box before sending it back to her.
The critical advantage in an asymmetric key system is that Bob and Alice never need to send a copy of their keys to
each other. This prevents a third party – perhaps, in this example, a corrupt postal worker – from copying a key while it
is in transit, allowing the third party to spy on all future messages sent between Alice and Bob. So, in the public key
scenario, Alice and Bob need not trust the postal service as much. In addition, if Bob were careless and allowed
someone else to copy his key, Alice's messages to Bob would be compromised, but Alice's messages to other people
would remain secret, since the other people would be providing different padlocks for Alice to use.
CHAPTER 3
PROJECT IMPLEMENTATION
DES is now considered to be insecure for many applications. This is chiefly due to the 56-bit key size being too
small; DES keys have been broken in less than 24 hours. There are also some analytical results which demonstrate
theoretical weaknesses in the cipher, although they are infeasible to mount in practice. The algorithm is believed to be
practically secure in the form of Triple DES, although there are theoretical attacks. In recent years, the cipher has been
superseded by the Advanced Encryption Standard (AES).
The Advanced Encryption Standard (AES) algorithm, adopted by the U.S. government in 2001, is a block
cipher that transforms 128-bit data blocks under a 128-bit, 192-bit or 256-bit secret key by means of permutation and
substitution. In January 1997, the National Institute of Standards and Technology (NIST) announced the initiation of an
effort to develop the AES and made a formal call for algorithms on 12th September 1997. After the results of
this preliminary research were reviewed, the algorithms MARS, RC6, Rijndael, Serpent and Twofish were selected as finalists.
After further review of the public analysis of the finalists, NIST decided to propose Rijndael as the new Advanced Encryption
Standard (AES) on 2nd October 2000. It is expected to replace DES and Triple DES so as to fulfill stricter data
security requirements because of its enhanced security levels (Chih et al., 2002). In November 2001, AES was published
as a Federal Information Processing Standard (FIPS 197), succeeding the aging DES. DES is seen as reaching the end of its
life, as cracking its cipher has become more tractable on current computer hardware. The AES algorithm will be
used for many applications within the government and in the private sector. Breaking an AES-encrypted ciphertext by
trying all possible keys is currently computationally infeasible, even allowing for advances in technology.
The AES specifies the Rijndael algorithm, which is a symmetric block cipher that processes fixed 128-bit data
blocks using cipher keys with lengths of 128, 192 and 256 bits. The original Rijndael algorithm had the option of
combining data block sizes of 128, 192 or 256 bits with any of these key lengths. Due to the hard task of verifying that all
possible combinations were secure against cryptographic attacks, only the 128-bit block size, combined with key lengths
of 128, 192 or 256 bits, was included in the AES standard (NIST, 2002). From this section onward, all discussion is
narrowed down to the 128-bit AES algorithm and its implementation.
2.1 BACKGROUND MATHEMATICS
This section provides a brief introduction to the fundamental mathematical concepts of finite fields needed to
understand the AES algorithm. For an in-depth discussion of the subject, one should refer to Joan Daemen (1999),
Brian Gladman (2002) and FIPS (2001). Several operations in AES are defined at byte level, with bytes representing
elements in the finite field GF(2^8). Other operations are defined in terms of 4-byte words.
2.1.1 THE FIELD GF(2^8)
The elements of a finite field can be represented in several different ways. For any prime power there is a single
finite field, hence all representations of GF(2^8) are isomorphic. Despite this equivalence, the representation has an
impact on the implementation complexity. Joan Daemen and Vincent Rijmen (1999) chose the classical
polynomial representation.
A byte value in AES is a sequence of eight bits (each 0 or 1), written in the order
{b7, b6, b5, b4, b3, b2, b1, b0}. These bytes are interpreted as finite field elements using the
polynomial representation
$b_7x^7 + b_6x^6 + b_5x^5 + b_4x^4 + b_3x^3 + b_2x^2 + b_1x + b_0$ ------------------------------------ 2.1
Example 1:
The byte with hexadecimal value {57} (binary 01010111) corresponds to the polynomial
$x^6 + x^4 + x^2 + x + 1$ ------------------------------------ 2.2
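The mapping from a byte to its polynomial can be sketched in a few lines. The function name byte_to_poly and the printed notation (x^k for powers) are illustrative assumptions, not part of the AES specification.

```python
def byte_to_poly(value: int) -> str:
    """Render a byte as a polynomial over GF(2), highest power first."""
    terms = []
    for power in range(7, -1, -1):
        if (value >> power) & 1:          # coefficient b_power is 1
            if power == 0:
                terms.append("1")
            elif power == 1:
                terms.append("x")
            else:
                terms.append(f"x^{power}")
    return " + ".join(terms) if terms else "0"


print(byte_to_poly(0x57))
```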
2.1.2 FINITE FIELD ADDITION
The addition of two finite field elements is achieved by adding the coefficients for corresponding powers of
their polynomial representations, this addition being performed in GF(2), that is, modulo 2, so that 1 + 1 = 0.
Consequently, addition and subtraction are both equivalent to an exclusive-or (XOR) operation on the bytes that
represent field elements. Addition operations for finite field elements will be denoted by the symbol ⊕.
Example 2:
Steps to get the result {57} ⊕ {83} = {D4}
(Polynomial notation) $(x^6 + x^4 + x^2 + x + 1) + (x^7 + x + 1) = x^7 + x^6 + x^4 + x^2$
(Binary notation) {01010111} ⊕ {10000011} = {11010100}
(Hexadecimal notation) {57} ⊕ {83} = {D4}
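Because addition in the field is a bytewise XOR, the example can be checked directly. The sketch uses the operands from the binary line of the example (01010111 and 10000011); the function name gf_add is hypothetical.

```python
def gf_add(a: int, b: int) -> int:
    """Add two GF(2^8) elements: coefficient-wise addition modulo 2, i.e. XOR."""
    return a ^ b


a, b = 0b01010111, 0b10000011
total = gf_add(a, b)
print(f"{total:08b}", f"{total:02X}")

# Subtraction is the same operation, so adding b again recovers a.
print(f"{gf_add(total, b):02X}")
```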
2.1.3 FINITE FIELD MULTIPLICATION
Finite field multiplication is more difficult than addition and is achieved by multiplying the polynomials for the
two elements concerned and collecting like powers of x in the result. Since each polynomial can have powers of x up to
7, the result can have powers of x up to 14 and will no longer fit within a single byte. This situation is handled by
replacing the result with the remainder polynomial after division by a special irreducible polynomial of degree 8,
which for AES is $m(x) = x^8 + x^4 + x^3 + x + 1$. Since this polynomial has powers of x up to 8, it cannot be represented by
a single byte and will be written as either 1{00011011} or 1{1B}.
Example 3:
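This process is illustrated by the product {57} · {83} = {C1}, where · is used to represent finite field multiplication:
$(x^6 + x^4 + x^2 + x + 1)(x^7 + x + 1) = x^{13} + x^{11} + x^9 + x^8 + x^6 + x^5 + x^4 + x^3 + 1$
This intermediate result is now divided by m(x) above:
Multiply m(x) by $x^5$: $x^{13} + x^9 + x^8 + x^6 + x^5$
Subtract to give the intermediate remainder: $x^{11} + x^4 + x^3 + 1$
Multiply m(x) by $x^3$: $x^{11} + x^7 + x^6 + x^4 + x^3$
Subtract to give the final remainder: $x^7 + x^6 + 1$
The final result is $x^7 + x^6 + 1$ = {C1}.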
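The multiply-then-reduce procedure of this example can be sketched as two small steps: a carry-less polynomial product, followed by repeated subtraction (XOR) of shifted copies of m(x) = 0x11B. The function names are illustrative assumptions; practical implementations usually interleave the two steps or use lookup tables.

```python
M = 0x11B  # m(x) = x^8 + x^4 + x^3 + x + 1


def poly_mul(a: int, b: int) -> int:
    """Multiply two polynomials over GF(2) without any reduction."""
    product = 0
    while b:
        if b & 1:
            product ^= a      # add this shifted copy of a
        a <<= 1
        b >>= 1
    return product


def poly_mod(value: int, modulus: int = M) -> int:
    """Remainder of polynomial division over GF(2)."""
    while value.bit_length() >= modulus.bit_length():
        shift = value.bit_length() - modulus.bit_length()
        value ^= modulus << shift   # subtract modulus * x^shift
    return value


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements as in the worked example."""
    return poly_mod(poly_mul(a, b))


print(hex(poly_mul(0x57, 0x83)), hex(gf_mul(0x57, 0x83)))
```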
2.1.4 MULTIPLICATIVE INVERSE
In mathematics, the multiplicative inverse of a number x is the number which, when multiplied by x, yields 1.
It is denoted 1/x or $x^{-1}$.
In modular arithmetic, the multiplicative inverse of x modulo n is defined as the number a such that
(a · x) mod n = 1
However, this multiplicative inverse exists only if x and n are relatively prime. The extended Euclidean algorithm may
be used to compute the multiplicative inverse modulo a number.
Example 4:
The multiplicative inverse of 3 modulo 11 is 4 because 4 is the solution to (3 · x) mod 11 = 1. In hexadecimal notation,
({03} · {04}) mod {0B} = {01}.
For 8-bit values there are 256 different byte values, and every nonzero byte has a multiplicative inverse in GF(2^8).
The multiplicative inverse is used later in the SubBytes() and InvSubBytes() transformations.
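Both kinds of inverse discussed here can be sketched briefly: the integer case with the extended Euclidean algorithm, and the GF(2^8) case by exhaustive search over the 255 nonzero bytes. The function names are hypothetical, and the search is chosen for clarity rather than speed.

```python
def mod_inverse(x: int, n: int) -> int:
    """Inverse of x modulo n via the extended Euclidean algorithm."""
    old_r, r = x % n, n
    old_s, s = 1, 0          # invariant: old_s * x = old_r (mod n)
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    if old_r != 1:
        raise ValueError("x and n are not relatively prime")
    return old_s % n


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B        # reduce by m(x) when x^8 appears
        b >>= 1
    return result


def gf_inverse(a: int) -> int:
    """Inverse in GF(2^8); {00} has no inverse and is mapped to itself."""
    if a == 0:
        return 0
    return next(c for c in range(1, 256) if gf_mul(a, c) == 1)


print(mod_inverse(3, 11), hex(gf_inverse(0x53)))
```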
2.1.5 POLYNOMIALS WITH COEFFICIENTS IN GF(2^8)
Four-term polynomials are polynomials whose coefficients are finite field elements, such as
$a(x) = a_3x^3 + a_2x^2 + a_1x + a_0$ ----------------------------------------------------- 2.3
where [a3, a2, a1, a0] forms a word. The coefficients of this polynomial, i.e. a3, a2, a1, a0, are bytes in the
finite field. The addition of such polynomials is similar to the bit addition described above. To illustrate this, let
$b(x) = b_3x^3 + b_2x^2 + b_1x + b_0$ ----------------------------------------------------- 2.4
be the second four-term polynomial. Addition is done by adding the coefficients of the polynomials with like powers of x.
This addition is the XOR operation between the corresponding bytes in each polynomial.
Thus $a(x) + b(x) = (a_3 \oplus b_3)x^3 + (a_2 \oplus b_2)x^2 + (a_1 \oplus b_1)x + (a_0 \oplus b_0)$ ------------- 2.5
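Multiplication is achieved by algebraically expanding the polynomial product and collecting like powers of x to give
$c(x) = c_6x^6 + c_5x^5 + c_4x^4 + c_3x^3 + c_2x^2 + c_1x + c_0$ --------------------------------- 2.6
where
$c_0 = a_0 \cdot b_0$
$c_1 = a_1 \cdot b_0 \oplus a_0 \cdot b_1$
$c_2 = a_2 \cdot b_0 \oplus a_1 \cdot b_1 \oplus a_0 \cdot b_2$
$c_3 = a_3 \cdot b_0 \oplus a_2 \cdot b_1 \oplus a_1 \cdot b_2 \oplus a_0 \cdot b_3$
$c_4 = a_3 \cdot b_1 \oplus a_2 \cdot b_2 \oplus a_1 \cdot b_3$
$c_5 = a_3 \cdot b_2 \oplus a_2 \cdot b_3$
$c_6 = a_3 \cdot b_3$
with · and ⊕ representing finite field multiplication and addition (XOR) respectively.
This result has seven coefficients and so no longer fits in a four-byte word, but it can be reduced modulo a polynomial
of degree four to produce a result of degree less than 4. In Rijndael the polynomial used is ($x^4$ + 1) and reduction
produces the following polynomial coefficients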
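$d_3 = a_3 \cdot b_0 \oplus a_2 \cdot b_1 \oplus a_1 \cdot b_2 \oplus a_0 \cdot b_3$
$d_2 = a_2 \cdot b_0 \oplus a_1 \cdot b_1 \oplus a_0 \cdot b_2 \oplus a_3 \cdot b_3$
$d_1 = a_1 \cdot b_0 \oplus a_0 \cdot b_1 \oplus a_3 \cdot b_2 \oplus a_2 \cdot b_3$
$d_0 = a_0 \cdot b_0 \oplus a_3 \cdot b_1 \oplus a_2 \cdot b_2 \oplus a_1 \cdot b_3$ ------------------------------------------- 2.7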
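If one of the polynomials is fixed, this can conveniently be written in matrix form as:
$$\begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \end{bmatrix} = \begin{bmatrix} a_0 & a_3 & a_2 & a_1 \\ a_1 & a_0 & a_3 & a_2 \\ a_2 & a_1 & a_0 & a_3 \\ a_3 & a_2 & a_1 & a_0 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix}$$
Because ($x^4$ + 1) is not an irreducible polynomial, not all polynomial multiplications are invertible. For Rijndael,
however, a polynomial that has an inverse has been chosen:
$a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}$
$a^{-1}(x) = \{0b\}x^3 + \{0d\}x^2 + \{09\}x + \{0e\}$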
This transformation is used in MixColumns() and InvMixColumns(). Another polynomial that Rijndael uses has a0 = a2 =
a3 = {00} and a1 = {01}, which is the polynomial x. Inspection of equation 2.7 shows that its effect is to form the output
word by rotating the bytes in the input word so that [b3, b2, b1, b0] is transformed into [b2, b1, b0, b3], with bytes
moving to higher index positions and the top byte wrapping round to the lowest position. Higher powers of x
correspond to the other cyclic permutations of the four bytes within a 32-bit word. The Rot function that is used in the
key expander corresponds to $x^3$.
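Multiplication of four-term polynomials modulo x^4 + 1 can be sketched as a cyclic convolution of the coefficient lists. In this sketch words are stored lowest coefficient first, as [a0, a1, a2, a3]; that ordering, the function names, and the sample word are illustrative assumptions. The MixColumns polynomial and its inverse are the standard Rijndael ones.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result


def word_mul(a, b):
    """Product of two four-term polynomials reduced modulo x^4 + 1.

    Because x^4 = 1 under this modulus, the term a_i * b_j lands on
    coefficient (i + j) mod 4.
    """
    d = [0, 0, 0, 0]
    for i in range(4):
        for j in range(4):
            d[(i + j) % 4] ^= gf_mul(a[i], b[j])
    return d


MIX = [0x02, 0x01, 0x01, 0x03]       # {03}x^3 + {01}x^2 + {01}x + {02}
INV_MIX = [0x0E, 0x09, 0x0D, 0x0B]   # {0b}x^3 + {0d}x^2 + {09}x + {0e}

word = [0x10, 0x20, 0x30, 0x40]      # arbitrary sample [b0, b1, b2, b3]
print(word_mul(MIX, INV_MIX))        # product of a(x) and its inverse
print(word_mul([0, 1, 0, 0], word))  # multiplication by x
print(word_mul([0, 0, 0, 1], word))  # multiplication by x^3
```

Multiplying by x moves each byte one index higher, and multiplying by x^3 gives the rotation that the text associates with the Rot function.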
2.2 THE STATE
Internally, the AES algorithm’s operations are performed on a two-dimensional array of bytes called the State.
The State consists of four rows of bytes, each containing Nb bytes, where Nb is the block length divided by 32. In the
State array denoted by the symbol s, each individual byte has two indices, with its row number r in the range 0 ≤ r < 4
and its column number c in the range 0 ≤ c < Nb. This allows an individual byte of the State to be referred to as either
$s_{r,c}$ or s[r,c]. For this standard, Nb = 4, i.e., 0 ≤ c < 4.
At the start of the Cipher and Inverse Cipher, the input (the array of bytes in0, in1, …, in15) is copied into the
State array as illustrated in Figure 2.1. The Cipher or Inverse Cipher operations are then conducted on this State array, after
which its final value is copied to the output (the array of bytes out0, out1, …, out15).
Figure 2.1 Matrix form of state
Hence, at the beginning of the Cipher or Inverse Cipher, the input array, in, is copied to the State array according to the
scheme:
s[r, c] = in[r + 4c] for 0 ≤ r < 4 and 0 ≤ c < Nb, --------------------------------- 2.8
and at the end of the Cipher and Inverse Cipher, the State is copied to the output array out as
follows:
out[r + 4c] = s[r, c] for 0 ≤ r < 4 and 0 ≤ c < Nb. -------------------------------- 2.9
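The two copy rules can be sketched directly from equations 2.8 and 2.9. The State is held here as a list of four rows; that representation and the function names are illustrative assumptions.

```python
NB = 4  # number of columns in the State


def bytes_to_state(block: bytes):
    """Fill the State column by column: s[r][c] = in[r + 4c]."""
    return [[block[r + 4 * c] for c in range(NB)] for r in range(4)]


def state_to_bytes(state) -> bytes:
    """Read the State back column by column: out[r + 4c] = s[r][c]."""
    return bytes(state[r][c] for c in range(NB) for r in range(4))


block = bytes(range(16))          # in0 .. in15 = 0 .. 15
state = bytes_to_state(block)
for row in state:
    print(row)
```

With this input the first row of the State holds bytes 0, 4, 8 and 12, which shows that consecutive input bytes run down the columns rather than along the rows.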
2.2.1 THE STATE AS AN ARRAY OF COLUMNS
The four bytes in each column of the State array form 32-bit words, where the row number r provides an index
for the four bytes within each word. The state can hence be interpreted as a one-dimensional array of 32 bit words
(columns), w0...w3, where the column number c provides an index into this array. Hence, for the example in Figure 2.1, the
State can be considered as an array of four words, as follows:
$w_0 = s_{0,0}\, s_{1,0}\, s_{2,0}\, s_{3,0}$
$w_1 = s_{0,1}\, s_{1,1}\, s_{2,1}\, s_{3,1}$
$w_2 = s_{0,2}\, s_{1,2}\, s_{2,2}\, s_{3,2}$
$w_3 = s_{0,3}\, s_{1,3}\, s_{2,3}\, s_{3,3}$ ----------------------------------------------- 2.10
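The column view can be sketched by packing each column of the State into one 32-bit integer. Placing the row-0 byte in the most significant position is an illustrative choice made for this sketch, as are the sample block and the function name column_words.

```python
block = bytes(range(16))
# State built column by column: s[r][c] = in[r + 4c].
state = [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def column_words(state):
    """Return w0..w3, each formed from the four bytes of one column."""
    words = []
    for c in range(4):
        column = bytes(state[r][c] for r in range(4))   # s0,c s1,c s2,c s3,c
        words.append(int.from_bytes(column, "big"))
    return words


print([f"{w:08x}" for w in column_words(state)])
```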
2.2.2 STANDARD ALGORITHM SPECIFICATION
For the AES algorithm, the length of the input block, the output block and the State is 128 bits. This is
represented by Nb = 4, which reflects the number of 32-bit words (number of columns) in the State.
For the AES algorithm, the length of the Cipher Key, K, is 128, 192, or 256 bits. The key length is represented by Nk =
4, 6, or 8, which reflects the number of 32-bit words (number of columns) in the Cipher Key.
For the AES algorithm, the number of rounds to be performed during the execution of the algorithm is dependent on
the key size. The number of rounds is represented by Nr, where Nr =10 when Nk = 4, Nr = 12 when Nk = 6, and Nr =
14 when Nk = 8.
The only Key-Block-Round combinations that conform to this standard are given in Figure 2.2.
Figure 2.2: Key-Block-Round Combinations
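The three permitted combinations can be tabulated with a short helper. The function name is hypothetical, and the relation Nr = Nk + 6 used below is simply an observation that reproduces the three cases stated in the text.

```python
NB = 4  # block size in 32-bit words, fixed for AES


def key_block_round(key_bits: int):
    """Return (Nk, Nb, Nr) for a permitted AES key length."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES permits only 128-, 192- or 256-bit keys")
    nk = key_bits // 32          # key length in 32-bit words
    nr = nk + 6                  # matches Nr = 10, 12, 14 for Nk = 4, 6, 8
    return nk, NB, nr


for bits in (128, 192, 256):
    print(f"AES-{bits}:", key_block_round(bits))
```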
For both its Cipher and Inverse Cipher, the AES algorithm uses a round function that is composed of four different
byte-oriented transformations:
1) Byte substitution using a substitution table (S-box)
2) Shifting rows of the State array by different offsets
3) Mixing the data within each column of the State array
4) Adding a Round Key to the State
These transformations (and their inverses) are described in the following sections.
2.3 CIPHER
At the start of the Cipher, the input is copied to the State array using the conventions described in Sec. 2.2.
After an initial Round Key addition, the State array is transformed by implementing a round function 10, 12, or 14
times (depending on the key length), with the final round differing slightly from the first Nr -1 rounds. The final State
is then copied to the output.
The AES cipher core consists of a key expansion module, an initial permutation module, a round permutation
module and a final permutation module. The round permutation module loops internally to perform 10 iterations (for
128-bit keys).
The round function is parameterized using a key schedule that consists of a one-dimensional array of four-byte words
derived using the Key Expansion routine described in Sec. 3.6. The individual transformations - SubBytes(),
ShiftRows(), MixColumns(), and AddRoundKey() process the State and are described in the following subsections.
All Nr rounds are identical with the exception of the final round, which does not include the MixColumns()
transformation.
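The ordering of the transformations can be sketched by listing the steps for a given number of rounds. The sketch only names the steps and performs no encryption; the function name round_sequence is hypothetical.

```python
def round_sequence(nr: int):
    """List the transformations applied by the Cipher for Nr rounds."""
    steps = ["AddRoundKey"]                    # initial Round Key addition
    for round_number in range(1, nr + 1):
        steps += ["SubBytes", "ShiftRows"]
        if round_number != nr:                 # final round omits MixColumns
            steps.append("MixColumns")
        steps.append("AddRoundKey")
    return steps


sequence = round_sequence(10)                  # 128-bit key
print(len(sequence), sequence[-3:])
```

For Nr = 10 this gives eleven Round Key additions, which is why the key schedule must supply Nr + 1 round keys.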
2.3.1 SUBBYTES() TRANSFORMATION
The SubBytes() transformation is a non-linear byte substitution that operates independently on each byte of the
State using a substitution table (S-box). This S-box, which is invertible, is constructed by composing two
transformations:
1. Take the multiplicative inverse in the finite field GF(2^8), described in Sec. 2.1.4; the element {00} is mapped
to itself.
2. Apply the following affine transformation (over GF(2)):
$b'_i = b_i \oplus b_{(i+4) \bmod 8} \oplus b_{(i+5) \bmod 8} \oplus b_{(i+6) \bmod 8} \oplus b_{(i+7) \bmod 8} \oplus c_i$ -----------------2.11
for 0 ≤ i < 8, where b_i is the ith bit of the byte, and c_i is the ith bit of a byte c with the value {63} or {01100011}. Here
and elsewhere, a prime on a variable indicates that the variable is to be updated with the value on the right. In matrix
form, the affine transformation element of the S-box can be expressed as
Figure 2.3 : Affine Transformation of S-BOX
Figure below illustrates the effect of the SubBytes() transformation on the State.
Figure 2.4 : SubBytes() applies the S-Box to each byte of the state
The S-box used in the SubBytes() transformation is presented in hexadecimal form in Figure 2.5. For example, if
s_{1,1} = {53}, then the substitution value would be determined by the intersection of the row with index ‘5’ and the column
with index ‘3’ in Figure 2.5. This would result in s'_{1,1} having a value of {ed}.
Figure 2.5: Substitution values for the bytes xy (in hexadecimal format)
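The two construction steps of the S-box (multiplicative inverse, then affine transformation) can be reproduced in software. The Python sketch below is illustrative only; the function names are hypothetical, and the inverse is computed as a^254, which is one of several possible methods and not necessarily the one used in hardware.

```python
def xtime(a):
    """Multiply by x ({02}) in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a

def gmul(a, b):
    """Multiply two GF(2^8) elements by shift-and-add."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a, b = xtime(a), b >> 1
    return p

def gf_inv(a):
    """Multiplicative inverse in GF(2^8); {00} is mapped to itself."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):       # a^254 = a^(-1) because a^255 = 1 for a != 0
        result = gmul(result, a)
    return result

def rotl8(x, n):
    """Rotate an 8-bit value left by n positions."""
    return ((x << n) | (x >> (8 - n))) & 0xFF

def sbox_entry(a):
    """Step 1: inverse. Step 2: affine transformation with constant {63}."""
    b = gf_inv(a)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63

SBOX = [sbox_entry(a) for a in range(256)]
print(hex(SBOX[0x53]))  # the example from the text
```

The four rotations implement the terms b_(i+4) to b_(i+7) of equation 2.11: rotating left by n moves bit (i - n) mod 8 into position i.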
2.3.2 SHIFTROWS() TRANSFORMATION
In the ShiftRows() transformation, the bytes in the last three rows of the State are cyclically shifted over different
numbers of bytes (offsets). The first row, r = 0, is not shifted. Specifically, the ShiftRows() transformation proceeds as
follows:
$s'_{r,c} = s_{r,\,(c + shift(r,Nb)) \bmod Nb}$ for 0 < r < 4 and 0 ≤ c < Nb ----------------2.12
where the shift value shift(r,Nb) depends on the row number, r, as follows (recall that Nb = 4):
shift(1,4) = 1; shift(2,4) = 2; shift(3,4) = 3.
This has the effect of moving bytes to “lower” positions in the row (i.e., lower values of c in a given row), while the
“lowest” bytes wrap around into the “top” of the row (i.e., higher values of c in a given row).
Figure below illustrates the ShiftRows() transformation.

Figure 2.6: Shiftrows() cyclically shifts the last three rows in the state
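As a small illustration of the row rotation, the following Python sketch applies ShiftRows to a State stored as state[r][c] (row r, column c). The function name and the example values are hypothetical.

```python
NB = 4  # number of columns in the State

def shift_rows(state):
    """Row r is rotated left by r positions: s'[r][c] = s[r][(c + r) mod Nb]."""
    return [[state[r][(c + r) % NB] for c in range(NB)] for r in range(4)]

# Example State whose entries simply number the positions row by row.
state = [[4 * r + c for c in range(NB)] for r in range(4)]
for row in shift_rows(state):
    print(row)
```

Row 0 is left unchanged, while the first byte of row 1 moves to the end of that row, matching the wrap-around behaviour described above.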
2.3.3 MIXCOLUMNS () TRANSFORMATION
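The MixColumns() transformation operates on the State column-by-column, treating each column as a four-
term polynomial as described in Sec. 3.1. The columns are considered as polynomials over GF(2^8) and multiplied
modulo x^4 + 1 with a fixed polynomial a(x), given by
$a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}$ ----------------------------------------------2.13
The above equation can be described in matrix form as below. Let $s'(x) = a(x) \otimes s(x)$:
$\begin{bmatrix} s'_{0,c} \\ s'_{1,c} \\ s'_{2,c} \\ s'_{3,c} \end{bmatrix} = \begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix} \begin{bmatrix} s_{0,c} \\ s_{1,c} \\ s_{2,c} \\ s_{3,c} \end{bmatrix}$ for 0 ≤ c < Nb
As a result of this multiplication, the four bytes in a column are replaced by the following:
$s'_{0,c} = (\{02\} \bullet s_{0,c}) \oplus (\{03\} \bullet s_{1,c}) \oplus s_{2,c} \oplus s_{3,c}$
$s'_{1,c} = s_{0,c} \oplus (\{02\} \bullet s_{1,c}) \oplus (\{03\} \bullet s_{2,c}) \oplus s_{3,c}$
$s'_{2,c} = s_{0,c} \oplus s_{1,c} \oplus (\{02\} \bullet s_{2,c}) \oplus (\{03\} \bullet s_{3,c})$
$s'_{3,c} = (\{03\} \bullet s_{0,c}) \oplus s_{1,c} \oplus s_{2,c} \oplus (\{02\} \bullet s_{3,c})$ -----------------------------2.14
Figure below illustrates the MixColumns() transformation.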
Figure 2.7: Mixcolumn() Transformation
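The column multiplication needs only multiplication by {02} and {03}, and {03} • s can be formed as ({02} • s) XOR s. The Python sketch below shows this for a single column; the function names are hypothetical and the input column is just an example value.

```python
def xtime(a):
    """Multiply by {02} in GF(2^8): shift left, reduce by x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a

def mix_column(col):
    """Apply the fixed MixColumns matrix to one column [s0, s1, s2, s3]."""
    s0, s1, s2, s3 = col
    return [
        xtime(s0) ^ (xtime(s1) ^ s1) ^ s2 ^ s3,   # {02}s0 + {03}s1 + s2 + s3
        s0 ^ xtime(s1) ^ (xtime(s2) ^ s2) ^ s3,   # s0 + {02}s1 + {03}s2 + s3
        s0 ^ s1 ^ xtime(s2) ^ (xtime(s3) ^ s3),   # s0 + s1 + {02}s2 + {03}s3
        (xtime(s0) ^ s0) ^ s1 ^ s2 ^ xtime(s3),   # {03}s0 + s1 + s2 + {02}s3
    ]

print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

In hardware the same structure maps to one conditional XOR per xtime and a small tree of XOR gates per output byte.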
2.3.4 ADD ROUND KEY TRANSFORMATION
In the AddRoundKey() transformation, a Round Key is added to the State by a simple bitwise XOR
operation. Each Round Key consists of Nb words from the key schedule (described in Sec. 2.3.5). Those Nb words are
each added into the columns of the State, such that
$[s'_{0,c}, s'_{1,c}, s'_{2,c}, s'_{3,c}] = [s_{0,c}, s_{1,c}, s_{2,c}, s_{3,c}] \oplus [w_{round \cdot Nb + c}]$ for 0 ≤ c < Nb -----------------------------2.15
where [w_i] are the key schedule words described in Sec. 2.3.5, and round is a value in the range 0 ≤ round ≤ Nr. In the
Cipher, the initial Round Key addition occurs when round = 0, prior to the first application of the round function. The
application of the AddRoundKey() transformation to the Nr rounds of the Cipher occurs when 1 ≤ round ≤ Nr. The
byte address within words of the key schedule was described in Sec. 2.3.5.
Figure 2.8: AddRoundkey() XORs each column of the state with a word from the key schedule
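A minimal Python sketch of the column-wise XOR is given below. The State is stored as state[r][c], the function name is hypothetical, and the key-schedule words are made-up values chosen only to exercise the indexing, not the output of a real key expansion.

```python
NB = 4  # number of columns in the State

def add_round_key(state, w, rnd):
    """XOR column c of the State with key-schedule word w[rnd*Nb + c]."""
    return [[state[r][c] ^ w[rnd * NB + c][r] for c in range(NB)] for r in range(4)]

# Illustrative data only (assumed values, not a real key schedule).
state = [[4 * c + r for c in range(NB)] for r in range(4)]
w = [[(17 * i + j) & 0xFF for j in range(4)] for i in range(8)]

once = add_round_key(state, w, 1)
twice = add_round_key(once, w, 1)
print(twice == state)  # applying the same Round Key twice restores the State
```

Because XOR is its own inverse, the same function serves for both encryption and decryption.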
AES IMPLEMENTATION USING FULLY PIPELINED ARCHITECTURE:
The goal of the AES implementation using a fully pipelined architecture is to achieve the highest throughput and to
optimize the delay. The block diagram of the AES implementation using the fully pipelined architecture is shown in Fig.
Two types of pipelining are possible in an AES implementation: inner-round pipelining and outer-round
pipelining. Our design uses outer-round pipelining: pipeline registers are placed after each round, and because the
pipeline operates in non-feedback mode the design is called fully pipelined AES. Pipelining between rounds gives a
high-performance implementation on both the encryption and decryption sides. Although an iterative,
pipelining-based approach is one option, for simplicity and clarity we have used the fully expanded implementation for
all ten rounds. The data generated in each round is used as the input to the next round. Placement of the
pipeline registers is the key to achieving better performance; the registers hold the intermediate data passed between
rounds. Pipelining is therefore one of the simplest ways to achieve high performance: once the pipeline is full, a new
block can be accepted while earlier blocks are still in later rounds, which reduces the overall number of cycles needed to
process a stream of blocks.
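The benefit of outer-round pipelining can be estimated with simple cycle arithmetic. The Python sketch below is an idealised model, not a measurement of the design in this report: it assumes one clock cycle per round, ignores the initial Round Key addition and key expansion, and uses hypothetical function names.

```python
def cycles_iterative(num_blocks, rounds=10):
    """One shared round unit: each block occupies it for `rounds` cycles."""
    return num_blocks * rounds

def cycles_outer_pipelined(num_blocks, rounds=10):
    """One register stage per round: fill the pipeline, then one block per cycle."""
    if num_blocks == 0:
        return 0
    return rounds + (num_blocks - 1)

for n in (1, 10, 1000):
    print(n, cycles_iterative(n), cycles_outer_pipelined(n))
```

Under these assumptions the latency of a single block is unchanged, but for long streams the pipelined design approaches one block per clock cycle. This only holds in non-feedback modes, where each block can enter the pipeline without waiting for the previous result.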
2.3.5 KEY SCHEDULE
The Round Keys are derived from the Cipher Key by means of the key schedule. This consists of two
components: the Key Expansion and the Round Key Selection. The basic principle is the following:
• The total number of Round Key bits is equal to the block length multiplied by the number of rounds plus 1 (e.g., for a
block length of 128 bits and 10 rounds, 128 × 11 = 1408 Round Key bits are needed).
• The Cipher Key is expanded into an Expanded Key.
• Round Keys are taken from this Expanded Key in the following way: the first Round Key consists of the first Nb
words, the second one of the following Nb words, and so on.
2.3.6 KEY EXPANSION
The AES algorithm takes the Cipher Key, K, and performs a Key Expansion routine to generate a key schedule.
The Key Expansion generates a total of Nb(Nr + 1) words: the algorithm requires an initial set of Nb words, and each
of the Nr rounds requires Nb words of key data. The resulting key schedule consists of a linear array of 4-byte words,
denoted [w_i], with i in the range 0 ≤ i < Nb(Nr + 1).
SubWord() is a function that takes a four-byte input word and applies the S-box of Figure 2.5 to each of the four
bytes to produce an output word. The function RotWord() takes a word [a0,a1,a2,a3] as input, performs a cyclic
permutation, and returns the word [a1,a2,a3,a0]. The round constant word array, Rcon[i], contains the values given by
[x^(i-1),{00},{00},{00}], with x^(i-1) being powers of x (x is denoted as {02}) in the field GF(2^8), as discussed above
(note that i starts at 1, not 0). From Fig. 11, it can be seen that the first Nk words of the expanded key are filled with the
Cipher Key. Every following word, w[i], is equal to the XOR of the previous word, w[i-1], and the word Nk positions
earlier, w[i-Nk]. For words in positions that are a multiple of Nk, a transformation is applied to w[i-1] prior to the
XOR, followed by an XOR with a round constant, Rcon[i/Nk]. This transformation consists of a cyclic shift of the bytes in
a word (RotWord()), followed by the application of a table lookup to all four bytes of the word (SubWord()).
It is important to note that the Key Expansion routine for 256-bit Cipher Keys (Nk = 8) is slightly different than
for 128- and 192-bit Cipher Keys. If Nk = 8 and i-4 is a multiple of Nk, then SubWord() is applied to w[i-1] prior to
the XOR.
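The expansion rules above translate almost line for line into software. The following Python sketch is a reference model under stated assumptions: the function names are hypothetical, the S-box is generated by brute-force search instead of a stored table, and the example key is the 128-bit key used in the key expansion example of FIPS-197.

```python
def xtime(a):
    """Multiply by x ({02}) in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a

def gmul(a, b):
    p = 0
    while b:
        if b & 1:
            p ^= a
        a, b = xtime(a), b >> 1
    return p

def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF

def make_sbox():
    sbox = []
    for a in range(256):
        inv = next((b for b in range(1, 256) if gmul(a, b) == 1), 0)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox

SBOX = make_sbox()

def sub_word(word):
    return [SBOX[b] for b in word]

def rot_word(word):
    return word[1:] + word[:1]          # [a0,a1,a2,a3] -> [a1,a2,a3,a0]

def key_expansion(key):
    """Expand a 16-, 24- or 32-byte Cipher Key into Nb*(Nr+1) four-byte words."""
    nk = len(key) // 4
    nr = nk + 6
    w = [list(key[4 * i:4 * i + 4]) for i in range(nk)]   # first Nk words = Cipher Key
    rcon = 0x01                                           # x^0, i.e. Rcon[1]
    for i in range(nk, 4 * (nr + 1)):
        temp = list(w[i - 1])
        if i % nk == 0:
            temp = sub_word(rot_word(temp))
            temp[0] ^= rcon                               # XOR with Rcon[i/Nk]
            rcon = xtime(rcon)                            # next power of x
        elif nk > 6 and i % nk == 4:
            temp = sub_word(temp)                         # extra step for Nk = 8
        w.append([a ^ b for a, b in zip(w[i - nk], temp)])
    return w

key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
words = key_expansion(key)
print(len(words), bytes(words[4]).hex(), bytes(words[43]).hex())
```

Round key number i then corresponds to the slice words[4*i : 4*i + 4], so a 128-bit key yields 44 words, or eleven round keys.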
2.3.7 ROUND KEY SELECTION
Round key i is given by the Round Key buffer words W[Nb*i] to W[Nb*(i+1) - 1]. This is illustrated in Figure 2.9.
Figure 2.9: Key expansion and Round key selection for Nb=6 and Nk=4
2.4 INVERSE CIPHER
The Cipher transformations in Sec. 2.3 can be inverted and then implemented in reverse order to produce a
straightforward Inverse Cipher for the AES algorithm. The individual transformations used in the Inverse Cipher -
InvShiftRows(), InvSubBytes(), InvMixColumns(), and AddRoundKey() - process the State and are described in
the following subsections.
The AES inverse cipher core consists of a key expansion module, a key reversal buffer, an initial permutation module,
a round permutation module and a final permutation module. The key reversal buffer first stores the keys for all rounds and
then presents them in reverse order to the rounds. The round permutation module loops internally to perform 10
iterations (for 128-bit keys).
2.4.1 INVSHIFTROWS () TRANSFORMATION
InvShiftRows() is the inverse of the ShiftRows() transformation. The bytes in the last three rows of the State
are cyclically shifted over different numbers of bytes (offsets). The first row, r = 0, is not shifted. The bottom three
rows are cyclically shifted by Nb - shift(r, Nb) bytes, where the shift value shift(r,Nb) depends on the row number,
and is given in Sec. 2.3.2.
Specifically, the InvShiftRows() transformation proceeds as follows:
$s'_{r,\,(c + shift(r,Nb)) \bmod Nb} = s_{r,c}$ for 0 < r < 4 and 0 ≤ c < Nb --------------2.16
Figure 2.10 illustrates the InvShiftRows() transformation.

Figure 2.10 InvShiftRows () transformation
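The following Python sketch, with hypothetical function names and example data, shows that rotating row r to the right by r positions undoes the left rotation of ShiftRows. The State is stored as state[r][c].

```python
NB = 4  # number of columns in the State

def shift_rows(state):
    """Forward transformation: s'[r][c] = s[r][(c + r) mod Nb]."""
    return [[state[r][(c + r) % NB] for c in range(NB)] for r in range(4)]

def inv_shift_rows(state):
    """Inverse transformation: s'[r][c] = s[r][(c - r) mod Nb]."""
    return [[state[r][(c - r) % NB] for c in range(NB)] for r in range(4)]

state = [[4 * r + c for c in range(NB)] for r in range(4)]
print(inv_shift_rows(state)[1])                       # row 1 rotated right by one
print(inv_shift_rows(shift_rows(state)) == state)     # round trip restores the State
```

A right rotation by r is the same as a left rotation by Nb - r, which is the form used in the text.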
2.4.2 INVSUBBYTES () TRANSFORMATION
InvSubBytes() is the inverse of the byte substitution transformation, in which the inverse S-box is applied to
each byte of the State. This is obtained by applying the inverse of the affine transformation followed by taking the
multiplicative inverse in GF(2^8). The inverse S-box used in the InvSubBytes() transformation is presented in Fig below.
Figure 2.11: Inverse S-BOX
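The inverse S-box can be generated in the order described above: inverse affine transformation first, multiplicative inverse second. The Python sketch below is illustrative; the function names are hypothetical, and the rotation form of the inverse affine step (rotations by 1, 3 and 6 with constant {05}) is a commonly used equivalent of the inverse matrix, stated here as an assumption about presentation rather than taken from this report.

```python
def xtime(a):
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a

def gmul(a, b):
    p = 0
    while b:
        if b & 1:
            p ^= a
        a, b = xtime(a), b >> 1
    return p

def gf_inv(a):
    """Multiplicative inverse in GF(2^8) computed as a^254; {00} maps to itself."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gmul(result, a)
    return result

def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF

def sbox_entry(a):
    b = gf_inv(a)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63

def inv_sbox_entry(s):
    b = rotl8(s, 1) ^ rotl8(s, 3) ^ rotl8(s, 6) ^ 0x05   # inverse affine transformation
    return gf_inv(b)                                     # then multiplicative inverse

SBOX = [sbox_entry(a) for a in range(256)]
INV_SBOX = [inv_sbox_entry(s) for s in range(256)]
print(hex(INV_SBOX[0xED]))  # undoes the SubBytes example {53} -> {ed}
```

Checking INV_SBOX[SBOX[a]] == a for all 256 values of a is a quick way to validate both tables before they are placed in hardware.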
2.4.3 INVMIXCOLUMNS () TRANSFORMATION
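InvMixColumns() is the inverse of the MixColumns() transformation.
InvMixColumns() operates on the State column-by-column, treating each column as a four-term polynomial as
described in Sec. 4.3. The columns are considered as polynomials over GF(2^8) and multiplied modulo x^4 + 1 with a
fixed polynomial a^{-1}(x), given by
$a^{-1}(x) = \{0b\}x^3 + \{0d\}x^2 + \{09\}x + \{0e\}$ -------------------------2.17
This can be written as a matrix multiplication. Let $s'(x) = a^{-1}(x) \otimes s(x)$:
$\begin{bmatrix} s'_{0,c} \\ s'_{1,c} \\ s'_{2,c} \\ s'_{3,c} \end{bmatrix} = \begin{bmatrix} 0e & 0b & 0d & 09 \\ 09 & 0e & 0b & 0d \\ 0d & 09 & 0e & 0b \\ 0b & 0d & 09 & 0e \end{bmatrix} \begin{bmatrix} s_{0,c} \\ s_{1,c} \\ s_{2,c} \\ s_{3,c} \end{bmatrix}$ for 0 ≤ c < Nb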
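The inverse column multiplication uses the coefficients {0e}, {0b}, {0d} and {09}, so a general GF(2^8) multiplier is convenient in a software model. The Python sketch below uses hypothetical function names and an example column.

```python
def xtime(a):
    """Multiply by {02} in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a

def gmul(a, b):
    """Multiply two GF(2^8) elements by shift-and-add."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a, b = xtime(a), b >> 1
    return p

def inv_mix_column(col):
    """Multiply one column by the inverse matrix built from a^-1(x)."""
    s0, s1, s2, s3 = col
    return [
        gmul(0x0E, s0) ^ gmul(0x0B, s1) ^ gmul(0x0D, s2) ^ gmul(0x09, s3),
        gmul(0x09, s0) ^ gmul(0x0E, s1) ^ gmul(0x0B, s2) ^ gmul(0x0D, s3),
        gmul(0x0D, s0) ^ gmul(0x09, s1) ^ gmul(0x0E, s2) ^ gmul(0x0B, s3),
        gmul(0x0B, s0) ^ gmul(0x0D, s1) ^ gmul(0x09, s2) ^ gmul(0x0E, s3),
    ]

print([hex(b) for b in inv_mix_column([0x8E, 0x4D, 0xA1, 0xBC])])
```

The larger coefficients are the reason InvMixColumns is usually more expensive in hardware than MixColumns, which needs only {02} and {03}.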

2.4.4 INVERSE OF THE ADDROUNDKEY () TRANSFORMATION
AddRoundKey(), which was described in Sec. 2.3.4, is its own inverse, since it only involves an application of
the XOR operation.
Designing an AES S-box with minimal power consumption is a typical challenge in current research. A low-power AES
design benefits many applications such as smart cards, security sensor nodes and radio frequency identification
(RFID) tags. Using the 3-stage PPRM implementation [4,5], less area is achieved with high throughput, but it consumes more
power than the unshared 3-stage PPRM S-box. Similarly, the conventional method using three-input XOR gates is
implemented for ASIC applications and is not suitable for other implementations. A pipelined composite-field S-box
proposed using the Algebraic Normal Form reduces dynamic hazards and achieves high throughput,
but has complicated crossing and branching. Another approach, converting the entire GF(2^8) inversion circuit into logic
expressions, is a tedious effort. This paper focuses on pass-transistor design for low power and small area. Low power
is achieved by replacing XOR and AND gates with XNOR and NAND gates
respectively; however, this increases the complexity of the system, so the throughput is reduced considerably. The
S-box has also been implemented in simulation as a netlist of AND, OR, NOT and XOR logic gates. Unfortunately, this has a large
number of components and a larger die size.
The proposed AES encryption round architecture is described abstractly by the block diagram shown in Fig. In the following
subsections the implementation of each module is presented, followed by a description of the sub-pipelining scheme used. The
SubBytes transformation processes the 128-bit state value at the byte level. Each byte is substituted by its multiplicative inverse
followed by an affine transformation; the main complexity lies in computing the multiplicative inverse.
The S-box has two main transformations: the multiplicative inversion and the affine transformation.
Fig shows SubBytes and InvSubBytes. In SubBytes, the multiplicative inversion is followed by the affine transformation. In
InvSubBytes, the inverse affine transformation is followed by the multiplicative inversion.
Affine Transformation (AT):
An affine transformation is a matrix multiplication followed by the addition of a vector. For the S-box, the matrix product
can be formed as the sum of multiple rotations of the byte. Here the addition operation is the XOR operation.
Inv Affine Transformation (ATI):
The reverse process is the inverse affine transformation. The detailed function of both the affine and inverse affine
transformations is as follows:
Multiplicative Inversion:
The multiplicative inversion is not computed directly in GF(2^8). Instead, the computation is carried out by
decomposing the elements of GF(2^8) into the lower-order fields GF(2^2), GF(2^1) and GF((2^2)^2).
An irreducible polynomial is used for the arithmetic operations such as squaring,
multiplication, inversion and addition. Multiplicative inversion is the costliest operation. These operations are reduced to
simple XOR and AND gates.
[Figure panels: GF(2^4) multiplication; GF(2^2) multiplication; multiplication with constant φ]
Figure 2.11: Tabular verification of AES algorithm.
SIMULATIONS AND RESULTS
Area:
Figure 1: AES Existing
Figure 2: AES Previous
Figure 3: Decryption
Figure 4: Edecryption
Figure 5: Pdecryption
Figure 6 : AES
Schematic


Figure 12: AES
Wave Forms

Figure 18: AES

CHAPTER 4
Tools And HDL Used
We have used Xilinx ISE 13.2i for simulation and synthesis purposes. We implemented the prescribed design in
Verilog HDL.
Verilog
Hardware description languages, such as Verilog, differ from software programming languages in several fundamental
ways. HDLs add the concept of concurrency, which is parallel execution of multiple statements in explicitly specified threads,
propagation of time, and signal dependency (sensitivity). There are two assignment operators, a blocking assignment (=), and a
non-blocking (<=) assignment. The non-blocking assignment allows designers to describe a state-machine update without
needing to declare and use temporary storage variables. Since these concepts are part of Verilog's language semantics,
designers could quickly write descriptions of large circuits, in a relatively compact and concise form. At the time of Verilog's
introduction (1984), Verilog represented a tremendous productivity improvement for circuit designers who were already using
graphical schematic-capture, and specially-written software programs to document and simulate electronic circuits.
The designers of Verilog wanted a language with syntax similar to the C programming language, which was already widely
used in engineering software development. Verilog is case-sensitive, has a basic preprocessor (though less sophisticated than
ANSI C/C++), and equivalent control flow keywords (if/else, for, while, case, etc.), and compatible language operators
precedence. Syntactic differences include variable declaration (Verilog requires bit-widths on net/reg types), demarcation of
procedural-blocks (begin/end instead of curly braces {}), though there are many other minor differences.
A Verilog design consists of a hierarchy of modules. Modules encapsulate design hierarchy, and communicate with other
modules through a set of declared input, output, and bidirectional ports. Internally, a module can contain any combination of
the following: net/variable declarations (wire, reg, integer, etc.), concurrent and sequential statement blocks, instances of other
modules (sub-hierarchy.) Sequential statements are placed inside a begin/end block and executed in sequential order within the
block. But the blocks themselves are executed concurrently, qualifying Verilog as a Dataflow language.
Verilog's concept of 'wire' consists of both signal values (4-state: "1, 0, floating, undefined"), and strengths (strong, weak,
etc.) This system allows abstract modeling of shared signal-lines, where multiple sources drive a common net. When a wire has
multiple drivers, the wire's (readable) value is resolved by a function of the source drivers and their strengths.
A subset of statements in the Verilog language is synthesizable. Verilog modules that conform to a synthesizable coding
style, known as RTL (register transfer level), can be physically realized by synthesis software. Synthesis software algorithmically
transforms the (abstract) Verilog source into a netlist, a logically equivalent description consisting only of elementary logic
primitives (AND, OR, NOT, flip-flops, etc.). Further manipulations to the netlist ultimately lead to a circuit fabrication blueprint
(such as a photo mask-set for an ASIC, or a bitstream file for an FPGA).
History
Beginning
Verilog was invented by Phil Moorby and Prabhu Goel during the winter of 1983/1984 at Automated Integrated Design
Systems (renamed Gateway Design Automation in 1985) as a hardware modeling language. Gateway Design Automation
was purchased by Cadence Design Systems in 1990. Cadence now has full proprietary rights to Gateway's Verilog and the
Verilog-XL logic simulator.
Verilog-95
With the increasing success of VHDL at the time, Cadence decided to make the language available for open standardization.
Cadence transferred Verilog into the public domain under the Open Verilog International (OVI) (now known as Accellera)
organization. Verilog was later submitted to IEEE and became IEEE Standard 1364-1995, commonly referred to as Verilog-95.
In the same time frame Cadence initiated the creation of Verilog-A to put standards support behind its analog simulator
Spectre. Verilog-A was never intended to be a standalone language and is a subset of Verilog-AMS which encompassed Verilog-
95.
Verilog 2001
Extensions to Verilog-95 were submitted back to IEEE to cover the deficiencies that users had found in the original Verilog
standard. These extensions became IEEE Standard 1364-2001 known as Verilog-2001.
Verilog-2001 is a significant upgrade from Verilog-95. First, it adds explicit support for (2's complement) signed nets and
variables. Previously, code authors had to perform signed-operations using awkward bit-level manipulations (for example, the
carry-out bit of a simple 8-bit addition required an explicit description of the boolean-algebra to determine its correct value.)
The same function under Verilog-2001 can be more succinctly described by one of the built-in operators: +, -, /, *, >>>. A
generate/endgenerate construct (similar to VHDL's generate/endgenerate) allows Verilog-2001 to control instance and
statement instantiation through normal decision-operators (case/if/else). Using generate/endgenerate, Verilog-2001 can
instantiate an array of instances, with control over the connectivity of the individual instances. File I/O has been improved by
several new system-tasks. And finally, a few syntax additions were introduced to improve code-readability (eg. always @*,
named-parameter override, C-style function/task/module header declaration.)
Verilog-2001 is the dominant flavor of Verilog supported by the majority of commercial EDA software packages.
Verilog 2005
Not to be confused with SystemVerilog, Verilog 2005 (IEEE Standard 1364-2005) consists of minor corrections, spec
clarifications, and a few new language features (such as the uwire keyword.)
A separate part of the Verilog standard , Verilog-AMS, attempts to integrate analog and mixed signal modelling with
traditional Verilog.
Example
A hello world program looks like this:
module main;
  initial
    begin
      $display("Hello world!");
      $finish;
    end
endmodule
A simple example of two flip-flops follows:
module toplevel(clock,reset);
  input clock;
  input reset;
  reg flop1;
  reg flop2;
  always @ (posedge reset or posedge clock)
    if (reset)
      begin
        flop1 <= 0;
        flop2 <= 1;
      end
    else
      begin
        flop1 <= flop2;
        flop2 <= flop1;
      end
endmodule
The "<=" operator in verilog is another aspect of its being a hardware description language as opposed to a normal
procedural language. This is known as a "non-blocking" assignment. When the simulation runs, all of the signals assigned with a
"<=" operator have their assignment scheduled to occur after all statements occurring during the same point in time have
executed. After all the statements have been executed for one event, the scheduled assignments are performed. This makes it
easier to code behaviours that happen simultaneously.
In the above example, flop1 is assigned flop2, and flop2 is assigned flop1. These statements are executed during the same
time event. Since the assignments are coded with the "<=" non-blocking operator, the assignments are scheduled to occur at the
end of the event. Until then, all reads to flop1 and flop2 will use the values they had at the beginning of the time event.
This means that the order of the assignments is irrelevant and will produce the same result. flop1 and flop2 will swap
values every clock.
The other choice for assignment is an "=" operator and this is known as a blocking assignment. When the "=" operator is
used, the assignments take effect in the order in which they are written, much like in a procedural language.
In the above example, if the statements had used the "=" blocking operator instead of "<=", the order of the statements
would affect the behaviour: the reset would set flop2 to a 1, and flop1 to a 0. A clock event would then set flop1 to flop2, which
is a 1 after the reset. The next statement would be executed subsequently and would set flop2 to flop1, which is now a 1. Rather
than swap values every clock, flop1 and flop2 would both become 1 and remain that way.
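The difference between the two assignment styles can also be mimicked in ordinary software. The Python sketch below is only an analogy for the simulator behaviour described above, not Verilog semantics in full; the function names are hypothetical and the starting values are those produced by the reset.

```python
def clock_nonblocking(flop1, flop2):
    """Model '<=': sample every right-hand side first, then update all targets together."""
    next_flop1 = flop2
    next_flop2 = flop1
    return next_flop1, next_flop2

def clock_blocking(flop1, flop2):
    """Model '=': each assignment takes effect before the next statement runs."""
    flop1 = flop2
    flop2 = flop1          # reads the value that was just written to flop1
    return flop1, flop2

state_nb = state_b = (0, 1)   # values after reset: flop1 = 0, flop2 = 1
for _ in range(3):
    state_nb = clock_nonblocking(*state_nb)
    state_b = clock_blocking(*state_b)
    print(state_nb, state_b)
```

In this model the non-blocking version alternates between (1, 0) and (0, 1), while the blocking version settles at (1, 1) after the first clock.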
An example counter circuit follows:
module Div20x (rst, clk, cet, cep, count, tc);
  // TITLE 'Divide-by-20 Counter with enables'
  // enable CEP is a clock enable only
  // enable CET is a clock enable and
  // enables the TC output
  // a counter using the Verilog language
  parameter size = 5;
  parameter length = 20;
  input rst;                 // These inputs/outputs represent
  input clk;                 // connections to the module.
  input cet;
  input cep;
  output [size-1:0] count;
  output tc;
  reg [size-1:0] count;      // Signals assigned within an always
                             // (or initial) block must be of type reg
  wire tc;                   // Other signals are of type wire
  // The always statement below is a parallel execution statement that
  // executes any time the signals rst or clk transition from low to high
  always @ (posedge clk or posedge rst)
    if (rst)                 // This causes reset of the cntr
      count <= 5'b0;
    else
      if (cet && cep)        // Enables both true
        begin
          if (count == length-1)
            count <= 5'b0;
          else
            count <= count + 5'b1; // 5'b1 is 5 bits wide and equal to the value 1.
        end
  // the value of tc is continuously assigned the value of the expression
  assign tc = (cet && (count == length-1));
endmodule
An example of delays:
...
reg a, b, c, d;
wire e;
...
always @(b or e)
  begin
    a = b & e;
    b = a | b;
    #5 c = b;
    d = #6 c ^ e;
  end
The always clause above illustrates the other type of method of use, i.e. the always clause executes any time any of the
entities in the list change, i.e. the b or e change. When one of these changes, immediately a and b are assigned new values. After
a delay of 5 time units, c is assigned the value of b and the value of c ^ e is tucked away in an invisible store. Then after 6 more
time units, d is assigned the value that was tucked away.
Signals that are driven from within a process (an initial or always block) must be of type reg. Signals that are driven from
outside a process must be of type wire. The keyword reg does not necessarily imply a hardware register.
Definition of Constants
The definition of constants in Verilog supports the addition of a width parameter. The basic syntax is:
<Width in bits>'<base letter><number>
Examples:
• 12'h123 - Hexadecimal 123 (using 12 bits)
• 20'd44 - Decimal 44 (using 20 bits - 0 extension is automatic)
• 4'b1010 - Binary 1010 (using 4 bits)
• 6'o77 - Octal 77 (using 6 bits)
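To make the width/base/number structure concrete, the following Python sketch splits such a constant into its parts. It is a simplified illustration with a hypothetical function name: it handles only the four bases listed above and does not support x or z digits or signed constants.

```python
BASES = {"b": 2, "o": 8, "d": 10, "h": 16}

def parse_sized_constant(text):
    """Split <width>'<base letter><number> into (width, value)."""
    width_part, rest = text.split("'")
    base_letter = rest[0].lower()
    digits = rest[1:].replace("_", "")          # underscores are digit separators
    width = int(width_part)
    value = int(digits, BASES[base_letter])
    return width, value & ((1 << width) - 1)    # keep only the declared number of bits

for literal in ("12'h123", "20'd44", "4'b1010", "6'o77"):
    print(literal, parse_sized_constant(literal))
```

The mask in the last line of the function reflects that a value wider than the declared width loses its upper bits, while a narrower value is zero-extended.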
Synthesizable constructs
As mentioned previously, there are several basic templates that can be used to represent hardware.
// Mux examples - Three ways to do the same thing.
// The first example uses continuous assignment
wire out;
assign out = sel ? a : b;
// the second example uses a procedure
// to accomplish the same thing.
reg out;
always @(a or b or sel)
  begin
    case(sel)
      1'b0: out = b;
      1'b1: out = a;
    endcase
  end
// Finally - you can use if/else in a
// procedural structure.
reg out;
always @(a or b or sel)
  if (sel)
    out = a;
  else
    out = b;
The next interesting structure is a transparent latch; it will pass the input to the output when the gate signal is set for
"pass-through", and it captures the input and stores it upon transition of the gate signal to "hold". The output will remain stable
regardless of the input signal while the gate is set to "hold". In the example below the "pass-through" level of the gate would be
when the value of the if clause is true, i.e. gate = 1. This is read "if gate is true, din is fed to latch_out continuously." Once the
if clause is false, the last value at latch_out will remain and is independent of the value of din.
// Transparent latch example
reg out;
always @(gate or din)
  if(gate)
    out = din; // Pass through state
// Note that the else isn't required here. The variable
// out will follow the value of din while gate is high.
// When gate goes low, out will remain constant.
The flip-flop is the next significant template; in Verilog, the D-flop is the simplest, and it can be modeled as:
reg q;
always @(posedge clk)
  q <= d;
The significant thing to notice in the example is the use of the non-blocking assignment. A basic rule of thumb is to use <=
when there is a posedge or negedge statement within the always clause.
A variant of the D-flop is one with an asynchronous reset; there is a convention that the reset state will be the first if clause
within the statement.
reg q;
always @(posedge clk or posedge reset)
  if(reset)
    q <= 0;
  else
    q <= d;
The next variant is including both an asynchronous reset and asynchronous set condition; again the convention comes into
play, i.e. the reset term is followed by the set term.
reg q;
always @(posedge clk or posedge reset or posedge set)
  if(reset)
    q <= 0;
  else if(set)
    q <= 1;
  else
    q <= d;
The final basic variant is one that implements a D-flop with a mux feeding its input. The mux has a d-input and feedback
from the flop itself. This allows a gated load function.
// Basic structure with an EXPLICIT feedback path
if(gate)
  q <= d;
else
  q <= q; // explicit feedback path
// The more common structure ASSUMES the feedback is present
// This is a safe assumption since this is how the
// hardware compiler will interpret it. This structure
// looks much like a Latch. The differences are the
// @(posedge clk) and the non-blocking <=
//
if(gate)
  q <= d; // the "else" mux is "implied"
Looking at the original counter example, you can see a combination of the basic asynchronous reset flop and the gated input
flop. The register variable count is set to zero on the rising edge of rst. When rst is 0, the variable count will load new data
when cet && cep is true.
Initial and Always
There are two separate ways of declaring a Verilog process. These are the always and the initial keywords. The always
keyword indicates a free-running process that triggers on the accompanying event-control (@) clause. The initial keyword
indicates a process executes exactly once. Both constructs begin execution at simulator time 0, and both execute until the end of
the block. Once an always block has reached its end, it is rescheduled (again). It is a common misconception to believe that an
initial block will execute before an always block. In fact, it is better to think of the initial-block as a special-case of the always-
block, one which terminates after it completes for the first time.
//Examples:
initial
  begin
    a = 1; // Assign a value to reg a at time 0
    #1; // Wait 1 time unit
    b = a; // Assign the value of reg a to reg b
  end
always @(a or b) // Anytime a or b CHANGE, run the process
  begin
    if (a)
      c = b;
    else
      d = ~b;
  end // Done with this block, now return to the top (i.e. the @ event-control)
always @(posedge a) // Run whenever reg a has a low to high change
  a <= b;
These are the classic uses for these two keywords, but there are two significant additional uses. The most common of these
is an always keyword without the @() sensitivity list. It is possible to use always as shown below:
always
  begin // Always begins executing at time 0 and NEVER stops
    clk = 0; // Set clk to 0
    #1; // Wait for 1 time unit
    clk = 1; // Set clk to 1
    #1; // Wait for 1 time unit
  end // Keeps executing - so continue back at the top of the begin
The always keyword acts similarly to the "C" construct while(1) {..} in the sense that it will execute forever.
The other interesting exception is the use of the initial keyword with the addition of the forever keyword.
The example below is functionally identical to the always example above.
initial forever // Start at time 0 and repeat the begin/end forever
  begin
    clk = 0; // Set clk to 0
    #1; // Wait for 1 time unit
    clk = 1; // Set clk to 1
    #1; // Wait for 1 time unit
  end
Fork/Join
The fork/join pair are used by Verilog to create parallel processes. All statements (or blocks) between a fork/join pair begin
execution simultaneously upon execution flow hitting the fork. Execution continues after the join upon completion of the
longest running statement or block between the fork and join.
initial
  fork
    $write("A"); // Print Char A
    $write("B"); // Print Char B
    begin
      $write("C"); // Print Char C
    end
  join
The way the above is written, it is possible to have either the sequences "ABC" or "BAC" print out. The order of simulation
between the first $write and the second $write depends on the simulator implementation. This illustrates one of the biggest
issues with Verilog. You can have race conditions where the language execution order doesn't guarantee the results.
Race Conditions
The order of execution isn't always guaranteed within Verilog. This can best be illustrated by a classic example. Consider the
code snippet below:
initial
  a = 0;
initial
  b = a;
initial
  begin
    #1;
    $display("Value a=%b Value of b=%b",a,b);
  end
What will be printed out for the values of a and b? Well - it could be 0 and 0, or perhaps 0 and X! This all depends on the
order of execution of the initial blocks. If the simulator's scheduler works from the top of the file to the bottom, then you would
get 0 and 0. If it begins from the bottom of the module and works up, then b will receive the initial value of a at the beginning of
the simulation before it has been initialized to 0 (the value of any variable not set explicitly is X). This is the way you can
experience a race condition in a simulation. So be careful! Note that the 3rd initial block will execute as you expect because of
the #1 there. That is a different point on the time wheel beyond time 0; consequently both of the earlier initial blocks have
completed execution.
Operators
Operator type    Operator symbols    Operation performed
Bitwise          ~                   1's complement
                 &                   Bitwise AND
                 |                   Bitwise OR
                 ^                   Bitwise XOR
                 ~^ or ^~            Bitwise XNOR
Logical          !                   NOT
                 &&                  AND
                 ||                  OR
Reduction        &                   Reduction AND
                 ~&                  Reduction NAND
                 |                   Reduction OR
                 ~|                  Reduction NOR
                 ^                   Reduction XOR
                 ~^ or ^~            Reduction XNOR
Arithmetic       +                   Addition
                 -                   Subtraction
                 -                   2's complement
                 *                   Multiplication
                 /                   Division
                 **                  Exponent (*Verilog-2001)
Relational       >                   Greater than
                 <                   Less than
                 >=                  Greater than or equal to
                 <=                  Less than or equal to
                 ==                  Logical equality (bit-value 1'bX is removed from comparison)
                 !=                  Logical inequality (bit-value 1'bX is removed from comparison)
                 ===                 4-state logical equality (bit-value 1'bX is taken as literal)
                 !==                 4-state logical inequality (bit-value 1'bX is taken as literal)
Shift            >>                  Logical right shift
                 <<                  Logical left shift
                 >>>                 Arithmetic right shift (*Verilog-2001)
                 <<<                 Arithmetic left shift (*Verilog-2001)
Concatenation    {,}                 Concatenation
Replication      {{ }}               Replication
Conditional      ?:                  Conditional
System tasks
System tasks are available to handle simple I/O, and various design measurement functions. All system tasks are prefixed
with $ to distinguish them from user tasks and functions. This section presents a short list of the most often used tasks. It is by
no means a comprehensive list.
• $display - Print to screen a line followed by an automatic newline.
• $write - Print to screen a line without the newline.
• $swrite - Print to variable a line without the newline.
• $sscanf - Read from variable a format-specified string. (*Verilog-2001)
• $fopen - Open a handle to a file (read or write).
• $fdisplay - Write to file a line followed by an automatic newline.
• $fwrite - Write to file a line without the newline.
• $fscanf - Read from file a format-specified string. (*Verilog-2001)
• $fclose - Close and release an open file handle.
• $readmemh - Read hex file content into a memory array.
• $readmemb - Read binary file content into a memory array.
• $monitor - Print out all the listed variables whenever any of them changes value.
• $time - Value of current simulation time.
• $dumpfile - Declare the VCD (Value Change Dump) format output file name.
• $dumpvars - Turn on and dump the variables.
• $dumpports - Turn on and dump the variables in Extended-VCD format.
• $random - Return a random value.
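The sketch below exercises several of the tasks listed above in one small testbench. The module name systask_demo and the file names demo.vcd and demo.log are arbitrary and only serve the illustration.

```verilog
module systask_demo;
    reg [7:0] count;
    integer   fd;

    initial begin
        // Record all signals of this module in a VCD waveform file.
        $dumpfile("demo.vcd");
        $dumpvars(0, systask_demo);

        // $monitor prints a line whenever count changes value.
        $monitor("t=%0t count=%0d", $time, count);

        count = 0;
        repeat (3) #10 count = count + 1;

        // Write one line to a log file, then release the handle.
        fd = $fopen("demo.log", "w");
        $fdisplay(fd, "final count = %0d at t=%0t", count, $time);
        $fclose(fd);

        // $write leaves the cursor on the same line; $display ends it.
        $write("random value: ");
        $display("%0d", $random);

        #1 $finish;
    end
endmodule
```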
Programming Language Interface (PLI)
The Programming Language Interface (PLI) provides a mechanism for transferring control from Verilog to a function written in
the C language. It is officially deprecated by IEEE Std 1364-2005 in favor of the newer Verilog Procedural Interface (VPI), which
completely replaces the PLI.
The PLI enables Verilog to cooperate with other programs written in the C language, such as a test harness, an instruction set
simulator of a microcontroller, a debugger, and so on. For example, it provides C functions named tf_putlongp() and tf_getlongp(),
which are used to write and read the argument of the current Verilog task or function, respectively.
CHAPTER 5
Field Programmable Gate Array

FPGA Design Flow:
The ISE™ design flow comprises the following steps: design entry, design synthesis, design
implementation, and Xilinx® device programming. Design verification, which includes both functional verification and
timing verification, takes place at different points during the design flow.
FPGA Design Flow
Design Entry:
Create an ISE project as follows:
1. Create a project.
2. Create files and add them to your project, including a user constraints (UCF) file.
3. Add any existing files to your project.
4. Assign constraints such as timing constraints, pin assignments, and area constraints.
Functional Verification:
You can verify the functionality of your design at different points in the design flow as follows:
• Before synthesis, run behavioral simulation (also known as RTL simulation).
• After Translate, run functional simulation (also known as gate-level simulation), using the SIMPRIM library.
• After device programming, run in-circuit verification.
Design Synthesis:
Synthesize your design.
Design Implementation:
Implement your design as follows:
1. Implement your design, which includes the following steps:
   • Translate
   • Map
   • Place and Route
2. Review reports generated by the Implement Design process, such as the Map Report or Place & Route Report,
and change any of the following to improve your design:
   • Process properties
   • Constraints
   • Source files
3. Synthesize and implement your design again until design requirements are met.
Timing Verification:
You can verify the timing of your design at different points in the design flow as follows:
• Run static timing analysis at the following points in the design flow:
   • After Map
   • After Place & Route
• Run timing simulation at the following points in the design flow:
   • After Map (for a partial timing analysis of CLB and IOB delays)
   • After Place and Route (for full timing analysis of block and net delays)
Xilinx Device Programming:
Program your Xilinx device as follows:
1. Create a programming file (BIT) to program your FPGA.
2. Generate a PROM, ACE, or JTAG file for debugging or to download to your device.
3. Use iMPACT to program the device with a programming cable.
Implementation Overview for FPGAs:
After synthesis, you run design implementation, which comprises the following steps:
1. Translate, which merges the incoming netlists and constraints into a Xilinx® design file.
2. Map, which fits the design into the available resources on the target device.
3. Place and Route, which places and routes the design to the timing constraints.
4. Programming file generation, which creates a bitstream file that can be downloaded to the device.
In the Sources tab, select Synthesis/Implementation from the Design View drop-down list, and select the top
module. In the Processes tab, double-click Implement Design to run the implementation process in one step, or
double-click Translate, Map, and Place & Route to run each of the implementation steps separately. To generate the
programming file, double-click Generate Programming File. Alternatively, you can select Process -> Implement -> Top
Module to run Implement Design on the top module. For details, see "Implementing the Top Module".
Default property values are used for the implementation process, unless you modify them. Properties can be set for
the Implement Design process or for each of the separate implementation processes.
ASIC (Application-specific integrated circuit):
An ASIC (pronounced “a-sick”) is an application-specific integrated circuit; at least, that is what the
acronym stands for. Before we answer the question of what that means, we first look at
the evolution of the silicon chip or integrated circuit (IC).
Standard cell design
In the mid 1980s a designer would choose an ASIC manufacturer and implement their design using the
design tools available from the manufacturer. While third party design tools were available, there was not an effective
link from the third party design tools to the layout and actual semiconductor process performance characteristics of the
various ASIC manufacturers. Most designers ended up using factory specific tools to complete the implementation of
their designs. A solution to this problem that also yielded a much higher density device was the implementation of
Standard Cells.
Gate array design
Gate array design is a manufacturing method in which the diffused layers, i.e. transistors and other active
devices, are predefined and wafers containing such devices are held in stock prior to metallization, in other words,
unconnected. The physical design process then defines the interconnections of the final device. For most ASIC
manufacturers, this consists of from two to as many as five metal layers, each metal layer running perpendicular to the one
below it. Non-recurring engineering costs are much lower, as photolithographic masks are required only for the metal
layers, and production cycles are much shorter, as metallization is a comparatively quick process.
Full-custom design
By contrast, full-custom ASIC design defines all the photo lithographic layers of the device. Full-custom
design is used for both ASIC design and for standard product design.
The benefits of full-custom design usually include reduced area (and therefore recurring component cost),
performance improvements, and also the ability to integrate analog components and other pre-designed (and thus fully
verified) components such as microprocessor cores that form a system-on-chip.
The disadvantages of full-custom design can include increased manufacturing and design time, increased
non-recurring engineering costs, more complexity in the computer-aided design (CAD) system and a much higher skill
requirement on the part of the design team.
Structured/platform design
Structured ASIC design (also referred to as platform ASIC design) has different meanings in different
contexts. This is a relatively new term in the industry, which is why there is some variation in its definition. However,
the basic premise of a structured/platform ASIC is that both manufacturing cycle time and design cycle time are
reduced compared to cell-based ASIC by virtue of there being pre-defined metal layers (thus reducing manufacturing
time) and pre-characterization of what is on the silicon (thus reducing design cycle time).
Xilinx ISE Overview:
The Integrated Software Environment (ISE™) is the Xilinx® design software suite that allows us to take our
design from design entry through Xilinx device programming. The ISE Project Navigator manages and processes our
design through the following steps in the ISE design flow.
Design Entry:
Design entry is the first step in the ISE design flow. During design entry, we create our source files based on our
design objectives. We can create our top-level design file using a Hardware Description Language (HDL), such as
VHDL or Verilog, or using a schematic.
Synthesis:
After design entry and optional simulation, we run synthesis. During this step, VHDL, Verilog, or mixed-language
designs become netlist files that are accepted as input to the implementation step.
Implementation:
After synthesis, we run design implementation, which converts the logical design into a physical file format that
can be downloaded to the selected target device. From Project Navigator, we can run the implementation process in one
step, or we can run each of the implementation processes separately. Implementation processes vary depending on
whether we are targeting a Field Programmable Gate Array (FPGA) or a Complex Programmable Logic Device
(CPLD).
Verification:
We can verify the functionality of our design at several points in the design flow. We can use simulator software to
verify the functionality and timing of our design or a portion of our design. The simulator interprets VHDL or Verilog
code into circuit functionality and displays logical results of the described HDL to determine correct circuit operation.
Simulation allows us to create and verify complex functions in a relatively small amount of time. We can also run
in-circuit verification after programming our device.
Device Configuration:
After generating a programming file, we configure our device. During configuration, we generate configuration
files and download the programming files from a host computer to a Xilinx device.
CPLD and FPGA Overview:
Introduction:
Programmability of a device can be achieved using PLAs, PALs, PROMs, EPROMs, etc. The rigid
two-level-logic-plus-register architecture, in conjunction with the limited numbers of inputs, outputs, product terms, and
flip-flops, always restricted such simple programmable logic devices (SPLDs) to small applications. More scalable and
flexible architectures thus had to be sought, and the spectacular progress of VLSI technology has made their
implementation economically feasible from the late 1980s onwards. Two broad classes of hardware organization prevail today.
Complex Programmable Logic Devices (CPLD):
CPLDs expand the general idea behind SPLDs by providing many of them on a single chip. Up to hundreds of
identical sub-circuits, each of which conforms to a classic SPLD, are combined with a large programmable interconnect
matrix or network, see fig.2.1. A difficulty with this type of organization is that a partitioning into a bunch of
cooperating SPLDs has to be imposed artificially on any given computational task, which benefits neither hardware nor
design efficiency. Depending on the manufacturer, products are known as complex programmable logic device
(CPLD), programmable large-scale integration (PLSI), erasable programmable logic device (EPLD), and the like in the
commercial world.
Field-programmable gate arrays (FPGA):
FPGAs have their overall organization patterned after that of gate arrays. Many configurable logic cells are
arranged in a two-dimensional array with bundles of parallel wires in between. A switchbox is present wherever two
wiring channels intersect, see fig.2.2. Depending on the product, each logic cell can be configured so as to carry out
some not-too-complex combinational operation, to store a bit or two, or both.
As opposed to traditional gate arrays, it is the state of programmable links rather than fabrication masks that
decides on logic functions and signal routing. Parts with this organization are being promoted under names such as
field-programmable gate array (FPGA), logic cell array (LCA), and programmable multilevel device (PMD). The
number of configurable logic cells varies greatly between products, with typical figures ranging between a few dozen
and hundreds of thousands.
A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or a
designer after manufacturing—hence "field-programmable". The FPGA configuration is generally specified using
a hardware description language (HDL), similar to that used for an application-specific integrated circuit (ASIC)
(circuit diagrams were previously used to specify the configuration, as they were for ASICs, but this is increasingly
rare). Contemporary FPGAs have large resources of logic gates and RAM blocks to implement complex digital
computations. As FPGA designs employ very fast I/Os and bidirectional data buses, it becomes a challenge to verify
correct timing of valid data within setup time and hold time. Floorplanning enables resource allocation within the FPGA
to meet these timing constraints. FPGAs can be used to implement any logical function that an ASIC could perform. The
ability to update the functionality after shipping, partial re-configuration of a portion of the design and the low non-
recurring engineering costs relative to an ASIC design (notwithstanding the generally higher unit cost), offer
advantages for many applications.
FPGAs contain programmable logic components called "logic blocks", and a hierarchy of reconfigurable interconnects
that allow the blocks to be "wired together"—somewhat like many (changeable) logic gates that can be inter-wired in
(many) different configurations. Logic blocks can be configured to perform complex combinational functions, or
merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks also include memory elements, which
may be simple flip-flops or more complete blocks of memory.
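The idea that one logic block can be configured as different gates can be sketched with a lookup table (LUT) model. This is a simplified illustration, not the primitive of any particular FPGA family: the module names lut4 and lut4_demo and the parameter INIT are assumptions made for the example.

```verilog
// Hypothetical 4-input LUT: the 16-bit INIT value is the truth table,
// and the inputs select which bit of the table drives the output.
module lut4 #(
    parameter [15:0] INIT = 16'h8000
) (
    input  wire [3:0] in,
    output wire       out
);
    wire [15:0] truth_table = INIT;
    assign out = truth_table[in];
endmodule

// The same cell configured twice: once as a 4-input AND, once as a 4-input XOR.
module lut4_demo;
    reg  [3:0] in;
    wire       and_out, xor_out;
    integer    i;

    lut4 #(.INIT(16'h8000)) u_and (.in(in), .out(and_out)); // only entry 15 is 1
    lut4 #(.INIT(16'h6996)) u_xor (.in(in), .out(xor_out)); // 1 where parity is odd

    initial begin
        for (i = 0; i < 16; i = i + 1) begin
            in = i;
            #1;
            // Compare against the reduction operators as a reference.
            if ((and_out !== (&in)) || (xor_out !== (^in)))
                $display("MISMATCH at in=%b", in);
        end
        $display("LUT check done");
    end
endmodule
```

Changing only the INIT value changes the function, which is what configuring a logic block amounts to in this simplified view.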
Some FPGAs have analog features in addition to digital functions. The most common analog feature is
programmable slew rate and drive strength on each output pin, allowing the engineer to set slow rates on lightly loaded
pins that would otherwise ring unacceptably, and to set stronger, faster rates on heavily loaded pins on high-speed
channels that would otherwise run too slow. Another relatively common analog feature is differential comparators on
input pins designed to be connected to differential signaling channels. A few "mixed signal FPGAs" have integrated
peripheral analog-to-digital converters (ADCs) and digital-to-analog converters (DACs) with analog signal
conditioning blocks allowing them to operate as a system-on-a-chip. Such devices blur the line between an FPGA,
which carries digital ones and zeros on its internal programmable interconnect fabric, and field-programmable analog
array (FPAA), which carries analog values on its internal programmable interconnect fabric.
The FPGA industry sprouted from programmable read-only memory (PROM) and programmable logic devices (PLDs).
PROMs and PLDs both had the option of being programmed in batches in a factory or in the field (field
programmable), however programmable logic was hard-wired between logic gates.
In the late 1980s the Naval Surface Warfare Department funded an experiment proposed by Steve Casselman to
develop a computer that would implement 600,000 reprogrammable gates. Casselman was successful and a patent
related to the system was issued in 1992.
Some of the industry’s foundational concepts and technologies for programmable logic arrays, gates, and logic blocks
are founded in patents awarded to David W. Page and LuVerne R. Peterson in 1985.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the first commercially viable field
programmable gate array in 1985 – the XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market. The XC2064 boasted a mere 64
configurable logic blocks (CLBs), with two 3-input lookup tables (LUTs). More than 20 years later, Freeman was
inducted into the National Inventors Hall of Fame for his invention.
Xilinx grew quickly and unchallenged from 1985 to the mid-1990s, when competitors sprouted up,
eroding significant market share. By 1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication and the volume of production. In the
early 1990s, FPGAs were primarily used in telecommunications and networking. By the end of the decade, FPGAs
found their way into consumer, automotive, and industrial applications.
Modern Developments
A recent trend has been to take the coarse-grained architectural approach a step further by combining the logic
blocks and interconnects of traditional FPGAs with embedded microprocessors and related peripherals to form a
complete "system on a programmable chip". This work mirrors the architecture by Ron Perlof and Hana Potash of
Burroughs Advanced Systems Group which combined a reconfigurable CPU architecture on a single chip called the
SB24. That work was done in 1982. Examples of such hybrid technologies can be found in the Xilinx Zynq™-7000 All
Programmable SoC, which includes a 1.0 GHz dual-core ARM Cortex-A9 MPCore processor embedded within the
FPGA's logic fabric, or in the Altera Arria V FPGA, which includes an 800 MHz dual-core ARM Cortex-A9 MPCore.
The Atmel FPSLIC is another such device, which uses an AVR processor in combination with Atmel's programmable
logic architecture. The Actel SmartFusion devices incorporate an ARM Cortex-M3 hard processor core (with up to
512 kB of flash and 64 kB of RAM) and analog peripherals such as a multi-channel ADC and DACs to their flash-
based FPGA fabric.
In 2010, Xilinx Inc. introduced the first All Programmable System on a Chip, branded Zynq™-7000, that fused features of
an ARM high-end microcontroller (hard-core implementations of a 32-bit processor, memory, and I/O) with an FPGA
fabric to make FPGAs easier for embedded designers to use. By incorporating the ARM processor-based platform into
a 28 nm FPGA family, the extensible processing platform enables system architects and embedded software developers
to apply a combination of serial and parallel processing to address the challenges they face in designing today's
embedded systems, which must meet ever-growing demands to perform highly complex functions. By allowing them to
design in a familiar ARM environment, embedded designers benefit from multiple advantages including: decreased
time-to-market, significantly reduced power, reduced BOM (bill of materials) cost, etc. These are among many
advantages of an All Programmable FPGA platform compared to more traditional design cycles associated with ASICs.
An alternate approach to using hard-macro processors is to make use of soft processor cores that are implemented
within the FPGA logic. Nios II, MicroBlaze and Mico32 are examples of popular softcore processors.
As previously mentioned, many modern FPGAs have the ability to be reprogrammed at "run time," and this is leading
to the idea of reconfigurable computing or reconfigurable systems — CPUs that reconfigure themselves to suit the task
at hand.
Additionally, new, non-FPGA architectures are beginning to emerge. Software-configurable microprocessors such as
the Stretch S5000 adopt a hybrid approach by providing an array of processor cores and FPGA-like programmable
cores on the same chip.
VLSI DESIGN FLOW
VHDL is a fairly general-purpose language, and it doesn't require a simulator on which to run the code.
There are many VHDL compilers that build executable binaries. A VHDL program can read and write files on the host
computer, so a VHDL program can be written that generates another VHDL program to be incorporated in the design being
developed.
Figure: Typical design flow (Specifications, High level design, Low level design, RTL coding, Functional verification, Gate level synthesis, Logic synthesis, Programming into FPGA)
Specification:
This is the stage at which we define the important parameters of the system that we are
planning to design. A simple example would be: I want to design a counter; it should be 4 bits wide, should have
a synchronous reset and an active-high enable; when reset is active, the counter output should go to "0".
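The counter specification above could be captured in RTL roughly as follows. This is a minimal sketch: the module and port names are illustrative, and the reset is assumed to be active high, which the specification does not state.

```verilog
module counter4 (
    input  wire       clk,
    input  wire       reset,    // synchronous reset (assumed active high)
    input  wire       enable,   // active-high enable
    output reg  [3:0] count     // 4-bit counter output
);
    // Reset is sampled only on the clock edge, so it is synchronous.
    // Reset has priority over enable; without enable the count holds.
    always @(posedge clk) begin
        if (reset)
            count <= 4'b0000;
        else if (enable)
            count <= count + 4'd1;
    end
endmodule
```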
High Level Design

This is the stage at which you define various blocks in the design and how they communicate. Let's assume that we
need to design a microprocessor: high level design means splitting the design into blocks based on their function; in our
case the blocks are registers, ALU, Instruction Decode, Memory Interface, etc.
I8155 High Level Block Diagram
Micro Design/Low level design

Low-level design or micro design is the phase in which the designer describes how each block is
implemented. It contains details of state machines, counters, muxes, decoders and internal registers. It is always a good idea
to draw waveforms at various interfaces. This is the phase where one spends a lot of time.
Sample Low-level design

RTL Coding
In RTL coding, Micro design is converted into Verilog/VHDL code, using synthesizable constructs of the
language. Normally we like to lint the code, before starting verification or synthesis.
Simulation
Simulation is the process of verifying the functional characteristics of models at any level of abstraction. We use
simulators to simulate the Hardware models. To test if the RTL code meets the functional requirements of the
specification, we must see if all the RTL blocks are functionally correct. To achieve this we need to write a testbench,
which generates clk, reset and the required test vectors. A sample testbench for a counter is shown below. Normally we
spend 60-70% of time in design verification.
simulation
We use the waveform output from the simulator to see if the DUT (Device Under Test) is functionally correct.
Most simulators come with a waveform viewer. As the design becomes complex, we write a self-checking testbench,
where the testbench applies the test vectors, then compares the output of the DUT with expected values.
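A self-checking testbench for a 4-bit counter might look like the sketch below. The DUT is repeated here under the hypothetical name counter_dut so that the example is self-contained; the clock period, the number of checked cycles and the active-high reset are assumptions.

```verilog
`timescale 1ns/1ps

// Hypothetical DUT: 4-bit counter with synchronous reset and enable.
module counter_dut (
    input  wire       clk,
    input  wire       reset,
    input  wire       enable,
    output reg  [3:0] count
);
    always @(posedge clk)
        if (reset)       count <= 4'd0;
        else if (enable) count <= count + 4'd1;
endmodule

module counter_tb;
    reg        clk, reset, enable;
    wire [3:0] count;
    reg  [3:0] expected;   // reference model, wraps at 4 bits like the DUT
    integer    errors, i;

    counter_dut dut (.clk(clk), .reset(reset), .enable(enable), .count(count));

    always #5 clk = ~clk;  // 10 ns clock period

    initial begin
        clk = 0; reset = 1; enable = 0;
        errors = 0; expected = 4'd0;

        // Hold reset over two rising edges, then release it between edges.
        repeat (2) @(posedge clk);
        #1 reset = 0; enable = 1;

        for (i = 0; i < 20; i = i + 1) begin
            @(posedge clk);
            #1;                           // let the DUT output settle
            expected = expected + 4'd1;
            if (count !== expected) begin
                errors = errors + 1;
                $display("t=%0t: count=%0d expected=%0d", $time, count, expected);
            end
        end

        if (errors == 0) $display("PASS");
        else             $display("FAIL: %0d errors", errors);
        $finish;
    end
endmodule
```

The comparison uses !== rather than != so that an X on the counter output is reported as a mismatch instead of being silently ignored.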
There is another kind of simulation, called timing simulation, which is done after synthesis or after P&R (Place
and Route). Here we include the gate delays and wire delays and see if the DUT works at the rated clock speed. This is also
called SDF simulation or gate-level simulation.
COUNTER SIMULATION
Counter Waveform
SYNTHESIS
Synthesis is the process in which synthesis tools like Design Compiler or Synplify take RTL in Verilog or VHDL,
the target technology, and constraints as input and map the RTL to target technology primitives. The synthesis tool, after
mapping the RTL to gates, also does a minimal amount of timing analysis to see if the mapped design meets the
timing requirements. (An important thing to note is that synthesis tools are not aware of wire delays; they only know of gate
delays.) After synthesis there are a couple of things that are normally done before passing the netlist to the backend
(Place and Route).
VLSI Testing Problems:
Because of the scale of VLSI, manually testing an IC is nearly impossible. The main problems are:
1. Test generation
2. Input combinatorial problem
3. Gate to I/O pin ratio problem
Test Generation:
1. A VLSI chip has a large number of gates, which has pushed automatic test generation times to weeks and months.
2. It is nearly impossible for an external tester to handle all the test cases, which results in high cost in
terms of both money and time.
3. Another test generation problem is that automatic test pattern generation (ATPG) works well for combinatorial logic
but does not suit sequential logic, which requires more space and computation time because all the states have to be evaluated.
I/p combinatorial problem:
1. A combinatorial circuit with N inputs needs a total of 2^N test vectors for
exhaustive testing.
2. Consider a microprocessor (VLSI) chip, which has some 32 pins (and, of course, these can be driven in different
combinations).
3. A selected subset of inputs can be applied instead, but that is not exhaustive testing.
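The growth of the exhaustive test set can be seen in a small testbench that applies every input combination to a combinatorial circuit. This is a sketch under assumptions: the parameter N, the module name exhaustive_tb and the parity function used as the circuit under test are illustrative only.

```verilog
module exhaustive_tb;
    parameter N = 4;          // number of inputs (illustrative)

    reg  [N-1:0] vec;
    wire         y;
    integer      i, applied;

    // Hypothetical circuit under test: parity of the inputs.
    assign y = ^vec;

    initial begin
        applied = 0;
        // Exhaustive testing means applying all 2^N input combinations.
        for (i = 0; i < (1 << N); i = i + 1) begin
            vec = i;
            #1;               // allow the output to settle before moving on
            applied = applied + 1;
        end
        $display("N=%0d inputs -> %0d vectors applied", N, applied);
    end
endmodule
```

With N = 4 the loop applies 16 vectors. With 32 inputs the same loop would need 2^32 = 4,294,967,296 vectors, which is why exhaustive testing quickly becomes impractical.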
Gate to I/O pin ratio problem:
1. Typically, a VLSI IC will have 32 to 40 pins, but the number of gates inside the chip will be more than
50,000.
2. Testing all the gates inside through so few pins is practically impossible, which results in poor observability and controllability.
Boundary Scan, JTAG, IEEE 1149.1 Tutorial
Boundary Scan Overview:
Since its introduction in the early 1990s, boundary scan, also known as JTAG or IEEE 1149.1, has become an
essential tool used for testing boards in development, production and in the field. Boundary scan is a test
technique that enables information about the state of a board to be gained when it is not possible to gain access to all
the nodes that would be required if other means of test were used.
Because the density of boards has been increasing in recent years, it is normally very difficult
to probe electronic circuits and gain the information that is required to test these boards. As boundary
scan enables much of a board to be tested with only minimal access, it is now widely used for the test of electronic
circuits at all stages of their life. Other forms of test require physical access, either through bed-of-nails
fixtures or by probing a variety of places on the board, so boundary scan offers a unique solution to many
test requirements.
Although the boundary scan technique is aimed at testing circuits, its flexibility enables it to be used for a
wide variety of applications, including:
• System level test
• BIST access
• Memory testing
• Flash programming
• FPGA / CPLD programming
• CPU emulation
While testing remains the major application for boundary scan, it can be seen that it is useful in other
applications as well. In view of its flexibility, the technique is widely used and is a powerful tool in both development
and production applications.
Boundary scan history:
As the lack of access to boards started to become a problem, a group known as the Joint Test Action
Group (JTAG) was set up in 1985. Its aim was to address the issues being faced by electronics manufacturers in test
strategies and to enable tests to be undertaken where no other technologies could gain access.
The original goal for boundary scan was to complement existing techniques, including in-circuit test, functional test,
built-in test and other techniques, and to provide a standard that would enable the testing of digital, analogue and mixed
signal circuits.
The standard for boundary scan that was devised has been adopted by the Institute of Electrical and Electronics
Engineers (IEEE) in the USA as IEEE 1149.1. The first issue of the standard was in 1990. The stated purpose
of IEEE 1149.1 was to test the interconnections between integrated circuits mounted on boards, modules, hybrids and
other substrates. As most of the problems occurring with electronic circuits occur with the interconnections, the IEEE
1149.1 test strategy would reveal most of the problems.
In 1993, a revised version of the boundary scan standard was issued, which contained many
clarifications, enhancements and corrections. Then in 1994, a further issue of the standard took place. This
introduced the Boundary Scan Description Language, BSDL. This enabled boundary scan tests to be written in a
common language, improving the way in which tests could be written and code re-used, and thereby saving
development time.
Boundary scan basics:

The JTAG, boundary scan test technique uses a shift register latch cell built into each external connection of every
boundary scan compatible device. One boundary scan cell is included in the integrated circuit line adjacent to each I/O
pin, and when used in the shift register mode it can transfer data along to the next cell in the device. There are defined
entry and exit points for the data to enter and exit the device, and it is therefore possible to chain several devices
together.
Under normal operating conditions the cell is set so that it has no effect and it becomes invisible. However, when
the device is set to test mode, it permits a serial data stream (test vector) to be passed from one shift register latch cell
to the next. Boundary scan cells in a device can capture data from the integrated circuit lines, or force data onto them. In
this way a test system that can input a data stream to the shift register chain can set up states on the board, and also
monitor data. By setting up one serial data stream, latching this into place, and then monitoring the returning data stream,
it is possible to gain access to the circuits on the board and check that the returning data stream is what is expected. If it
is, then the test can pass, but if not, the boundary scan system has detected a problem that can be further investigated.
There are a number of boundary scan, IEEE 1149 control and data lines. The lines known as TCK, TMS and the
optional TRST line are connected in parallel to the chips in the boundary scan chain. Connections designated TDI
(input) and TDO (output) are daisy chained together to provide a path around the boundary scan chips for the data.
Data is sent into the TDI of the first chip, and then TDO from the first chip is connected to TDI of the next and so forth.
Finally the data is taken from the TDO of the last IC in the daisy chain.
• TAP Test Access Port - The pins associated with the test access controller.
• TCK Test Clock - This pin is the clock signal used for ensuring the timing of the boundary scan system. The
TDI shifts values into the appropriate register on the rising edge of TCK. The selected register contents shift out onto
TDO on the falling edge of TCK.
• TDI Test Data Input - Test instructions shift into the device through this pin.
• TDO Test Data Output - This pin provides data from the boundary scan registers, i.e. test data shifts out on
this pin.
• TMS Test Mode Select - This input, which is also clocked in on the rising edge of TCK, determines the state
of the TAP controller.
• TRST Test Reset - This is an optional active-low test reset pin. It permits asynchronous TAP controller
initialization without affecting other device or system logic.
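The capture, shift and update behaviour of a boundary scan cell, and the daisy chain from TDI to TDO, can be sketched as follows. This is a simplified illustration and not the cell defined by the standard: the module names, the port names and the shift, update and test_mode control signals are assumptions (in a real device they are produced by the TAP controller from TCK and TMS), and the retiming of TDO to the falling edge of TCK is omitted.

```verilog
// Hypothetical boundary scan cell.
module bscan_cell (
    input  wire data_in,    // value arriving from the pin or from core logic
    input  wire scan_in,    // from the previous cell, or TDI for the first cell
    input  wire shift,      // 1 = shift along the chain, 0 = capture data_in
    input  wire tck,        // test clock
    input  wire update,     // rising edge latches the shifted value
    input  wire test_mode,  // 1 = drive data_out from the latched test value
    output wire scan_out,   // to the next cell, or TDO for the last cell
    output wire data_out
);
    reg shift_ff, update_ff;

    // Capture the functional value or shift the serial test vector.
    always @(posedge tck)
        shift_ff <= shift ? scan_in : data_in;

    // Hold the test value stable while new data is shifted through.
    always @(posedge update)
        update_ff <= shift_ff;

    assign scan_out = shift_ff;
    // In normal operation the cell is transparent.
    assign data_out = test_mode ? update_ff : data_in;
endmodule

// N cells daisy chained: TDI enters cell 0 and TDO leaves cell N-1.
module bscan_chain #(
    parameter N = 4
) (
    input  wire [N-1:0] data_in,
    input  wire         tdi,
    input  wire         shift,
    input  wire         tck,
    input  wire         update,
    input  wire         test_mode,
    output wire         tdo,
    output wire [N-1:0] data_out
);
    wire [N:0] link;
    assign link[0] = tdi;
    assign tdo     = link[N];

    genvar g;
    generate
        for (g = 0; g < N; g = g + 1) begin : cells
            bscan_cell u_cell (
                .data_in(data_in[g]), .scan_in(link[g]), .shift(shift),
                .tck(tck), .update(update), .test_mode(test_mode),
                .scan_out(link[g+1]), .data_out(data_out[g])
            );
        end
    endgenerate
endmodule
```

With test_mode at 0 the chain has no effect on the functional signals, which matches the statement that the cells are invisible under normal operating conditions.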
This technique can obviously only be used with integrated circuits that have the boundary scan cells included in
the chip. Many of the smaller devices do not have them, but larger chips including memory devices, microprocessors
and the like often do. When designing a board, an early decision about the way in which it will be tested is needed. If
boundary scan techniques are to be used, then devices incorporating boundary scan facilities must be chosen.
Applications for boundary scan:

Boundary scan is an ideal test tool for use in many applications. The most obvious applications for
boundary scan are within the production environment. Here the boards can be tested, and problems that might otherwise
go undetected because of lack of test access can be found. In fact, boundary scan technology is being
combined with other technologies to provide what is termed a combinational tester.
In addition to being used in production test, boundary scan, JTAG, IEEE 1149, can also be used in a variety of
other test scenarios, including product development and debugging as well as field service. This means that the
boundary scan code can be re-used across these test areas, and hence the cost can be split over these applications. Not only does
this show that boundary scan is a powerful tool, but it also makes it financially attractive.
Programme generation:
One of the chief costs for any development these days is the cost of the software, and this is particularly true for
boundary scan where there is little hardware. This means that any savings that can be made in the time taken for the
software development can significantly reduce the costs. Accordingly a Test Programme Generator (TPG) is an integral
part of a boundary scan system.
Typically the test programme generator requires the net-list of the Unit Under Test (UUT) and the Boundary Scan
Description Language (BSDL) files of the boundary scan components contained within the circuit. With this
information it is possible for the test programme generator to create the test patterns used for the test. These allow the
system to detect and isolate any faults for all boundary-scan testable nets within the circuit. It is also possible for the
test programme generator to create test vectors that enable the system to detect faults on the nodes or pins of
non-boundary scan components that are surrounded by boundary scan devices.
JTAG Summary:
JTAG (boundary scan, IEEE 1149) is a test technique that is now well established. Although it requires test
programmes to be generated before it can be used, it nevertheless provides a very cost-effective method of gaining
access for test vectors into an electronic circuit board. With circuit board real estate being at a premium, the cost of
adding probe or access points for other types of electronic test technologies would be prohibitive, if indeed it were
possible.
SOFTWARE TOOLS:
Simulation: Xilinx ISE 13.2i
Synthesis: Xilinx XST

CHAPTER 6
IMPLEMENTING Verilog HDL DESIGNS USING XILINX ISE: A BRIEF TUTORIAL
Xilinx, Inc. is the world's largest supplier of programmable logic devices, the inventor of the field programmable gate
array (FPGA) and the first semiconductor company with a fabless manufacturing model. It was founded in Silicon
Valley in 1984.
The ISE Design Suite is the central electronic design automation (EDA) product family sold by Xilinx. The ISE Design
Suite features include design entry and synthesis supporting Verilog or VHDL, place-and-route (PAR), complete
verification and debug using ChipScope Pro tools, and creation of the bit files that are used to configure the chip.
3.1.1 XILINX DESIGN FLOW
Figure 3.1.1 Xilinx Design Flow
3.1.2 XILINX ISE PROJECT NAVIGATOR
Figure 3.1.2 Xilinx ISE Project Navigator
3.1.3 DESIGN IMPLEMENTATION USING XILINX ISE:
Figure 3.1.3(a) Xilinx ISE Design Implementation

Figure 3.1.3(b) Xilinx ISE Design Implementation status
3.1.4 XILINX SIMULATION
Figure 3.1.4 Xilinx ISE Design Simulation
3.1.5 XILINX ISE SUBPROCESSES

Figure 3.1.5 Xilinx ISE Design Subprocesses view
3.1.6 XILINX ISE DESIGN SUMMARY DATA VIEW
Figure 3.1.6 Xilinx ISE Design Summary

3.1.7 XILINX ISE PROGRAMMING IN FPGA
Figure 3.1.7 Xilinx ISE Design programming in FPGA
Conclusion
The Advanced Encryption Standard algorithm is an iterative symmetric-key block cipher that processes data blocks of
128 bits using cipher keys of 128, 192, or 256 bits. In this paper we have shown the simulation
results of AES encryption and AES decryption. Full outer-round pipelining across all 10 rounds is used to
reduce the delay. The AES algorithm was implemented using Xilinx ISE 13.2.
CODE
AES
`timescale 1ns / 1ps
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 17:06:33 08/16/2012
// Design Name:
// Module Name: aes
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
// Revision 0.01 - File Created
// Additional Comments:
//
//////////////////////////////////////////////////////////////////////////////////
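module aes(aesin, keyin,clk,rst,keyout,aesout);
input [127:0] aesin;
input [127:0] keyin;
input clk,rst;
output [127:0] keyout;
output [127:0] aesout;
wire [127:0] r1,r2,r3,r4,r5,r6,r7,r8,r9;
wire [127:0] k1,k2,k3,k4,k5,k6,k7,k8,k9;
wire [127:0] rr1,rr2,rr3,rr4,rr5,rr6,rr7,rr8,rr9,rr10;
wire [127:0] kk1,kk2,kk3,kk4,kk5,kk6,kk7,kk8,kk9,kk10;
// Each round is followed by a pipeline register (outer-round pipelining).
// Stages u3..u8 and z2..z9 are assumed to follow the u2/z1 pattern, with the
// round constants 04..80 taken from the key schedule in module decryption.
round u1(aesin,keyin,8'h01,k1,r1);
reg3 z1(k1,r1,clk,rst,kk1,rr1);
round1 u2(rr1,kk1,8'h02,k2,r2);
reg3 z2(k2,r2,clk,rst,kk2,rr2);
round1 u3(rr2,kk2,8'h04,k3,r3);
reg3 z3(k3,r3,clk,rst,kk3,rr3);
round1 u4(rr3,kk3,8'h08,k4,r4);
reg3 z4(k4,r4,clk,rst,kk4,rr4);
round1 u5(rr4,kk4,8'h10,k5,r5);
reg3 z5(k5,r5,clk,rst,kk5,rr5);
round1 u6(rr5,kk5,8'h20,k6,r6);
reg3 z6(k6,r6,clk,rst,kk6,rr6);
round1 u7(rr6,kk6,8'h40,k7,r7);
reg3 z7(k7,r7,clk,rst,kk7,rr7);
round1 u8(rr7,kk7,8'h80,k8,r8);
reg3 z8(k8,r8,clk,rst,kk8,rr8);
round1 u9(rr8,kk8,8'h1b,k9,r9);
reg3 z9(k9,r9,clk,rst,kk9,rr9);
round2 u10(rr9,kk9,8'h36,kk10,rr10);
reg3 z10(kk10,rr10,clk,rst,keyout,aesout);
endmodule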
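The reg3 instances in the aes module sit between rounds and hold the round key and the round data for one clock cycle. The definition of reg3 is not reproduced in this part of the listing, so the following is only a minimal sketch of what such a pipeline stage could look like. The name reg3_sketch, the synchronous active-high reset, and the clear-to-zero behaviour are assumptions; the port order follows the instantiation reg3 z1(k1,r1,clk,rst,kk1,rr1).

```verilog
// Hypothetical pipeline stage, modelled on how reg3 is instantiated in module aes.
// Reset polarity and reset value are assumptions, not taken from the report.
module reg3_sketch(keyin, datain, clk, rst, keyout, dataout);
    input      [127:0] keyin, datain;   // round key and round data from the previous round
    input              clk, rst;
    output reg [127:0] keyout, dataout; // registered copies for the next round

    always @(posedge clk) begin
        if (rst) begin
            keyout  <= 128'd0;
            dataout <= 128'd0;
        end else begin
            keyout  <= keyin;
            dataout <= datain;
        end
    end
endmodule
```

Registering the key together with the data keeps each round key aligned with the block it belongs to. With one such stage after every round, the critical path is limited to a single round, at the cost of one clock cycle of latency per stage.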
AES existing
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 15:08:15 02/11/2016
// Design Name:
// Module Name: aesexisting
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module aesexisting(aesin, keyin,clk,rst,keyout,aesout);
input clk,rst;
wire [127:0] r1,r2,r3,r4,r5,r6,r7,r8,r9;
wire [127:0] k1,k2,k3,k4,k5,k6,k7,k8,k9;
eround u1(aesin,keyin,8'h01,k1,r1);
eround1 u2(rr1,kk1,8'h02,k2,r2);
eround1 u9(rr8,kk8,8'h1b,k9,r9);
eround2 u10(rr9,kk9,8'h36,kk10,rr10);
reg3 z10(kk10,rr10,clk,rst,keyout,aesout);
endmodule
AES previous
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 11:31:08 02/12/2016
// Design Name:
// Module Name: aesprevious
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module aesprevious(aesin, keyin,keyout,aesout);
wire [127:0] r1,r2,r3,r4,r5,r6,r7,r8,r9;
wire [127:0] k1,k2,k3,k4,k5,k6,k7,k8,k9;
eround u1(aesin,keyin,8'h01,k1,r1);
eround1 u2(r1,k1,8'h02,k2,r2);
eround1 u9(r8,k8,8'h1b,k9,r9);
eround2 u10(r9,k9,8'h36,keyout,aesout);
endmodule
affine transform
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 10:13:56 02/03/2015
// Design Name:
// Module Name: affinetransform
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module affinetransform(
input [7:0] a,
output [7:0] e
);
assign e[7]=a[7] ^ a[6] ^ a[5] ^ a[4] ^ a[3] ^ 1'b0;

assign e[6]= a[6] ^ a[5] ^ a[4] ^ a[3] ^ a[2] ^ 1'b1;
assign e[5]= a[5] ^ a[4] ^ a[3] ^ a[2] ^ a[1] ^ 1'b1;
assign e[4]=a[4] ^ a[3] ^ a[2] ^ a[1] ^ a[0] ^ 1'b0;
assign e[3]=a[7] ^ a[3] ^ a[2] ^ a[1] ^ a[0] ^ 1'b0;
assign e[2]=a[7] ^ a[6] ^ a[2] ^ a[1] ^ a[0] ^ 1'b0;
assign e[1]=a[7] ^ a[6] ^ a[5] ^ a[1] ^ a[0] ^ 1'b1;
assign e[0]=a[7] ^ a[6] ^ a[5] ^ a[4] ^ a[0] ^ 1'b1;
endmodule
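The eight bit equations of affinetransform can be cross-checked against the compact form of the AES affine transformation, in which the input byte is XORed with its four left rotations by 1 to 4 bits and with the constant 0x63. The testbench below is a sketch of such a check; the function names rotl8, affine_bits and affine_rot are introduced here for illustration only, and affine_bits is a copy of the equations in the listing.

```verilog
// Self-checking sketch: bit equations from the listing vs. the rotate-and-XOR form.
module affine_check_tb;
    // Rotate an 8-bit value left by n positions (n in 1..7).
    function [7:0] rotl8(input [7:0] x, input integer n);
        rotl8 = (x << n) | (x >> (8 - n));
    endfunction

    // Bit-level equations, copied from module affinetransform.
    function [7:0] affine_bits(input [7:0] a);
        begin
            affine_bits[7] = a[7] ^ a[6] ^ a[5] ^ a[4] ^ a[3] ^ 1'b0;
            affine_bits[6] = a[6] ^ a[5] ^ a[4] ^ a[3] ^ a[2] ^ 1'b1;
            affine_bits[5] = a[5] ^ a[4] ^ a[3] ^ a[2] ^ a[1] ^ 1'b1;
            affine_bits[4] = a[4] ^ a[3] ^ a[2] ^ a[1] ^ a[0] ^ 1'b0;
            affine_bits[3] = a[7] ^ a[3] ^ a[2] ^ a[1] ^ a[0] ^ 1'b0;
            affine_bits[2] = a[7] ^ a[6] ^ a[2] ^ a[1] ^ a[0] ^ 1'b0;
            affine_bits[1] = a[7] ^ a[6] ^ a[5] ^ a[1] ^ a[0] ^ 1'b1;
            affine_bits[0] = a[7] ^ a[6] ^ a[5] ^ a[4] ^ a[0] ^ 1'b1;
        end
    endfunction

    // Compact form: a XOR its left rotations by 1..4, XOR the constant 0x63.
    function [7:0] affine_rot(input [7:0] a);
        affine_rot = a ^ rotl8(a, 1) ^ rotl8(a, 2) ^ rotl8(a, 3) ^ rotl8(a, 4) ^ 8'h63;
    endfunction

    reg [8:0] n;
    integer errors;
    initial begin
        errors = 0;
        for (n = 0; n < 256; n = n + 1)
            if (affine_bits(n[7:0]) !== affine_rot(n[7:0]))
                errors = errors + 1;
        $display("mismatches: %0d", errors);
        $finish;
    end
endmodule
```

Both forms should agree for all 256 inputs. For an all-zero input only the constant remains, which gives 0x63, the first entry of the S-box table in esbox.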
Decryption
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 16:26:13 02/11/2016
// Design Name:
// Module Name: decryption
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
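module decryption(aesin,keyin,keyout,aesout);
input [127:0] aesin,keyin;
output [127:0] keyout,aesout;
wire [127:0] r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,r11,r12,r13;
wire [127:0] k1,k2,k3,k4,k5,k6,k7,k8,k9,k10;
wire [127:0] invk1,invk2,invk3,invk4,invk5,invk6,invk7,invk8,invk9;
key1 q1(keyin,8'h01,k1);
key1 q2(k1,8'h02,k2);
key1 q3(k2,8'h04,k3);
key1 q4(k3,8'h08,k4);
key1 q5(k4,8'h10,k5);
key1 q6(k5,8'h20,k6);
key1 q7(k6,8'h40,k7);
key1 q8(k7,8'h80,k8);
key1 q9(k8,8'h1b,k9);
key1 q10(k9,8'h36,k10);
// Round keys are used in reverse order. Instances e2..e9 and u3..u9 are
// assumed to follow the e1/u2 pattern (invk_i = InvMixColumns of k_(10-i)).
invmix e1(k9,invk1);
invmix e2(k8,invk2);
invmix e3(k7,invk3);
invmix e4(k6,invk4);
invmix e5(k5,invk5);
invmix e6(k4,invk6);
invmix e7(k3,invk7);
invmix e8(k2,invk8);
invmix e9(k1,invk9);
einv_round u1(aesin,k10,invk1,r1);
einv_round1 u2(r1,invk2,r2);
einv_round1 u3(r2,invk3,r3);
einv_round1 u4(r3,invk4,r4);
einv_round1 u5(r4,invk5,r5);
einv_round1 u6(r5,invk6,r6);
einv_round1 u7(r6,invk7,r7);
einv_round1 u8(r7,invk8,r8);
einv_round1 u9(r8,invk9,r9);
einv_round2 u10(r9,keyin,aesout);
assign keyout=keyin;
endmodule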
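The round constants passed to key1 in the decryption module (8'h01, 8'h02, ... 8'h80, 8'h1b, 8'h36) are successive powers of x in GF(2^8), reduced by the AES polynomial x^8 + x^4 + x^3 + x + 1. The sketch below shows how each constant can be derived from the previous one instead of being written out by hand. The module name rcon_next_sketch and its ports are assumptions made for this example and are not part of the design in this report.

```verilog
// Hypothetical helper: next AES round constant from the current one.
module rcon_next_sketch(rc, rc_next);
    input  [7:0] rc;
    output [7:0] rc_next;
    // Multiply by x (shift left); if bit 7 was set, reduce with 0x1b,
    // the low byte of the AES polynomial 0x11b.
    assign rc_next = {rc[6:0], 1'b0} ^ (rc[7] ? 8'h1b : 8'h00);
endmodule

// Prints the ten constants used for AES-128.
module rcon_next_sketch_tb;
    reg  [7:0] rc;
    wire [7:0] rc_next;
    integer i;
    rcon_next_sketch dut(rc, rc_next);
    initial begin
        rc = 8'h01;
        for (i = 1; i <= 10; i = i + 1) begin
            #1 $display("round %0d: rcon = %h", i, rc);
            rc = rc_next;   // rc_next has settled during the #1 delay
        end
        $finish;
    end
endmodule
```

Starting from 8'h01, the sequence wraps at the ninth step: 8'h80 shifted left overflows, so the reduction yields 8'h1b, and the tenth constant is 8'h36, matching the literals in the listing.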
Edecryption
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 12:12:06 02/12/2016
// Design Name:
// Module Name: edecryption
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module edecryption(aesin,keyin,clk,rst,keyout,aesout);
input clk,rst;

wire [127:0] k1,k2,k3,k4,k5,k6,k7,k8,k9,k10;
key1 q1(keyin,8'h01,k1);
key1 q2(k1,8'h02,k2);
key1 q3(k2,8'h04,k3);
key1 q4(k3,8'h08,k4);
key1 q5(k4,8'h10,k5);
key1 q6(k5,8'h20,k6);
key1 q7(k6,8'h40,k7);
key1 q8(k7,8'h80,k8);
key1 q9(k8,8'h1b,k9);
key1 q10(k9,8'h36,k10);
einv_round u1(aesin,k10,invk1,r1);
reg2 z1(r1,clk,rst,rr1);
einv_round1 u2(rr1,invk2,r2);
einv_round2 u10(rr9,keyin,rr10);
reg2 z10(rr10,clk,rst,aesout);
endmodule
einv_round
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 12:50:58 02/12/2016
// Design Name:
// Module Name: einv_round
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module einv_round(roundin,key,key1,roundout);
input [127:0] roundin;
input[127:0] key,key1;
output [127:0] roundout;
wire [127 :0]preout,subout,shiftout,mixcolout;
preroundoperaton q1(roundin,key,preout);
einv_subbyte q2(preout,subout);
inv_shiftrow q3(subout,shiftout);
invmix q4(shiftout,mixcolout);
assign roundout=mixcolout^key1;
endmodule
einv_round1
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 12:52:51 02/12/2016
// Design Name:
// Module Name: einv_round1
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
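module einv_round1(roundin,key,roundout);
input [127:0] roundin;
input[127:0] key;
output [127:0] roundout;
wire [127 :0]subout,shiftout,mixcolout;
// Port declarations and stages q3/q4 are assumed to mirror einv_round.
einv_subbyte q2(roundin,subout);
inv_shiftrow q3(subout,shiftout);
invmix q4(shiftout,mixcolout);
assign roundout=mixcolout^key;
endmodule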
einv_round2
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 12:53:31 02/12/2016
// Design Name:
// Module Name: einv_round2
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
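module einv_round2(roundin,key,roundout);
input [127:0] roundin;
input[127:0] key;
output [127:0] roundout;
wire [127 :0]subout,shiftout;
// Final round has no InvMixColumns; stage q3 is assumed to mirror einv_round.
einv_subbyte q2(roundin,subout);
inv_shiftrow q3(subout,shiftout);
assign roundout=shiftout^key;
endmodule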
einv_subbyte
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 12:46:02 02/12/2016
// Design Name:
// Module Name: einv_subbyte
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
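module einv_subbyte(invsubin,invsubout);
input [127:0] invsubin;
output [127:0]invsubout;
wire [7:0] a0,a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15;
wire [7:0] p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15;
// Split the state into bytes, a0 = invsubin[7:0] up to a15 = invsubin[127:120].
// Bytes a1..a15 and instances w11..w16 are assumed from the a0/w1 pattern.
assign {a15,a14,a13,a12,a11,a10,a9,a8,a7,a6,a5,a4,a3,a2,a1,a0}=invsubin;
einvsbox w1(a0,p0);
einvsbox w2(a1,p1);
einvsbox w3(a2,p2);
einvsbox w4(a3,p3);
einvsbox w5(a4,p4);
einvsbox w6(a5,p5);
einvsbox w7(a6,p6);
einvsbox w8(a7,p7);
einvsbox w9(a8,p8);
einvsbox w10(a9,p9);
einvsbox w11(a10,p10);
einvsbox w12(a11,p11);
einvsbox w13(a12,p12);
einvsbox w14(a13,p13);
einvsbox w15(a14,p14);
einvsbox w16(a15,p15);
assign invsubout={p15,p14,p13,p12,p11,p10,p9,p8,p7,p6,p5,p4,p3,p2,p1,p0};
endmodule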
einvsbox
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 16:24:05 02/11/2016
// Design Name:
// Module Name: einvsbox
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module einvsbox(a, b);
input [7:0] a;
output reg [7:0] b;
always@(*)
begin
case (a)
8'b00000000: b<= 8'h52;

8'b00000001: b<= 8'h09;
8'b00000010: b<= 8'h6a;
8'b00000011: b<= 8'hd5;
8'b00000100: b<= 8'h30;
8'b00000101: b<= 8'h36;
8'b00000110: b<= 8'ha5;
8'b00000111: b<= 8'h38;
8'b00001000: b<= 8'hbf;
8'b00001001: b<= 8'h40;
8'b00001010: b<= 8'ha3;
8'b00001011: b<= 8'h9e;
8'b00001100: b<= 8'h81;
8'b00001101: b<= 8'hf3;
8'b00001110: b<= 8'hd7;
8'b00001111: b<= 8'hfb;
8'b00010000: b<= 8'h7c;
8'b00010001: b<= 8'he3;
8'b00010010: b<= 8'h39;
8'b00010011: b<= 8'h82;
8'b00010100: b<= 8'h9b;
8'b00010101: b<= 8'h2f;
8'b00010110: b<= 8'hff;
8'b00010111: b<= 8'h87;
8'b00011000: b<= 8'h34;
8'b00011001: b<= 8'h8e;
8'b00011010: b<= 8'h43;
8'b00011011: b<= 8'h44;
8'b00011100: b<= 8'hc4;
8'b00011101: b<= 8'hde;
8'b00011110: b<= 8'he9;
8'b00011111: b<= 8'hcb;
8'b00100000: b<= 8'h54;
8'b00100001: b<= 8'h7b;
8'b00100010: b<= 8'h94;
8'b00100011: b<= 8'h32;
8'b00100100: b<= 8'ha6;
8'b00100101: b<= 8'hc2;
8'b00100110: b<= 8'h23;
8'b00100111: b<= 8'h3d;
8'b00101000: b<= 8'hee;
8'b00101001: b<= 8'h4c;
8'b00101010: b<= 8'h95;
8'b00101011: b<= 8'h0b;
8'b00101100: b<= 8'h42;
8'b00101101: b<= 8'hfa;
8'b00101110: b<= 8'hc3;
8'b00101111: b<= 8'h4e;
8'b00110000: b<= 8'h08;
8'b00110001: b<= 8'h2e;
8'b00110010: b<= 8'ha1;
8'b00110011: b<= 8'h66;
8'b00110100: b<= 8'h28;
8'b00110101: b<= 8'hd9;
8'b00110110: b<= 8'h24;
8'b00110111: b<= 8'hb2;
8'b00111000: b<= 8'h76;
8'b00111001: b<= 8'h5b;
8'b00111010: b<= 8'ha2;
8'b00111011: b<= 8'h49;
8'b00111100:b<= 8'h6d;
8'b00111101:b<= 8'h8b;
8'b00111110: b<= 8'hd1;
8'b00111111: b<= 8'h25;
8'b01000000: b<= 8'h72;
8'b01000001: b<= 8'hf8;
8'b01000010: b<= 8'hf6;
8'b01000011: b<= 8'h64;
8'b01000100: b<= 8'h86;
8'b01000101: b<= 8'h68;
8'b01000110: b<= 8'h98;
8'b01000111: b<= 8'h16;
8'b01001000: b<= 8'hd4;
8'b01001001: b<= 8'ha4;
8'b01001010: b<= 8'h5c;
8'b01001011: b<= 8'hcc;
8'b01001100: b<= 8'h5d;
8'b01001101: b<= 8'h65;
8'b01001110: b<= 8'hb6;
8'b01001111: b<= 8'h92;
8'b01010000: b<= 8'h6c;
8'b01010001: b<= 8'h70;
8'b01010010: b<= 8'h48;
8'b01010011: b<= 8'h50;
8'b01010100: b<= 8'hfd;
8'b01010101: b<= 8'hed;
8'b01010110: b<= 8'hb9;
8'b01010111: b<= 8'hda;
8'b01011000: b<= 8'h5e;
8'b01011001: b<= 8'h15;
8'b01011010: b<= 8'h46;
8'b01011011: b<= 8'h57;
8'b01011100: b<= 8'ha7;
8'b01011101: b<= 8'h8d;
8'b01011110: b<= 8'h9d;
8'b01011111: b<= 8'h84;
8'b01100000: b<= 8'h90;
8'b01100001: b<= 8'hd8;
8'b01100010: b<= 8'hab;
8'b01100011: b<= 8'h00;
8'b01100100: b<= 8'h8c;
8'b01100101: b<= 8'hbc;
8'b01100110: b<= 8'hd3;
8'b01100111: b<= 8'h0a;
8'b01101000: b<= 8'hf7;
8'b01101001: b<= 8'he4;
8'b01101010: b<= 8'h58;
8'b01101011: b<= 8'h05;
8'b01101100: b<= 8'hb8;
8'b01101101: b<= 8'hb3;
8'b01101110: b<= 8'h45;
8'b01101111: b<= 8'h06;
8'b01110000: b<= 8'hd0;
8'b01110001: b<= 8'h2c;
8'b01110010: b<= 8'h1e;
8'b01110011: b<= 8'h8f;
8'b01110100: b<= 8'hca;
8'b01110101: b<= 8'h3f;
8'b01110110: b<= 8'h0f;
8'b01110111: b<= 8'h02;
8'b01111000: b<= 8'hc1;
8'b01111001: b<= 8'haf;
8'b01111010:b<= 8'hbd;
8'b01111011: b<= 8'h03;
8'b01111100: b<= 8'h01;
8'b01111101: b<= 8'h13;
8'b01111110: b<= 8'h8a;
8'b01111111: b<= 8'h6b;
8'b10000000: b<= 8'h3a;
8'b10000001: b<= 8'h91;
8'b10000010: b<= 8'h11;
8'b10000011: b<= 8'h41;
8'b10000100: b<= 8'h4f;
8'b10000101: b<= 8'h67;
8'b10000110: b<= 8'hdc;
8'b10000111: b<= 8'hea;
8'b10001000: b<= 8'h97;
8'b10001001: b<= 8'hf2;
8'b10001010: b<= 8'hcf;
8'b10001011: b<= 8'hce;
8'b10001100: b<= 8'hf0;
8'b10001101: b<= 8'hb4;
8'b10001110: b<= 8'he6;
8'b10001111: b<= 8'h73;
8'b10010000: b<= 8'h96;
8'b10010001: b<= 8'hac;
8'b10010010: b<= 8'h74;
8'b10010011: b<= 8'h22;
8'b10010100: b<= 8'he7;
8'b10010101: b<= 8'had;
8'b10010110: b<= 8'h35;
8'b10010111: b<= 8'h85;
8'b10011000: b<= 8'he2;
8'b10011001: b<= 8'hf9;
8'b10011010: b<= 8'h37;
8'b10011011: b<= 8'he8;
8'b10011100: b<= 8'h1c;
8'b10011101: b<= 8'h75;
8'b10011110: b<= 8'hdf;
8'b10011111: b<= 8'h6e;
8'b10100000: b<= 8'h47;
8'b10100001: b<= 8'hf1;
8'b10100010: b<= 8'h1a;
8'b10100011: b<= 8'h71;
8'b10100100: b<= 8'h1d;
8'b10100101: b<= 8'h29;
8'b10100110: b<= 8'hc5;
8'b10100111: b<= 8'h89;
8'b10101000: b<= 8'h6f;
8'b10101001: b<= 8'hb7;
8'b10101010: b<= 8'h62;
8'b10101011: b<= 8'h0e;
8'b10101100: b<= 8'haa;
8'b10101101: b<= 8'h18;
8'b10101110: b<= 8'hbe;
8'b10101111: b<= 8'h1b;
8'b10110000: b<= 8'hfc;
8'b10110001: b<= 8'h56;
8'b10110010: b<= 8'h3e;
8'b10110011: b<= 8'h4b;
8'b10110100: b<= 8'hc6;
8'b10110101: b<= 8'hd2;
8'b10110110: b<= 8'h79;
8'b10110111: b<= 8'h20;
8'b10111000: b<= 8'h9a;
8'b10111001: b<= 8'hdb;
8'b10111010: b<= 8'hc0;
8'b10111011: b<= 8'hfe;
8'b10111100: b<= 8'h78;
8'b10111101: b<= 8'hcd;
8'b10111110: b<= 8'h5a;
8'b10111111: b<= 8'hf4;
8'b11000000: b<= 8'h1f;
8'b11000001: b<= 8'hdd;
8'b11000010: b<= 8'ha8;
8'b11000011: b<= 8'h33;
8'b11000100: b<= 8'h88;
8'b11000101: b<= 8'h07;
8'b11000110: b<= 8'hc7;
8'b11000111: b<= 8'h31;
8'b11001000: b<= 8'hb1;
8'b11001001: b<= 8'h12;
8'b11001010: b<= 8'h10;
8'b11001011: b<= 8'h59;
8'b11001100: b<= 8'h27;
8'b11001101: b<= 8'h80;
8'b11001110: b<= 8'hec;
8'b11001111: b<= 8'h5f;
8'b11010000: b<= 8'h60;
8'b11010001: b<= 8'h51;
8'b11010010: b<= 8'h7f;
8'b11010011: b<= 8'ha9;
8'b11010100: b<= 8'h19;
8'b11010101: b<= 8'hb5;
8'b11010110: b<= 8'h4a;
8'b11010111: b<= 8'h0d;
8'b11011000: b<= 8'h2d;
8'b11011001: b<= 8'he5;
8'b11011010: b<= 8'h7a;
8'b11011011: b<= 8'h9f;
8'b11011100: b<= 8'h93;
8'b11011101: b<= 8'hc9;
8'b11011110: b<= 8'h9c;
8'b11011111: b<= 8'hef;
8'b11100000: b<= 8'ha0;
8'b11100001: b<= 8'he0;
8'b11100010: b<= 8'h3b;
8'b11100011: b<= 8'h4d;
8'b11100100: b<= 8'hae;
8'b11100101: b<= 8'h2a;
8'b11100110: b<= 8'hf5;
8'b11100111: b<= 8'hb0;
8'b11101000: b<= 8'hc8;
8'b11101001: b<= 8'heb;
8'b11101010: b<= 8'hbb;
8'b11101011: b<= 8'h3c;
8'b11101100: b<= 8'h83;
8'b11101101: b<= 8'h53;
8'b11101110: b<= 8'h99;
8'b11101111: b<= 8'h61;
8'b11110000: b<= 8'h17;
8'b11110001: b<= 8'h2b;
8'b11110010: b<= 8'h04;
8'b11110011: b<= 8'h7e;
8'b11110100: b<= 8'hba;
8'b11110101: b<= 8'h77;
8'b11110110: b<= 8'hd6;
8'b11110111: b<= 8'h26;
8'b11111000: b<= 8'he1;
8'b11111001: b<= 8'h69;
8'b11111010: b<= 8'h14;
8'b11111011: b<= 8'h63;
8'b11111100: b<= 8'h55;
8'b11111101: b<= 8'h21;
8'b11111110: b<= 8'h0c;
8'b11111111: b<= 8'h7d;
default: b<= 8'h00;
endcase
end
endmodule
eround

//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 15:19:45 02/11/2016
// Design Name:
// Module Name: eround
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module eround(roundin,key,rcon,roundkeyout,roundout);
input [127:0] roundin;
input[127:0] key;
input[7:0] rcon;
output[127:0] roundkeyout;
output [127:0] roundout;
wire [127 :0]preout,subout,shiftout,mixcolout,keyout;
// q1 (initial AddRoundKey) is assumed to match the instantiation in einv_round.
preroundoperaton q1(roundin,key,preout);
subbyte_p q2(preout,subout);
shiftrows q3(subout,shiftout);
mixcoloums q4(shiftout,mixcolout);
key1 q5(key,rcon,keyout);
assign roundout=mixcolout^keyout;
assign roundkeyout=keyout;
endmodule
eround1
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 15:20:59 02/11/2016
// Design Name:
// Module Name: eround1
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module eround1(roundin,key,rcon,roundkeyout,roundout);
input[127:0] key;
input[7:0] rcon;
subbyte_p q2(roundin,subout);
endmodule
eround2
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 15:22:22 02/11/2016
// Design Name:
// Module Name: eround2
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
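module eround2(roundin,key,rcon,roundkeyout,roundout);
input [127:0] roundin;
input[127:0] key;
input[7:0] rcon;
output [127:0] roundkeyout,roundout;
wire [127 :0]preout,subout,shiftout,keyout;
// Final round has no MixColumns; stages q3/q5 are assumed to mirror eround.
subbyte_p q2(roundin,subout);
shiftrows q3(subout,shiftout);
key1 q5(key,rcon,keyout);
assign roundout=shiftout^keyout;
assign roundkeyout=keyout;
endmodule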
esbox
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 15:25:24 02/11/2016
// Design Name:
// Module Name: esbox
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module esbox(a, b);
input [7:0] a;
output reg [7:0] b;
always@(*)
begin
case (a)
8'b00000000:b<=8'h63;
8'b00000001:b<=8'h7c;
8'b00000010:b<=8'h77;
8'b00000011:b<=8'h7b;
8'b00000100:b<=8'hf2;
8'b00000101:b<=8'h6b;
8'b00000110:b<=8'h6f;
8'b00000111:b<=8'hc5;
8'b00001000:b<=8'h30;
8'b00001001:b<=8'h01;
8'b00001010:b<=8'h67;
8'b00001011:b<=8'h2b;
8'b00001100:b<=8'hfe;
8'b00001101:b<=8'hd7;
8'b00001110:b<=8'hab;
8'b00001111:b<=8'h76;
8'b00010000:b<=8'hca;
8'b00010001:b<=8'h82;
8'b00010010:b<=8'hc9;
8'b00010011:b<=8'h7d;
8'b00010100:b<=8'hfa;
8'b00010101:b<=8'h59;
8'b00010110:b<=8'h47;
8'b00010111:b<=8'hf0;
8'b00011000:b<=8'had;
8'b00011001:b<=8'hd4;
8'b00011010:b<=8'ha2;
8'b00011011:b<=8'haf;
8'b00011100:b<=8'h9c;
8'b00011101:b<=8'ha4;
8'b00011110:b<=8'h72;
8'b00011111:b<=8'hc0;
8'b00100000:b<=8'hb7;
8'b00100001:b<=8'hfd;
8'b00100010:b<=8'h93;
8'b00100011:b<=8'h26;
8'b00100100:b<=8'h36;
8'b00100101:b<=8'h3f;
8'b00100110:b<=8'hf7;
8'b00100111:b<=8'hcc;
8'b00101000:b<=8'h34;
8'b00101001:b<=8'ha5;
8'b00101010:b<=8'he5;
8'b00101011:b<=8'hf1;
8'b00101100:b<=8'h71;
8'b00101101:b<=8'hd8;
8'b00101110:b<=8'h31;
8'b00101111:b<=8'h15;
8'b00110000:b<=8'h04;
8'b00110001:b<=8'hc7;
8'b00110010:b<=8'h23;
8'b00110011:b<=8'hc3;
8'b00110100:b<=8'h18;
8'b00110101:b<=8'h96;
8'b00110110:b<=8'h05;
8'b00110111:b<=8'h9a;
8'b00111000:b<=8'h07;
8'b00111001:b<=8'h12;
8'b00111010:b<=8'h80;
8'b00111011:b<=8'he2;
8'b00111100:b<=8'heb;
8'b00111101:b<=8'h27;
8'b00111110:b<=8'hb2;
8'b00111111:b<=8'h75;
8'b01000000:b<=8'h09;
8'b01000001:b<=8'h83;
8'b01000010:b<=8'h2c;
8'b01000011:b<=8'h1a;
8'b01000100:b<=8'h1b;
8'b01000101:b<=8'h6e;
8'b01000110:b<=8'h5a;
8'b01000111:b<=8'ha0;
8'b01001000:b<=8'h52;
8'b01001001:b<=8'h3b;
8'b01001010:b<=8'hd6;
8'b01001011:b<=8'hb3;
8'b01001100:b<=8'h29;
8'b01001101:b<=8'he3;
8'b01001110:b<=8'h2f;
8'b01001111:b<=8'h84;
8'b01010000:b<=8'h53;
8'b01010001:b<=8'hd1;
8'b01010010:b<=8'h00;
8'b01010011:b<=8'hed;
8'b01010100:b<=8'h20;
8'b01010101:b<=8'hfc;
8'b01010110:b<=8'hb1;
8'b01010111:b<=8'h5b;
8'b01011000:b<=8'h6a;
8'b01011001:b<=8'hcb;
8'b01011010:b<=8'hbe;
8'b01011011:b<=8'h39;
8'b01011100:b<=8'h4a;
8'b01011101:b<=8'h4c;
8'b01011110:b<=8'h58;
8'b01011111:b<=8'hcf;
8'b01100000:b<=8'hd0;
8'b01100001:b<=8'hef;
8'b01100010:b<=8'haa;
8'b01100011:b<=8'hfb;
8'b01100100:b<=8'h43;
8'b01100101:b<=8'h4d;
8'b01100110:b<=8'h33;
8'b01100111:b<=8'h85;
8'b01101000:b<=8'h45;
8'b01101001:b<=8'hf9;
8'b01101010:b<=8'h02;
8'b01101011:b<=8'h7f;
8'b01101100:b<=8'h50;
8'b01101101:b<=8'h3c;
8'b01101110:b<=8'h9f;
8'b01101111:b<=8'ha8;
8'b01110000:b<=8'h51;
8'b01110001:b<=8'ha3;
8'b01110010:b<=8'h40;
8'b01110011:b<=8'h8f;
8'b01110100:b<=8'h92;
8'b01110101:b<=8'h9d;
8'b01110110:b<=8'h38;
8'b01110111:b<=8'hf5;
8'b01111000:b<=8'hbc;
8'b01111001:b<=8'hb6;
8'b01111010:b<=8'hda;
8'b01111011:b<=8'h21;
8'b01111100:b<=8'h10;
8'b01111101:b<=8'hff;
8'b01111110:b<=8'hf3;
8'b01111111:b<=8'hd2;
8'b10000000:b<=8'hcd;
8'b10000001:b<=8'h0c;
8'b10000010:b<=8'h13;
8'b10000011:b<=8'hec;
8'b10000100:b<=8'h5f;
8'b10000101:b<=8'h97;
8'b10000110:b<=8'h44;
8'b10000111:b<=8'h17;
8'b10001000:b<=8'hc4;
8'b10001001:b<=8'ha7;
8'b10001010:b<=8'h7e;
8'b10001011:b<=8'h3d;
8'b10001100:b<=8'h64;
8'b10001101:b<=8'h5d;
8'b10001110:b<=8'h19;
8'b10001111:b<=8'h73;
8'b10010000:b<=8'h60;
8'b10010001:b<=8'h81;
8'b10010010:b<=8'h4f;
8'b10010011:b<=8'hdc;
8'b10010100:b<=8'h22;
8'b10010101:b<=8'h2a;
8'b10010110:b<=8'h90;
8'b10010111:b<=8'h88;
8'b10011000:b<=8'h46;
8'b10011001:b<=8'hee;
8'b10011010:b<=8'hb8;
8'b10011011:b<=8'h14;
8'b10011100:b<=8'hde;
8'b10011101:b<=8'h5e;
8'b10011110:b<=8'h0b;
8'b10011111:b<=8'hdb;
8'b10100000:b<=8'he0;
8'b10100001:b<=8'h32;
8'b10100010:b<=8'h3a;
8'b10100011:b<=8'h0a;
8'b10100100:b<=8'h49;
8'b10100101:b<=8'h06;
8'b10100110:b<=8'h24;
8'b10100111:b<=8'h5c;
8'b10101000:b<=8'hc2;
8'b10101001:b<=8'hd3;
8'b10101010:b<=8'hac;
8'b10101011:b<=8'h62;
8'b10101100:b<=8'h91;
8'b10101101:b<=8'h95;
8'b10101110:b<=8'he4;
8'b10101111:b<=8'h79;
8'b10110000:b<=8'he7;
8'b10110001:b<=8'hc8;
8'b10110010:b<=8'h37;
8'b10110011:b<=8'h6d;
8'b10110100:b<=8'h8d;
8'b10110101:b<=8'hd5;
8'b10110110:b<=8'h4e;
8'b10110111:b<=8'ha9;
8'b10111000:b<=8'h6c;
8'b10111001:b<=8'h56;
8'b10111010:b<=8'hf4;
8'b10111011:b<=8'hea;
8'b10111100:b<=8'h65;
8'b10111101:b<=8'h7a;
8'b10111110:b<=8'hae;
8'b10111111:b<=8'h08;
8'b11000000:b<=8'hba;
8'b11000001:b<=8'h78;
8'b11000010:b<=8'h25;
8'b11000011:b<=8'h2e;
8'b11000100:b<=8'h1c;
8'b11000101:b<=8'ha6;
8'b11000110:b<=8'hb4;
8'b11000111:b<=8'hc6;
8'b11001000:b<=8'he8;
8'b11001001:b<=8'hdd;
8'b11001010:b<=8'h74;
8'b11001011:b<=8'h1f;
8'b11001100:b<=8'h4b;
8'b11001101:b<=8'hbd;
8'b11001110:b<=8'h8b;
8'b11001111:b<=8'h8a;
8'b11010000:b<=8'h70;
8'b11010001:b<=8'h3e;
8'b11010010:b<=8'hb5;
8'b11010011:b<=8'h66;
8'b11010100:b<=8'h48;
8'b11010101:b<=8'h03;
8'b11010110:b<=8'hf6;
8'b11010111:b<=8'h0e;
8'b11011000:b<=8'h61;
8'b11011001:b<=8'h35;
8'b11011010:b<=8'h57;
8'b11011011:b<=8'hb9;
8'b11011100:b<=8'h86;
8'b11011101:b<=8'hc1;
8'b11011110:b<=8'h1d;
8'b11011111:b<=8'h9e;
8'b11100000:b<=8'he1;
8'b11100001:b<=8'hf8;
8'b11100010:b<=8'h98;
8'b11100011:b<=8'h11;
8'b11100100:b<=8'h69;
8'b11100101:b<=8'hd9;
8'b11100110:b<=8'h8e;
8'b11100111:b<=8'h94;
8'b11101000:b<=8'h9b;
8'b11101001:b<=8'h1e;
8'b11101010:b<=8'h87;
8'b11101011:b<=8'he9;
8'b11101100:b<=8'hce;
8'b11101101:b<=8'h55;
8'b11101110:b<=8'h28;
8'b11101111:b<=8'hdf;
8'b11110000:b<=8'h8c;
8'b11110001:b<=8'ha1;
8'b11110010:b<=8'h89;
8'b11110011:b<=8'h0d;
8'b11110100:b<=8'hbf;
8'b11110101:b<=8'he6;
8'b11110110:b<=8'h42;
8'b11110111:b<=8'h68;
8'b11111000:b<=8'h41;
8'b11111001:b<=8'h99;
8'b11111010:b<=8'h2d;
8'b11111011:b<=8'h0f;
8'b11111100:b<=8'hb0;
8'b11111101:b<=8'h54;
8'b11111110:b<=8'hbb;
8'b11111111:b<=8'h16;
default b <= 8'h00;

endcase
end
endmodule
4bitxor
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 00:30:43 02/03/2015
// Design Name:
// Module Name: fourbitxor
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module fourbitxor(
input [3:0] a,
input [3:0] b,
output [3:0] c
);
assign c= a^b;
endmodule
gf_2
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 00:50:23 02/03/2015
// Design Name:
// Module Name: gf_2
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module gf_2(
input [1:0] a,
output [1:0] c
);
assign c[1]=a[1]^a[0];
assign c[0]=a[1];
endmodule
gf_2_2
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 00:55:36 02/03/2015
// Design Name:
// Module Name: gf_2_2
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module gf_2_2(
input [1:0] a,b,
output [1:0] c
);
wire a1,b1,a2,b2,ab1,ab2,b3;
assign a1=a[0] ^ a[1];

assign b1=b[0] ^ b[1];
assign a2=a[1] & b[1];
assign b2=a[0] & b[0];
assign ab1=a1 & b1;
assign ab2=ab1 ^ b2;
assign b3=b2^a2;
assign c={ab2,b3};
endmodule
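The gate network in gf_2_2 can be read as multiplication in GF(2^2): two polynomials of degree one are multiplied and the result is reduced with x^2 = x + 1. The testbench below is a sketch that compares the gate form with a schoolbook reference for all 16 operand pairs. The function names mul_gates and mul_ref are introduced for illustration, and the choice of x^2 + x + 1 as the reduction polynomial is inferred from the gate equations rather than stated in the listing.

```verilog
// Self-checking sketch: gate-level GF(2^2) product vs. polynomial reference.
module gf22_check_tb;
    // Gate-level form, following the assigns in module gf_2_2.
    function [1:0] mul_gates(input [1:0] a, input [1:0] b);
        reg hh, ll, mm;
        begin
            hh = a[1] & b[1];                     // a2 in the listing
            ll = a[0] & b[0];                     // b2 in the listing
            mm = (a[0] ^ a[1]) & (b[0] ^ b[1]);   // ab1 in the listing
            mul_gates = {mm ^ ll, ll ^ hh};       // {ab2, b3}
        end
    endfunction

    // Reference: schoolbook product, then reduce x^2 to x + 1.
    function [1:0] mul_ref(input [1:0] a, input [1:0] b);
        reg [2:0] p;
        begin
            p = 3'b000;
            if (b[0]) p = p ^ {1'b0, a};
            if (b[1]) p = p ^ {a, 1'b0};
            if (p[2]) p = p ^ 3'b111;             // subtract x^2 + x + 1
            mul_ref = p[1:0];
        end
    endfunction

    reg [4:0] n;
    integer errors;
    initial begin
        errors = 0;
        // All operand pairs: gate form must equal the reference.
        for (n = 0; n < 16; n = n + 1)
            if (mul_gates(n[3:2], n[1:0]) !== mul_ref(n[3:2], n[1:0]))
                errors = errors + 1;
        // 2'b01 acts as the multiplicative identity.
        for (n = 0; n < 4; n = n + 1)
            if (mul_gates(n[1:0], 2'b01) !== n[1:0])
                errors = errors + 1;
        $display("mismatches: %0d", errors);
        $finish;
    end
endmodule
```

The gate form needs three AND gates instead of the four of the schoolbook product, because the middle term is obtained from (a1 ^ a0)(b1 ^ b0) and the two outer products. The same trick is applied one level up in gf_4, which builds the GF(2^4) multiplier from three gf_2_2 instances.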
gf_4
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 01:02:55 02/03/2015
// Design Name:
// Module Name: gf_4
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module gf_4(
input [3:0] a,b,
output [3:0] c
);
wire[1:0]a1,b1,aa,bb,ab,c1;
gf_2_2 q1(a[3 : 2],b[3 : 2],a1);

gf_2_2 q2(a[1 : 0],b[1 : 0],b1);
assign aa=a[3 : 2] ^ a[1 : 0];
assign bb=b[3 : 2] ^ b[1 : 0];
gf_2_2 q3(aa,bb,ab);
assign c[3 : 2]=ab ^ b1;
gf_2 q4(a1,c1);
assign c[1 : 0]=c1 ^ b1;
endmodule
inv_affine
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 10:36:04 02/03/2015
// Design Name:
// Module Name: inv_affine
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module inv_affine(
input [7:0] a,
output [7:0] c
);
wire [7:0] c1;
inverse_isomapping d1(a,c1);
affinetransform d2(c1,c);
endmodule
inv_affine_trans
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 21:11:07 02/03/2015
// Design Name:
// Module Name: inv_affine_trans
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module inv_affine_trans(
input [7:0] a,
output [7:0] e
);
assign e[7] = a[6] ^ a[4] ^ a[1] ^ 1'b0;
assign e[6] = a[5] ^ a[3] ^ a[0] ^ 1'b0;
assign e[5] = a[7] ^ a[4] ^ a[2] ^ 1'b0;
assign e[4] = a[6] ^ a[3] ^ a[1] ^ 1'b0;
assign e[3] = a[5] ^ a[2] ^ a[0] ^ 1'b0;
assign e[2] = a[7] ^ a[4] ^ a[1] ^ 1'b1;
assign e[1] = a[6] ^ a[3] ^ a[0] ^ 1'b0;
assign e[0] = a[7] ^ a[5] ^ a[2] ^ 1'b1;
endmodule
inv_round
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 11:41:10 02/12/2016
// Design Name:
// Module Name: inv_round
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
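module inv_round(roundin,key,key1,roundout);
input [127:0] roundin;
input[127:0] key,key1;
output [127:0] roundout;
wire [127 :0]preout,subout,shiftout,mixcolout;
// Ports and stages q1, q3, q4 are assumed to mirror einv_round.
preroundoperaton q1(roundin,key,preout);
inv_subbyte q2(preout,subout);
inv_shiftrow q3(subout,shiftout);
invmix q4(shiftout,mixcolout);
assign roundout=mixcolout^key1;
endmodule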
inv_round1
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 11:44:50 02/12/2016
// Design Name:
// Module Name: inv_round1
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
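module inv_round1(roundin,key,roundout);
input [127:0] roundin;
input[127:0] key;
output [127:0] roundout;
wire [127 :0]subout,shiftout,mixcolout;
// Ports and stages q3/q4 are assumed to mirror einv_round.
inv_subbyte q2(roundin,subout);
inv_shiftrow q3(subout,shiftout);
invmix q4(shiftout,mixcolout);
assign roundout=mixcolout^key;
endmodule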
inv_round3
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 11:46:14 02/12/2016
// Design Name:
// Module Name: inv_round3
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
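module inv_round3(roundin,key,roundout);
input[127:0] roundin;
input[127:0] key;
output[127:0] roundout;
wire [127 :0]subout,shiftout;
// NOTE: q3 and the roundin/roundout declarations are missing from the listing; reconstructed from the wire names (assumed).
inv_subbyte q2(roundin,subout);
inv_shiftrow q3(subout,shiftout);
assign roundout=shiftout^key;
endmodule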
inv_sbox
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 12:47:21 08/24/2015
// Design Name:
// Module Name: inv_sbox
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module inv_sbox(
input [7:0] a,
output [7:0] b
);
wire [3 : 0]is1,is2,x,sq1,sc1,fm1,x1,x2,ifm1,fm2,fm3;
wire [7 : 0]is0;
wire [7 : 0]f,f1 ;
invaffine_isomarphic sb0 (a,is0);
assign is1 = is0[7 : 4];

assign is2 = is0[3: 0];
fourbitxor sb1 (is2,is1,x);

squarer sb2 (is1,sq1);
scaler sb3 (sq1,sc1);
gf_4 sb4 (is2,x,fm1);
fourbitxor sb5 (fm1,sc1,x1);
inversegf_4 sb6 (x1,ifm1);
gf_4 sb7 (ifm1,is1,fm2);
gf_4 sb8 (x,ifm1,fm3);
assign f = {fm2,fm3};
inverse_isomapping sb9 (f,b);
endmodule
inv_shiftrow
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 18:04:23 08/21/2012
// Design Name:
// Module Name: inv_shiftrow
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module inv_shiftrow(a,c);
input [127:0]a;
output [127:0]c;
wire [7:0] p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15;
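// p3, p10, p12 and p13 were dropped from the listing; restored from the pattern p(i) = a[127-8i : 120-8i].
assign p15=a[7:0];
assign p11=a[39:32];
assign p7=a[71:64];
assign p3=a[103:96];
assign p14=a[15:8];
assign p10=a[47:40];
assign p6=a[79:72];
assign p2=a[111:104];
assign p13=a[23:16];
assign p9=a[55:48];
assign p5=a[87:80];
assign p1=a[119:112];
assign p12=a[31:24];
assign p8=a[63:56];
assign p4=a[95:88];
assign p0=a[127:120];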
assign c={p0,p13,p10,p7,p4,p1,p14,p11,p8,p5,p2,p15,p12,p9,p6,p3};
endmodule
inv_subbyte
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 16:35:10 08/21/2012
// Design Name:
// Module Name: inv_subbyte
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module inv_subbyte(invsubin,invsubout);
input [127:0] invsubin;
output [127:0]invsubout;
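wire [7:0] a0,a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10
,p11,p12,p13,p14,p15;
// Byte slices and instances w11..w16 are missing from the listing; reconstructed so that a0 is the least significant byte, matching invsubout.
assign a0=invsubin[7:0], a1=invsubin[15:8], a2=invsubin[23:16], a3=invsubin[31:24];
assign a4=invsubin[39:32], a5=invsubin[47:40], a6=invsubin[55:48], a7=invsubin[63:56];
assign a8=invsubin[71:64], a9=invsubin[79:72], a10=invsubin[87:80], a11=invsubin[95:88];
assign a12=invsubin[103:96], a13=invsubin[111:104], a14=invsubin[119:112], a15=invsubin[127:120];
inv_sbox w1(a0,p0);
inv_sbox w2(a1,p1);
inv_sbox w3(a2,p2);
inv_sbox w4(a3,p3);
inv_sbox w5(a4,p4);
inv_sbox w6(a5,p5);
inv_sbox w7(a6,p6);
inv_sbox w8(a7,p7);
inv_sbox w9(a8,p8);
inv_sbox w10(a9,p9);
inv_sbox w11(a10,p10);
inv_sbox w12(a11,p11);
inv_sbox w13(a12,p12);
inv_sbox w14(a13,p13);
inv_sbox w15(a14,p14);
inv_sbox w16(a15,p15);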
assign invsubout={p15,p14,p13,p12,p11,p10,p9,p8,p7,p6,p5,p4,p3,p2,p1,p0};
endmodule
invaffine_isomarphic
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 07:11:14 02/05/2015
// Design Name:
// Module Name: invaffine_isomarphic
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module invaffine_isomarphic(
input [7:0] a,
output [7:0] c
);
wire [7:0] c1;

inv_affine_trans w1(a,c1);
isomarphic_mapping w2(c1,c);
endmodule
invcoltransform
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 16:47:11 08/21/2012
// Design Name:
// Module Name: invcoltransform
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module invcoltransform(a,b,c,d, f);
input [7:0] a;
input [7:0] b;
input [7:0] c;
input [7:0] d;
output reg [7:0] f;
reg [7:0]x0,x1,x2,x3;
reg [7:0]y0,y1,y2,y3;
reg [7:0]z0,z1,z2,z3;
reg [7:0]sub0,sub1,sub2,sub3;
always@(*)
begin
if(a[7] == 1)
begin
x0 = {a[6 : 0] , 1'b0} ^ 8'h1b;
end
else
begin
x0 = {a[6 : 0] , 1'b0};
end
if(x0[7] ==1)
begin
y0 = {x0[6 : 0] , 1'b0} ^ 8'h1b;
end
else
begin
y0 = {x0[6 : 0] , 1'b0};
end
if(y0[7] ==1)
begin
z0 = {y0[6 : 0] , 1'b0} ^ 8'h1b;
end
else
begin
z0 = {y0[6 : 0] , 1'b0};
end
sub0 = (x0 ^ y0 ^ z0);
if(b[7] ==1)
begin
x1 = {b[6 : 0] , 1'b0} ^ 8'h1b;
end
else
begin
x1 = {b[6 : 0] , 1'b0};
end
if(x1[7] ==1)
begin
y1 = {x1[6 : 0] , 1'b0}^ 8'h1b;
end
else
begin
y1 = {x1[6 : 0] , 1'b0};
end
if(y1[7] == 1)
begin
z1 = {y1[6 : 0] , 1'b0} ^ 8'h1b;
end
else
begin
z1 = {y1[6 : 0] , 1'b0};
end
sub1 = (x1 ^ z1 ^ b);
if(c[7] ==1)
begin
x2 = {c[6 : 0] , 1'b0} ^ 8'h1b;
end
else
begin
x2 = {c[6 : 0] , 1'b0};
end
if(x2[7] ==1)
begin
y2 = {x2[6 : 0] , 1'b0} ^ 8'h1b;
end
else begin
y2 = {x2[6 : 0] , 1'b0};
end
if(y2[7] ==1)
begin
z2 = {y2[6 : 0] , 1'b0} ^ 8'h1b;
end
else begin
z2 = {y2[6 : 0] , 1'b0};
end
sub2 = (y2 ^ z2 ^ c);
if(d[7] ==1)
begin
x3 = {d[6 : 0] , 1'b0} ^ 8'h1b;
end
else begin
x3 = {d[6 : 0] , 1'b0};
end
if(x3[7] == 1)
begin
y3 = {x3[6 : 0] , 1'b0} ^ 8'h1b;
end
else begin
y3 = {x3[6 : 0] , 1'b0};
end
if(y3[7] == 1)
begin
z3 = {y3[6 : 0] , 1'b0} ^ 8'h1b;
end
else begin
z3 = {y3[6 : 0] , 1'b0};
end
sub3 = (z3 ^ d);
f = sub0 ^ sub1 ^ sub2 ^ sub3;

end
endmodule
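The module above computes one output byte of InvMixColumns as 0e·a ^ 0b·b ^ 0d·c ^ 09·d in GF(2^8). The testbench below is a minimal sketch for checking that claim and is not part of the original project; the name tb_invcoltransform is hypothetical. It uses the widely quoted MixColumns test column db 13 53 45, whose mixed form is 8e 4d a1 bc, and feeds the mixed column back with the input rotated by one byte for the second output row.

```verilog
// Hypothetical testbench (not from the original project).
// Compile together with invcoltransform.
`timescale 1ns/1ps
module tb_invcoltransform;
  reg  [7:0] a, b, c, d;
  wire [7:0] f;

  invcoltransform dut(a, b, c, d, f);

  initial begin
    // Row 0: 0e*8e ^ 0b*4d ^ 0d*a1 ^ 09*bc should recover db.
    a = 8'h8e; b = 8'h4d; c = 8'ha1; d = 8'hbc; #1;
    if (f !== 8'hdb) $display("FAIL row 0: got %h, expected db", f);

    // Row 1: same column, inputs rotated by one byte, should recover 13.
    a = 8'h4d; b = 8'ha1; c = 8'hbc; d = 8'h8e; #1;
    if (f !== 8'h13) $display("FAIL row 1: got %h, expected 13", f);

    $display("invcoltransform check finished");
    $finish;
  end
endmodule
```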
inverse_isomapping
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 02:06:02 02/03/2015
// Design Name:
// Module Name: inverse_isomapping
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module inverse_isomapping(
input [7:0] a,
output [7:0] c
);
assign c[7] =a[7] ^ a[6] ^ a[5] ^ a[1];
assign c[6] = a[6] ^ a[2];
assign c[5] = a[6] ^ a[5] ^ a[1];
assign c[4] = a[6] ^ a[5] ^ a[4] ^ a[2] ^ a[1];
assign c[3] = a[5] ^ a[4] ^ a[3] ^ a[2] ^ a[1];
assign c[2] = a[7] ^ a[4] ^ a[3] ^ a[2] ^ a[1];
assign c[1] = a[5] ^ a[4];
assign c[0] = a[6] ^ a[5] ^ a[4] ^ a[2] ^ a[0];
endmodule
inversegf_4
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 01:17:20 02/03/2015
// Design Name:
// Module Name: inversegf_4
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module inversegf_4(
input [3:0] a,
output [3:0]q);
assign q[3]=(a[3] ^ (a[3] & a[2] & a[1]) ^ (a[3] & a[0])^ a[2]);
assign q[2]=(a[3] & a[2] & a[1]) ^ (a[3] & a[2] & a[0]) ^(a[3] & a[0]) ^ a[2] ^ (a[2] &
a[1]) ;
assign q[1]=(a[3] & a[2] & a[1]) ^ (a[3] & a[1] & a[0]) ^(a[2] & a[0]) ^ (a[2] ^ a[3] ^
a[1]);
assign q[0]=(a[3] & a[2] & a[1]) ^ (a[3] & a[2] & a[0]) ^(a[3] & a[1]) ^ (a[3] & a[1] &
a[0]) ^ ((a[3] & a[0]) ^ a[2]) ^ (a[2] & a[1]) ^ (a[2] & a[1] & a[0]) ^ a[1] ^ a[0];
endmodule
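Multiplicative inversion is its own inverse, so applying inversegf_4 twice should return the input for every 4-bit value, with 0 mapping to 0. The testbench below is a minimal sketch of that property check and is not part of the original project; the name tb_inversegf_4 is hypothetical. It does not prove the module is the correct inverse for the chosen field representation, only that it is an involution.

```verilog
// Hypothetical testbench (not from the original project).
// Compile together with inversegf_4.
`timescale 1ns/1ps
module tb_inversegf_4;
  reg  [3:0] a;
  wire [3:0] q, back;
  integer i, errors;

  inversegf_4 inv1(a, q);     // q    = inverse of a
  inversegf_4 inv2(q, back);  // back = inverse of q, expected to equal a

  initial begin
    errors = 0;
    for (i = 0; i < 16; i = i + 1) begin
      a = i[3:0]; #1;
      if (back !== a) begin
        errors = errors + 1;
        $display("FAIL: a=%h q=%h back=%h", a, q, back);
      end
    end
    $display("inversegf_4 involution check finished, errors = %0d", errors);
    $finish;
  end
endmodule
```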
invmix
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 17:59:33 08/21/2012
// Design Name:
// Module Name: invmix
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
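module invmix(a,invmixout);
input [127:0] a;
output [127:0] invmixout;
genvar i; // Only instance z1 survives in the listing; the loop rebuilds all 16 bytes with the input rotation used in mixcoloums.
generate
for (i = 0; i < 4; i = i + 1) begin : col
wire [31:0] w = a[127-32*i -: 32];
invcoltransform z0(w[31:24],w[23:16],w[15:8],w[7:0],invmixout[127-32*i -: 8]);
invcoltransform z1(w[23:16],w[15:8],w[7:0],w[31:24],invmixout[119-32*i -: 8]);
invcoltransform z2(w[15:8],w[7:0],w[31:24],w[23:16],invmixout[111-32*i -: 8]);
invcoltransform z3(w[7:0],w[31:24],w[23:16],w[15:8],invmixout[103-32*i -: 8]);
end
endgenerate
endmodule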
isomarphic_mapping
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 17:15:28 02/02/2015
// Design Name:
// Module Name: isomarphic_mapping
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
///////////////////////////////////////////
module isomarphic_mapping(din,n);
input [7:0]din;
output [7:0] n;
assign n[7] = din[7] ^ din[5];
assign n[6] = din[7] ^ din[6] ^ din[4] ^ din[3] ^ din[2]^ din[1];
assign n[5] = din[7] ^ din[5] ^ din[3] ^ din[2];
assign n[4] = din[7] ^ din[5] ^ din[3] ^ din[2]^ din[1];
assign n[3] = din[7] ^ din[6] ^ din[2]^ din[1];
assign n[2] = din[7] ^ din[4] ^ din[3] ^ din[2]^ din[1];
assign n[1] = din[6] ^ din[4]^ din[1];
assign n[0] = din[6] ^ din[1]^ din[0];
endmodule
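isomarphic_mapping and inverse_isomapping are meant to be mutual inverses, mapping a byte into the composite-field representation and back. The testbench below is a minimal sketch that checks the round trip for all 256 byte values; it is not part of the original project and the name tb_iso_roundtrip is hypothetical.

```verilog
// Hypothetical testbench (not from the original project).
// Compile together with isomarphic_mapping and inverse_isomapping.
`timescale 1ns/1ps
module tb_iso_roundtrip;
  reg  [7:0] x;
  wire [7:0] mapped, back;
  integer i, errors;

  isomarphic_mapping fwd(x, mapped);     // GF(2^8) -> composite field
  inverse_isomapping inv(mapped, back);  // composite field -> GF(2^8)

  initial begin
    errors = 0;
    for (i = 0; i < 256; i = i + 1) begin
      x = i[7:0]; #1;
      if (back !== x) begin
        errors = errors + 1;
        $display("FAIL: x=%h mapped=%h back=%h", x, mapped, back);
      end
    end
    $display("isomorphism round-trip check finished, errors = %0d", errors);
    $finish;
  end
endmodule
```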
key
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 10:18:06 08/16/2012
// Design Name:
// Module Name: key
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module key(a,rcon,b);
input [127:0]a;
input [7:0]rcon;
output [127 :0]b;
wire [31:0]k0,k1,k2,k3;
wire [31:0]w0,w1,w2,w3;
wire[31:0]x0,x1,x2,x3;
wire [31:0]temp,rcon_xor;
wire [31:0] m,shift;
assign k3={a[31:24],a[23:16],a[15:8],a[7:0]};
assign k2={a[63:56],a[55:48],a[47:40],a[39:32]};
assign k1={a[95:88],a[87:80],a[79:72],a[71:64]};
assign k0={a[127:120],a[119:112],a[111:104],a[103:96]};
assign temp=k3;
assign w0=k0;
assign w1=k1;
assign w2=k2;
assign w3=k3;
assign shift={temp[23:16],temp[15:8],temp[7 :0],temp[31:24]};
sbox_1 z1(shift[7 :0],m[7 :0]);

sbox_1 z2(shift[15 :8],m[15 :8]);
sbox_1 z3(shift[23 :16],m[23 :16]);
sbox_1 z4(shift[31 :24],m[31 :24]);
assign rcon_xor={m[31:24]^rcon,m[23 :16],m[15 :8],m[7 :0]};
assign x0={rcon_xor[31:24] ^ w0[31:24],rcon_xor[23 :16] ^ w0[23 :16],rcon_xor[15 :8] ^ w0[15 :8],rcon_xor[7 :0] ^ w0[7 :0]};
assign x1={(x0[31:24] ^ w1[31:24]),(x0[23 :16] ^ w1[23 :16]),(x0[15 :8] ^ w1[15 :8]),(x0[7 :0] ^ w1[7 :0])};
assign x2={(x1[31:24] ^ w2[31:24]),(x1[23 :16] ^ w2[23 :16]),(x1[15 :8] ^ w2[15 :8]),(x1[7 :0] ^ w2[7 :0])};
assign x3={(x2[31:24] ^ w3[31:24]),(x2[23 :16] ^ w3[23 :16]),(x2[15 :8] ^ w3[15 :8]),(x2[7 :0] ^ w3[7 :0])};
assign b={x0,x1,x2,x3};
endmodule
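The key module above produces the next 128-bit round key from the previous one and a round constant. The testbench below is a minimal sketch that checks the first expansion step against the AES-128 key expansion example in FIPS-197 (cipher key 2b7e1516 28aed2a6 abf71588 09cf4f3c, first round key a0fafe17 88542cb1 23a33939 2a6c7605). It is not part of the original project, the name tb_key is hypothetical, and it assumes the whole design, including sbox_1 and its submodules, is compiled with it.

```verilog
// Hypothetical testbench (not from the original project).
// Compile together with key, sbox_1 and the modules sbox_1 depends on.
`timescale 1ns/1ps
module tb_key;
  reg  [127:0] a;
  reg  [7:0]   rcon;
  wire [127:0] b;

  key dut(a, rcon, b);

  initial begin
    // FIPS-197 AES-128 example cipher key, round constant 01.
    a    = 128'h2b7e151628aed2a6abf7158809cf4f3c;
    rcon = 8'h01;
    #1;
    if (b !== 128'ha0fafe1788542cb123a339392a6c7605)
      $display("FAIL: round key 1 = %h", b);
    $display("key expansion check finished");
    $finish;
  end
endmodule
```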
key1
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 15:29:33 02/11/2016
// Design Name:
// Module Name: key1
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module key1(a,rcon,b);
input [127:0]a;
input [7:0]rcon;
output [127 :0]b;
wire [31:0]k0,k1,k2,k3;
wire [31:0]w0,w1,w2,w3;
wire[31:0]x0,x1,x2,x3;
wire [31:0]temp,rcon_xor;
wire [31:0] m,shift;
assign k3={a[31:24],a[23:16],a[15:8],a[7:0]};
assign k2={a[63:56],a[55:48],a[47:40],a[39:32]};
assign k1={a[95:88],a[87:80],a[79:72],a[71:64]};
assign k0={a[127:120],a[119:112],a[111:104],a[103:96]};
assign temp=k3;
assign w0=k0;
assign w1=k1;
assign w2=k2;
assign w3=k3;
assign shift={temp[23:16],temp[15:8],temp[7 :0],temp[31:24]};
esbox z1(shift[7 :0],m[7 :0]);

esbox z2(shift[15 :8],m[15 :8]);
esbox z3(shift[23 :16],m[23 :16]);
esbox z4(shift[31 :24],m[31 :24]);
assign rcon_xor={m[31:24]^rcon,m[23 :16],m[15 :8],m[7 :0]};
assign x0={rcon_xor[31:24] ^ w0[31:24],rcon_xor[23 :16] ^ w0[23 :16],rcon_xor[15 :8] ^ w0[15 :8],rcon_xor[7 :0] ^ w0[7 :0]};
assign x1={(x0[31:24] ^ w1[31:24]),(x0[23 :16] ^ w1[23 :16]),(x0[15 :8] ^ w1[15 :8]),(x0[7 :0] ^ w1[7 :0])};
assign x2={(x1[31:24] ^ w2[31:24]),(x1[23 :16] ^ w2[23 :16]),(x1[15 :8] ^ w2[15 :8]),(x1[7 :0] ^ w2[7 :0])};
assign x3={(x2[31:24] ^ w3[31:24]),(x2[23 :16] ^ w3[23 :16]),(x2[15 :8] ^ w3[15 :8]),(x2[7 :0] ^ w3[7 :0])};
assign b={x0,x1,x2,x3};
endmodule
mix
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 13:20:18 03/19/2015
// Design Name:
// Module Name: mix
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module mix(a,b,c,d, f);
input [7:0] a;
input [7:0] b;
input [7:0] c;
input [7:0] d;
output reg [7:0] f;
reg [7:0]m,n;
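always@(*)
begin
// Blocking assignments are used because this block is purely combinational.
// m = 02*a in GF(2^8)
if(a[7] == 1)
begin
m = ({a[6 : 0] , 1'b0} ^ 8'b00011011);
end
else begin
m = {a[6 : 0] , 1'b0};
end
// n = 03*b = (02*b) ^ b in GF(2^8)
if(b[7] == 1)
begin
n = (({b[6 : 0] ,1'b0} ^ 8'b00011011) ^ b);
end
else begin
n = ({b[6 : 0] , 1'b0} ^ b);
end
f = (m ^ n) ^ (c ^ d);
end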
endmodule
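The mix module above computes one output byte of MixColumns as 02·a ^ 03·b ^ c ^ d in GF(2^8). The testbench below is a minimal sketch that checks it with the widely quoted test column db 13 53 45, whose first two mixed bytes are 8e and 4d. It is not part of the original project and the name tb_mix is hypothetical.

```verilog
// Hypothetical testbench (not from the original project).
// Compile together with mix.
`timescale 1ns/1ps
module tb_mix;
  reg  [7:0] a, b, c, d;
  wire [7:0] f;

  mix dut(a, b, c, d, f);

  initial begin
    // Row 0: 02*db ^ 03*13 ^ 53 ^ 45 should give 8e.
    a = 8'hdb; b = 8'h13; c = 8'h53; d = 8'h45; #1;
    if (f !== 8'h8e) $display("FAIL row 0: got %h, expected 8e", f);

    // Row 1: inputs rotated by one byte, 02*13 ^ 03*53 ^ 45 ^ db should give 4d.
    a = 8'h13; b = 8'h53; c = 8'h45; d = 8'hdb; #1;
    if (f !== 8'h4d) $display("FAIL row 1: got %h, expected 4d", f);

    $display("mix check finished");
    $finish;
  end
endmodule
```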
mixcolumns
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 15:28:32 08/13/2012
// Design Name:
// Module Name: mixcoloums
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module mixcoloums(a,mixout);
input [127:0] a;
output [127:0] mixout;

wire [7:0]p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15;
mix z1(a[127:120],a[119:112],a[111:104],a[103:96],p0);
mix z2(a[119:112],a[111:104],a[103:96],a[127:120],p1);
mix z3(a[111:104],a[103:96],a[127:120],a[119:112],p2);
mix z4(a[103:96],a[127:120],a[119:112],a[111:104],p3);
mix z5(a[95:88],a[87:80],a[79:72],a[71:64],p4);
mix z6(a[87:80],a[79:72],a[71:64],a[95:88],p5);
mix z7(a[79:72],a[71:64],a[95:88],a[87:80],p6);
mix z8(a[71:64],a[95:88],a[87:80],a[79:72],p7);
mix z9(a[63:56],a[55:48],a[47:40],a[39:32],p8);
mix z10(a[55:48],a[47:40],a[39:32],a[63:56],p9);
mix z11(a[47:40],a[39:32],a[63:56],a[55:48],p10);
mix z12(a[39:32],a[63:56],a[55:48],a[47:40],p11);
mix z13(a[31:24],a[23:16],a[15:8],a[7:0],p12);
mix z14(a[23:16],a[15:8],a[7:0],a[31:24],p13);
mix z15(a[15:8],a[7:0],a[31:24],a[23:16],p14);
mix z16(a[7:0],a[31:24],a[23:16],a[15:8],p15);
assign mixout={p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15};
endmodule
pdecryption
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 12:58:55 02/12/2016
// Design Name:
// Module Name: pdecryption
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module pdecryption(aesin,keyin,clk,rst,keyout,aesout);
input clk,rst;

wire [127:0] k1,k2,k3,k4,k5,k6,k7,k8,k9,k10;
key q1(keyin,8'h01,k1);
key q2(k1,8'h02,k2);
key q3(k2,8'h04,k3);
key q4(k3,8'h08,k4);
key q5(k4,8'h10,k5);
key q6(k5,8'h20,k6);
key q7(k6,8'h40,k7);
key q8(k7,8'h80,k8);
key q9(k8,8'h1b,k9);
key q10(k9,8'h36,k10);
inv_round u1(aesin,k10,invk1,r1);
inv_round1 u2(rr1,invk2,r2);
inv_round3 u10(rr9,keyin,rr10);
reg2 z10(rr10,clk,rst,aesout);
endmodule
preroundoperaton
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 10:13:15 08/16/2012
// Design Name:
// Module Name: preroundoperaton
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module preroundoperaton(a,b,c);
input [127:0] a;
input [127:0] b;
output [127:0] c;
assign c = a^b;
endmodule
reg2
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 12:25:23 02/12/2016
// Design Name:
// Module Name: reg2
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module reg2(
input [127:0] a,
input clk,input rst,
output reg [127:0] c
);
always@(posedge clk,posedge rst)
begin
if(rst==1)
begin
c<=128'h0;
end
else
begin
c<=a;
end
end
endmodule
reg3
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 06:00:27 02/05/2015
// Design Name:
// Module Name: reg3
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module reg3(
input [127:0] a,b,
input clk,input rst,
output reg [127:0] c,d
);
always@(posedge clk,posedge rst)
begin
if(rst==1)
begin
c<=128'h0;
d<=128'h0;
end
else
begin
c<=a;
d<=b;
end
end
endmodule
round
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 13:45:03 08/16/2012
// Design Name:
// Module Name: round
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module round(roundin,key,rcon,roundkeyout,roundout);
input[127:0] key;
input[7:0] rcon;
subbyte q2(preout,subout);
key q5(key,rcon,keyout);
endmodule
round1
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 17:45:30 08/16/2012
// Design Name:
// Module Name: round1
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module round1(roundin,key,rcon,roundkeyout,roundout);
input[127:0] key;
input[7:0] rcon;
subbyte q2(roundin,subout);
endmodule
round2
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 18:02:33 08/16/2012
// Design Name:
// Module Name: round2
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
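module round2(roundin,key,rcon,roundkeyout,roundout);
input[127:0] roundin;
input[127:0] key;
input[7:0] rcon;
output[127:0] roundkeyout,roundout;
wire [127 :0]preout,subout,shiftout,keyout;
// NOTE: q3, q5, the roundkeyout assign and the roundin/output declarations are missing from the listing; reconstructed from the wire names (assumed).
subbyte q2(roundin,subout);
shiftrows q3(subout,shiftout);
key q5(key,rcon,keyout);
assign roundkeyout=keyout;
assign roundout=shiftout^keyout;
endmodule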
sbox_1
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 11:35:13 02/03/2015
// Design Name:
// Module Name: sbox_1
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module sbox_1(
input [7:0] a,
output [7:0] b
);
wire [3 : 0]is1,is2,x,sq1,sc1,fm1,x1,x2,ifm1,fm2,fm3;
wire [7 : 0]is0;
wire [7 : 0]f,f1 ;
isomarphic_mapping sb0 (a,is0);
assign is1 = is0[7 : 4];

assign is2 = is0[3: 0];
fourbitxor sb1 (is2,is1,x);

squarer sb2 (is1,sq1);
scaler sb3 (sq1,sc1);
gf_4 sb4 (is2,x,fm1);
fourbitxor sb5 (fm1,sc1,x1);
inversegf_4 sb6 (x1,ifm1);
gf_4 sb7 (ifm1,is1,fm2);
gf_4 sb8 (x,ifm1,fm3);
assign f = {fm2,fm3};
inv_affine sb9 (f,b);
endmodule
scaler
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 00:42:18 02/03/2015
// Design Name:
// Module Name: scaler
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module scaler(
input [3:0] scalerin,
output [3:0] scalerout
);
assign scalerout[0] = scalerin[2];
assign scalerout[1] = scalerin[3];
assign scalerout[2] = scalerin[3] ^ scalerin[2] ^ scalerin[1] ^ scalerin[0];
assign scalerout[3] = scalerin[2] ^ scalerin[0];
endmodule
shiftrows
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 15:55:38 08/13/2012
// Design Name:
// Module Name: shiftrows
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module shiftrows(a,c);
input [127:0]a;
output [127:0]c;
wire [7:0] p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15;
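// p3, p10, p11, p12 and p13 were dropped from the listing; restored from the pattern p(i) = a[127-8i : 120-8i].
assign p15=a[7:0];
assign p11=a[39:32];
assign p7=a[71:64];
assign p3=a[103:96];
assign p14=a[15:8];
assign p10=a[47:40];
assign p6=a[79:72];
assign p2=a[111:104];
assign p13=a[23:16];
assign p9=a[55:48];
assign p5=a[87:80];
assign p1=a[119:112];
assign p12=a[31:24];
assign p8=a[63:56];
assign p4=a[95:88];
assign p0=a[127:120];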
assign c={p0,p5,p10,p15,p4,p9,p14,p3,p8,p13,p2,p7,p12,p1,p6,p11};
endmodule
squarer
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 00:33:59 02/03/2015
// Design Name:
// Module Name: squarer
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module squarer(
input [3:0] squarerin,
output [3:0] squarerout
);
assign squarerout[0] = (squarerin[3] ^ squarerin[1]) ^ squarerin[0];
assign squarerout[1] = squarerin[2] ^ squarerin[1];
assign squarerout[2] = squarerin[3] ^ squarerin[2];
assign squarerout[3] = squarerin[3];
endmodule
subbyte
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 16:24:15 08/13/2012
// Design Name:
// Module Name: subbyte
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module subbyte(subin,subout);
input [127:0] subin;
output [127:0]subout;
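wire [7:0] a0,a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10
,p11,p12,p13,p14,p15;
// Only the a0 slice survives in the listing; a1..a15 are reconstructed in the same byte order as subout.
assign a0=subin[7:0];
assign a1=subin[15:8];
assign a2=subin[23:16];
assign a3=subin[31:24];
assign a4=subin[39:32];
assign a5=subin[47:40];
assign a6=subin[55:48];
assign a7=subin[63:56];
assign a8=subin[71:64];
assign a9=subin[79:72];
assign a10=subin[87:80];
assign a11=subin[95:88];
assign a12=subin[103:96];
assign a13=subin[111:104];
assign a14=subin[119:112];
assign a15=subin[127:120];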
sbox_1 w1(a0,p0);
sbox_1 w2(a1,p1);
sbox_1 w3(a2,p2);
sbox_1 w4(a3,p3);
sbox_1 w5(a4,p4);
sbox_1 w6(a5,p5);
sbox_1 w7(a6,p6);
sbox_1 w8(a7,p7);
sbox_1 w9(a8,p8);
sbox_1 w10(a9,p9);
sbox_1 w11(a10,p10);
sbox_1 w12(a11,p11);
sbox_1 w13(a12,p12);
sbox_1 w14(a13,p13);
sbox_1 w15(a14,p14);
sbox_1 w16(a15,p15);
assign subout={p15,p14,p13,p12,p11,p10,p9,p8,p7,p6,p5,p4,p3,p2,p1,p0};
endmodule
subbyte_p
//////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 15:25:11 07/22/2016
// Design Name:
// Module Name: subbyte_p
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
//
//////////////////////////////////////////////////////////////////////////////////
module subbyte_p(
input [127:0] subin,
output [127:0]subout);
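wire [7:0] a0,a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15,p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10
,p11,p12,p13,p14,p15;
// The byte slices are missing from the listing; reconstructed so that a0 is the least significant byte, matching subout.
assign a0=subin[7:0];
assign a1=subin[15:8];
assign a2=subin[23:16];
assign a3=subin[31:24];
assign a4=subin[39:32];
assign a5=subin[47:40];
assign a6=subin[55:48];
assign a7=subin[63:56];
assign a8=subin[71:64];
assign a9=subin[79:72];
assign a10=subin[87:80];
assign a11=subin[95:88];
assign a12=subin[103:96];
assign a13=subin[111:104];
assign a14=subin[119:112];
assign a15=subin[127:120];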
esbox w1(a0,p0);
esbox w2(a1,p1);
esbox w3(a2,p2);
esbox w4(a3,p3);
esbox w5(a4,p4);
esbox w6(a5,p5);
esbox w7(a6,p6);
esbox w8(a7,p7);
esbox w9(a8,p8);
esbox w10(a9,p9);
esbox w11(a10,p10);
esbox w12(a11,p11);
esbox w13(a12,p12);
esbox w14(a13,p13);
esbox w15(a14,p14);
esbox w16(a15,p15);
assign subout={p15,p14,p13,p12,p11,p10,p9,p8,p7,p6,p5,p4,p3,p2,p1,p0};
endmodule
