
Designing the Custom 8-bit THB14 Microcontroller

An EGEE320 Design Project

Trace Hill
Team Honey Badgers
EGEE320, LSSU Eng.
Sault Ste. Marie, U.S.A.
thill10@lssu.edu

Eugene Peyerk
Team Honey Badgers
EGEE320, LSSU Eng.
Sault Ste. Marie, U.S.A.
epeyerk@lssu.edu

Blake Dansfield
Team Honey Badgers
EGEE320, LSSU Eng.
Sault Ste. Marie, U.S.A.
bdansfield@lssu.edu

Dylan Brown
Team Honey Badgers
EGEE320, LSSU Eng.
Sault Ste. Marie, U.S.A.
dbrown19@lssu.edu

Abstract: Team Honey Badgers designed, implemented, and validated a simple 8-bit microcontroller (MCU) for our EGEE320 design project. The MCU was described in VHDL using Altera Quartus II software and synthesized onto the Altera Cyclone II FPGAs of the Altera DE1 and DE2 development boards. The project was performed in three phases: Phase I entailed research and initial design, Phase II entailed the implementation of sub-units, and Phase III entailed the implementation of the Control Matrix of the system. The result of this project was a simple 8-bit RISC MCU with a Harvard memory architecture and 16 instructions capable of reading from and writing to memory, performing basic arithmetic operations, and executing simple conditional branches. Through this project, the members of Team Honey Badgers gained a stronger understanding of the digital design principles covered in EGEE320 as well as stronger teamwork skills.
Keywords: THB14, microcontroller, MCU, Control Matrix, Reduced Instruction Set, RISC, Harvard architecture, CPU, VHDL, Cyclone II FPGA, PLD

I. INTRODUCTION
Modern microcontrollers (MCUs) are a culmination of digital circuit design principles. Serving as the bridge between hardware and software, MCUs are used extensively in our world. As a result, the power, speed, and cost of MCUs evolve aggressively to keep up with ever-increasing software demands. A particular specialization in the Electrical and Computer Engineering field focuses on the design and improvement of MCU architecture, balancing factors such as power, speed, and cost to produce a vast number of variations suited for different computational purposes.
Due to the prevalence of MCUs and the relevance of their architecture to topics covered in our coursework, our team decided to design a simple MCU for our Digital Design final project. As required by the project definition, this MCU is described in VHDL and synthesized with Altera Quartus II onto an Altera Cyclone II Field Programmable Gate Array (FPGA). This paper provides an overview of the specifications, design, testing, and final capabilities of our MCU, called the THB14.

A. Assumptions
It is assumed that the reader has a basic understanding of both
synchronous and asynchronous digital design principles,
including, in particular, finite state machine design and
arithmetic logic circuits.
II. BACKGROUND
The study of MCU architecture is a specialization within Electrical and Computer Engineering in its own right. However, all MCUs share some architectural aspects, whether that architecture is implemented in VLSI, in MSI, or on a PLD (an Altera Cyclone II FPGA in our case). The following sections introduce how a simple MCU works and explain some key characteristics used to classify MCUs.
A. High-Level Overview
An MCU, in its most basic form, is a collection of synchronous registers and memory modules connected to a common bus. The privileges of each module to read from and write to that bus are controlled by a finite state machine that activates and deactivates each module's tri-state buffers. By convention, this finite state machine is called the Control Matrix. Every state of the Control Matrix facilitates a single transfer of data over the bus from one module to another, with exceptions such as memory accesses. An MCU instruction therefore requires multiple states, and thus multiple stages of control signaling, to perform. More information on the Control Matrix and the technical details of its operation can be found in the following section.
A Random Access Memory (RAM) and an Arithmetic Logic Unit (ALU) are the centerpieces of the MCU and are interfaced with the bus through data registers. As a product of these connections, data from RAM can be transferred to registers, operated upon by the ALU, and finally stored back into other registers or back to memory.
Fig. 1 shows a schematic for a simple MCU. The
following sections will make reference to this schematic
and a larger version can be found in Appendix A.

As shown in Fig. 3, the Control Matrix is a Mealy machine: it uses an opcode (OPCODE) along with the current CPU state (S_CURR) to determine which control bits to send to the CPU (CTRL_WRD). Opcode is short for operation code, a binary encoding of an instruction. For example, an instruction such as Load A might have the opcode 0x2F, while another instruction like Load B might have 0x3C.

Fig. 1 A Simple MCU Architecture

B. Control Matrix
The control matrix is a finite state machine that coordinates which modules on the central bus send and receive data. Fig. 2 shows an example, highlighted in red, of one movement of data required for a Load A instruction. In this example, data in RAM is output to the bus by writing a 1 to DMO (Data Memory Output). At the same time, Register A is configured to load data from the bus by writing a 1 to AL (A Load). Each clock cycle (i.e., each state of the finite state machine) triggers one such operation. Only one module can output to the bus at a time. The set of all MCU control bits is referred to as the control word, and it is the output of the control matrix.
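To make the idea of a control word concrete, the sketch below models one as a VHDL record, together with a check of the rule that only one module may drive the bus at a time. Apart from DMO and AL, the bit names (AO, BO, RO, BL), the package name ctrl_word_pkg, and the function bus_ok are hypothetical and are not taken from the THB14 design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical control-word layout for illustration only.
-- DMO and AL come from Fig. 2; the other bit names are assumptions.
package ctrl_word_pkg is
  type ctrl_word_t is record
    -- bus drivers: at most one may be '1' in any state
    dmo : std_logic;  -- Data Memory Out
    ao  : std_logic;  -- Register A Out (assumed)
    bo  : std_logic;  -- Register B Out (assumed)
    ro  : std_logic;  -- Result Register Out (assumed)
    -- bus loaders: several may be '1' in the same state
    al  : std_logic;  -- A Load
    bl  : std_logic;  -- B Load (assumed)
  end record;

  -- Idle word: nothing drives or loads the bus
  constant CW_IDLE : ctrl_word_t := (others => '0');

  -- The transfer of Fig. 2: Data RAM drives the bus, Register A loads it
  constant CW_DRAM_TO_A : ctrl_word_t :=
    (dmo => '1', al => '1', others => '0');

  -- True when no more than one module drives the shared bus
  function bus_ok(cw : ctrl_word_t) return boolean;
end package ctrl_word_pkg;

package body ctrl_word_pkg is
  function bus_ok(cw : ctrl_word_t) return boolean is
    variable drivers : natural := 0;
  begin
    if cw.dmo = '1' then drivers := drivers + 1; end if;
    if cw.ao  = '1' then drivers := drivers + 1; end if;
    if cw.bo  = '1' then drivers := drivers + 1; end if;
    if cw.ro  = '1' then drivers := drivers + 1; end if;
    return drivers <= 1;
  end function;
end package body ctrl_word_pkg;
```

A check like bus_ok could be called from an assertion in a testbench so that any state that accidentally enables two bus drivers is flagged during simulation.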

Fig. 2 DMO (Data RAM Out) and AL (Load A) are activated, facilitating transfer of data from Data RAM to Register A

The internal architecture of the control matrix can be seen in Fig. 3. The circuit is essentially a finite state machine in which the Opcode Decoder functions as both the Next State logic and the Output logic. The Opcode Decoder is asynchronous (purely combinational).

Fig. 3 Generalized Architecture of a Control Matrix


The number of states (kept track of in State Memory)
required depends on the instruction (OPCODE) being
executed. Some instructions, such as NOP, will take fewer
states to complete than a Load instruction, which requires
states for accessing memory and then transferring that
memory into a register. This is why a form of Next State
Logic is necessary, and a simple counter does not suffice.
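The sketch below illustrates why Next State logic, rather than a plain counter, is needed: a hypothetical NOP returns to the first state after T2, while a hypothetical load instruction continues through T3 and T4. The opcode values, the entity name next_state_logic, and the number of states per instruction are assumptions for illustration; the T0..T6 naming follows the T0_OUT..T6_OUT signals listed in the pinout of Appendix A.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Next State logic sketch for a control matrix. Opcode values and the
-- number of states per instruction are hypothetical.
entity next_state_logic is
  port (
    opcode : in  std_logic_vector(3 downto 0);
    s_curr : in  natural range 0 to 6;   -- current state T0..T6
    s_next : out natural range 0 to 6    -- next state T0..T6
  );
end entity next_state_logic;

architecture rtl of next_state_logic is
  constant OP_NOP : std_logic_vector(3 downto 0) := "0000";  -- assumed
  constant OP_LDA : std_logic_vector(3 downto 0) := "0001";  -- assumed
begin
  process (opcode, s_curr)
  begin
    if s_curr < 2 then
      -- T0 -> T1 -> T2 is common to every instruction (fetch)
      s_next <= s_curr + 1;
    elsif opcode = OP_NOP then
      -- NOP is finished after T2: go back to T0
      s_next <= 0;
    elsif opcode = OP_LDA and s_curr < 4 then
      -- LDA continues: T2 -> T3 (memory access) -> T4 (load A)
      s_next <= s_curr + 1;
    else
      -- any other case (LDA in T4, undefined opcode): back to T0
      s_next <= 0;
    end if;
  end process;
end architecture rtl;
```

A plain counter would always step through the same number of states, so a NOP would waste T3 and T4; the opcode-dependent branches are what let instructions of different lengths share one state register.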
There are two prevalent ways to design the Opcode Decoder of the control matrix, which are detailed below.
1) Hard-Wired Control Matrix
In this method, the control word is produced asynchronously by combinational logic. This can be done by developing minimal Boolean expressions for each control bit in the control word in terms of the Z-Flag, OPCODE, and S_CURR. Fig. 4 shows an abstraction of an FPGA's implementation of a hard-wired control matrix, which consists of a series of multiplexers (one for each control bit).
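As an illustration of the hard-wired approach, the sketch below writes three control bits directly as Boolean expressions of the opcode, a one-hot state vector, and the Z flag. The opcode encodings, the state in which each bit is asserted, the one-hot state encoding, and the PCL (PC Load) bit are all assumptions, not the THB14's actual equations.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hard-wired output logic sketch: every control bit is a combinational
-- function of OPCODE, the one-hot state T and the Z flag.
-- Opcode values, state assignments and the PCL bit are hypothetical.
entity hardwired_output_logic is
  port (
    opcode : in  std_logic_vector(3 downto 0);
    t      : in  std_logic_vector(6 downto 0);  -- one-hot T0..T6
    z      : in  std_logic;                     -- ALU Zero flag
    dmo    : out std_logic;                     -- Data Memory Out
    al     : out std_logic;                     -- A Load
    pcl    : out std_logic                      -- PC Load (branch taken)
  );
end entity;

architecture rtl of hardwired_output_logic is
  signal is_lda, is_beq : std_logic;
begin
  -- Opcode decode (assumed encodings)
  is_lda <= '1' when opcode = "0001" else '0';
  is_beq <= '1' when opcode = "1111" else '0';

  -- LDA: in T4 the data RAM drives the bus and register A loads it
  dmo <= is_lda and t(4);
  al  <= is_lda and t(4);

  -- BEQ: the PC is reloaded in T3 only when the Zero flag is set
  pcl <= is_beq and t(3) and z;
end architecture;
```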

Fig. 4 FPGA Realization of the Output Logic Module of the Control Matrix

2) Microprogrammed Control Matrix
In this method, the control word is stored in a form of Read-Only Memory (ROM). The finite state machine uses the opcode and the current state of the machine to generate an address, which is used to index into the ROM. The control word is then output directly from the ROM to the rest of the MCU.
C. Memory Architecture
1) Harvard Architecture
Harvard Architecture is an MCU memory architecture in which the instruction data (i.e., the user's program) is kept in a memory module separate from the memory where live data is manipulated, and is accessed via separate buses. This architecture is primarily used in digital signal processing applications because it allows instructions to be fetched at the same time data is being stored. However, this throughput increase is the only major advantage of a Harvard architecture, and both design difficulty and cost increase as a result.
2) Von Neumann Architecture
Von Neumann Architecture combines both instruction data and live data in the same memory device, so they must be accessed one at a time. The Von Neumann architecture is the most common architecture, especially for basic MCUs like ours. Its only major disadvantage is the loss of throughput due to the same bus having to transfer both instructions and data.
D. Instruction Set
1) Reduced Instruction Set (RISC)
RISC, which stands for Reduced Instruction Set Computer, is a type of CPU that uses very few clock cycles per instruction but needs many instructions in a typical program. For the instructions to take so few clock cycles, each instruction performs only one simple operation. This type of CPU is used in applications where speed is valued.
2) Complex Instruction Set (CISC)
CISC, which stands for Complex Instruction Set Computer, is a type of CPU that uses many clock cycles per instruction but needs very few instructions in a typical program. This type of CPU has hundreds of different complex instructions, and each instruction can perform multiple operations. A single CISC instruction can therefore accomplish more than a single RISC instruction, with the drawback that it takes more clock cycles to execute.
III. PROCEDURE
As mandated by project requirements, the design was synthesized onto an Altera Cyclone II Field Programmable Gate Array (FPGA), using the Altera DE1 and Altera DE2 development boards. The design was described in VHSIC Hardware Description Language (VHDL), synthesized using Altera Quartus II software, and tested using the ModelSim-Altera software suite. The design, implementation, and testing of our MCU were performed in three phases, which are detailed individually in the following sections.
A. Phase I: Gravedigger
Phase I, affectionately known as the Gravedigger phase,
entailed extensive research and brainstorming followed by the
creation of an initial design that would serve as the basis for
future alterations. Two variants of the design were created, one
with a Harvard memory configuration (see Fig. A-1) and the
second with a Von Neumann memory configuration (see Fig.
A-2). Both MCU designs shared the following core
specifications, which stayed consistent through the life of the
design:
• 8-bit common bus and ALU datapath
• 8-bit ALU capable of the following operations:
  o Addition
  o Subtraction
  o Bitwise XOR
• ALU flags for the status of an operation's result: Carry (C), Zero (Z), Overflow (O), and Negative (N)
• Pseudo-RISC instruction set capable of:
  o Transferring data between RAM and the MCU registers (A, B)
  o Performing each of the ALU operations and storing the result in the Result register
  o Simple branching based on the ALU Z flag

This design underwent numerous revisions as further research and consultations were performed. Four critical design decisions were made after a Design Review with Dr. Paul Weber on November 20th, 2014. Further detail about these decisions can be found in the Team Honey Badgers Design Review Document [1]. At this point in the project, the following design decisions were made:
• Flip-flop register implementation was chosen over gated latch implementation because flip-flops make precise timing analysis possible.
• A Von Neumann memory architecture was chosen over a Harvard memory architecture due to design ease.¹
• The hard-wired Control Matrix realization was chosen over the microprogrammed control matrix due to design simplicity and its closer relevance to course objectives.
B. Phase II: Resurrection
Phase II saw implementation and simulation of the individual
modules of the MCU. Prior to implementation, two key
factors relating to synthesis were tested on a small scale. The
results of those brief tests are detailed in the following
sections.
1) Tri-State Buffer Synthesis Interpretation
The Cyclone II FPGA does not have tri-state buffers inside every logic element (LE). To circumvent this, Quartus II's compilation tool converts tri-state buffered signals into a selector. This implementation can be imagined as a large multiplexer that controls one bus line. All tri-state control signals (from every tri-state buffer connected to that bus line) are grouped together to form the select inputs of the MUX, and each tri-state input from every module is connected to an input of that MUX. Therefore, each unique combination of tri-state control signals selects the proper input of the MUX.
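The selector that Quartus II builds in place of a tri-state bus can be approximated in VHDL as an AND-OR multiplexer: each driver's enable gates its data, and the gated values are ORed onto the bus line. This is a behavioral sketch that assumes the enables are one-hot; the entity and signal names are hypothetical, and the value the bus takes when no driver is enabled (zero here) is this sketch's choice rather than documented Quartus II behavior.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Selector equivalent of a tri-state bus: enable-gated data, ORed
-- together. Assumes at most one enable is '1' at a time.
entity bus_selector is
  port (
    a_data, b_data, dram_data : in  std_logic_vector(7 downto 0);
    ao, bo, dmo               : in  std_logic;  -- one-hot output enables
    bus_out                   : out std_logic_vector(7 downto 0)
  );
end entity;

architecture rtl of bus_selector is
begin
  process (a_data, b_data, dram_data, ao, bo, dmo)
    variable acc : std_logic_vector(7 downto 0);
  begin
    acc := (others => '0');  -- no driver enabled: bus reads as zero
    if ao  = '1' then acc := acc or a_data;    end if;
    if bo  = '1' then acc := acc or b_data;    end if;
    if dmo = '1' then acc := acc or dram_data; end if;
    bus_out <= acc;
  end process;
end architecture;
```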
To verify this functionality, the small circuit seen in Fig. 6 was entered into a Block Diagram File (BDF) and compiled. Because the I/O Element (IOE) blocks within the Cyclone II do have tri-state buffers, the AND gate seen in the schematic prevents the compiler from using the IOE tri-state buffers.

¹ This decision was later overturned due to issues implementing a RAM module on the Cyclone II FPGA.

Fig. 6 Schematic of circuit to test the conversion of tri-state buffers to selectors during compilation.

To view how the Quartus II compiler interpreted the circuit seen in Fig. 6, the Technology Map was viewed. As can be seen in Fig. 7, the compiler did indeed interpret the tri-state buffers entered into the BDF as selectors and implemented them as such.

Fig. 7 Quartus II Technology Map of circuit in Fig. 6. Tri-state buffers were correctly compiled as asynchronous selectors.

Unfortunately, a VHDL description of a simple tri-state buffer (i.e., one that does not require all 9 values of the STD_LOGIC system) was not readily available. Therefore, it was decided that any design using tri-state buffers would be entered in a BDF file, and any subdesign, or any design that does not use tri-state buffers, would be written in VHDL.
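For reference, the conventional VHDL description of a tri-state buffer assigns the high-impedance value 'Z' when the enable is low. It necessarily uses STD_LOGIC, since 'Z' is one of that type's nine values, which is consistent with the difficulty noted above. The entity name tristate8 is hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- 8-bit tri-state buffer: passes D when EN = '1', otherwise releases
-- the output to high impedance ('Z', a STD_LOGIC value).
entity tristate8 is
  port (
    en : in  std_logic;
    d  : in  std_logic_vector(7 downto 0);
    y  : out std_logic_vector(7 downto 0)
  );
end entity;

architecture rtl of tristate8 is
begin
  y <= d when en = '1' else (others => 'Z');
end architecture;
```

When several such buffers drive the same STD_LOGIC_VECTOR signal, the resolution function of STD_LOGIC combines them; for internal nodes of the Cyclone II, Quartus II would then be expected to convert them into selectors, as it did for the BDF test circuit.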
2) Specifying Initial Memory Contents
The Altera DE1 Development Board initializes all designs interpreted as memory to hold zeros upon download. This is problematic, as our MCU requires a program to be written in memory for it to operate upon. If memory could not be initialized when downloaded to the FPGA, a method for writing the program to memory using some combination of switches and pushbuttons would have to be developed. Fortunately, Altera provides a means to write initial contents to memory structures upon download by using a Memory Initialization File (MIF) and linking that file with a VHDL memory structure [2].
Specifying initial memory module contents is a complicated task to perform in user-generated code; it is easier and less error-prone to use an Altera IP megafunction.
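For comparison, a hand-written ROM can take its initial contents from a VHDL constant, as in the sketch below, assuming 256 words of 8 bits. The contents shown are placeholders rather than THB14 opcodes, and the entity name const_rom is hypothetical. A MIF linked to the megafunction, as the team chose, keeps the program in a separate file that can be edited without changing any VHDL.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- ROM whose program image is a VHDL constant.
-- Depth, width and contents are placeholders.
entity const_rom is
  port (
    addr : in  std_logic_vector(7 downto 0);
    data : out std_logic_vector(7 downto 0)
  );
end entity;

architecture rtl of const_rom is
  type rom_t is array (0 to 255) of std_logic_vector(7 downto 0);
  -- Unlisted addresses default to x"00"
  constant ROM : rom_t := (
    0      => x"01",   -- placeholder opcode
    1      => x"82",   -- placeholder operand
    others => x"00"
  );
begin
  -- Combinational read: the word at ADDR appears on DATA
  data <= ROM(to_integer(unsigned(addr)));
end architecture;
```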
At this point in time, our team had decided to use a Von Neumann memory architecture (one RAM block used for both instructions and data). However, the Altera IP megafunction for a 1-port RAM module (required for a Von Neumann memory setup) did not compile onto the Cyclone II FPGA, whereas the 1-port Read-Only Memory (ROM) megafunction compiled successfully. Therefore, our team decided to overturn our prior design decision regarding memory architecture and use two memory modules (Harvard architecture). These memory modules are implemented as follows:
• Instruction Read-Only Memory (IROM)
  o Implemented using an Altera IP megafunction linked with a .mif file (Memory Initialization File), into which the program that the MCU will execute is entered prior to programming.
• Data Random Access Memory (DRAM)
  o Implemented in VHDL code adapted from VHDL code supplied by Dr. Weber.
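A data RAM of the kind listed above can be sketched as a synchronous-write, asynchronous-read array. This is not Dr. Weber's code: the 256 x 8 size, the port names, the entity name data_ram, and the asynchronous read are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Data RAM sketch: write on the rising clock edge, read combinationally.
entity data_ram is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;                     -- write enable
    addr : in  std_logic_vector(7 downto 0);  -- assumed 8-bit address
    din  : in  std_logic_vector(7 downto 0);
    dout : out std_logic_vector(7 downto 0)
  );
end entity;

architecture rtl of data_ram is
  type ram_t is array (0 to 255) of std_logic_vector(7 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  -- Writes happen on the rising edge when WE is high
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(addr))) <= din;
      end if;
    end if;
  end process;

  -- Reads are asynchronous in this sketch
  dout <= ram(to_integer(unsigned(addr)));
end architecture;
```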
C. Phase III: Heartbeat
Phase III involved the addition of the Control Matrix to the THB14's design, giving the system a heartbeat. Aside from small errors that were corrected during troubleshooting, one major issue was discovered relating to the Instruction ROM module. This issue and its solution are detailed in the next section.
1) Deleting the IROM Megafunction Latched Output
This error concerned how the ROM accessed and output data. In the first iteration of the ROM module, both the input and output lines were connected to a D flip-flop array (Fig. 7). As a result, the ROM output the contents of the memory location one behind the location it was currently pointing to. This is noticeable during the startup sequence of the THB14: when the THB14 starts, it executes a NOP instruction because the contents of the proper ROM location have not yet had time to travel through the output D flip-flop array.

Fig. 7 ROM with double latch

This is a problem for how the THB14 executes branch instructions. If the branch is not taken, the PC is incremented and the next opcode is loaded. However, when the branch was taken, the wrong branch location was loaded into the PC. This would often break the THB14, because the new PC value applied to the ROM would not point to a valid opcode. The error was fixed by removing the D flip-flop array on the output of the ROM (Fig. 8). This error went unnoticed for a long time because every other instruction that read data from ROM immediately after a new location was loaded would simply load the data in the location that immediately preceded the opcode.

Fig. 8 ROM with single latch
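The one-location lag can be reproduced with the sketch below, which models a ROM with a registered address and an optional registered output. The generic REG_OUT, the entity name rom_latency, and the placeholder contents (each word holds its own address) are assumptions for illustration; setting REG_OUT to true mimics the double latch of Fig. 7 and false the single latch of Fig. 8.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- ROM with a registered address and an optional registered output.
-- Placeholder contents: each word holds its own address.
entity rom_latency is
  generic (REG_OUT : boolean := false);
  port (
    clk  : in  std_logic;
    addr : in  std_logic_vector(7 downto 0);
    data : out std_logic_vector(7 downto 0)
  );
end entity;

architecture rtl of rom_latency is
  signal addr_q : std_logic_vector(7 downto 0) := (others => '0');
  signal rom_q  : std_logic_vector(7 downto 0) := (others => '0');
  signal rom_d  : std_logic_vector(7 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      addr_q <= addr;   -- input register (both versions)
      rom_q  <= rom_d;  -- output register (used only if REG_OUT)
    end if;
  end process;

  -- Placeholder ROM array: word = address
  rom_d <= addr_q;

  -- Double latch adds one clock of delay to every read
  data <= rom_q when REG_OUT else rom_d;
end architecture;
```

With REG_OUT = true, data for a newly applied address appears one clock edge later than with REG_OUT = false, which matches the symptom described above.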

IV. RESULTS
The final design of the THB14 consisted of an 8-bit RISC MCU with a Harvard memory architecture and 16 instructions capable of reading from and writing to memory, performing basic arithmetic operations, and executing simple conditional branches. The final design files used in the CPU can be found in Appendix B, the final simulations in Appendix C, and the .do files used to produce those simulations in Appendix D.
A. Demonstration Program Analysis
Table 1 is a sequential listing of the first program from the final demonstration of our microcontroller. The idea was to show the complete functionality of the device through the use of specific instructions. Both the address and data columns are in hexadecimal for readability.
The first four instructions exercise the basics of any microcontroller. For the first instruction, register A is loaded immediately with the value $82 (130 decimal) stored in the ROM memory file. The $ indicates that a number is in hexadecimal format. The following instruction then stores the data in register A to the RAM memory file, specifically at address $0. The purpose of these instructions was to show that the memory-writing aspect of our device works properly. The next two instructions perform the same process, except that register B is used instead of register A and the RAM address is $1. The following instructions load $41 (65 decimal) immediately from the ROM file into both registers A and B. The following instruction, SUBAB, subtracts the value stored in register B from the value stored in register A.

The purpose of this section of code is to show the functionality of the zero flag in the CCR (condition code register), as well as the subtraction capability of the ALU: $41 - $41 = $00, which sets Z. The next three instructions (beginning at address $0D in Table 1) load the previously stored values from RAM back into the A and B registers. These values ($82) are then added together via the ALU. The purpose of this set of instructions is to demonstrate the addition functionality of the THB14 microcontroller, to show the functionality of the carry and overflow flags, and to demonstrate the memory-reading aspect of the device. Because $82 is a negative number in two's complement (MSB = 1, value -126), we are adding two negative numbers: $82 + $82 = $104, which exceeds 8 bits and sets the carry flag, and the 8-bit result $04 is positive, which sets the overflow flag.
The next three lines of instructions (beginning at address $12 in Table 1) load the values $AF (%10101111) into register A and $5F (%01011111) into register B. These values are then XOR-ed together. The purpose of this segment of instructions is to show the XOR functionality of the THB14's ALU. The two numbers were chosen specifically so that the result ($F0) was clear to see. The last five instructions listed in Table 1 display the THB14's ability to branch to another area in memory. First, register A is loaded immediately from ROM with the value $64 (100 decimal). Next, register B is loaded immediately from ROM with the value $0. These values are then added together via the ALU, and the sum is stored automatically in the RESULT register. The next instruction (TRB) transfers the result of the previous addition to register B. Since register A has been left untouched, it still contains the value $64. The following branch instruction jumps back to memory location $1B, which holds the ADDAB instruction. Thus, the end of the program repeatedly adds $64 (100 decimal) to register B in an endless loop. The entire program simulation can be seen in Appendix D.
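The flag values described above can be checked against the behavioral sketch below. The 2-bit operation encoding and the entity name alu8_flags are assumptions; following the CRRYOUT component in Appendix B, the carry flag is cleared during subtraction. This is a reference model for comparison, not the THB14's structural ALU.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Behavioral 8-bit ALU with Carry, Zero, Overflow and Negative flags.
-- Operation encoding is hypothetical: "00" ADD, "01" SUB, others XOR.
entity alu8_flags is
  port (
    a, b       : in  std_logic_vector(7 downto 0);
    op         : in  std_logic_vector(1 downto 0);
    result     : out std_logic_vector(7 downto 0);
    c, z, o, n : out std_logic
  );
end entity;

architecture behav of alu8_flags is
begin
  process (a, b, op)
    variable sum9 : unsigned(8 downto 0);
    variable r    : std_logic_vector(7 downto 0);
  begin
    c <= '0';
    o <= '0';
    case op is
      when "00" =>  -- ADD: carry is the ninth bit of the unsigned sum
        sum9 := resize(unsigned(a), 9) + resize(unsigned(b), 9);
        r := std_logic_vector(sum9(7 downto 0));
        c <= sum9(8);
        -- signed overflow: operands share a sign the result lacks
        if a(7) = b(7) and r(7) /= a(7) then o <= '1'; end if;
      when "01" =>  -- SUB (A - B), carry left at '0' as in CRRYOUT
        r := std_logic_vector(unsigned(a) - unsigned(b));
        if a(7) /= b(7) and r(7) /= a(7) then o <= '1'; end if;
      when others =>  -- XOR
        r := a xor b;
    end case;
    result <= r;
    n <= r(7);
    if r = x"00" then z <= '1'; else z <= '0'; end if;
  end process;
end architecture;
```

For the demonstration values, this model gives $82 + $82 = $04 with C = 1 and O = 1, $41 - $41 = $00 with Z = 1, and $AF XOR $5F = $F0 with N = 1.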

Table 1 Final Demonstration Program

To demonstrate the rest of the functionality of our simple microcontroller, two additional programs were written: one in which a branch instruction branches because the Z flag is high, and one in which it does not branch because the Z flag is low (Table 2). This is a significant feature of our microcontroller, since it can check the result of an ALU operation and branch to an appropriate section of memory to perform another operation. The two figures below show the simulations of the two branching programs that were written to test the branching functionality.

Table 2 Branch Demonstration

The operation of the branch-taken program can be seen in Appendix D. The Z flag was set high by adding two zeros together in the ALU. Because the Z flag was high, when the branch-if-equal instruction (BEQ) was executed, the LDAI $FE instruction was skipped and LDAI $D5 was executed instead. This shows that the BEQ instruction was executed properly. Next, the BEQ instruction was tested with the Z flag low (Appendix D). In this program, the branch was not taken because the Z flag was low. This can be seen because LDAI $FE was executed first and then LDAI $D5 was executed. If the branch had been taken, LDAI $D5 would have been executed and LDAI $FE would have been skipped completely.
B. Timing Analysis of the ALU Design Unit
As requested, the worst-case, best-case, and typical propagation delays for a particular sub-unit of the THB14 were calculated. The sub-unit chosen for this analysis was the ALU, and the parameters used to analyze it were those of the Cyclone II Look-Up Table (LUT) under commercial environmental conditions.
The complete timing analysis of the ALU design unit can be found in Appendix E, and the results are summarized in Table E-2.
V. CONCLUSIONS
After reviewing these results and confirming that they were correct, we were most satisfied to be able to say that we have created our own microcontroller. This project has given us as a team a great deal of knowledge and skill on the topic of microcontrollers. We have learned a great deal about how a microcontroller works and how the modules within a microcontroller communicate.
Even though this is a very basic microcontroller with very few instructions, there are many improvements and additions that could make the THB14 more powerful and more versatile. These could include adding more functions to the ALU, adding more instructions to the instruction set, and creating multiple buses to facilitate pipelining.
Again, despite its simple design, the THB14 was challenging to build. However, watching a project grow from the ground up into a finished product was very satisfying.
ACKNOWLEDGMENT
Team Honey Badgers would like to thank Dr. Paul Weber for his oversight and assistance during the design and testing of our MCU.

REFERENCES

[1] T. Hill, B. Dansfield, E. Peyerk, and D. Brown, "Design and Analysis of an 8-bit MCU," unpublished.
[2] Altera, "In-System Updating of Memory and Constants," http://ridl.cfd.rit.edu/products/manuals/Altera/In-System%20Memory%20Content%20Editor/qts_qii53012.pdf
[3] Altera, Cyclone II Device Handbook, Volume 1, http://www.altera.com/literature/hb/cyc2/cyc2_cii5v1.pdf

Appendix A: Non-Quartus Design Documents

Fig. A-1: Pre-Design Review MCU Architecture with Harvard memory configuration

Fig. A-2: Pre-Design Review MCU Architecture with Von Neumann memory configuration

Fig. A-3: Post-Design Review MCU Architecture

THB14 Pinout for DE1 and DE2 Versions

| Signal Name | Input/Output | Module To/From | DE1 Name | DE1 Pin | DE2 Name | DE2 Pin |
|---|---|---|---|---|---|---|
| SYS_CLK_EN | I | MASTER_CTRL | SW[1] | PIN_L21 | SW[1] | PIN_N26 |
| CLR_REGS | I | MASTER_CTRL | KEY[0] | PIN_R22 | KEY[0] | PIN_G26 |
| CLK_SEL | I | CLK_CTRL | SW[0] | PIN_L22 | SW[0] | PIN_N25 |
| MAN_CLK | I | CLK_CTRL | KEY[1] | PIN_R21 | KEY[1] | PIN_N23 |
| SYS_CLK | I | CLK_Prescaler | CLOCK_50 | PIN_L1 | CLOCK_50 | PIN_N2 |
| NegBit | O | CCreg | LEDR[9] | PIN_R17 | LEDR[17] | PIN_AD12 |
| OVFWBit | O | CCreg | LEDR[8] | PIN_R18 | LEDR[16] | PIN_AE12 |
| ZeroBit | O | CCreg | LEDR[7] | PIN_U18 | LEDR[15] | PIN_AE13 |
| CarryBit | O | CCreg | LEDR[6] | PIN_Y18 | LEDR[14] | PIN_AF13 |
| A_Seg | O | REG_TO_7SEGS | HEX3 and HEX2 | See below | HEX7 UB, HEX6 LB | See below |
| B_Seg | O | REG_TO_7SEGS | - | - | HEX5 UB, HEX4 LB | See below |
| Result_Seg | O | REG_TO_7SEGS | HEX1 and HEX0 | See below | HEX3 UB, HEX2 LB | See below |
| PC_Seg | O | REG_TO_7SEGS | - | - | HEX1 UB, HEX0 LB | See below |
| T0_OUT | O | CONTROL_MATRIX | LEDG[7] | | LEDG[7] | PIN_Y18 |
| T1_OUT | O | CONTROL_MATRIX | LEDG[6] | | LEDG[6] | PIN_AA20 |
| T2_OUT | O | CONTROL_MATRIX | LEDG[5] | | LEDG[5] | PIN_U17 |
| T3_OUT | O | CONTROL_MATRIX | LEDG[4] | | LEDG[4] | PIN_U18 |
| T4_OUT | O | CONTROL_MATRIX | LEDG[3] | | LEDG[3] | PIN_V18 |
| T5_OUT | O | CONTROL_MATRIX | LEDG[2] | | LEDG[2] | PIN_W19 |
| T6_OUT | O | CONTROL_MATRIX | LEDG[1] | | LEDG[1] | PIN_AF22 |

DE1 SEVEN SEGMENTS

| HEX3 | PIN | HEX2 | PIN | HEX1 | PIN | HEX0 | PIN |
|---|---|---|---|---|---|---|---|
| HEX3[0] | PIN_F4 | HEX2[0] | PIN_G5 | HEX1[0] | PIN_E1 | HEX0[0] | PIN_J2 |
| HEX3[1] | PIN_D5 | HEX2[1] | PIN_G6 | HEX1[1] | PIN_H6 | HEX0[1] | PIN_J1 |
| HEX3[2] | PIN_D6 | HEX2[2] | PIN_C2 | HEX1[2] | PIN_H5 | HEX0[2] | PIN_H2 |
| HEX3[3] | PIN_J4 | HEX2[3] | PIN_C1 | HEX1[3] | PIN_H4 | HEX0[3] | PIN_H1 |
| HEX3[4] | PIN_L8 | HEX2[4] | PIN_E3 | HEX1[4] | PIN_G3 | HEX0[4] | PIN_F2 |
| HEX3[5] | PIN_F3 | HEX2[5] | PIN_E4 | HEX1[5] | PIN_D2 | HEX0[5] | PIN_F1 |
| HEX3[6] | PIN_D4 | HEX2[6] | PIN_D3 | HEX1[6] | PIN_D1 | HEX0[6] | PIN_E2 |

DE2 SEVEN SEGMENTS

| HEX7 | PIN | HEX6 | PIN | HEX5 | PIN | HEX4 | PIN |
|---|---|---|---|---|---|---|---|
| HEX7[0] | PIN_L3 | HEX6[0] | PIN_R2 | HEX5[0] | PIN_T2 | HEX4[0] | PIN_U9 |
| HEX7[1] | PIN_L2 | HEX6[1] | PIN_P4 | HEX5[1] | PIN_P6 | HEX4[1] | PIN_U1 |
| HEX7[2] | PIN_L9 | HEX6[2] | PIN_P3 | HEX5[2] | PIN_P7 | HEX4[2] | PIN_U2 |
| HEX7[3] | PIN_L6 | HEX6[3] | PIN_M2 | HEX5[3] | PIN_T9 | HEX4[3] | PIN_T4 |
| HEX7[4] | PIN_L7 | HEX6[4] | PIN_M3 | HEX5[4] | PIN_R5 | HEX4[4] | PIN_R7 |
| HEX7[5] | PIN_P9 | HEX6[5] | PIN_M5 | HEX5[5] | PIN_R4 | HEX4[5] | PIN_R6 |
| HEX7[6] | PIN_N9 | HEX6[6] | PIN_M4 | HEX5[6] | PIN_R3 | HEX4[6] | PIN_T3 |

DE2 SEVEN SEGMENTS CONT.

| HEX3 | PIN | HEX2 | PIN | HEX1 | PIN | HEX0 | PIN |
|---|---|---|---|---|---|---|---|
| HEX3[0] | PIN_Y23 | HEX2[0] | PIN_AB23 | HEX1[0] | PIN_V20 | HEX0[0] | PIN_AF10 |
| HEX3[1] | PIN_AA25 | HEX2[1] | PIN_V22 | HEX1[1] | PIN_V21 | HEX0[1] | PIN_AB12 |
| HEX3[2] | PIN_AA26 | HEX2[2] | PIN_AC25 | HEX1[2] | PIN_W21 | HEX0[2] | PIN_AC12 |
| HEX3[3] | PIN_Y26 | HEX2[3] | PIN_AC26 | HEX1[3] | PIN_Y22 | HEX0[3] | PIN_AD11 |
| HEX3[4] | PIN_Y25 | HEX2[4] | PIN_AB26 | HEX1[4] | PIN_AA24 | HEX0[4] | PIN_AE11 |
| HEX3[5] | PIN_U22 | HEX2[5] | PIN_AB25 | HEX1[5] | PIN_AA23 | HEX0[5] | PIN_V14 |
| HEX3[6] | PIN_W24 | HEX2[6] | PIN_Y24 | HEX1[6] | PIN_AB24 | HEX0[6] | PIN_V13 |

APPENDIX B: QUARTUS DESIGN FILES

Note: Some VHDL files are too large to insert into the appendices.
A. Block Diagram Files

Fig. A-1: 8-Bit Tri-State BDF

Fig. A-2: 8-Bit ALU BDF

Fig. A-3: 8-Bit ALU with Tri-State

Fig. A-4: CCR BDF

Fig. A-5: Clock Prescaler BDF

Fig. A-6 : DMAR Register BDF

Fig. A-7 : DMAR Register with Tri-State BDF

Fig. A-8 : IMAR Register

Fig. A-9 : Opcode Register

Fig. A-10 : Program Counter (PC) BDF

Fig. A-11 : A Register BDF

Fig. A-12 : B Register BDF

Fig. A-13 : C Register BDF

Fig. A-14 : D Register BDF

Fig. A-15 : Results Register BDF

Fig. A-16 : Zero Bit Checker BDF

B. VHDL Files
1) ADDSUB8BIT
---------------------------------------------------
-- ENTITY NAME: ADDSUB8BIT
-- ARCH STYLE: Structural
-- USES: FULL_ADDER, XOR2, NEGATIVE, CRRYOUT
-- Author: Dylan Brown
-- Date: 11/21/14
---------------------------------------------------
-- DESCRIPTION: VHDL code that allows for the addition
-- and subtraction of numbers taken in from two
-- separate 8-bit data buses
---------------------------------------------------
-- UPDATES: None
---------------------------------------------------
-- Academic Honor Statement:
-- In completing this, I have refrained from any form of academic
-- dishonesty or deception such as cheating, stealing,
-- plagiarism, or lying. This work is solely of my own origin.
-- Signed: Dylan Brown
-- Date: 11/21/14
LIBRARY WORK;
USE WORK.ALL;
ENTITY ADDSUB8BIT IS
PORT(INPUT1,INPUT2 : IN BIT_VECTOR(7 DOWNTO 0);
MINUS : IN BIT;
OVERFLOW : OUT BIT;
N : OUT BIT;
OUTPUT : OUT BIT_VECTOR(7 DOWNTO 0);
CARRYOUT : OUT BIT);
END ENTITY ADDSUB8BIT;
ARCHITECTURE STRUCTURE OF ADDSUB8BIT IS
--COMPONENT DECLARATIONS
COMPONENT FULL_ADDER

PORT(A,B,CIN : IN BIT;
S,COUT : OUT BIT);
END COMPONENT FULL_ADDER;
COMPONENT XOR2
PORT(A,B : IN BIT;
S : OUT BIT);
END COMPONENT XOR2;
COMPONENT NEGATIVE
PORT(A : IN BIT;
S,O : OUT BIT);
END COMPONENT NEGATIVE;
COMPONENT CRRYOUT
PORT(M,CRRY : IN BIT;
O : OUT BIT);
END COMPONENT CRRYOUT;
--INTERMEDIATE SIGNAL DECLARATIONS
SIGNAL C0,C1,C2,C3,C4,C5,C6,C7,C8: BIT;
SIGNAL
INXORM0,INXORM1,INXORM2,INXORM3,INXORM4,INXORM5,INXORM6,INXORM7: BIT;
BEGIN
-- CONDITIONAL INVERSION OF INPUT2: WHEN MINUS = '1' EACH BIT IS INVERTED
-- AND MINUS IS ALSO THE CARRY-IN OF U0, FORMING THE TWO'S COMPLEMENT
-- SO THAT OUTPUT = INPUT1 - INPUT2
INXORM0 <= MINUS XOR INPUT2(0);
INXORM1 <= MINUS XOR INPUT2(1);
INXORM2 <= MINUS XOR INPUT2(2);
INXORM3 <= MINUS XOR INPUT2(3);
INXORM4 <= MINUS XOR INPUT2(4);
INXORM5 <= MINUS XOR INPUT2(5);
INXORM6 <= MINUS XOR INPUT2(6);
INXORM7 <= MINUS XOR INPUT2(7);
--COMPONENT INSTANTIATIONS: 8-BIT RIPPLE-CARRY ADDER
U0: FULL_ADDER PORT MAP(INPUT1(0),INXORM0,MINUS,OUTPUT(0),C0);
U1: FULL_ADDER PORT MAP(INPUT1(1),INXORM1,C0,OUTPUT(1),C1);
U2: FULL_ADDER PORT MAP(INPUT1(2),INXORM2,C1,OUTPUT(2),C2);
U3: FULL_ADDER PORT MAP(INPUT1(3),INXORM3,C2,OUTPUT(3),C3);
U4: FULL_ADDER PORT MAP(INPUT1(4),INXORM4,C3,OUTPUT(4),C4);
U5: FULL_ADDER PORT MAP(INPUT1(5),INXORM5,C4,OUTPUT(5),C5);
U6: FULL_ADDER PORT MAP(INPUT1(6),INXORM6,C5,OUTPUT(6),C6);
-- U7: THE MSB SUM GOES TO C8 AND THE FINAL CARRY OUT TO C7
U7: FULL_ADDER PORT MAP(INPUT1(7),INXORM7,C6,C8,C7);
-- OVERFLOW = CARRY INTO THE MSB (C6) XOR CARRY OUT OF THE MSB (C7)
U8: XOR2 PORT MAP(C7,C6,OVERFLOW);
-- N FLAG AND OUTPUT(7) ARE BOTH DRIVEN BY THE MSB SUM BIT (C8)
U9: NEGATIVE PORT MAP(C8,N,OUTPUT(7));
-- CARRY FLAG IS FORCED TO '0' DURING SUBTRACTION (SEE CRRYOUT)
U10: CRRYOUT PORT MAP(MINUS,C7,CARRYOUT);
END ARCHITECTURE STRUCTURE;

2) Basic Adder
---------------------------------------------------
-- ENTITY NAME: basicAdders
-- ARCH STYLE: Behavioral
-- USES: none
-- Author: Dylan Brown
-- Date: 11/21/14
---------------------------------------------------
-- DESCRIPTION: File which contains the standard VHDL
-- code for a half adder and full adder, a negative bit
-- flag, a basic 2-input XOR gate, and a carry bit flag
---------------------------------------------------
-- UPDATES: CRRYOUT
---------------------------------------------------
-- Academic Honor Statement:
-- In completing this, I have refrained from any form of academic
-- dishonesty or deception such as cheating, stealing,
-- plagiarism, or lying. This work is solely of my own origin.
-- Signed: Dylan Brown
-- Date: 11/25/14
entity half_adder is
port(a,b : in BIT;
s,cout : out BIT);
end entity half_adder;
architecture behaviour of half_adder is
begin
s <= a xor b;
cout <= a and b;
end architecture behaviour;
entity full_adder is
port(a,b,cin : in BIT;
s,cout : out BIT);
end entity full_adder;
architecture behaviour of full_adder is
begin
s <= a xor (b xor cin);
cout <= (b and cin) or (a and cin) or (a and b);
end architecture behaviour;
entity XOR2 is
port(a,b : in BIT;
s: out BIT);
end entity XOR2;
architecture behaviour of XOR2 is
begin
s <= a xor b;
end architecture behaviour;
entity NEGATIVE is
port(a : in BIT;
s,o: out BIT);
end entity NEGATIVE;
architecture behaviour of NEGATIVE is
begin
s <= a;
o <= a;

end architecture behaviour;


entity CRRYOUT is
port(M,CRRY : in BIT;
O: out BIT);
end entity CRRYOUT;
architecture behaviour of CRRYOUT is
BEGIN
PROCESS(M,CRRY)
BEGIN
IF M = '1' THEN O <= '0';
ELSIF M = '0' THEN O <= CRRY;
END IF;
END PROCESS;
end architecture behaviour;
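As a cross-check on the adder entities above, the following Python sketch applies the same gate equations bit by bit and chains eight full adders into an 8-bit ripple-carry adder/subtractor. It computes the flags the way the ADDSUB8BIT listing does: N is bit 7, OVERFLOW is C7 XOR C6, and CRRYOUT forces the carry to '0' during subtraction. This is a hypothetical software reference model, not part of the THB14 design, and the function names are illustrative. The top of the ADDSUB8BIT listing is not shown here, so the model assumes that subtraction XORs each bit of the second operand with MINUS (the INXORMi signals) and injects MINUS as the carry into bit 0.

```python
# Hypothetical Python reference model of the full_adder, XOR2, NEGATIVE,
# and CRRYOUT equations as they are chained inside ADDSUB8BIT.


def full_adder(a: int, b: int, cin: int) -> tuple:
    """Returns (s, cout) using the full_adder entity's equations."""
    s = a ^ (b ^ cin)
    cout = (b & cin) | (a & cin) | (a & b)
    return s, cout


def addsub8(x: int, y: int, minus: int) -> dict:
    """8-bit ripple-carry add (minus=0) or subtract (minus=1).

    Assumption: y is XORed with `minus` bit by bit and `minus` is the
    carry into bit 0, so subtraction computes x + ~y + 1 (two's complement).
    """
    carry = minus
    carries = []                      # carries[i] = carry out of bit i
    result = 0
    for i in range(8):
        a = (x >> i) & 1
        b = ((y >> i) & 1) ^ minus    # plays the role of INXORMi
        s, carry = full_adder(a, b, carry)
        result |= s << i
        carries.append(carry)
    return {
        "result": result,
        "negative": (result >> 7) & 1,          # NEGATIVE passes bit 7 to N
        "overflow": carries[7] ^ carries[6],    # XOR2(C7, C6)
        "carry": 0 if minus else carries[7],    # CRRYOUT: '0' when MINUS = '1'
    }
```

For example, `addsub8(0x7F, 0x01, 0)` yields a result of 0x80 with the negative and overflow flags set and the carry clear. This is the signed-overflow case that the C7 XOR C6 check is designed to catch.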

3) Binary to 7-Segment
--------------------------------------------------
-- ENTITY NAME: BIN4TO7SEG
-- ARCH STYLE: Concurrent/Behavioral
-- USES: N/A
-- Author: Trace Hill, Dylan Brown
-- Date: 11/22/14
--------------------------------------------------
-- DESCRIPTION: This module maps a 4-bit value to the 7 bits
--              needed to display that value on one 7-segment
--              display.
--------------------------------------------------
-- UPDATES:
--------------------------------------------------
-- Academic Honor Statement:
-- In completing this, I have refrained from any form of academic
-- dishonesty or deception such as cheating, stealing,
-- plagiarism, or lying. This work is solely of my own origin.
-- Signed: Trace Hill     Date: 11/22/14
-- Signed: Dylan Brown    Date: 11/22/14
ENTITY BIN4TO7SEG IS
  PORT(BIN_IN : IN  BIT_VECTOR(3 DOWNTO 0);
       SEG    : OUT BIT_VECTOR(6 DOWNTO 0));
END ENTITY BIN4TO7SEG;
ARCHITECTURE CONCUR OF BIN4TO7SEG IS
BEGIN
  -- Active-low segment patterns; the comment gives the displayed digit.
  SEG <= "1000000" WHEN BIN_IN = "0000" ELSE  -- 0
         "1111001" WHEN BIN_IN = "0001" ELSE  -- 1
         "0100100" WHEN BIN_IN = "0010" ELSE  -- 2
         "0110000" WHEN BIN_IN = "0011" ELSE  -- 3
         "0011001" WHEN BIN_IN = "0100" ELSE  -- 4
         "0010010" WHEN BIN_IN = "0101" ELSE  -- 5
         "0000010" WHEN BIN_IN = "0110" ELSE  -- 6
         "1111000" WHEN BIN_IN = "0111" ELSE  -- 7
         "0000000" WHEN BIN_IN = "1000" ELSE  -- 8
         "0011000" WHEN BIN_IN = "1001" ELSE  -- 9
         "0001000" WHEN BIN_IN = "1010" ELSE  -- A
         "0000011" WHEN BIN_IN = "1011" ELSE  -- b
         "1000110" WHEN BIN_IN = "1100" ELSE  -- C
         "0100001" WHEN BIN_IN = "1101" ELSE  -- d
         "0000110" WHEN BIN_IN = "1110" ELSE  -- E
         "0001110";                           -- F
END CONCUR;

4) Clock Control
--------------------------------------------------
-- ENTITY NAME: CLK_CTRL
-- ARCH STYLE: Behavioral
-- USES: N/A
-- Author: Trace Hill
-- Date: 11/22/14
--------------------------------------------------
-- DESCRIPTION: Selects between two different clock sources.
--------------------------------------------------
-- UPDATES:
--------------------------------------------------
-- Academic Honor Statement:
-- In completing this, I have refrained from any form of academic
-- dishonesty or deception such as cheating, stealing,
-- plagiarism, or lying. This work is solely of my own origin.
-- Signed: Trace Hill     Date: 11/22/14
ENTITY CLK_CTRL IS
PORT(CLK1,CLK2 : IN BIT;
     CLK_SEL   : IN BIT;
     CLK_OUT   : OUT BIT);
END ENTITY CLK_CTRL;
ARCHITECTURE BEHAVE OF CLK_CTRL IS
BEGIN
CLK_OUT <= CLK1 WHEN CLK_SEL = '0' ELSE
CLK2 WHEN CLK_SEL = '1';
END ARCHITECTURE BEHAVE;

5) D flip-flop
--------------------------------------------------
-- ENTITY NAME: DFFLOP
-- ARCH STYLE: Behavioral
-- USES: N/A
-- Author: Eugene Peyerk
-- Date: 11/22/14
--------------------------------------------------
-- DESCRIPTION: Functions as a D flip-flop with
--              active-high asynchronous Clear and Preset inputs.
--              Triggers on the falling edge; Q loads D only
--              when EN = '1'.
--------------------------------------------------
-- UPDATES:
--------------------------------------------------
-- Academic Honor Statement:
-- In completing this, I have refrained from any form of academic
-- dishonesty or deception such as cheating, stealing,
-- plagiarism, or lying. This work is solely of my own origin.
-- Signed: Eugene Peyerk  Date: 11/22/14
ENTITY DFFLOP IS
PORT(CLK_L, D, CLR, PRESET, EN : IN BIT;
Q : OUT BIT);
END DFFLOP;
ARCHITECTURE DFF_ARCH OF DFFLOP IS
BEGIN
PROCESS(PRESET,CLR,CLK_L)
BEGIN
IF CLR = '1' THEN
Q <= '0';
ELSIF PRESET = '1' THEN
Q <= '1';
ELSIF CLK_L'EVENT AND CLK_L = '0' THEN
IF EN = '1' THEN
Q <= D;
END IF;
END IF;
END PROCESS;
END DFF_ARCH;
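The priority in DFFLOP's process is CLR first, then PRESET, then an enabled falling clock edge. A small event-driven Python model can mirror that priority, which is handy for predicting waveforms before running ModelSim. This is an illustrative sketch. The class and method names are hypothetical, and the model assumes `update` is called once each time any input changes.

```python
class DFFLOP:
    """Hypothetical model of DFFLOP: asynchronous active-high CLR and
    PRESET, falling-edge clock, synchronous active-high enable EN."""

    def __init__(self) -> None:
        self.q = 0
        self.clk_l = 0          # last value seen on CLK_L

    def update(self, clk_l: int, d: int, clr: int, preset: int, en: int) -> int:
        falling_edge = self.clk_l == 1 and clk_l == 0
        self.clk_l = clk_l
        if clr:                         # CLR has the highest priority
            self.q = 0
        elif preset:                    # then PRESET
            self.q = 1
        elif falling_edge and en:       # then an enabled 1 -> 0 edge on CLK_L
            self.q = d
        return self.q
```

Because CLR is tested first, driving CLR and PRESET high together clears Q. This matches the IF/ELSIF ordering in the VHDL process.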

6) Master Control
--------------------------------------------------
-- ENTITY NAME: MASTER_CTRL
-- ARCH STYLE: Behavioral
-- USES: N/A
-- Author: Trace Hill
-- Date: 11/22/14
--------------------------------------------------
-- DESCRIPTION: Provides Enable/Disable control of an input
--              clock signal, which presumably goes
--              to the rest of a system. Also passes a
--              CLR input through itself, which also
--              presumably goes to the rest of the system.
--------------------------------------------------
-- UPDATES:
-- (Trace Hill,11/25/14):
--------------------------------------------------
-- Academic Honor Statement:
-- In completing this, I have refrained from any form of academic
-- dishonesty or deception such as cheating, stealing,
-- plagiarism, or lying. This work is solely of my own origin.
-- Signed: Trace Hill     Date: 11/22/14
ENTITY MASTER_CTRL IS
PORT(CLK_IN           : IN BIT;
     CLK_EN, CLR_REGS : IN BIT;
     CLK_OUT, REG_CLR : OUT BIT);
END ENTITY MASTER_CTRL;
ARCHITECTURE BEHAVE OF MASTER_CTRL IS
BEGIN
REG_CLR <= CLR_REGS;
CLK_OUT <= CLK_IN and CLK_EN;
END ARCHITECTURE BEHAVE;

7) 2 Input Mux
--------------------------------------------------
-- ENTITY NAME: mux2to1
-- ARCH STYLE: Behavioral
-- USES: Nothing
-- Author: Dylan Brown
-- Date: 11/21/14
--------------------------------------------------
-- DESCRIPTION: Takes in two 8-bit buses and outputs the one
-- selected by the select line (A when SEL = '0', B when
-- SEL = '1').
--------------------------------------------------
-- UPDATES: None
--------------------------------------------------
-- Academic Honor Statement:
-- In completing this, I have refrained from any form of academic
-- dishonesty or deception such as cheating, stealing,
-- plagiarism, or lying. This work is solely of my own origin.
-- Signed: Dylan Brown    Date: 11/21/14
entity mux2to1 is
  Port ( SEL : in  BIT;
         A   : in  BIT_VECTOR (7 downto 0);
         B   : in  BIT_VECTOR (7 downto 0);
         X   : out BIT_VECTOR (7 downto 0));
end mux2to1;
architecture Behavioral of mux2to1 is
begin
  X <= A when (SEL = '0') else B;
end Behavioral;

8) Register to 7-Segment
--------------------------------------------------
-- ENTITY NAME: REG_TO_7SEGS
-- ARCH STYLE: Structural
-- USES: BIN4TO7SEG
-- Author: Trace Hill
-- Date: 11/22/14
--------------------------------------------------
-- DESCRIPTION: This module maps an 8-bit value to the bits
--              needed to display that value on two 7-segment
--              displays (4 bits per display).
--------------------------------------------------
-- UPDATES:
--------------------------------------------------
-- Academic Honor Statement:
-- In completing this, I have refrained from any form of academic
-- dishonesty or deception such as cheating, stealing,
-- plagiarism, or lying. This work is solely of my own origin.
-- Signed: Trace Hill     Date: 11/22/14
LIBRARY WORK;
USE WORK.ALL;
ENTITY REG_TO_7SEGS IS
PORT(REG_IN   : IN BIT_VECTOR(7 DOWNTO 0);
     SEG1_OUT : OUT BIT_VECTOR(6 DOWNTO 0);
     SEG0_OUT : OUT BIT_VECTOR(6 DOWNTO 0) );
END ENTITY REG_TO_7SEGS;
ARCHITECTURE STRUCT OF REG_TO_7SEGS is
--COMPONENT DECLARATION OF BIN4TO7SEG
COMPONENT BIN4TO7SEG
PORT(BIN_IN : IN BIT_VECTOR(3 DOWNTO 0);
     SEG    : OUT BIT_VECTOR(6 DOWNTO 0));
END COMPONENT BIN4TO7SEG;
BEGIN
U0 : BIN4TO7SEG PORT MAP(REG_IN(7 DOWNTO 4),SEG1_OUT);
U1 : BIN4TO7SEG PORT MAP(REG_IN(3 DOWNTO 0),SEG0_OUT);
END ARCHITECTURE STRUCT;
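REG_TO_7SEGS and BIN4TO7SEG together implement a simple lookup. REG_TO_7SEGS splits the byte into nibbles, and each BIN4TO7SEG instance maps one nibble to an active-low segment pattern. The Python sketch below reproduces that lookup with the patterns from the BIN4TO7SEG listing, so the expected display codes can be checked against simulation. It assumes SEG(6 DOWNTO 0) drives segments g through a and that a '0' lights a segment, which is the ordering the listed patterns imply. The function names are hypothetical.

```python
# Active-low patterns copied from BIN4TO7SEG, indexed by the 4-bit input.
# Each string is SEG(6 DOWNTO 0); a '0' lights the corresponding segment.
PATTERNS = [
    "1000000", "1111001", "0100100", "0110000",  # 0 1 2 3
    "0011001", "0010010", "0000010", "1111000",  # 4 5 6 7
    "0000000", "0011000", "0001000", "0000011",  # 8 9 A b
    "1000110", "0100001", "0000110", "0001110",  # C d E F
]
SEGMENT_NAMES = "gfedcba"   # assumed: SEG(6) = g ... SEG(0) = a


def bin4to7seg(nibble: int) -> str:
    """Pattern for one hex digit, as produced by BIN4TO7SEG."""
    return PATTERNS[nibble & 0xF]


def reg_to_7segs(value: int) -> tuple:
    """(SEG1_OUT, SEG0_OUT): upper nibble on U0, lower nibble on U1."""
    return bin4to7seg((value >> 4) & 0xF), bin4to7seg(value & 0xF)


def lit_segments(pattern: str) -> set:
    """Names of the segments that are on (driven '0') for a pattern."""
    return {name for name, bit in zip(SEGMENT_NAMES, pattern) if bit == "0"}
```

For example, `reg_to_7segs(0xA5)` returns the "A" pattern for SEG1_OUT and the "5" pattern for SEG0_OUT. `lit_segments(bin4to7seg(1))` returns only segments b and c.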

9) XOR
--------------------------------------------------
-- ENTITY NAME: XOR8BIT
-- ARCH STYLE: Behavioral/Structural
-- USES: NEGATIVE
-- Author: Dylan Brown
-- Date: 11/21/14
--------------------------------------------------
-- DESCRIPTION: VHDL code that takes in two numbers, each
-- from an 8-bit data bus, and XORs the corresponding bits.
-- The result is the output; bit 7 of the result also
-- drives the N (negative) flag through NEGATIVE.
--------------------------------------------------
-- UPDATES: None
--------------------------------------------------
-- Academic Honor Statement:
-- In completing this, I have refrained from any form of academic
-- dishonesty or deception such as cheating, stealing,
-- plagiarism, or lying. This work is solely of my own origin.
-- Signed: Dylan Brown    Date: 11/21/14
LIBRARY WORK;
USE WORK.ALL;
ENTITY XOR8BIT IS
PORT(INPUT1,INPUT2 : IN BIT_VECTOR(7 DOWNTO 0);
N : OUT BIT;
OUTPUT : OUT BIT_VECTOR(7 DOWNTO 0));
END ENTITY XOR8BIT;
ARCHITECTURE STRUCTURE OF XOR8BIT IS
COMPONENT NEGATIVE
PORT(A : IN BIT;
S,O : OUT BIT);
END COMPONENT NEGATIVE;
SIGNAL S1 : BIT;
BEGIN
OUTPUT(0) <= INPUT1(0) XOR INPUT2(0);
OUTPUT(1) <= INPUT1(1) XOR INPUT2(1);
OUTPUT(2) <= INPUT1(2) XOR INPUT2(2);
OUTPUT(3) <= INPUT1(3) XOR INPUT2(3);
OUTPUT(4) <= INPUT1(4) XOR INPUT2(4);
OUTPUT(5) <= INPUT1(5) XOR INPUT2(5);
OUTPUT(6) <= INPUT1(6) XOR INPUT2(6);
S1 <= INPUT1(7) XOR INPUT2(7);
U1: NEGATIVE PORT MAP(S1,N,OUTPUT(7));
END ARCHITECTURE STRUCTURE;

APPENDIX C INDIVIDUAL (PHASE II) MODULE SIMULATION FILES

Fig. C-1: ALU Simulation

Fig. C-2: Clock Control Simulation

Fig. C-3: DRAM Simulation

Fig. C-4: Master Control Simulation

Fig. C-5: Program Counter Simulation

APPENDIX D FINAL DEMONSTRATION (PHASE III) SIMULATION FILES


# Simulates: CLK_CTRL
# Author: Trace Hill
# Date: 11/22/2014
restart
#Create two oscillating clock sources for CLK1 and CLK2
force CLK1 0 0,1 5ns -r 10ns
force CLK2 0 0,1 10ns -r 20ns
#CLK_OUT should express CLK1
force CLK_SEL 0
run 40ns
#CLK_OUT should express CLK2
force CLK_SEL 1
run 40ns
Fig. D-1: Clock Control Do file

# Simulates: MASTER_CTRL
# Author: Trace Hill
# Date: 11/22/2014
restart
#Create a clock waveform
force CLK_IN 0 0,1 5ns -r 10ns
#Start with CLK disabled
force CLK_EN 0
force CLR_REGS 0
run 20ns
#enable the CLK
force CLK_EN 1
run 20ns
#Clear the Registers
force CLR_REGS 1
run 5ns
force CLR_REGS 0
run 5ns
Fig. D-2: Master Control Do file

# Simulates: DRAMTop
# Author: Trace Hill
# Date: 11/23/2014
restart
#Output should be disconnected
force DRAMO 0
#Create the CLK source
force CLK 0 0,1 5ns -r 10ns
#Begin by writing values to different
#addresses in the DRAM
force RL_W 1
force ADDR X"00"
force DATA_I X"A5"
run 10ns
force ADDR X"C9"
force DATA_I X"F6"
run 10ns
force ADDR X"FF"
force DATA_I X"BC"
run 10ns
#Read those addresses that were written to
force RL_W 0
force ADDR X"00"
run 10ns
force ADDR X"C9"
run 10ns
force ADDR X"FF"
run 10ns
#Re-read those same addresses with DRAMO enabled
#As a note, this will enable what was once on the
force DRAMO 1
force RL_W 0
force ADDR X"00"
run 10ns
force ADDR X"C9"
run 10ns
force ADDR X"FF"
run 10ns
Fig. D-3: DRAM Do file
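The DRAM do file writes three bytes and then reads them back twice, first with DRAMO low and then with DRAMO high. A small Python golden model can list the value expected on the output at each read step. This is a sketch under stated assumptions. The DRAMTop source is not included in this appendix, so the model assumes that RL_W = '1' writes DATA_I to ADDR, that RL_W = '0' reads, and that DRAMO is an output enable that leaves the output disconnected when '0'. A disconnected output is modeled as `None`. The function name is hypothetical.

```python
def dram_expected(steps):
    """Replay (rl_w, addr, data_i, dramo) steps of the do file and return
    the expected output for each step (None = disconnected).
    Hypothetical model of DRAMTop; see the assumptions above."""
    mem = {}
    outputs = []
    for rl_w, addr, data_i, dramo in steps:
        if rl_w == 1:
            mem[addr] = data_i      # write cycle
            value = None            # assumption: nothing is read during a write
        else:
            value = mem.get(addr)   # read cycle
        outputs.append(value if dramo == 1 else None)
    return outputs


# The nine 10 ns steps of Fig. D-3 (DATA_I is irrelevant while reading).
STEPS = [
    (1, 0x00, 0xA5, 0), (1, 0xC9, 0xF6, 0), (1, 0xFF, 0xBC, 0),  # writes
    (0, 0x00, None, 0), (0, 0xC9, None, 0), (0, 0xFF, None, 0),  # reads, DRAMO = 0
    (0, 0x00, None, 1), (0, 0xC9, None, 1), (0, 0xFF, None, 1),  # reads, DRAMO = 1
]
```

Under these assumptions, the first six steps leave the output disconnected. The last three should show X"A5", X"F6", and X"BC".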

# Simulates: PC
# Author: Trace Hill
# Date: 11/24/2014
restart
#Initial Values
force DATA_IN X"00"
force PCINC 0
force PCCLR 0
force PCI 0
force PCO 0
#Set up the CLK
force CLK 1 1,0 5ns -r 10ns
# We should be disconnected
# for this clock pulse
run 10ns
#Test with no PCINC
#Counter should simply act as a register
force DATA_IN X"DE"
force PCI 1
run 10ns
force PCO 1
run 10ns
force PCO 0
force DATA_IN X"A5"
run 10ns
force PCO 1
run 10ns
force PCO 0
#Increment PC with PCO off
force PCI 0
force PCINC 1
run 10ns
force PCINC 0
force PCO 1
run 10ns
force PCO 0
#VERIFIED: PCINC works
#Clear out the PC
force PCI 0
force PCINC 0
force PCCLR 1
force PCO 0
run 10ns
force PCCLR 0
force PCO 1
run 10ns
force PCO 0
#Load up FE to check if PC recycles correctly
force PCI 1
force DATA_IN X"FE"
run 10ns

force PCO 1
run 10ns
force PCO 0
force PCINC 1
force PCI 0
run 10ns
force PCINC 0
force PCO 1
run 10ns
force PCO 0
force PCINC 1
force PCI 0
run 10ns
force PCINC 0
force PCO 1
run 10ns
force PCO 0
force PCINC 1
force PCI 0
run 10ns
force PCINC 0
force PCO 1
run 10ns
force PCO 0
#Counter recycles correctly
Fig. D-4: Program Counter Do file
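The last part of the Program Counter do file loads X"FE" and increments three times to confirm that the count wraps from X"FF" back to X"00". A short Python model can generate the expected sequence. The PC's VHDL is not reproduced in this appendix, so the control priority below (PCCLR, then the PCI load, then PCINC) and the 8-bit wraparound are assumptions chosen to match the do file's comments. The class is hypothetical.

```python
class ProgramCounter:
    """Hypothetical 8-bit PC model; one call to clock() = one active edge."""

    def __init__(self) -> None:
        self.count = 0

    def clock(self, data_in: int = 0, pci: int = 0, pcinc: int = 0, pcclr: int = 0) -> None:
        if pcclr:
            self.count = 0                          # clear
        elif pci:
            self.count = data_in & 0xFF             # parallel load from DATA_IN
        elif pcinc:
            self.count = (self.count + 1) & 0xFF    # increment, wrapping at 8 bits

    def output(self, pco: int):
        """Value driven onto the bus, or None when PCO = 0 (disconnected)."""
        return self.count if pco else None


pc = ProgramCounter()
pc.clock(data_in=0xFE, pci=1)       # load X"FE"
seen = []
for _ in range(3):
    pc.clock(pcinc=1)
    seen.append(pc.output(pco=1))
```

After the loop, `seen` holds 0xFF, 0x00, and 0x01. This is the wraparound that the do file's final comment reports.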

restart
force clk '1' 1,0 5ns -r 10ns
force CLR '0'
force \\IN\\ "00000000"
force OEN '1'
force EN '1'
run 10ns
force \\IN\\ "11111111"
force OEN '1'
force EN '1'
run 10ns
force \\IN\\ "10000001"
force OEN '1'
force EN '0'
run 10ns
force OEN '1'
force EN '1'
run 5ns
force \\IN\\ "11001011"
force OEN '1'
force EN '1'
run 5ns
force CLR '1'
run 10ns
force OEN '0'
run 10ns
Fig. D-5: Register Do file

Fig. D-6: BEQ Branch Taken Simulation

Fig. D-7: BEQ Branch Not Taken Simulation

Note: The yellow line in the first figure is the same yellow line in the second figure. The simulation was too large to fit into a single image.

Fig. D-8: Final Demo Program Part 1

Fig. D-8: Final Demo Program Part 2

APPENDIX E TIMING ANALYSIS


The ALU design unit was chosen for timing analysis because it is the most complex subsystem within the THB14 microcontroller. This is largely because the embedded ADDSUB8BIT is a ripple-carry implementation, in which each bit must wait for the carry from the bit below it.
The worst, best, and typical propagation delays through the ALU design unit were analyzed by examining the number of Look-Up
Table (LUT) levels from the input to output of the unit. These levels were determined by viewing the Technology Map
available in Altera Quartus II for our THB14-DE2 project.
LUTs that operate in parallel constitute one LUT level, so the number of levels on a path equals the number of LUTs a signal must pass through in series. For example, Fig. E-1 shows the Technology Map representation of the ZERO module, a sub-unit of the ALU design unit. The blue cells represent the LUTs in the design, and this sub-unit has three LUT levels.

Fig. E-1: This Technology Map screenshot shows that the ZERO sub-unit is performed using 3 LUT levels.
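Counting LUT levels this way amounts to finding the longest chain of LUTs between an input and an output, because LUTs side by side share a level. The Python sketch below computes that depth for a small netlist given as a directed acyclic graph. The example netlist is invented for illustration and is not read from the Quartus Technology Map. It simply has three levels, the same count reported for the ZERO sub-unit. The function name is hypothetical.

```python
from functools import lru_cache


def lut_levels(fanin: dict) -> int:
    """Longest number of LUTs on any input-to-output path.

    `fanin` maps each LUT name to the names that drive it; primary inputs
    are simply absent from the dict. LUTs in parallel get the same depth,
    so they count as a single level.
    """
    @lru_cache(maxsize=None)
    def depth(node: str) -> int:
        drivers = fanin.get(node, ())
        # A LUT driven only by primary inputs sits on level 1.
        return 1 + max((depth(d) for d in drivers if d in fanin), default=0)

    return max(depth(lut) for lut in fanin)


# Invented example: four first-level LUTs, two second-level LUTs, one output LUT.
EXAMPLE = {
    "L1a": ("in0", "in1"), "L1b": ("in2", "in3"),
    "L1c": ("in4", "in5"), "L1d": ("in6", "in7"),
    "L2a": ("L1a", "L1b"), "L2b": ("L1c", "L1d"),
    "L3": ("L2a", "L2b"),
}
```

`lut_levels(EXAMPLE)` returns 3. The four first-level LUTs contribute only one level between them because they operate in parallel.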

The propagation delay through one LUT was taken from the Cyclone II Device Handbook, Volume 1 [3]. Table 5-16 of this document specifies the minimum and maximum propagation delay through one LUT, tLUT, for different speed grades. Fig. E-2 is a screenshot of the relevant portion of that table.

Fig. E-2: Screenshot of Table 5-16 in the Cyclone II Device Handbook, Volume 1. [3] tLUT is the parameter of interest.

In Fig. E-2, the parameter tLUT is specified for three speed grades: -6, -7, and -8. The -6 speed grade is used for commercial products, so the minimum and maximum propagation delays for tLUT were taken from this column. Because the table in Fig. E-2 does not report a typical propagation delay, we used an approximate average of the minimum and maximum tLUT values as the typical value. Table E-1 summarizes the tLUT values used in the timing analysis of the ALU design unit.

Parameter | Minimum | Typical | Maximum
tLUT      | 180 ps  | 300 ps  | 438 ps

Table E-1: Selected timing parameters for the propagation delay through one Look-Up Table (LUT)

The final timing specifications for the ALU design unit are given in Table E-2. The Operation column is included because the number of LUT levels depends on whether the ALU is executing an addition/subtraction or an XOR. This is intuitive. An addition/subtraction requires the inputs to propagate through the ripple ADDSUB8BIT sub-unit (8 LUT levels), whereas an XOR requires them to propagate only through the XOR sub-unit (1 LUT level). The delay was calculated using Equation 1:

$$t_{\mathrm{delay}} = N_{\mathrm{LUT}} \times t_{\mathrm{LUT}} \qquad (1)$$

where $N_{\mathrm{LUT}}$ is the number of LUT levels on the path and $t_{\mathrm{LUT}}$ is the selected per-LUT delay from Table E-1.

Propagation Parameter | Operation | LUT Levels | tLUT Used | Delay
Worst Case            | Add/Sub   | 14         | 438 ps    | 6.132 ns
Best Case             | XOR       | 7          | 180 ps    | 1.260 ns
Typical Case          | Add/Sub   | 14         | 300 ps    | 4.200 ns
Typical Case          | XOR       | 7          | 300 ps    | 2.100 ns

Table E-2: Summary of timing specifications for the ALU design unit
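Table E-2 follows directly from Equation 1 (delay = LUT levels × tLUT) and the tLUT values in Table E-1. The short Python check below recomputes the Delay column. It also shows that the exact midpoint of the minimum and maximum tLUT is 309 ps, which the analysis rounded to 300 ps for the typical case. The variable and function names are illustrative.

```python
# tLUT values from Table E-1, in picoseconds.
T_LUT_PS = {"min": 180, "typ": 300, "max": 438}

# (case, operation, LUT levels, which tLUT) for each row of Table E-2.
ROWS = [
    ("Worst Case", "Add/Sub", 14, "max"),
    ("Best Case", "XOR", 7, "min"),
    ("Typical Case", "Add/Sub", 14, "typ"),
    ("Typical Case", "XOR", 7, "typ"),
]


def delay_ns(levels: int, t_lut_ps: int) -> float:
    """Equation 1: propagation delay = LUT levels x tLUT, converted to ns."""
    return levels * t_lut_ps / 1000


delays = [round(delay_ns(levels, T_LUT_PS[kind]), 3) for _, _, levels, kind in ROWS]
midpoint_ps = (T_LUT_PS["min"] + T_LUT_PS["max"]) / 2
```

The computed delays are 6.132, 1.26, 4.2, and 2.1 ns, which match the Delay column of Table E-2.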

The parameters in Table E-2 were determined by examining each sub-unit of the ALU design unit in the Technology Map hierarchy. The values in Table E-2 were derived from Figs. E-3 through E-11.

Fig. E-3: ALU design unit

Fig. E-4: Inside the ALU design unit. An input must propagate through both 8BitALU (Fig. E-6) and 8BitTRIState (Fig. E-5). The signals X[5]~2 and OUT[1..4] are only pinned out to simulation outputs and are therefore not considered in this analysis.

Fig. E-5: The 8BitTRIState sub-unit has 2 LUT levels.

Fig. E-6: Inside of the 8BitALU sub-unit. The critical path through the circuit passes through the ADDSUB8BIT (Fig. E-7), mux2to1 (Fig. E-10), and finally the ZERO (Fig. E-11) sub-units. The 21mux: inst9 operates in parallel with the ZERO module and is only 1 LUT level, so it is not included in the subsequent figures.

Fig. E-7: Inside of the ADDSUB8BIT sub-unit. This sub-unit is 8 LUT levels from input to output, as each FULL_ADDER sub-unit is 1 LUT level (see Fig. E-8).

Fig. E-8: Inside of the FULL_ADDER sub-unit. This sub-unit is 1 LUT level from input to output.

Fig. E-9: Inside the XOR sub-unit. This sub-unit is 1 LUT level from input to output and is only used if the ALU is set to perform an XOR operation.

Fig. E-10: Inside of the mux2to1 sub-unit. This sub-unit is 1 LUT level from input to output.

Fig. E-11: Inside of the ZERO sub-unit. This sub-unit is 3 LUT levels from input to output.
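Adding the per-sub-unit level counts from Figs. E-5 and E-7 through E-11 along the two paths through the ALU recovers the 14 and 7 LUT levels used in Table E-2. The Python sketch below does that bookkeeping. The path lists follow the figure captions: 8BitTRIState, then either ADDSUB8BIT or the XOR sub-unit, then mux2to1 and ZERO. The sketch treats the levels of sub-units in series as additive, which is the assumption behind the table. The names are illustrative.

```python
# LUT levels per sub-unit, as reported in Figs. E-5 and E-7 to E-11.
SUBUNIT_LEVELS = {
    "8BitTRIState": 2,
    "ADDSUB8BIT": 8,    # eight FULL_ADDERs in series, 1 level each
    "XOR": 1,
    "mux2to1": 1,
    "ZERO": 3,
}

# Sub-units an input passes through in series for each ALU operation.
PATHS = {
    "Add/Sub": ["8BitTRIState", "ADDSUB8BIT", "mux2to1", "ZERO"],
    "XOR": ["8BitTRIState", "XOR", "mux2to1", "ZERO"],
}


def path_levels(path: list) -> int:
    """Levels of sub-units in series add up along the path."""
    return sum(SUBUNIT_LEVELS[name] for name in path)


levels = {op: path_levels(path) for op, path in PATHS.items()}
```

This yields 2 + 8 + 1 + 3 = 14 levels for Add/Sub and 2 + 1 + 1 + 3 = 7 levels for XOR, the LUT Levels column of Table E-2.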