Speed Lake
CSSE 232 Processor Project
Team 2A
Thaddeus Hughes, Evë Maquelin, Matthew Howlett, Ian Sheffert, David Li
Table of Contents
Changelog 4
Executive Summary 6
Our Processor 6
Design 6
Instructions and Control 6
Multi-Cycle Design 7
Memory and the Stack 7
Procedure Calls 8
Implementation 8
Testing 9
Compiler and Assembler 10
Results 11
Conclusion 12
Milestone 1 58
Meeting Monday, January 8 58
First Meeting Wednesday, January 10 59
Second Meeting Wednesday, January 10 59
Milestone 2 60
Impromptu Meeting Thursday, January 11 60
Meeting Friday, January 12 61
Meeting Wednesday, January 17 61
Milestone 3 62
Meeting Sunday, January 21 62
Meeting Tuesday, January 23 62
Milestone 4 63
Meeting Thursday, January 25 63
Meeting Sunday, January 28 63
Milestone 5 64
Meeting Monday During Class 2/5/2018 64
Meeting Monday Evening 2/5/2018 64
Meeting Tuesday During Class 2/6/2018 64
Meeting Wednesday During/After Class 2/7/2018 64
Meeting Monday During Class 2/12/2018 64
Meeting Tuesday During Class 2/13/2018 65
Meeting Wednesday During and After Class 2/14/2018 65
Meeting Thursday After Scheduled Meeting 2/15/2018 65
Meeting Wednesday 2/21/2018 66
4
Changelog
Version Date Description
1.0 January 10, 2018 Initial version of the document created for Milestone 1
5
6
Executive Summary
We have designed a simple accumulator-style processor, which runs on an FPGA board. The
processor provides basic support for the LCD and buttons on the board, and we created a
compiler and assembler to make programming easier.
In this document we will discuss the instruction set, implementation, testing, and final
performance results of the processor.
Our Processor
We chose to implement a multi-cycle accumulator-style processor with a stack. We designed
ours with only one working register, the DA register, and an indirect addressing (IA) register to
store memory addresses.
Though some of us had prior experience working with accumulators, there was still a lot left to
learn about how they worked. Also, the addition of a stack seemed like an easy improvement
that would be quite valuable to the programmer. Later, our choice of style turned out to be quite
convenient as the lack of arguments on most instructions left room for a lot of opcodes and
therefore a wide range of instructions. This led to the creation of an instruction set that not only
computed relative primes, but could handle general computation with ease.
Design
Over the course of the project we were forced to make a lot of interesting but difficult design
decisions. Though most of our decisions were made with processor efficiency and elegance as
the priority, such as the design of our control unit and the ALU design, some were made with
ease of implementation as the priority, such as choosing to keep our cycle times and cycles per
instruction constant.
7
accumulator and values stored in memory. By putting the memory address of a value into the IA
register, instructions like add/addm can use both the value in the DA register and the value
in memory and store the result accordingly. Add stores the value into DA and addm stores the
value back into memory at IA. This pattern is repeated across many of our instructions.
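The add/addm pattern can be illustrated with a small software model. This is only a sketch in Python, not the hardware: the `Machine` class, the memory size, and the 16-bit wraparound are assumptions made for illustration, and DA is assumed to be left unchanged by addm.

```python
MASK = 0xFFFF  # assumed 16-bit data width


class Machine:
    """Hypothetical software model of the DA/IA registers and data memory."""

    def __init__(self, mem_words=256):
        self.da = 0                 # accumulator (DA register)
        self.ia = 0                 # indirect address (IA register)
        self.mem = [0] * mem_words  # one 16-bit word per entry

    def add(self):
        # add: DA = DA + mem[IA]; memory is left unchanged
        self.da = (self.da + self.mem[self.ia]) & MASK

    def addm(self):
        # addm: mem[IA] = DA + mem[IA]; DA is assumed to be left unchanged
        self.mem[self.ia] = (self.da + self.mem[self.ia]) & MASK


m = Machine()
m.ia, m.da, m.mem[3] = 3, 5, 7
m.add()   # result goes to DA
m.addm()  # result goes back to mem[IA]
```

The two methods compute the same sum and differ only in the destination, which is the distinction the paired instructions encode.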
Due to the nature and size of our instruction set, we thought the conventional or expected
method of designing our control unit, a finite state machine that takes whole instructions as
inputs, would be too large and inefficient. Instead of creating cases for each instruction or group
of instructions, we created cases for each control signal and instruction type. Since most
control signals have default states for each instruction cycle, this brought our cases per cycle
down from a potential 44 to an average of 9. Though not efficient to design or even to
implement, this was far and away the best decision for hardware efficiency.
Multi-Cycle Design
Of all the parts of our processor, cycle design is where we took the most shortcuts and made
the most compromises. Each instruction has three equal-length cycles: Fetch, Memory Read,
and Memory Write. Though it was easiest to implement, we missed out on a major optimization
opportunity. Not all instructions make use of the Memory Read and Memory Write stages, so we
could have modified our control unit to skip those stages for certain instructions. Doing this,
however, would have required significant changes to both our control unit and our datapath,
which was infeasible given our time constraints. Ultimately, we opted to
leave it as is in the hopes of implementing some sort of pipeline, but that never occurred.
8
down the expected process of loading a value, manipulating it in DA, and storing back to
memory to a single instruction.
As they are housed in the same memory unit, there is the potential for the stack and program
memory to collide in nasty ways. The stack builds up from the bottom of memory, whereas
program memory is indexed from the top. If a program makes too deep of a recursive call, it’s
possible they will begin to overlap. This could be mitigated by increasing the size of the memory
unit, but was not an issue we ran into in any of our testing.
Procedure Calls
Though most of the details surrounding procedure calls can be found in Appendix A, the design
of our calling conventions was the subject of heated debate for several days. As such, we felt it
deserved special mention.
Unlike many other processor designs, our processor does not have dedicated registers for
procedure arguments and return values. Instead these values are stored on the stack and the
responsibility of preserving data is left to the caller. As these conventions are not
hardware-enforced, it is imperative that the programmer follow them carefully.
Implementation
We began datapath design by whiteboarding out the individual components we knew we would
need and beginning to connect them by going through the list of instructions and their
associated RTL to make sure they were supported. This was a somewhat iterative process as
we found ways to shrink muxes, reduce stages of logic, and make components
serve multiple purposes (e.g. we originally had an ALU dedicated to the IA register, but
determined we could use the primary ALU for the same purpose).
9
Overall, the design focuses on reducing the hardware footprint as much as possible in order to
speed up cycles.
Our Xilinx model is almost entirely written in Verilog. This is because we all preferred to read
code rather than schematics, and some of us had significant prior experience with Verilog, whereas
only one team member had experience with VHDL. We implemented the processor in modules
which matched the initial integration tests: decoder, registers, ALU, (program) memory, control,
and LCD driving. We believe this made debugging and splitting up work easier in the long term,
although it did make finding problems and incorrect links difficult at times.
Testing
We began with simple unit tests for all individual components (e.g. general purpose registers,
adders, muxes). For each of these, we chose the component to test, determined the expected
inputs, control signals, and outputs, built the testbench, ran it, and checked the results against
tables in the testbench to determine validity. Most individual components worked without any
major changes.
After unit tests, we integrated some of these into small segments of the processor.
The PC Subsystem consists of the program counter register, incrementer, return address
register, the necessary muxes which hook these together, and the input control signals. Testing
for this system proved rather straightforward with no major changes necessary.
The ALU subsystem consists of the ALU, muxes into it, and necessary control signals. Testing
of this system proved very straightforward with no changes necessary.
The Stack/Program Memory Subsystem consists of the SP register, IA register, memory, and
necessary muxes. Testing of this proved very straightforward with no changes necessary.
The Control testbench tests, as expected, the control unit. This was the second-most daunting
part of the whole processor and testing, and in our testing we had to do countless small fixes to
make things work as expected, and also had to refer back to the testbench when we had more
fundamental issues with our processor (such as an ALU output latch).
The Processor core consists of the necessary components to execute a given instruction
(everything but PC subsystem and instruction memory). We found that we would need a register
to serve as a latch on the output of the ALU.
At this point, we were ready to test the entire processor, which gave us some headaches. We
found that our branch control was not working. This was because we were expecting the wrong
outputs in the PC test (we needed to branch to PC | imm, not PC+1 | imm). We were using the
assembler at this point, and found that there were some bugs within it that caused unexpected
program behavior. After this was resolved, we battled glitches with our assembly before finally
making Euclid’s algorithm work.
11
Results
We were able to get the model implemented on the FPGA board. This included running a simple
program which would take two 8-bit inputs separated by 2 seconds, merge them into a 16-bit
input, run relPrime, and then display the result on the FPGA screen.
12
Conclusion
Creating a processor is quite difficult! We learned that there are many compromises to be made
in a design, and having gone through one iteration of a design helps you to make better
compromises. After doing one pass through, there are many changes we would individually like
to see happen, and some that we can all agree would be significant improvements, such as
leveraging multi-cycle to its fullest and shortening the length of some instructions.
13
A: Software Specifications
Available Registers
14
15
16
Instruction Semantics
OR to Memory
orm N (00) 00 1000 1101 XXXX
Performs a logical OR on the value stored at memory as described by the IA register and
the contents of DA register. The result is put into memory at the location described by
the IA register.
OR with Immediate
ori imm I (01) 10 1010 iiii iiii
Put the result of a logical OR between DA register and the immediate value into DA
register
Shift Left Logical
sll shamt N (00) 00 0110 0010 shmt
Shift the value in DA register left by shamt.
Shift Right Logical
srl shamt N (00) 00 1010 0010 shmt
Shift the value in DA register right by shamt.
Shift Left Logical to Memory
sllm shamt N (00) 00 0110 0101 shmt
Shift the value in memory at IA left by shamt.
Shift Right Logical to Memory
srlm shamt N (00) 00 1010 0101 shmt
Shift the value in memory at IA right by shamt.
Subtraction to DA
sub N (00) 00 0010 1110 XXXX
Subtracts the value stored at memory as described by the IA register from the contents
of DA register. The result is put into DA register.
Subtraction to Memory
subm N (00) 00 0010 1101 XXXX
Subtracts the value stored at memory as described by the IA register from the contents
of DA register. The result is put into memory at the location described by the IA register.
Load Upper Immediate
lui imm I (01) 00 0000 iiii iiii
Load the immediate value specified into the upper half of DA register.
Load Immediate
li imm I (01) 00 0100 iiii iiii
Load the immediate value specified into the lower half of DA register. Sign extended.
Two's Complement
two N (00) 00 0100 0010 XXXX
Take the two’s complement of DA register. The result is also put in DA register.
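As a rough illustration of what the two instruction computes, the sketch below models it in Python. The function names and the 16-bit mask are assumptions for illustration only; the hardware is not implemented this way.

```python
MASK = 0xFFFF  # assumed 16-bit register width


def two(da):
    """Two's complement of a 16-bit value: invert every bit, then add 1."""
    return ((~da) + 1) & MASK


def to_signed(word):
    """Interpret a 16-bit word as a signed integer (helper for reading results)."""
    return word - 0x10000 if word & 0x8000 else word
```

For example, `two(5)` yields the 16-bit pattern that `to_signed` reads back as -5, and applying `two` twice returns the original value.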
Branch/Jump Instructions
Branch if Not Equal To 0
bnez label B (1) 111 LLLL LLLL LLLL
Conditionally jump to the address specified by label if DA register does not contain the
value 0.
Branch if Equal To 0
bez label B (1) 110 LLLL LLLL LLLL
Conditionally jump to the address specified by label if DA register contains the value 0.
Branch if No Carry
bnc label B (1) 101 LLLL LLLL LLLL
Conditionally jump to the address specified by label if the carry bit is set to 0 from the
previous operation.
Branch if Carry
bc label B (1) 100 LLLL LLLL LLLL
Conditionally jump to the address specified by label if the carry bit is set to 1 from the
previous operation.
Branch if Positive
bp label B (1) 011 LLLL LLLL LLLL
Conditionally jump to the address specified by label if DA register’s first bit is 0.
Branch if Negative
bn label B (1) 010 LLLL LLLL LLLL
Conditionally jump to the address specified by label if DA register’s first bit is 1.
Jump
j label B (1) 000 LLLL LLLL LLLL
Jumps to the address specified by label.
Call
call label B (1) 001 LLLL LLLL LLLL
Jumps to the address specified by label after storing the contents of the PC register
(incremented by 1) in the RA register.
Return
ret N (00) 10 1111 0001 XXXX
Jumps to the address stored in the RA register.
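The branch conditions described in this section can be summarized in a short Python sketch. The function `branch_taken` is hypothetical and keyed on mnemonics rather than opcode bits; it also assumes that the "first bit" of DA means the most significant bit.

```python
def branch_taken(mnemonic, da, carry):
    """Return True if the instruction redirects the PC.

    da is the 16-bit DA value; carry is the carry bit (0 or 1)
    left by the previous operation.
    """
    negative = (da >> 15) & 1  # assumption: "first bit" is the MSB
    conditions = {
        "bnez": da != 0,
        "bez": da == 0,
        "bnc": carry == 0,
        "bc": carry == 1,
        "bp": negative == 0,
        "bn": negative == 1,
        "j": True,     # unconditional
        "call": True,  # unconditional; also saves PC+1 in RA
    }
    return conditions[mnemonic]
```

Note that under these descriptions bp is taken when DA is zero, since the sign bit of zero is 0.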
20
Stack Instructions
Push to Stack
push N (00) 01 1110 0100 0XXX
Pushes the contents of DA register to the stack. Decrements the SP register.
Pop off Stack
pop N (00) 01 1111 0010 0XXX
Loads the word at the top of the stack into DA register. Increments the SP register.
Push RA to Stack
pushra N (00) 01 1110 1000 0XXX
Pushes the contents of RA register to the stack. Decrements the SP register.
Pop RA off Stack
popra N (00) 01 1111 1000 1XXX
Loads the word at the top of the stack into RA register. Increments the SP register.
Push IA to Stack
pushia N (00) 01 1110 0000 0XXX
Pushes the contents of IA register to the stack. Decrements the SP register.
Pop IA off Stack
popia N (00) 01 1111 0001 0XXX
Loads the word at the top of the stack into IA register. Increments the SP register.
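The push and pop behavior can be sketched with a small Python model. The `StackModel` class, the memory size, and the initial SP value are assumptions for illustration; the sketch also assumes that pop reads the word at SP before incrementing, which mirrors push decrementing before it writes.

```python
class StackModel:
    """Hypothetical model of SP, DA, and the memory unit holding the stack."""

    def __init__(self, mem_words=64):
        self.mem = [0] * mem_words
        self.sp = mem_words  # assumed initial SP: one past the last word
        self.da = 0

    def push(self):
        self.sp -= 1                 # SP = SP - 1
        self.mem[self.sp] = self.da  # Mem[SP] = DA

    def pop(self):
        self.da = self.mem[self.sp]  # DA = Mem[SP] (assumed to come first)
        self.sp += 1                 # SP = SP + 1


s = StackModel()
s.da = 1
s.push()
s.da = 2
s.push()
s.pop()          # DA now holds the most recently pushed value
first_popped = s.da
s.pop()
second_popped = s.da
```

The RA and IA variants would follow the same pattern with a different source or destination register.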
21
Example Program
The following program is an example of programming in our processor’s assembly language. It
finds the relative primes of some input N.
Assembly Program
relPrime:
# Fetch argument n
la 0 # Load address of variable N into IA
pop # Pop off the last argument from the stack
sw # Store DA into mem[IA]
# Create variable M
la 1 # Load address of variable M into IA
li 2 # Load value 2 into DA
sw # Store DA into mem[IA]
relPrime_loop:
pushra # Backup critical registers (RA, IA, DA)
pushia
push
# Set up N and M as arguments for gcd
la 0 # Load address of variable N into IA
lw # Load mem[IA] into DA
push # Push DA onto the stack (put N on as an argument)
# Repeat for M
la 1
lw
push
# Call the GCD function
call gcd
# Get the return values
pop # put return value into DA
addi -1 # subtract 1
# if the result is zero (return value == 1), we're done
bez relPrime_done
pop
popia
popra # Restore critical registers (DA, IA, RA)
li 1 # load the immediate 1 into DA
addm # mem[IA] = DA + mem[IA] (recall: IA points to M after the restore)
relPrime_done:
lw # NOTE: M is already in IA at this point
push # Push DA onto stack
ret # Go back to where we came from
gcd:
# fetch argument B
la 1
pop
sw
# fetch argument A
la 0
pop
sw
bnez gcd_nonzero # DA contains a, check if nonzero
la 1 # IA = addr of B
lw # DA = mem[IA] = B
push # Push DA onto stack (put B on stack)
ret # Go back to where we came from
gcd_nonzero:
gcd_loop:
la 1 # IA = addr of B
lw # DA = mem[IA] = B
# if B == 0 then we're done
bez gcd_done
# a = a-b
la 0 # IA = addr of A
lw # DA = mem[IA] = A
la 1 # IA = addr of B
sub # DA = DA-mem[IA] = A-B
la 0 # IA = addr of A
sw # A = DA
# skip over the next case
j gcd_casedone
gcd_case2:
# b = b-a. Same as above
la 1
lw
la 0
sub
la 1
sw
gcd_casedone:
j gcd_loop # go to start of loop
gcd_done: # It's over!
la 0 # IA = addr of A
lw # DA = mem[IA] = A
push # push DA (A) onto stack
ret # Go back to where we came from
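For comparison, the intended behavior of the two routines can be written as a high-level reference model. This Python sketch is not part of the toolchain; the function names are illustrative, and the `a > b` comparison is an assumption, since the listing above does not show how the two subtraction cases are selected.

```python
def gcd(a, b):
    """Subtraction-based Euclid's algorithm, mirroring the gcd routine."""
    if a == 0:
        return b
    while b != 0:
        if a > b:      # assumed comparison; not shown in the listing
            a = a - b  # a = a-b
        else:
            b = b - a  # b = b-a
    return a


def rel_prime(n):
    """Smallest m >= 2 with gcd(n, m) == 1, mirroring relPrime."""
    m = 2
    while gcd(n, m) != 1:
        m += 1
    return m
```

A model like this is handy for producing expected values when checking the assembly program in simulation.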
24
0010111001010000 # sw
0100101100000000 # la 0
0001111100100000 # pop
0010111001010000 # sw
1110000000101010 # bnez gcd_nonzero
0100101100000001 # la 1
0000111101100000 # lw
0001111001100000 # push
0010111100010000 # ret
0100101100000001 # la 1
0000111101100000 # lw
1111000000111111 # bez gcd_done
0100101100000000 # la 0
0000111101100000 # lw
0100101100000001 # la 1
0000001011100000 # sub
0100101100000000 # la 0
0010111001010000 # sw
1000000000111101 # j gcd_casedone
0100101100000001 # la 1
0000111101100000 # lw
0100101100000000 # la 0
0000001011100000 # sub
0100101100000001 # la 1
0010111001010000 # sw
1000000000101011 # j gcd_loop
0100101100000000 # la 0
0000111101100000 # lw
0001111001100000 # push
0010111100010000 # ret
25
Common Operations
26
pop # restore the index to DA
addi -1 # decrement the index
iap 16 # increment the position of IA by one word
bnez loop # check if we have traversed the whole array
# continue the program here
Machine Code
Note: loop is found at address 0x111
0101 10XX 0000 1010
0010 010X XXXX XXXX
0001 001X XXXX XXXX
0010 011X XXXX XXXX
0100 00XX 1111 1111
0110 01XX 0001 0000
1000 0001 0001 0001
27
Conditional Statements
Assembly
Skip over code if memory at label A equals memory at label B
la A
lw
la B
sub
bez equal
# Put code to execute if A!=B
equal:
Skip over code if memory at label A is less than memory at label B
la A
lw
la B
sub # DA = A-B
bnc gt
# Put code to execute if A <= B
gt:
Machine Code
Skip over code if memory at label A equals memory at label B
#for reference
A address = 0 000 0001
B address = 0 000 0010
equal address = 1 010 1010 1010
gt address = 0101 0101 0101
#############################
0110 00XX 0000 0001
0001 001X XXXX XXXX
0110 00XX 0000 0010
0000 110X XXXX XXXX
1001 1010 1010 1010
# Put code to execute if A!=B
0000 1010 1010 1010:
Skip over code if memory at label A is less than memory at label B
0110 00XX 0000 0001
0001 001X XXXX XXXX
28
29
B: Hardware Specifications
Register Transfer Language
Arithmetic
Instruction: A/L to memory, A/L to DA, A/L with imm to DA
Fetch (all): inst = mem[PC]
             newPC = PC+1
Stage 1:
  A/L to memory, A/L to DA: ALUOut = DA op Mem[IA]
  A/L with imm to DA:       ALUOut = DA op SE/ZE/ZEu(inst[7:0])
Memory Instructions
lw sw
inst = mem[PC]
Fetch newPC = PC+1
PC = newPC
30
iap la oua sia lia
inst = mem[PC]
Fetch newPC = PC+1
PC = newPC
IA = ALUOut DA = ALUOut
Stage 2 PC = newPC PC = newPC
Jumps/Branches
ret, Branch, Jump, Call
Fetch (all): inst = mem[PC]
             newPC = PC+1
Stage 1:
  ret:    PC = RA
  Branch: if <flag> PC = PC[15:12] || inst[11:0]
          else PC = newPC
  Jump:   PC = PC[15:12] || inst[11:0]
  Call:   RA = PC
          PC = PC[15:12] || inst[11:0]
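The concatenation PC[15:12] || inst[11:0] used by jumps, branches, and calls can be expressed with bit masks. The Python helper below is a sketch for illustration; the name `jump_target` is not taken from the project.

```python
def jump_target(pc, inst):
    """Compute PC[15:12] || inst[11:0].

    Keeps the top 4 bits of the current PC and takes the low 12 bits
    of the instruction word as the rest of the address.
    """
    return (pc & 0xF000) | (inst & 0x0FFF)
```

Because the upper four bits come from the current PC, a jump can only reach targets within the same 4096-word region.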
31
LCD / Buttons
lcdwr lcdclr lcdmc buttr
inst = mem[PC]
Fetch newPC = PC+1
Stack Instructions
pop popia popra push pushia pushra
inst = mem[PC]
Fetch newPC = PC+1
32
Component Specification
1. General Purpose Register
a. Input Signal(s): 16-bit regWrite
b. Output Signal(s): 16-bit regRead
c. Control Signal(s): 1-bit writeEnable
d. Description: The input signal is ignored unless the writeEnable control signal is
set to 1. If it is, then the contents of the register is overwritten with the contents of
the input signal. Regardless of the control signal, the output signal will always
reflect the contents of the register.
2. Program Memory
a. Input Signal(s): 16-bit Address, 16-bit WriteData
b. Output Signal(s): 16-bit ReadData
c. Control Signal(s): 1-bit MemRead, 1-bit MemWrite
d. Description: If MemWrite is 1, then the data on the WriteData input will be
written to the memory address on the Address input. If MemRead is 1, then the
data at the memory address on the Address input will be available on the
ReadData output.
3. 1:2 Mux
a. Input Signal(s): 16-bit in0, 16-bit in1
b. Output Signal(s): 16-bit out
c. Control Signal(s): 1-bit select
d. Description: Select in0 or in1 to be fed to out.
4. 2:4 Mux
a. Input Signal(s): 16-bit in0, 16-bit in1, 16-bit in2, 16-bit in3
b. Output Signal(s): 16-bit out
c. Control Signal(s): 2-bit select
d. Description: Select one of the inputs to be fed to out.
5. Sign Extension Unit
a. Input Signal(s): 8-bit signal
b. Output Signal(s): 16-bit signal
c. Control Signal(s): None
d. Description: Sign extends the 8-bit input signal to 16-bits
e. Implements SE
6. Zero Extension Unit
a. Input Signal(s): 8-bit signal
b. Output Signal(s): 16-bit signal
c. Control Signal(s): None
d. Description: Zero extends the 8-bit input signal to 16-bits (zeros on MSB side)
e. Implements ZE
33
34
Datapath Schematic
35
Control Signals
Into Control Unit:
DA The entire DA register is used as an input to the control unit (this provides
the zero and negative ‘flags’).
CARRY The carry bit from the ALU is fed into the control unit.
Out of Control Unit:
PCSRC1 Selects between newPC/branched PC and RA as write input for PC
register.
PCW Whether or not to write to the PC register.
FEN Which flag to use (for branch operations)
FINV Whether or not to invert the flag (for branch operations)
IMEMR Whether or not to read from instruction memory.
IASRC Selects whether to feed ALUResult, MemOut, or DA as write input for IA
register.
IAW Whether or not to write to the IA register.
DASRC Selects whether to feed ALUResult, MemOut, IA, or Buttons & Switch as
write input for DA register.
DAW Whether or not to write to the DA register.
SPDIR Selects whether to increment (1) or decrement (0) the SP register.
SPW Whether or not to write to the SP register.
RASRC Selects whether to feed MemOut or newPC as write input for RA register.
LCDROW Selects the row on the LCD display for the cursor to move to.
LCDSTARTADDRESS Selects the position in the row for the cursor to move to.
LCDMOVECURSOR Indicates to the LCD driver that the cursor needs to move to another
location.
LCDCLEAR Indicates to the LCD driver that the LCD needs to be cleared.
LCDWRITE Indicates to the LCD that the lcd_DP needs to be written to the display.
ALUASRC Selects whether to feed IA, DA, or RA into ALU input A
ALUBSRC Selects whether to feed the zero-extended-upper, zero-extended, or
sign-extended immediate, or MEM_OUT, into ALU input B
ALU_LATCHW Whether or not to write to the ALU Latch
ADDRSRC Selects whether to feed IA or SP as memory address.
MEMR Whether or not to read from the memory unit.
MEMW Whether or not to write to the memory unit.
36
Control Unit
The processor is split into three cycles, the fetch, read, and write stages. With the exception of
the fetch stage, where all control signals are predetermined, control signals are based off of bits
in the instruction.
The instructions are split into sets of bits that help control determine not only what type of
instruction it is working with, but values for the relevant control signals. See the Machine
Language Instruction Types section for more information on these sections, as they will be
referred to by name.
For the Read and Write stages, if the instruction prefix is 1X, the signals fall to the Branch (B)
column. Then if the prefix is 00, they fall to the Inherent (N) column. Else, they fall to the
Immediate (I) column. Commas separate wires in a bus, and conditions evaluate to 1 for true
and 0 for false.
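The column selection rule above can be sketched as a small decoder. This Python function is illustrative only; it assumes the instruction prefix occupies bits 15:14, which matches the encodings listed in Appendix A.

```python
def instruction_type(inst):
    """Pick the control-table column for a 16-bit instruction word."""
    prefix = (inst >> 14) & 0b11  # assumed position of the prefix
    if prefix & 0b10:   # 1X -> Branch
        return "B"
    if prefix == 0b00:  # 00 -> Inherent
        return "N"
    return "I"          # 01 -> Immediate
```

Checking the branch case first mirrors the order in which the text applies the rules.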
Fetch Stage
FEN XX SPW 0
FINV X RASRC X
PCSRC0 0X RAW 0
PCW 0 ALUA XX
IMEMR 1 ALUB XX
IASRC XX ALUOP XXX
IAW 0 ADRSRC X
DASRC XX MEMR 0
DAW 0 MEMW 0
SPDIR X ALU_LATCHW 0
37
Opcode Breakdown
For the next two stages, opcodes play a very important role in the decoding of control signals.
Due to the number of inherent type and simplicity of immediate type instructions, we are able to
manipulate the bits in the instruction to determine the appropriate control. These bits are
typically found in the op and grp sections of instructions, as defined in the Machine Language
Instruction Types section.
Inherent Opcodes
Inherent opcodes, due to their variety, are further broken down into subgroups numbered 0
through 3. The grp section indicates which group an instruction belongs to. The op breakdown
for each subgroup is as follows:
Group 00 - Arithmetic/Logic (A/L)
op[4] DASRC The DASRC mux will either be set to 00 or 11; this bit signifies which
of the two it should be.
op[3] ALUB The ALUB mux will either be set to 01 or 11; this bit signifies which
of the two it should be.
op[2] MEMR This bit is the memory read control signal in Read Stage
op[1] DAW This bit is the DA write control signal in Write Stage
op[0] MEMW This bit is the memory write control signal in Write Stage
Group 01 - Stack
Note: These instructions “cheat” by dipping into the shamt space of the instruction to store
extra data.
op[4] SPDIR/MEMR/MEMW/DASRC This is a very busy bit. It controls:
● The direction of the SP register adder
● The control signal for memory read in Read Stage
● The inverse of memory write in Write Stage.
● The bits for DASRC
op[1] DAW, op[0] IAW, shamt[3] RAW These bits control the write signals to the DA, IA, and RA registers.
38
op[3] IAW This bit is the IA write control signal in Write Stage
op[2] MEMW This bit is the memory write control signal in Write Stage
op[1] DAW This bit is the DA write control signal in Write Stage
op[0] DASRC The DASRC mux will either be set to 00 or 01; this bit signifies which
of the two it should be.
Group 11 - LCD
LCD control signals are generated by AND-ing the whole instruction with values hardcoded into
control, as the control structure for LCD is fairly straightforward.
Immediate Opcodes
op[3] IASRC/DASRC/ALUA/IAW/DAW This bit is a heavy lifter. It controls:
● Both bits of IASRC
● The inverse of the first bit of DASRC
● The second bit of ALUA
● The IA write control signal in Write Stage
● The inverse of the DA write control signal in Write Stage
op[0] IAW/DAW This bit is the IA write control signal in Write Stage, and its inverse is
the DA write control signal in Write Stage
Branch/Jump Opcodes
op[2:1] FEN/PCSRC1 These bits control the FEN signal. When 00 they also control PCSRC1
39
Read Stage
Signal       N                                     I              B
FEN          00                                    00             FEN
FINV         0                                     0              !FINV
PCSRC1       Snow: PCSRC1; Others: 0               0              FEN == 00
PCW          DEFAULT: 0
IMEMR        DEFAULT: 0
IASRC        grp*                                  IASRC,IASRC    XX
IAW          DEFAULT: 0
DASRC        Snow: 0,DASRC; Others: DASRC,DASRC    DASRC,0        XX
DAW          DEFAULT: 0
SPDIR        Stack: SPDIR; Others: X               XX             XX
SPW          pop**: 1; Others: 0                   DEFAULT: 0
RASRC        0                                     X              1
RAW          DEFAULT: 0
ALUA         Stack: ALUA; Others: 01               0,ALUA         XX
ALUB         A/L: ALUB,1; Others: XX               ALUB           XX
ALUOP        aluop                                 alu,0          XXX
ADRSRC       grp[0]*                               0              XX
MEMR         A/L and Stack: MEMR; Others: 0        0              0
MEMW         DEFAULT: 0
ALU_LATCHW   DEFAULT: 1
* the group number doubles as control for the mux into the IA register, and bit 0 serves as the
ADRSRC control when need be
** SPW is 1 for pop, popra, and popia
40
Write Stage
If a control signal is not explicitly stated, it should be assumed that it did not change from Read
Stage.
Signal       N                               I          B
FEN          RS                              RS         RS
FINV         RS                              RS         RS
PCSRC1       RS                              RS         RS
PCW          DEFAULT: 1
IMEMR        DEFAULT: 0
IASRC        RS                              RS         RS
IAW          Stack,Snow: IAW; Others: 0      IAW        0
DASRC        RS                              RS         RS
DAW          DAW                             DAW        0
SPDIR        RS                              RS         RS
SPW          push**: 1; Others: 0            DEFAULT: 0
RASRC        RS                              RS         RS
RAW          Stack: RAW; Others: 0           0          call*
ALUA         RS                              RS         RS
ALUB         RS                              RS         RS
ALUOP        RS                              RS         RS
ADRSRC       RS                              RS         RS
MEMR         DEFAULT: 0
MEMW         MEMW                            0          0
ALU_LATCHW   DEFAULT: 0
Note: RS = Value from Read Stage
* only the call instruction turns this bit on
** SPW =1 for push, pushra, and pushia
41
Procedure
1. Identify the block in the RTL you want to test
2. Identify the initial conditions of the CPU.
3. Identify the final conditions that should result from the execution of the instruction
4. Step through the commands in the RTL chart and record all changes within the CPU
5. Verify that the final state of the CPU matches the expected final state
RTL Markup
Arithmetic/Logical To DA
Add, And, Or, Sub, Two
Inst = Mem[PC] //gets the instruction
newPC = PC +1 //increments the instruction counter
Op = inst[15:8] //selects operation
B=Mem[IA] //loads memory from address at IA
A= DA
DA = A+B // DA = A&&B // DA = A||B // DA = A-B // DA = two(B)
PC = newPC
Addm, Andm, Orm, Subm
Inst = Mem[PC] //gets the instruction
newPC = PC+1 //increments the instruction counter
Op = inst[15:8] //selects ALU operation
B= Mem[IA] //loads memory from address at IA
A=DA
Mem[IA] = A + B // Mem[IA] = A&&B // Mem[IA] = A||B // Mem[IA] = A-B
PC = newPC
Addi
Inst = Mem[PC] //gets the instruction
newPC = PC+1 //increments the instruction counter
Op = inst[15:8] //selects the operation
A=DA //puts DA in the A input of ALU
42
43
44
PC = newPC
Push
inst = Mem[PC] //gets the instruction
newPC = PC+1 //increments the instruction counter
op = inst[15:11] //selects the push instruction
newSP = SP-1 //moves the SP register down 1
SP = newSP
Mem[SP] = DA //puts DA onto the stack
PC = newPC
Pop
inst = Mem[PC] //gets the instruction
newPC = PC+1 //increments the instruction counter
op = inst[15:11] //selects the pop operation
newSP = SP+1 //moves the stack pointer up 1
SP = newSP
PC = newPC
Iap
inst = Mem[PC] //gets the instruction
newPC = PC+1 //increments the instruction counter
IA = IA + inst[11:0] //increments the IA register by an immediate value
PC = newPC
La
inst = Mem[PC] //gets the instruction
newPC = PC+1 //increments the instruction counter
IA = ZE(inst[7:0]) //loads a zero extended immediate into IA
PC = newPC
Oua
inst = Mem[PC] //gets the instruction
newPC = PC+1 //increments the instruction counter
IA = IA | inst[11:0] //ors the IA register with an immediate value
PC = newPC
J
inst = Mem[PC] //gets the instruction
newPC = PC+1 //increments the instruction counter
PC = PC[15:12] concat inst[11:0] //concatenates the top 4 bits of PC to the lower 12
bits of an immediate
Call
inst = Mem[PC] //gets the instruction
45
46
Muxes
4 Input, 2 Bit Control Mux
Will take in 3 or 4 inputs of varying lengths and output 1 value based on selection by control
bits.
The tester will ensure that result of testing corresponds with table and diagram.
Control Bit Signals Mux Output
0 0 A
0 1 B
1 0 C
1 1 D
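A reference model for checking testbench results against the table above can be very small. The Python function below is a sketch with an assumed name and argument order; it is not the Verilog module itself.

```python
def mux4(select, a, b, c, d):
    """Return the input chosen by the 2-bit select: 00->A, 01->B, 10->C, 11->D."""
    return (a, b, c, d)[select & 0b11]


# Walk all four select values, as the table above does.
outputs = [mux4(s, "A", "B", "C", "D") for s in range(4)]
```

When only three inputs are used, the fourth argument can be any placeholder value, since that select code should never occur.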
2 Input 1 Bit Control Mux
Will take in 2 inputs of varying lengths and output 1 value based on selection by control bits.
The tester will ensure that result of testing corresponds with table and diagram
Control Bit Signal Mux Output
0 A
1 B
Registers
The purpose of the registers in this design is to store data for future use. To function properly, a
register must accept a written input and output the value stored in it. All of the
registers in our design need to support this functionality for data up to 16 bits wide.
16 Bit Registers
Will write an input to its reserved memory location when the control signal is 1 and continually
output the value stored in it. As the register needs to be able to store any 16 bit value, an effective
testbench will test several 16 bit values. The intermediate registers are essentially multiple
single 16 bit registers combined. As a result they will be tested in the same way as the single 16
bit registers.
The table below offers examples of 16 bit values to test.
49
16 bit data input Register Control Signal Output
ALUs
The below tables are used to test the ALUs present in our design. To make it easier to find
specific components for testing, single operation components are included in this section.
Single Operation ALU 1 Input
This includes all of the operations that the extenders and incrementers will need to complete
individually. Each operation in the below table corresponds to one of the single operators used in
the processor. For testing single operation ALUs, use the table as a reference.
Single Operation ALU   Data Input   Operation           Control Signal Received   Data Output
Sign Extender          1111 0000    Sign Extend         N/A                       1111 1111 1111 0000
Zero Extender          1111 0000    Zero Extend         N/A                       0000 0000 1111 0000
Zero Extender Upper    1111 0000    Zero Extend Upper   N/A                       1111 0000 0000 0000
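The expected outputs in the table above can be reproduced with three short reference functions. This Python sketch uses assumed function names and is meant only as a way to generate expected values for a testbench.

```python
def sign_extend(imm8):
    """Sign extend an 8-bit value to 16 bits (copy bit 7 into bits 15:8)."""
    imm8 &= 0xFF
    return (imm8 | 0xFF00) if imm8 & 0x80 else imm8


def zero_extend(imm8):
    """Zero extend an 8-bit value to 16 bits (zeros on the MSB side)."""
    return imm8 & 0xFF


def zero_extend_upper(imm8):
    """Place the 8-bit value in the upper byte, with zeros in the lower byte."""
    return (imm8 & 0xFF) << 8
```

Feeding the table's input 1111 0000 through each function gives the three listed outputs.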
51
Other Components
Consists of parts that are unique and that cannot be easily grouped together
Memory Unit
The memory unit is used to access and store all needed values from memory. For it to work
properly it must receive a 16 bit data address, control signals, a 16 bit input, and output a 16 bit
value. The below table outlines a basic test of the memory unit.
Address Input   Data Input   MemW (control signal)   MemR (control signal)   Output
Instruction Memory
The instruction memory stores the output data the control unit needs to perform its operations.
The memory is accessed by 16 bit inputs from our instructions. A control signal called IMemR
controls whether or not the instruction memory outputs data.
Address Input IMemR (control signal) Output
0xFDEC 0 0x0000
52
Integration Plan
In order to successfully integrate our components, we will follow the three-step plan outlined
below. The general testing procedure for each set of components is to iterate through the
applicable permutations of the control signals, and compare the expected state after each
permutation with the actual state. Permutations may be tested with multiple starting states if
there are anticipated edge cases we would like to cover.
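The general procedure above can be sketched as a small Python harness. Everything named here is
hypothetical: the harness takes an expected-state function and an actual-state function, and the
write-enabled register used to exercise it is only a stand-in for a simulated subsystem.

```python
from itertools import product


def check_permutations(control_values, starting_states, expected, actual):
    """Try every control permutation on every starting state.

    Returns a list of (controls, start, expected, actual) for each mismatch.
    """
    names = list(control_values)
    failures = []
    for combo in product(*(control_values[name] for name in names)):
        controls = dict(zip(names, combo))
        for start in starting_states:
            want = expected(controls, start)
            got = actual(controls, start)
            if want != got:
                failures.append((controls, start, want, got))
    return failures


# Hypothetical subsystem: a register that only loads when 'write' is on.
def expected_register(controls, start):
    return start["data_in"] if controls["write"] else start["reg"]


def simulated_register(controls, start):
    # Stand-in for reading the state back from a simulation.
    return start["data_in"] if controls["write"] else start["reg"]


def broken_register(controls, start):
    # Deliberately wrong: ignores the write enable.
    return start["data_in"]


control_values = {"write": [0, 1]}
starting_states = [
    {"reg": 0x0000, "data_in": 0xFFFF},
    {"reg": 0x1234, "data_in": 0x0000},
]

if __name__ == "__main__":
    print(check_permutations(control_values, starting_states,
                             expected_register, broken_register))
```

Passing several starting states is how the anticipated edge cases would be covered, and an empty
failure list means every permutation matched.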
We are asserting that the value in the PC register is the expected value, and will be ignoring the
cases where PC W is set to off, as the writing capabilities should have been tested at a previous
layer.
Control Starting State Result
The ALU Subsystem
This subsystem consists of an ALU, a mux for ALU input A, and a mux for ALU input B. For the
purposes of this test, the values of the IA, RA, and DA register as well as the instruction will be
fed in on wires.
We are asserting that the value of ALUOut is the expected value.
Note: Instead of explicitly stating the ALUCtrl value, we list the operations we expect to occur at
each permutation of ALU A. This is both for the sake of brevity and maintainability.
Control Starting State Result
ALU A ALU B ALU OP DA IA RA Inst ALU Out
We are asserting that the value read out of memory is the expected value. Memory reading is
always set to on, and memory writing is always set to off.
Control Starting State Result
0 1 0x0020 0x0100
1 1 0x0020 0x0900
0 0 0x0020 0x0400
1 0 0x0020 0x0400
In previous steps we tested the connections from the IA, RA, and DA registers and inst into the
ALU, so we will be ignoring those tests for this step. We will also be ignoring the memory unit, as
its functionality should have already been tested. Instead we will be monitoring the writing back
to the IA, RA, and DA registers. As such, the write control signals for all registers will be set to on
by default.
We will be asserting that the end state of the IA, RA, and DA registers is as expected. This will be
checked by eye or with a very careful script, so the table below is merely for reference.
Control                                      Starting State           Ending State
IASRC  DASRC  ALU OP                         IA      DA      RA       IA       DA       RA
00     00     add, sub, shift, pass          0x1540  0x2471  0x0254   ALU Out  ALU Out  Mem Out
00     01     add, sub, shift, pass          0x1540  0x2471  0x0254   ALU Out  0x2471   Mem Out
00     10     add, sub, shift, pass          0x1540  0x2471  0x0254   ALU Out  0x1540   Mem Out
00     11     add, sub, shift, pass          0x1540  0x2471  0x0254   ALU Out  Mem Out  Mem Out
01     00     add, sub, and, or, shift, pass 0x1540  0x2471  0x0254   Mem Out  ALU Out  Mem Out
01     01     add, sub, and, or, shift, pass 0x1540  0x2471  0x0254   Mem Out  0x2471   Mem Out
01     10     add, sub, and, or, shift, pass 0x1540  0x2471  0x0254   Mem Out  0x1540   Mem Out
01     11     add, sub, and, or, shift, pass 0x1540  0x2471  0x0254   Mem Out  Mem Out  Mem Out
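The source selection implied by the table above can be written out as a reference function for a
checking script. This is a Python sketch: the function name is hypothetical, the selector
encodings are inferred from the table rows rather than taken from the control signal definitions,
and the ALU Out and Mem Out values in the example are placeholders.

```python
def write_back(iasrc, dasrc, ia, da, ra, alu_out, mem_out):
    """Return the (IA, DA, RA) end state with all register writes enabled."""
    # Encodings inferred from the table; IASRC 10 and 11 do not appear there.
    ia_sources = {0b00: alu_out, 0b01: mem_out}
    da_sources = {0b00: alu_out, 0b01: da, 0b10: ia, 0b11: mem_out}
    new_ia = ia_sources[iasrc]
    new_da = da_sources[dasrc]  # uses the IA value from before this write
    new_ra = mem_out            # every row of the table ends with RA = Mem Out
    return new_ia, new_da, new_ra


if __name__ == "__main__":
    # Starting state from the table; ALU Out and Mem Out are placeholders.
    for iasrc in (0b00, 0b01):
        for dasrc in (0b00, 0b01, 0b10, 0b11):
            state = write_back(iasrc, dasrc, 0x1540, 0x2471, 0x0254,
                               alu_out=0xAAAA, mem_out=0x5555)
            print(format(iasrc, "02b"), format(dasrc, "02b"),
                  [hex(value) for value in state])
```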
caused quite a bit of kerfuffle. Eventually we decided on putting the majority of the procedure
responsibilities on the caller, and passing arguments and return values on the stack. This allows
us to maintain an accumulator-style low register count, while expanding on the capabilities of
processors like the PIC.
We also decided how we would handle storing local values. Our options were to store them on
the stack and create commands that let you access more than just the top word, or to put them
into memory and force the programmer to back up what they wanted to persist before handing
over the reins to other procedures. We chose the latter, both because it preserved the integrity
and concept of the stack, and because the former would have required us to create instructions
that performed operations on elements of the stack. As we already have instructions that perform
operations on registers or memory, adding a third variation of an instruction would both
increase our opcode count and add unnecessary complexity.
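To illustrate the convention described above, here is a Python sketch of a recursive factorial
that passes its argument and return value on a stack and keeps its local in a fixed memory cell.
It is only a model: the names are hypothetical, the Python variable result stands in for the
working register, and backing the local up on the stack is an assumption about where the backup
would go.

```python
stack = []           # the single shared stack
memory = {"n": 0}    # fixed memory cell holding the procedure's local


def factorial():
    """Callee: pops its argument off the stack and pushes its result."""
    memory["n"] = stack.pop()
    if memory["n"] <= 1:
        stack.append(1)
        return
    # Caller's responsibility: back up the local before the nested call,
    # because the callee will overwrite the same memory cell.
    stack.append(memory["n"])
    stack.append(memory["n"] - 1)   # argument for the nested call
    factorial()
    result = stack.pop()            # return value (working register stand-in)
    memory["n"] = stack.pop()       # restore the backed-up local
    stack.append(memory["n"] * result)


def call_factorial(n):
    """Push the argument, call the procedure, pop the return value."""
    stack.append(n)
    factorial()
    return stack.pop()


if __name__ == "__main__":
    print(call_factorial(5))
```

Only the top word of the stack is ever read or written, which is the property this option was
chosen to preserve.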
Sometime between now and our next meeting (in approximately 4 hours' time):
● Ian will examine the instructions we’ve laid out and will try to condense them into
opcodes/instruction types
● Evë will update the design process journal
● Thad will translate the example program into our assembly and determine if our new
instruction set is feasible
● David, Matthew, and Evë will write some code snippets for the ‘Common Operations’
section
Milestone 2
Impromptu Meeting Thursday, January 11
Members Present: Thad, Evë, Matthew, Ian
At this meeting we debriefed the design meeting we just had with Micah.
We discussed adding a register to store carry, overflow, and perhaps negative flags. Though
having a status register ensures that the data persists, we decided to instead very carefully set
up the datapath to render the register irrelevant. This is because we do not want to support a
scenario in which we let the programmer set any of the flags or view them for an extended
period of time. In the event we decide to handle interrupts, we will revisit this decision.
An interesting idea that came up in the meeting with Micah was to have a chunk of memory
dedicated to local values. This solves a lot of our problems with programming recursion and
preserves the integrity of our design, so we decided to move forward with that in mind.
Though we discussed condensing opcodes, we would like to wait to do some analysis on which
instructions get used at what frequency so that we can do proper encoding using a system like
Huffman coding.
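As a sketch of what that analysis could feed into, the Python below computes Huffman code lengths
from instruction counts, so that frequently used instructions receive shorter opcodes. The counts
are made up for illustration and are not measurements from our programs, and the function name is
hypothetical.

```python
import heapq
from itertools import count


def huffman_code_lengths(frequencies):
    """Return {instruction: opcode length in bits} for a Huffman code."""
    order = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(freq, next(order), {name: 0}) for name, freq in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every member gets one bit deeper.
        freq_a, _, depths_a = heapq.heappop(heap)
        freq_b, _, depths_b = heapq.heappop(heap)
        merged = {name: depth + 1
                  for name, depth in {**depths_a, **depths_b}.items()}
        heapq.heappush(heap, (freq_a + freq_b, next(order), merged))
    return heap[0][2]


# Hypothetical usage counts, for illustration only.
example_counts = {"add": 40, "sub": 30, "shift": 20, "ret": 10}
lengths = huffman_code_lengths(example_counts)

if __name__ == "__main__":
    for name, bits in sorted(lengths.items(), key=lambda item: item[1]):
        print(name, bits)
```

Variable-length opcodes produced this way would need to be weighed against the fixed 16-bit
instruction width, which is one reason to wait for real frequency data first.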
We then briefly discussed the idea of replacing ret with jr (jump register), before Thad threw out
the ridiculous idea of creating an instruction that pops the stack directly into PC. Though this
could have been a productive discussion we quickly moved on.
At this point we noticed that we’d effectively sucked ourselves into making a multi-cycle
processor. Though not intended, the 4 of us are on board with the idea.
Work completed during the meeting:
● Evë: Updated process journal
● Ian: Updated design document with opcodes and instruction types
● Thad: Updated example program
● Matthew: Updated design document with opcodes and instruction types
Milestone 3
Meeting Sunday, January 21
Members Present: Matthew, Thad, Evë, Ian
At this meeting we recapped our Friday discussion with Micah, as well as delegated lab work.
Since we will not be using interrupts for our games, we chose to do labs 6 and 7. David will work
on lab 6, and Matthew will work on lab 7.
One of the biggest concerns from the meeting was our RTL tables. We spent some time
condensing and refactoring our tables.
We also drew out our initial datapath. This was done on a whiteboard, so after this meeting
someone will have to reconstitute it in software like Visio. We are currently in a race to see who
can download it the fastest. As we are all in F217, the prospects do not look good for any of us.
Ian may have to restart his computer.
A key decision we made in the datapath was to have the RA, IA, and DA registers mux into an
ALU input. We decided to go this route because there is already a mux on the B input, so adding
one to the A input does not change cycle time.
As a very important side note, Evë won the Visio installation battle.
Work to complete before next meeting:
● Matthew has the honor of copying the diagram into Visio
● David will work on Lab 6
● Thad will update the RTL and describe the control signals
● Evë will start the integration plan
● Ian will design the unit tests and update the component list
Milestone 4
Meeting Thirstday, January 25
Members Present: Matthew, Ian, Evë, Thad, David
In addition to the assignments to complete Labs 6 & 7:
Evë: Optimize opcodes, make memory unit testbench
Thad: Fix the RTL again & purge document of intermediate registers, fix Micah comments, make
Mux testbenches.
Matt: Get rid of intermediate registers from the datapath diagram. Start putting together ALU
control and ALU control codes (if those exist at all).
Ian: Make register testbench
David: Make ALU testbench
Milestone 5