Speed Lake Processor

Speed Lake
CSSE 232 Processor Project

Team 2A
Thaddeus Hughes, Evë Maquelin, Matthew Howlett, Ian Sheffert, David Li

Table of Contents

Changelog
Executive Summary
Our Processor
Design
Instructions and Control
Multi-Cycle Design
Memory and the Stack
Procedure Calls
Implementation
Testing
Compiler and Assembler
Results
Conclusion
A: Software Specifications
Available Registers
General Purpose Registers
Restricted Purpose Registers
Procedure Call Convention
Machine Language Instruction Types
Inherent (N) Type Instructions
Immediate (I) Type Instructions
Branch (B) Type Instructions
Instruction Semantics
Arithmetic and Logical Instructions
Branch/Jump Instructions
I/O Manipulation Instructions
Memory Manipulation Instructions
Stack Instructions
Example Program
Assembly Program
Assembled Machine Code
Common Operations
Adding Numbers in Memory
Loop Through an Array in Memory
Loading an Address into the IA Register
Conditional Statements
Reading from/Writing to a Display Register
B: Hardware Specifications
Register Transfer Language
Arithmetic
Memory Instructions
Jumps/Branches
LCD / Buttons
Stack Instructions
Component Specification
Datapath Schematic
Control Signals
Control Unit
Fetch Stage
Opcode Breakdown
Inherent Opcodes
Immediate Opcodes
Branch/Jump Opcodes
Read Stage
Write Stage
C: Testing and Integration
RTL Testing
Procedure
RTL Markup
Unit Test Plan
Unit Testing Procedure:
Tables for Unit Testing
Muxes
Registers
ALUs
Other Components
Memory Unit
Instruction Memory
Integration Plan
Step 1: Small Subsystems
Step 2: Registers, the ALU, and Program Memory
Step 3: The Big Kahuna
D: Design Process Journal
Milestone 1
Meeting Monday, January 8
First Meeting Wednesday, January 10
Second Meeting Wednesday, January 10
Milestone 2
Impromptu Meeting Thursday, January 11
Meeting Friday, January 12
Meeting Wednesday, January 17
Milestone 3
Meeting Sunday, January 21
Meeting Tuesday, January 23
Milestone 4
Meeting Thirstday, January 25
Meeting Sunday, January 28
Milestone 5
Meeting Monday During Class 2/5/2018
Meeting Monday Evening 2/5/2018
Meeting Tuesday During Class 2/6/2018
Meeting Wednesday During/After Class 2/7/2018
Meeting Monday During Class 2/12/2018
Meeting Tuesday During Class 2/13/2018
Meeting Wednesday During and After Class 2/14/2018
Meeting Thursday After Scheduled Meeting 2/15/2018
Meeting Wednesday 2/21/2018

Changelog

Version Date Description
1.0 January 10, 2018 Initial version of the document created for Milestone 1
1.1 January 17, 2018 Milestone 2 Updates:

● Procedure calling pattern was altered to accommodate
use of memory for local data storage
● The instructions cya and rya were converted to
pseudo-instructions, and the instructions pushia, popia,
pushra, and popra were added to accomplish this
● The jump and branch instruction types were
consolidated into one
● More branch instructions were added to branch off of
the sign of the value in the DA register
● Example program was updated to reflect changes in the
instruction set

Milestone 2 Additions:
● RTL for each instruction
● List of necessary components
● Testing procedure and results of RTL

● Updated and condensed RTL tables
● No more IA ALU. Everything goes through one ALU.
● Removed MIPS from the document
● Document rearranged into logical sections
● Updated components

Milestone 3 Additions:
● Unit Test Plan
● Integration Plan
● Control signal descriptions
● Datapath schematic

● No more passing B through ALU. IA is also passed
through ALU to get to DA, rather than being directly
wired up. Instead, ALU port B is wired to input mux of
DA.
● Addition of instructions sllm, srlm, subm
● Changing opcodes
● Inherent types now have a 4-bit shamt used for shifting

1.3 February 7, 2018 Milestone 5 Updates:

● RTL updated to match control design
● Instruction type diagrams were updated to reflect
opcode structure
● Control signal table was updated for readability
● Editing documentation

Executive Summary
We have designed a simple accumulator-style processor that runs on an FPGA board. The
processor provides basic support for the LCD and buttons on the board, and we created a compiler
and an assembler to make programming easier.

In this document we will discuss the instruction set, implementation, testing, and final
performance results of the processor.
Our Processor
We chose to implement a multi-cycle accumulator-style processor with a stack. We designed
ours with only one working register, the DA register, and an indirect addressing (IA) register to
store memory addresses.

Though some of us had prior experience working with accumulators, there was still a lot left to
learn about how they worked. Also, the addition of a stack seemed like an easy improvement
that would be quite valuable to the programmer. Later, our choice of style turned out to be quite
convenient as the lack of arguments on most instructions left room for a lot of opcodes and
therefore a wide range of instructions. This led to the creation of an instruction set that not only
computed relative primes, but could handle general computation with ease.
Design
Over the course of the project we were forced to make a lot of interesting but difficult design
decisions. Though most of our decisions were made with processor efficiency and elegance as
the priority, such as the design of our control unit and the ALU design, some were made with
ease of implementation as the priority, such as choosing to keep our cycle times and cycles per
instruction constant.
Instructions and Control

Our instructions consist of basic arithmetic, logical operations, stack operations, branches,
function calls, memory operations, and I/O operations designed for buttons, switches, and the
FPGA board LCD screen. These are broken down into three types: inherent, immediate, and
branch instructions.

Of the 44 total instructions, 27 are truly inherent, meaning they take no arguments. This is where
we focused the majority of our control unit optimization. Among these are the arithmetic
operations, such as addition and subtraction. Though we have immediate instructions that do
these operations with constant values, we wanted an easy way to interface with values in the
accumulator and values stored in memory. By putting the memory address of a value into the IA
register, instructions like add/addm can use both the value in the DA register and the value
in memory and store the result accordingly. add stores the result into DA, and addm stores the
result back into memory at IA. This pattern is repeated across many of our instructions.
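
The difference between the two result destinations can be sketched in Python. This is an illustrative model rather than the hardware description; the 16-bit word width and the dictionary-backed memory are assumptions made for the sketch.

```python
WORD_MASK = 0xFFFF  # assumption: 16-bit data words


def add(da, ia, mem):
    """Model of add: returns the new DA value; memory is untouched."""
    return (da + mem[ia]) & WORD_MASK


def addm(da, ia, mem):
    """Model of addm: writes the sum back to mem[IA]; DA is untouched."""
    mem[ia] = (da + mem[ia]) & WORD_MASK
    return da


mem = {0x10: 5}
da = add(3, 0x10, mem)     # DA becomes 8, mem[0x10] stays 5
da = addm(da, 0x10, mem)   # DA stays 8, mem[0x10] becomes 13
```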

Due to the nature and size of our instruction set, we thought the conventional or expected
method of designing our control unit, a finite state machine that takes whole instructions as
inputs, would be too large and inefficient. Instead of creating cases for each instruction or group
of instructions, we created cases for each control signal and instruction type. Since most
control signals have default states for each instruction cycle, this brought our cases per cycle
down from a potential 44 to an average of 9. Though not efficient to design or even to
implement, this was far and away the best decision for hardware efficiency.
Multi-Cycle Design
Of all the parts of our processor, cycle design is where we took the most shortcuts and made
the most compromises. Each instruction has three equal-length cycles: Fetch, Memory Read,
and Memory Write. Though it was easiest to implement, we missed out on a major optimization
opportunity. Not all instructions make use of the Memory Read and Memory Write stages, so we
could have modified our control unit to skip those stages for certain instructions. Doing this,
however, would have involved making significant changes to both our control unit and our
datapath, which was infeasible given our time constraints. Ultimately, we opted to
leave it as is in the hopes of implementing some sort of pipeline, but that never occurred.
Memory and the Stack

Since the omission of temporary registers makes operations on local variables a bit more
challenging, we recognized the need for a simple and robust memory system. The programmer
has two options for working with local values, the stack and program memory.

The stack follows the traditional format of pushing values to the top of the stack, and popping
them off as necessary. We did, however, omit the typical peek instruction. When we were paring
down our instruction set we realized that peeking was not necessary for any of the integral
operations of our processor, such as making function calls or working with local data. In the
event that someone really needs to keep the value they’ve looked at on the stack, they can
immediately push it back at the cost of a single instruction.
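
The pop-then-push substitute for peek can be sketched as follows. This is a toy model, assuming (per Appendix A) that push decrements SP and pop increments it; the memory size, the initial SP value, and the peek helper itself are hypothetical and exist only for illustration.

```python
class StackModel:
    """Toy model of DA, SP, and a small memory (sizes are assumptions)."""

    def __init__(self, size=16):
        self.mem = [0] * size
        self.sp = size   # assumption: an empty stack points just past memory
        self.da = 0

    def push(self):
        # SP moves down, then DA is written to the new top of stack.
        self.sp -= 1
        self.mem[self.sp] = self.da

    def pop(self):
        # The top of stack is loaded into DA, then SP moves back up.
        self.da = self.mem[self.sp]
        self.sp += 1

    def peek(self):
        """Hypothetical helper: a pop followed by an immediate push."""
        self.pop()
        self.push()


m = StackModel()
m.da = 7
m.push()
m.da = 0
m.peek()   # DA holds the top value again and the stack is unchanged
```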

Program memory provides a perhaps more intuitive way to store local variables. In combination
with our assembler, the programmer can load addresses by name into the IA register and either
manipulate them in the DA register, or use instructions that store results to memory. This is
where our instruction design choices shine most, as this provides huge potential program
optimization. Due to the way our cycles are designed, instructions that involve reading and
writing to memory don’t take any longer than instructions that only involve registers, so this cuts
down the expected process of loading a value, manipulating it in DA, and storing back to
memory to a single instruction.

As they are housed in the same memory unit, there is the potential for the stack and program
memory to collide in nasty ways. The stack builds up from the bottom of memory, whereas
program memory is indexed from the top. If a program makes too deep of a recursive call, it’s
possible they will begin to overlap. This could be mitigated by increasing the size of the memory
unit, but was not an issue we ran into in any of our testing.
Procedure Calls
Though most of the details surrounding procedure calls can be found in Appendix A, the design
of our calling conventions was the subject of heated debate for several days. As such, we felt it
deserved special mention.

Unlike many other processor designs, our processor does not have dedicated registers for
procedure arguments and return values. Instead these values are stored on the stack and the
responsibility of preserving data is left to the caller. As these conventions are not
hardware-enforced, it is imperative that the programmer follow them carefully.
Implementation
We began datapath design by whiteboarding the individual components we knew we would
need, then connecting them by going through the list of instructions and their
associated RTL to make sure each was supported. This was a somewhat iterative process, as
we found ways to shrink muxes, reduce stages of logic, and make components
serve multiple purposes (e.g. we originally had an ALU dedicated to the IA register, but
determined we could use the primary ALU for the same purpose).

Overall, the design focuses on reducing the hardware footprint as much as possible in order to
speed up cycles.

Our Xilinx model is almost entirely written in Verilog. This is because we all preferred to read
code rather than schematics, some of us had significant prior experience with Verilog, and
only one team member had experience with VHDL. We implemented the processor in modules
which matched the initial integration tests: decoder, registers, ALU, (program) memory, control,
and LCD driving. We believe this made debugging and splitting up work easier in the long term,
although it did make finding problems and incorrect links difficult at times.
Testing
We began with simple unit tests for all individual components (e.g. general purpose registers,
adders, muxes). For each of these, we chose the component to test, determined the expected
inputs, control signals, and outputs, built the testbench, ran it, and checked the results against
the tables in the testbench to determine validity. Most individual components worked without
any major changes.

After unit tests, we integrated some of these into small segments of the processor.

The PC Subsystem consists of the program counter register, incrementer, return address
register, the necessary muxes which hook these together, and the input control signals. Testing
for this system proved rather straightforward with no major changes necessary.

The ALU subsystem consists of the ALU, muxes into it, and necessary control signals. Testing
of this system proved very straightforward with no changes necessary.

The Stack/Program Memory Subsystem consists of the SP register, IA register, memory, and
necessary muxes. Testing of this proved very straightforward with no changes necessary.

The Control testbench tests, as expected, the control unit. This was the second-most daunting
part of the whole processor and testing, and in our testing we had to make countless small fixes to
make things work as expected. We also had to refer back to the testbench when we had more
fundamental issues with our processor (such as the need for an ALU output latch).

The Processor core consists of the necessary components to execute a given instruction
(everything but PC subsystem and instruction memory). We found that we would need a register
to serve as a latch on the output of the ALU.

At this point, we were ready to test the entire processor, which gave us some headaches. We
found that our branch control was not working. This was because we were expecting the wrong
outputs in the PC test (we needed to branch to PC | imm, not PC+1 | imm). We were using the
assembler at this point, and found that there were some bugs within it that caused unexpected
program behavior. After this was resolved, we battled glitches with our assembly before finally
making Euclid’s algorithm work.
Compiler and Assembler

When we originally looked at the relative prime algorithm we were given, our first thought was
that it was going to be abysmal writing an assembly version. As such, we decided to build a
compiler to help. The completed compiler is a Python script that takes in basic C code and
supports the following constructs:

- Method declarations with any number of arguments
- Local variable declarations
- Local variable assignments
- Addition and subtraction on local variables
- While loops
- Control flow such as if/else if/else blocks
- Method calls

As the goal of this class was not to make a fully-functional compiler, there are a few guidelines
to follow when writing code for the compiler:

- Keep assembly in mind. Arithmetic is only supported where a local variable is
manipulated by a constant or a variable, e.g. a = a + 4 or a = a + b.
- All logic must be inside of a method body. You can specify an assembly header that sets
up arguments and calls the method inside of the build file.
- Conditions must be as simple as possible, e.g. if (i == 5) or if (a < b) as opposed to
if (i - 4 < foo(j)). Local variables can be used to store values ahead of the condition.
- Comments must be single line comments and on their own lines.

At a high level, the compiler works by first cleaning the program, then parsing it into an
abstract syntax tree (AST), and finally converting the tree to assembly. Each step is described in
more detail below:

- Cleaning: This step removes all unnecessary characters, including newlines, comments,
and miscellaneous spaces.
- Parsing: Perhaps the most challenging step in the compiling process, this step turns the
cleaned program into abstract syntax. Using a stack, the parser works from the
innermost code blocks outwards, turning individual lines of code into various
components and composites of components. A full list can be found in syntax.py, but
these range from arithmetic blocks to method calls to variable declarations to
conditional statements. At the end of the parsing step, the entire program has been
turned into a tree of syntactical components.
- Converting to Assembly: This step walks through the tree, compiling generalized
assembly from each component into the final program. Syntax2.py contains a series of
optimizations made to this step. Instead of blindly outputting repetitive, inefficient, and
generalized code, the compiler tries to simulate the program and cut out instructions
that would ultimately have no effect. The optimizations are unfortunately still under
development, though syntax.py is fully functional.
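
As an illustration of the cleaning step only, the sketch below shows one way such a pass could look. It is not the code from our compiler: the function name clean is hypothetical, and it assumes comments start with // and sit on their own lines, as the guidelines above require.

```python
def clean(source: str) -> str:
    """Drop blank lines and whole-line comments, then collapse whitespace."""
    kept = []
    for line in source.splitlines():
        stripped = line.strip()
        # Assumption: comments are single-line, start with //, and are alone on a line.
        if not stripped or stripped.startswith("//"):
            continue
        # Collapse runs of spaces and tabs inside the line to single spaces.
        kept.append(" ".join(stripped.split()))
    # Newlines carry no meaning for the parser, so join everything on one line.
    return " ".join(kept)


program = "int f(int a) {\n  // bump a\n  a = a   + 4;\n}\n"
cleaned = clean(program)
```

The cleaned text is a single line, which a stack-based parser can then scan from the innermost braces outwards.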

To help transfer our assembly to the processor, we also developed an assembler which has
support for:
- Labels
- Defined constants
- Tabs and spaces
- Comments
- Hex and decimal immediates
- Pseudoinstructions (currently, just unpacking li and la for large immediates)

Running the compiler invokes the assembler as well, so it outputs an assembly file, a text file
with commented machine code, and .COE files for easy instruction memory generation.
Results
We were able to implement the model on the FPGA board. This included running a simple
program that takes two 8-bit inputs separated by 2 seconds, merges them into a 16-bit
input, runs relPrime, and then displays the result on the FPGA screen.

We wrote a hand-optimized assembly program and compared it to compiler output. We found
the following performance specifications:

                                                          Optimized    Unoptimized Code
                                                          Code         (Compiled from C)
# of 16-bit instructions to store Euclid’s and relPrime   60           97
  (including retrieving input)
# of 16-bit memory addresses                              4            7
# of 16-bit instructions executed with 0x13B0             92092        163496
# of cycles to execute with 0x13B0                        276276       490488
Average cycles per instruction                            3
Cycle time for design                                     6.366 ns
Clock speed                                               157 MHz
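
The relationships between these figures can be checked with a few lines of Python. The instruction counts, cycles per instruction, and cycle time are taken from the table; the run times and the speedup ratio are derived values that we did not measure separately.

```python
CYCLES_PER_INSTRUCTION = 3
CYCLE_TIME_NS = 6.366

# Instructions executed for input 0x13B0, from the table above.
executed = {"optimized": 92092, "compiled": 163496}

# Every instruction takes exactly three cycles, so cycles = 3 * instructions.
cycles = {name: n * CYCLES_PER_INSTRUCTION for name, n in executed.items()}

# Derived values: run time in milliseconds, clock in MHz, relative speedup.
runtime_ms = {name: c * CYCLE_TIME_NS / 1e6 for name, c in cycles.items()}
clock_mhz = 1e3 / CYCLE_TIME_NS
speedup = cycles["compiled"] / cycles["optimized"]

print(cycles)
print({name: round(t, 2) for name, t in runtime_ms.items()})
print(round(clock_mhz), round(speedup, 2))
```

Under these figures the hand-optimized program finishes in roughly 1.76 ms and the compiled one in roughly 3.12 ms.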

Conclusion
Creating a processor is quite difficult! We learned that there are many compromises to be made
in a design, and having gone through one iteration of a design helps you to make better
compromises. After doing one pass through, there are many changes we would individually like
to see happen, and some that we can all agree would be significant improvements, such as
leveraging multi-cycle to its fullest and shortening the length of some instructions.

A: Software Specifications
Available Registers
General Purpose Registers

Default Accumulating (DA) Register
This is the primary register available to the programmer. It serves as a way to accumulate
values and is typically the first argument in any operation. When used properly, its contents are
preserved across procedure calls.

Indirect Addressing (IA) Register
This is a secondary register available to the programmer, which points to the portion of
memory to read from or write to. It serves as a way to accumulate values and parameterize
instructions. When used properly, its contents are preserved across procedure calls.
Restricted Purpose Registers

Program Counter (PC) Register
This register stores the memory address for the current instruction. It cannot be altered by the
programmer, but branching and jumping instructions alter it as part of their behavior. It is also
incremented appropriately at the end of every completed instruction.

Return Address (RA) Register
This register stores the memory address of the instruction to return to upon executing the ret
instruction. It cannot be directly altered by the programmer, but RA is altered appropriately by
the instructions pushra, popra, and call. When used properly, its contents are preserved across
procedure calls.

Stack Pointer (SP) Register
This register stores the memory address for the top item on the stack. It cannot directly be
altered by the programmer, but instructions such as push and pop may alter it as part of their
behavior.

Procedure Call Convention

Calling a Procedure (Being the Caller)
Before jumping or branching to a procedure, you must first back up any local memory you want
to ensure persists across the procedure call. This is to be done on the stack. Next, you must
back up the critical registers RA, IA, and DA (using pushra, pushia, and push).

After backing up the registers you may then push whatever arguments the procedure requires to
the stack. You are now ready to call the procedure.

Upon returning from the procedure any return values will be present at the top of the stack.
From here, you can pop off these values. After this, be sure to restore the critical registers in
the reverse of the order they were pushed: DA, IA, then RA (using pop, popia, popra).

Defining a Procedure (Being the Callee)
As a callee your responsibilities are much less stringent. The caller expects the return values to
be at the top of the stack so your only responsibility is to clear any passed arguments and push
the expected return values to the stack. You may then return using an instruction such as ret.

In the event that you push any local values to the stack, you must remove them before pushing
your return values.
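
The convention can be sketched in Python with a list standing in for the stack. This is a simplified model: call_procedure and add_two are hypothetical names, and the sketch hands the return value back directly, whereas on the processor pop places it in DA, so it would have to be saved to memory before DA is restored.

```python
def call_procedure(stack, regs, args, callee):
    """Caller side: back up registers, push arguments, call, restore."""
    for name in ("RA", "IA", "DA"):      # pushra, pushia, push
        stack.append(regs[name])
    for arg in args:                     # arguments go on top of the backups
        stack.append(arg)
    callee(stack, regs)                  # call ... ret
    result = stack.pop()                 # the return value is on top
    for name in ("DA", "IA", "RA"):      # pop, popia, popra (reverse order)
        regs[name] = stack.pop()
    return result


def add_two(stack, regs):
    """Hypothetical callee: clears its arguments, pushes one return value."""
    b = stack.pop()
    a = stack.pop()
    regs["DA"] = regs["IA"] = 0          # the callee is free to clobber registers
    stack.append(a + b)


regs = {"RA": 0x20, "IA": 0x01, "DA": 42}
stack = []
result = call_procedure(stack, regs, [3, 4], add_two)
```

After the call the stack is empty again and all three registers hold their original values, even though the callee overwrote two of them.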

Machine Language Instruction Types
Inherent (N) Type Instructions

These instructions are majority opcode, with a small 4-bit immediate (shamt) for shift
operations.

00 | grp (2) | aluop (3) | op (5) | shamt (4)

prefix: Always 00
grp: Instruction subgroup
aluop: ALU opcode
op: Operation type
shamt: Shift amount
Immediate (I) Type Instructions

These instructions contain an 8-bit immediate value.

01 | alu (2) | op (4) | imm (8)

prefix: Always 01
alu: First two bits of ALU opcode (the last bit is assumed to be 0)
op: Operation type
imm: The immediate value
Branch (B) Type Instructions

These instructions take a 12-bit label which points to an address in memory.

1 | op (3) | imm (12)

prefix: Always 1
op: Type of branch
imm: The immediate to be used by the branch instruction. Varies by instruction.
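
As a sketch of how the prefix bits separate the three types, and how a B-type word is packed, consider the Python below. The function names are hypothetical; the field widths come from the layouts above, and the sample value uses op 001, which the instruction semantics list as call.

```python
def instruction_type(word):
    """Classify a 16-bit instruction word by its prefix bits."""
    if word & 0x8000:        # prefix 1
        return "B"
    if word & 0x4000:        # prefix 01
        return "I"
    return "N"               # prefix 00


def encode_branch(op, imm):
    """Pack a B-type word: 1 | op (3 bits) | imm (12 bits)."""
    if not 0 <= op < 8 or not 0 <= imm < 4096:
        raise ValueError("field out of range")
    return 0x8000 | (op << 12) | imm


def decode_branch(word):
    """Unpack a B-type word into (op, imm)."""
    return (word >> 12) & 0x7, word & 0x0FFF


word = encode_branch(0b001, 0x01E)   # call with a 12-bit target of 0x01E
```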

Instruction Semantics
Arithmetic and Logical Instructions

Addition to DA
add N (00) 00 0000 1110 XXXX
Performs addition on the value stored at memory as described by the IA register and the
contents of DA register. The result is put into DA register.

Addition to Memory
addm N (00) 00 0000 1101 XXXX
Performs addition on the value stored at memory as described by the IA register and the
contents of DA register. The result is put into memory at the location described by the IA
register.

Addition with Immediate
addi imm I (01) 00 1100 iiii iiii
Performs signed addition on the immediate value and DA register. The result is put into
DA register.

AND to DA
and N (00) 00 1100 1110 XXXX
Performs a logical AND on the value stored at memory as described by the IA register
and the contents of DA register. The result is put into DA register.

AND to Memory
andm N (00) 00 1100 1101 XXXX
Performs a logical AND on the value stored at memory as described by the IA register
and the contents of DA register. The result is put into memory at the location described
by the IA register.

AND with Immediate
andi imm I (01) 11 1010 iiii iiii
Put the result of a logical AND between DA register and the immediate value into DA
register.

OR to DA
or N (00) 00 1000 1110 XXXX
Performs a logical OR on the value stored at memory as described by the IA register and
the contents of DA register. The result is put into DA register.

OR to Memory
orm N (00) 00 1000 1101 XXXX
Performs a logical OR on the value stored at memory as described by the IA register and
the contents of DA register. The result is put into memory at the location described by
the IA register.

OR with Immediate
ori imm I (01) 10 1010 iiii iiii
Put the result of a logical OR between DA register and the immediate value into DA
register.

Shift Left Logical
sll shamt N (00) 00 0110 0010 shmt
Shift the value in DA register left by shamt.

Shift Right Logical
srl shamt N (00) 00 1010 0010 shmt
Shift the value in DA register right by shamt.

Shift Left Logical to Memory
sllm shamt N (00) 00 0110 0101 shmt
Shift the value in memory at IA left by shamt.

Shift Right Logical to Memory
srlm shamt N (00) 00 1010 0101 shmt
Shift the value in memory at IA right by shamt.

Subtraction to DA
sub N (00) 00 0010 1110 XXXX
Subtracts the value stored at memory as described by the IA register from the contents
of DA register. The result is put into DA register.

Subtraction to Memory
subm N (00) 00 0010 1101 XXXX
Subtracts the value stored at memory as described by the IA register from the contents
of DA register. The result is put into memory at the location described by the IA register.

Load Upper Immediate
lui imm I (01) 00 0000 iiii iiii
Load the immediate value specified into the upper half of DA register.

Load Immediate
li imm I (01) 00 0100 iiii iiii
Load the immediate value specified into the lower half of DA register. Sign extended.

Two's Complement
two N (00) 00 0100 0010 XXXX
Take the two’s complement of DA register. The result is also put in DA register.

Branch/Jump Instructions
Branch if Not Equal To 0
bnez label B (1) 111 LLLL LLLL LLLL
Conditionally jump to the address specified by label if DA register does not contain the
value 0.

Branch if Equal To 0
bez label B (1) 110 LLLL LLLL LLLL
Conditionally jump to the address specified by label if DA register contains the value 0.

Branch if No Carry
bnc label B (1) 101 LLLL LLLL LLLL
Conditionally jump to the address specified by label if the carry bit is set to 0 from the
previous operation.

Branch if Carry
bc label B (1) 100 LLLL LLLL LLLL
Conditionally jump to the address specified by label if the carry bit is set to 1 from the
previous operation.

Branch if Positive
bp label B (1) 011 LLLL LLLL LLLL
Conditionally jump to the address specified by label if DA register’s most significant (sign) bit is 0.

Branch if Negative
bn label B (1) 010 LLLL LLLL LLLL
Conditionally jump to the address specified by label if DA register’s most significant (sign) bit is 1.

Jump
j label B (1) 000 LLLL LLLL LLLL
Jumps to the address specified by label.

Call
call label B (1) 001 LLLL LLLL LLLL
Jumps to the address specified by label after storing the contents of the PC register
(incremented by 1) in the RA register.

Return
ret N (00) 10 1111 0001 XXXX
Jumps to the address stored in the RA register.
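
The branch conditions described above can be summarized in a small Python model. It is a sketch under assumptions: branch_taken is a hypothetical name, DA is treated as a 16-bit value whose most significant bit is the sign, and the carry bit is passed in rather than modeled.

```python
def branch_taken(mnemonic, da, carry=0):
    """Return True if the named branch or jump would change the PC."""
    da &= 0xFFFF                     # assumption: DA is 16 bits wide
    negative = bool(da & 0x8000)     # the most significant bit is the sign
    conditions = {
        "bez": da == 0,
        "bnez": da != 0,
        "bc": carry == 1,
        "bnc": carry == 0,
        "bp": not negative,          # note: a DA of 0 counts as positive here
        "bn": negative,
        "j": True,                   # unconditional
        "call": True,                # unconditional, also saves PC + 1 in RA
    }
    return conditions[mnemonic]
```

For example, branch_taken("bn", 0xFFFF) is true because 0xFFFF has its sign bit set, while branch_taken("bez", 1) is false.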

I/O Manipulation Instructions

LCD Write
lcdw N (00) 11 0000 XXXX XXXX
Writes the current value in the DA register to the DP register.

LCD Move Cursor
lcdmc imm I (00) 11 0100 iiii iiii
Moves the LCD cursor to a specified location on the LCD display.

LCD Move Cursor to DA
lcdmcda (00) 11 0100 1111 1111
Moves the LCD cursor to the location in DA.

LCD Clear
lcdclr N (00) 11 1000 XXXX XXXX
Clears the LCD screen and resets the cursor position to the start.

Read Buttons and Switches
buttr N (00) 10 1110 0011 XXXX
Reads the button and switch inputs from the BS register in the order
S0,S1,S2,S3,B0,B1,B2,B3 to the DA register.
Memory Manipulation Instructions

Load Word
lw N (00) 00 1111 0110 XXXX
Load the contents of the memory pointed by the IA register into DA register.

Store Word
sw N (00) 10 1110 0101 XXXX
Store the contents of DA register at the address pointed by the IA register.

Load Indirect Addressing
lia N (00) 10 1110 0010 1XXX
Loads the value of the IA register into DA register.

Store to Indirect Addressing
sia N (00) 10 1110 1001 1XXX
Stores the value of the DA register into the IA register.

Load Address
la imm I (01) 00 0011 iiii iiii
Load the immediate value specified into the lower half of the IA register. Zero-extended.

OR Upper Address
oua imm I (01) 10 0001 iiii iiii
ORs the immediate value specified into the upper half of the IA register, leaving the lower half unchanged.

Point to Adjacent Memory
iap imm I (01) 00 0101 iiii iiii
Increments (or decrements) the IA register by the specified value.
Stack Instructions
Push to Stack
push N (00) 01 1110 0100 0XXX
Pushes the contents of DA register to the stack. Decrements the SP register.

Pop off Stack
pop N (00) 01 1111 0010 0XXX
Loads the word at the top of the stack into DA register. Increments the SP register.

Push RA to Stack
pushra N (00) 01 1110 1000 0XXX
Pushes the contents of RA register to the stack. Decrements the SP register.

Pop RA off Stack
popra N (00) 01 1111 1000 1XXX
Loads the word at the top of the stack into RA register. Increments the SP register.

Push IA to Stack
pushia N (00) 01 1110 0000 0XXX
Pushes the contents of IA register to the stack. Decrements the SP register.

Pop IA off Stack
popia N (00) 01 1111 0001 0XXX
Loads the word at the top of the stack into IA register. Increments the SP register.

Example Program
The following program is an example of programming in our processor’s assembly language. It
finds the relative primes of some input N.
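
For orientation, the algorithm can be written as a short Python reference. This is a sketch rather than a translation of the listing: the a > b comparison that chooses between the two subtraction cases is an assumption, since the assembly in this section does not show that test.

```python
def gcd(a, b):
    """Subtraction-based Euclid, mirroring the a = a-b / b = b-a cases."""
    if a == 0:
        return b
    while b != 0:
        if a > b:        # assumption: the comparison that selects the case
            a = a - b
        else:
            b = b - a
    return a


def rel_prime(n):
    """Smallest m >= 2 such that gcd(n, m) == 1."""
    m = 2
    while gcd(n, m) != 1:
        m += 1
    return m


result = rel_prime(0x13B0)   # 0x13B0 is the input used in the Results section
```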
Assembly Program
relPrime:
# Fetch argument n
la 0 # Load address of variable N into IA
pop # Pop off the last argument from the stack
sw # Store DA into mem[IA]

# Create variable M
la 1 # Load address of variable M into IA
li 2 # Load value 2 into DA
sw # Store DA into mem[IA]

relPrime_loop:
pushra # Back up critical registers (RA, IA, DA)
pushia
push

# Set up N and M as arguments for gcd
la 0 # Load address of variable N into IA
lw # Load mem[IA] into DA
push # Push DA onto the stack (put N on as an argument)

# Repeat for M
la 1
lw
push

# Call the GCD function
call gcd

# Get the return values
pop # put return value into DA
addi -1 # subtract 1
# if the result is zero (return value == 1), we're done
bez relPrime_done

pop
popia
popra # Restore critical registers (DA, IA, RA)

li 1 # load the immediate 1 into DA
addm # mem[IA] = DA + mem[IA] (recall! IA points to M after the popia)

relPrime_done:
lw # NOTE: M is already in IA at this point
push # Push DA onto stack
ret # Go back to where we came from

gcd:
# fetch argument B
la 1
pop
sw

# fetch argument A
la 0
pop
sw

bnez gcd_nonzero # DA contains a, check if nonzero
la 1 # IA = addr of B
lw # DA = mem[IA] = B
push # Push DA onto stack (put B on stack)
ret # A == 0, so return B
gcd_nonzero:

gcd_loop:
la 1 # IA = addr of B
lw # DA = mem[IA] = B

# if B == 0 then we're done
bez gcd_done
# a = a-b
la 0 # IA = addr of A
lw # DA = mem[IA] = A
la 1 # IA = addr of B
sub # DA = DA-mem[IA] = A-B
la 0 # IA = addr of A
sw # A = DA

# skip over the next case
j gcd_casedone

gcd_case2:
# b = b-a. Same as above
la 1
lw
la 0
sub
la 1
sw

gcd_casedone:
j gcd_loop # go to start of loop

gcd_done: # It's over!
la 0 # IA = addr of A
lw # DA = mem[IA] = A
push # push DA (A) onto stack
ret # Go back to where we came from
Assembled Machine Code

0100101100000000 # la 0
0001111100100000 # pop
0010111001010000 # sw
0100101100000001 # la 1
0101010000000010 # li 2
0010111001010000 # sw
0001111010001000 # pushra
0001111000010000 # pushia
0001111001100000 # push
0100101100000000 # la 0
0000111101100000 # lw
0001111001100000 # push
0100101100000001 # la 1
0000111101100000 # lw
0001111001100000 # push
1001000000011110 # call gcd
0001111100100000 # pop
0100110011111111 # addi -1
1111000000011010 # bez relPrime_done
0001111100100000 # pop
0001111100010000 # popia
0001111110001000 # popra
0101010000000001 # li 1
0000000011010000 # addm
0000111101100000 # lw
0001111001100000 # push
0010111100010000 # ret
0100101100000001 # la 1
0001111100100000 # pop
0010111001010000 # sw
0100101100000000 # la 0
0001111100100000 # pop
0010111001010000 # sw
1110000000101010 # bnez gcd_nonzero
0100101100000001 # la 1
0000111101100000 # lw
0001111001100000 # push
0010111100010000 # ret
0100101100000001 # la 1
0000111101100000 # lw
1111000000111111 # bez gcd_done
0100101100000000 # la 0
0000111101100000 # lw
0100101100000001 # la 1
0000001011100000 # sub
0100101100000000 # la 0
0010111001010000 # sw
1000000000111101 # j gcd_casedone
0100101100000001 # la 1
0000111101100000 # lw
0100101100000000 # la 0
0000001011100000 # sub
0100101100000001 # la 1
0010111001010000 # sw
1000000000101011 # j gcd_loop
0100101100000000 # la 0
0000111101100000 # lw
0001111001100000 # push
0010111100010000 # ret

Common Operations
Adding Numbers in Memory

This assembly snippet adds 3 numbers that are stored in memory and places the result in the
next spot in memory. For our purposes, we assume the address of the first number is already in
the IA register.

Assembly

andi 0 # clear the value in DA
add # add the first number to DA
iap 16 # move IA by 1 word (16 bits) to the next number
add # add the second number to DA
iap 16 # move IA to the next number
add # add the third number to DA
iap 16 # move IA to the next word in memory
sw # store DA
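
As a rough cross-check of the snippet above, the following Python sketch models the DA and IA registers as plain integers and memory as a dictionary. The function name `add_three`, the dictionary-based memory, and the 16-bit wraparound on overflow are assumptions made for illustration only; the step of 16 mirrors the `iap 16` used in the snippet.

```python
def add_three(mem, ia, step=16):
    """Sum three consecutive words starting at address ia, store the sum in the next slot.

    mem  -- dict mapping address -> 16-bit value (hypothetical memory model)
    ia   -- starting value of the IA register
    step -- amount IA moves per 'iap' (16 in the snippet above)
    """
    da = 0                            # andi 0: clear DA
    for _ in range(3):
        da = (da + mem[ia]) & 0xFFFF  # add: DA = DA + mem[IA] (16-bit wraparound assumed)
        ia += step                    # iap 16: move IA to the next number
    mem[ia] = da                      # sw: store DA at mem[IA]
    return ia, da
```

Calling `add_three({0: 1, 16: 2, 32: 3}, 0)` leaves the sum at address 48, the slot after the third number.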

Machine Code

0100 01XX 0000 0000
0000 000X XXXX XXXX
0110 01XX 0001 0000
0000 000X XXXX XXXX
0110 01XX 0001 0000
0000 000X XXXX XXXX
0110 01XX 0001 0000
0001 010X XXXX XXXX
Loop Through an Array in Memory

This assembly snippet loops through memory. For our purposes we will assume the starting
address is already in the IA register and the length of the array is 10.

Assembly

li 10 # set DA to be 10 (array length)
loop:
push # push the value of DA (array index) to the stack
lw # load the array element

# do whatever you desire with the array element


pop # restore the index to DA
addi -1 # decrement the index
iap 16 # increment the position of IA by one word
bnez loop # check if we have traversed the whole array

# continue the program here

Machine Code
Note: loop is found at address 0x111

0101 10XX 0000 1010
0010 010X XXXX XXXX
0001 001X XXXX XXXX
0010 011X XXXX XXXX
0100 00XX 1111 1111
0110 01XX 0001 0000
1000 0001 0001 0001
Loading an Address into the IA Register

Assembly
For small addresses (load 0x50):

la 0x50 # load address 0x50 into the IA register

For larger addresses (load 0x1234):

la 0x34 # load address 0x34 into the IA register
oua 0x12 # OR the value 0x1200 into the IA register (result: 0x1234)
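
The two-instruction sequence can be checked with a short Python sketch. The helper names `la` and `oua` simply mirror the mnemonics; their bodies are an assumed reading of the RTL (`IA = ZE(imm)` and `IA = IA OR ZEU(imm)`), not the hardware implementation.

```python
def la(imm8):
    """Model of 'la': IA = ZE(imm8), i.e. the immediate lands in the low byte."""
    return imm8 & 0xFF


def oua(ia, imm8):
    """Model of 'oua': IA = IA OR ZEU(imm8), i.e. the immediate is ORed into the high byte."""
    return (ia | ((imm8 & 0xFF) << 8)) & 0xFFFF


ia = la(0x34)       # IA = 0x0034
ia = oua(ia, 0x12)  # IA = 0x0034 | 0x1200
print(hex(ia))      # 0x1234
```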

Machine Code
For small addresses (load 0x50):

0110 0000 0101 0000

For larger addresses (load 0x1234):

0110 0000 0011 0100
0101 1100 0001 0010

Conditional Statements
Assembly
Skip over code if memory at label A equals memory at label B

la A
lw
la B
sub
bez equal
# Put code to execute if A!=B

equal:

Skip over code if memory at label A is less than memory at label B

la A
lw
la B
sub # DA = A-B
bnc gt
# Put code to execute if A <= B
gt:

Machine Code
Skip over code if memory at label A equals memory at label B

#for reference
A address = 0000 0001
B address = 0000 0010
equal address = 1010 1010 1010
gt address = 0101 0101 0101
#############################

0110 00XX 0000 0001
0001 001X XXXX XXXX
0110 00XX 0000 0010
0000 110X XXXX XXXX
1001 1010 1010 1010
# Put code to execute if A!=B
0000 1010 1010 1010:

Skip over code if memory at label A is less than memory at label B

0110 00XX 0000 0001
0001 001X XXXX XXXX
0110 00XX 0000 0010
0000 110X XXXX XXXX # DA = A-B
1010 0101 0101 0101
# Put code to execute if A <= B
0000 0101 0101 0101:

Reading from/Writing to a Display Register

These are examples of writing and reading to the DP register.
Assembly
LCDWriting:
li 5 #loads 5 into the DA register
lcdmc 0 #moves the LCD cursor to position 0
lcdw #writes the DA register (5) into the DP register

LCDReading:
lcdr #reads value to the DA register
la A #loads address A
sub #LCD value - A
bnc gt #branches if the last operation did not carry

gt:

Machine Code

0101 1000 0000 0101
0110 1000 0000 0000
0001 1010 0000 0000

0001 1100 0000 0000
0110 0000 0000 1111
0000 1100 0000 0000
1010 0000 0000 1000

B: Hardware Specifications
Register Transfer Language
Arithmetic
                     A/L to memory             A/L to DA                 A/L with imm to DA
Fetch Instruction    inst = mem[PC]            inst = mem[PC]            inst = mem[PC]
                     newPC = PC+1              newPC = PC+1              newPC = PC+1
Stage 1              ALUOut = DA op Mem[IA]    ALUOut = DA op Mem[IA]    ALUOut = DA op SE/ZE/ZEu(inst[7:0])
Stage 2              Mem[IA] = ALUOut          DA = ALUOut               DA = ALUOut
                     PC = newPC                PC = newPC                PC = newPC
Memory Instructions
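           lw                sw
Fetch      inst = mem[PC]    inst = mem[PC]
           newPC = PC+1      newPC = PC+1
Stage 1    PC = newPC        PC = newPC
           DA = mem[IA]      mem[IA] = DA

           iap                 la                  oua                 sia               lia
Fetch      inst = mem[PC]      inst = mem[PC]      inst = mem[PC]      inst = mem[PC]    inst = mem[PC]
           newPC = PC+1        newPC = PC+1        newPC = PC+1        newPC = PC+1      newPC = PC+1
Stage 1    PC = newPC          PC = newPC          PC = newPC          PC = newPC        PC = newPC
           ALUOut = IA +       ALUOut =            ALUOut = IA OR      ALUOut = DA       ALUOut = IA
             SE(inst[7:0])       ZE(inst[7:0])       ZEU(inst[7:0])
           IA = ALUOut         IA = ALUOut         IA = ALUOut         IA = ALUOut       DA = ALUOut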
Jumps/Branches

           ret               Branch                            Jump                            Call
Fetch      inst = mem[PC]    inst = mem[PC]                    inst = mem[PC]                  inst = mem[PC]
           newPC = PC+1      newPC = PC+1                      newPC = PC+1                    newPC = PC+1
Stage 1    PC = RA           if <flag>                         PC = PC[15:12] || inst[11:0]    RA = PC
                               PC = PC[15:12] || inst[11:0]                                    PC = PC[15:12] || inst[11:0]
                             else
                               PC = newPC

LCD / Buttons
           lcdwr             lcdclr            lcdmc             buttr
Fetch      inst = mem[PC]    inst = mem[PC]    inst = mem[PC]    inst = mem[PC]
           newPC = PC+1      newPC = PC+1      newPC = PC+1      newPC = PC+1
Stage 1    DP = DA           CLEAR = 1         Row = inst[5]     DA = BS
           PC = newPC        PC = newPC        Sa = inst[4:0]    PC = newPC
                                               Movecursor = 1
                                               PC = newPC
Stack Instructions
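           pop                 popia               popra               push                pushia              pushra
Fetch      inst = mem[PC]      inst = mem[PC]      inst = mem[PC]      inst = mem[PC]      inst = mem[PC]      inst = mem[PC]
           newPC = PC+1        newPC = PC+1        newPC = PC+1        newPC = PC+1        newPC = PC+1        newPC = PC+1
Stage 1    MemOut = mem[SP]    MemOut = mem[SP]    MemOut = mem[SP]    ALUOut = DA         ALUOut = IA         ALUOut = RA
           newSP = SP-1        newSP = SP-1        newSP = SP-1        newSP = SP+1        newSP = SP+1        newSP = SP+1
Stage 2    DA = mem[SP]        IA = mem[SP]        RA = mem[SP]        mem[SP] = ALUOut    mem[SP] = ALUOut    mem[SP] = ALUOut
           SP = newSP          SP = newSP          SP = newSP          SP = newSP          SP = newSP          SP = newSP
           PC = newPC          PC = newPC          PC = newPC          PC = newPC          PC = newPC          PC = newPC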

Component Specification
1. General Purpose Register
a. Input Signal(s): 16-bit regWrite
b. Output Signal(s): 16-bit regRead
c. Control Signal(s): 1-bit writeEnable
d. Description: The input signal is ignored unless the writeEnable control signal is
set to 1. If it is, then the contents of the register are overwritten with the contents of
the input signal. Regardless of the control signal, the output signal will always
reflect the contents of the register.
2. Program Memory
a. Input Signal(s): 16-bit Address, 16-bit WriteData
b. Output Signal(s): 16-bit ReadData
c. Control Signal(s): 1-bit MemRead, 1-bit MemWrite
d. Description: If MemWrite is 1, then the data on the WriteData input will be
written to the memory address on the Address input. If MemRead is 1, then the
data at the memory address on the Address input will be available on the
ReadData output.
3. 1:2 Mux
a. Input Signal(s): 16-bit in0, 16-bit in1
b. Output Signal(s): 16-bit out
c. Control Signal(s): 1-bit select
d. Description: Select in0 or in1 to be fed to out.
4. 2:4 Mux
a. Input Signal(s): 16-bit in0, 16-bit in1, 16-bit in2, 16-bit in3
c. Control Signal(s): 2-bit select
d. Description: Select one of the inputs to be fed to out.
5. Sign Extension Unit
a. Input Signal(s): 8-bit signal
b. Output Signal(s): 16-bit signal
c. Control Signal(s): None
d. Description: Sign extends the 8-bit input signal to 16 bits
e. Implements SE
6. Zero Extension Unit
d. Description: Zero extends the 8-bit input signal to 16 bits (zeros on MSB side)
e. Implements ZE
7. Zero Extension Upper Unit
d. Description: Zero extends the 8-bit input signal to 16 bits (zeros on LSB side)
e. Implements ZEU
8. ALU
a. Input Signal(s): 16-bit A, 16-bit B
b. Output Signal(s): 16-bit ALUResult, 1-bit Carry
c. Control Signal(s): 3-bit ALUOp
d. Description: Performs an operation selected by ALUOp on the A and B inputs,
making the result available on ALUResult (i.e., ALUResult = A op B). If this results in a
carry, then the carry output wire is 1. Otherwise, the carry output wire is 0.
e. Implements op
9. Incrementer
a. Input Signal(s): 16-bit in
c. Control Signal(s): 1-bit direction
d. Description: 16-bit adder that either increments or decrements based on
direction.
e. Implements newSP = SP+1 or SP-1, and PC increment.
10. Instruction Memory

a. Input Signal(s): 16-bit PC address
b. Output Signal(s): 16-bit Instruction
c. Control Signal(s): none
d. Description: Takes in the current address of the PC and outputs the instruction
there.
11. ALU Latch

a. Input Signal(s): 16-bit in
c. Control Signal(s): 1-bit write signal
d. Description: Holds the output of the ALU at the end of the READ stage so that it
can be used to write to the proper registers in the WRITE stage.

Datapath Schematic

Control Signals
Into Control Unit:
DA               The entire DA register is used as an input to the control unit (this provides the zero and negative ‘flags’).
CARRY            The carry bit from the ALU is fed into the control unit.
Out of Control Unit:
PCSRC1           Selects between newPC/branched PC and RA as write input for PC register.
PCW              Whether or not to write to the PC register.
FEN              Which flag to use (for branch operations).
FINV             Whether or not to invert the flag (for branch operations).
IMEMR            Whether or not to read from instruction memory.
IASRC            Selects whether to feed ALUResult, MemOut, or DA as write input for IA register.
IAW              Whether or not to write to the IA register.
DASRC            Selects whether to feed ALUResult, MemOut, IA, or Buttons & Switch as write input for DA register.
DAW              Whether or not to write to the DA register.
SPDIR            Selects whether to increment (1) or decrement (0) the SP register.
SPW              Whether or not to write to the SP register.
RASRC            Selects whether to feed MemOut or newPC as write input for RA register.
LCDROW           Selects the row on the LCD display for the cursor to move to.
LCDSTARTADDRESS  Selects the position in the row for the cursor to move to.
LCDMOVECURSOR    Indicates to the LCD driver that the cursor needs to move to another location.
LCDCLEAR         Indicates to the LCD driver that the LCD needs to be cleared.
LCDWRITE         Indicates to the LCD that the lcd_DP needs to be written to the display.
ALUASRC          Selects whether to feed IA, DA, or RA into ALU input A.
ALUBSRC          Selects whether to feed the zero-extended-upper, zero-extended, or sign-extended immediate, or MEM_OUT, into ALU input B.
ALU_LATCHW       Whether or not to write to the ALU Latch.
ADDRSRC          Selects whether to feed IA or SP as memory address.
MEMR             Whether or not to read from the memory unit.
MEMW             Whether or not to write to the memory unit.

Control Unit
The processor is split into three cycles: the fetch, read, and write stages. With the exception of
the fetch stage, where all control signals are predetermined, control signals are derived from bits
in the instruction.

The instructions are split into sets of bits that help the control unit determine not only what type
of instruction it is working with, but also the values for the relevant control signals. See the
Machine Language Instruction Types section for more information on these sections, as they will be
referred to by name.

For the Read and Write stages, if the instruction prefix is 1X, the signals fall to the Branch (B)
column. Otherwise, if the prefix is 00, they fall to the Inherent (N) column. In all other cases, they
fall to the Immediate (I) column. Commas separate wires in a bus, and conditions evaluate to 1 for
true and 0 for false.
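
The column-selection rule can be sketched in Python as follows. This is an illustration only: the function name `instruction_type` is hypothetical, and it assumes the "prefix" is the top two bits of the 16-bit instruction (bits 15:14), which is consistent with the assembled machine code listed earlier but is not stated explicitly here.

```python
def instruction_type(inst):
    """Return 'B', 'N', or 'I' for a 16-bit instruction word.

    Assumes the prefix is inst[15:14]:
      1X -> Branch (B), 00 -> Inherent (N), otherwise (01) -> Immediate (I).
    """
    prefix = (inst >> 14) & 0b11
    if prefix & 0b10:        # prefix is 1X
        return "B"
    if prefix == 0b00:
        return "N"
    return "I"


print(instruction_type(0b1001000000011110))  # call gcd -> B
print(instruction_type(0b0001111100100000))  # pop      -> N
print(instruction_type(0b0100101100000000))  # la 0     -> I
```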
Fetch Stage
FEN XX SPW 0
FINV X RASRC X
PCSRC0 0X RAW 0
PCW 0 ALUA XX
IMEMR 1 ALUB XX
IASRC XX ALUOP XXX
IAW 0 ADRSRC X
DASRC XX MEMR 0
DAW 0 MEMW 0
SPDIR X ALU_LATCHW 0

Opcode Breakdown
For the next two stages, opcodes play a very important role in the decoding of control signals.
Due to the number of inherent type and simplicity of immediate type instructions, we are able to
manipulate the bits in the instruction to determine the appropriate control. These bits are
typically found in the op and grp sections of instructions, as defined in the Machine Language
Instruction Types section.
Inherent Opcodes
Inherent opcodes, due to their variety, are further broken down into subgroups numbered 0
through 3. The grp section indicates which group an instruction belongs to. The op breakdown
for each subgroup is as follows:

Group 00 - Arithmetic/Logic (A/L)
op[4]  DASRC  The DASRC mux will be set to either 00 or 11; this bit signifies which of the two it should be.
op[3]  ALUB   The ALUB mux will be set to either 01 or 11; this bit signifies which of the two it should be.
op[2]  MEMR   This bit is the memory read control signal in Read Stage
op[1]  DAW    This bit is the DA write control signal in Write Stage
op[0]  MEMW   This bit is the memory write control signal in Write Stage

Group 01 - Stack
Note: These instructions “cheat” by dipping into the shamt space of the instruction to store
extra data.
op[4]     SPDIR/MEMR/MEMW/DASRC  This is a very busy bit. It controls:
                                 ● The direction of the SP register adder
                                 ● The control signal for memory read in Read Stage
                                 ● The inverse of memory write in Write Stage
                                 ● The bits for DASRC
op[3:2]   ALUA                   These bits control the mux into ALU A
op[1]     DAW                    These bits (op[1], op[0], shamt[3]) control the write signals to the
op[0]     IAW                    DA, IA, and RA registers.
shamt[3]  RAW

Group 10 - The Snowflakes (Snow)

op[4]  PCSRC1  This bit signifies whether PCSRC1 should be on or off
op[3]  IAW     This bit is the IA write control signal in Write Stage
op[2]  MEMW    This bit is the memory write control signal in Write Stage
op[1]  DAW     This bit is the DA write control signal in Write Stage
op[0]  DASRC   The DASRC mux will be set to either 00 or 01; this bit signifies which of the two it should be.

Group 11 - LCD

LCD control signals are generated by AND-ing the whole instruction with values hardcoded into
control, as the control structure for LCD is fairly straightforward.
Immediate Opcodes

op[3]    IASRC/DASRC/ALUA/IAW/DAW  This bit is a heavy lifter. It controls:
                                   ● Both bits of IASRC
                                   ● The inverse of the first bit of DASRC
                                   ● The second bit of ALUA
                                   ● The IA write control signal in Write Stage
                                   ● The inverse of the DA write control signal in Write Stage
op[2:1]  ALUB                      These bits control the mux into ALU B
op[0]    IAW/DAW                   This bit is the IA write control signal in Write Stage, and its inverse is
                                   the DA write control signal in Write Stage

Branch/Jump Opcodes

op[2:1]  FEN/PCSRC1  These bits control the FEN signal. When 00 they also control PCSRC1
op[0]    FINV        This bit is the inverse of FINV

Read Stage
Signal        N                                     I              B
FEN           00                                    00             FEN
FINV          0                                     0              !FINV
PCSRC1        Snow: PCSRC1; Others: 0               0              FEN == 00
PCW           DEFAULT: 0
IMEMR         DEFAULT: 0
IASRC         grp*                                  IASRC,IASRC    XX
IAW           DEFAULT: 0
DASRC         Snow: 0,DASRC; Others: DASRC,DASRC    DASRC,0        XX
DAW           DEFAULT: 0
SPDIR         Stack: SPDIR; Others: X               XX             XX
SPW           pop**: 1; Others: 0                   DEFAULT: 0     DEFAULT: 0
RASRC         0                                     X              1
RAW           DEFAULT: 0
ALUA          Stack: ALUA; Others: 01               0,ALUA         XX
ALUB          A/L: ALUB,1; Others: XX               ALUB           XX
ALUOP         aluop                                 alu,0          XXX
ADRSRC        grp[0]*                               0              XX
MEMR          A/L and Stack: MEMR; Others: 0        0              0
MEMW          DEFAULT: 0
ALU_LATCHW    DEFAULT: 1

* the group number doubles as control for the mux into the IA register, and bit 0 serves as the
ADRSRC control when need be
** SPW is 1 for pop, popra, and popia

Write Stage
If a control signal is not explicitly stated, it should be assumed that it did not change from Read
Stage.
Signal        N                             I             B
FEN           RS                            RS            RS
FINV          RS                            RS            RS
PCSRC1        RS                            RS            RS
PCW           DEFAULT: 1
IMEMR         DEFAULT: 0
IASRC         RS                            RS            RS
IAW           Stack,Snow: IAW; Others: 0    IAW           0
DASRC         RS                            RS            RS
DAW           DAW                           DAW           0
SPDIR         RS                            RS            RS
SPW           push**: 1; Others: 0          DEFAULT: 0    DEFAULT: 0
RASRC         RS                            RS            RS
RAW           Stack: RAW; Others: 0         0             call*
ALUA          RS                            RS            RS
ALUB          RS                            RS            RS
ALUOP         RS                            RS            RS
ADRSRC        RS                            RS            RS
MEMR          DEFAULT: 0
MEMW          MEMW                          0             0
ALU_LATCHW    DEFAULT: 0

Note: RS = Value from Read Stage
* only the call instruction turns this bit on
** SPW =1 for push, pushra, and pushia

C: Testing and Integration

RTL Testing
Procedure
1. Identify the block in the RTL you want to test
2. Identify the initial conditions of the CPU.
3. Identify the final conditions that should result from the execution of the instruction
4. Step through the commands in the RTL chart and record all changes within the CPU
5. Verify that the final state of the CPU matches the expected final state
RTL Markup
Arithmetic/Logical To DA

Add, And, Or, Sub, Two
Inst = Mem[PC] //gets the instruction
newPC = PC +1 //increments the instruction counter
Op = inst[15:8] //selects operation
B=Mem[IA] //loads memory from address at IA
A= DA
DA = A+B // DA = A&&B // DA = A||B // DA = A-B // DA = two(B)
PC = newPC

Addm, Andm, Orm, Subm
newPC = PC+1 //increments the instruction counter
Op = inst[15:8] //selects ALU operation
B= Mem[IA] //loads memory from address at IA
A=DA
Mem[IA] = A + B // Mem[IA] = A&&B // Mem[IA] = A||B // Mem[IA] = A-B
PC = newPC

Addi
Op = inst[15:8] //selects the operation
A=DA //puts DA in the A input of ALU

B= SE[inst[7:0]] //puts the sign extended immediate in B port of ALU

DA = A + B //adds the inputs A and B port of ALU and puts result in DA
PC = newPC
Andi
A = DA //puts DA in the A input of ALU
B = SE[inst[7:0]] //puts the sign extended immediate in B port of ALU
DA = A AND B //ANDs the inputs A and B port of ALU and puts result in DA
PC = newPC
Ori
A = DA //puts DA in the A input of ALU
B = SE[inst[7:0]] //puts the sign extended immediate in B port of ALU
DA = A OR B //ORs the inputs A and B port of ALU and puts result in DA
PC = newPC
Sll,Srl
A = DA //puts DA in the A input port of ALU
B = inst[4:0] //puts the immediate in B port of ALU
DA = A << B // DA = A >> B //shifts the DA register left (sll) or right (srl)
PC = newPC
Lui
Inst = Mem[PC] //get the instruction
Op = inst[15:8]
DA = ZEu[inst[7:0]] //zero extend upper places the immediate in bits [15:8] and fills bits [7:0] with zeros
PC = newPC
Li
Inst = Mem [PC] //gets the instruction from memory
Op = inst[15:8] //selects the load immediate operation
B =SE[inst[7:0]] //sign extends the immediate
DA = B //puts B into the DA register
PC = newPC

Load Word
inst = Mem[PC] //gets the instruction


op = inst[15:11] //selects the load word operation
DA = Mem[IA] //loads the memory at the address in the IA register
PC = newPC
Store Word
op = inst[15:11] //selects the store word operation
Mem[IA] = DA //stores DA to memory at the address in the IA register
PC = newPC

Bnez, Bez, Bnc, bc
If <flag> //checks a flag and changes operation based on flag value
PC = PC[15:12] concat inst[11:0] //sets PC equal to the top 4 bits of PC
concatenated with the lower 12 bits of the instruction
Else
PC = newPC //otherwise PC is incremented by 1

LCDwr
DP = DA //writes the value to the DP register
PC = newPC
Buttr
DA = BS //takes the button and switch inputs and puts them in DA
PC = newPC

LCDclr
CLEAR = 1 //tells the LCD to clear
PC = newPC

LCDmc
row = inst[5] //sets the row input to the lcd_control_master in the LCD_Driver
Startaddress = inst[4:0] //sets the start address input wire to lower 5 bits of imm
MoveCursor = 1 //tells the lcd_control_master to update

PC = newPC

Push
op = inst[15:11] //selects the push instruction
newSP = SP-1 //moves the SP register down 1
SP = newSP
Mem[SP] = DA //puts DA onto the stack
PC = newPC
Pop
op = inst[15:11] //selects the pop operation
newSP = SP+1 //moves the stack pointer up 1
SP = newSP
PC = newPC
Iap
IA = IA + inst[11:0] //increments the IA register by an immediate value
PC = newPC

La
IA = ZE(inst[7:0]) //loads a zero extended immediate into IA
PC = newPC

Oua
IA = IA | inst[11:0] //ors the IA register with an immediate value
PC = newPC

J
PC = PC[15:12] concat inst[11:0] //concatenates the top 4 bits of PC to the lower 12
bits of an immediate

Call


RA = PC //stores PC in the RA register
PC = PC[15:12] concat inst[11:0] //concatenates the top 4 bits of PC to the lower 12
bits of an immediate

Ret
PC = RA //restores the PC from the RA
Unit Test Plan

To ensure accurate and efficient operation of our processor, every component must be tested.
Parts like muxes will be tested exhaustively due to their restricted inputs and outputs, while
components such as registers will be tested extensively but not exhaustively, to save testing
time while still covering edge cases.

In addition to testing the basic functionality of the components, the amount of time needed to run
each operation will be recorded to assist with setting the proper clock cycle lengths later in the
project.

Components to Test:
PC Select Mux
PC adder
Instruction Memory
IA Mux
DA Mux
SP Incrementer
RA Mux
ALU MuxA
ALU MuxB
Addr Mux
IA Register
DA Register
SP Register
RA Register
ALU
Memory Unit
LCD Driver
Zero Extenders(tested during integration due to it being a wire operation)
Sign Extenders(tested during integration due to it being a wire operation)
Zero Upper Extenders(tested during integration due to it being a wire operation)

(Registers between cycles)
Unit Testing Procedure:

1. Choose the basic component to test.
2. Discern which testing subset it belongs to based on breakdown below
3. Determine the expected inputs, control signals, and outputs
4. Use the tables below to create a testbench
a. For ALUs, determine the operations the ALU is capable of performing and test
each one
b. The tables are not meant to be the only tests run on the testbench; they only
serve as examples of tests that the user can run to confirm proper operation
5. Run the testbench
6. Check with the tables below or use common sense to check validity of output data

Due to the similarity in operation between a number of the components, they will be
subdivided as follows:

Muxes:
IA Mux (2:4)
DA Mux (2:4)
PC Select Mux (2:4)
ALU MuxA (2:4)
ALU MuxB (2:4)
RA Mux (1:2)
Addr Mux (1:2)

Registers:
IA Register
DA Register
SP Register
RA Register
PC Register

ALUs:
The ALU
PC Incrementer
Sign Extender
Zero Extender
Zero Upper Extender
SP Incrementer

Others:
LCD Driver
Memory Unit
Instruction Memory

Tables for Unit Testing
Muxes
4 Input, 2 Bit Control Mux

Will take in 3 or 4 inputs of varying lengths and output 1 value based on selection by control
bits.
The tester will ensure that result of testing corresponds with table and diagram.

Control Bit Signals Mux Output
0 0 A
0 1 B
1 0 C
1 1 D

2 Input 1 Bit Control Mux
Will take in 2 inputs of varying lengths and output 1 value based on selection by control bits.
The tester will ensure that result of testing corresponds with table and diagram


Control Bit Signal Mux Output
0 A
1 B
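
Both truth tables can be captured by one small behavioral model, which is convenient for generating expected values for a testbench. The function name `mux` and the list-based inputs are assumptions for illustration; the actual components are hardware modules.

```python
def mux(inputs, select):
    """Behavioral model of an N-input mux: return the input chosen by 'select'.

    inputs -- list of input values in order (A, B, C, D, ...)
    select -- integer value of the control bits
    """
    if not 0 <= select < len(inputs):
        raise ValueError("select out of range for this mux")
    return inputs[select]


# Exhaustive check of the 4-input, 2-bit-control table (00->A, 01->B, 10->C, 11->D)
for select, expected in enumerate(["A", "B", "C", "D"]):
    assert mux(["A", "B", "C", "D"], select) == expected
```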

Registers
The purpose of the registers in this design is to store data for future use. To function properly,
a register must be able to have an input written to it and to output the value stored in it. All of
the registers in our design need to support this functionality for data with a size of up to 16 bits.

16 Bit Registers
A register will write an input to its reserved memory location when the control signal is 1 and
continually output the value stored in it. As the register needs to be able to store any 16-bit
value, an effective testbench will test several 16-bit values. The intermediate registers are
essentially multiple single 16-bit registers combined. As a result, they will be tested in the same
way as the single 16-bit registers.
The table below offers examples of 16-bit values to test.


16 bit data input Register Control Signal Output
0000 0000 0000 0000 1 0000 0000 0000 0000
1111 1111 1111 1111 0 Previous register value
0101 0101 0101 0101 1 0101 0101 0101 0101
1010 1010 1010 1010 0 Previous register value
1001 0110 1001 0110 1 1001 0110 1001 0110
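
The rows of the table can be replayed against a simple behavioral model. The class below is a hypothetical Python stand-in for the register (the name `Register16` and the `clock` method are assumptions), useful only for computing expected outputs.

```python
class Register16:
    """Behavioral model of a 16-bit register with a write-enable control signal."""

    def __init__(self, value=0):
        self.value = value & 0xFFFF

    def clock(self, data_in, write_enable):
        """Apply one write opportunity and return the register's output."""
        if write_enable:
            self.value = data_in & 0xFFFF
        return self.value


reg = Register16()
rows = [
    (0x0000, 1, 0x0000),
    (0xFFFF, 0, 0x0000),  # control signal 0: previous value is kept
    (0x5555, 1, 0x5555),
    (0xAAAA, 0, 0x5555),  # control signal 0: previous value is kept
    (0x9696, 1, 0x9696),
]
for data_in, write_enable, expected in rows:
    assert reg.clock(data_in, write_enable) == expected
```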
ALUs
The tables below are used to test the ALUs present in our design. To make it easier to find
specific components for testing, single-operation components are included in this section.

Single Operation ALU 1 Input
This includes all of the operations that the extenders and incrementers will need to complete
individually. Each operation in the table below corresponds to one of the single operators used in
the processor. For testing single-operation ALUs, use the table as a reference.

Single Operation ALU    Data Input             Operation            Control Signal Received    Data Output
Sign Extender           1111 0000              Sign Extend          N/A                        1111 1111 1111 0000
                        0111 0000              Sign Extend          N/A                        0000 0000 0111 0000
Zero Extender           1111 0000              Zero Extend          N/A                        0000 0000 1111 0000
                        0111 0000              Zero Extend          N/A                        0000 0000 0111 0000
Zero Extender Upper     1111 0000              Zero Extend Upper    N/A                        1111 0000 0000 0000
                        1111 0001              Zero Extend Upper    N/A                        1111 0001 0000 0000
PC Incrementer          1111 0001 0000 1111    Increment            N/A                        1111 0001 0001 0000
                        0000 0000 0000 0000    Increment            N/A                        0000 0000 0000 0001
SP Incrementer          1111 0000 0001 1111    Increment            1                          1111 0000 0010 0000
                        1111 0000 0001 0000    Decrement            0                          1111 0000 0000 1111
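
The extender rows above can be reproduced with a few lines of Python, which may help when generating additional test vectors. The function names are hypothetical; each function models the 8-bit-to-16-bit behavior described for the SE, ZE, and ZEU units.

```python
def sign_extend(byte):
    """SE: copy bit 7 of the 8-bit input into bits 15:8."""
    byte &= 0xFF
    return (byte | 0xFF00) if (byte & 0x80) else byte


def zero_extend(byte):
    """ZE: place the 8-bit input in bits 7:0, zeros on the MSB side."""
    return byte & 0xFF


def zero_extend_upper(byte):
    """ZEU: place the 8-bit input in bits 15:8, zeros on the LSB side."""
    return (byte & 0xFF) << 8


assert sign_extend(0b11110000) == 0b1111111111110000
assert zero_extend(0b11110000) == 0b0000000011110000
assert zero_extend_upper(0b11110001) == 0b1111000100000000
```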

The ALU

A input    B input    Operation    Control Signal Input    Output    Carry Flag Output
0x0001     0x0001     add          0b000                   0x0002    0x0
0xFFFF     0x0001     add          0b000                   0x0000    0x1
0x0001     0x0011     sub          0b001                   0xFFF0    0x1
0x1111     0x1110     sub          0b001                   0x0001    0x0
0x0FFF     0xFEF1     and          0b110                   0x0EF1    x
0x8787     0x7878     or           0b100                   0xFFFF    x
0x4044     0x0002     sll          0b011                   0x0110    x
0x2222     0x0002     sll          0b011                   0x8888    x
0x8888     0x0002     srl          0b101                   0x2222    x
0x4371     0x0002     srl          0b101                   0x10DC    x
0x4371     0xXXXX     pass         0b111                   0x4371    x
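
A software reference model makes it easy to confirm the expected values in this table before they are typed into a testbench. The sketch below is an assumption-laden illustration, not the hardware ALU: the function name `alu` is hypothetical, the carry for subtraction is modeled as a borrow (matching the two `sub` rows), the shift amount is taken from the low 4 bits of B, and the carry output for operations marked `x` is arbitrarily returned as 0.

```python
def alu(a, b, op):
    """Reference model of the 16-bit ALU. Returns (result, carry)."""
    a &= 0xFFFF
    b &= 0xFFFF
    carry = 0                      # don't-care ('x') rows simply report 0 here
    if op == 0b000:                # add
        total = a + b
        carry = total >> 16
        result = total
    elif op == 0b001:              # sub (carry models a borrow; assumption)
        result = a - b
        carry = 1 if a < b else 0
    elif op == 0b011:              # sll (shift amount = low 4 bits of B; assumption)
        result = a << (b & 0xF)
    elif op == 0b100:              # or
        result = a | b
    elif op == 0b101:              # srl
        result = a >> (b & 0xF)
    elif op == 0b110:              # and
        result = a & b
    elif op == 0b111:              # pass A through
        result = a
    else:
        raise ValueError("unused ALUOp encoding")
    return result & 0xFFFF, carry


assert alu(0xFFFF, 0x0001, 0b000) == (0x0000, 1)
assert alu(0x0001, 0x0011, 0b001) == (0xFFF0, 1)
assert alu(0x4044, 0x0002, 0b011)[0] == 0x0110
```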

Other Components
Consists of parts that are unique and that cannot be easily grouped together

Memory Unit
The memory unit is used to access and store all needed values from memory. For it to work
properly it must receive a 16 bit data address, control signals, a 16 bit input, and output a 16 bit
value. The below table outlines a basic test of the memory unit.

Address Input    Data Input    MemW (control signal)    MemR (control signal)    Output
0xFEDC           0xXXXX        0                        1                        Mem[0xFEDC]*
0xF3A7           0x4782        1                        0                        No output (Data input stored in Address Input in Memory)

*Mem[data] represents the 16 bit value stored at the address input memory location.
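
For generating expected values, the memory unit's read/write behavior can be modeled in a few lines. This is a sketch under assumptions: the class name `MemoryUnit`, the `access` method, the dictionary backing store, and returning 0 for never-written addresses are all illustrative choices rather than properties of the real memory.

```python
class MemoryUnit:
    """Behavioral model of the data memory with MemW and MemR control signals."""

    def __init__(self):
        self.cells = {}  # address -> 16-bit value

    def access(self, address, write_data, mem_w, mem_r):
        """Perform one access; return the read value, or None when MemR is 0."""
        address &= 0xFFFF
        if mem_w:
            self.cells[address] = write_data & 0xFFFF
        if mem_r:
            return self.cells.get(address, 0)  # unwritten cells read as 0 (assumption)
        return None


mem = MemoryUnit()
assert mem.access(0xF3A7, 0x4782, mem_w=1, mem_r=0) is None   # write row: no output
assert mem.access(0xF3A7, 0x0000, mem_w=0, mem_r=1) == 0x4782 # read back the stored word
```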

Instruction Memory
The instruction memory stores the output data the control unit needs to perform its operations.
The memory is accessed by 16 bit inputs from our instructions. A control signal called IMemR
controls whether or not the instruction memory outputs data.

Address Input    IMemR (control signal)    Output
0x1234           1                         Control located at 0x1234
0xFDEC           0                         0x0000
0xFDEC           1                         Control located at 0xFDEC

Integration Plan
In order to successfully integrate our components, we will follow the 3 step plan outlined below.
The general testing procedure for each set of components is to iterate through the applicable
permutations in the control signals, and compare the expected state after each permutation
with the actual state. Permutations may be tested with multiple starting states if there are
anticipated edge cases we would like to cover.
Step 1: Small Subsystems

The PC Subsystem
This subsystem consists of the PC register, an adder, a mux for PC source, an OR unit, and two
different zero extension units. For the purposes of the test, the instruction will be fed in on a
wire.

We are asserting that the value in the PC register is the expected value, and will be ignoring the
cases where PC W is set to off, as the writing capabilities should have been tested at a previous
layer.


Control Starting State Result
PC Src PC W Inst RA PC Expected PC
00 1 0x1234 0x4848 0x2222 0x2223
01 1 0x1234 0x4848 0x2222 0x2234
10 1 0x1234 0x4848 0x2222 0x4848
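
The expected values in this table follow from the RTL for incrementing, branching, and returning. The Python sketch below recomputes them; the function name `next_pc` is hypothetical, and the mapping of PC Src encodings (00 = PC+1, 01 = PC[15:12] || inst[11:0], 10 = RA) is inferred from the table rows rather than stated in the text.

```python
def next_pc(pc, inst, ra, pc_src):
    """Return the value written to PC for a given PC Src selection (PC W assumed on)."""
    if pc_src == 0b00:                          # newPC = PC + 1
        return (pc + 1) & 0xFFFF
    if pc_src == 0b01:                          # PC[15:12] || inst[11:0]
        return (pc & 0xF000) | (inst & 0x0FFF)
    if pc_src == 0b10:                          # PC = RA
        return ra & 0xFFFF
    raise ValueError("PC Src encoding not covered by the table")


for pc_src, expected in [(0b00, 0x2223), (0b01, 0x2234), (0b10, 0x4848)]:
    assert next_pc(0x2222, 0x1234, 0x4848, pc_src) == expected
```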

The ALU Subsystem
This subsystem consists of an ALU, a mux for ALU input A, and a mux for ALU input B. For the
purposes of this test, the values of the IA, RA, and DA register as well as the instruction will be
fed in on wires.

We are asserting that the value of ALUOut is the expected value.

Note: Instead of explicitly stating the ALUCtrl value, we list the operations we expect to occur at
each permutation of ALU A. This is both for the sake of brevity and maintainability.


ALU A    ALU B    ALU OP                            DA        IA        RA        Inst      ALU Out
00       00       add, sub, shift, pass             0x1540    0x2471    0x0254    0x4488    0x1540 OP 0x4400
00       01       add, sub, shift, pass             0x1540    0x2471    0x0254    0x4488    0x1540 OP 0x0088
00       10       add, sub, shift, pass             0x1540    0x2471    0x0254    0x4488    0x1540 OP 0xFF88
01       00       add, sub, and, or, shift, pass    0x1540    0x2471    0x0254    0x4488    0x2471 OP 0x4400
01       01       add, sub, and, or, shift, pass    0x1540    0x2471    0x0254    0x4488    0x2471 OP 0x0088
01       10       add, sub, and, or, shift, pass    0x1540    0x2471    0x0254    0x4488    0x2471 OP 0xFF88
10       00       pass                              0x1540    0x2471    0x0254    0x4488    0x0254
10       01       pass                              0x1540    0x2471    0x0254    0x4488    0x0254
10       10       pass                              0x1540    0x2471    0x0254    0x4488    0x0254

The Stack/Program Memory Subsystem
This subsystem consists of a program memory unit, the SP register, and an adder. For the
purposes of this test, the memory unit will be pre-populated in such a way that the value stored
at an address is the square of the address. This scheme should be sufficient for the scope of
the tests.

We are asserting that the value read out of memory is the expected value. Memory reading is
always set to on, and memory writing is always set to off.


SP Inc SP W SP Expected Mem Out
0 1 0x0020 0x0100
1 1 0x0020 0x0900
0 0 0x0020 0x0400
1 0 0x0020 0x0400
Step 2: Registers, the ALU, and Program Memory

Processor Core
The next step in the integration process is assembling the core of the processor. This consists
of the IA, RA, and DA registers, the ALU subsystem, and the program memory unit. For the
purposes of these tests we will be ignoring the SP unit, as its operation has been tested and it
does not affect the processor core. Its functionality will be further tested in step 3.

In previous steps we tested the connections from the IA, RA, and DA registers and inst into the
ALU, so we will be ignoring those tests for this step. We will also be ignoring the memory unit, as
its functionality should have already been tested. Instead we will be monitoring the writing back
to the IA, RA, and DA registers. As such, the write control signals for all registers will be set to on
by default.


We will be asserting that the end state of the IA, RA, and DA registers is as expected. This will be
checked by eye or with a very careful script, so the table below is merely for reference.

Control                                            Starting State              Ending State
IASRC    DASRC    ALU OP                            IA        DA        RA        IA         DA         RA
00       00       add, sub, shift, pass             0x1540    0x2471    0x0254    ALU Out    ALU Out    Mem Out
00       01       add, sub, shift, pass             0x1540    0x2471    0x0254    ALU Out    0x2471     Mem Out
00       10       add, sub, shift, pass             0x1540    0x2471    0x0254    ALU Out    0x1540     Mem Out
00       11       add, sub, shift, pass             0x1540    0x2471    0x0254    ALU Out    Mem Out    Mem Out
01       00       add, sub, and, or, shift, pass    0x1540    0x2471    0x0254    Mem Out    ALU Out    Mem Out
01       01       add, sub, and, or, shift, pass    0x1540    0x2471    0x0254    Mem Out    0x2471     Mem Out
01       10       add, sub, and, or, shift, pass    0x1540    0x2471    0x0254    Mem Out    0x1540     Mem Out
01       11       add, sub, and, or, shift, pass    0x1540    0x2471    0x0254    Mem Out    Mem Out    Mem Out
10       00       pass                              0x1540    0x2471    0x0254    0x2471     ALU Out    Mem Out
10       01       pass                              0x1540    0x2471    0x0254    0x2471     0x2471     Mem Out
10       10       pass                              0x1540    0x2471    0x0254    0x2471     0x1540     Mem Out
10       11       pass                              0x1540    0x2471    0x0254    0x2471     Mem Out    Mem Out
Step 3: The Big Kahuna

This is the final stage in integration, putting everything together. Writing out a table for this
testing procedure would be extensive and unmaintainable, so the idea is as follows.

For each instruction:
1. Convert the instruction to machine code
2. Adjust the mif file in instruction memory with the updated instruction(s)
3. Run the processor until the instruction has been completed
4. Check the states of the processor to confirm that the operation was performed correctly.
5. If any errors are found additional non documented testing will be performed to isolate the
root of the error

(All numbers are in decimal unless otherwise specified)

Test Basic Immediate Instructions:

Program Result
addi 5 Should see 5 in the DA register

ori 12
iap 1 Should see 1 in the IA register

Test Basic Arithmetic Instructions

Program Result

ori 12 Should see 13 in address 0 of program
sw memory

iap 1 memory
sw Should see 0 in the IA register
iap -1
lw

iap 1 memory
sw Should see 1 in the IA register
iap -1
lw
iap 1
li 25
addm

Test Basic Stack Operation

Program Result
addi 5 Stack Pointer is at FFFE

ori 12 Stack has 13 at location FFFF

iap 1 DA should have 13 in it

sw
push
iap -1
lw
iap 1
li 25
addm
pop
push

D: Design Process Journal

Milestone 1
Meeting Monday, January 8
Members Present: Thad, Evë, Matthew, Ian

At this meeting we decided to build an accumulator-style processor. This is a pattern that we
are all mostly familiar with, and will fit well within the requirements of the project. As there is an
implicit register in every instruction, the capabilities of the 16-bit data requirement can be
maximized, leaving room for interesting optimizations.

We proceeded to outline commands that would be necessary / useful for programming. These
are:

sll
srl
andi
and
or
ori
lui
sw
sm0 (store into memory address 0 - for indirect addressing)
lw
subi
add
addi
sub
bnez
j

sw and lw work by using something (a special register or memory address 0) to provide the
index value. We still aren’t sure whether to implement jal. We need some way of keeping track
of return address though.

No work items came as a result of this meeting.
First Meeting Wednesday, January 10

Members Present: All

At this meeting we settled the procedure call convention for our processor. As the bulk of our
knowledge about accumulator processors comes from the PIC, this was uncharted territory and
caused quite a bit of kerfuffle. Eventually we decided on putting the majority of the procedure
responsibilities on the caller, and passing arguments and return values on the stack. This allows
us to maintain an accumulator-style low register count, while expanding on the capabilities of
processors like the PIC.

We also decided how we would handle storing local values. Our options were to store them on
the stack and create commands that let you access more than just the top word, or put them
into memory and force the programmer to back up what they wanted to persist before handing
over the reins to other procedures. We chose the latter, both because it preserves the integrity
and concept of the stack, and because the former would require us to create instructions that
perform operations on elements of the stack. As we already have instructions that perform
operations on registers or memory, adding a third variation of an instruction would both
increase our opcode count and add unnecessary complexity.

Sometime between now and our next meeting (in approximately 4 hours’ time):
● Ian will examine the instructions we’ve laid out and will try to condense them into
opcodes/instruction types
● Evë will update the design process journal
● Thad will translate the example program into our assembly and determine if our new
instruction set is feasible
● David, Matthew, and Evë will write some code snippets for the ‘Common Operations’
section
Second Meeting Wednesday, January 10

Members Present: Thad, Evë, Matthew, Ian

At this meeting we aimed to finish the remainder of Milestone 1. This involved nailing down
instruction formats and opcodes, as well as addressing modes.

The main question we had was whether we needed jump register, and whether we could turn
jump into an inherent instruction by using the address stored in the IA register. We ended up
deciding that jump would include a 12-bit label or immediate, which would be extended by the top
4 bits of PC.
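
The jump addressing decision can be illustrated with a minimal Python sketch. It assumes a 16-bit PC and that the 12-bit immediate is used as-is, with no shift. The function name jump_target is hypothetical, and whether the hardware takes the upper bits from PC before or after it is incremented is not stated in this journal.

```python
def jump_target(pc: int, imm12: int) -> int:
    """Combine the top 4 bits of a 16-bit PC with a 12-bit immediate.

    Assumption: the immediate is not shifted before being combined.
    """
    if not 0 <= imm12 <= 0xFFF:
        raise ValueError("immediate must fit in 12 bits")
    # Keep PC[15:12] and replace PC[11:0] with the immediate.
    return (pc & 0xF000) | imm12


if __name__ == "__main__":
    print(hex(jump_target(0x1234, 0x0AB)))
```

Under these assumptions a jump can only reach addresses inside the current 4096-word region, since the top 4 bits of the target always match those of PC.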

We also assembled the assembly into machine code.

No work items came as a result of this meeting.

Milestone 2
Impromptu Meeting Thursday, January 11
Members Present: Thad, Evë, Matthew, Ian

At this meeting we debriefed the design meeting we just had with Micah.

We discussed adding a register to store carry, overflow, and perhaps negative flags. Though
having a status register ensures that the data persists, we decided to instead very carefully set
up the datapath to render the register irrelevant. This is because we do not want to support a
scenario in which we let the programmer set any of the flags or view them for an extended
period of time. In the event we decide to handle interrupts, we will revisit this decision.

An interesting idea that came up in the meeting with Micah was to have a chunk of memory
dedicated to local values. This solves a lot of our problems with programming recursion and
preserves the integrity of our design, so we decided to move forward with that in mind.

Though we discussed condensing opcodes, we would like to wait to do some analysis on which
instructions get used at what frequency so that we can do proper encoding using a system like
Huffman coding.
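
As a rough illustration of how frequency analysis would feed opcode encoding, the Python sketch below computes Huffman code lengths from usage counts. The counts are invented for illustration and are not measurements from our programs. The function name huffman_code_lengths is hypothetical.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}  # a lone symbol still needs one bit
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


# Hypothetical usage counts, invented for illustration only.
example_counts = {"lw": 40, "sw": 30, "addi": 20, "bnez": 7, "j": 3}
lengths = huffman_code_lengths(example_counts)
for name in sorted(lengths, key=lengths.get):
    print(name, lengths[name])
```

Frequently used instructions receive shorter opcodes, which would leave more bits of a 16-bit instruction free for immediates.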

We then briefly discussed the idea of replacing ret with jr (jump register), before Thad threw out
the ridiculous idea of creating an instruction that pops the stack directly into PC. Though this
could have been a productive discussion we quickly moved on.

At this point we noticed that we’d effectively sucked ourselves into making a multi-cycle
processor. Though not intended, the 4 of us are on board with the idea.

Work completed during the meeting:
● Evë: Updated process journal
● Ian: Updated design document with opcodes and instruction types
● Thad: Updated example program
● Matthew: Updated design document with opcodes and instruction types

Meeting Friday, January 12

Members Present: Thad, Evë, Matthew, Ian, and David

In this meeting we brought David up to speed with our design.

We began looking at the milestone two requirements.

We gave final verification that our processor design was what we wanted going forward.
We grouped our instructions into categories that share the same RTL implementation.
We decided we would finalize opcode assignment when beginning work on the control.
We created a chart to organize the RTL.

● Evë: left early
● Ian: Updated process journal, helped organize instructions into RTL categories
● Matthew: Helped organize instructions into RTL categories and identify signals required
for RTL
● Thad: Added bneg and bpos commands and rearranged branch/jump opcodes.
● David: Made a summary chart for the RTL to better represent the 4 stages of a multi-cycle
processor

Work to complete before next meeting:
The list of RTL commands was split up between the group to work on before the next meeting.
Evë: Stack commands (including CYA/RYA), and load/store word
Ian: Arithmetic Memory/DA commands
Thad: IA commands, and branch/jump commands
Matthew: LCD and BS commands
David: Arithmetic/Logic Immediate commands
Meeting Wednesday, January 17

Members Present: Thad, Evë, Matthew, Ian, David

At this meeting we aimed to complete the work remaining for Milestone 2. Which we did!

We decided to break up CYA and RYA into more instructions to simplify control logic. pushRA,
pushIA, popRA, and popIA are the result (push and pop already work with DA). This also
means that if the programmer doesn’t want to back up DA or IA, they don’t need to.
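
The effect of splitting the stack instructions can be sketched in Python. This is a behavioral model under stated assumptions, not our RTL: it assumes the stack grows downward from 0xFFFF, that a push stores and then decrements SP, and that a pop increments SP and then loads. The class name StackModel is hypothetical.

```python
class StackModel:
    """Toy model of per-register push and pop (assumed semantics)."""

    def __init__(self):
        self.mem = {}      # sparse 16-bit word memory
        self.sp = 0xFFFF   # assumed initial stack pointer
        self.regs = {"DA": 0, "IA": 0, "RA": 0}

    def push(self, reg="DA"):
        # Store the chosen register, then move SP down one word.
        self.mem[self.sp] = self.regs[reg]
        self.sp = (self.sp - 1) & 0xFFFF

    def pop(self, reg="DA"):
        # Move SP back up one word, then load the chosen register.
        self.sp = (self.sp + 1) & 0xFFFF
        self.regs[reg] = self.mem[self.sp]


if __name__ == "__main__":
    cpu = StackModel()
    cpu.regs["RA"] = 0x0123
    cpu.push("RA")        # back up only RA; IA is left alone
    cpu.regs["RA"] = 0    # a callee clobbers RA
    cpu.pop("RA")
    print(hex(cpu.regs["RA"]), hex(cpu.sp))
```

Because each register has its own push and pop, a caller saves only what it needs instead of paying for a combined save.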

● Evë: Converted written RTL into tables, added changelog, formatted document
● Thad: Converted written RTL into tables, added components to list
● Ian: Converted written RTL into tables, added components to list
● Matthew: Converted written RTL into tables, tested RTL
● David: Created list of components

Milestone 3
Meeting Sunday, January 21
Members Present: Matthew, Thad, Evë, Ian

At this meeting we recapped our Friday discussion with Micah, as well as delegated lab work.
Since we will not be using interrupts for our games, we chose to do labs 6 and 7. David will work
on lab 6, and Matthew will work on lab 7.

One of the biggest concerns from the meeting was our RTL tables. We spent some time
condensing and refactoring our tables.

We also drew out our initial datapath. This was done on a whiteboard, so after this meeting
someone will have to reconstitute it in software like Visio. We are currently in a race to see who
can download it the fastest. As we are all in F217, the prospects do not look good for any of us.
Ian may have to restart his computer.

A key decision we made in the datapath was to have the RA, IA, and DA registers mux into an
ALU input. We decided to go this route because there is already a mux on the B input, so adding
one to the A input does not change cycle time.

As a very important side note, Evë won the Visio installation battle.

Work to complete before next meeting:
● Matthew has the honor of copying the diagram into Visio
● David will work on Lab 6
● Thad will update the RTL and describe the control signals
● Evë will start the integration plan
● Ian will design the unit tests and update the component list
Meeting Tuesday, January 23

Members Present: Matthew, Ian, Evë, Thad, David

At this meeting we finished the tasks we were meant to complete before the meeting. Actually,
Matthew had finished copying the design into Visio, but we needed to add some components.

We discussed some formatting standards, and had a brief discussion about initializing the
processor. We decided to cross that bridge later.

Milestone 4
Meeting Thursday, January 25
Members Present: Matthew, Ian, Evë, Thad, David
In addition to assignments to complete Labs 6 and 7:

Evë: Optimize opcodes, make memory unit testbench
Thad: Fix the RTL again & purge document of intermediate registers, fix Micah comments, make
Mux testbenches.
Matt: Get rid of intermediate registers from the datapath diagram. Start putting together ALU
control and ALU control codes (if those exist at all).
Ian: Make register testbench
David: Make ALU testbench
Meeting Sunday, January 28

Members Present: Matthew, Ian, Evë, Thad
At this meeting we finished the tasks we were assigned last meeting, and redistributed other
tasks.

Evë: Designed opcodes for inherents
Thad: Designed opcodes for branch and immediates
Ian: Ensured that the test benches for the registers, adders, and muxes worked.
Matthew: Updated schematic and implemented components

Assignments:
Evë: Finish designing inherent control + add opcode logic to design doc
Thad: Fix the design doc (add/remove instructions + fix opcodes)
Ian: Check in all test benches
Matthew: Check in all components
David: Make ALU

Milestone 5

Meeting Monday During Class 2/5/2018

Members Present: Thad, David, Matthew, Ian
Thad: Fixing RTL documentation
David/Matthew/Ian: Write testbenches

Meeting Monday Evening 2/5/2018

Members Present: Matthew, Evë (Thad for 5 minutes)
Wrote control using Verilog if/case statements and reorganized tables to be more sensible. Also
clarified with Thad what was going on with the PCSRC control signal(s).
Meeting Tuesday During Class 2/6/2018

Members Present: Thad, David, Matthew, Ian, Evë
Thad: Finished assembler now that we know opcodes
David/Matthew/Ian: Write testbenches
Meeting Wednesday During/After Class 2/7/2018

Thad: Wrote testbench for program memory.
David/Matthew/Ian: Wrote testbenches. Wrote basic control testbench (includes an example of
each instruction type) and did some debugging of the control unit to make it work.
Later, wrote the entire processor testbench. In it, though, PC was not incrementing properly. Still
not sure why.

Meeting Monday During Class 2/12/2018

Ian and Matt explained changes they had made over the weekend to the control, datapath, and
opcodes to allow for more of the instructions to work. Discussed as a group how to make the
control more readable to people outside of the team.
Ian: Ran tests on individual instructions and checked the outputs to make sure they were
functioning correctly.
Matthew and David: Helped with debugging the processor
Thad and Evë: Worked on fixing up our control documentation

Meeting Tuesday During Class 2/13/2018

As a group we:
Worked on debugging the branch instructions. Corrected errors in control documentation and
control Verilog file.
Ian was given the task of testing each instruction before Wednesday.

Meeting Wednesday During and After Class 2/14/2018

Members Present: Thad, Ian, David, Evë, and Matthew
As a group:
Discussed the results of the instruction tests Ian ran the previous night. Realized our arithmetic
instructions, our la instruction, and a few other instructions were not working properly. Added a
latch following the ALU output so as to clearly separate the Read and Write stages. This latch
ensures that the processor will not perform both a memory read and a memory write during the
same cycle when an instruction like addm is run. Eventually we got the processor working.
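
The role of the ALU latch can be shown with a small behavioral sketch in Python. It assumes addm computes DA + mem[IA] and stores the sum back to mem[IA]. The function name run_addm and the trace format are hypothetical, and the real design is written in Verilog.

```python
def run_addm(mem, ia, da):
    """Model addm as two steps separated by an ALU latch (assumed semantics).

    Returns a trace of the memory operations, one entry per stage.
    """
    trace = []
    # Read stage: one memory read; the sum is captured in the latch.
    alu_latch = (da + mem[ia]) & 0xFFFF
    trace.append(("read", ia))
    # Write stage: one memory write, sourced only from the latch.
    mem[ia] = alu_latch
    trace.append(("write", ia))
    return trace


if __name__ == "__main__":
    memory = {0x0010: 5}
    print(run_addm(memory, 0x0010, 25), memory[0x0010])
```

Because the write stage takes its value from the latch, the read and the write fall in separate cycles.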

Outside of class:
Thad: Completed assembler; worked on running Euclid’s algorithm on the processor
Ian: Continued running instruction tests on the processor and debugging errors as they arose.
Updated design journal.
David: Helped write and run instruction tests
Matthew: Continued to work on preparing the processor to be run on the FPGA board.
Evë: Worked on compiler; updated documentation

Meeting Thursday After Scheduled Meeting 2/15/2018

Members Present: Thad, Ian, David, Evë, and Matthew
As a group:
We discussed the feedback we received from our meeting and planned out what we needed to
work on for the coming week.
Ian: Worked on updating documentation so that it matched our current processor
Evë: Worked on completing the compiler
Thad: Worked on updating the assembler and designing a game
Matthew: Worked on getting the processor to run on the FPGA board.
David: Helped update the documentation and worked on the presentation.

Meeting Wednesday 2/21/2018

Members Present: Thad, Ian, David, and Evë
Matthew was unable to meet due to his finals schedule
As a group:
Worked on finishing touches to design documentation and design process journal. Continued
work on presentation.

