
The Midterm is Coming


Midterm on May 8th.
Midterm review on May 6th.
Come to class with questions.
Midterm will cover everything before it.
It will mostly resemble the homeworks and the reading quizzes.
We will send out a reading quiz compendium on Thursday.
It will be challenging.
It will be curved.
The Final is Also Coming (but more slowly)
Despite what the online schedule says, we have only one final time, and it is:
6/10/2014, 8:00am-11:00am.
Implementing a MIPS Processor
Readings: 4.1-4.9
Goals for this Class
Understand how CPUs run programs
How do we express the computation to the CPU?
How does the CPU execute it?
How does the CPU support other system components (e.g., the OS)?
What techniques and technologies are involved and how do they work?
Understand why CPU performance (and other metrics) varies
How does CPU design impact performance?
What trade-offs are involved in designing a CPU?
How can we meaningfully measure and compare computer systems?
Understand why program performance varies
How do program characteristics affect performance?
How can we improve a program's performance by considering the CPU running it?
How do other system components impact program performance?
Goals
Understand how the 5-stage MIPS pipeline works
See examples of how architecture impacts ISA design
Understand how the pipeline affects performance
Understand hazards and how to avoid them
Structural hazards
Data hazards
Control hazards
Processor Design in
Two Acts
Act I: A single-cycle CPU
Foreshadowing
Act I: A Single-cycle Processor
Simplest design. Not how many real machines work (maybe some deeply embedded processors).
Figure out the basic parts: what it takes to execute instructions.
Act II: A Pipelined Processor
This is how many real machines work
Exploit parallelism by executing multiple instructions at once.
Target ISA
We will focus on part of MIPS
Enough to run into the interesting issues
Memory operations
A few arithmetic/logical operations (generalizing is straightforward)
BEQ and J
This corresponds pretty directly to what you'll be implementing in 141L.
Basic Steps for Execution
Fetch an instruction from the instruction store
Decode it
What does this instruction do?
Gather inputs
From the register file
From memory
Perform the operation
Write back the outputs
To register file or memory
Determine the next instruction to execute
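The steps above can be sketched as a tiny interpreter loop. This is only an illustration under stated assumptions: instructions are pre-decoded tuples rather than 32-bit words, memory is a Python dict, 32-bit overflow is ignored, and the names `step` and `run` are hypothetical helpers, not part of the course material.

```python
def step(pc, regs, mem, program):
    """Execute one instruction and return the next PC (hypothetical helper)."""
    op, a, b, c = program[pc // 4]        # fetch, then decode (tuples are pre-decoded)
    next_pc = pc + 4                      # default: fall through to the next instruction
    if op == "add":                       # add rd, rs, rt  -> ("add", rd, rs, rt)
        regs[a] = regs[b] + regs[c]       # gather inputs, operate, write back
    elif op == "addi":                    # addi rt, rs, imm -> ("addi", rt, rs, imm)
        regs[a] = regs[b] + c
    elif op == "lw":                      # lw rt, imm(rs)  -> ("lw", rt, rs, imm)
        regs[a] = mem.get(regs[b] + c, 0)
    elif op == "sw":                      # sw rt, imm(rs)  -> ("sw", rt, rs, imm)
        mem[regs[b] + c] = regs[a]
    elif op == "beq":                     # beq rs, rt, off -> ("beq", rs, rt, off)
        if regs[a] == regs[b]:
            next_pc = pc + 4 + 4 * c      # offset is counted in instructions
    else:
        raise ValueError(f"unsupported op: {op}")
    regs[0] = 0                           # $0 always reads as zero in MIPS
    return next_pc


def run(program, max_steps=1000):
    """Run until the PC leaves the program or max_steps is reached."""
    regs, mem, pc = [0] * 32, {}, 0
    for _ in range(max_steps):
        if not 0 <= pc // 4 < len(program):
            break
        pc = step(pc, regs, mem, program)
    return regs, mem


program = [
    ("addi", 1, 0, 5),   # $1 = 5
    ("addi", 2, 0, 7),   # $2 = 7
    ("add", 3, 1, 2),    # $3 = $1 + $2
    ("sw", 3, 0, 16),    # MEM[16] = $3
    ("lw", 4, 0, 16),    # $4 = MEM[16]
]
regs, mem = run(program)
```

Every instruction passes through the same sequence of steps, which is why a single-cycle design has to fit all of them into one clock period.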
The Processor Design Algorithm
Once you have an ISA
Design/Draw the datapath
Identify and instantiate the hardware for your architectural state
Foreach instruction
Simulate the instruction
Add and connect the datapath elements it requires
Is it workable? If not, fix it.
Design the control
Foreach instruction
Simulate the instruction
What control lines do you need?
How will you compute their value?
Modify control accordingly
Is it workable? If not, fix it.
You've already done much of this in 141L.
Arithmetic; R-Type
Inst = Mem[PC]
REG[rd] = REG[rs] op REG[rt]
PC = PC + 4
bits    31:26  25:21  20:16  15:11  10:6   5:0
name    op     rs     rt     rd     shamt  funct
# bits  6      5      5      5      5      6
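As a rough illustration of the field layout above, the following sketch pulls the six R-type fields out of a 32-bit instruction word with shifts and masks. The function name `decode_r_type` is a hypothetical helper; the example word 0x00430820 is the standard MIPS encoding of add $1, $2, $3.

```python
def decode_r_type(inst):
    """Split a 32-bit R-type instruction word into its six fields."""
    return {
        "op":    (inst >> 26) & 0x3F,  # bits 31:26 (6 bits)
        "rs":    (inst >> 21) & 0x1F,  # bits 25:21 (5 bits)
        "rt":    (inst >> 16) & 0x1F,  # bits 20:16 (5 bits)
        "rd":    (inst >> 11) & 0x1F,  # bits 15:11 (5 bits)
        "shamt": (inst >> 6) & 0x1F,   # bits 10:6  (5 bits)
        "funct": inst & 0x3F,          # bits 5:0   (6 bits)
    }


fields = decode_r_type(0x00430820)  # add $1, $2, $3
```

In hardware the same split is just wiring: each field is a fixed slice of the instruction bus, so no logic is needed to extract it.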
ADDI; I-Type
PC = PC + 4
REG[rt] = REG[rs] op SignExtImm
bits    31:26  25:21  20:16  15:0
name    op     rs     rt     imm
# bits  6      5      5      16
Load Word
PC = PC + 4
REG[rt] = MEM[SignExtImm + REG[rs]]
bits    31:26  25:21  20:16  15:0
name    op     rs     rt     immediate
# bits  6      5      5      16
Store Word
PC = PC + 4
MEM[SignExtImm + REG[rs]] = REG[rt]
bits    31:26  25:21  20:16  15:0
name    op     rs     rt     immediate
# bits  6      5      5      16
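Load and store share the same address calculation: sign-extend the 16-bit immediate and add it to REG[rs]. A minimal sketch, assuming a 32-bit address space; `sign_extend16`, `decode_i_type`, and `effective_address` are hypothetical helper names, and 0x8CA4FFFC is the standard encoding of lw $4, -4($5).

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def decode_i_type(inst):
    """Split a 32-bit I-type instruction word into op, rs, rt, and immediate."""
    return {
        "op":  (inst >> 26) & 0x3F,          # bits 31:26
        "rs":  (inst >> 21) & 0x1F,          # bits 25:21
        "rt":  (inst >> 16) & 0x1F,          # bits 20:16
        "imm": sign_extend16(inst & 0xFFFF)  # bits 15:0, sign-extended
    }


def effective_address(regs, inst):
    """Address used by lw/sw: SignExtImm + REG[rs], wrapped to 32 bits."""
    f = decode_i_type(inst)
    return (regs[f["rs"]] + f["imm"]) & 0xFFFFFFFF


regs = [0] * 32
regs[5] = 0x1000
addr = effective_address(regs, 0x8CA4FFFC)  # lw $4, -4($5)
```

Because the addition is the same for both instructions, the datapath can reuse the ALU for address generation instead of adding a dedicated adder.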
Branch-equal; I-Type
PC = (REG[rs] == REG[rt]) ? PC + 4 + SignExtImm * 4 : PC + 4
bits    31:26  25:21  20:16  15:0
name    op     rs     rt     displacement
# bits  6      5      5      16
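The next-PC rule for branch-equal can be written out directly. This is a sketch of the expression above, assuming 32-bit wraparound; `beq_next_pc` and `sign_extend16` are hypothetical helper names.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def beq_next_pc(pc, rs_val, rt_val, imm16):
    """Next PC for beq: taken target if the operands match, else PC + 4."""
    fall_through = (pc + 4) & 0xFFFFFFFF
    if rs_val == rt_val:
        # The displacement counts instructions, so multiply by 4 (shift left 2).
        return (fall_through + sign_extend16(imm16) * 4) & 0xFFFFFFFF
    return fall_through


taken = beq_next_pc(0x100, 7, 7, 3)           # forward by 3 instructions
backward = beq_next_pc(0x100, 7, 7, 0xFFFF)   # displacement of -1
not_taken = beq_next_pc(0x100, 1, 2, 3)
```

Note that the displacement is relative to PC + 4, not to the address of the branch itself, so a displacement of -1 branches back to the branch instruction.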
A Single-cycle Processor
Performance refresher
ET = IC * CPI * CT
Single-cycle CPI == 1. That sounds great.
Unfortunately, single-cycle CT is large.
Even RISC instructions take quite a bit of effort to execute.
This is a lot to do in one cycle.
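To make the performance equation concrete, here is a small sketch that evaluates ET = IC * CPI * CT. The instruction count of one million is an arbitrary assumption for illustration, the 18 ns cycle time is the figure quoted on the next slide, and `execution_time_ns` is a hypothetical helper name.

```python
def execution_time_ns(ic, cpi, ct_ns):
    """Performance equation: ET = IC * CPI * CT (CT in nanoseconds)."""
    return ic * cpi * ct_ns


# Single-cycle design: CPI is exactly 1, but every cycle lasts 18 ns.
et_single = execution_time_ns(ic=1_000_000, cpi=1, ct_ns=18)
et_single_ms = et_single / 1e6  # convert ns to ms
```

With CPI pinned at 1, the only lever left in a single-cycle design is the cycle time, and that is set by the longest path through the whole datapath.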
Our Hardware is Mostly Idle
Cycle time = 18 ns
Slowest module (ALU) is ~6 ns
Processor Design in
Two Acts
Act II: A pipelined CPU
Pipelining
Letter  Answer
A       Allows the execution of multiple instructions to overlap
B       Prevents branch articulation
C       Significantly decreases the amount of time it takes to execute a particular instruction
D       Significantly increases the amount of time it takes to implement a particular instruction
E       A and D
Pipelining
Letter Answer
A Increases instruction count
B Reduces CPI
C Reduces cycle time
D Has no effect on performance
E B and C
Data hazards
Letter  Answer
A       Occur because a value is not ready when it's needed
B       Occur because the next PC is not yet known
C       Cannot be removed
D       A and B
E       All of the above
Stalling a processor
Letter  Answer
A       Reduces CPI and increases instruction count
B       Means that instructions early in the pipeline stop making progress
C       Can resolve some hazards
D       B and C
E       A and C
Forwarding
Letter  Answer
A       Is just for email
B       Allows the processor to resolve control hazards
C       Improves CPI
D       Reduces cycle time
E       Interacts poorly with stalling
Pipelining Review
Pipelining
Break up the logic with pipeline registers into pipeline stages
Each stage can act on a different instruction/data
States/control signals of instructions are held in pipeline registers (latches)
[Figure: 10 ns of logic between two latches, and the same logic split into five 2 ns stages with a latch between each pair of stages]
Pipelining
[Figure: the five-stage pipeline (2 ns per stage, latches between stages) shown in five successive snapshots, cycle #1 through cycle #5]
Performance of a pipelined processor
If we have 500 instructions, what's the speedup of a 5-stage pipelined processor with a 2 ns cycle time vs. a single-cycle processor with a 10 ns cycle time?
A. 5
B. 4.96
C. 2.78
D. 1
E. None of the above
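One way to work the question above, as a sketch: assume the pipeline never stalls, so the first instruction takes 5 cycles to come out and each later instruction completes one cycle after the previous one. The helper names are hypothetical.

```python
def single_cycle_time_ns(n_instructions, cycle_ns):
    """Every instruction takes exactly one (long) cycle."""
    return n_instructions * cycle_ns


def pipelined_time_ns(n_instructions, stages, cycle_ns):
    """Assumes no stalls: `stages` cycles for the first instruction,
    then one additional cycle for each remaining instruction."""
    return (stages + n_instructions - 1) * cycle_ns


t_single = single_cycle_time_ns(500, 10)   # 500 * 10 ns
t_pipe = pipelined_time_ns(500, 5, 2)      # (5 + 499) * 2 ns
speedup = t_single / t_pipe
```

Under these assumptions the speedup falls just short of 5x because of the cycles spent filling the pipeline; the gap shrinks as the instruction count grows.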
Recap: Clock
A hardware signal that defines when data is valid and stable
Think about the clock in real life!
We use edge-triggered clocking
Values stored in the sequential logic are updated only on a clock edge
[Figure: sequential logic and combinational logic]
The 5-Stage MIPS Pipeline
Instruction Fetch
Read the instruction
Decode
Figure out what the incoming instruction is
Fetch the operands from the register file
Execution: ALU
Perform ALU functions
Memory access
Read/write data memory
Write back results to registers
Write to register file
Instruction Fetch (IF)
Instruction Decode (ID)
Execution (EXE)
Memory Access (MEM)
Write Back (WB)
Pipelined Datapath
[Figure: datapath with PC, instruction memory (Read Address, Instruction), PC + 4 adder, register file (Read Addr 1, Read Addr 2, Write Addr, Write Data, Read Data 1, Read Data 2), 16-to-32-bit sign extend, ALU, shift-left-2 and branch-target adder, and data memory (Address, Write Data, Read Data)]
Pipelined datapath
[Figure: datapath divided by pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB into Instruction Fetch, Instruction Decode, Execution, Memory Access, and Write Back. Elements: instruction memory, PC + 4 adder, register file (fields inst[25:21], inst[20:16], inst[15:11]), 16-to-32-bit sign extend, ALU with Zero output, shift-left-2 and branch adder, data memory, and muxes. Control signals: RegDst, ALUSrc, ALUop, MemRead, MemWrite, MemtoReg, RegWrite, PCSrc.]
PCSrc = Branch & Zero
Will this work?
Pipelined datapath
[Figure: pipelined datapath (pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB) with the following instruction sequence entering the pipeline:]
add $1, $2, $3
lw $4, 0($5)
sub $6, $7, $8
sub $9, $10, $11
sw $1, 0($12)
Pipelined datapath
[Figure: pipelined datapath (pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB) running the same instruction sequence, with the RegDst signal called out:]
add $1, $2, $3
lw $4, 0($5)
sub $6, $7, $8
sub $9, $10, $11
sw $1, 0($12)
Pipelined datapath
[Figure: pipelined datapath (pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB) running the same instruction sequence:]
Is this right?
add $1, $2, $3
lw $4, 0($5)
sub $6, $7, $8
sub $9, $10, $11
sw $1, 0($12)
Pipelined datapath
[Figure: revised pipelined datapath (pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB); inst[15:11] is now labeled along its own path, separate from the inst[25:21] and inst[20:16] register-read inputs. Control signals: RegDst, ALUSrc, ALUop, MemRead, MemWrite, MemtoReg, RegWrite, PCSrc.]
Pipelined datapath + control
[Figure: the revised pipelined datapath with a Control unit added. Control bits are grouped as EX, ME, and WB and passed down the pipeline registers: ID/EX carries WB, ME, and EX; EX/MEM carries WB and ME; MEM/WB carries WB, which includes RegWrite.]
Simplified pipeline diagram
1. Use symbols to represent the physical resources, with the abbreviations for the pipeline stages:
   IF, ID, EXE, MEM, WB
2. The horizontal axis represents the timeline; the vertical axis represents the instruction stream.
3. Example:
Cycle              1    2    3    4    5    6    7    8    9
add $1, $2, $3     IF   ID   EXE  MEM  WB
lw $4, 0($5)            IF   ID   EXE  MEM  WB
sub $6, $7, $8               IF   ID   EXE  MEM  WB
sub $9, $10, $11                  IF   ID   EXE  MEM  WB
sw $1, 0($12)                          IF   ID   EXE  MEM  WB
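A diagram like the one in this example can be generated mechanically: instruction i (counting from 0) occupies stage s in cycle i + s + 1. The sketch below assumes no stalls and ignores hazards; `pipeline_diagram` is a hypothetical helper name.

```python
STAGES = ["IF", "ID", "EXE", "MEM", "WB"]


def pipeline_diagram(instructions):
    """Return one row per instruction; each row lists the stage occupied
    in every cycle, with '' where the instruction is not in the pipeline."""
    n_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i in range(len(instructions)):
        row = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i enters stage s in cycle i + s + 1
        rows.append(row)
    return rows


program = [
    "add $1, $2, $3",
    "lw $4, 0($5)",
    "sub $6, $7, $8",
    "sub $9, $10, $11",
    "sw $1, 0($12)",
]
rows = pipeline_diagram(program)
for inst, row in zip(program, rows):
    print(f"{inst:<19}" + "".join(f"{cell:<5}" for cell in row))
```

Reading a column of the output top to bottom shows which instruction occupies each stage in that cycle; in cycle 5 all five stages are busy.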
How much speedup should pipelining provide, and why?
Letter  Answer
A       5x, by Amdahl's law (x = 0.8, S = 6.25)
B       25x, by the PE, since CPI goes up by 5x and cycle time goes down by 5x
C       2.24x, by the PE and the quotient rule
D       5x, by the PE, since cycle time goes down by 90%
E       5x, by the PE, since clock rate goes up by 5x
Pipelining in Action
Ctrl         0.797 ns
Imem         2.77 ns
ArgBMux      1.124 ns
ALU          6.527 ns
Dmem         1.744 ns
WriteRegMux  3.067 ns
RegFile      2.27 ns
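The module latencies above put a floor under the pipelined cycle time. The sketch below assumes each module sits entirely inside one stage and ignores latch overhead, so it gives only a lower bound on cycle time and an upper bound on clock rate; the helper names are hypothetical.

```python
# Module latencies in ns, copied from the slide.
MODULE_LATENCY_NS = {
    "Ctrl": 0.797,
    "Imem": 2.77,
    "ArgBMux": 1.124,
    "ALU": 6.527,
    "Dmem": 1.744,
    "WriteRegMux": 3.067,
    "RegFile": 2.27,
}


def min_cycle_time_ns(latencies):
    """A stage cannot be shorter than the slowest module it contains,
    so the slowest module bounds the cycle time from below."""
    return max(latencies.values())


def clock_rate_mhz(cycle_ns):
    """Convert a cycle time in ns to a clock rate in MHz."""
    return 1000.0 / cycle_ns


slowest = max(MODULE_LATENCY_NS, key=MODULE_LATENCY_NS.get)
bound_ns = min_cycle_time_ns(MODULE_LATENCY_NS)
bound_mhz = clock_rate_mhz(bound_ns)
```

Unless the slowest module is itself split across stages, no amount of rebalancing the other stages can push the clock past this bound.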
Ideal 5-stage Pipeline (3.733 ns -> 267 MHz)
Single-cycle Implementation to scale
18.667 ns -> 3.733 ns == 80% reduction in CT
L_old = IC * CPI * CT_old
L_new = IC * CPI * CT_new
CT_new = 0.2 * CT_old
L_new = 0.2 * L_old
Speedup = L_old / L_new = 5x
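The arithmetic on this slide can be checked in a few lines. Because IC and CPI are the same in both designs, they cancel and the speedup reduces to the ratio of cycle times. A minimal sketch; `clock_rate_mhz` is a hypothetical helper name.

```python
def clock_rate_mhz(cycle_ns):
    """Convert a cycle time in ns to a clock rate in MHz."""
    return 1000.0 / cycle_ns


ct_old = 18.667  # single-cycle cycle time in ns, from the slide
ct_new = 3.733   # ideal 5-stage cycle time in ns, from the slide

reduction = 1 - ct_new / ct_old   # fraction by which CT shrinks
speedup = ct_old / ct_new         # IC and CPI cancel out
rate = clock_rate_mhz(ct_new)
```

The result depends on CPI staying at 1 after pipelining, which is the idealization the slide title refers to.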
Ideal 5-stage Pipeline (3.733 ns -> 267 MHz)
Single-cycle Implementation to scale
Realistic 5-stage Pipeline
What's the actual speedup? Clock rate?
Letter  Answer
A       3x; 150 MHz
B       1.02x; 76.6 MHz
C       2.85x; 153 MHz
D       5.49x; 294 MHz
E       None of the above
