Lecture 9
Outline
Building a CPU
Basic Components
MIPS Instructions
Basic 5 Steps for CPU
Single-Cycle Design
Multi-cycle Design
Comparison of Single and Multi-cycle Designs
CPU Datapath
MIPS Example
[Figure: edge-triggered D flip-flop with enable (EN), built from a multiplexer with select input S; a 4-bit register (D3-D0, Q3-Q0) with shared EN and Clock]
Registers
55:035 Computer Architecture and Organization 5
Digital Logic
Tri-state Driver (Buffer)
[Figure: tri-state buffer with input "in", output "out", and control "drive"]
In  Drive  Out
0   0      Z
1   0      Z
0   1      0
1   1      1
What is Z ??
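Z is the high-impedance state: with Drive = 0 the buffer does not drive its output at all, so another buffer may drive the same wire. The truth table can be sketched in Python as follows. The string "Z" and the helper names `tristate` and `resolve_bus` are illustrative assumptions, not part of the lecture.

```python
Z = "Z"  # assumed marker for high impedance (output not driven)

def tristate(value, drive):
    """Pass value through when drive is 1, otherwise leave the output at Z."""
    return value if drive == 1 else Z

def resolve_bus(outputs):
    """Resolve a shared wire: at most one tri-state buffer may drive it."""
    driven = [v for v in outputs if v != Z]
    if len(driven) > 1:
        raise ValueError("bus conflict: more than one active driver")
    return driven[0] if driven else Z

# Reproduce the four rows of the truth table.
for value in (0, 1):
    for drive in (0, 1):
        print(value, drive, tristate(value, drive))
```

This is why tri-state buffers are a candidate for register file read ports: many registers can share one output wire as long as only one Drive signal is active.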
[Figure: adder/ALU block with an Add/sub (or ALUop) control input, Carry-in, and Carry-out]
OR Immediate:
ori rt, rs, imm16
Bits:   31-26    25-21    20-16    15-0
Field:  op       rs       rt       immediate
Width:  6 bits   5 bits   5 bits   16 bits
LOAD and STORE Word
lw rt, rs, imm16
sw rt, rs, imm16
Bits:   31-26    25-21    20-16    15-0
Field:  op       rs       rt       immediate
Width:  6 bits   5 bits   5 bits   16 bits
BRANCH:
beq rs, rt, imm16
Bits:   31-26    25-21    20-16    15-0
Field:  op       rs       rt       immediate
Width:  6 bits   5 bits   5 bits   16 bits
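The field layout above maps directly onto shifts and masks. A minimal sketch, assuming a 32-bit instruction word held in a Python integer; the function name `decode_itype` is hypothetical, and the example word uses the standard MIPS opcode for lw (100011).

```python
def decode_itype(word):
    """Split a 32-bit I-type instruction word into its four fields."""
    return {
        "op": (word >> 26) & 0x3F,     # bits 31-26, 6 bits
        "rs": (word >> 21) & 0x1F,     # bits 25-21, 5 bits
        "rt": (word >> 16) & 0x1F,     # bits 20-16, 5 bits
        "imm16": word & 0xFFFF,        # bits 15-0, 16 bits
    }

# Example word: lw with rt = 8, rs = 29, imm16 = 4.
word = (0b100011 << 26) | (29 << 21) | (8 << 16) | 4
fields = decode_itype(word)
print(fields)
```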
Step 3: Datapath Assembly
ADDU rd, rs, rt SUBU rd, rs, rt
Need an ALU
Hook it up to REGISTER FILE
REGFILE has 2 read ports (rs,rt), 1 write port (rd)
[Figure: datapath with REGFILE (RdReg1 = rs, RdReg2 = rt, WrReg, WrData; outputs RdData1, RdData2), a 16-bit ZERO-EXTEND unit for Imm16, a multiplexer controlled by ALUsrc, and the ALU (Result, Zero?); control signals RegWrite, ALUsrc, ALUop]
Control Signals Depend Upon Instruction Fields
E.g.: ALUsrc = f(Instruction) = f(op, funct)
Steps 2 and 3: Destination Register
Must select proper destination, rd or rt
Depends on Instruction Type
R-type may write rd
I-type may write rt
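The destination choice is a 2-to-1 multiplexer on the write-register input. A minimal sketch, assuming RegDst = 1 selects rd and RegDst = 0 selects rt (the convention of the lecture's control lookup table); the function name `write_register` is hypothetical.

```python
def write_register(instr, reg_dst):
    """Select the destination register number from an instruction word."""
    rt = (instr >> 16) & 0x1F   # Instruction[20:16], written by I-type
    rd = (instr >> 11) & 0x1F   # Instruction[15:11], written by R-type
    return rd if reg_dst == 1 else rt

# Word with rs = 1, rt = 2, rd = 3 (other fields left at zero).
instr = (1 << 21) | (2 << 16) | (3 << 11)
print(write_register(instr, reg_dst=1))  # R-type destination
print(write_register(instr, reg_dst=0))  # I-type destination
```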
[Figure: the same datapath with a multiplexer controlled by RegDst that selects rd or rt as WrReg; control signals RegWrite, RegDst, ALUsrc, ALUop]
[Figure: datapath extended with DATAMEM (Addr, RdData), a SIGN/ZERO-EXTEND unit controlled by ExtOp, and a multiplexer controlled by MemtoReg on the REGFILE write data; control signals RegDst, RegWrite, ALUsrc, ALUop, MemtoReg, ExtOp]
Steps 2 and 3: Store Word
SW rt, rs, Imm16
Need Data Memory: Mem[Addr] ← data
Addr is rs+Imm16, Imm16 is signed, use ALU for +
Store in Mem: Mem[rs+Imm16] ← rt
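The store-word transfer Mem[rs+Imm16] ← rt can be sketched as follows. This is an illustration under stated assumptions: the register file is a list of 32 integers, data memory is a dict keyed by byte address, and the helper names `sign_extend16` and `store_word` are hypothetical.

```python
def sign_extend16(imm16):
    """Interpret a 16-bit field as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def store_word(regs, mem, rs, rt, imm16):
    """Mem[regs[rs] + SignExt(imm16)] <- regs[rt]; returns the address used."""
    addr = (regs[rs] + sign_extend16(imm16)) & 0xFFFFFFFF  # the ALU does the +
    mem[addr] = regs[rt]
    return addr

regs = [0] * 32
regs[29] = 0x1000          # base address in rs
regs[8] = 0xDEADBEEF       # value to store, in rt
mem = {}
addr = store_word(regs, mem, rs=29, rt=8, imm16=0xFFFC)  # offset of -4
print(hex(addr), hex(mem[addr]))
```

The negative offset shows why Imm16 must be sign-extended rather than zero-extended for loads and stores.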
[Figure: the load-word datapath with RdData2 connected to the DATAMEM WrData input and a MemWrite control signal; control signals RegWrite, RegDst, ALUsrc, ALUop, MemWrite, ExtOp, MemtoReg]
Writes: Need to Control Timing
Problem: write to data memory
Data can come anytime
Addr must come first
MemWrite must come after Addr
Else? writes to wrong Addr!
How to branch?
BEQ rs, rt, Imm16
Etc…
[Figure: PC drives the Read address of Instruction Memory, which outputs Instruction[31:0]; an adder computes PC + 4]
PC: a register
Counter, counts by +4
Provides address to Instruction Memory
Steps 2 and 3: Datapath & Assembly
[Figure: PC and Instruction Memory (Read address, Instruction[31:0]) with fields Instruction[25:21], [20:16], [15:11], and [15:0] (Imm16); one adder computes PC + 4; a second adder adds the Sign/Zero Extend output (16 to 32 bits, controlled by ExtOp) shifted left by 2; a multiplexer controlled by PCSrc selects the next PC]
PC: a register
Counter, counts by +4
Sometimes, must add SignExtend{Imm16 || b'00'} for branch instructions
Note: the sign-extender for Imm16 is already in the datapath (everything else is new)
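The next-PC logic of this slide can be sketched as a small function. It assumes the branch offset is added to PC + 4, as the two chained adders in the figure suggest; the names `sign_extend16` and `next_pc` are hypothetical.

```python
def sign_extend16(imm16):
    """Interpret a 16-bit field as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def next_pc(pc, imm16, pc_src):
    """Return PC + 4, or the branch target when pc_src is 1."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    # Shift left 2: the immediate counts words, the PC counts bytes.
    target = (pc_plus_4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF
    return target if pc_src == 1 else pc_plus_4

print(hex(next_pc(0x100, 3, pc_src=0)))       # fall through
print(hex(next_pc(0x100, 3, pc_src=1)))       # forward branch
print(hex(next_pc(0x100, 0xFFFF, pc_src=1)))  # offset -1: back to the branch itself
```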
Steps 2 and 3: Add Previous Datapath
[Figure: complete single-cycle datapath: PC, Instruction Memory, Register File (Read reg. 1, Read reg. 2, Write reg., Write data, Read data 1, Read data 2), Sign/Zero Extend (16 to 32 bits), ALU with ALU Control (Zero, ALU result), Data Memory (Address, Write data, Read data), the PC + 4 and branch-target adders, and multiplexers controlled by RegDst, ALUSrc, MemtoReg, and PCSrc; further control signals RegWrite and MemWrite]
Single-cycle CPU
Every instruction takes 1 clock cycle
Clocking ?
Operation
On rising edge, PC will get new value
Maybe REGFILE will have one value updated as well
After rising edge
PC and REGFILE can’t change
New value out of PC
Instruction out of INSTRMEM
Instruction selects registers to read from REGFILE
Instruction controls ALUop, ALUsrc, MemWrite, ExtOp, etc
ALU does its work
DataMem may be read (depending on instruction)
Result value goes back to REGFILE
New PC value goes back to PC
Await next clock edge
Lots to do in only 1 clock cycle !!
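The sequence above can be sketched as one function call per clock cycle. This is a behavioural illustration, not the lecture's hardware: it assumes standard MIPS encodings for addu, subu, ori, lw, sw, and beq, models both memories as dicts keyed by byte address, and uses hypothetical helper names (`step`, `r_type`, `i_type`).

```python
def sign_extend16(imm16):
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def r_type(rs, rt, rd, funct):
    return (rs << 21) | (rt << 16) | (rd << 11) | funct

def i_type(op, rs, rt, imm16):
    return (op << 26) | (rs << 21) | (rt << 16) | (imm16 & 0xFFFF)

def step(pc, regs, imem, dmem):
    """Simulate one clock cycle and return the new PC value."""
    instr = imem[pc]                          # instruction out of INSTRMEM
    op = (instr >> 26) & 0x3F
    rs, rt, rd = (instr >> 21) & 0x1F, (instr >> 16) & 0x1F, (instr >> 11) & 0x1F
    imm16, funct = instr & 0xFFFF, instr & 0x3F
    a, b = regs[rs], regs[rt]                 # REGFILE read ports
    new_pc = (pc + 4) & 0xFFFFFFFF
    if op == 0x00 and funct == 0x21:          # addu rd, rs, rt
        regs[rd] = (a + b) & 0xFFFFFFFF
    elif op == 0x00 and funct == 0x23:        # subu rd, rs, rt
        regs[rd] = (a - b) & 0xFFFFFFFF
    elif op == 0x0D:                          # ori rt, rs, imm16 (zero-extend)
        regs[rt] = a | imm16
    elif op == 0x23:                          # lw rt, rs, imm16
        regs[rt] = dmem.get((a + sign_extend16(imm16)) & 0xFFFFFFFF, 0)
    elif op == 0x2B:                          # sw rt, rs, imm16
        dmem[(a + sign_extend16(imm16)) & 0xFFFFFFFF] = b
    elif op == 0x04:                          # beq rs, rt, imm16
        if a == b:
            new_pc = (new_pc + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF
    else:
        raise ValueError(f"unsupported instruction {instr:#010x}")
    regs[0] = 0                               # register 0 always reads as zero
    return new_pc

imem = {
    0: i_type(0x0D, 0, 1, 5),       # ori  $1, $0, 5
    4: i_type(0x0D, 0, 2, 7),       # ori  $2, $0, 7
    8: r_type(1, 2, 3, 0x21),       # addu $3, $1, $2
    12: i_type(0x2B, 0, 3, 0),      # sw   $3, 0($0)
    16: i_type(0x23, 0, 4, 0),      # lw   $4, 0($0)
}
regs, dmem, pc = [0] * 32, {}, 0
for _ in range(5):                  # five instructions, five clock cycles
    pc = step(pc, regs, imem, dmem)
print(pc, regs[3], regs[4], dmem)
```

Each call updates only PC, REGFILE, and data memory, mirroring the point that these are the only state elements that change on a clock edge.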
Implementation Details
How to implement REGFILE?
Read port: tristate buffers? Multiplexer? Memory?
Two read ports: two of above?
Write port: how to write only 1 register?
How to control writes to memory? To register file?
More instructions
Shift instructions
Jump instruction
Etc
[Figure: single-cycle datapath (PC, Instruction Memory, Register File, Sign/Zero Extend, ALU, Data Memory) with control signals RegDst, ALUSrc, MemtoReg, and MemWrite]
[Figure: the same datapath with the main Control unit (input Instruction[31:26]; outputs RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite), the ALU control (input Instruction[5:0]), and the PC + 4 and branch-target adders selected by PCSrc]
1-cycle CPU Control – Lookup Table
Input or Output  Signal Name  R-format  Lw  Sw  Beq
Inputs           Op5          0         1   1   0
                 Op4          0         0   0   0
                 Op3          0         0   1   0
                 Op2          0         0   0   1
                 Op1          0         1   1   0
                 Op0          0         1   1   0
Outputs          RegDst       1         0   X   X
                 ALUSrc       0         1   1   0
                 MemtoReg     0         1   X   X
                 RegWrite     1         1   0   0
                 MemRead      0         1   0   0
                 MemWrite     0         0   1   0
                 Branch       0         0   0   1
                 ALUOp1       1         0   0   0
                 ALUOp0       0         0   0   1
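Because the control is purely combinational, the lookup table can be written as a dictionary keyed by the 6-bit opcode Op5..Op0. A minimal sketch; the names `CONTROL` and `control_signals` are hypothetical, `None` stands for the don't-care entries (X), and ALUOp1 and ALUOp0 are packed into one 2-bit value.

```python
CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,      # R-format
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,      # Lw
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    0b101011: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,  # Sw
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    0b000100: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,  # Beq
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}

def control_signals(instr):
    """Look up the control outputs from Instruction[31:26]."""
    opcode = (instr >> 26) & 0x3F
    return CONTROL[opcode]

lw_word = 0b100011 << 26        # only the opcode field matters here
print(control_signals(lw_word))
```

Only Sw asserts MemWrite, and only R-format and Lw assert RegWrite, so no instruction in this subset writes both memory and the register file.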
[Figure: single-cycle datapath; visible labels include PC + 4 [31..28] and the instruction fields [31:26], [25:21], [20:16], [15:11], [15:0], and [5:0]]
1-cycle CPU Problems?
Every instruction 1 cycle
Some instructions “do more work”
E.g., lw must read from DATAMEM
All instructions must have same clock period…
1-cycle CPU Summary
Operation
1 cycle per instruction
Control signals held fixed during entire cycle (except BRANCH)
Only 2 registers
PC, updated every clock cycle
REGFILE, updated when required
During clock cycle, data flows from register-outputs to register-inputs
Fixed clock frequency / period
Performance
1 instruction per cycle
Slowest instruction determines clock frequency
[Figure: multi-cycle datapath: PC, a single Memory (Address, Write data, MemData), Instruction Register (fields [25:21], [20:16], [15:11], [15:0]), Memory data register, Registers (RdReg1, RdReg2, Write reg, Write data, RdData1, RdData2), registers A and B, ALU (Zero, ALU result) with ALU Out, and multiplexers on the memory address, the register write port, and the ALU inputs (including the constant 4)]
5 Steps
1. Fetch instruction
2. Read registers
3. Compute address
4. Read data
5. Write registers
Load Word Instruction Sequence
1. Fetch Instruction
InstructionRegister ← Mem[PC]
2. Read Registers
A ← Registers[Rs]
3. Compute Address
ALUOut ← A + SignExt(Imm16)
4. Read Data
MDR ← Memory[ALUOut]
5. Write Registers
Registers[Rt] ← MDR
Missing Steps?
Must increment the PC
Do it as part of the instruction fetch (in step 1)
Need PCWrite control signal
3. Overall check
Does the sequence of steps work ?
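One way to check the sequence is to run the five register transfers in order, with the PC increment folded into step 1. The sketch below assumes a single memory shared by instructions and data (a dict keyed by byte address) and forms the load address as rs + sign-extended Imm16, as in the store-word slide; the function name `run_lw` is hypothetical.

```python
def sign_extend16(imm16):
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def run_lw(pc, regs, mem):
    """Execute one lw as five clock cycles; returns (new_pc, cycles)."""
    # 1. Fetch instruction: IR <- Mem[PC]; PC <- PC + 4
    ir = mem[pc]
    pc = (pc + 4) & 0xFFFFFFFF
    # 2. Read registers: A <- Registers[rs]
    rs, rt = (ir >> 21) & 0x1F, (ir >> 16) & 0x1F
    a = regs[rs]
    # 3. Compute address: ALUOut <- A + SignExt(Imm16)
    alu_out = (a + sign_extend16(ir & 0xFFFF)) & 0xFFFFFFFF
    # 4. Read data: MDR <- Memory[ALUOut]
    mdr = mem[alu_out]
    # 5. Write registers: Registers[rt] <- MDR
    regs[rt] = mdr
    return pc, 5

regs = [0] * 32
regs[9] = 0x100
mem = {
    0: (0b100011 << 26) | (9 << 21) | (8 << 16) | 8,   # lw $8, 8($9)
    0x108: 42,                                         # the word to load
}
pc, cycles = run_lw(0, regs, mem)
print(pc, cycles, regs[8])
```

Each numbered comment corresponds to one clock cycle, and the intermediate values (ir, a, alu_out, mdr) play the role of the registers that hold results between cycles.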
[Figure: multi-cycle datapath with Control (input Instr[31:26]) and ALU Control (input Instruction[5:0]); control signals shown include MemWrite, MemtoReg, ALUSrcB, and ALUOp; the jump address [31..0] is formed from PC[31..28] and Instr[25:0]]
Multi-cycle BEQ Instruction
1. Fetch Instruction
InstructionRegister ← Mem[PC]; PC ← PC + 4
Multi-cycle CPU Control: Overview
[Figure: control implementations, each producing the Control Signal Outputs]
It is possible to automatically convert RTL into this form!
FSM: Gates + FFs Implementation
FSM High-level Organization
[Figure: control organization; visible labels include Inputs, Microprogram Counter, Adder (+1), and Sequencing control]
Control FSM: Overview
Detailed FSM
[Figure: FSM state diagram beginning at the Instruction Fetch state]
Detailed FSM: Instruction Fetch
[Figure: FSM state diagram with transitions labeled LW and SW]
Detailed FSM: R-Type Instruction
Single-cycle CPU vs Multi-cycle CPU
Multi-cycle CPU LW: 5 clock cycles
Multi-cycle CPU SW, R-type: 4 clock cycles
Multi-cycle CPU BEQ, J: 3 clock cycles
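Given these cycle counts, the average cycles per instruction (CPI) depends on the instruction mix. The sketch below uses the cycle counts from the slide; the mix fractions are invented purely for illustration and do not come from the lecture.

```python
# Cycle counts from the slide.
CYCLES = {"lw": 5, "sw": 4, "r-type": 4, "beq": 3, "j": 3}

# Hypothetical instruction mix (fractions sum to 1); an assumption, not lecture data.
MIX = {"lw": 0.25, "sw": 0.10, "r-type": 0.45, "beq": 0.15, "j": 0.05}

def average_cpi(cycles, mix):
    """Weighted average of cycles per instruction over the mix."""
    return sum(mix[name] * cycles[name] for name in mix)

def mips_rate(clock_hz, cpi):
    """Millions of instructions per second at a given clock frequency."""
    return clock_hz / cpi / 1e6

cpi = average_cpi(CYCLES, MIX)
print(f"average CPI = {cpi:.2f}")
print(f"MIPS at 1 MHz = {mips_rate(1e6, cpi):.3f}")
```

With every instruction taking between 3 and 5 cycles, the average CPI always falls in that range, whatever mix is assumed.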
What’s really happening?
[Figure: timing of a Load Word instruction (Fetch, Decode, Calc Addr, Memory, Write) on the single-cycle CPU and on the multi-cycle CPU, ideal case versus actual case]
Single-cycle vs Multi-cycle
LW instruction faster for single-cycle
[Figure: LW timing (Fetch, Decode, Calc Addr, Memory, Write), single-cycle CPU versus multi-cycle CPU]
Single-cycle vs Multi-cycle
SW instruction ~ same speed
[Figure: SW timing (Fetch, Decode, Calc Addr, Memory), single-cycle CPU versus multi-cycle CPU; annotations: "Speed diff", "Wasted time!"]
Single-cycle vs Multi-cycle
BEQ, J instruction faster for multi-cycle
[Figure: BEQ, J timing (Fetch, Decode, Calc Addr), single-cycle CPU versus multi-cycle CPU; annotations: "Speed diff", "Wasted time!"]
Performance Summary
Which CPU implementation is faster?
LW single-cycle is faster
SW,R-type about the same
BEQ,J multi-cycle is faster
Multi-cycle CPU
<< 1 instruction per cycle (e.g., a 1 MHz clock gives 0.2 MIPS at 5 cycles per instruction)
Small time wasted on most complex instruction
Hence, this instruction always slower than single-cycle CPU
Small time wasted on simple instructions
Eliminates “large wasted time” by using fewer clock cycles
Complex controller (FSM)
Potential to create complex instructions
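The comparison can be reproduced with a small timing model. The stage delays below are hypothetical numbers chosen only to illustrate the argument. The single-cycle clock period is set by the slowest instruction, and the multi-cycle clock period by the slowest stage.

```python
# Hypothetical stage delays in nanoseconds (assumptions, not lecture data).
STAGE_NS = {"fetch": 2, "decode": 1, "calc": 2, "memory": 2, "write": 1}

# Stages used by each instruction class, one clock cycle per stage when multi-cycle.
STAGES = {
    "lw": ["fetch", "decode", "calc", "memory", "write"],
    "sw": ["fetch", "decode", "calc", "memory"],
    "r-type": ["fetch", "decode", "calc", "write"],
    "beq": ["fetch", "decode", "calc"],
}

# Single-cycle: one long period that fits the slowest instruction.
single_period = max(sum(STAGE_NS[s] for s in stages) for stages in STAGES.values())
# Multi-cycle: a short period that fits the slowest single stage.
multi_period = max(STAGE_NS.values())

def single_cycle_time(name):
    return single_period

def multi_cycle_time(name):
    return len(STAGES[name]) * multi_period

for name in STAGES:
    print(name, single_cycle_time(name), multi_cycle_time(name))
```

Under these assumed delays lw is slower on the multi-cycle CPU, sw and R-type take the same time on both, and beq is faster, which matches the summary above.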