
55:035

Computer Architecture and Organization

Lecture 9
Outline
 Building a CPU
 Basic Components
 MIPS Instructions
 Basic 5 Steps for CPU
 Single-Cycle Design
 Multi-cycle Design
 Comparison of Single and Multi-cycle Designs

55:035 Computer Architecture and Organization 2


Overview
 Brief look
 Digital logic

 CPU Datapath
 MIPS Example

55:035 Computer Architecture and Organization 3


Digital Logic
[Figure: D-type flip-flop (edge-triggered; inputs D and Clock, output Q); multiplexer (inputs A and B, select input S, output F); D-type flip-flop with enable (EN), built from a multiplexer in front of a D flip-flop]


Digital Logic

Registers
[Figure: registers built from edge-triggered D flip-flops with enable: 1 bit, 4 bits (D3..D0, Q3..Q0), N bits; EN and Clock are shared by all bits]
Digital Logic
Tri-state Driver (Buffer)
In  Drive  Out
0   0      Z
1   0      Z
0   1      0
1   1      1

What is Z ??
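
Z is the high-impedance state: the driver is effectively disconnected from the wire, so another driver can use it. A minimal sketch of the truth table above, assuming `None` stands in for Z; the `bus` helper is a hypothetical addition for illustration only.

```python
from typing import Optional

Z = None  # assumption: None models high impedance (output disconnected)


def tristate(inp: int, drive: int) -> Optional[int]:
    """Tri-state driver: pass `inp` through when `drive` is 1, else output Z."""
    return inp if drive else Z


def bus(*outputs: Optional[int]) -> Optional[int]:
    """Shared wire (hypothetical helper): at most one driver may be enabled."""
    driven = [v for v in outputs if v is not Z]
    if len(driven) > 1:
        raise ValueError("bus contention: more than one driver enabled")
    return driven[0] if driven else Z
```

With one driver disabled and one enabled, `bus(tristate(1, 0), tristate(0, 1))` takes the value of the enabled driver.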



Digital Logic
Adder/Subtractor or ALU
[Figure: adder/subtractor or ALU block with data inputs A and B, control input Add/sub or ALUop, Carry-in and Carry-out]


Overview
 Brief look
 Digital logic

 How to Design a CPU Datapath


 MIPS Example



Designing a CPU: 5 Steps
1. Analyze the instruction set → datapath requirements
    MIPS: ADD, SUB, ORI, LW, SW, BR
    Meaning of each instruction given by RTL (register transfers)
    2 types of registers: CPU/ISA registers, temporary registers
2. Datapath requirements → select the datapath components
    ALU, register file, adder, data memory, etc
3. Assemble the datapath
    Datapath must support planned register transfers
    Ensure all instructions are supported
4. Analyze datapath control required for each instruction
5. Assemble the control logic


Step 1a: Analyze ISA
 All MIPS instructions are 32 bits long.
 Three instruction formats (bit ranges, field widths):
  R-type: op[31:26] rs[25:21] rt[20:16] rd[15:11] shamt[10:6] funct[5:0]  (6, 5, 5, 5, 5, 6 bits)
  I-type: op[31:26] rs[25:21] rt[20:16] immediate[15:0]  (6, 5, 5, 16 bits)
  J-type: op[31:26] target address[25:0]  (6, 26 bits)
 R: registers, I: immediate, J: jumps
 These formats were intentionally chosen to simplify design


Step 1b: Analyze ISA
 R-type: op[31:26] rs[25:21] rt[20:16] rd[15:11] shamt[10:6] funct[5:0]  (6, 5, 5, 5, 5, 6 bits)
 I-type: op[31:26] rs[25:21] rt[20:16] immediate[15:0]  (6, 5, 5, 16 bits)
 J-type: op[31:26] target address[25:0]  (6, 26 bits)
 Meaning of the fields:
 op: operation of the instruction
 rs, rt, rd: the source and destination register specifiers
 Destination is either rd (R-type), or rt (I-type)
 shamt: shift amount
 funct: selects the variant of the operation in the “op” field
 immediate: address offset or immediate value
 target address: target address of the jump instruction
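
The field layout above can be sketched as shifts and masks on a 32-bit instruction word. This is an illustrative helper, not part of the lecture; the function name `decode` and the dictionary keys are assumptions.

```python
def decode(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into all possible fields.

    Which fields are meaningful depends on the format (R, I or J).
    """
    return {
        "op": (instr >> 26) & 0x3F,      # bits 31:26
        "rs": (instr >> 21) & 0x1F,      # bits 25:21
        "rt": (instr >> 16) & 0x1F,      # bits 20:16
        "rd": (instr >> 11) & 0x1F,      # bits 15:11 (R-type)
        "shamt": (instr >> 6) & 0x1F,    # bits 10:6  (R-type)
        "funct": instr & 0x3F,           # bits 5:0   (R-type)
        "imm16": instr & 0xFFFF,         # bits 15:0  (I-type)
        "target": instr & 0x3FFFFFF,     # bits 25:0  (J-type)
    }


# 0x00221821 encodes addu $3, $1, $2 (op = 0, funct = 0x21)
fields = decode(0x00221821)
```

Note that rd, shamt and funct overlap the immediate: the hardware extracts every field in parallel and the control logic decides which ones to use.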
MIPS ISA: subset for today
 ADD and SUB (R-type: op | rs | rt | rd | shamt | funct)
  addU rd, rs, rt
  subU rd, rs, rt

 OR Immediate (I-type: op | rs | rt | immediate)
  ori rt, rs, imm16

 LOAD and STORE Word (I-type: op | rs | rt | immediate)
  lw rt, rs, imm16
  sw rt, rs, imm16

 BRANCH (I-type: op | rs | rt | immediate)
  beq rs, rt, imm16


Step 2: Datapath Requirements
REGISTER FILE
 MIPS ISA requires 32 registers, 32b each
 Called a register file
 Contains 32 entries
 Each entry is 32b
 Register numbers are 5 bits each

AddU rd,rs,rt or SubU rd,rs,rt: how to implement?
 Read two sources rs, rt
 Operation rs + rt or rs – rt
 Write destination rd ← rs+/-rt

 Requirements
 Read two registers (rs, rt)
 Perform ALU operation
 Write a third register (rd)

[Figure: REGFILE block with register-number inputs RdReg1, RdReg2, WrReg, data ports RdData1, RdData2, WrData, and control RegWrite; ALU block with control ALUop and outputs Result and Zero?]
Step 3: Datapath Assembly
 ADDU rd, rs, rt SUBU rd, rs, rt
 Need an ALU
 Hook it up to REGISTER FILE
 REGFILE has 2 read ports (rs,rt), 1 write port (rd)
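
A minimal behavioural sketch of a register file with two read ports and one write port, wired to an add/subtract ALU. The class and method names are hypothetical; treating register 0 as always zero is a MIPS convention that the slide does not state.

```python
MASK32 = 0xFFFFFFFF


class RegFile:
    """32 entries x 32 bits, 2 read ports, 1 write port."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, rd_reg1: int, rd_reg2: int) -> tuple:
        """Both read ports are available at the same time."""
        return self.regs[rd_reg1], self.regs[rd_reg2]

    def write(self, wr_reg: int, wr_data: int, reg_write: int) -> None:
        """Write only when RegWrite is asserted (assumption: $0 stays zero)."""
        if reg_write and wr_reg != 0:
            self.regs[wr_reg] = wr_data & MASK32


def alu(a: int, b: int, alu_op: str) -> tuple:
    """Return (Result, Zero?) for the 'add' or 'sub' operation."""
    result = (a + b if alu_op == "add" else a - b) & MASK32
    return result, int(result == 0)


# addu $3, $1, $2: read rs and rt, add, write rd
rf = RegFile()
rf.write(1, 5, 1)
rf.write(2, 7, 1)
rd_data1, rd_data2 = rf.read(1, 2)
result, zero = alu(rd_data1, rd_data2, "add")
rf.write(3, result, 1)
```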

[Figure: instruction fields rs, rt, rd drive RdReg1, RdReg2, WrReg of REGFILE; RdData1 and RdData2 feed the ALU; ALU Result returns to WrData; ALU also outputs Zero?; control signals RegWrite and ALUop]
Parameters come from instruction fields
Control signals depend upon instruction fields
Eg: ALUop = f(Instruction) = f(op, funct)
Steps 2 and 3: ORI Instruction
 ORI rt, rs, Imm16
 Need new ALUop for ‘OR’ function, hook up to REGFILE
 1 read port (rs), 1 write port (rt), 1 const value (Imm16)

[Figure: as before, plus Imm16 passing through a 16-bit ZERO-EXTEND block and a multiplexer (ALUsrc) that selects RdData2 or the extended immediate as the second ALU input; rt drives WrReg and the rd input is crossed out; control signals RegWrite, ALUsrc, ALUop]
Control signals depend upon instruction fields
E.g.: ALUsrc = f(Instruction) = f(op, funct)
Steps 2 and 3 Destination Register
 Must select proper destination, rd or rt
 Depends on Instruction Type
 R-type may write rd
 I-type may write rt

[Figure: a multiplexer controlled by RegDst selects rt or rd as WrReg; remainder of the datapath as before (REGFILE, ZERO-EXTEND, ALUsrc multiplexer, ALU); control signals RegDst, RegWrite, ALUsrc, ALUop]


Steps 2 and 3: Load Word
 LW rt, rs, Imm16
 Need Data Memory: data ← Mem[Addr]
 Addr is rs+Imm16, Imm16 is signed, use ALU for +
 Store in rt: rt ← Mem[rs+Imm16]
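
Because Imm16 is signed for LW but unsigned for ORI, the extender needs both modes. A short sketch of the two extensions and of the address addition, assuming 32-bit wraparound arithmetic; the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def zero_ext(imm16: int) -> int:
    """Upper 16 bits become 0."""
    return imm16 & 0xFFFF


def sign_ext(imm16: int) -> int:
    """Upper 16 bits copy bit 15 of the immediate."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if (imm16 & 0x8000) else imm16


def effective_address(rs_value: int, imm16: int) -> int:
    """Addr = rs + sign_ext(Imm16), computed by the ALU modulo 2**32."""
    return (rs_value + sign_ext(imm16)) & MASK32
```

An offset of 0xFFFC is -4 after sign extension, so `effective_address(0x1000, 0xFFFC)` gives 0x0FFC.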

[Figure: datapath extended with DATAMEM; ALU Result drives Addr; a multiplexer controlled by MemtoReg selects ALU Result or DATAMEM RdData as REGFILE WrData; the extender becomes SIGN/ZERO-EXTEND controlled by ExtOp; control signals RegDst, RegWrite, ALUsrc, ALUop, ExtOp, MemtoReg]
Steps 2 and 3: Store Word
 SW rt, rs, Imm16
 Need Data Memory: Mem[Addr] ← data
 Addr is rs+Imm16, Imm16 is signed, use ALU for +
 Store in Mem: Mem[rs+Imm16] ← rt

[Figure: as for Load Word, plus REGFILE RdData2 wired to DATAMEM WrData and a MemWrite control signal; control signals RegDst, RegWrite, ALUsrc, ALUop, ExtOp, MemWrite, MemtoReg]
Writes: Need to Control Timing
 Problem: write to data memory
 Data can come anytime
 Addr must come first
 MemWrite must come after Addr
 Else? writes to wrong Addr!

 Solution: use ideal data memory


 Assume everything works ok
 How to fix this for real?
 One solution: synchronous memory
 Another solution: delay MemWr to come late

 Problems?: write to register file


 Does RegWrite signal come after WrReg number?
 When does the write to a register happen?
 Read from same register as being written?



Missing Pieces: Instruction Fetching
 Where does the Instruction come from?
 From instruction memory, of course!

 Recall: stored-program concept


 Alternatives? How about hard-coding wires and switches…? This
is how ENIAC was programmed!

 How to branch?
 BEQ rs, rt, Imm16



Instruction Processing
 Fetch instruction
 Execute instruction

 Fetch next instruction


 Execute next instruction

 Fetch next instruction


 Execute next instruction

 Etc…

 How to maintain sequence? Use a counter!


 Branches (out of sequence) ? Load the counter!



Instruction Processing
 Program Counter
 Points to current instruction

 Address to instruction memory


 Instr ← InstrMem[PC]

 Next instruction: counts up by 4


 Remember: memory is byte-addressable, instructions are 4 bytes
 PC ← PC + 4

 Branch instruction: replace PC contents



Step 1: Analyze Instructions
 Register Transfer Language…
op | rs | rt | rd | shamt | funct = InstrMem[ PC ]
op | rs | rt | Imm16 = InstrMem[ PC ]

Instr    Register Transfers
ADDU     R[rd] ← R[rs] + R[rt]; PC ← PC + 4
SUBU     R[rd] ← R[rs] – R[rt]; PC ← PC + 4
ORI      R[rt] ← R[rs] OR zero_ext(Imm16); PC ← PC + 4
LOAD     R[rt] ← MEM[ R[rs] + sign_ext(Imm16) ]; PC ← PC + 4
STORE    MEM[ R[rs] + sign_ext(Imm16) ] ← R[rt]; PC ← PC + 4
BEQ      if ( R[rs] == R[rt] ) then PC ← PC + 4 + { sign_ext(Imm16) || b’00’ }
         else PC ← PC + 4
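
The register transfers can be checked by executing them directly. The sketch below is a behavioural model only, under these assumptions: registers are a Python list, memory is a dictionary keyed by byte address that returns 0 for unwritten locations, and the function name `step` is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_ext(imm16: int) -> int:
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if (imm16 & 0x8000) else imm16


def step(name, R, MEM, pc, rs=0, rt=0, rd=0, imm16=0):
    """Apply one instruction's register transfers; return the next PC."""
    if name == "ADDU":
        R[rd] = (R[rs] + R[rt]) & MASK32
    elif name == "SUBU":
        R[rd] = (R[rs] - R[rt]) & MASK32
    elif name == "ORI":
        R[rt] = R[rs] | (imm16 & 0xFFFF)           # zero-extended immediate
    elif name == "LOAD":
        R[rt] = MEM.get((R[rs] + sign_ext(imm16)) & MASK32, 0)
    elif name == "STORE":
        MEM[(R[rs] + sign_ext(imm16)) & MASK32] = R[rt]
    elif name == "BEQ":
        if R[rs] == R[rt]:
            # {sign_ext(Imm16) || b'00'} is the immediate shifted left by 2
            return (pc + 4 + (sign_ext(imm16) << 2)) & MASK32
    else:
        raise ValueError(f"unsupported instruction: {name}")
    return (pc + 4) & MASK32
```

For example, a taken BEQ at PC = 8 with Imm16 = 0xFFFF (that is, -1) yields 8 + 4 - 4 = 8, a branch to itself.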
Steps 2 and 3: Datapath & Assembly

[Figure: PC drives the Read address of the Instruction Memory, which outputs Instruction[31:0]; an adder computes PC + 4 and feeds it back to PC]

 PC: a register
 Counter, counts by +4
 Provides address to Instruction Memory
Steps 2 and 3: Datapath & Assembly

[Figure: PC and Instruction Memory with the PC + 4 adder; a second adder adds the Shift Left 2 output to PC + 4; a multiplexer controlled by PCSrc selects the next PC; instruction fields Instruction[25:21], [20:16], [15:11] and [15:0] (Imm16); Sign/Zero Extend (16 to 32 bits) controlled by ExtOp]
 PC: a register
 Counter, counts by +4
 Sometimes, must add SignExtend{Imm16||b’00’} for branch instructions
Note: the sign-extender for Imm16 is already in the datapath (everything else is new)
Steps 2 and 3: Add Previous Datapath

[Figure: complete single-cycle datapath: PC, Instruction Memory, PC + 4 adder, branch adder with Shift Left 2 and PCSrc multiplexer, Register File (Read reg. 1, Read reg. 2, Write reg., Write data, Read data 1, Read data 2), RegDst multiplexer, Sign/Zero Extend, ALUSrc multiplexer, ALU with Zero output, ALU Control driven by Instruction[5:0] (funct) and ALUOp, Data Memory, MemtoReg multiplexer; control signals RegWrite, RegDst, ALUSrc, ALUOp, ExtOp, MemWrite, MemtoReg, PCSrc]
What have we done?
 Created a simple CPU datapath
 Control still missing (next slide)

 Single-cycle CPU
 Every instruction takes 1 clock cycle
 Clocking ?



One Clock Cycle
 Clock Locations
 PC, REGFILE have clocks

 Operation
 On rising edge, PC will get new value
 Maybe REGFILE will have one value updated as well
 After rising edge
 PC and REGFILE can’t change
 New value out of PC
 Instruction out of INSTRMEM
 Instruction selects registers to read from REGFILE
 Instruction controls ALUop, ALUsrc, MemWrite, ExtOp, etc
 ALU does its work
 DataMem may be read (depending on instruction)
 Result value goes back to REGFILE
 New PC value goes back to PC
 Await next clock edge

Lots to do in only 1 clock cycle !!


Missing Steps?
 Control is missing (Steps 4 and 5 we mentioned earlier)
 Generate the green signals
 ALUsrc, MemWrite, MemtoReg, PCSrc, RegDst, etc
 These are all f(Instruction), where f() is a logic expression
 Will look at control strategies in upcoming lecture

 Implementation Details
 How to implement REGFILE?
 Read port: tristate buffers? Multiplexer? Memory?
 Two read ports: two of above?
 Write port: how to write only 1 register?
 How to control writes to memory? To register file?

 More instructions
 Shift instructions
 Jump instruction
 Etc



1-Cycle CPU Datapath
[Figure: complete single-cycle datapath (PC, Instruction Memory, Register File, Sign/Zero Extend, ALU, ALU Control, Data Memory, branch adder) with control signals RegWrite, RegDst, ALUSrc, ALUOp, ExtOp, MemWrite, MemtoReg, PCSrc]
1-cycle CPU Datapath + Control

[Figure: single-cycle datapath with a Control unit driven by Instruction[31:26]; the Control unit generates RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite; ALU control is driven by Instruction[5:0] and ALUOp; PCSrc selects the next PC]
1-cycle CPU Control – Lookup Table
Input or Output  Signal Name  R-format  Lw  Sw  Beq
Inputs           Op5          0         1   1   0
                 Op4          0         0   0   0
                 Op3          0         0   1   0
                 Op2          0         0   0   1
                 Op1          0         1   1   0
                 Op0          0         1   1   0
Outputs          RegDst       1         0   X   X
                 ALUSrc       0         1   1   0
                 MemtoReg     0         1   X   X
                 RegWrite     1         1   0   0
                 MemRead      0         1   0   0
                 MemWrite     0         0   1   0
                 Branch       0         0   0   1
                 ALUOp1       1         0   0   0
                 ALUOp0       0         0   0   1

 Also: I-type instructions (ORI) & ExtOp (sign-extend control), etc.
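
The lookup table maps the 6-bit opcode to one fixed set of control outputs, so it can be sketched as a dictionary. The values are copied from the table; representing don't-care (X) as `None` and packing ALUOp1/ALUOp0 into one 2-bit number are choices made for this sketch.

```python
X = None  # don't care

# opcode (Op5..Op0) -> control outputs, one entry per table column
CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,   # R-format
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,   # lw
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    0b101011: dict(RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0,   # sw
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    0b000100: dict(RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0,   # beq
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}


def control(instr: int) -> dict:
    """Look up the control signals for a 32-bit instruction word."""
    return CONTROL[(instr >> 26) & 0x3F]


signals = control(0x8C220008)  # lw $2, 8($1)
```

Only the two signals that change machine state (RegWrite, MemWrite) must never be don't-care; for sw and beq nothing is written to a register, which is why RegDst and MemtoReg may be X.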


1-cycle CPU + Jump Instruction
[Figure: single-cycle datapath and control extended for jump: the Jump address [31..0] is formed from Instruction[25:0] and PC + 4 [31..28]; instruction fields [31:26], [25:21], [20:16], [15:11], [15:0], [5:0] as before]
1-cycle CPU Problems?
 Every instruction 1 cycle
 Some instructions “do more work”
 Eg, lw must read from DATAMEM
 All instructions must have same clock period…

 Many instructions run slower than necessary

 Tricky timing on MemWrite, RegWrite(?) signals


 Write signal must come *after* address is stable

 Need extra resources…


 PC+4 adder, ALU for BEQ instruction, DATAMEM+INSTRMEM



Performance!
 Single-Cycle CPU Performance
 Execute one instruction per clock cycle (CPI=1)
 Clock cycle time? Note dataflow includes:
 INSTRMEM read
 REGFILE access
 Sign extension
 ALU operation
 DATAMEM read
 REGFILE/PC write
 Not every instruction uses all resources (eg, DATAMEM read)
 Can we change clock period for each instruction?
 No! (Why not?)
 One clock period: the worst case!
 This is why a single-cycle CPU is not good for performance
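
A worked sketch of why the worst case sets the clock period. The delay values below are hypothetical (picoseconds chosen only for illustration), and the per-instruction paths are a simplification that ignores sign extension, multiplexers and the PC write.

```python
# Hypothetical component delays in picoseconds (not from the lecture)
DELAY_PS = {
    "INSTRMEM": 200,
    "REGFILE_READ": 100,
    "ALU": 200,
    "DATAMEM": 200,
    "REGFILE_WRITE": 100,
}

# Components on each instruction's data flow, in order
PATHS = {
    "R-type": ["INSTRMEM", "REGFILE_READ", "ALU", "REGFILE_WRITE"],
    "LW": ["INSTRMEM", "REGFILE_READ", "ALU", "DATAMEM", "REGFILE_WRITE"],
    "SW": ["INSTRMEM", "REGFILE_READ", "ALU", "DATAMEM"],
    "BEQ": ["INSTRMEM", "REGFILE_READ", "ALU"],
}


def path_delay(name: str) -> int:
    """Total delay along one instruction's path."""
    return sum(DELAY_PS[unit] for unit in PATHS[name])


# One fixed clock period must cover the slowest instruction
clock_period_ps = max(path_delay(name) for name in PATHS)
wasted_ps = {name: clock_period_ps - path_delay(name) for name in PATHS}
```

Under these assumed delays LW sets the period, and every other instruction idles for the difference.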



1-cycle CPU Datapath + Controller
[Figure: single-cycle datapath with controller, including the jump path: Jump address [31..0] formed from Instruction[25:0] and PC + 4 [31..28]]
1-cycle CPU Summary
 Operation
 1 cycle per instruction
 Control signals held fixed during entire cycle (except BRANCH)
 Only 2 registers
 PC, updated every clock cycle
 REGFILE, updated when required
 During clock cycle, data flows from register-outputs to register-inputs
 Fixed clock frequency / period

 Performance
 1 instruction per cycle
 Slowest instruction determines clock frequency

 Outstanding issue: MemWrite timing


 Assume this signal writes to memory at end of clock cycle



Multi-cycle CPU Goals
 Improve performance
 Break each instruction into smaller steps / multiple cycles
 LW instruction → 5 cycles
 SW instruction → 4 cycles
 R-type instruction → 4 cycles
 Branch, Jump → 3 cycles
 Aim for 5x clock frequency
 Complex instructions (eg, LW) → 5 cycles → same performance as before
 Simple instructions (eg, ADD) → fewer cycles → faster

 Save resources (gates/transistors)


 Re-use ALU over multiple cycles
 Put INSTR + DATA in same memory

 MemWrite timing solved?



Multi-cycle CPU Datapath

[Figure: multi-cycle CPU datapath: PC, one Memory for instructions and data (Address, MemData, Write data), Instruction Register with fields [25:21], [20:16], [15:11], [15:0], [5:0], Memory Data Register, Registers (RdReg1, RdReg2, Write reg, Write data, RdData1, RdData2), A, B, Sign Extend, Shift Left 2, constant 4, ALU (Zero, ALU result), ALUOut, and multiplexers]

 Add multiplexers + control signals (IorD, MemtoReg, ALUSrcA, ALUSrcB)


 Move signal paths (+4, Shift Left 2)
Multi-cycle CPU Datapath

[Figure: multi-cycle CPU datapath with the added registers: Instruction Register, Memory Data Register, A, B, ALUOut]

 Add registers + control signals (IR, MDR, A, B, ALUOut)


 Registers with no control signal load value every clock cycle (eg, PC)
Instruction Execution Example
 Execute a “Load Word” instruction
 LW rt, 0(rs)

 5 Steps
1. Fetch instruction
2. Read registers
3. Compute address
4. Read data
5. Write registers



Load Word Instruction Sequence

[Figure: multi-cycle CPU datapath]

1. Fetch Instruction
InstructionRegister ← Mem[PC]
Load Word Instruction Sequence

[Figure: multi-cycle CPU datapath]

2. Read Registers
A ← Registers[Rs]
Load Word Instruction Sequence

[Figure: multi-cycle CPU datapath]

3. Compute Address
ALUOut ← A + SignExt(Imm16)
Load Word Instruction Sequence

[Figure: multi-cycle CPU datapath]

4. Read Data
MDR ← Memory[ALUOut]
Load Word Instruction Sequence

[Figure: multi-cycle CPU datapath]

5. Write Registers
Registers[Rt] ← MDR
Load Word Instruction Sequence

[Figure: multi-cycle CPU datapath]

All 5 Steps Shown


Multi-cycle Load Word: Recap
1. Fetch Instruction   InstructionRegister ← Mem[PC]
2. Read Registers      A ← Registers[Rs]
3. Compute Address     ALUOut ← A + SignExt(Imm16)
4. Read Data           MDR ← Memory[ALUOut]
5. Write Registers     Registers[Rt] ← MDR

 Missing Steps?



Multi-cycle Load Word: Recap
1. Fetch Instruction   InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers      A ← Registers[Rs]
3. Compute Address     ALUOut ← A + SignExt(Imm16)
4. Read Data           MDR ← Memory[ALUOut]
5. Write Registers     Registers[Rt] ← MDR

 Missing Steps?
 Must increment the PC
 Do it as part of the instruction fetch (in step 1)
 Need PCWrite control signal
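
The five steps, including the PC increment in step 1, can be traced one cycle at a time. This is a behavioural sketch only: the single memory is modelled as a dictionary keyed by byte address, temporaries are local variables named after the datapath registers, and the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_ext(imm16: int) -> int:
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if (imm16 & 0x8000) else imm16


def lw_multicycle(mem: dict, regs: list, pc: int) -> tuple:
    """Run one LW through the 5 cycles; return (new PC, cycles used)."""
    # Cycle 1: fetch instruction and increment PC
    ir = mem[pc]
    pc = (pc + 4) & MASK32
    rs, rt, imm16 = (ir >> 21) & 0x1F, (ir >> 16) & 0x1F, ir & 0xFFFF
    # Cycle 2: read registers
    a = regs[rs]
    # Cycle 3: compute address (no shift: the offset is in bytes)
    alu_out = (a + sign_ext(imm16)) & MASK32
    # Cycle 4: read data from the same memory that holds instructions
    mdr = mem[alu_out]
    # Cycle 5: write register
    regs[rt] = mdr
    return pc, 5


# 0x8C220008 encodes lw $2, 8($1); instruction at address 0, data at 0x108
memory = {0x0: 0x8C220008, 0x108: 1234}
registers = [0] * 32
registers[1] = 0x100
new_pc, cycles = lw_multicycle(memory, registers, pc=0)
```

Each temporary (IR, A, ALUOut, MDR) is written in one cycle and consumed in a later one, which is why the multi-cycle datapath needs those extra registers.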



Multi-cycle R-Type Instruction
1. Fetch Instruction InstructionRegister ← Mem[PC]; PC ← PC + 4

2. Read Registers A ← Registers[Rs]; B ← Registers[Rt]

3. Compute Value ALUOut ← A op B

4. Write Registers Registers[Rd] ← ALUOut

 RTL describes data flow action in each clock cycle


 Control signals determine precise data flow
 Each step implies unique control values



Multi-cycle R-Type Instruction:
Control Signal Values
1. Fetch Instruction   InstructionRegister ← Mem[PC]; PC ← PC + 4
   MemRead=1, ALUSrcA=0, IorD=0, IRWrite,
   ALUSrcB=01, ALUop=00, PCWrite, PCSource=00

2. Read Registers      A ← Registers[Rs]; B ← Registers[Rt]
   ALUSrcA=0, ALUSrcB=11, ALUop=00

3. Compute Value       ALUOut ← A op B
   ALUSrcA=1, ALUSrcB=00, ALUop=10

4. Write Registers     Registers[Rd] ← ALUOut
   RegDst=1, RegWrite, MemtoReg=0

 Each step implies unique control values


 Fixed for entire cycle
 “Default value” implied if unspecified



Check Your Work – Is RTL Valid ?
1. Datapath check
 Within one cycle…
 Each cycle has valid data flow path (path exists)
 Each register gets only one new value
 Across multiple cycles…
 A register value must be defined in an earlier clock cycle than the one that uses it
 Eg, “A ← 3” must occur before “B ← A”
 Make sure register value doesn’t disappear if set >1 cycle earlier

2. Control signal check


 Each cycle, RTL describing the datapath flow implies a value for each control
signal
 0 or 1 or default or don’t care
 Each control signal gets only one fixed value the entire cycle

3. Overall check
 Does the sequence of steps work ?



Multi-cycle BEQ Instruction
1. Fetch Instruction
InstructionRegister ← Mem[PC]; PC ← PC + 4

2. Read Registers, Precompute Target


A ← Registers[Rs] ; B ← Registers[Rt] ; ALUOut ← PC + {SignExt{Imm16},b’00’}

3. Compare Registers, Conditional Branch


if( (A – B) ==0 ) PC ← ALUOut

Green shows PC calculation flow (in parallel with other operations)



Multi-cycle Datapath with Control Signals
[Figure: multi-cycle datapath annotated with control signals PCSrc, PCWrite, IRWrite, IorD, MemRead, MemWrite, MemtoReg, RegDst, RegWrite, ALUSrcA, ALUSrcB, ALUOp; ALU Control driven by Instruction[5:0]; Jump address [31..0] formed from Instr[25:0] and PC[31..28]]


Multi-cycle Datapath with Controller

[Figure: multi-cycle datapath with a controller driven by Instr[31:26]; Jump address [31..0] formed from Instr[25:0] and PC[31..28]]
Multi-cycle CPU Control: Overview

[Figure: control overview, with blocks producing the control signal outputs]

 General approach: Finite State Machine (FSM)


 Need details in each branch of control…
 Precise outputs for each state (Mealy depends on inputs, Moore does not)
 Precise “next state” for each state (can depend on inputs)
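
A minimal sketch of the next-state half of such an FSM. The state names are hypothetical and the instruction class is passed as a string rather than decoded from the opcode; the cycle counts it produces match the multi-cycle goals stated earlier (LW 5, SW and R-type 4, BEQ and J 3).

```python
def next_state(state: str, op: str) -> str:
    """Next-state function: depends on the current state and the instruction."""
    if state == "FETCH":
        return "DECODE"
    if state == "DECODE":
        return {"lw": "MEM_ADDR", "sw": "MEM_ADDR", "rtype": "EXECUTE",
                "beq": "BRANCH", "j": "JUMP"}[op]
    if state == "MEM_ADDR":
        return "MEM_READ" if op == "lw" else "MEM_WRITE"
    if state == "MEM_READ":
        return "WRITEBACK"
    if state == "EXECUTE":
        return "R_COMPLETE"
    # WRITEBACK, MEM_WRITE, R_COMPLETE, BRANCH, JUMP end the instruction
    return "FETCH"


def cycles(op: str) -> int:
    """Count the states visited from FETCH until the FSM returns to FETCH."""
    state, count = "FETCH", 1
    while True:
        state = next_state(state, op)
        if state == "FETCH":
            return count
        count += 1
```

In a Moore-style controller each state would additionally carry a fixed set of control outputs; that output table is omitted here.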



How to Implement FSM ?
 Manually with logic gates + FFs
 Bubble diagram, next-state table, state assignment
 Karnaugh map for each state bit, each output bit (painful!)

 High-level language description (eg, Verilog, VHDL)


 Describe FSM bubble diagram (next-states, output values)
 Automatically synthesized into gates + FFs

 Microcode (µ-code) description


 Sequence through many µ-ops for each CPU instruction
 One µ-op (µ-instruction) sends correct control signal for 1 cycle
 µ-op similar to one bubble in FSM
 Acts like a mini-CPU within a CPU
 µPC: microcode program counter
 Microcode storage memory contains µ-ops
 Can look similar to RTL or some new “assembly language”



FSM Specification: Bubble Diagram
[Figure: FSM bubble diagram]
Can build this by examining RTL
It is possible to automatically convert RTL into this form!
FSM: Gates + FFs Implementation

[Figure: FSM high-level organization]


FSM: Microcode Implementation
[Figure: microcode implementation: Microcode Storage (memory) produces the datapath control outputs and the sequencing control; the Microprogram Counter addresses the storage; an adder computes µPC + 1; Address Select Logic chooses the next µPC using inputs from the instruction register opcode field]


Multi-cycle CPU with Control FSM
[Figure: multi-cycle datapath with the control FSM driven by Instr[31:26]; FSM control outputs, including the conditional branch logic; Jump address [31..0] formed from Instr[25:0] and PC[31..28]]
Control FSM: Overview

 General approach: Finite State Machine (FSM)


 Need details in each branch of control…



Detailed FSM

[Figure: detailed FSM, divided into regions: Instruction Fetch, Memory Reference, R-Type, Branch, Jump]
Detailed FSM: Instruction Fetch



Detailed FSM: Memory Reference

[Figure: memory-reference states of the FSM, with separate LW and SW paths]
Detailed FSM: R-Type Instruction



Detailed FSM: Branch Instruction



Detailed FSM: Jump Instruction



Performance Comparison

Single-cycle CPU
vs
Multi-cycle CPU



Simple Comparison
Single-cycle CPU   All          1 clock cycle
Multi-cycle CPU    LW           5 clock cycles
Multi-cycle CPU    SW, R-type   4 clock cycles
Multi-cycle CPU    BEQ, J       3 clock cycles
What’s really happening?
Ideally (Load Word instruction):
[Figure: timelines for the single-cycle CPU and the multi-cycle CPU, each showing the steps Fetch, Decode, Calc Addr, Memory, Write]


In practice, steps differ in speeds…
Load Word Instruction
[Figure: timelines for the single-cycle CPU and the multi-cycle CPU, each showing Fetch, Decode, Calc Addr, Memory, Write with unequal step lengths; annotations: “Wasted time!” and “Violation!”]
Single-cycle vs Multi-cycle
LW instruction faster for single-cycle
[Figure: timelines for the single-cycle CPU and the multi-cycle CPU, each showing Fetch, Decode, Calc Addr, Memory, Write; annotations: “Now wasted time is larger!” and “Violation fixed!”]
Single-cycle vs Multi-cycle
SW instruction ~ same speed
[Figure: timelines for the single-cycle CPU and the multi-cycle CPU, each showing Fetch, Decode, Calc Addr, Memory; annotations: “Speed diff” and “Wasted time!”]
Single-cycle vs Multi-cycle
BEQ, J instruction faster for multi-cycle
[Figure: timelines for the single-cycle CPU and the multi-cycle CPU, each showing Fetch, Decode, Calc Addr; annotations: “Speed diff” and “Wasted time!”]
Performance Summary
 Which CPU implementation is faster?
 LW → single-cycle is faster
 SW, R-type → about the same
 BEQ, J → multi-cycle is faster

 Real programs use a mix of these instructions

 Overall performance depends on instruction frequency!
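
A worked example of how the instruction mix decides the comparison. The cycle counts come from the slides; the instruction mix and both clock periods are hypothetical numbers chosen only to illustrate the calculation.

```python
# Cycles per instruction class on the multi-cycle CPU (from the slides)
CYCLES = {"lw": 5, "sw": 4, "rtype": 4, "beq": 3, "j": 3}

# Hypothetical instruction mix (fractions sum to 1)
MIX = {"lw": 0.25, "sw": 0.10, "rtype": 0.45, "beq": 0.15, "j": 0.05}

# Hypothetical clock periods in picoseconds
SINGLE_CYCLE_PERIOD_PS = 800
MULTI_CYCLE_PERIOD_PS = 200


def average_cpi(cycles: dict, mix: dict) -> float:
    """Weighted average of cycles per instruction."""
    return sum(cycles[name] * fraction for name, fraction in mix.items())


cpi = average_cpi(CYCLES, MIX)
single_time_ps = 1 * SINGLE_CYCLE_PERIOD_PS      # CPI = 1
multi_time_ps = cpi * MULTI_CYCLE_PERIOD_PS      # average time per instruction
```

With these assumed numbers the average CPI is about 4.05, so the two designs finish an average instruction in roughly 800 ps versus 810 ps; a mix with more branches or a faster multi-cycle clock would tip the result the other way.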



Implementation Summary
 Single-cycle CPU
 1 instruction per cycle (eg, 1MHz → 1 MIPS)
 No “wasted time” on most complex instruction
 Large wasted time on simpler instructions
 Simple controller (just a lookup table or memory)
 Simple instructions

 Multi-cycle CPU
 << 1 instruction per cycle (eg, 1MHz → 0.2 MIPS)
 Small time wasted on most complex instruction
 Hence, this instruction always slower than single-cycle CPU
 Small time wasted on simple instructions
 Eliminates “large wasted time” by using fewer clock cycles
 Complex controller (FSM)
 Potential to create complex instructions

