55:035 Computer Architecture and Organization
Lecture 9
Outline
- Building a CPU
- Basic Components
- MIPS Instructions
- Basic 5 Steps for CPU
- Single-Cycle Design
- Multi-cycle Design
- Comparison of Single and Multi-cycle Designs

Overview
- Brief look
  - Digital logic
  - CPU Datapath
  - MIPS Example

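Digital Logic
[Figure: D-type flip-flop (edge-triggered; input D, output Q, Clock); 2-to-1 multiplexer (input A on 0, input B on 1, select input S, output F); D-type flip-flop with enable (EN), built from an edge-triggered D flip-flop fed by a multiplexer that selects the current Q when EN = 0 and the new D when EN = 1.]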

Digital Logic
[Figure: Registers of 1 bit, 4 bits (D3 to D0 driving Q3 to Q0) and N bits, each built from edge-triggered D flip-flops with a common Clock and enable (EN).]
Digital Logic
Tri-state Driver (Buffer)
in    drive   out
0     0       Z
1     0       Z
0     1       0
1     1       1
What is Z??

Digital Logic
[Figure: Adder/Subtractor or ALU with data inputs A and B, a control input Add/sub (or ALUop), Carry-in and Carry-out.]

Overview
- Brief look
  - Digital logic
  - How to Design a CPU Datapath
  - MIPS Example

Designing a CPU: 5 Steps
1. Analyze the instruction set → datapath requirements
   - MIPS: ADD, SUB, ORI, LW, SW, BR
   - Meaning of each instruction given by RTL (register transfers)
   - 2 types of registers: CPU/ISA registers, temporary registers
2. Datapath requirements → select the datapath components
   - ALU, register file, adder, data memory, etc.
3. Assemble the datapath
   - Datapath must support the planned register transfers
   - Ensure all instructions are supported
4. Analyze the datapath control required for each instruction
5. Assemble the control logic

Step 1a: Analyze ISA
- All MIPS instructions are 32 bits long.
- Three instruction formats (bit ranges in brackets):
  - R-type: op [31:26] (6 bits) | rs [25:21] (5 bits) | rt [20:16] (5 bits) | rd [15:11] (5 bits) | shamt [10:6] (5 bits) | funct [5:0] (6 bits)
  - I-type: op [31:26] (6 bits) | rs [25:21] (5 bits) | rt [20:16] (5 bits) | immediate [15:0] (16 bits)
  - J-type: op [31:26] (6 bits) | target address [25:0] (26 bits)
- R: registers, I: immediate, J: jumps
- These formats were intentionally chosen to simplify the design
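Because all three formats put op, rs and rt at the same bit positions, a decoder can slice every field unconditionally and let the opcode decide which fields matter; in hardware this is only wiring. A minimal Python sketch (the names `bits` and `decode` are illustrative, not from the lecture):

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Return word[hi:lo] inclusive, as in the notation Instruction[25:21]."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode(word: int) -> dict:
    """Slice a 32-bit MIPS instruction into every R-, I- and J-type field."""
    return {
        "op": bits(word, 31, 26),
        "rs": bits(word, 25, 21),
        "rt": bits(word, 20, 16),
        "rd": bits(word, 15, 11),      # meaningful for R-type only
        "shamt": bits(word, 10, 6),    # meaningful for R-type only
        "funct": bits(word, 5, 0),     # meaningful for R-type only
        "imm16": bits(word, 15, 0),    # meaningful for I-type only
        "target": bits(word, 25, 0),   # meaningful for J-type only
    }


if __name__ == "__main__":
    print(decode(0x00221821))
```

For example, 0x00221821 decodes to op = 0, rs = 1, rt = 2, rd = 3, shamt = 0, funct = 0x21, which is addu $3, $1, $2.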

Step 1b: Analyze ISA
- Formats: R-type, I-type and J-type, laid out as in Step 1a
- Meaning of the fields:
  - op: operation of the instruction
  - rs, rt, rd: the source and destination register specifiers
    - Destination is either rd (R-type) or rt (I-type)
  - shamt: shift amount
  - funct: selects the variant of the operation in the “op” field
  - immediate: address offset or immediate value
  - target address: target address of the jump instruction
MIPS ISA: subset for today
- ADD and SUB (R-type)
  - addu rd, rs, rt
  - subu rd, rs, rt
- OR Immediate (I-type)
  - ori rt, rs, imm16
- LOAD and STORE Word (I-type)
  - lw rt, rs, imm16
  - sw rt, rs, imm16
- BRANCH (I-type)
  - beq rs, rt, imm16

Step 2: Datapath Requirements
- REGISTER FILE
  - MIPS ISA requires 32 registers, 32b each
  - Called a register file
  - Contains 32 entries; each entry is 32b
  - Register numbers are 5 bits each
- ADDU rd, rs, rt or SUBU rd, rs, rt
  - Read two sources rs, rt
  - Operation rs + rt or rs – rt
  - Write destination rd ← rs +/– rt
- Requirements
  - Read two registers (rs, rt)
  - Perform ALU operation
  - Write a third register (rd)
- How to implement?
[Figure: REGFILE with register-number inputs RdReg1, RdReg2 and WrReg, data ports RdData1, RdData2 and WrData, and a RegWrite control; ALU with an ALUop control, a Result output and a Zero? output.]
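The ALU in the figure can be modelled as a function of its two 32-bit inputs and ALUop that returns the Result and the Zero? flag (BEQ will need Zero later). In this sketch ALUop is a readable string ("add", "sub", "or"), an assumed stand-in for the real control bits:

```python
MASK32 = 0xFFFF_FFFF


def alu(a: int, b: int, alu_op: str) -> tuple[int, bool]:
    """Return (Result, Zero?) for 32-bit inputs A and B.

    alu_op names are an assumed, readable stand-in for the ALUop control bits.
    """
    if alu_op == "add":
        result = a + b
    elif alu_op == "sub":
        result = a - b          # two's-complement wraparound via the mask below
    elif alu_op == "or":
        result = a | b          # needed later for ORI
    else:
        raise ValueError(f"unsupported ALUop: {alu_op}")
    result &= MASK32            # keep 32 bits, like the hardware
    return result, result == 0
```

Subtraction followed by the Zero test is exactly how the datapath will later decide whether R[rs] == R[rt].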
Step 3: Datapath Assembly
- ADDU rd, rs, rt / SUBU rd, rs, rt
  - Need an ALU
  - Hook it up to the REGISTER FILE
  - REGFILE has 2 read ports (rs, rt) and 1 write port (rd)
[Figure: instruction fields rs, rt and rd drive RdReg1, RdReg2 and WrReg; RdData1 and RdData2 feed the ALU; the ALU Result returns to WrData. Control signals: ALUop, RegWrite.]
- Parameters come from instruction fields
- Control signals depend upon instruction fields
  - E.g., ALUop = f(Instruction) = f(op, funct)
Steps 2 and 3: ORI Instruction
- ORI rt, rs, Imm16
  - Need a new ALUop for the ‘OR’ function; hook it up to REGFILE
  - 1 read port (rs), 1 write port (rt), 1 constant value (Imm16)
[Figure: the previous datapath with rt, instead of rd, connected to WrReg (the rd connection is crossed out); Imm16 passes through a 16-bit ZERO-EXTEND unit; an ALUsrc multiplexer selects RdData2 (0) or the extended Imm16 (1) as the second ALU input.]
- Control signals depend upon instruction fields
  - E.g., ALUsrc = f(Instruction) = f(op, funct)
Steps 2 and 3: Destination Register
- Must select the proper destination, rd or rt
  - Depends on the instruction type
  - R-type may write rd
  - I-type may write rt
[Figure: a RegDst multiplexer selects rt or rd as WrReg; the rest of the datapath (ALUsrc multiplexer, ZERO-EXTEND, ALUop, RegWrite) is unchanged.]

Steps 2 and 3: Load Word
- LW rt, rs, Imm16
  - Need data memory: data ← Mem[Addr]
  - Addr is rs + Imm16; Imm16 is signed, so use the ALU for +
  - Store in rt: rt ← Mem[rs + Imm16]
[Figure: DATAMEM added, with the ALU Result as Addr; a MemtoReg multiplexer selects the ALU Result or the DATAMEM RdData for the register-file WrData; the extender becomes SIGN/ZERO-EXTEND, controlled by ExtOp. Control signals: RegDst, RegWrite, ALUsrc, ALUop, MemtoReg, ExtOp.]
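ORI needs a zero-extended immediate while LW and SW need a sign-extended one, which is why the extender gains an ExtOp control. A sketch of the SIGN/ZERO-EXTEND unit and the address calculation; the ExtOp encoding (1 = sign-extend) and the function names are assumptions:

```python
MASK32 = 0xFFFF_FFFF


def extend16(imm16: int, ext_op: int) -> int:
    """SIGN/ZERO-EXTEND unit. Assumed encoding: ExtOp = 1 sign-extends, ExtOp = 0 zero-extends."""
    imm16 &= 0xFFFF
    if ext_op and imm16 & 0x8000:
        return imm16 | 0xFFFF_0000  # replicate bit 15 into bits 31..16
    return imm16                    # bits 31..16 are zero


def mem_address(rs_value: int, imm16: int) -> int:
    """Addr = R[rs] + sign_ext(Imm16), wrapping at 32 bits like the ALU adder."""
    return (rs_value + extend16(imm16, ext_op=1)) & MASK32
```

With Imm16 = 0xFFFC, sign extension gives 0xFFFFFFFC (that is, -4), so an LW with R[rs] = 0x1000 reads address 0x0FFC; ORI with the same field would use ExtOp = 0 and see 0x0000FFFC.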
Steps 2 and 3: Store Word
- SW rt, rs, Imm16
  - Need data memory: Mem[Addr] ← data
  - Addr is rs + Imm16; Imm16 is signed, so use the ALU for +
  - Store in memory: Mem[rs + Imm16] ← rt
[Figure: RdData2 (the value of rt) now also drives the DATAMEM WrData input, and a MemWrite control is added. Control signals: RegDst, RegWrite, ALUsrc, ALUop, ExtOp, MemtoReg, MemWrite.]
Writes: Need to Control Timing
- Problem: write to data memory
  - Data can come anytime
  - Addr must come first
  - MemWrite must come after Addr
  - Else? Writes to the wrong Addr!
- Solution: use ideal data memory
  - Assume everything works OK
  - How to fix this for real?
    - One solution: synchronous memory
    - Another solution: delay MemWr to come late
- Problems?: write to register file
  - Does the RegWrite signal come after the WrReg number?
  - When does the write to a register happen?
  - Read from the same register as is being written?
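One common answer to the register-file questions is to make the write port edge-triggered, the same idea as synchronous memory: WrReg, WrData and RegWrite may settle in any order during the cycle, and the selected register changes only at the clock edge, so reading the register being written returns the old value until then. A behavioural sketch assuming that policy (the class and method names are hypothetical; register 0 is kept at zero as in MIPS):

```python
class RegFile:
    """32 x 32-bit register file: 2 combinational read ports, 1 edge-triggered write port."""

    def __init__(self):
        self.regs = [0] * 32
        self._pending = None  # (WrReg, WrData) waiting for the next clock edge

    def read(self, rd_reg1: int, rd_reg2: int) -> tuple[int, int]:
        # Read ports see the stored values; a pending write is not visible yet.
        return self.regs[rd_reg1], self.regs[rd_reg2]

    def drive_write(self, wr_reg: int, wr_data: int, reg_write: bool) -> None:
        # Inputs may change during the cycle; only the last values before the edge count.
        self._pending = (wr_reg, wr_data & 0xFFFF_FFFF) if reg_write else None

    def clock_edge(self) -> None:
        # The write happens here, at the clock edge, and nowhere else.
        if self._pending is not None:
            wr_reg, wr_data = self._pending
            if wr_reg != 0:  # MIPS register 0 always reads as 0
                self.regs[wr_reg] = wr_data
        self._pending = None
```

Because nothing is stored until `clock_edge`, a glitch on WrReg earlier in the cycle cannot corrupt another register, which is the register-file version of the MemWrite timing problem above.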

Missing Pieces: Instruction Fetching
- Where does the instruction come from?
  - From instruction memory, of course!
  - Recall: the stored-program concept
  - Alternatives? How about hard-coding wires and switches…? This is how ENIAC was programmed!
- How to branch?
  - BEQ rs, rt, Imm16

Instruction Processing
- Fetch instruction
- Execute instruction
- Fetch next instruction
- Execute next instruction
- Fetch next instruction
- Execute next instruction
- Etc…
- How to maintain the sequence? Use a counter!
- Branches (out of sequence)? Load the counter!

Instruction Processing
- Program Counter
  - Points to the current instruction
  - Provides the address to instruction memory: Instr ← InstrMem[PC]
  - Next instruction: counts up by 4
    - Remember: memory is byte-addressable and instructions are 4 bytes
    - PC ← PC + 4
  - Branch instruction: replace the PC contents

Step 1: Analyze Instructions
- Register Transfer Language (RTL)…
  - Fetch, R-type: op | rs | rt | rd | shamt | funct = InstrMem[PC]
  - Fetch, I-type: op | rs | rt | Imm16 = InstrMem[PC]

Instr   Register Transfers
ADDU    R[rd] ← R[rs] + R[rt]; PC ← PC + 4
SUBU    R[rd] ← R[rs] – R[rt]; PC ← PC + 4
ORI     R[rt] ← R[rs] OR zero_ext(Imm16); PC ← PC + 4
LOAD    R[rt] ← MEM[R[rs] + sign_ext(Imm16)]; PC ← PC + 4
STORE   MEM[R[rs] + sign_ext(Imm16)] ← R[rt]; PC ← PC + 4
BEQ     if (R[rs] == R[rt]) then PC ← PC + 4 + {sign_ext(Imm16) || b’00’}
        else PC ← PC + 4
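The RTL table is precise enough to execute directly. The sketch below applies it to an assumed symbolic instruction form (tuples such as ("addu", rd, rs, rt) instead of 32-bit words), with 32-bit wraparound on every result; `step` and `run` are hypothetical names, and register 0 is not forced to zero here:

```python
MASK32 = 0xFFFF_FFFF


def sign_ext(imm16: int) -> int:
    """sign_ext(Imm16): interpret the 16-bit field as two's complement."""
    imm16 &= 0xFFFF
    return imm16 - 0x1_0000 if imm16 & 0x8000 else imm16


def zero_ext(imm16: int) -> int:
    """zero_ext(Imm16): the upper 16 bits are zero."""
    return imm16 & 0xFFFF


def step(instr: tuple, R: list[int], MEM: dict[int, int], PC: int) -> int:
    """Perform one instruction's register transfers and return the next PC."""
    op = instr[0]
    if op == "addu":
        _, rd, rs, rt = instr
        R[rd] = (R[rs] + R[rt]) & MASK32
    elif op == "subu":
        _, rd, rs, rt = instr
        R[rd] = (R[rs] - R[rt]) & MASK32
    elif op == "ori":
        _, rt, rs, imm = instr
        R[rt] = R[rs] | zero_ext(imm)
    elif op == "lw":
        _, rt, rs, imm = instr
        R[rt] = MEM[(R[rs] + sign_ext(imm)) & MASK32]
    elif op == "sw":
        _, rt, rs, imm = instr
        MEM[(R[rs] + sign_ext(imm)) & MASK32] = R[rt]
    elif op == "beq":
        _, rs, rt, imm = instr
        if R[rs] == R[rt]:
            # PC ← PC + 4 + {sign_ext(Imm16) || b'00'}
            return (PC + 4 + (sign_ext(imm) << 2)) & MASK32
    else:
        raise ValueError(f"not in today's subset: {op}")
    return (PC + 4) & MASK32  # PC ← PC + 4


def run(program: dict[int, tuple], R: list[int], MEM: dict[int, int],
        PC: int = 0, max_steps: int = 1000) -> int:
    """Fetch (Instr ← InstrMem[PC]) and execute until the PC leaves the program."""
    for _ in range(max_steps):
        if PC not in program:
            break
        PC = step(program[PC], R, MEM, PC)
    return PC


if __name__ == "__main__":
    prog = {0: ("ori", 1, 0, 5), 4: ("ori", 2, 0, 5), 8: ("beq", 1, 2, 1),
            12: ("ori", 3, 0, 1), 16: ("sw", 1, 0, 0x100)}
    regs, mem = [0] * 32, {}
    print(hex(run(prog, regs, mem)), regs[1], regs[3], mem)
```

In the sample program the two ORIs make R[1] == R[2], so the BEQ with offset 1 skips the instruction at address 12: the target is 8 + 4 + (1 << 2) = 16.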
Steps 2 and 3: Datapath & Assembly
[Figure: the PC drives the Read address of Instruction Memory, which outputs Instruction[31:0]; an adder computes PC + 4.]
- PC: a register
  - Counter, counts by +4
  - Provides the address to Instruction Memory
Steps 2 and 3: Datapath & Assembly
[Figure: a second adder adds PC + 4 to the sign-extended Imm16 shifted left by 2; a PCSrc multiplexer selects PC + 4 (0) or the branch target (1) as the next PC. The instruction is split into Instruction[25:21], [20:16], [15:11] and [15:0] (Imm16); a 16-to-32-bit Sign/Zero Extend unit is controlled by ExtOp.]
- PC: a register
  - Counter, counts by +4
  - Sometimes must add {sign_ext(Imm16) || b’00’} for branch instructions
- Note: the sign-extender for Imm16 is already in the datapath (everything else is new)
Steps 2 and 3: Add Previous Datapath
[Figure: the complete single-cycle datapath: PC and Instruction Memory; the PC + 4 adder, Shift Left 2 and branch adder feeding the PCSrc multiplexer; the Register File (Read reg. 1 ← Instruction[25:21], Read reg. 2 ← Instruction[20:16], Write register chosen by the RegDst multiplexer from Instruction[20:16] or Instruction[15:11], RegWrite); the 16-to-32-bit Sign/Zero Extend (ExtOp); the ALUSrc multiplexer, the ALU with its Zero output, and ALU Control driven by ALUOp and Instruction[5:0] (funct); Data Memory (MemWrite) and the MemtoReg multiplexer.]
What have we done?
- Created a simple CPU datapath
  - Control is still missing
- Single-cycle CPU
  - Every instruction takes 1 clock cycle
  - Clocking?

One Clock Cycle
- Clock locations
  - PC and REGFILE have clocks
- Operation
  - On the rising edge, the PC gets a new value
    - REGFILE may have one value updated as well
  - After the rising edge
    - PC and REGFILE can’t change
    - New value out of the PC
    - Instruction out of INSTRMEM
    - Instruction selects the registers to read from REGFILE
    - Instruction controls ALUop, ALUsrc, MemWrite, ExtOp, etc.
    - ALU does its work
    - DataMem may be read (depending on the instruction)
    - Result value goes back to REGFILE
    - New PC value goes back to the PC
    - Await the next clock edge
  - Lots to do in only 1 clock cycle!!

Missing Steps?
- Control is missing (Steps 4 and 5 mentioned earlier)
  - Generate the control signals (green on the slides): ALUsrc, MemWrite, MemtoReg, PCSrc, RegDst, etc.
  - These are all f(Instruction), where f() is a logic expression
  - Will look at control strategies in an upcoming lecture
- Implementation details
  - How to implement REGFILE?
    - Read port: tristate buffers? Multiplexer? Memory?
    - Two read ports: two of the above?
    - Write port: how to write only 1 register?
  - How to control writes to memory? To the register file?
- More instructions
  - Shift instructions
  - Jump instruction
  - Etc.

1-Cycle CPU Datapath
[Figure: the complete single-cycle datapath assembled on the previous slides, repeated before control is added.]
1-cycle CPU Datapath + Control
[Figure: the single-cycle datapath with a main Control unit that decodes Instruction[31:26] into RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc and RegWrite; an ALU control unit decodes ALUOp and Instruction[5:0]; Branch, together with the ALU Zero output, determines PCSrc.]
1-cycle CPU Control – Lookup Table
Signal name    R-format   Lw   Sw   Beq
Inputs
  Op5          0          1    1    0
  Op4          0          0    0    0
  Op3          0          0    1    0
  Op2          0          0    0    1
  Op1          0          1    1    0
  Op0          0          1    1    0
Outputs
  RegDst       1          0    X    X
  ALUSrc       0          1    1    0
  MemtoReg     0          1    X    X
  RegWrite     1          1    0    0
  MemRead      0          1    0    0
  MemWrite     0          0    1    0
  Branch       0          0    0    1
  ALUOp1       1          0    0    0
  ALUOp0       0          0    0    1
(X = don’t care)
- Also: I-type instructions (ORI) and ExtOp (sign-extend control), etc.
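Because single-cycle control is a pure function of Instruction[31:26], it can be written down as the lookup table itself. A Python sketch with X entries as `None` and ALUOp packed as the two bits ALUOp1 ALUOp0; the `pc_src` helper assumes the common textbook wiring PCSrc = Branch AND Zero, which the table itself does not show:

```python
# Control outputs per opcode, copied from the lookup table; None marks X (don't care).
CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1, MemRead=0,
                   MemWrite=0, Branch=0, ALUOp=0b10),     # R-format
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1, MemRead=1,
                   MemWrite=0, Branch=0, ALUOp=0b00),     # lw
    0b101011: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0, MemRead=0,
                   MemWrite=1, Branch=0, ALUOp=0b00),     # sw
    0b000100: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0, MemRead=0,
                   MemWrite=0, Branch=1, ALUOp=0b01),     # beq
}


def main_control(instruction: int) -> dict:
    """Decode Instruction[31:26] into the single-cycle control signals."""
    op = (instruction >> 26) & 0x3F
    try:
        return CONTROL[op]
    except KeyError:
        raise ValueError(f"opcode {op:06b} is not in the table") from None


def pc_src(signals: dict, zero: bool) -> int:
    """Assumed wiring: take the branch target only when Branch = 1 and the ALU result is zero."""
    return int(signals["Branch"] == 1 and zero)
```

A table like this is also what a ROM or PLA implementation of the controller would store, one row per opcode.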

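1-cycle CPU + Jump Instruction
[Figure: the single-cycle datapath extended for J: Instruction[25:0], shifted left by 2, is combined with PC + 4 [31:28] to form the 32-bit jump address [31:0]; the control unit decodes Instruction[31:26]; the fields Instruction[25:21], [20:16], [15:11], [15:0] and [5:0] feed the datapath as before.]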
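The jump address is a concatenation: the upper 4 bits of PC + 4, the 26-bit target field, and two zero bits (instructions are word-aligned). A short sketch; `jump_target` is a hypothetical name:

```python
def jump_target(pc: int, target26: int) -> int:
    """Jump address = {(PC + 4)[31:28], Instruction[25:0], 00}."""
    upper4 = (pc + 4) & 0xF000_0000            # top 4 bits come from PC + 4, not PC
    return upper4 | ((target26 & 0x03FF_FFFF) << 2)
```

Using PC + 4 rather than PC matters only when the jump instruction sits in the last word of a 256 MB region, e.g. PC = 0x0FFFFFFC gives upper bits 0x1.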
1-cycle CPU Problems?
- Every instruction takes 1 cycle
  - Some instructions “do more work”
    - E.g., lw must read from DATAMEM
  - All instructions must have the same clock period…
  - Many instructions run slower than necessary
- Tricky timing on MemWrite, RegWrite(?) signals
  - Write signal must come *after* the address is stable
- Need extra resources…
  - PC+4 adder, ALU for BEQ instruction, DATAMEM+INSTRMEM

Performance!
- Single-cycle CPU performance
  - Execute one instruction per clock cycle (CPI = 1)
  - Clock cycle time? Note the dataflow includes:
    - INSTRMEM read
    - REGFILE access
    - Sign extension
    - ALU operation
    - DATAMEM read
    - REGFILE/PC write
  - Not every instruction uses all resources (e.g., DATAMEM read)
  - Can we change the clock period for each instruction?
    - No! (Why not?)
  - One clock period: the worst case!
- This is why a single-cycle CPU is not good for performance
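The worst-case argument can be made concrete by summing component delays along each instruction's path and taking the maximum. The delays below are assumed, illustrative numbers (sign extension is assumed to overlap the register read), not measurements from the lecture:

```python
# Illustrative component delays in picoseconds (assumed values, not from the lecture).
DELAY_PS = {"INSTRMEM": 200, "REGREAD": 100, "ALU": 200, "DATAMEM": 200, "REGWRITE": 100}

# Units on each instruction class's path (sign extension assumed to overlap REGREAD).
PATH = {
    "R-type": ["INSTRMEM", "REGREAD", "ALU", "REGWRITE"],
    "LW": ["INSTRMEM", "REGREAD", "ALU", "DATAMEM", "REGWRITE"],
    "SW": ["INSTRMEM", "REGREAD", "ALU", "DATAMEM"],
    "BEQ": ["INSTRMEM", "REGREAD", "ALU"],
}


def path_delay(instr_class: str) -> int:
    """Time an instruction class actually needs, in ps."""
    return sum(DELAY_PS[unit] for unit in PATH[instr_class])


# The single clock period must cover the slowest instruction: the worst case.
CLOCK_PERIOD_PS = max(path_delay(c) for c in PATH)

if __name__ == "__main__":
    for c in PATH:
        idle = CLOCK_PERIOD_PS - path_delay(c)
        print(f"{c:7s} needs {path_delay(c)} ps, idles {idle} ps of {CLOCK_PERIOD_PS} ps")
```

With these numbers the clock is set by LW (800 ps), so an R-type instruction idles for 200 ps and BEQ for 300 ps every time it executes.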

1-cycle CPU Datapath + Controller
[Figure: the complete single-cycle datapath with its controller and jump support.]
1-cycle CPU Summary
- Operation
  - 1 cycle per instruction
  - Control signals held fixed during the entire cycle (except BRANCH)
  - Only 2 registers
    - PC, updated every clock cycle
    - REGFILE, updated when required
  - During the clock cycle, data flows from register outputs to register inputs
  - Fixed clock frequency / period
- Performance
  - 1 instruction per cycle
  - The slowest instruction determines the clock frequency
- Outstanding issue: MemWrite timing
  - Assume this signal writes to memory at the end of the clock cycle

Multi-cycle CPU Goals
- Improve performance
  - Break each instruction into smaller steps / multiple cycles
    - LW instruction → 5 cycles
    - SW instruction → 4 cycles
    - R-type instruction → 4 cycles
    - Branch, Jump → 3 cycles
  - Aim for 5x clock frequency
    - Complex instructions (e.g., LW) → 5 cycles → same performance as before
    - Simple instructions (e.g., ADD) → fewer cycles → faster
- Save resources (gates/transistors)
  - Re-use the ALU over multiple cycles
  - Put INSTR + DATA in the same memory
- MemWrite timing solved?

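Multi-cycle CPU Datapath
[Figure: multi-cycle datapath: the PC and an address multiplexer selecting the PC or ALUOut as the Memory Address; a single Memory with a Write data input and MemData output, feeding the Instruction Register and the Memory Data Register; Registers with RdReg1 ← Instruction[25:21], RdReg2 ← Instruction[20:16], a multiplexer choosing Instruction[20:16] or [15:11] as Write reg, and a multiplexer choosing the Write data; A and B registers; an ALU input multiplexer choosing the PC or A; a second ALU input multiplexer choosing B, 4, Sign Extend of Instr[15:0], or Sign Extend followed by Shift Left 2; the ALU with Zero and the ALUOut register; Instruction[5:0] to ALU control.]
- Add multiplexers + control signals (IorD, MemtoReg, ALUSrcA, ALUSrcB)
- Move signal paths (+4, Shift Left 2)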
Multi-cycle CPU Datapath
[Figure: the same multi-cycle datapath, now with the internal registers IR, MDR, A, B and ALUOut.]
- Add registers + control signals (IR, MDR, A, B, ALUOut)
- Registers with no control signal load a new value every clock cycle (e.g., MDR, A, B, ALUOut); IR and PC load only when IRWrite or PCWrite is asserted
Instruction Execution Example
- Execute a “Load Word” instruction
  - LW rt, 0(rs)
- 5 steps
1. Fetch instruction
2. Read registers
3. Compute address
4. Read data
5. Write registers

Load Word Instruction Sequence
[Figure: the multi-cycle datapath, repeated on each of the following slides to trace one LW step at a time.]
1. Fetch Instruction
InstructionRegister ← Mem[PC]
2. Read Registers
A ← Registers[Rs]
3. Compute Address
ALUOut ← A + SignExt(Imm16)
4. Read Data
MDR ← Memory[ALUOut]
5. Write Registers
Registers[Rt] ← MDR
All 5 Steps Shown

Multi-cycle Load Word: Recap
1. Fetch Instruction: InstructionRegister ← Mem[PC]
2. Read Registers: A ← Registers[Rs]
3. Compute Address: ALUOut ← A + SignExt(Imm16)
4. Read Data: MDR ← Memory[ALUOut]
5. Write Registers: Registers[Rt] ← MDR
- Missing Steps?


Multi-cycle Load Word: Recap
1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers: A ← Registers[Rs]
3. Compute Address: ALUOut ← A + SignExt(Imm16)
4. Read Data: MDR ← Memory[ALUOut]
5. Write Registers: Registers[Rt] ← MDR
- Missing Steps?
  - Must increment the PC
  - Do it as part of the instruction fetch (in step 1)
  - Need a PCWrite control signal
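The five RTL steps can be simulated one clock cycle at a time. The sketch keeps the lecture's register names (PC, IR, A, ALUOut, MDR), uses one shared memory dict for instructions and data as the multi-cycle design does, and assumes 32-bit words; `lw_multicycle` is a hypothetical name. The tuple assignment in cycle 1 mirrors IR and PC latching at the same clock edge, both using the old PC:

```python
MASK32 = 0xFFFF_FFFF


def sign_ext(imm16: int) -> int:
    imm16 &= 0xFFFF
    return imm16 - 0x1_0000 if imm16 & 0x8000 else imm16


def lw_multicycle(s: dict, mem: dict[int, int], regs: list[int]) -> list[tuple[str, int]]:
    """Execute one LW as five clock cycles; record (register written, new value) per cycle."""
    trace = []
    # Cycle 1, fetch: IR ← Mem[PC]; PC ← PC + 4 (both latch at the same edge)
    s["IR"], s["PC"] = mem[s["PC"]], (s["PC"] + 4) & MASK32
    trace.append(("IR", s["IR"]))
    # Cycle 2, read registers: A ← Registers[rs], with rs = IR[25:21]
    s["A"] = regs[(s["IR"] >> 21) & 0x1F]
    trace.append(("A", s["A"]))
    # Cycle 3, compute address: ALUOut ← A + SignExt(Imm16)
    s["ALUOut"] = (s["A"] + sign_ext(s["IR"] & 0xFFFF)) & MASK32
    trace.append(("ALUOut", s["ALUOut"]))
    # Cycle 4, read data: MDR ← Memory[ALUOut] (same memory as the fetch)
    s["MDR"] = mem[s["ALUOut"]]
    trace.append(("MDR", s["MDR"]))
    # Cycle 5, write registers: Registers[rt] ← MDR, with rt = IR[20:16]
    rt = (s["IR"] >> 16) & 0x1F
    regs[rt] = s["MDR"]
    trace.append((f"R{rt}", s["MDR"]))
    return trace


if __name__ == "__main__":
    word = (0x23 << 26) | (4 << 21) | (5 << 16) | 0xFFFC   # LW with rs = 4, rt = 5, offset -4
    regs = [0] * 32
    regs[4] = 0x104
    state = {"PC": 0}
    for cycle, (reg, value) in enumerate(lw_multicycle(state, {0: word, 0x100: 77}, regs), 1):
        print(cycle, reg, hex(value))
```

Each cycle writes exactly one internal or architectural register (cycle 1 writes two, IR and PC), which is what makes the per-step control values on the next slides possible.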

Multi-cycle R-Type Instruction
1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers: A ← Registers[Rs]; B ← Registers[Rt]
3. Compute Value: ALUOut ← A op B
4. Write Registers: Registers[Rd] ← ALUOut
- RTL describes the data flow action in each clock cycle
  - Control signals determine the precise data flow
  - Each step implies unique control values

Multi-cycle R-Type Instruction: Control Signal Values
1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
   MemRead=1, ALUSrcA=0, IorD=0, IRWrite, ALUSrcB=01, ALUop=00, PCWrite, PCSource=00
2. Read Registers: A ← Registers[Rs]; B ← Registers[Rt]
   ALUSrcA=0, ALUSrcB=11, ALUop=00 (these ALU settings precompute the branch target, which an R-type instruction ignores)
3. Compute Value: ALUOut ← A op B
   ALUSrcA=1, ALUSrcB=00, ALUop=10
4. Write Registers: Registers[Rd] ← ALUOut
   RegDst=1, RegWrite, MemtoReg=0
- Each step implies unique control values
  - Fixed for the entire cycle
  - “Default value” implied if unspecified

Check Your Work – Is the RTL Valid?
1. Datapath check
   - Within one cycle…
     - Each cycle has a valid data flow path (the path exists)
     - Each register gets only one new value
   - Across multiple cycles…
     - A register value must be defined in an earlier clock cycle before it is used
       - E.g., “A ← 3” must occur before “B ← A”
     - Make sure a register value doesn’t disappear if it was set more than 1 cycle earlier
2. Control signal check
   - Each cycle, the RTL describing the datapath flow implies a value for each control signal
     - 0 or 1 or default or don’t care
   - Each control signal gets only one fixed value for the entire cycle
3. Overall check
   - Does the sequence of steps work?
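Two of the datapath checks, one new value per register per cycle and defined-before-use across cycles, are mechanical enough to automate. A sketch assuming RTL is written as a list of cycles, each a list of (destination, sources) pairs; the representation and `check_rtl` are hypothetical, and it does not check that a path exists or that values persist, which need the datapath itself:

```python
def check_rtl(cycles: list[list[tuple[str, list[str]]]],
              initially_defined: set[str]) -> list[str]:
    """Return the problems found in an RTL sequence (empty list if none)."""
    problems = []
    defined = set(initially_defined)
    for n, transfers in enumerate(cycles, start=1):
        dests = [dest for dest, _ in transfers]
        # Within one cycle: each register gets only one new value.
        for dest in sorted(set(dests)):
            if dests.count(dest) > 1:
                problems.append(f"cycle {n}: {dest} gets more than one new value")
        # Across cycles: a source must have been written in an earlier cycle.
        for dest, sources in transfers:
            for src in sources:
                if src not in defined:
                    problems.append(f"cycle {n}: {dest} uses {src} before it is defined")
        # Values written in this cycle become usable from the next cycle on.
        defined.update(dests)
    return problems
```

Applied to the multi-cycle LW sequence, with PC, Registers and Mem valid at the start, it reports nothing; putting “A ← 3” and “B ← A” in the same cycle is flagged, matching the example on the slide.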

Multi-cycle BEQ Instruction
1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers, Precompute Target
   A ← Registers[Rs]; B ← Registers[Rt]; ALUOut ← PC + {SignExt(Imm16), b’00’} (the PC already holds PC + 4 from step 1)
3. Compare Registers, Conditional Branch
   if ((A – B) == 0) PC ← ALUOut
Green on the slide shows the PC calculation flow, which runs in parallel with the other operations.

Multi-cycle Datapath with Control Signals
[Figure: the multi-cycle datapath annotated with its control signals PCWrite, PCSrc, IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, MemtoReg, ALUSrcA, ALUSrcB and ALUOp; a jump address [31:0] is formed from PC[31:28] and Instr[25:0]; ALU Control uses Instruction[5:0].]

Multi-cycle Datapath with Controller
[Figure: the same datapath with a controller that decodes Instr[31:26] and drives all of the control signals.]
Multi-cycle CPU Control: Overview
[Figure: the controller drawn as a block producing the control signal outputs.]
- General approach: Finite State Machine (FSM)
  - Need details in each branch of control…
  - Precise outputs for each state (Mealy depends on inputs, Moore does not)
  - Precise “next state” for each state (can depend on inputs)
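A Moore-style sketch of the multi-cycle controller: `next_state` encodes transitions consistent with the cycle counts in this lecture (LW 5, SW 4, R-type 4, BEQ and J 3), and `OUTPUTS` lists the control values the lecture gives for the R-type steps. The state names are hypothetical, and outputs for the memory, branch and jump states are left out:

```python
# Standard MIPS opcodes for the instructions handled here.
OP_R, OP_J, OP_BEQ, OP_LW, OP_SW = 0x00, 0x02, 0x04, 0x23, 0x2B


def next_state(state: str, op: int) -> str:
    """Next-state logic: depends on the current state and, in some states, the opcode."""
    if state == "FETCH":
        return "DECODE"
    if state == "DECODE":
        dispatch = {OP_LW: "MEM_ADDR", OP_SW: "MEM_ADDR", OP_R: "R_EXEC",
                    OP_BEQ: "BRANCH", OP_J: "JUMP"}
        return dispatch[op]
    if state == "MEM_ADDR":
        return "MEM_READ" if op == OP_LW else "MEM_WRITE"
    if state == "MEM_READ":
        return "MEM_WRITEBACK"
    if state == "R_EXEC":
        return "R_WRITEBACK"
    # MEM_WRITEBACK, MEM_WRITE, R_WRITEBACK, BRANCH and JUMP each end an instruction.
    return "FETCH"


# Moore outputs: one fixed set of control values per state (unlisted signals take defaults).
OUTPUTS = {
    "FETCH": dict(MemRead=1, ALUSrcA=0, IorD=0, IRWrite=1, ALUSrcB=0b01,
                  ALUOp=0b00, PCWrite=1, PCSource=0b00),
    "DECODE": dict(ALUSrcA=0, ALUSrcB=0b11, ALUOp=0b00),
    "R_EXEC": dict(ALUSrcA=1, ALUSrcB=0b00, ALUOp=0b10),
    "R_WRITEBACK": dict(RegDst=1, RegWrite=1, MemtoReg=0),
}


def cycles_for(op: int) -> int:
    """Count clock cycles from FETCH until the FSM returns to FETCH."""
    state, cycles = "FETCH", 0
    while True:
        state = next_state(state, op)
        cycles += 1
        if state == "FETCH":
            return cycles
```

Because the outputs depend only on the state, this is a Moore machine; the opcode enters only through the next-state function, in DECODE and MEM_ADDR.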

How to Implement FSM?
- Manually with logic gates + FFs
  - Bubble diagram, next-state table, state assignment
  - Karnaugh map for each state bit and each output bit (painful!)
- High-level language description (e.g., Verilog, VHDL)
  - Describe the FSM bubble diagram (next states, output values)
  - Automatically synthesized into gates + FFs
- Microcode (µ-code) description
  - Sequence through many µ-ops for each CPU instruction
    - One µ-op (µ-instruction) sends the correct control signals for 1 cycle
    - A µ-op is similar to one bubble in the FSM
  - Acts like a mini-CPU within a CPU
    - µPC: microcode program counter
    - Microcode storage memory contains the µ-ops
  - Can look similar to RTL or to some new “assembly language”

FSM Specification: Bubble Diagram
[Figure: bubble diagram of the control FSM.]
- Can build this by examining the RTL
- It is possible to automatically convert RTL into this form!
FSM: Gates + FFs Implementation
[Figure: FSM high-level organization.]

FSM: Microcode Implementation
[Figure: Microcode Storage (memory) produces the datapath control outputs and the sequencing control; a Microprogram Counter, an adder (+1) and Address Select Logic choose the next µ-op, using inputs from the instruction register opcode field.]

Multi-cycle CPU with Control FSM
[Figure: the multi-cycle datapath driven by the control FSM outputs, including the conditional-branch logic; the FSM reads Instr[31:26], and the jump address [31:0] is formed from PC[31:28] and Instr[25:0].]
Control FSM: Overview
- General approach: Finite State Machine (FSM)
  - Need details in each branch of control…

Detailed FSM
Detailed FSM
[Figure: overview of the FSM: the Instruction Fetch states lead to the Memory Reference, R-Type, Branch and Jump parts of the diagram.]
Detailed FSM: Instruction Fetch

Detailed FSM: Memory Reference
[Figure: memory-reference states, splitting into LW and SW paths.]
Detailed FSM: R-Type Instruction

Detailed FSM: Branch Instruction

Detailed FSM: Jump Instruction

Performance Comparison
Single-cycle CPU vs Multi-cycle CPU

Simple Comparison
- Single-cycle CPU, all instructions: 1 clock cycle
- Multi-cycle CPU, LW: 5 clock cycles
- Multi-cycle CPU, SW and R-type: 4 clock cycles
- Multi-cycle CPU, BEQ and J: 3 clock cycles
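What’s really happening?
[Figure: Load Word timing. Ideally, the multi-cycle CPU splits the single-cycle period into five equal steps: Fetch, Decode, Calc Addr, Memory, Write.]
In practice, steps differ in speeds…
[Figure: Load Word timing with unequal step delays. With a multi-cycle clock period of one fifth of the single-cycle period, shorter steps leave wasted time, while a longer step does not fit in its cycle (a timing violation).]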
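Single-cycle vs Multi-cycle
LW instruction: faster for single-cycle
[Figure: lengthening the multi-cycle clock period so that the slowest step fits fixes the violation, but the wasted time is now larger, so LW's five cycles take longer than one single-cycle period.]
SW instruction: about the same speed
[Figure: SW (Fetch, Decode, Calc Addr, Memory): the single-cycle CPU wastes the time left over after the memory step; the speed difference from the multi-cycle CPU is small.]
BEQ, J instructions: faster for multi-cycle
[Figure: BEQ and J (Fetch, Decode, Calc Addr): the multi-cycle CPU finishes after three cycles, while the single-cycle CPU wastes the rest of its long clock period.]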
Performance Summary
- Which CPU implementation is faster?
  - LW → single-cycle is faster
  - SW, R-type → about the same
  - BEQ, J → multi-cycle is faster
- Real programs use a mix of these instructions
- Overall performance depends on instruction frequency!
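The dependence on instruction frequency can be shown with a weighted average. The clock periods here are assumed, illustrative values (800 ps for the single-cycle CPU, 200 ps for the slowest multi-cycle step); only the cycle counts come from the lecture, and both instruction mixes are hypothetical:

```python
# Assumed clock periods in ps: the single-cycle clock fits the slowest instruction,
# the multi-cycle clock fits the slowest single step. Illustrative values only.
SINGLE_CYCLE_PERIOD_PS = 800
MULTI_CYCLE_PERIOD_PS = 200
CYCLES = {"LW": 5, "SW": 4, "R-type": 4, "BEQ": 3, "J": 3}  # cycle counts from the lecture


def multi_cycle_time_ps(instr_class: str) -> int:
    """Time for one instruction of this class on the multi-cycle CPU."""
    return CYCLES[instr_class] * MULTI_CYCLE_PERIOD_PS


def average_time_ps(mix_percent: dict[str, int], multi_cycle: bool) -> float:
    """Average time per instruction for a mix given as integer percentages summing to 100."""
    if sum(mix_percent.values()) != 100:
        raise ValueError("mix must sum to 100%")
    if not multi_cycle:
        return float(SINGLE_CYCLE_PERIOD_PS)  # every instruction takes one long cycle
    total = sum(pct * multi_cycle_time_ps(c) for c, pct in mix_percent.items())
    return total / 100


if __name__ == "__main__":
    load_heavy = {"LW": 25, "SW": 10, "R-type": 45, "BEQ": 20}
    branch_heavy = {"LW": 10, "SW": 10, "R-type": 40, "BEQ": 40}
    for name, mix in (("load-heavy", load_heavy), ("branch-heavy", branch_heavy)):
        print(name, average_time_ps(mix, False), average_time_ps(mix, True))
```

With these numbers the multi-cycle CPU averages 810 ps per instruction on the load-heavy mix (slower than 800 ps) but 740 ps on the branch-heavy mix (faster). The ideal 5x clock (160 ps) is not reachable because the slowest step takes 200 ps, which is the “steps differ in speeds” effect above.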

Implementation Summary
- Single-cycle CPU
  - 1 instruction per cycle (e.g., 1 MHz → 1 MIPS)
  - No “wasted time” on the most complex instruction
  - Large wasted time on simpler instructions
  - Simple controller (just a lookup table or memory)
  - Simple instructions
- Multi-cycle CPU
  - << 1 instruction per cycle (e.g., 1 MHz → 0.2 to 0.33 MIPS, at 5 to 3 cycles per instruction)
  - Small time wasted on the most complex instruction
    - Hence, this instruction is always slower than on the single-cycle CPU
  - Small time wasted on simple instructions
    - Eliminates the “large wasted time” by using fewer clock cycles
  - Complex controller (FSM)
  - Potential to create complex instructions
