
CMPE110 Lecture 10

Building a Processor 2

Heiner Litz

https://canvas.ucsc.edu/courses/19290

CMPE110 – Winter 2019 – Lecture 10


Announcements

Review

Initial Processor Datapath

• Cannot just join wires together
• These connections will actually require multiplexers

Fetching the Instruction
• Not that complex
• Instruction = Memory[PC]
  • Fetch the instruction from memory
  • Always 32 bits
• Update the program counter for the next cycle
  • What is the address of the next instruction?
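
The fetch step can be sketched in a few lines of Python. This is a minimal illustration, assuming instruction memory is modelled as a dictionary keyed by byte address; the name `fetch` and the sample instruction words are hypothetical and not part of the lecture.

```python
def fetch(memory, pc):
    """Return (instruction, next_pc) for fixed 32-bit instructions."""
    instruction = memory[pc] & 0xFFFFFFFF  # Instruction = Memory[PC]
    next_pc = (pc + 4) & 0xFFFFFFFF        # each instruction is 4 bytes
    return instruction, next_pc


# Hypothetical program image: byte address -> 32-bit instruction word.
program = {0x0: 0x00A00093, 0x4: 0x00B00113}
instr, pc = fetch(program, 0x0)
```

Because every instruction is 32 bits wide, the next sequential address is always PC + 4.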

Fetching the Instruction

[Figure: instruction fetch datapath. The PC is a 32-bit register; an adder increments it by 4 to address the next instruction.]

Arithmetic Instructions
• Read two register operands
• Perform arithmetic/logical operation
• Write register result
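
The three steps above can be sketched as follows. This is a rough model, assuming a 32-entry register file held in a Python list and a small hypothetical table `ALU_OPS`; only the read, operate, write sequence comes from the slide. The check on `rd` reflects that register x0 is hardwired to zero in RISC-V.

```python
import operator

# Hypothetical subset of ALU operations, for illustration only.
ALU_OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "and": operator.and_,
    "or": operator.or_,
}


def execute_rtype(regs, op, rd, rs1, rs2):
    a, b = regs[rs1], regs[rs2]              # read two register operands
    result = ALU_OPS[op](a, b) & 0xFFFFFFFF  # ALU operation, wrapped to 32 bits
    if rd != 0:                              # x0 always reads as zero
        regs[rd] = result                    # write register result
    return result


regs = [0] * 32
regs[1], regs[2] = 5, 7
execute_rtype(regs, "add", 3, 1, 2)
execute_rtype(regs, "sub", 4, 1, 2)
```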

ORI Instruction
• OR immediate instruction

    ori rd, rs1, imm    # R[rd] <- R[rs1] OR ZeroExt(imm)

• Need to get the immediate, imm[11:0] = instr[31:20], into the datapath

I-type format:
  immediate[11:0] | rs1 | funct3 | rd | opcode
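
As an illustration of pulling the I-type fields out of an instruction word, here is a small sketch. It follows the slide in zero-extending the immediate by default; note that the RISC-V specification sign-extends the 12-bit immediate for ORI, so the hypothetical `signed` flag lets you try both. The function names and the sample encoding are assumptions made for this example.

```python
def decode_itype(instr):
    """Split a 32-bit I-type word into its fields."""
    return {
        "opcode": instr & 0x7F,          # instr[6:0]
        "rd": (instr >> 7) & 0x1F,       # instr[11:7]
        "funct3": (instr >> 12) & 0x7,   # instr[14:12]
        "rs1": (instr >> 15) & 0x1F,     # instr[19:15]
        "imm": (instr >> 20) & 0xFFF,    # instr[31:20]
    }


def extend(imm12, signed):
    """Zero-extend, or sign-extend when signed is True, a 12-bit value to 32 bits."""
    if signed and imm12 & 0x800:
        return imm12 | 0xFFFFF000
    return imm12


def ori(regs, instr, signed=False):
    f = decode_itype(instr)
    regs[f["rd"]] = regs[f["rs1"]] | extend(f["imm"], signed)


regs = [0] * 32
regs[6] = 0x0F
ori(regs, 0x0F036293)  # encodes ori x5, x6, 0x0F0
```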

Datapath: ORI Instruction

• Read data 2 is ignored for immediates
• ALUSrc and ALUOp are set based on the instruction

I-type format:
  immediate[11:0] | rs1 | funct3 | rd | opcode

[Figure: datapath for ORI. Instruction[19-15] and Instruction[24-20] select read registers 1 and 2, and Instruction[11-7] selects the write register. Instruction[31-20] passes through a sign-or-zero extend unit to 32 bits. A mux controlled by ALUSrc chooses between Read data 2 and the extended immediate as the second ALU input, and the ALU result goes to the register write data port. Control signals shown: RegWrite, ALUOp, ALUSrc.]

Branch Instruction
• Branch instruction: beq rs1, rs2, immediate

    cond <- R[rs1] - R[rs2]
    if (cond == 0)
        PC <- PC + 4 + SignExt(imm)*4
    else
        PC <- PC + 4

S/B-type format:
  imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode
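
The PC update can be written out directly. The sketch below follows the rule exactly as the slide states it (a word offset added to PC + 4); the ratified RISC-V encoding instead expresses the branch offset in multiples of 2 bytes relative to the branch's own PC. The helper names are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    mask = 1 << (bits - 1)
    return (value ^ mask) - mask


def beq_next_pc(pc, rs1_val, rs2_val, imm12):
    cond = (rs1_val - rs2_val) & 0xFFFFFFFF  # the ALU subtracts the operands
    if cond == 0:                            # Zero flag set: branch taken
        return (pc + 4 + sign_extend(imm12, 12) * 4) & 0xFFFFFFFF
    return (pc + 4) & 0xFFFFFFFF             # fall through


taken = beq_next_pc(0x100, 7, 7, 3)
not_taken = beq_next_pc(0x100, 7, 8, 3)
backward = beq_next_pc(0x100, 7, 7, 0xFFF)   # immediate of -1
```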

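Datapath for the PC

S/B-type format:
  imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode

[Figure: datapath for the PC. The PC supplies the read address of instruction memory, which outputs Instruction[31-0]. One adder computes the sequential next address; a second adder adds the sign-extended immediate to it to form the branch target. A mux selects the branch target when Branch and Zero are both 1, and the sequential address otherwise. Only the upper 30 bits of the PC are computed; the low two bits are fixed at 00.]
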
Control
• State-free
  • Every instruction takes a single cycle
  • Just decode instruction bits
• There are also only a few control points
  • Control on the multiplexers
  • Operation type for the ALU
  • Write control on the register file & data memory

Control: Instruction Fetch
[Figure: full single-cycle datapath during instruction fetch. The PC addresses instruction memory and an adder computes PC + 4. Every control signal (Jump, Branch, RegWrite, ALUOp, ALUSrc, MemRead, MemWrite, MemtoReg, and the extend select) is marked as holding its previous value.]

Control: addu
[Figure: full single-cycle datapath with the control values for addu: Jump = 0, Branch = 0, RegWrite = 1, ALUOp = op (chosen by the instruction), ALUSrc = 0, MemRead = 0, MemWrite = 0, MemtoReg = 0. The extend select is a don't-care (X).]

Control: Load
[Figure: full single-cycle datapath with the control values for a load: Jump = 0, Branch = 0, RegWrite = 1, ALUOp = Add, ALUSrc = 1, MemRead = 1, MemWrite = 0, MemtoReg = 1, extend select = 1.]
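
One way to picture these settings is as a lookup table from instruction class to control values. The sketch below records the values as read off the addu and load figures; that reading, the dictionary layout, and the two helper functions are assumptions made for illustration.

```python
# Control values per instruction class, as read from the two figures.
CONTROL = {
    "addu": dict(Jump=0, Branch=0, RegWrite=1, ALUOp="op", ALUSrc=0,
                 MemRead=0, MemWrite=0, MemtoReg=0),
    "load": dict(Jump=0, Branch=0, RegWrite=1, ALUOp="add", ALUSrc=1,
                 MemRead=1, MemWrite=0, MemtoReg=1),
}


def alu_operand_b(signals, read_data_2, immediate):
    """ALUSrc mux: pick the register operand (0) or the immediate (1)."""
    return immediate if signals["ALUSrc"] else read_data_2


def writeback_value(signals, alu_result, mem_data):
    """MemtoReg mux: pick the ALU result (0) or the loaded data (1)."""
    return mem_data if signals["MemtoReg"] else alu_result
```

The two instruction classes differ only in the ALU operation and in the three signals that route the immediate and the memory data.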

Putting It All Together: Our First Processor

[Figure: complete single-cycle datapath. A control unit decodes Instruction[6-0] and generates Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite. An ALU control block derives the ALU operation from ALUOp and instruction bits. The datapath combines the PC and branch/jump adders, instruction memory, the register file (read registers from Instruction[19-15] and Instruction[24-20], write register from Instruction[11-7]), the sign-extend unit on Instruction[31-20], the ALU, and data memory, with muxes selecting the ALU's second input, the register write data, and the next PC.]

How to generate control signals?
• Consider the hypothetical example:
  • MemWrite equals 1 if: Instruction[0] & Instruction[2] & !Instruction[5]
• Build using combinational logic

[Figure: the control block takes Instruction[6-0] and outputs Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite. In the example, an AND gate combines Instruction[0], Instruction[2], and the inverted Instruction[5] to produce MemWrite.]
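
The hypothetical MemWrite equation translates directly into bit operations. This sketch only mirrors the slide's made-up example and is not the real RISC-V store decode; `bit` and `mem_write` are illustrative names.

```python
def bit(instr, i):
    """Return bit i of the instruction word as 0 or 1."""
    return (instr >> i) & 1


def mem_write(instr):
    """MemWrite = Instruction[0] & Instruction[2] & !Instruction[5]."""
    return bit(instr, 0) & bit(instr, 2) & (1 - bit(instr, 5))


# Enumerate all 7-bit opcode values that would assert MemWrite.
asserting = [op for op in range(128) if mem_write(op)]
```

Because the output depends only on the current instruction bits, no state is needed: the function is the software analogue of a single AND gate with one inverted input.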

Single Cycle Processor Performance

• Functional unit delay
  • Memory: 200 ps
  • ALU and adders: 200 ps
  • Register file: 100 ps

Instruction  Instruction  Register  ALU        Data    Register  Total
class        memory       read      operation  memory  write     (ps)
R-type       200          100       200                100       600
load         200          100       200        200     100       800
store        200          100       200        200               700
branch       200          100       200                          500
jump         200                                                 200

• CPU clock cycle = 800 ps = 0.8 ns (1.25 GHz)
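
The table's totals and the resulting clock period can be reproduced with a short calculation. The unit names and the per-class unit lists below are a hypothetical encoding of the table; the delays are the ones given on the slide.

```python
# Functional unit delays in picoseconds.
DELAY_PS = {"imem": 200, "reg_read": 100, "alu": 200, "dmem": 200, "reg_write": 100}

# Functional units each instruction class passes through.
PATHS = {
    "R-type": ["imem", "reg_read", "alu", "reg_write"],
    "load": ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "store": ["imem", "reg_read", "alu", "dmem"],
    "branch": ["imem", "reg_read", "alu"],
    "jump": ["imem"],
}

latency_ps = {name: sum(DELAY_PS[u] for u in units) for name, units in PATHS.items()}

# A single-cycle design must fit the slowest instruction in one cycle.
cycle_ps = max(latency_ps.values())
freq_ghz = 1000 / cycle_ps
```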


Single Cycle RISC-V Processor

• Pros
  • Single cycle per instruction makes logic simple
• Cons
  • Cycle time is the worst-case path -> long cycle times
    • Worst case = load
  • Hardware is underutilized
    • ALU and memory used only for a fraction of the clock cycle
    • Not well amortized!
  • Best possible CPI is 1

Variable Clock Single Cycle Processor Performance

• Instruction mix
  • 45% ALU
  • 25% loads
  • 10% stores
  • 15% branches
  • 5% jumps

Instruction  Instruction  Register  ALU        Data    Register  Total
class        memory       read      operation  memory  write     (ps)
R-type       200          100       200                100       600
load         200          100       200        200     100       800
store        200          100       200        200               700
branch       200          100       200                          500
jump         200                                                 200

• Average CPU clock cycle = 0.6 x 45% + 0.8 x 25% + 0.7 x 10% + 0.5 x 15% + 0.2 x 5% = 0.625 ns (1.6 GHz)
• Difficult to implement
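
The weighted average is easy to check numerically. The sketch below keeps latencies in picoseconds and the mix in whole percent so the arithmetic stays exact; the variable names are illustrative.

```python
LATENCY_PS = {"R-type": 600, "load": 800, "store": 700, "branch": 500, "jump": 200}
MIX_PERCENT = {"R-type": 45, "load": 25, "store": 10, "branch": 15, "jump": 5}

# Average cycle time if each instruction took only as long as it needs.
avg_cycle_ps = sum(LATENCY_PS[k] * MIX_PERCENT[k] for k in LATENCY_PS) / 100
avg_freq_ghz = 1000 / avg_cycle_ps

# Improvement over the fixed 800 ps single-cycle clock.
speedup = 800 / avg_cycle_ps
```

With this mix the variable clock would be about 1.28 times faster than the fixed 800 ps clock.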

Key Tools for System Architects
1. Pipelining
2. Parallelism
3. Out-of-order execution
4. Prediction
5. Caching
6. Indirection
7. Amortization
8. Redundancy
9. Specialization
10. Focus on the common case

Pipelining: The Laundry Analogy

• Ann, Brian, Cathy, and Dave are doing laundry (loads A, B, C, D)
• Washer takes 30 minutes
• Dryer takes 40 minutes
• “Folding bench” takes 20 minutes

Single-cycle Laundry
[Figure: timeline from 6 PM to midnight, tasks in order A, B, C, D. The loads run one after another, each taking 30 + 40 + 20 minutes.]

Single-cycle laundry takes 6 hours for 4 loads

Pipelined Laundry
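[Figure: timeline from 6 PM to midnight, tasks in order A, B, C, D. The loads overlap: 30 minutes for the first wash, then four 40-minute dryer slots, then 20 minutes for the last fold.]

Pipelined laundry takes 3.5 hours for 4 loads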
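
Both totals can be reproduced with a small scheduling sketch. It assumes each machine handles one load at a time and that a finished load can wait between machines; the function names are hypothetical.

```python
STAGES_MIN = [30, 40, 20]  # washer, dryer, folding bench


def sequential_minutes(loads, stages=STAGES_MIN):
    """Each load finishes completely before the next one starts."""
    return loads * sum(stages)


def pipelined_minutes(loads, stages=STAGES_MIN):
    """Each load enters a stage as soon as both it and the machine are ready."""
    stage_free = [0] * len(stages)  # time at which each machine becomes free
    finish = 0
    for _ in range(loads):
        t = 0                       # all loads are ready at the start
        for s, duration in enumerate(stages):
            start = max(t, stage_free[s])
            t = start + duration
            stage_free[s] = t
        finish = t
    return finish


seq = sequential_minutes(4)
pipe = pipelined_minutes(4)
```

For four loads this gives 360 minutes sequentially and 210 minutes pipelined, matching the 6 hours and 3.5 hours on the slides.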

Lessons from Laundry Analogy

[Figure: the pipelined laundry timeline repeated for loads A to D (30, 40, 40, 40, 40, 20 minutes).]

• Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
• Multiple tasks operate simultaneously
• Potential speedup = number of pipe stages
• Pipeline rate is limited by the slowest pipeline stage
• Unbalanced lengths of pipe stages reduce speedup
• Time to “fill” the pipeline and time to “drain” it reduce speedup
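
The effect of fill and drain time on speedup can be illustrated for the ideal case of perfectly balanced stages. The function below is a sketch under that assumption, with every stage taking one time unit; `balanced_speedup` is a hypothetical name.

```python
def balanced_speedup(tasks, stages):
    """Speedup of a pipeline whose stages all take one time unit."""
    unpipelined = tasks * stages       # each task runs start to finish alone
    pipelined = stages + (tasks - 1)   # fill once, then one task per unit
    return unpipelined / pipelined


few = balanced_speedup(4, 3)
many = balanced_speedup(1_000_000, 3)
```

With only 4 tasks a 3-stage pipeline reaches a speedup of 2, and the speedup approaches the number of stages only as the number of tasks grows. The laundry example does worse than this ideal (360 / 210 is about 1.7) because its stages are unbalanced.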

Another Analogy:
Model T Assembly Line

Pipelining the Processor
• 5 stages, one clock cycle per stage
  • IF: instruction fetch from memory
  • ID: instruction decode & register read
  • EX: execute operation or calculate address
  • MEM: access memory operand
  • WB: write result back to register

        Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
lw      IF       RF/ID    EX       MEM      WB

Pipelining the Processor
• Overlap instructions in different stages
• All hardware used all the time (amortization)
• Clock cycle is fast
• CPI is still 1

          Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
1st lw    IF       RF/ID    EX       MEM      WB
2nd lw             IF       RF/ID    EX       MEM      WB
3rd lw                      IF       RF/ID    EX       MEM      WB
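
The overlap shown above can be generated programmatically. This is a minimal sketch that assumes one new instruction enters the pipeline every cycle with no hazards or stalls; the function names are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(num_instructions):
    """For each instruction, map cycle number (starting at 1) to its stage."""
    return [
        {cycle: stage for cycle, stage in enumerate(STAGES, start=i + 1)}
        for i in range(num_instructions)
    ]


def total_cycles(num_instructions):
    """Cycles to fill the pipeline plus one per additional instruction."""
    return len(STAGES) + num_instructions - 1


schedule = pipeline_schedule(3)
for i, row in enumerate(schedule, start=1):
    cells = [row.get(c, "").ljust(4) for c in range(1, total_cycles(3) + 1)]
    print(f"lw {i}: " + " ".join(cells))
```

Three loads finish in 7 cycles instead of 15, and once the pipeline is full one instruction completes per cycle, which is why CPI stays at 1.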

To Be Continued

• Pipelined datapath and control
• Pipeline dependencies, hazards, and stalls
• The limits of pipelining
