Lect 10 Processor2 PDF

CMPE110 Lecture 10
Building a Processor 2
Heiner Litz
https://canvas.ucsc.edu/courses/19290
CMPE110 – Winter 2019 – Lecture 10

Announcements
2
Review
3
Initial Processor Datapath
• Cannot just join wires together
• These connections will actually require multiplexers
4
Fetching the Instruction
• Not that complex
• Instruction = Memory[PC]
• Fetch the instruction from memory
• Always 32 bits
• Update program counter for next cycle
• What is the address of the next instruction?
5
Fetching the Instruction
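[Figure: fetch datapath. The PC is a 32-bit register whose output (labeled 32b) addresses instruction memory; an adder increments it by 4 for the next instruction.]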
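The fetch step can be sketched in a few lines of Python. This is only an illustrative model, not the course's reference code: `fetch`, `memory`, and the two instruction words are hypothetical names and values chosen for the example.

```python
MASK32 = 0xFFFFFFFF

def fetch(instruction_memory, pc):
    """Model one fetch: return (instruction, next_pc)."""
    # Instruction = Memory[PC]; every instruction is 32 bits wide.
    instruction = instruction_memory[pc] & MASK32
    # Instructions are 4 bytes long, so the next one lives at PC + 4.
    next_pc = (pc + 4) & MASK32
    return instruction, next_pc

# Hypothetical instruction memory: byte address -> 32-bit word.
memory = {0x0: 0x00A5E513, 0x4: 0x00000013}

pc = 0x0
instruction, pc = fetch(memory, pc)
```

Calling `fetch` repeatedly walks through memory one word at a time, which is all the datapath does until branches and jumps are added.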
6
Arithmetic Instructions
• Read two register operands
• Perform arithmetic/logical operation
• Write register result
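The three steps above map directly onto a small software model. The sketch below is an assumption-laden illustration: `execute_r_type`, `ALU_OPS`, and the register-file list are made-up names, and only four operations are modeled.

```python
MASK32 = 0xFFFFFFFF

# A handful of ALU operations, each producing a 32-bit result.
ALU_OPS = {
    "add": lambda a, b: (a + b) & MASK32,
    "sub": lambda a, b: (a - b) & MASK32,
    "and": lambda a, b: a & b,
    "or":  lambda a, b: a | b,
}

def execute_r_type(regs, op, rd, rs1, rs2):
    """Read two registers, run the ALU, write the result back."""
    a = regs[rs1]                 # read register operand 1
    b = regs[rs2]                 # read register operand 2
    result = ALU_OPS[op](a, b)    # arithmetic/logical operation
    if rd != 0:                   # x0 is hardwired to zero in RISC-V
        regs[rd] = result         # write register result
    return result

regs = [0] * 32
regs[1], regs[2] = 5, 7
execute_r_type(regs, "add", 3, 1, 2)   # x3 = x1 + x2
```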
7
ORI Instruction
• OR immediate instruction
    ori rd, rs1, imm    # R[rd] <- R[rs1] OR ZeroExt(imm)
• Need to get imm[11:0] (held in instr[31:20]) into the datapath
I-type format:
    immediate[11:0] | rs1     | funct3  | rd     | opcode
    [31:20]         | [19:15] | [14:12] | [11:7] | [6:0]
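A rough sketch of how the I-type fields are pulled out of the instruction word and fed to the OR. The function names and the `sign_extend` flag are hypothetical. The slide's ZeroExt is the default here; the ratified RISC-V ISA sign-extends the ORI immediate, which the flag lets you model instead.

```python
def decode_i_type(instr):
    """Split a 32-bit I-type instruction into its fields."""
    return {
        "opcode": instr & 0x7F,          # instr[6:0]
        "rd":     (instr >> 7) & 0x1F,   # instr[11:7]
        "funct3": (instr >> 12) & 0x7,   # instr[14:12]
        "rs1":    (instr >> 15) & 0x1F,  # instr[19:15]
        "imm":    (instr >> 20) & 0xFFF, # instr[31:20]
    }

def extend12(imm, sign_extend):
    """Widen a 12-bit immediate to 32 bits (zero- or sign-extended)."""
    if sign_extend and imm & 0x800:
        return imm | 0xFFFFF000
    return imm

def execute_ori(regs, instr, sign_extend=False):
    """R[rd] <- R[rs1] OR Ext(imm); sign_extend=False follows the slide."""
    f = decode_i_type(instr)
    result = regs[f["rs1"]] | extend12(f["imm"], sign_extend)
    if f["rd"] != 0:
        regs[f["rd"]] = result
    return result

regs = [0] * 32
regs[11] = 0x5
execute_ori(regs, 0x00A5E513)   # ori x10, x11, 10
```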
8
Datapath: ORI Instruction
• Read data 2 is ignored for immediates
• ALUSrc and ALUOp set based on instruction
I-type format:
    immediate[11:0] | rs1 | funct3 | rd | opcode
[Figure: datapath for ORI. Instruction[19:15] selects read register 1 and Instruction[11:7] the write register. Instruction[31:20] passes through the sign-or-zero extend unit to 32 bits, and the ALUSrc mux feeds it to the ALU's second input in place of read data 2. Control signals shown: RegWrite, ALUSrc, ALUOp.]
9
Branch Instruction
• Branch instruction: beq rs1, rs2, immediate
    cond <- R[rs1] - R[rs2]
    if (cond == 0)
        PC <- PC + 4 + SignExt(imm)*4
    else
        PC <- PC + 4
S/B-type format:
    imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode
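The next-PC rule above can be written out as a short function. This sketch follows the slide's formula literally (offset counted in words, relative to PC + 4); the ratified RISC-V ISA instead encodes the offset in multiples of 2 bytes relative to the branch's own PC. `sign_extend` and `next_pc_beq` are hypothetical helper names.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def next_pc_beq(pc, rs1_value, rs2_value, imm12):
    """Next PC for beq, using the slide's simplified formula."""
    cond = (rs1_value - rs2_value) & MASK32   # ALU subtract; Zero flag = (cond == 0)
    if cond == 0:
        return (pc + 4 + sign_extend(imm12, 12) * 4) & MASK32
    return (pc + 4) & MASK32

taken = next_pc_beq(0x100, 7, 7, 3)       # equal operands: branch taken
not_taken = next_pc_beq(0x100, 7, 8, 3)   # unequal operands: fall through
```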
10
Datapath for the PC
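    imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode
[Figure: datapath for the PC. The upper 30 bits of the PC feed an adder that adds 1 (the low two bits are fixed at 00), giving PC + 4. A second adder adds the sign-extended immediate to form the branch target. A mux controlled by Branch AND Zero selects which value becomes the next PC and hence the read address of instruction memory.]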
12
Control
• State-free
• Every instruction takes a single cycle
• Just decode instruction bits
• There are only a few control points:
  • Control on the multiplexers
  • Operation type for the ALU
  • Write control on the register file & data memory
13
Control: Instruction Fetch
[Figure: single-cycle datapath during instruction fetch. Every control signal (Jump, Branch, RegWrite, ALUOp, ALUSrc, MemWrite, MemRead, MemtoReg) is marked <prev>.]
14
Control: addu
[Figure: single-cycle datapath annotated with the control settings for addu: Jump = 0, Branch = 0, RegWrite = 1, ALUOp = <op>, ALUSrc = 0, MemWrite = 0, MemRead = 0, MemtoReg = 0. The extend select appears to be a don't-care (X).]
15
Control: Load
[Figure: single-cycle datapath annotated with the control settings for a load: Jump = 0, Branch = 0, RegWrite = 1, ALUOp = Add, ALUSrc = 1, MemWrite = 0, MemRead = 1, MemtoReg = 1, with the immediate sign-extended.]
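Because control is state-free, it can be modeled as a plain lookup from opcode to signal values. The sketch below is an assumption: the signal values are one reading of the annotations on the addu and load control figures, and the `Control` tuple, `CONTROL_TABLE`, and `control_for` are hypothetical names. The two opcodes are the standard RISC-V encodings for register-register ALU operations and loads.

```python
from typing import NamedTuple

class Control(NamedTuple):
    jump: int
    branch: int
    reg_write: int
    alu_op: str       # "funct" = chosen by the instruction's function bits
    alu_src: int      # 0 = read data 2, 1 = extended immediate
    mem_write: int
    mem_read: int
    mem_to_reg: int   # 0 = ALU result, 1 = data memory output

OPCODE_R_TYPE = 0b0110011
OPCODE_LOAD = 0b0000011

# Assumed values, read off the two control figures.
CONTROL_TABLE = {
    OPCODE_R_TYPE: Control(0, 0, 1, "funct", 0, 0, 0, 0),
    OPCODE_LOAD:   Control(0, 0, 1, "add",   1, 0, 1, 1),
}

def control_for(instr):
    """Decode the control signals from Instruction[6:0]."""
    return CONTROL_TABLE[instr & 0x7F]

load_control = control_for(0x00002503)   # low 7 bits are 0000011, a load
```

Stores, branches, and jumps would add further rows to the same table; they are left out here because the slides do not give their settings.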
16
Putting It All Together:
Our First Processor
[Figure: the complete single-cycle datapath. A control unit decodes Instruction[6:0] and drives Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; a separate ALU control block sets the ALU operation from ALUOp and the instruction's function bits. The next-PC muxes select among PC + 4, the branch target, and the jump target.]
17
How to generate control signals?
• Consider the hypothetical example:
  • MemWrite equals 1 if: Instruction[0] & Instruction[2] & !Instruction[5]
• Build using combinational logic
[Figure: the control block takes Instruction[6:0] and produces Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite. An AND gate combines Instruction[0], Instruction[2], and the inverted Instruction[5] to produce MemWrite.]
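The hypothetical MemWrite equation translates directly into bit operations. This is a sketch of that one example only (it is not the real RISC-V store decode); `bit` and `mem_write` are illustrative names.

```python
def bit(word, index):
    """Return bit `index` of word as 0 or 1."""
    return (word >> index) & 1

def mem_write(instr):
    """MemWrite = Instruction[0] AND Instruction[2] AND NOT Instruction[5]."""
    return bit(instr, 0) & bit(instr, 2) & (1 - bit(instr, 5))

# Count how many of the 128 possible 7-bit opcodes assert MemWrite.
asserting = sum(mem_write(opcode) for opcode in range(128))
```

Three of the seven opcode bits are fixed by the equation, so 2^4 = 16 opcode patterns assert the signal.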
18
Single Cycle Processor
Performance
• Functional unit delay
  • Memory: 200 ps
  • ALU and adders: 200 ps
  • Register file: 100 ps

Instruction  Instruction  Register  ALU        Data    Register  Total
class        memory       read      operation  memory  write     (ps)
R-type       200          100       200        -       100       600
load         200          100       200        200     100       800
store        200          100       200        200     -         700
branch       200          100       200        -       -         500
jump         200          -         -          -       -         200

• CPU clock cycle = 800 ps = 0.8 ns (1.25 GHz)
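The totals in the table, and the resulting clock, can be reproduced with a few lines of arithmetic. A minimal sketch; the dictionary names are made up for the example and the delays are the ones listed on the slide.

```python
# Functional unit delays in picoseconds.
DELAY_PS = {"imem": 200, "reg_read": 100, "alu": 200, "dmem": 200, "reg_write": 100}

# Which units each instruction class passes through.
UNITS_USED = {
    "R-type": ["imem", "reg_read", "alu", "reg_write"],
    "load":   ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "store":  ["imem", "reg_read", "alu", "dmem"],
    "branch": ["imem", "reg_read", "alu"],
    "jump":   ["imem"],
}

def latency_ps(instruction_class):
    """Sum the delays along the path an instruction class uses."""
    return sum(DELAY_PS[unit] for unit in UNITS_USED[instruction_class])

# A single-cycle clock must accommodate the slowest class.
cycle_ps = max(latency_ps(c) for c in UNITS_USED)
frequency_ghz = 1000 / cycle_ps
```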

19
Single Cycle RISC-V Processor
• Pros
  • Single cycle per instruction makes logic simple
• Cons
  • Cycle time is the worst case path -> long cycle times
    • Worst case = load
  • Hardware is underutilized
    • ALU and memory used only for a fraction of clock cycle
    • Not well amortized!
  • Best possible CPI is 1
20
Variable Clock Single Cycle
Processor Performance
• Instruction mix
  • 45% ALU
  • 25% loads
  • 10% stores
  • 15% branches
  • 5% jumps

Instruction  Instruction  Register  ALU        Data    Register  Total
class        memory       read      operation  memory  write     (ps)
R-type       200          100       200        -       100       600
load         200          100       200        200     100       800
store        200          100       200        200     -         700
branch       200          100       200        -       -         500
jump         200          -         -          -       -         200

• Average CPU clock cycle = 0.6 x 45% + 0.8 x 25% + 0.7 x 10% + 0.5 x 15% + 0.2 x 5% = 0.625 ns (1.6 GHz)
• Difficult to implement
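The weighted average is easy to check numerically. A small sketch using the mix and per-class latencies from the slide; the variable names are illustrative, and the comparison against the 800 ps fixed clock is added here for context.

```python
# Per-class latency (ps) and share of the instruction mix (%).
CLASSES = {
    "R-type": (600, 45),
    "load":   (800, 25),
    "store":  (700, 10),
    "branch": (500, 15),
    "jump":   (200, 5),
}

# Average cycle time if each instruction took only as long as it needs.
average_ps = sum(latency * share for latency, share in CLASSES.values()) / 100
frequency_ghz = 1000 / average_ps

# Fixed-clock design for comparison: the cycle is set by the slowest class.
fixed_ps = max(latency for latency, _ in CLASSES.values())
speedup = fixed_ps / average_ps
```

Under these numbers the variable clock is about 1.28 times faster than the fixed 800 ps clock.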
21
Key Tools for System Architects
1. Pipelining
2. Parallelism
3. Out-of-order execution
4. Prediction
5. Caching
6. Indirection
7. Amortization
8. Redundancy
9. Specialization
10. Focus on the common case
22
Pipelining: The Laundry Analogy
• Ann, Brian, Cathy, Dave doing laundry (loads A, B, C, D)
• Washer takes 30 minutes
• Dryer takes 40 minutes
• "Folding bench" takes 20 minutes
23
Single-cycle Laundry
[Figure: timeline from 6 PM to midnight. Loads A, B, C, and D run one after another in task order, each taking 30 + 40 + 20 minutes.]
Single-cycle laundry takes 6 hours for 4 loads
24
Pipelined Laundry
[Figure: timeline starting at 6 PM. Loads A to D overlap: 30 minutes of washing, then four back-to-back 40-minute dryer slots, then 20 minutes of folding.]
Pipelined laundry takes 3.5 hours for 4 loads
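Both laundry totals can be reproduced with a tiny scheduling model. This is a sketch under one assumption not stated on the slides: a finished load may wait between stages until the next machine is free. The function names are illustrative.

```python
STAGE_MINUTES = [30, 40, 20]   # washer, dryer, folding bench

def sequential_minutes(stages, loads):
    """Each load runs start to finish before the next one begins."""
    return loads * sum(stages)

def pipelined_minutes(stages, loads):
    """Each stage starts a load as soon as the load and the stage are both free."""
    stage_free_at = [0] * len(stages)   # when each stage finishes its current load
    for _ in range(loads):
        load_ready_at = 0               # when this load leaves the previous stage
        for s, duration in enumerate(stages):
            start = max(load_ready_at, stage_free_at[s])
            stage_free_at[s] = start + duration
            load_ready_at = stage_free_at[s]
    return stage_free_at[-1]

sequential = sequential_minutes(STAGE_MINUTES, 4)
pipelined = pipelined_minutes(STAGE_MINUTES, 4)
```

The pipelined total is 30 + 4 x 40 + 20 minutes: the 40-minute dryer is the slowest stage, so it sets the rate once the pipeline is full.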
25
Lessons from Laundry Analogy
[Figure: the pipelined laundry timeline for loads A to D, with stage times 30, 40, 40, 40, 40, 20.]
• Pipelining doesn't help latency of a single task; it helps throughput of the entire workload
• Multiple tasks operating simultaneously
• Potential speedup = number of pipe stages
• Pipeline rate limited by slowest pipeline stage
• Unbalanced lengths of pipe stages reduce speedup
• Time to "fill" pipeline and time to "drain" it reduce speedup
26
Another Analogy:
Model T Assembly Line
27
Pipelining the Processor
• 5 stages, one clock cycle per stage
  • IF: instruction fetch from memory
  • ID: instruction decode & register read
  • EX: execute operation or calculate address
  • MEM: access memory operand
  • WB: write result back to register

      Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
lw    IF       RF/ID    EX       MEM      WB
28
Pipelining the Processor
• Overlap instructions in different stages
• All hardware used all the time (amortization)
• Clock cycle is short
• CPI is still 1

         Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
1st lw   IF       RF/ID    EX       MEM      WB
2nd lw            IF       RF/ID    EX       MEM      WB
3rd lw                     IF       RF/ID    EX       MEM      WB
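The staggered schedule can be generated programmatically, which also makes the cycle count explicit. A sketch assuming an ideal pipeline with no hazards or stalls; `pipeline_schedule`, `total_cycles`, and `render` are hypothetical names.

```python
STAGES = ["IF", "RF/ID", "EX", "MEM", "WB"]

def pipeline_schedule(num_instructions):
    """For each instruction, map cycle number (1-based) to the stage it occupies."""
    return [
        {i + s + 1: stage for s, stage in enumerate(STAGES)}
        for i in range(num_instructions)
    ]

def total_cycles(num_instructions):
    """Fill the pipeline once, then finish one instruction per cycle."""
    return len(STAGES) + num_instructions - 1

def render(num_instructions):
    """Return the schedule as a text diagram, one row per instruction."""
    rows = []
    for i, stage_at in enumerate(pipeline_schedule(num_instructions)):
        cells = [stage_at.get(cycle, "").ljust(7)
                 for cycle in range(1, total_cycles(num_instructions) + 1)]
        rows.append(f"instr {i + 1}: " + "".join(cells).rstrip())
    return "\n".join(rows)

diagram = render(3)
```

With n instructions the pipeline needs n + 4 cycles, so the cycles per instruction approach 1 as n grows.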
29
To Be Continued
• Pipelined datapath and control
• Pipeline dependencies, hazards, and stalls
• The limits of pipelining
30
