
RISC Pipeline

Han Wang
CS3410, Spring 2010
Computer Science, Cornell University

See: P&H Chapter 4.6

Homework 2

[Figure: chart not recoverable; only the axis ticks 0 to 9 survive.]

Announcements

- Homework 2 due tomorrow midnight
- Programming Assignment 1 release tomorrow
  - Pipelined MIPS processor (topic of today)
  - Subset of MIPS ISA

Feedback

- We want to hear from you!
- Content?

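Absolute Jump

[Figure: single-cycle datapath extended for jump-and-link. Labels: PC, +4 (twice), program instruction memory, register file (5-bit indices), ALU, cmp (=?), control, imm, ext, offset adder (+), target concatenation (||), tgt, addr, data memory. Note on figure: could have used ALU for link add.]

op    mnemonic     description
0x3   JAL target   r31 = PC+8 (+8 due to branch delay slot)
                   PC = (PC+4)[31..28] || (target << 2)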
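
The JAL row above can be checked with a few lines of arithmetic. This is a minimal sketch: the function name `jal`, the 32-bit mask, and the example addresses are illustrative assumptions, not part of the slides; `target` stands for the 26-bit target field of the instruction.

```python
MASK32 = 0xFFFFFFFF

def jal(pc, target):
    """Return (new_pc, link) for a JAL located at byte address `pc`.

    `target` is the 26-bit target field; all names here are illustrative.
    """
    link = (pc + 8) & MASK32                        # r31 = PC+8 (skips the delay slot)
    upper = (pc + 4) & 0xF0000000                   # bits 31..28 of PC+4
    new_pc = upper | ((target & 0x03FFFFFF) << 2)   # || (target << 2)
    return new_pc, link

new_pc, link = jal(0x00400000, 0x0100040)
print(hex(new_pc), hex(link))
```

The jump can only reach addresses that share the top four bits of PC+4, which is why the upper nibble is copied rather than encoded in the instruction.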

A Processor

Review: Single cycle processor

[Figure: single-cycle datapath. Labels: PC, new pc, +4 (twice), instruction memory (inst), register file, alu, control, imm, extend, cmp (=?), offset, target, data memory (addr, din, dout).]

Single Cycle Processor

Advantages
- A single cycle per instruction makes the logic and the clock simple

Disadvantages
- Since instructions take different times to finish, memory and functional units are not efficiently utilized
- Cycle time is the longest delay
  - Load instruction
- Best possible CPI is 1
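
To make the "cycle time is the longest delay" point concrete, here is a small sketch. The per-unit delays and the list of units each instruction class uses are hypothetical numbers chosen for illustration; they do not come from the slides.

```python
# Hypothetical per-unit delays in picoseconds (illustrative only).
DELAY_PS = {"imem": 200, "regread": 100, "alu": 200, "dmem": 200, "regwrite": 100}

# Units each instruction class exercises on a single-cycle datapath (assumed).
PATHS = {
    "alu":    ["imem", "regread", "alu", "regwrite"],
    "load":   ["imem", "regread", "alu", "dmem", "regwrite"],
    "store":  ["imem", "regread", "alu", "dmem"],
    "branch": ["imem", "regread", "alu"],
}

def path_delay(kind):
    """Time an instruction of class `kind` actually needs."""
    return sum(DELAY_PS[unit] for unit in PATHS[kind])

def single_cycle_period():
    """The clock must fit the slowest instruction class."""
    return max(path_delay(kind) for kind in PATHS)

period = single_cycle_period()
for kind in PATHS:
    # Idle time is the part of every cycle this instruction class wastes.
    print(kind, path_delay(kind), "ps needed,", period - path_delay(kind), "ps idle")
```

With these assumed numbers the load sets the clock period, and every other instruction class leaves part of each cycle unused.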

Pipeline Hazards

[Figure: timeline with axis ticks 0h, 1h, 2h, 3h; image not recoverable.]

A Processor

[Figure: the datapath divided into five stages: Instruction Fetch, Instruction Decode, Execute, Memory, Write-Back. Labels: PC, new pc, +4, instruction memory (inst), register file, control, imm, extend, alu, compute jump/branch targets, data memory (addr, din, dout).]

Basic Pipeline

Five stage RISC load-store architecture

1. Instruction Fetch (IF)
   get instruction from memory, increment PC
2. Instruction Decode (ID)
   translate opcode into control signals and read registers
3. Execute (EX)
   perform ALU operation, compute jump/branch targets
4. Memory (MEM)
   access memory if needed
5. Writeback (WB)
   update register file

Slides thanks to Sally McKee & Kavita Bala

Pipelined Implementation

- Break instructions across multiple clock cycles (five, in this case)
- Design a separate stage for the execution performed during each clock cycle
- Add pipeline registers to isolate signals between different stages
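
The role of a pipeline register can be sketched in software as a latch with a "current" and a "next" value. This is only an illustration under assumed names (`PipelineRegister`, `write`, `tick`); real hardware uses edge-triggered flip-flops, not dictionaries.

```python
class PipelineRegister:
    """Edge-triggered latch between two stages (illustrative sketch)."""

    def __init__(self, **reset):
        self.cur = dict(reset)     # what the downstream stage reads this cycle
        self._next = dict(reset)   # what the upstream stage produced this cycle

    def write(self, **fields):
        self._next = dict(fields)

    def tick(self):
        """Clock edge: the value written this cycle becomes visible."""
        self.cur = self._next


if_id = PipelineRegister(inst="nop")
id_ex = PipelineRegister(inst="nop")

for inst in ["add", "nand", "lw"]:
    # Both stages work in the same cycle, each reading only latched values.
    id_ex.write(inst=if_id.cur["inst"])   # decode sees last cycle's fetch
    if_id.write(inst=inst)                # fetch produces a new instruction
    # Clock edge: every pipeline register updates at once.
    if_id.tick()
    id_ex.tick()
    print(if_id.cur["inst"], id_ex.cur["inst"])
```

Because a write is invisible until `tick`, the order in which the stages are evaluated within a cycle does not matter, which is the isolation the slide refers to.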

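Pipelined Processor

[Figure: five-stage pipelined datapath. Stages: Instruction Fetch, Instruction Decode, Execute, Memory, Write-Back, separated by the pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB, with ctrl carried through ID/EX, EX/MEM and MEM/WB. Labels: PC, new pc, +4, instruction memory (addr, inst), control, register file, imm, extend, alu (B input), compute jump/branch targets, data memory (din, dout).]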

IF

Stage 1: Instruction Fetch

Fetch a new instruction every cycle
- Current PC is index to instruction memory
- Increment the PC at end of cycle (assume no branches for now)

Write values of interest to pipeline register (IF/ID)
- Instruction bits (for later decoding)
- PC+4 (for later computing branch targets)
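
The fetch stage described above can be sketched as a single function. The function name `fetch`, the dictionary used for IF/ID, and the word-indexed list standing in for instruction memory are illustrative assumptions.

```python
def fetch(pc, imem):
    """One IF cycle: return (next_pc, if_id).

    `imem` is a list of 32-bit instruction words, so byte address pc maps
    to index pc // 4. Branches are ignored, as on the slide.
    """
    inst = imem[pc // 4]                          # current PC indexes instruction memory
    if_id = {"inst": inst, "pc_plus_4": pc + 4}   # values latched into IF/ID
    return pc + 4, if_id

imem = [0x00221820, 0x8C440014]   # example words: add $3,$1,$2 and lw $4,20($2)
pc, if_id = fetch(0, imem)
print(pc, hex(if_id["inst"]))
```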

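IF

[Figure: IF stage datapath. Labels: PC (WE = 1), +4, instruction memory (addr, mc with 00 = read word), new pc mux selected by pcsel with inputs pcreg, pcrel, pcabs; inst and PC+4 written to IF/ID; rest of pipeline to the right.]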

ID

Stage 2: Instruction Decode

On every cycle:
- Read IF/ID pipeline register to get instruction bits
- Decode instruction, generate control signals
- Read from register file

Write values of interest to pipeline register (ID/EX)
- Control information, Rd index, immediates, offsets
- Contents of Ra, Rb
- PC+4 (for computing branch targets later)
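
A decode step along these lines might look like the sketch below. The field positions follow the standard MIPS32 encoding (op in bits 31..26, rs in 25..21, rt in 20..16, rd in 15..11, immediate in 15..0); mapping rs/rt to the slide's Ra/Rb, the dictionary layout, and the register values are assumptions made for illustration.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of `value` as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def decode(if_id, regs):
    """One ID cycle: split the instruction into fields and read the register file."""
    inst = if_id["inst"]
    op = (inst >> 26) & 0x3F
    ra = (inst >> 21) & 0x1F                      # first source register (rs)
    rb = (inst >> 16) & 0x1F                      # second source register (rt)
    rd = (inst >> 11) & 0x1F if op == 0 else rb   # R-type writes rd, I-type writes rt
    return {
        "op": op, "funct": inst & 0x3F, "rd": rd,
        "imm": sign_extend16(inst),
        "A": regs[ra], "B": regs[rb],             # contents of Ra, Rb
        "pc_plus_4": if_id["pc_plus_4"],          # passed along for branch targets
    }

regs = [0, 36, 9, 12, 18, 7, 41, 22] + [0] * 24   # example register values
print(decode({"inst": 0x00221820, "pc_plus_4": 4}, regs))   # add $3, $1, $2
```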

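ID

[Figure: ID stage datapath, between Stage 1: Instruction Fetch and the rest of the pipeline. Labels: IF/ID supplying inst and PC+4; decode; extend; register file with read indices Ra, Rb, outputs A, B, and write port (WE, Rd, D) driven by result and dest from write-back; ID/EX receiving ctrl, PC+4, imm, A, B.]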

EX

Stage 3: Execute

On every cycle:
- Read ID/EX pipeline register to get values and control bits
- Perform ALU operation
- Compute targets (PC+4+offset, etc.) in case this is a branch
- Decide if jump/branch should be taken

Write values of interest to pipeline register (EX/MEM)
- Control information, Rd index
- Result of ALU operation
- Value in case this is a memory store instruction
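
The execute step can be sketched as follows. The symbolic op names, the dictionary fields (`D` for the ALU result, `B` for the store value, matching the labels in the figures), and the choice to shift the offset left by two as MIPS does are assumptions for illustration.

```python
def execute(id_ex):
    """One EX cycle: run the ALU, compute the branch target, decide the branch."""
    a, b, imm = id_ex["A"], id_ex["B"], id_ex["imm"]
    op = id_ex["op"]
    if op == "add":
        result = a + b
    elif op == "nand":
        result = ~(a & b)
    elif op in ("lw", "sw"):
        result = a + imm                 # effective address
    elif op == "beq":
        result = a - b                   # zero means the operands are equal
    else:
        raise ValueError(f"unsupported op: {op}")
    return {
        "op": op, "rd": id_ex["rd"],
        "D": result,                                  # ALU result
        "B": b,                                       # value to store, if a store
        "target": id_ex["pc_plus_4"] + (imm << 2),    # PC+4+offset (word offset)
        "taken": op == "beq" and result == 0,
    }

print(execute({"op": "nand", "rd": 6, "A": 18, "B": 7, "imm": 0, "pc_plus_4": 8}))
```

The target is computed for every instruction and simply ignored unless `taken` is true, mirroring hardware that computes in parallel and selects afterwards.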

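EX

[Figure: EX stage datapath, between Stage 2: Instruction Decode and the rest of the pipeline. Labels: ID/EX supplying ctrl, PC+4, imm, A, B; alu; adder (+) producing pcrel; concatenation (||) producing pcabs; pcreg; pcsel; branch?; EX/MEM receiving ctrl, B, D.]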

MEM

Stage 4: Memory

On every cycle:
- Read EX/MEM pipeline register to get values and control bits
- Perform memory load/store if needed
  - address is ALU result

Write values of interest to pipeline register (MEM/WB)
- Control information, Rd index
- Result of memory operation
- Pass result of ALU operation
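
A sketch of the memory stage follows. The dictionary keys (`D` for the ALU result, `B` for the store data, `M` for the loaded value) mirror the figure labels; the op names and the use of a Python dict as data memory are illustrative assumptions.

```python
def memory_stage(ex_mem, dmem):
    """One MEM cycle: load or store at the ALU result; pass everything else on."""
    addr = ex_mem["D"]                   # address is the ALU result
    loaded = None
    if ex_mem["op"] == "lw":
        loaded = dmem[addr]
    elif ex_mem["op"] == "sw":
        dmem[addr] = ex_mem["B"]
    # MEM/WB carries control, Rd, the memory result, and the ALU result.
    return {"op": ex_mem["op"], "rd": ex_mem["rd"], "M": loaded, "D": addr}

dmem = {29: 99}   # example contents
print(memory_stage({"op": "lw", "rd": 4, "D": 29, "B": 18}, dmem))
```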

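MEM

[Figure: MEM stage datapath, between Stage 3: Execute and the rest of the pipeline. Labels: EX/MEM supplying ctrl, B, D; data memory with din (from B), addr (from D), mc, dout; MEM/WB receiving ctrl, M, D.]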

WB

Stage 5: Write-back

On every cycle:
- Read MEM/WB pipeline register to get values and control bits
- Select value and write to register file
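
The "select value and write" step can be sketched like this. The keys `M` (memory result) and `D` (ALU result) follow the figure labels; the op names are illustrative, and skipping writes to register 0 reflects the MIPS convention that r0 is always zero.

```python
def write_back(mem_wb, regs):
    """One WB cycle: pick the memory or ALU value and update the register file."""
    op = mem_wb["op"]
    if op in ("sw", "nop"):
        return                                            # nothing to write
    value = mem_wb["M"] if op == "lw" else mem_wb["D"]    # the select mux
    if mem_wb["rd"] != 0:                                 # r0 stays zero
        regs[mem_wb["rd"]] = value

regs = [0, 36, 9, 12, 18, 7, 41, 22]   # example register values
write_back({"op": "lw", "rd": 4, "M": 99, "D": 29}, regs)
print(regs)
```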

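WB

[Figure: WB stage datapath, following Stage 4: Memory. Labels: MEM/WB supplying ctrl, M, D; mux selecting result; dest and result fed back to the register file.]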

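[Figure: complete pipelined datapath. Labels: PC, +4, inst mem, register file (Ra, Rb, Rd, A, B, D), imm, data mem (addr, din, dout); pipeline registers IF/ID (inst, PC+4), ID/EX (PC+4, OP, Rd), EX/MEM (OP, Rd), MEM/WB (OP, Rd).]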

Example

add  r3, r1, r2;
nand r6, r4, r5;
lw   r4, 20(r2);
add  r5, r2, r5;
sw   r7, 12(r3);

[Figure: animation of the five example instructions advancing through the pipelined datapath; the overlaid instruction labels are not recoverable. Labels: inst mem holding 0:add, 1:nand, 2:lw, 3:add, 4:sw; register file r0 to r7 with ports Ra, Rb, Rd, A, B, D; PC, +4, imm, data mem (addr, din, dout); pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB carrying PC+4, OP, Rd.]

Time Graphs

Clock cycle   1    2    3    4    5    6    7    8    9
add           IF   ID   EX   MEM  WB
nand               IF   ID   EX   MEM  WB
lw                      IF   ID   EX   MEM  WB
add                          IF   ID   EX   MEM  WB
sw                                IF   ID   EX   MEM  WB

Latency:
Throughput:
Concurrency:

CPI =
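
A time graph like this one can be generated mechanically when there are no stalls: instruction i enters IF in cycle i+1 and moves one stage per cycle. The sketch below assumes exactly that ideal case; the function names are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def schedule(program):
    """Map each instruction to {cycle: stage}, assuming no stalls."""
    return [{i + 1 + s: stage for s, stage in enumerate(STAGES)}
            for i in range(len(program))]

def total_cycles(n, depth=len(STAGES)):
    """Cycles needed to drain n instructions through an ideal pipeline."""
    return n + depth - 1 if n else 0

program = ["add", "nand", "lw", "add", "sw"]
table = schedule(program)
last = total_cycles(len(program))
for name, row in zip(program, table):
    # One row per instruction, one 5-character column per clock cycle.
    print(f"{name:5}" + "".join(f"{row.get(c, ''):5}" for c in range(1, last + 1)))
```

For the five-instruction example this gives 5 + 5 - 1 = 9 cycles, which matches the last cycle in which the final sw occupies WB.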

Pipelining Recap

Powerful technique for masking latencies
- Logically, instructions execute one at a time
- Physically, instructions execute in parallel
  - Instruction level parallelism

Abstraction promotes decoupling
- Interface (ISA) vs. implementation (Pipeline)

The end


Sample Code (Simple)

Assume eight-register machine
Run the following code on a pipelined datapath

add  3 1 2    ; reg 3 = reg 1 + reg 2
nand 6 4 5    ; reg 6 = ~(reg 4 & reg 5)
lw   4 20(2)  ; reg 4 = Mem[reg2+20]
add  5 2 5    ; reg 5 = reg 2 + reg 5
sw   7 12(3)  ; Mem[reg3+12] = reg 7

Slides thanks to Sally McKee
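
Before stepping through the pipeline, it helps to know what the program should produce when run one instruction at a time. The sketch below is a plain sequential interpreter for the five instructions, not a pipeline model. The tuple encoding and function name are illustrative; the initial register values and the memory word Mem[29] = 99 are read off the datapath snapshots in these slides.

```python
def run(program, regs, mem):
    """Execute the program sequentially; return final (regs, mem) copies."""
    regs, mem = list(regs), dict(mem)
    for op, a, b, c in program:
        if op == "add":
            regs[a] = regs[b] + regs[c]
        elif op == "nand":
            regs[a] = ~(regs[b] & regs[c])
        elif op == "lw":                 # lw a b(c): reg a = Mem[reg c + b]
            regs[a] = mem[regs[c] + b]
        elif op == "sw":                 # sw a b(c): Mem[reg c + b] = reg a
            mem[regs[c] + b] = regs[a]
        else:
            raise ValueError(f"unknown op: {op}")
    return regs, mem

PROGRAM = [("add", 3, 1, 2), ("nand", 6, 4, 5), ("lw", 4, 20, 2),
           ("add", 5, 2, 5), ("sw", 7, 12, 3)]
INITIAL_REGS = [0, 36, 9, 12, 18, 7, 41, 22]   # R0..R7 from the snapshots
INITIAL_MEM = {29: 99}                         # value loaded by lw in the snapshots

final_regs, final_mem = run(PROGRAM, INITIAL_REGS, INITIAL_MEM)
print(final_regs, final_mem)
```

A pipelined execution is correct only if it ends in this same state, which is what "logically, instructions execute one at a time" requires.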

[Figure: pipelined datapath for the sample machine. Labels: PC, PC+1, target, Inst mem, instruction, register file R0 to R7 (regA, regB, valA, valB), offset, ALU, ALU result, Data mem, mdata, data, dest, four muxes; instruction fields Bits 0-2, Bits 15-17, Bits 21-23; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB, the last three each carrying dest and op.]

Time: 1

Fetch: add 3 1 2

[Figure: datapath snapshot]
Register file: R0=0 R1=36 R2=9 R3=12 R4=18 R5=7 R6=41 R7=22
IF/ID: add 3 1 2
ID/EX: nop (dest 0)
EX/MEM: nop (dest 0)
MEM/WB: nop (dest 0)

Time: 2

Fetch: nand 6 4 5

[Figure: datapath snapshot]
Register file: R0=0 R1=36 R2=9 R3=12 R4=18 R5=7 R6=41 R7=22
IF/ID: nand 6 4 5
ID/EX: add (dest 3), valA 36, valB 9
EX/MEM: nop (dest 0)
MEM/WB: nop (dest 0)

Time: 3

Fetch: lw 4 20(2)

[Figure: datapath snapshot]
Register file: R0=0 R1=36 R2=9 R3=12 R4=18 R5=7 R6=41 R7=22
IF/ID: lw 4 20(2)
ID/EX: nand (dest 6), valA 18, valB 7
EX/MEM: add (dest 3), ALU result 45 (inputs 36 and 9)
MEM/WB: nop (dest 0)

Time: 4

Fetch: add 5 2 5

[Figure: datapath snapshot]
Register file: R0=0 R1=36 R2=9 R3=12 R4=18 R5=7 R6=41 R7=22
IF/ID: add 5 2 5
ID/EX: lw (dest 4), valA 9, valB 18, offset 20
EX/MEM: nand (dest 6), ALU result -3 (inputs 18 and 7)
MEM/WB: add (dest 3), ALU result 45

Time: 5

Fetch: sw 7 12(3)

[Figure: datapath snapshot]
Register file: R0=0 R1=36 R2=9 R3=45 R4=18 R5=7 R6=41 R7=22
IF/ID: sw 7 12(3)
ID/EX: add (dest 5), valA 9, valB 7
EX/MEM: lw (dest 4), ALU result 29 (inputs 9 and 20)
MEM/WB: nand (dest 6), ALU result -3

Time: 6

No more instructions

[Figure: datapath snapshot]
Register file: R0=0 R1=36 R2=9 R3=45 R4=18 R5=7 R6=-3 R7=22
IF/ID: empty
ID/EX: sw (dest field 7), valA 45, valB 22, offset 12
EX/MEM: add (dest 5), ALU result 16 (inputs 9 and 7)
MEM/WB: lw (dest 4), ALU result 29, mdata 99

Time: 7

No more instructions

[Figure: datapath snapshot]
Register file: R0=0 R1=36 R2=9 R3=45 R4=99 R5=7 R6=-3 R7=22
IF/ID: nop
ID/EX: nop
EX/MEM: sw (dest field 7), ALU result 57 (inputs 45 and 12), valB 22
MEM/WB: add (dest 5), ALU result 16

Time: 8

No more instructions

[Figure: datapath snapshot]
Register file: R0=0 R1=36 R2=9 R3=45 R4=99 R5=16 R6=-3 R7=22
Data mem: address 57, data 22
IF/ID: nop
ID/EX: nop
EX/MEM: nop
MEM/WB: sw (dest field 7)


Time: 9

No more instructions

[Figure: datapath snapshot]
Register file: R0=0 R1=36 R2=9 R3=45 R4=99 R5=16 R6=-3 R7=22
IF/ID: nop
ID/EX: nop
EX/MEM: nop
MEM/WB: nop
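
The nine snapshots can be reproduced with a small cycle-by-cycle model. This is a sketch under stated assumptions, not the course's simulator: instructions are tuples, pipeline registers are dictionaries, there is no hazard handling (the sample program needs none), and Mem[29] = 99 is taken from the Time 6 snapshot.

```python
NOP = {"op": "nop"}

def simulate(program, regs, dmem, cycles):
    """Return (trace, regs, dmem); trace[t] lists the ops held in
    IF/ID, ID/EX, EX/MEM, MEM/WB at the end of cycle t+1."""
    regs, dmem = list(regs), dict(dmem)
    if_id = id_ex = ex_mem = mem_wb = NOP
    pc, trace = 0, []
    for _ in range(cycles):
        # WB: commit the instruction latched in MEM/WB last cycle.
        if mem_wb["op"] in ("add", "nand"):
            regs[mem_wb["dest"]] = mem_wb["alu"]
        elif mem_wb["op"] == "lw":
            regs[mem_wb["dest"]] = mem_wb["mdata"]
        # MEM: the address is the ALU result.
        next_mem_wb = dict(ex_mem)
        if ex_mem["op"] == "lw":
            next_mem_wb["mdata"] = dmem[ex_mem["alu"]]
        elif ex_mem["op"] == "sw":
            dmem[ex_mem["alu"]] = ex_mem["valB"]
        # EX: ALU operation or address computation.
        next_ex_mem = dict(id_ex)
        if id_ex["op"] == "add":
            next_ex_mem["alu"] = id_ex["valA"] + id_ex["valB"]
        elif id_ex["op"] == "nand":
            next_ex_mem["alu"] = ~(id_ex["valA"] & id_ex["valB"])
        elif id_ex["op"] in ("lw", "sw"):
            next_ex_mem["alu"] = id_ex["valA"] + id_ex["offset"]
        # ID: read the register file.
        next_id_ex = NOP
        if if_id["op"] != "nop":
            op, f1, f2, f3 = if_id["inst"]
            if op in ("add", "nand"):     # op dest regA regB
                next_id_ex = {"op": op, "dest": f1,
                              "valA": regs[f2], "valB": regs[f3]}
            else:                         # lw/sw: data register, offset, base
                next_id_ex = {"op": op, "dest": f1, "valA": regs[f3],
                              "valB": regs[f1], "offset": f2}
        # IF: fetch the next instruction, if any remain.
        next_if_id = NOP
        if pc < len(program):
            next_if_id = {"op": program[pc][0], "inst": program[pc]}
            pc += 1
        # Clock edge: all pipeline registers update together.
        if_id, id_ex, ex_mem, mem_wb = next_if_id, next_id_ex, next_ex_mem, next_mem_wb
        trace.append((if_id["op"], id_ex["op"], ex_mem["op"], mem_wb["op"]))
    return trace, regs, dmem

PROGRAM = [("add", 3, 1, 2), ("nand", 6, 4, 5), ("lw", 4, 20, 2),
           ("add", 5, 2, 5), ("sw", 7, 12, 3)]
trace, regs, dmem = simulate(PROGRAM, [0, 36, 9, 12, 18, 7, 41, 22], {29: 99}, 9)
for t, row in enumerate(trace, start=1):
    print("Time:", t, row)
```

Evaluating the stages from WB back to IF lets each stage read the values latched in the previous cycle before they are overwritten, which stands in for the simultaneous update of real pipeline registers.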
