
Revisiting Hazards

So far our datapath and control have ignored hazards.

We shall now revisit data hazards and control hazards and enhance our datapath and control to handle them in hardware.

Data Hazards and Forwarding


Problem with starting an instruction before the previous ones have finished: data dependencies that go backward in time, called data hazards


Time (in clock cycles); $2 = 10 before sub, $2 = -20 after sub

                      CC 1  CC 2  CC 3  CC 4  CC 5    CC 6  CC 7  CC 8  CC 9
Value of register $2: 10    10    10    10    10/-20  -20   -20   -20   -20

Program execution order (in instructions):

sub $2, $1, $3
and $12, $2, $5
or  $13, $6, $2
add $14, $2, $2
sw  $15, 100($2)

[Figure: pipeline diagram of the five instructions over clock cycles CC 1 to CC 9 (stages IM, Reg, DM, Reg).]

Software Solution

Have the compiler guarantee that there are never any data hazards:

by rearranging instructions so that independent instructions are placed between instructions that would otherwise have a data hazard between them,
or, if such rearrangement is not possible, by inserting nops

sub $2, $1, $3
lw  $10, 40($3)
slt $5, $6, $7
and $12, $2, $5
or  $13, $6, $2
add $14, $2, $2
sw  $15, 100($2)

or

sub $2, $1, $3
nop
nop
and $12, $2, $5
or  $13, $6, $2
add $14, $2, $2
sw  $15, 100($2)

Such compiler solutions may not always be possible, and nops slow the machine down.
MIPS: nop = no operation = 00...0 (32 bits) = sll $0, $0, 0
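The nop-insertion fallback can be sketched in a few lines of Python. This is only an illustration under stated assumptions: there is no forwarding, and the register file is written in the first half of a clock cycle and read in the second half, so a reader must sit at least three instructions after the writer. The tuple encoding (op, dest, sources) and the name insert_nops are hypothetical and not part of the slides.

```python
NOP = ("nop", None, ())

def insert_nops(program, min_distance=3):
    """Pad `program` with nops so every reader of a register is at least
    `min_distance` instructions after the most recent writer of it.

    Each instruction is (op, dest, sources); dest is None if nothing is written.
    min_distance=3 is an ASSUMPTION (no forwarding, register file written in
    the first half of a cycle and read in the second half).
    """
    out = []
    last_write = {}  # register -> index in `out` of its most recent writer
    for op, dest, sources in program:
        needed = 0
        for reg in sources:
            if reg != "$0" and reg in last_write:
                gap = len(out) - last_write[reg]  # distance if emitted right now
                needed = max(needed, min_distance - gap)
        out.extend([NOP] * needed)
        out.append((op, dest, sources))
        if dest is not None and dest != "$0":
            last_write[dest] = len(out) - 1
    return out

program = [
    ("sub", "$2", ("$1", "$3")),
    ("and", "$12", ("$2", "$5")),
    ("or", "$13", ("$6", "$2")),
    ("add", "$14", ("$2", "$2")),
    ("sw", None, ("$15", "$2")),   # sw reads both $15 and $2
]
padded = insert_nops(program)
print([ins[0] for ins in padded])
```

For the example sequence this pads two nops between sub and and, which matches the second listing above. A real compiler would first try to move independent instructions into those slots.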

Hardware Solution: Forwarding

Idea: use intermediate data; do not wait for the result to be finally written to the destination register. Two steps:

1. Detect data hazard
2. Forward intermediate data to resolve hazard

Pipelined Datapath with Control II (as before)

[Figure: pipelined datapath with the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers and the control signals PCSrc, Branch, RegWrite, ALUSrc, ALUOp, RegDst, MemRead, MemWrite and MemtoReg.]

Control signals emanate from the control portions of the pipeline registers

Hazard Detection

Hazard conditions:

1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
2b. MEM/WB.RegisterRd = ID/EX.RegisterRt

E.g., in the earlier example, the first hazard, between sub $2, $1, $3 and and $12, $2, $5, is detected when the and is in the EX stage and the sub is in the MEM stage, because

EX/MEM.RegisterRd = ID/EX.RegisterRs = $2 (1a)

Whether to forward also depends on:

whether the instruction further along in the pipeline (the one held in EX/MEM or MEM/WB) is going to write a register: if not, there is no need to forward, even if there is a register number match as in the conditions above

whether the destination register of that instruction is $0, in which case there is no need to forward the value ($0 is always 0 and never overwritten)

Data Forwarding

Plan:

allow inputs to the ALU not just from ID/EX, but also later
pipeline registers, and
use multiplexors and control signals to choose appropriate
inputs to ALU
Time (in clock cycles)

                      CC 1  CC 2  CC 3  CC 4  CC 5    CC 6  CC 7  CC 8  CC 9
Value of register $2: 10    10    10    10    10/-20  -20   -20   -20   -20
Value of EX/MEM:      X     X     X     -20   X       X     X     X     X
Value of MEM/WB:      X     X     X     X     -20     X     X     X     X

Program execution order (in instructions):

sub $2, $1, $3
and $12, $2, $5
or  $13, $6, $2
add $14, $2, $2
sw  $15, 100($2)

[Figure: pipeline diagram of the five instructions over clock cycles CC 1 to CC 9 (stages IM, Reg, DM, Reg).]

Dependencies between pipeline registers move forward in time

[Figure a. No forwarding: datapath before adding forwarding hardware (ID/EX, Registers, ALU, EX/MEM, Data memory, MEM/WB).]

[Figure b. With forwarding: datapath after adding forwarding hardware. The forwarding unit takes Rs, Rt, EX/MEM.RegisterRd and MEM/WB.RegisterRd and drives the ForwardA and ForwardB multiplexors at the ALU inputs.]

Forwarding Hardware: Multiplexor Control

Mux control     Source   Explanation
ForwardA = 00   ID/EX    The first ALU operand comes from the register file
ForwardA = 10   EX/MEM   The first ALU operand is forwarded from the prior ALU result
ForwardA = 01   MEM/WB   The first ALU operand is forwarded from data memory or an earlier ALU result (*)
ForwardB = 00   ID/EX    The second ALU operand comes from the register file
ForwardB = 10   EX/MEM   The second ALU operand is forwarded from the prior ALU result
ForwardB = 01   MEM/WB   The second ALU operand is forwarded from data memory or an earlier ALU result (*)

(*) Depending on the selection in the rightmost multiplexor (see datapath with control diagram)

Data Hazard: Detection and Forwarding

The forwarding unit determines multiplexor control according to the following rules:

1. EX hazard

if (     EX/MEM.RegWrite                              // if there is a write
     and ( EX/MEM.RegisterRd ≠ 0 )                    // to a non-$0 register
     and ( EX/MEM.RegisterRd = ID/EX.RegisterRs ) )   // which matches, then
   ForwardA = 10

if (     EX/MEM.RegWrite                              // if there is a write
     and ( EX/MEM.RegisterRd ≠ 0 )                    // to a non-$0 register
     and ( EX/MEM.RegisterRd = ID/EX.RegisterRt ) )   // which matches, then
   ForwardB = 10

2. MEM hazard

if (     MEM/WB.RegWrite                              // if there is a write
     and ( MEM/WB.RegisterRd ≠ 0 )                    // to a non-$0 register
     and ( EX/MEM.RegisterRd ≠ ID/EX.RegisterRs )     // and not already a register match
                                                      // with earlier pipeline register
     and ( MEM/WB.RegisterRd = ID/EX.RegisterRs ) )   // but match with later pipeline register, then
   ForwardA = 01

if (     MEM/WB.RegWrite                              // if there is a write
     and ( MEM/WB.RegisterRd ≠ 0 )                    // to a non-$0 register
     and ( EX/MEM.RegisterRd ≠ ID/EX.RegisterRt )     // and not already a register match
                                                      // with earlier pipeline register
     and ( MEM/WB.RegisterRd = ID/EX.RegisterRt ) )   // but match with later pipeline register, then
   ForwardB = 01

The check against EX/MEM.RegisterRd is necessary, e.g., for sequences such as add $1, $1, $2; add $1, $1, $3; add $1, $1, $4;
(array summing?), where an earlier pipeline register (EX/MEM) holds more recent data than MEM/WB
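The EX-hazard and MEM-hazard rules can be checked with a short Python sketch. It is a plain transcription of the conditions above, not a hardware description; the names PipeReg and forwarding_unit are hypothetical, and register numbers are given as integers (2 for $2).

```python
from dataclasses import dataclass

@dataclass
class PipeReg:
    """The fields of EX/MEM or MEM/WB that the forwarding unit looks at."""
    reg_write: bool  # RegWrite control bit
    rd: int          # destination register number (RegisterRd)

def forwarding_unit(rs, rt, ex_mem, mem_wb):
    """Return (ForwardA, ForwardB) for the instruction currently in EX.

    rs and rt stand for ID/EX.RegisterRs and ID/EX.RegisterRt.
    """
    def select(src):
        # EX hazard: forward the prior ALU result from EX/MEM.
        if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
            return "10"
        # MEM hazard: forward from MEM/WB unless EX/MEM already matches.
        if (mem_wb.reg_write and mem_wb.rd != 0
                and ex_mem.rd != src and mem_wb.rd == src):
            return "01"
        return "00"
    return select(rs), select(rt)

# and $12, $2, $5 in EX while sub $2, $1, $3 is in MEM
print(forwarding_unit(2, 5, PipeReg(True, 2), PipeReg(False, 0)))
# or $13, $6, $2 in EX; and $12 is in MEM, sub $2 is in WB
print(forwarding_unit(6, 2, PipeReg(True, 12), PipeReg(True, 2)))
# add $1, $1, $4 in EX; both earlier adds write $1, EX/MEM is the more recent
print(forwarding_unit(1, 4, PipeReg(True, 1), PipeReg(True, 1)))
```

In the last call both pipeline registers match $1, and the sketch selects EX/MEM (ForwardA = 10), which is the behaviour the extra check is meant to guarantee.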

Forwarding Hardware with Control

Called forwarding unit, not hazard detection unit, because once data is forwarded there is no hazard!

[Figure: pipelined datapath with control and the forwarding unit; IF/ID.RegisterRs, IF/ID.RegisterRt and IF/ID.RegisterRd are carried into ID/EX as Rs, Rt and Rd, and EX/MEM.RegisterRd and MEM/WB.RegisterRd feed the forwarding unit.]

Datapath with forwarding hardware and control wires; certain details, e.g., branching hardware, are omitted to simplify the drawing

Note: so far we have only handled forwarding to R-type instructions!

Forwarding

Execution example:

sub $2, $1, $3
and $4, $2, $5
or  $4, $4, $2
add $9, $4, $2

[Figures: datapath snapshots for clock cycles 3 to 6. The instruction in each stage is:]

          IF               ID               EX               MEM           WB
Clock 3   or $4, $4, $2    and $4, $2, $5   sub $2, $1, $3   before<1>     before<2>
Clock 4   add $9, $4, $2   or $4, $4, $2    and $4, $2, $5   sub $2, ...   before<1>
Clock 5   after<1>         add $9, $4, $2   or $4, $4, $2    and $4, ...   sub $2, ...
Clock 6   after<2>         after<1>         add $9, $4, $2   or $4, ...    and $4, ...

Data Hazards and Stalls

Load word can still cause a hazard:

an instruction tries to read a register following a load instruction that writes to the same register

lw  $2, 20($1)
and $4, $2, $5
or  $8, $2, $6
add $9, $4, $2
slt $1, $6, $7

[Figure: pipeline diagram of the five instructions over clock cycles CC 1 to CC 9 (stages IM, Reg, DM, Reg).]

As even a pipeline register dependency goes backward in time, forwarding will not solve the hazard

Therefore, we need a hazard detection unit to stall the pipeline after the load instruction

Pipelined Datapath with Control II (as before)

[Figure: the pipelined datapath with control, repeated for reference; control signals emanate from the control portions of the pipeline registers.]

Hazard Detection Logic to Stall

The hazard detection unit implements the following check to decide whether to stall:

if (     ID/EX.MemRead                                    // if the instruction in the EX stage is a load
     and (    ( ID/EX.RegisterRt = IF/ID.RegisterRs )     // and the destination register
           or ( ID/EX.RegisterRt = IF/ID.RegisterRt ) ) ) // matches either source register
                                                          // of the instruction in the ID stage, then
   stall the pipeline
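The stall check is small enough to write out directly. The following Python sketch is a transcription of the condition above; the function name load_use_stall and the use of integer register numbers are illustrative assumptions.

```python
def load_use_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Return True when the instruction in ID must wait one cycle.

    id_ex_mem_read: ID/EX.MemRead (the instruction in EX is a load)
    id_ex_rt:       ID/EX.RegisterRt (the load's destination register)
    if_id_rs/rt:    source registers of the instruction in ID
    """
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)

# lw $2, 20($1) in EX, and $4, $2, $5 in ID: stall
print(load_use_stall(True, 2, 2, 5))
# sub $2, $1, $3 in EX (not a load), and $12, $2, $5 in ID: forwarding suffices
print(load_use_stall(False, 2, 2, 5))
```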

Mechanics of Stalling

If the stall check succeeds, the pipeline needs to stall for only 1 clock cycle after the load, as after that the forwarding unit can resolve the dependency

What the hardware does to stall the pipeline for 1 cycle:

it does not let the IF/ID register change (disable write!): this causes the instruction in the ID stage to repeat, i.e., stall

the instruction just behind, in the IF stage, must therefore be stalled as well, so the hardware does not let the PC change (disable write!): this causes the instruction in the IF stage to repeat, i.e., stall

it changes all the EX, MEM and WB control fields in the ID/EX pipeline register to 0, so effectively the instruction just behind the load becomes a nop: a bubble is said to have been inserted into the pipeline

note that we cannot turn that instruction into a nop by zeroing all the bits in the instruction itself (recall nop = 00...0, 32 bits) because it has already been decoded and its control signals generated

Hazard Detection Unit

[Figure: datapath with the forwarding unit and the hazard detection unit. The hazard detection unit takes ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs and IF/ID.RegisterRt as inputs and drives PCWrite, IF/IDWrite and the multiplexor that zeroes the control fields entering ID/EX.]

Datapath with forwarding hardware, the hazard detection unit and control wires; certain details, e.g., branching hardware, are omitted to simplify the drawing

Stalling Resolves a Hazard

Same instruction sequence as before, for which forwarding by itself could not resolve the hazard:

lw  $2, 20($1)
and $4, $2, $5
or  $8, $2, $6
add $9, $4, $2
slt $1, $6, $7

[Figure: pipeline diagram over clock cycles CC 1 to CC 10; a bubble is inserted after the lw, so the and and the instructions behind it are delayed by one cycle.]

The hazard detection unit inserts a 1-cycle bubble in the pipeline, after which all pipeline register dependencies go forward, so the forwarding unit can then handle them and there are no more hazards
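The effect of the bubble on timing can be reproduced with a small scheduling sketch in Python. It is a simplified model, assuming full forwarding so that the only stall is one cycle for an instruction that reads the register loaded by the immediately preceding lw. The tuple encoding (op, dest, sources) and the function name schedule are hypothetical.

```python
def schedule(program):
    """Return, per instruction, the first clock cycle it spends in each stage.

    Each instruction is (op, dest, sources). ASSUMPTION: full forwarding, so
    the only stall is one cycle when an instruction reads the register
    loaded by the immediately preceding lw.
    """
    rows = []
    prev = None  # (op, dest, id_cycle, ex_cycle) of the previous instruction
    for op, dest, sources in program:
        if prev is None:
            if_cycle, id_cycle, stall = 1, 2, 0
        else:
            prev_op, prev_dest, prev_id, prev_ex = prev
            if_cycle = prev_id  # IF is free once the previous instruction enters ID
            id_cycle = prev_ex  # ID is free once the previous instruction enters EX
            stall = 1 if (prev_op == "lw" and prev_dest in sources) else 0
        ex_cycle = id_cycle + 1 + stall  # a stall repeats ID and sends a bubble to EX
        rows.append({"op": op, "IF": if_cycle, "ID": id_cycle, "EX": ex_cycle,
                     "MEM": ex_cycle + 1, "WB": ex_cycle + 2})
        prev = (op, dest, id_cycle, ex_cycle)
    return rows

program = [
    ("lw", "$2", ("$1",)),
    ("and", "$4", ("$2", "$5")),
    ("or", "$8", ("$2", "$6")),
    ("add", "$9", ("$4", "$2")),
    ("slt", "$1", ("$6", "$7")),
]
for row in schedule(program):
    print(row)
```

In this model the and enters EX in cycle 5 instead of cycle 4, and the slt completes write-back in cycle 10, one cycle later than without the stall.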

Stall/Bubble in the Pipeline

[Figure: pipeline diagram annotated "Stall inserted here"; a second version of the diagram is captioned "Or, more accurately".]

Stalling

Execution example:

lw  $2, 20($1)
and $4, $2, $5
or  $4, $4, $2
add $9, $4, $2

[Figures: datapath snapshots for clock cycles 2 to 7. The instruction in each stage is:]

          IF               ID               EX               MEM              WB
Clock 2   and $4, $2, $5   lw $2, 20($1)    before<1>        before<2>        before<3>
Clock 3   or $4, $4, $2    and $4, $2, $5   lw $2, 20($1)    before<1>        before<2>
Clock 4   or $4, $4, $2    and $4, $2, $5   bubble           lw $2, ...       before<1>
Clock 5   add $9, $4, $2   or $4, $4, $2    and $4, $2, $5   bubble           lw $2, ...
Clock 6   after<1>         add $9, $4, $2   or $4, $4, $2    and $4, ...      bubble
Clock 7   after<2>         after<1>         add $9, $4, $2   or $4, ...       and $4, ...

Control (or Branch) Hazards

The problem with branches in the pipeline we have so far is that the branch decision is not made until the MEM stage, so which instructions, if any, should we insert into the pipeline following the branch instruction?

Possible solution: stall the pipeline until the branch decision is known

not efficient: slows the pipeline significantly!

Another solution: predict the branch outcome

e.g., always predict branch-not-taken and continue with the next sequential instructions

if the prediction is wrong, we have to flush the pipeline behind the branch: discard the instructions already fetched or decoded and continue execution at the branch target

Predicting Branch-not-taken: Misprediction delay

40 beq $1, $3, 7
44 and $12, $2, $5
48 or  $13, $6, $2
52 add $14, $2, $2
72 lw  $4, 50($7)

[Figure: pipeline diagram of these instructions over clock cycles CC 1 to CC 9 (stages IM, Reg, DM, Reg).]

The outcome of the branch (taken, so the prediction is wrong) is decided only when beq is in the MEM stage, so the three following sequential instructions already in the pipeline have to be flushed and execution resumes at lw

Optimizing the Pipeline to Reduce Branch Delay

Move the branch decision from the MEM stage (as in our current pipeline) earlier, to the ID stage

calculating the branch target address involves moving the branch adder from the MEM stage to the ID stage; the inputs to this adder, the PC value and the immediate field, are already available in the IF/ID pipeline register

calculating the branch decision is done efficiently, e.g., for an equality test, by XORing the respective bits, ORing all the results and inverting, rather than using the ALU to subtract and then test for zero (which incurs a carry delay)

with the more efficient equality test we can put it in the ID stage without significantly lengthening this stage; remember that an objective of pipeline design is to keep pipeline stages balanced

we must correspondingly make additions to the forwarding and hazard detection units to forward to, or stall, the branch at the ID stage in case the branch decision depends on an earlier result
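The XOR/OR/invert equality test mentioned above can be illustrated in Python. This is a behavioural sketch of the logic only, with no claim about gate delays; the function name regs_equal and the width parameter are assumptions made for the example.

```python
def regs_equal(a, b, width=32):
    """Equality test built from bitwise XOR, an OR-reduction and an inversion."""
    mask = (1 << width) - 1
    diff = (a ^ b) & mask   # XOR: a 1 bit wherever the operands differ
    any_diff = diff != 0    # OR of all the XOR outputs
    return not any_diff     # invert: equal when no bit differs

print(regs_equal(0x1234, 0x1234))  # True
print(regs_equal(0x1234, 0x1235))  # False
```

Unlike a subtraction, no bit position depends on a carry from another position, which is why this comparison is cheap enough for the ID stage.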

Flushing on Misprediction

Same strategy as for stalling on a load-use data hazard:

zero out all the control values (or the instruction itself) in the pipeline registers for the instructions following the branch that are already in the pipeline, effectively turning them into nops, so they are flushed

in the optimized pipeline, with the branch decision made in the ID stage, we have to flush only one instruction, the one in the IF stage, so the branch delay penalty is then only one clock cycle
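The relation between the stage in which the branch is decided and the number of flushed instructions can be written as a one-line rule. The Python sketch below assumes the five-stage pipeline of these slides with one instruction fetched per cycle; the name flushed_on_mispredict is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def flushed_on_mispredict(decision_stage):
    """Instructions fetched behind the branch by the time it reaches decision_stage."""
    return STAGES.index(decision_stage)

print(flushed_on_mispredict("MEM"))  # 3: original pipeline
print(flushed_on_mispredict("ID"))   # 1: optimized pipeline
```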

Optimized Datapath for Branch

[Figure: datapath with the hazard detection unit, the forwarding unit and the IF.Flush control line.]

IF.Flush control zeros out the instruction in the IF/ID pipeline register (which follows the branch)

Branch decision is moved from the MEM stage to the ID stage; simplified drawing not showing enhancements to the forwarding and hazard detection units

Reducing Branch Delay

Move hardware to determine outcome to ID stage

Target address adder
Register comparator

Example: branch taken

36: sub $10, $4, $8
40: beq $1, $3, 7
44: and $12, $2, $5
48: or  $13, $2, $6
52: add $14, $4, $2
56: slt $15, $6, $7
    ...
72: lw  $4, 50($7)

[Figures: two datapath snapshots titled "Example: Branch Taken".]

Data Hazards for Branches

If a comparison register is a destination of the 2nd or 3rd preceding ALU instruction

add $1, $2, $3        IF   ID   EX   MEM  WB
add $4, $5, $6             IF   ID   EX   MEM  WB
...                             IF   ID   EX   MEM  WB
beq $1, $4, target                   IF   ID   EX   MEM  WB

Can resolve using forwarding

If a comparison register is a destination of the preceding ALU instruction or the 2nd preceding load instruction

lw  $1, addr          IF   ID   EX   MEM  WB
add $4, $5, $6             IF   ID   EX   MEM  WB
beq stalled                     IF   ID
beq $1, $4, target                   ID   EX   MEM  WB

Need 1 stall cycle

If a comparison register is a destination of the immediately preceding load instruction

lw  $1, addr          IF   ID   EX   MEM  WB
beq stalled                IF   ID
beq stalled                     ID
beq $1, $0, target                   ID   EX   MEM  WB

Need 2 stall cycles
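The branch cases (forwarding only, 1 stall cycle, 2 stall cycles) follow one rule, sketched here in Python. The assumptions are that the branch compares its registers in the ID stage, that an ALU result can be forwarded after the producer's EX stage and a loaded value after its MEM stage. The function name and the "alu"/"load" labels are illustrative.

```python
def branch_stall_cycles(producer, distance):
    """Stall cycles for a branch that compares registers in the ID stage.

    producer: "alu" or "load", the kind of instruction writing the register
    distance: 1 if it immediately precedes the branch, 2 for the 2nd preceding, ...
    ASSUMPTION: an ALU result can be forwarded after the producer's EX stage
    and a loaded value after its MEM stage.
    """
    # Cycles after the producer's IF stage until its result exists.
    ready_after = {"alu": 2, "load": 3}[producer]
    return max(0, ready_after - distance)

for producer, distance in [("alu", 2), ("alu", 1), ("load", 2), ("load", 1)]:
    print(producer, distance, branch_stall_cycles(producer, distance))
```

The four printed cases give 0, 1, 1 and 2 stall cycles, in line with the three diagrams.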

Simple Example: Comparing Performance

Compare performance for single-cycle, multicycle, and pipelined datapaths using the gcc instruction mix

assume 2 ns for memory access, 2 ns for ALU operation, 1 ns for register read or write

assume gcc instruction mix: 23% loads, 13% stores, 19% branches, 2% jumps, 43% ALU

for pipelined execution assume

50% of the loads are followed immediately by an instruction that uses the result of the load
25% of branches are mispredicted
branch delay on misprediction is 1 clock cycle
jumps always incur 1 clock cycle delay, so their average time is 2 clock cycles

Single-cycle: average instruction time 8 ns

Multicycle: average instruction time 8.04 ns

Pipelined:

loads use 1 cc (clock cycle) when there is no load-use dependency and 2 cc when there is one; given that 50% of loads are followed by a dependent instruction, the average cc per load is 1.5

stores use 1 cc each

branches use 1 cc when predicted correctly and 2 cc when not; given 25% misprediction, the average cc per branch is 1.25

jumps use 2 cc each

ALU instructions use 1 cc each

therefore, average CPI is

1.5 × 23% + 1 × 13% + 1.25 × 19% + 2 × 2% + 1 × 43% = 1.18

therefore, average instruction time is 1.18 × 2 ns = 2.36 ns
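The arithmetic can be replayed in a few lines of Python. The pipelined figures come straight from the assumptions listed above. The multicycle cycle counts (5 for loads, 4 for stores and ALU instructions, 3 for branches and jumps) and the 2 ns clock are assumptions that are not stated on these slides; they are used here because they reproduce the quoted 8.04 ns.

```python
# Instruction mix (fractions) from the slides.
mix = {"load": 0.23, "store": 0.13, "branch": 0.19, "jump": 0.02, "alu": 0.43}

CLOCK_NS = 2  # ASSUMPTION: clock set by the slowest stage (memory access or ALU)

# Pipelined: cycles per instruction class, including stall and flush penalties.
pipelined_cycles = {
    "load": 1 + 0.50 * 1,    # 50% of loads stall one cycle
    "store": 1,
    "branch": 1 + 0.25 * 1,  # 25% of branches lose one cycle
    "jump": 2,
    "alu": 1,
}

# ASSUMPTION (not on the slides): cycle counts of the multicycle datapath.
multicycle_cycles = {"load": 5, "store": 4, "branch": 3, "jump": 3, "alu": 4}

def average_cpi(cycles):
    """Weighted average of cycles per instruction over the gcc mix."""
    return sum(mix[k] * cycles[k] for k in mix)

# Single-cycle: instruction fetch, register read, ALU, data memory, register write.
single_cycle_ns = 2 + 1 + 2 + 2 + 1
multicycle_ns = average_cpi(multicycle_cycles) * CLOCK_NS
pipelined_cpi = average_cpi(pipelined_cycles)
pipelined_ns = pipelined_cpi * CLOCK_NS

print(single_cycle_ns, round(multicycle_ns, 2),
      round(pipelined_cpi, 4), round(pipelined_ns, 3))
```

The unrounded pipelined CPI is 1.1825, which gives 2.365 ns; the slide rounds the CPI to 1.18 before multiplying and so reports 2.36 ns.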
