Instruction-Level Parallelism
and Its Exploitation
Ex)
for (I=1; I <= 1000; I++)
x[I] = x[I] + y[I];
HW approach (dynamic)
Pipeline CPI
Example)
    ADD  R1,R2,R3
    SUB  R4,R5,R6
    AND  R7,R8,R9
    OR   R10,R11,R12
    XOR  R13,R14,R15
    ADD  R16,R17,R18
Example)
    ADD  R1,R2,R3
    LD   R4,100(R5)
    AND  R7,R4,R9
    OR   R10,R11,R12
    XOR  R13,R14,R15
    ADD  R16,R17,R18
[Figure: pipeline timing diagrams for the examples above (stages shown include EX, ME, WB; cycle axis up to 10)]
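The cycle counts behind the two examples can be sketched as follows. This is a minimal model, not something stated on the slides: it assumes a single-issue pipeline with a fixed number of stages (5 here: IF, ID, EX, ME, WB) and an ideal CPI of 1. The function names `total_cycles` and `pipeline_cpi` are hypothetical.

```c
/* Total cycles for n instructions on a single-issue pipeline with `stages`
 * stages and `stall_cycles` stall cycles (assumed model, not from the slides). */
int total_cycles(int stages, int n, int stall_cycles) {
    /* The first instruction needs `stages` cycles; each later one adds a cycle. */
    return stages + (n - 1) + stall_cycles;
}

/* Pipeline CPI = ideal CPI (1.0 for single issue) + stall cycles per instruction. */
double pipeline_cpi(int n, int stall_cycles) {
    return 1.0 + (double)stall_cycles / (double)n;
}
```

Under these assumptions, the six independent instructions of the first example finish in total_cycles(5, 6, 0) = 10 cycles. If the load-use dependence in the second example (LD R4 followed by AND R7,R4,R9) costs one stall cycle, as it would with forwarding on a classic 5-stage pipeline, the count becomes 11 and the pipeline CPI rises from 1 to 1 + 1/6.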
Data Dependence
This overlap may cause a hazard because BNE requires the result of DADDUI in the ID stage.
[Figure: overlapped pipeline stages of DADDUI and BNE]

L.D      IF  ID  EX  MEM WB
ADD.D        IF  ID  EX  MEM WB
Data hazard

Ex) ADD.D  F4,F0,F2
    S.D    F4,0(R1)
ADD.D    IF  ID  EX  MEM WB
S.D          IF  ID  EX  MEM WB
No data hazard
Name Dependences
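Ex) Antidependence (WAR) on F6: ADD.D reads F6 before SUB.D writes it
    ADD.D  F2, F4, F6
    SUB.D  F6, F8, F10
    S.D    F6, 0(R1)
Ex) Output dependence (WAW) on F2: both instructions write F2
    ADD.D  F2, F4, F6
    SUB.D  F2, F8, F10
Ex) Renaming F6 to F12 removes the antidependence
    ADD.D  F2, F4, F6
    SUB.D  F12, F8, F10
    S.D    F12, 0(R1)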
Data Hazards
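The register examples above map onto the three hazard classes (RAW, WAR, WAW). A minimal sketch of that classification follows. The `Instr` record, the register numbering and the function `classify` are hypothetical simplifications: each instruction has at most one destination and two source registers, and memory dependences are ignored.

```c
#include <stdbool.h>

/* Hypothetical, simplified instruction record: one destination register and
 * up to two source registers, identified by number; -1 means "unused". */
typedef struct {
    int dst;
    int src1;
    int src2;
} Instr;

typedef enum { DEP_NONE = 0, DEP_RAW = 1, DEP_WAR = 2, DEP_WAW = 4 } DepKind;

/* True if instruction i reads register reg. */
static bool reads(const Instr *i, int reg) {
    return reg >= 0 && (i->src1 == reg || i->src2 == reg);
}

/* Returns a bitmask of the register dependences that `later` has on `earlier`
 * (program order: earlier, then later). */
int classify(const Instr *earlier, const Instr *later) {
    int mask = DEP_NONE;
    if (reads(later, earlier->dst)) {
        mask |= DEP_RAW;               /* true data dependence */
    }
    if (reads(earlier, later->dst)) {
        mask |= DEP_WAR;               /* antidependence */
    }
    if (earlier->dst >= 0 && earlier->dst == later->dst) {
        mask |= DEP_WAW;               /* output dependence */
    }
    return mask;
}
```

With F registers numbered by their index, ADD.D F2,F4,F6 followed by SUB.D F6,F8,F10 yields DEP_WAR, the pair that both write F2 yields DEP_WAW, and the renamed version using F12 yields DEP_NONE. A store such as S.D F4,0(R1) is modeled with no destination register, so ADD.D F4,F0,F2 followed by that store yields DEP_RAW.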
Control Dependence
Ex)
if p1 {
    S1;
}
if p2 {
    S2;
}
S1 is control dependent on p1.
S2 is control dependent on p2, but not on p1.
Ex)
        DADDU  R2, R3, R4
        LW     R1, 0(R2)      may cause a memory protection exception
        BEQZ   R2, L1
L1:
Data flow: the flow of data values among the instructions that produce results and the instructions that consume them.
Branches make the data flow dynamic: preserving data dependences alone is not sufficient for correctness.
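Ex)
        DADDU  R1, R2, R3
        BEQZ   R4, L
        DSUBU  R1, R5, R6
L:      OR     R7, R1, R8
With DSUBU moved above the branch, OR always reads the DSUBU result, even when the branch is taken:
        DADDU  R1, R2, R3
        DSUBU  R1, R5, R6
        BEQZ   R4, L
L:      OR     R7, R1, R8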
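To make the difference between the two orderings concrete, the example above can be modeled in C. This is an illustrative sketch only: the function names `or_original` and `or_hoisted` are hypothetical, registers are passed as plain 64-bit values, and the return value stands for R7.

```c
#include <stdint.h>

/* Original order: DSUBU executes only when the branch is not taken. */
uint64_t or_original(uint64_t r2, uint64_t r3, uint64_t r4,
                     uint64_t r5, uint64_t r6, uint64_t r8) {
    uint64_t r1 = r2 + r3;      /* DADDU R1, R2, R3 */
    if (r4 != 0) {              /* BEQZ R4, L: skip DSUBU when R4 == 0 */
        r1 = r5 - r6;           /* DSUBU R1, R5, R6 */
    }
    return r1 | r8;             /* L: OR R7, R1, R8 */
}

/* DSUBU moved above the branch: R1 is overwritten on every path. */
uint64_t or_hoisted(uint64_t r2, uint64_t r3, uint64_t r4,
                    uint64_t r5, uint64_t r6, uint64_t r8) {
    uint64_t r1 = r2 + r3;      /* DADDU R1, R2, R3 */
    r1 = r5 - r6;               /* DSUBU R1, R5, R6 */
    (void)r4;                   /* the branch no longer guards anything */
    return r1 | r8;             /* L: OR R7, R1, R8 */
}
```

The two functions agree whenever R4 is nonzero (branch not taken). When R4 is zero they differ: the original passes the DADDU result to OR, the hoisted version passes the DSUBU result. The reordering changes which producer feeds OR, so the data flow is not preserved.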
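Violating a control dependence does not always affect the exception behavior or the data flow.
Ex)
            DADDU  R1, R2, R3
            BEQZ   R12, skipnext
            DSUBU  R4, R5, R6
            DADDU  R5, R4, R9
skipnext:   OR     R7, R8, R9
With DSUBU moved above the branch:
            DADDU  R1, R2, R3
            DSUBU  R4, R5, R6
            BEQZ   R12, skipnext
            DADDU  R5, R4, R9
skipnext:   OR     R7, R8, R9
If R4 is not used after skipnext (it is dead there) and DSUBU cannot raise an exception, moving DSUBU above the branch changes neither the data flow nor the exception behavior.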
Example)
for (i=1000; i>0; i=i-1)
x[i] = x[i] + s;
Assembly code:
Loop:   L.D     F0, 0(R1)
        ADD.D   F4, F0, F2
        S.D     F4, 0(R1)
        DADDUI  R1, R1, #-8
        BNE     R1, R2, Loop
Instruction producing result   Instruction using result   Latency in clock cycles
FP ALU op                      FP ALU op                  3
FP ALU op                      Store double               2
Load double                    FP ALU op                  1
Load double                    Store double               0
Example)
Pipeline stall: dependent instructions should be separated by a number of cycles equal to the latency.

                                   Clock cycle
Loop:   L.D     F0, 0(R1)          1
        stall                      2
        ADD.D   F4, F0, F2         3
        stall                      4
        stall                      5
        S.D     F4, 0(R1)          6
        DADDUI  R1, R1, #-8        7
        stall                      8
        BNE     R1, R2, Loop       9
# of cycles per iteration: 9
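The stall count above can be reproduced with a small in-order issue model. This is a sketch under stated assumptions, not the slides' method. It assumes one instruction issues per cycle, considers only true (RAW) register dependences, and ignores memory dependences and structural hazards. The 1-cycle delay from an integer ALU op to a dependent branch is inferred from the stall before BNE. The types `Kind` and `Instr`, the register numbering and the function `cycles_per_iteration` are hypothetical.

```c
/* Instruction classes used in the loop example. */
typedef enum { LOAD_D, FP_ALU, STORE_D, INT_ALU, BRANCH } Kind;

/* Hypothetical instruction record. Registers are small integers
 * (here F0..F31 -> 0..31 and R0..R31 -> 32..63); -1 means "unused". */
typedef struct {
    Kind kind;
    int dst;
    int src1;
    int src2;
} Instr;

enum { NUM_REGS = 64, MAX_INSTRS = 64 };

/* Stall cycles needed between a producer and a dependent consumer.
 * The integer-op-to-branch entry is an assumption (see lead-in). */
static int latency(Kind producer, Kind consumer) {
    if (producer == FP_ALU && consumer == FP_ALU) return 3;
    if (producer == FP_ALU && consumer == STORE_D) return 2;
    if (producer == LOAD_D && consumer == FP_ALU) return 1;
    if (producer == INT_ALU && consumer == BRANCH) return 1;
    return 0;
}

/* Earliest cycle at which `in` may issue as far as source `reg` is concerned. */
static int ready_cycle(const Instr *code, const int *writer, const int *issue,
                       const Instr *in, int reg) {
    if (reg < 0 || writer[reg] < 0) return 1;   /* no producer inside the body */
    int p = writer[reg];
    return issue[p] + 1 + latency(code[p].kind, in->kind);
}

/* Issue cycle (1-based) of the last instruction in `code`, i.e. the number of
 * cycles one pass through the loop body takes. Returns -1 if n is out of range.
 * Register numbers must be below NUM_REGS. */
int cycles_per_iteration(const Instr *code, int n) {
    int writer[NUM_REGS];    /* index of the latest writer of each register */
    int issue[MAX_INSTRS];   /* issue cycle of each instruction */
    if (n <= 0 || n > MAX_INSTRS) return -1;
    for (int r = 0; r < NUM_REGS; r++) writer[r] = -1;
    for (int i = 0; i < n; i++) {
        int cycle = (i == 0) ? 1 : issue[i - 1] + 1;   /* in order, one per cycle */
        int a = ready_cycle(code, writer, issue, &code[i], code[i].src1);
        int b = ready_cycle(code, writer, issue, &code[i], code[i].src2);
        if (a > cycle) cycle = a;
        if (b > cycle) cycle = b;
        issue[i] = cycle;
        if (code[i].dst >= 0) writer[code[i].dst] = i;
    }
    return issue[n - 1];
}
```

Encoding the five-instruction loop in program order (L.D, ADD.D, S.D, DADDUI, BNE) and calling cycles_per_iteration on it gives 9, matching the count above. The same function can be applied to a reordered or unrolled body to compare schedules.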
Example)
Scheduling:
[Figure: scheduled version of the loop, clock cycles 1 to 7]
Loop:   L.D     F0, 0(R1)
        ADD.D   F4, F0, F2
        S.D     F4, 0(R1)
        L.D     F6, -8(R1)
        ADD.D   F8, F6, F2
        S.D     F8, -8(R1)
        L.D     F10, -16(R1)
        ADD.D   F12, F10, F2
        S.D     F12, -16(R1)
        L.D     F14, -24(R1)
        ADD.D   F16, F14, F2
        S.D     F16, -24(R1)
        DADDUI  R1, R1, #-32
        BNE     R1, R2, Loop
                                   Clock cycle
Loop:   L.D     F0, 0(R1)          1
        stall                      2
        ADD.D   F4, F0, F2         3
        stall                      4
        stall                      5
        S.D     F4, 0(R1)          6
        L.D     F6, -8(R1)         7
        stall                      8
        ADD.D   F8, F6, F2         9
        stall                      10
        stall                      11
        S.D     F8, -8(R1)         12
        L.D     F10, -16(R1)       13
        stall                      14
        ADD.D   F12, F10, F2       15
        stall                      16
        stall                      17
        S.D     F12, -16(R1)       18
        L.D     F14, -24(R1)       19
        stall                      20
        ADD.D   F16, F14, F2       21
        stall                      22
        stall                      23
        S.D     F16, -24(R1)       24
        DADDUI  R1, R1, #-32       25
        stall                      26
        BNE     R1, R2, Loop       27
                                   Clock cycle
Loop:   L.D     F0, 0(R1)          1
        L.D     F6, -8(R1)         2
        L.D     F10, -16(R1)       3
        L.D     F14, -24(R1)       4
        ADD.D   F4, F0, F2         5
        ADD.D   F8, F6, F2         6
        ADD.D   F12, F10, F2       7
        ADD.D   F16, F14, F2       8
        S.D     F4, 0(R1)          9
        S.D     F8, -8(R1)         10
        DADDUI  R1, R1, #-32       11
        S.D     F12, 16(R1)        12
        S.D     F16, 8(R1)         13
        BNE     R1, R2, Loop       14
Boundary condition
Example)
for (i=0; i<n; i++)
x[i] = x[i]+s;
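When n is not known to be a multiple of the unroll factor, one common way to handle the boundary is to split the loop in two: a first loop runs n mod k iterations of the original body, and a second loop runs the unrolled body n / k times. The sketch below assumes k = 4. The function name `add_scalar_unrolled` is hypothetical.

```c
#include <stddef.h>

/* Adds s to every element of x[0..n-1] using a loop unrolled 4 times. */
void add_scalar_unrolled(double *x, size_t n, double s) {
    size_t i = 0;
    size_t rem = n % 4;

    /* First loop: n mod 4 iterations of the original body. */
    for (; i < rem; i++) {
        x[i] = x[i] + s;
    }

    /* Second loop: the remaining count is a multiple of 4, so the unrolled
     * body never runs past the end of the array. */
    for (; i < n; i += 4) {
        x[i]     = x[i]     + s;
        x[i + 1] = x[i + 1] + s;
        x[i + 2] = x[i + 2] + s;
        x[i + 3] = x[i + 3] + s;
    }
}
```

For n = 7, the first loop handles 3 elements and the unrolled body runs once. For n smaller than 4, only the first loop runs.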