Ali Mustafa Zaidi
Supervisor: Dr. David Greaves
University of Cambridge Computer Laboratory
Utilization Wall
2.1GHz, 90nm (80W)
18%
45nm → 8nm (32x resources)
Dark Silicon
CPU: 3.5x, GPU: 2.4x (Cnsrv.); CPU: 7.9x, GPU: 2.7x (ITRS)
Esmaeilzadeh et al., "Dark Silicon and the End of Multicore Scaling", IEEE Micro 2012.
Advantages
Scalable, decentralized architectures, with short, point-to-point wiring.
High computational density.
10-1000x energy and performance efficiency.
Issues
Poor programmability: often requires low-level hardware knowledge.
Limited amenability: poor performance on sequential, irregular, or complex control-flow code.
Examples
Conservation Cores: performance ≈ in-order MIPS24KE core.
Phoenix CASH hardware: performance 30% less than a 4-way OOO core.
[Figure: control-flow graph of the example loop; node labels: A[i] > 0, T, foo(), i++ < 100, bar(), End]
Single flow of control.
If-conversion & hyperblock formation for forward branches.
No acceleration of backwards branches.
McFarlin et al., "Discerning the dominant out-of-order performance advantage: is it speculation or dynamism?", ASPLOS '13.
Our Solution
[Figure: graph of the example loop; node labels: i, A[i] > 0, T, i++ < 100, bar(), End]
Instead of "Basic Blocks + Control Flow", we have "Nested Subgraphs + Dataflow".
Functions → nested subgraphs.
Loops → tail-recursive functions.
Multiple subgraphs may execute concurrently in dataflow order (unlike basic blocks).
Exposes Multiple Flows of Control.
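As a software analogy only (the slides describe a hardware IR, not Python), the example loop from the figure can be written once as an ordinary loop and once as a tail-recursive function. The bodies of foo() and bar(), the `log` list, and the parameter `n` are hypothetical stand-ins introduced for illustration; the loop shape (test A[i] > 0, then i++ < 100) is an assumption read off the figure labels.

```python
def foo(log, i):
    # Hypothetical stand-in for foo(): record that it ran for index i.
    log.append(("foo", i))


def bar(log):
    # Hypothetical stand-in for bar(): record that the loop has exited.
    log.append(("bar",))


def loop_iterative(A, log, n=100):
    """Control-flow form: a backwards branch repeats the body while i < n."""
    i = 0
    while True:
        if A[i] > 0:
            foo(log, i)
        i += 1
        if not i < n:
            break
    bar(log)


def loop_body(A, i, log, n=100):
    """Tail-recursive form: the loop body is a function that calls itself last."""
    if A[i] > 0:
        foo(log, i)
    i += 1
    if i < n:
        return loop_body(A, i, log, n)  # tail call replaces the backwards branch
    return i


def loop_tail_recursive(A, log, n=100):
    loop_body(A, 0, log, n)
    bar(log)
```

Both versions perform the same calls in the same order; the recursive one simply has no backwards branch, which is the property the nested-subgraph representation relies on.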
Infinite DAG
Loops represented as tail recursion.
Branches represented via if-conversion.
Enables Aggressive Speculation.
Instead, control implemented via "Boolean Predicate Expressions".
Logic minimization can simplify expressions, facilitating Control Dependence Analysis.
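A minimal sketch of why minimizing predicate expressions helps control dependence analysis, assuming a hypothetical nested branch `if p: (if q: X else: Y); W` followed by an unconditional `Z`. The predicate names and the brute-force truth-table check are illustrative assumptions, not the minimization method used by the toolchain.

```python
from itertools import product


def equivalent(f, g, nvars):
    """Return True if two Boolean functions agree on every input combination."""
    return all(f(*bits) == g(*bits)
               for bits in product([False, True], repeat=nvars))


# Predicates as they fall out of if-conversion (one term per incoming path).
def pred_x(p, q):
    return p and q


def pred_w_raw(p, q):
    return (p and q) or (p and not q)


def pred_z_raw(p, q):
    return (p and q) or (p and not q) or (not p)


# Minimized forms: W depends only on p, and Z depends on no branch at all.
def pred_w_min(p, q):
    return p


def pred_z_min(p, q):
    return True
```

Here `equivalent(pred_z_raw, pred_z_min, 2)` holds, so Z is control-independent of both branches and need not wait for either condition to resolve.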
Subgraphs may be "predicated", or executed speculatively (via "if-conversion").
"Flattening" loop tail-call subgraphs → loop unrolling/pipelining.
Multiple loops in a loop-nest may be unrolled independently to expose ILP.
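The flattening idea can be sketched in software, under the assumption that one loop iteration is a pure function returning its next state plus a "continue" predicate. The loop below (summing the positive elements of a list) and all names in it are hypothetical; unrolling by two is modeled as inlining a second copy of the body, predicated on the first copy's continue signal.

```python
def step(A, i, acc):
    """One iteration: returns (next_i, next_acc, continue_predicate)."""
    # If-converted body: a select instead of a forward branch.
    acc = acc + A[i] if A[i] > 0 else acc
    i = i + 1
    return i, acc, i < len(A)


def run_rolled(A):
    """Reference version: one copy of the body per trip."""
    i, acc, go = 0, 0, len(A) > 0
    while go:
        i, acc, go = step(A, i, acc)
    return acc


def run_unrolled_by_2(A):
    """Two copies of the body per trip; the second copy is predicated."""
    i, acc, go = 0, 0, len(A) > 0
    while go:
        i1, acc1, go1 = step(A, i, acc)
        if go1:
            # Second copy runs only when the first says the loop continues.
            i, acc, go = step(A, i1, acc1)
        else:
            i, acc, go = i1, acc1, go1
    return acc
```

In hardware the second copy could instead be triggered speculatively and its result discarded when the predicate is false; this sketch models only the predicated variant.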
LLVM IR:
  %1 = mul i32 %x, %y
  %2 = srem i32 %1, %z
  %3 = icmp slt i32 %2, %1

Equivalent Bluespec code:
  rule mul_inst;
    let val1 = ...;
    let val2 = ...;
    let rslt = ...;
    srem_1.enq(...);
    icmp_1.enq(...);
  endrule

  rule srem_inst;
    let val1 = srem_1.first; srem_1.deq;
    let val2 = z.first; z.deq;
    let rslt = val1 % val2;
    icmp_2.enq(rslt);
  endrule
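To illustrate how such rules fire in dataflow order, here is a small Python model in which each rule fires whenever all of its input FIFOs hold a token. It is a sketch under assumptions: the input FIFO names `x` and `y`, the output FIFO `out`, and the whole `icmp_inst` rule are hypothetical (the slide elides or omits them), and 32-bit wraparound is ignored.

```python
from collections import deque


def make_fifos():
    names = ("x", "y", "z", "srem_1", "icmp_1", "icmp_2", "out")
    return {name: deque() for name in names}


def srem(a, b):
    """Signed remainder that truncates toward zero, like LLVM's srem."""
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return a - b * q


def mul_inst(f):
    if f["x"] and f["y"]:                      # fire only when both inputs are ready
        rslt = f["x"].popleft() * f["y"].popleft()
        f["srem_1"].append(rslt)               # %1 feeds srem ...
        f["icmp_1"].append(rslt)               # ... and icmp
        return True
    return False


def srem_inst(f):
    if f["srem_1"] and f["z"]:
        rslt = srem(f["srem_1"].popleft(), f["z"].popleft())
        f["icmp_2"].append(rslt)
        return True
    return False


def icmp_inst(f):
    # Assumed rule for %3 = icmp slt %2, %1 (not shown on the slide).
    if f["icmp_2"] and f["icmp_1"]:
        f["out"].append(f["icmp_2"].popleft() < f["icmp_1"].popleft())
        return True
    return False


def run(f):
    # Rules are listed out of program order on purpose: firing is data-driven.
    rules = (icmp_inst, srem_inst, mul_inst)
    while any(rule(f) for rule in rules):
        pass
```

For example, pushing 7, 3 and 5 into `x`, `y` and `z` and calling `run` leaves a single `True` in `out`, since 21 srem 5 is 1 and 1 < 21.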
LegUp HLS tool & Altera Nios II/f processor, implemented on an Altera Stratix IV GX FPGA.
Nehalem Core i7 (Sniper interval simulator from Intel).
In all cases, memory access latency assumed = 1 cycle.
LegUp: LLVM 2.9, O2, no LTO, no LTI; no op chaining; statically scheduled CDFG.
Our toolchain: LLVM 2.[digit illegible], O2, no LTO, no LTI; no op chaining; dynamically scheduled VSFG.
Frequency & Delay
[Figure: two bar charts over the benchmarks epic, adpcm, dfadd, dfdiv, dfmul, dfsin, mips and small_bimpa, comparing LegUp with VSFG_0, VSFG_1 and VSFG_3. First panel: frequency in MHz (higher is better), axis from 0 to 450. Second panel: axis from 0 to 1.4.]
Normalized Energy
[Figure: bar chart on a logarithmic axis (0.1 to 100) over epic, adpcm, dfadd, dfdiv, dfmul, dfsin, mips and GEOMEAN, with series LegUp, VSFG_0, VSFG_1, VSFG_3 and Nios]
Balance between speculation & predication must be found for efficiency & performance.
Power gating for predicated regions to reduce static power; selective loop unrolling.
Limitations on Performance
35% better performance than statically scheduled CDFG, without any optimizations:
Improvements due to dynamic scheduling, MFC (Multiple Flows of Control) & CDA (Control Dependence Analysis).
Unrolling helps, but speed-up saturates quickly.
Balance between predication & speculation, to improve speed-up without unrolling (thus reducing area and energy costs).
State-edge is on critical path → limits both unrolling & MFC.
Thank You
[Figure: graph of the example loop with nodes A[i], > 0, foo(), i++ < 100 and bar(), and a STATE_OUT edge]
Cycle counts normalized to LegUp results.
VSFG implemented with all loops unrolled 0, 1, and 3 times.
Full Speculation: all subgraphs (except loops) triggered without predicates.
[Figures: bar charts for dfsin, small_bimpa, dfmul and mips, comparing Core i7, Nios II/f, LegUp, VSFG_0, VSFG_1 and VSFG_3]
Lam & Wilson (1992), Mak & Mycroft (2009): [value illegible] ILP possible, with: [list not recovered]
Toolchain flow: LLVM → VSFG → Low-Level IR → Bluespec SystemVerilog → ASIC / FPGA
[Figure: dataflow nodes mul %1, srem %2 and icmp %3, shown with the LLVM IR and its equivalent Bluespec code]