
CS521, CSE IITG, 11/23/2012

Parallel Architecture

All are similar terminology
  Multicore (connected internally inside a chip)
  Multiprocessor (connected via bus / on a motherboard)
  Multicomputer (connected via LAN)

A Sahu

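Saturation of single processor performance
  Speed limit not to be crossed: 4 GHz
    The ultimate point
  Power consumption is proportional to the square of the frequency
    $P \propto C \cdot V \cdot f^2$
    $P = \alpha \cdot C \cdot V \cdot f^2$
  Single processor
    Branch prediction accuracy has gone up to 95%
    L1 cache hit rate has gone up to 80%
    ILP exploited by a uniprocessor is up to 8 (mostly)
    Thread/data level parallelism needs to be exploited

Power vs frequency
  [Figure: clock frequency (GHz, ticks at 1, 2 and 3) by processor generation: 8086, 80386, Pentium (350 nm), PIII (180 nm), Pentium 4 (130 nm), Pentium M (90 nm), Xeon (65 nm), ending at the Power Wall]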
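To make the frequency argument concrete, here is a minimal Python sketch. It assumes the slide's relation (power proportional to the square of frequency, with capacitance and voltage held fixed); the function name `relative_power` and the 1 GHz reference point are illustrative choices, not part of the lecture.

```python
def relative_power(f_ghz: float, f_ref_ghz: float = 1.0) -> float:
    """Power relative to a reference clock, ASSUMING P is proportional to f^2
    with capacitance C and voltage V held fixed (the slide's relation)."""
    if f_ghz <= 0 or f_ref_ghz <= 0:
        raise ValueError("frequencies must be positive")
    return (f_ghz / f_ref_ghz) ** 2


if __name__ == "__main__":
    # Compare a few clock rates against a hypothetical 1 GHz baseline.
    for f in (1.0, 2.0, 3.0, 4.0):
        print(f"{f:.0f} GHz -> {relative_power(f):.0f}x the power at 1 GHz")
```

Under this assumption, going from 1 GHz to 4 GHz costs 16 times the power for 4 times the clock rate, which is why several slower cores can look more attractive than one faster core.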


Power vs frequency
  [Figure: work required, leading to the era of parallelism]

Application specific IC (ASIC)
  High performance, lower power than a processor
  But the complexity of ASIC design is very high
  Example: 12 MP + HD video, GPS camera inside a mobile handset
  It is fixed for an application



VLSI technology offering high integration density
  Moore's Law (in 1965, Gordon Moore's prediction)
    Exponential growth of the number of transistors on an IC
    Doubled every 26 months for the past three decades
  Why more transistors per IC? Smaller transistors, larger dice

Many applications are highly parallel
  Take benefit of all parallelism (instruction, data and thread)
Multiprocessors
  Flexible, programmable, high performance
  Take benefit of all parallelism (instruction, data and thread)
  Likely to be cost/power effective solutions


Multiprocessors are flexible, programmable, high performance
  Processors are programmable as compared to ASIC
  Flexible in terms of portability as compared to ASIC
  Higher performance than a single processor

Multiprocessors are likely to be cost/power effective solutions
  Share lots of resources
    A personal room is costlier than a dormitory
    You cannot allocate a bungalow to each student: it will be too costly
    A hostel room with shared facilities is sufficient
  Need not require very high frequency to run
  Lots of replication makes it easy to manage and cost effective in design

Multiprocessors are likely to be cost/power effective solutions
  Because they share lots of resources
    A personal room is costlier than a dormitory
  Sharing resources gives rise to many other problems
    Critical sections
      Lock and barrier design
    Coherence
      Shared data should be the same at all places
    Consistency
      Order should be similar to serial (ROB)
    One processor interferes with others
      Share efficiently using some policy

Part of ACA Course @ IITG
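The critical-section problem named above can be illustrated with a small Python sketch. It uses the standard `threading.Lock` and `threading.Barrier`; the function `run_demo`, the thread count and the increment count are hypothetical values picked for illustration, and this is only a demonstration of the idea, not the lock or barrier design covered in the course.

```python
import threading


def run_demo(num_threads: int = 4, increments: int = 10_000):
    """Increment a shared counter from several threads.

    The lock serializes the critical section (the read-modify-write on
    `counter`); the barrier makes every thread wait until all threads
    have finished incrementing before any of them reads the total.
    """
    counter = 0
    lock = threading.Lock()
    barrier = threading.Barrier(num_threads)
    observed = []  # value of the counter as seen by each thread after the barrier

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:            # critical section: one thread at a time
                counter += 1
        barrier.wait()            # all increments are complete past this point
        with lock:
            observed.append(counter)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter, observed


if __name__ == "__main__":
    total, seen = run_demo()
    print("final counter:", total)
    print("values seen after the barrier:", seen)
```

Every thread sees the same final total after the barrier, because no thread can pass it until all increments are done. The time spent waiting on the lock is serialized work, which is the cost of sharing that the slide points to.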


Many applications are highly parallel
  Take benefit of all parallelism (instruction, data and thread)
  Most coders write sequential code
  Who will extract parallelism from applications?
    There is no successful auto-parallelisation tool to date
    Attempts: Cetus, SUIF, Solaris CC
  Excluded part of ACA Course @ IITG
    Not in detail
    Related to compilers

Task scheduling in multiprocessors
  Deterministic task scheduling on a multiprocessor with more than 2 processors is an NP-complete problem
  4 tasks (A, B, C and D), 3 processors
    {A,B,C,D}{}{}, {A,B,C}{D}{}, ...... exponentially many solutions
  Part of ACA Course @ IITG
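The exponential solution space for 4 tasks on 3 processors can be sketched by brute force in Python. The task durations below are hypothetical (the slide gives none), the tasks are assumed independent with no precedence constraints, and `best_schedule` is an illustrative name. Exhaustive search is used only to show the size of the search space; it is not a practical scheduler.

```python
from itertools import product


def best_schedule(durations, num_processors):
    """Try every assignment of tasks to processors and keep the one with
    the smallest makespan (finish time of the most loaded processor).

    With n tasks and m processors there are m**n assignments, which is
    why exhaustive search does not scale.
    """
    tasks = list(durations)
    best_makespan = float("inf")
    best_assignment = None
    explored = 0
    for assignment in product(range(num_processors), repeat=len(tasks)):
        explored += 1
        load = [0] * num_processors
        for task, proc in zip(tasks, assignment):
            load[proc] += durations[task]
        makespan = max(load)
        if makespan < best_makespan:
            best_makespan = makespan
            best_assignment = dict(zip(tasks, assignment))
    return best_makespan, best_assignment, explored


if __name__ == "__main__":
    # Hypothetical durations, chosen only for illustration.
    durations = {"A": 4, "B": 3, "C": 2, "D": 2}
    makespan, assignment, explored = best_schedule(durations, 3)
    print("assignments explored:", explored)
    print("best makespan:", makespan)
    print("task -> processor:", assignment)
```

For 4 tasks and 3 processors the loop visits 3^4 = 81 assignments; 20 tasks on the same machine would already need about 3.5 billion.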

Function-parallel architectures
  Instruction level PAs (ILPs)
    Pipelined processors
    VLIWs
    Superscalar processors
  Thread level PAs and process level PAs (MIMDs, built using general purpose processors)
    Shared memory MIMD
    Distributed memory MIMD

[Figure: speedup versus grain size. Fine grain is overhead limited; coarse grain is limited by load imbalance and parallelism; speedup peaks at the optimal grain size]

$T_1$ = time on a uniprocessor
$T_p$ = time on $p$ processors

Speedup: $S_p = \dfrac{T_1}{T_p}$
Efficiency: $E_p = \dfrac{T_1}{p \, T_p}$

Usually $S_p < p$ or $E_p < 1$ due to overheads
Sometimes superlinear speedup ($S_p > p$ or $E_p > 1$) is reported
This may be due to
  failure to use the best uniprocessor algorithm
  advantage due to larger memory

Example: 500 units of work, made of three serial phases of 100 and two parallel phases of 100 each
  Processors   Each parallel phase    Work   Time   S_p
  1            100                    500    500    1X
  2            50 + 50                500    400    1.25X
  4            25 + 25 + 25 + 25      500    350    1.4X
  infinite     time 0                 500    300    1.7X
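The speedup and efficiency definitions can be checked against the 500-unit example with a few lines of Python. This is a sketch under a simplifying assumption that is not stated in the slides: each parallel phase divides perfectly evenly across the processors with no overhead. The function names are illustrative.

```python
def parallel_time(serial_phases, parallel_phases, p):
    """Total time on p processors, ASSUMING each parallel phase splits
    evenly across p processors with no overhead; serial phases run on one."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(serial_phases) + sum(w / p for w in parallel_phases)


def speedup(t1, tp):
    """S_p = T_1 / T_p"""
    return t1 / tp


def efficiency(t1, tp, p):
    """E_p = T_1 / (p * T_p)"""
    return t1 / (p * tp)


if __name__ == "__main__":
    serial = [100, 100, 100]   # three serial phases from the example
    parallel = [100, 100]      # two parallelizable phases from the example
    t1 = parallel_time(serial, parallel, 1)
    for p in (1, 2, 4):
        tp = parallel_time(serial, parallel, p)
        print(f"p={p}: time={tp:.0f}, "
              f"speedup={speedup(t1, tp):.2f}, "
              f"efficiency={efficiency(t1, tp, p):.2f}")
    # With unlimited processors only the serial phases remain.
    print(f"p -> infinity: time={sum(serial)}, "
          f"speedup={speedup(t1, sum(serial)):.2f}")
```

The times come out as 500, 400 and 350 for 1, 2 and 4 processors, matching the example, while efficiency drops from 1.0 to 0.625 with only two processors because the serial phases leave the extra processor idle.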


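Serial fraction: $s = \dfrac{T_s}{T_1}$

$T_p = T_s + \dfrac{T_1 - T_s}{p}$

$S_p = \dfrac{T_1}{T_p} = \dfrac{T_1}{T_s + \frac{T_1 - T_s}{p}} = \dfrac{T_1}{T_s \left(1 - \frac{1}{p}\right) + \frac{T_1}{p}}$

$S_p = \dfrac{1}{s \left(1 - \frac{1}{p}\right) + \frac{1}{p}} = \dfrac{p}{s (p - 1) + 1}$

$\lim_{p \to \infty} S_p = \dfrac{1}{s}$

[Figure: speedup $S_p$ versus serial fraction $s$ (ticks at 0.5 and 1), falling from $S_p = p$ at $s = 0$ to $S_p = 1$ at $s = 1$]

Next Class
  Critical section access in a parallel region needs to be serialized
  Computers are becoming powerful enough to handle larger tasks
    Task multiplication instead of division
  In the presence of a memory hierarchy it behaves differently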
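The closed form for speedup in terms of the serial fraction, $S_p = p / (s(p-1) + 1)$, is easy to evaluate numerically. The Python sketch below uses an illustrative serial fraction of 0.6 (300 serial units out of 500, as in the worked example); the function name `amdahl_speedup` is a label chosen here, not taken from the slides.

```python
def amdahl_speedup(s: float, p: int) -> float:
    """Speedup on p processors for serial fraction s:
    S_p = p / (s * (p - 1) + 1)."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("serial fraction s must lie in [0, 1]")
    if p < 1:
        raise ValueError("p must be at least 1")
    return p / (s * (p - 1) + 1)


if __name__ == "__main__":
    s = 0.6  # illustrative serial fraction
    for p in (1, 2, 4, 16, 1024):
        print(f"p={p:5d}: S_p = {amdahl_speedup(s, p):.3f}")
    print(f"limit as p grows: 1/s = {1 / s:.3f}")
```

The printed values approach 1/s but never exceed it, so with 60% serial work no number of processors gives even a 2x speedup. The two end points of the formula also match the plot: s = 0 gives S_p = p and s = 1 gives S_p = 1.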

