Parallel Architecture

All are similar terminology
  Multicore (connected internally inside a chip)
  Multiprocessor (connected via bus, on a motherboard)
  Multicomputer (connected via LAN)
ASahu 1 ASahu 2
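Saturation of single-processor performance
  Speed limit not to be crossed: 4 GHz, the ultimate point
  Power consumption grows at least as the square of frequency
    Dynamic power: P = α*C*V^2*f (α: activity factor, C: switched capacitance, V: supply voltage, f: clock frequency)
    A higher f needs a higher V, so power rises much faster than f itself
  Single processor
    Branch prediction accuracy has gone up to 95%
    L1 cache hit rates have gone up to 80%
    ILP exploited by a uniprocessor is up to 8 (mostly)
  Thread-level and data-level parallelism need to be exploited

Power vs. frequency
[Figure: clock frequency (axis ticks at 1 GHz, 2 GHz, 3 GHz) across processor generations, flattening at the "Power Wall". Labelled points, oldest first: 8086, 80386, Pentium (350 nm), PIII (180 nm), Pentium 4 (130 nm), Pentium M (90 nm), Xeon (65 nm).]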
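
The link between power and frequency on this slide can be explored numerically. The following is a minimal sketch, assuming the standard CMOS dynamic power model P = α·C·V²·f. The function name `dynamic_power` and all constants are hypothetical, illustrative values, not data from the slides, and the linear scaling of V with f is a simplifying assumption.

```python
def dynamic_power(alpha, capacitance, voltage, frequency):
    """Dynamic switching power in watts: P = alpha * C * V^2 * f."""
    return alpha * capacitance * voltage ** 2 * frequency


# Illustrative (assumed) constants, not measurements.
ALPHA = 0.1   # activity factor
CAP = 1e-7    # switched capacitance in farads

# Baseline: 2 GHz at 1.0 V.
base = dynamic_power(ALPHA, CAP, 1.0, 2e9)

# Doubling f with the voltage held fixed doubles the power.
same_v = dynamic_power(ALPHA, CAP, 1.0, 4e9)

# If V has to rise in proportion to f (assumption), power grows as f^3.
scaled_v = dynamic_power(ALPHA, CAP, 2.0, 4e9)

print("fixed V, 2x f:  power ratio =", same_v / base)
print("scaled V, 2x f: power ratio =", scaled_v / base)
```

Under these assumptions, doubling the clock costs between two and eight times the power, which is why adding cores at a moderate frequency is attractive.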
Power vs. frequency
[Figure labels: "Work Required", "Era of Parallelism"]

Application-specific IC (ASIC)
  Higher performance and lower power than a processor
  But the complexity of ASIC design is very high
  Example: 12 MP + HD video, GPS camera inside a mobile handset
  It is fixed for one application
CS521, CSE, IITG 11/23/2012
VLSI technology offering high integration density
  Moore's Law (in 1965, Gordon Moore's prediction)
    Exponential growth of the number of transistors on an IC
    Doubled every 26 months for the past three decades
  Why more transistors per IC? Smaller transistors, larger dice

Many applications are highly parallel
  Take benefit of all parallelism (instruction, data and thread)
Multiprocessors
  Flexible, programmable, high performance
  Take benefit of all parallelism (instruction, data and thread)
  Likely to be cost/power effective solutions
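
The Moore's Law figures quoted on the VLSI slide (doubling every 26 months, sustained for three decades) imply a concrete growth factor. A minimal sketch of that arithmetic follows; the function name `growth_factor` is hypothetical, and the 26-month period is taken from the slide rather than from measured data.

```python
def growth_factor(years, doubling_months=26):
    """Factor by which transistor count grows after `years`,
    assuming one doubling every `doubling_months` months."""
    doublings = years * 12 / doubling_months
    return 2 ** doublings


thirty_years = growth_factor(30)
print(f"doublings in 30 years: {30 * 12 / 26:.2f}")
print(f"growth factor:         {thirty_years:,.0f}x")
```

Three decades at this rate come to a little under 14 doublings, a growth factor on the order of ten thousand.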
Multiprocessors are flexible, programmable, high performance
  Processors are programmable as compared to an ASIC
  Flexible in terms of portability as compared to an ASIC
  Higher performance than a single processor
    Need not run at a very high frequency
    Lots of replication makes them easy to manage and cost effective in design

Multiprocessors are likely to be cost/power effective solutions
  They share lots of resources
  A personal room is costlier than a dormitory
  You can't allocate a bungalow to each student: it would be too costly
  A hostel room with shared facilities is sufficient
Multiprocessors are likely to be cost/power effective solutions
  Because they share lots of resources
  A personal room is costlier than a dormitory
Sharing resources gives rise to many other problems
  Critical sections
    Lock and barrier design
  Coherence
    Shared data should be the same at all places
  Consistency
    Order should be similar to serial order (ROB)
  One processor interferes with others
    Share efficiently using some policy

Part of ACA Course @ IITG
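
The critical-section problem listed above can be illustrated with a small shared-counter program. This is a minimal sketch using Python's standard `threading.Lock`; the names `counter`, `add_many` and the thread and iteration counts are hypothetical choices for illustration, not part of the course material.

```python
import threading

counter = 0                # shared data touched by every thread
lock = threading.Lock()    # guards the critical section below


def add_many(n):
    """Increment the shared counter n times, one locked update at a time."""
    global counter
    for _ in range(n):
        with lock:         # critical section: only one thread at a time
            counter += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

The read-modify-write on `counter` is the part that must be serialized. Everything outside the `with lock:` block can still run in parallel.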
Many applications are highly parallel
  Take benefit of all parallelism (instruction, data and thread)
  Most coders write sequential code
  Who will extract parallelism from applications?
  There is no successful auto-parallelisation tool to date
    Attempts: Cetus, SUIF, Solaris CC
  Excluded part of ACA Course @ IITG
    Related to COMPILER

Task scheduling in multiprocessors
  Deterministic task scheduling on a multiprocessor with more than 2 processors is an NP-complete problem
  4 tasks (A, B, C and D), 3 processors
    {A,B,C,D}{}{}, {A,B,C}{D}{}, ...: exponentially many solutions
  Part of ACA Course @ IITG
    Not in detail
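
The 4-task, 3-processor example from the task scheduling slide can be enumerated directly to see how fast the search space grows. This is a minimal sketch, assuming the processors are distinguishable and ignoring task order within a processor; the helper name `all_assignments` is hypothetical.

```python
from itertools import product

tasks = ["A", "B", "C", "D"]
num_processors = 3


def all_assignments(tasks, num_processors):
    """Yield every way of placing each task on one of the processors."""
    for choice in product(range(num_processors), repeat=len(tasks)):
        groups = [[] for _ in range(num_processors)]
        for task, proc in zip(tasks, choice):
            groups[proc].append(task)
        yield groups


assignments = list(all_assignments(tasks, num_processors))
print(len(assignments))    # equals num_processors ** len(tasks)
print(assignments[0])
print(assignments[1])
```

With n tasks and p processors there are p^n such assignments, so brute-force search stops being practical even for modest n. A real scheduler would also have to account for task lengths and dependencies, which this sketch leaves out.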
Function-parallel architectures
  Instruction-level PAs (ILPs)
    Pipelined processors
    VLIWs
    Superscalar processors
  Thread-level PAs and process-level PAs (MIMDs, built using general-purpose processors)
    Shared-memory MIMD
    Distributed-memory MIMD

[Figure: speedup vs. grain size. Fine grain: overhead limited. Coarse grain: limited by load imbalance and parallelism. Speedup peaks at the optimal grain size.]
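Serial fraction:
\[ s = \frac{T_s}{T_1} \]

Parallel time on p processors:
\[ T_p = T_s + \frac{T_1 - T_s}{p} \]

Speedup:
\[ S_p = \frac{T_1}{T_p} = \frac{T_1}{T_s + \frac{T_1 - T_s}{p}} = \frac{T_1}{T_1 s \left(1 - \frac{1}{p}\right) + \frac{T_1}{p}} = \frac{1}{s \left(1 - \frac{1}{p}\right) + \frac{1}{p}} = \frac{p}{s(p - 1) + 1} \]

\[ \lim_{p \to \infty} S_p = \frac{1}{s} \]

[Figure: S_p versus serial fraction s, falling from S_p = p at s = 0 to S_p = 1 at s = 1; s-axis ticks at 0.5 and 1.]

Next Class
  Critical section access in a parallel region needs to be serialized
  Computers are becoming powerful enough to handle larger tasks
    Task multiplication instead of division
  In the presence of a memory hierarchy, speedup behaves differently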
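
The speedup relation derived on this slide, S_p = p / (s(p - 1) + 1) with limit 1/s, is the form usually called Amdahl's law, and it can be checked numerically. A minimal sketch follows; the function names `amdahl_speedup` and `speedup_limit` and the sample value s = 0.1 are hypothetical choices for illustration.

```python
def amdahl_speedup(s, p):
    """Speedup on p processors for serial fraction s: p / (s*(p-1) + 1)."""
    return p / (s * (p - 1) + 1)


def speedup_limit(s):
    """Limit of the speedup as p grows without bound: 1 / s (requires s > 0)."""
    return 1 / s


s = 0.1  # assumed serial fraction, for illustration only
for p in (1, 2, 4, 16, 256, 65536):
    print(f"p = {p:6d}  S_p = {amdahl_speedup(s, p):7.3f}")
print(f"limit as p -> infinity: {speedup_limit(s):.1f}")
```

The two endpoints of the plot fall out of the same function: `amdahl_speedup(0, p)` returns p, and `amdahl_speedup(1, p)` returns 1.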