
GPU Programming in MATLAB


By Jill Reese, MathWorks, and Sarah Zaranek, MathWorks
Multicore machines and hyper-threading technology have enabled scientists, engineers, and financial analysts to speed up computationally intensive applications in a variety of disciplines. Today, another type of hardware promises even higher computational performance: the graphics processing unit (GPU).

Originally used to accelerate graphics rendering, GPUs are increasingly applied to scientific calculations. Unlike a traditional CPU, which includes no more than a handful of cores, a GPU has a massively parallel array of integer and floating-point processors, as well as dedicated, high-speed memory. A typical GPU comprises hundreds of these smaller processors (Figure 1).

Figure 1. Comparison of the number of cores on a CPU system and a GPU.

The greatly increased throughput made possible by a GPU, however, comes at a cost. First, memory access becomes a much more likely bottleneck for your calculations. Data must be sent from the CPU to the GPU before calculation and then retrieved from it afterwards. Because a GPU is attached to the host CPU via the PCI Express bus, the memory access is slower than with a traditional CPU.¹ This means that your overall computational speedup is limited by the amount of data transfer that occurs in your algorithm. Second, programming for GPUs in C or Fortran requires a different mental model and a skill set that can be difficult and time-consuming to acquire. Additionally, you must spend time fine-tuning your code for your specific GPU to optimize your applications for peak performance.
This article demonstrates features in Parallel Computing Toolbox that enable you to run your MATLAB code on a GPU by making a few simple changes to your code. We illustrate this approach by solving a second-order wave equation using spectral methods.

Why Parallelize a Wave Equation Solver?
Wave equations are used in a wide range of engineering disciplines, including seismology, fluid dynamics, acoustics, and electromagnetics, to describe sound, light, and fluid waves.
An algorithm that uses spectral methods to solve wave equations is a good candidate for parallelization because it meets both of the criteria for acceleration using the GPU (see "Will Execution on a GPU Accelerate My Application?"):

It is computationally intensive. The algorithm performs many fast Fourier transforms (FFTs) and inverse fast Fourier transforms (IFFTs). The exact number depends on the size of the grid (Figure 2) and the number of time steps included in the simulation. Each time step requires two FFTs and four IFFTs on different matrices, and a single computation can involve hundreds of thousands of time steps.
It is massively parallel. The parallel FFT algorithm is designed to "divide and conquer" so that a similar task is performed repeatedly on different data. Additionally, the algorithm requires substantial communication between processing threads and plenty of memory bandwidth. The IFFT can similarly be run in parallel.

Figure 2. A solution for a second-order wave equation on a 32 x 32 grid (see animation: http://www.mathworks.com/videos/solutionofsecondorderwaveequationanimation79288.html?type=shadow).

Will Execution on a GPU Accelerate My Application?

A GPU can accelerate an application if it fits both of the following criteria:

Computationally intensive: The time spent on computation significantly exceeds the time spent on transferring data to and from GPU memory.

Massively parallel: The computations can be broken down into hundreds or thousands of independent units of work.

Applications that do not satisfy these criteria might actually run slower on a GPU than on a CPU.

GPU Computing in MATLAB

Before continuing with the wave equation example, let's quickly review how MATLAB works with the GPU.

FFT, IFFT, and linear algebraic operations are among more than 100 built-in MATLAB functions that can be executed directly on the GPU by providing an input argument of the type GPUArray, a special array type provided by Parallel Computing Toolbox. These GPU-enabled functions are overloaded; in other words, they operate differently depending on the data type of the arguments passed to them.

For example, the following code uses an FFT algorithm to find the discrete Fourier transform of a vector of pseudorandom numbers on the CPU:
A = rand(2^16,1);

B = fft(A);

To perform the same operation on the GPU, we first use the gpuArray command to transfer data from the MATLAB workspace to device memory. Then we can run fft, which is one of the overloaded functions, on that data:
A = gpuArray(rand(2^16,1));

B = fft(A);

The fft operation is executed on the GPU rather than the CPU since its input (a GPUArray) is held on the GPU.

The result, B, is stored on the GPU. However, it is still visible in the MATLAB workspace. By running class(B), we can see that it is a GPUArray.
class(B)

ans =

parallel.gpu.GPUArray

We can continue to manipulate B on the device using GPU-enabled functions. For example, to visualize our results, the plot command automatically works on GPUArrays:
plot(B);

To return the data back to the local MATLAB workspace, you can use the gather command; for example:
C = gather(B);
C is now a double in MATLAB and can be operated on by any of the MATLAB functions that work on doubles.

In this simple example, the time saved by executing a single FFT function is often less than the time spent transferring the vector from the MATLAB workspace to the device memory. This is generally true but depends on your hardware and the size of the array. Data transfer overhead can become so significant that it degrades the application's overall performance, especially if you repeatedly exchange data between the CPU and GPU to execute relatively few computationally intensive operations. It is more efficient to perform several operations on the data while it is on the GPU, bringing the data back to the CPU only when required.²
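As a minimal sketch of this pattern (the particular chain of operations is illustrative, not from the article's code), the following keeps intermediate results on the GPU and gathers only the final answer:

% Keep intermediate results on the GPU; transfer once in and once out.
A = gpuArray(rand(2^16,1));   % one transfer to the device

B = fft(A);                   % executes on the GPU
C = abs(B).^2;                % element-wise operations stay on the GPU
D = ifft(C);                  % still on the GPU

result = gather(real(D));     % one transfer back to the host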
Note that GPUs, like CPUs, have finite memories. However, unlike CPUs, they do not have the ability to swap memory to and from disk. Thus, you must verify that the data you want to keep on the GPU does not exceed its memory limits, particularly when you are working with large matrices. By running gpuDevice, you can query your GPU card, obtaining information such as name, total memory, and available memory.
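A quick way to check this is to inspect the device object returned by gpuDevice (a sketch; property names such as FreeMemory can vary by release):

% Query the currently selected GPU and report its memory.
dev = gpuDevice;
fprintf('GPU: %s\n', dev.Name);
fprintf('Total memory:     %.2f GB\n', dev.TotalMemory/2^30);
fprintf('Available memory: %.2f GB\n', dev.FreeMemory/2^30);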

Implementing and Accelerating the Algorithm to Solve a Wave Equation in MATLAB
To put the above example into context, let's implement the GPU functionality on a real problem. Our computational goal is to solve the second-order wave equation

∂²u/∂t² = ∂²u/∂x² + ∂²u/∂y²

with the condition u = 0 on the boundaries. We use an algorithm based on spectral methods to solve the equation in space and a second-order central finite difference method to solve the equation in time.
Spectral methods are commonly used to solve partial differential equations. With spectral methods, the solution is approximated as a linear combination of continuous basis functions, such as sines and cosines. In this case, we apply the Chebyshev spectral method, which uses Chebyshev polynomials as the basis functions.
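The article's solver carries out this differentiation with FFTs (which is why the algorithm is FFT-heavy). For readers less familiar with Chebyshev spectral differentiation, an equivalent formulation applies a differentiation matrix on the Chebyshev points; the following is the well-known cheb.m from Trefethen's Spectral Methods in MATLAB (listed under Learn More), shown here purely for orientation:

function [D, x] = cheb(N)
% CHEB  Chebyshev differentiation matrix D and grid x (after Trefethen).
% Save as cheb.m. Sampled derivatives of u(x) are then D*u, and a second
% derivative is D*(D*u).
if N == 0, D = 0; x = 1; return, end
x  = cos(pi*(0:N)/N)';                     % Chebyshev points
c  = [2; ones(N-1,1); 2] .* (-1).^(0:N)';
X  = repmat(x, 1, N+1);
dX = X - X';
D  = (c*(1./c)') ./ (dX + eye(N+1));       % off-diagonal entries
D  = D - diag(sum(D'));                    % diagonal entries
end

With D and D2 = D^2 in hand, spectral second derivatives of a sampled solution become matrix products, which is the form used in the structural sketch later in this article.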
At every time step, we calculate the second derivative of the current solution in both the x and y dimensions using the Chebyshev spectral method. Using these derivatives together with the old solution and the current solution, we apply a second-order central difference method (also known as the leap-frog method) to calculate the new solution. We choose a time step that maintains the stability of this leap-frog method.
The MATLAB algorithm is computationally intensive, and as the number of elements in the grid over which we compute the solution grows, the time the algorithm takes to execute increases dramatically. When executed on a single CPU using a 2048 x 2048 grid, it takes more than a minute to complete just 50 time steps. Note that this time already includes the performance benefit of the inherent multithreading in MATLAB. Since R2007a, MATLAB has supported multithreaded computation for a number of functions. These functions automatically execute on multiple threads without the need to explicitly specify commands to create threads in your code.
When considering how to accelerate this computation using Parallel Computing Toolbox, we will focus on the code that performs computations for each time step. Figure 3 illustrates the changes required to get the algorithm running on the GPU. Note that the computations involve MATLAB operations for which GPU-enabled overloaded functions are available through Parallel Computing Toolbox. These operations include FFT and IFFT, matrix multiplication, and various element-wise operations. As a result, we do not need to change the algorithm in any way to execute it on a GPU. We simply transfer the data to the GPU using gpuArray before entering the loop that computes results at each time step.


Figure 3. Code Comparison Tool showing the differences in the CPU and GPU versions of the code. The GPU and CPU versions share over 84% of their code in common (94 lines out of 111).

After the computations are performed on the GPU, we transfer the results from the GPU to the CPU. Each variable referenced by the GPU-enabled functions must be created on the GPU or transferred to the GPU before it is used.
To convert one of the weights used for spectral differentiation to a GPUArray variable, we use
W1T = gpuArray(W1T);

Certain types of arrays can be constructed directly on the GPU without our having to transfer them from the MATLAB workspace. For example, to create a matrix of zeros directly on the GPU, we use
uxx = parallel.gpu.GPUArray.zeros(N+1,N+1);

We use the gather function to bring data back from the GPU; for example:
vvg = gather(vv);

Note that there is a single transfer of data to the GPU, followed by a single transfer of data from the GPU. All the computations for each time step are performed on the GPU.
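Putting the pieces together, the GPU version of the solver has roughly the following shape. This is a self-contained sketch with illustrative names and sizes; it applies the spectral second derivative as a matrix product with D2 = D^2 from the cheb sketch above, whereas the article's actual code (Figure 3) computes the derivatives with FFTs:

% Structural sketch of the GPU version: one transfer in, all time-step
% computation on the GPU, one transfer out. Requires cheb.m from above.
N = 64;  nsteps = 50;  dt = 6/N^2;         % illustrative sizes and time step
[D, x] = cheb(N);  D2 = D^2;               % Chebyshev second-derivative matrix

[xx, yy] = meshgrid(x, x);
vv = exp(-40*((xx - 0.4).^2 + yy.^2));     % illustrative initial condition

vv = gpuArray(vv);  vvold = vv;            % single transfer of data to the GPU
D2 = gpuArray(D2);

for n = 1:nsteps
    uxx = vv*D2.';                         % spectral second derivative in x (on the GPU)
    uyy = D2*vv;                           % spectral second derivative in y (on the GPU)
    vvnew = 2*vv - vvold + dt^2*(uxx + uyy);          % leap-frog update
    vvnew([1 N+1], :) = 0;  vvnew(:, [1 N+1]) = 0;    % enforce u = 0 on the boundaries
    vvold = vv;  vv = vvnew;
end

vvg = gather(vv);                          % single transfer of the result back to the CPU

The only GPU-specific lines are the gpuArray conversions before the loop and the gather at the end; the body of the loop is unchanged MATLAB code.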

Comparing CPU and GPU Execution Speeds

To evaluate the benefits of using the GPU to solve second-order wave equations, we ran a benchmark study in which we measured the amount of time the algorithm took to execute 50 time steps for grid sizes of 64, 128, 512, 1024, and 2048, first on an Intel Xeon Processor X5650 and then using an NVIDIA Tesla C2050 GPU.
For a grid size of 2048, the algorithm shows a 7.5x decrease in compute time, from more than a minute on the CPU to less than 10 seconds on the GPU (Figure 4). The log scale plot shows that the CPU is actually faster for small grid sizes. As the technology evolves and matures, however, GPU solutions are increasingly able to handle smaller problems, a trend that we expect to continue.
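For readers who want to run a similar comparison on their own hardware, a minimal timing harness might look like the following. The functions solveWaveCPU and solveWaveGPU are hypothetical stand-ins for CPU and GPU implementations of the solver (they are not part of the article's code), and wait(gpuDevice) ensures all queued GPU work has finished before the timer stops:

% Hypothetical timing harness: solveWaveCPU/solveWaveGPU are placeholders
% for your own CPU and GPU solver implementations.
gridSizes = [64 128 512 1024 2048];
tCPU = zeros(size(gridSizes));
tGPU = zeros(size(gridSizes));

for k = 1:numel(gridSizes)
    N = gridSizes(k);

    tic;  solveWaveCPU(N, 50);                    tCPU(k) = toc;
    tic;  solveWaveGPU(N, 50);  wait(gpuDevice);  tGPU(k) = toc;
end

loglog(gridSizes, tCPU, '-o', gridSizes, tGPU, '-s');
legend('CPU', 'GPU');  xlabel('Grid size');  ylabel('Time for 50 time steps (s)');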

Figure 4. Plot of benchmark results showing the time required to complete 50 time steps for different grid sizes, using either a linear scale (left) or a log scale (right).


Advanced GPU Programming with MATLAB

Parallel Computing Toolbox provides a straightforward way to speed up MATLAB code by executing it on a GPU. You simply change the data type of a function's input to take advantage of the many MATLAB commands that have been overloaded for GPUArrays. (A complete list of built-in MATLAB functions that support GPUArray is available in the Parallel Computing Toolbox documentation: http://www.mathworks.com/help/toolbox/distcomp/bsic4fr1.html#bsloua31.)
To accelerate an algorithm with multiple simple operations on a GPU, you can use arrayfun, which applies a function to each element of an array. Because arrayfun is a GPU-enabled function, you incur the memory transfer overhead only on the single call to arrayfun, not on each individual operation.
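A minimal sketch of this usage (the element-wise function and variable names below are illustrative, not from the article):

% Several simple element-wise operations evaluated by a single arrayfun call.
A = gpuArray(rand(1000));
B = gpuArray(rand(1000));
w = 0.75;

f = @(a, b) w*a + (1 - w)*sin(b).^2;   % element-wise function (illustrative)
C = arrayfun(f, A, B);                 % evaluated on the GPU in one call

result = gather(C);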
Finally, experienced programmers who write their own CUDA code can use the CUDAKernel interface in Parallel Computing Toolbox to integrate this code with MATLAB. The CUDAKernel interface enables even more fine-grained control to speed up portions of code that were performance bottlenecks. It creates a MATLAB object that provides access to your existing kernel compiled into PTX code (PTX is a low-level parallel thread execution instruction set). You then invoke the feval command to evaluate the kernel on the GPU, using MATLAB arrays as input and output.
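As a sketch of that workflow (the file names myKernels.ptx/myKernels.cu, the entry point addVectors, and its argument list are placeholders for your own CUDA code):

% Sketch of the CUDAKernel workflow; the kernel files and the signature
% of addVectors are placeholders for your own CUDA code.
k = parallel.gpu.CUDAKernel('myKernels.ptx', 'myKernels.cu', 'addVectors');

n = 1e6;
k.ThreadBlockSize = 256;                           % launch configuration
k.GridSize        = ceil(n/256);

a = gpuArray(rand(n, 1, 'single'));
b = gpuArray(rand(n, 1, 'single'));
c = parallel.gpu.GPUArray.zeros(n, 1, 'single');   % output buffer on the GPU

c = feval(k, c, a, b, n);                          % run the kernel on the GPU
result = gather(c);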

Summary

Engineers and scientists are successfully employing GPU technology, originally intended for accelerating graphics rendering, to accelerate their discipline-specific calculations. With minimal effort and without extensive knowledge of GPUs, you can now use the power of GPUs with MATLAB. GPUArrays and GPU-enabled MATLAB functions help you speed up MATLAB operations without low-level CUDA programming. If you are already familiar with programming for GPUs, MATLAB also lets you integrate your existing CUDA kernels into MATLAB applications without requiring any additional C programming.

To achieve speedups with a GPU, your application must satisfy certain criteria, among them that the time spent transferring data between the CPU and GPU must be small relative to the performance gained by running on the GPU. If your application satisfies these criteria, it is a good candidate for the range of GPU functionality available with MATLAB.
GPU Glossary

CPU (central processing unit). The central unit in a computer responsible for calculations and for controlling or supervising other parts of the computer. The CPU performs logical and floating-point operations on data held in the computer memory.

GPU (graphics processing unit). A programmable chip originally intended for graphics rendering. The highly parallel structure of a GPU makes it more effective than a general-purpose CPU for algorithms in which large blocks of data are processed in parallel.

Core. A single independent computational unit within a CPU or GPU chip. CPU and GPU cores are not equivalent to each other; GPU cores perform specialized operations, whereas CPU cores are designed for general-purpose programs.

CUDA. A parallel computing technology from NVIDIA that consists of a parallel computing architecture and developer tools, libraries, and programming directives for GPU computing.

Device. A hardware card containing the GPU and its associated memory.

Host. The CPU and system memory.

Kernel. Code written for execution on the GPU. Kernels are functions that can run on a large number of threads. The parallelism arises from each thread independently running the same program on different data.

Published 2011 (91967v01)

References

1. See Chapter 6 (Memory Optimization) of the NVIDIA CUDA C Best Practices documentation for further information about potential GPU computing bottlenecks and optimization of GPU memory access.

2. See Chapter 6 (Memory Optimization) of the NVIDIA CUDA C Best Practices documentation for further information about improving performance by minimizing data transfers.

Products Used


MATLAB (http://www.mathworks.com/products/matlab)
Parallel Computing Toolbox (http://www.mathworks.com/products/parallelcomputing)

Learn More

Spectral Methods in MATLAB, Lloyd N. Trefethen (http://www.mathworks.com/support/books/book48110.html?category=6&language=1&view=category)
Introduction to MATLAB GPU Computing (http://www.mathworks.com/discovery/matlabgpu.html)
Accelerating Signal Processing Algorithms with GPUs and MATLAB (http://www.mathworks.com/discovery/gpusignalprocessing.html)


© 1994-2014 The MathWorks, Inc.

