POSIX Threads Programming
Author: Blaise Barney, Lawrence Livermore National Laboratory
UCRL-MI-133316
Table of Contents
1. Abstract
2. Pthreads Overview
   1. What is a Thread?
   2. What are Pthreads?
   3. Why Pthreads?
   4. Designing Threaded Programs
3. The Pthreads API
4. Compiling Threaded Programs
5. Thread Management
   1. Creating and Terminating Threads
   2. Passing Arguments to Threads
   3. Joining and Detaching Threads
   4. Stack Management
   5. Miscellaneous Routines
6. Exercise 1
7. Mutex Variables
   1. Mutex Variables Overview
   2. Creating and Destroying Mutexes
   3. Locking and Unlocking Mutexes
8. Condition Variables
   1. Condition Variables Overview
   2. Creating and Destroying Condition Variables
   3. Waiting and Signaling on Condition Variables
9. Monitoring, Debugging and Performance Analysis Tools for Pthreads
10. LLNL Specific Information and Recommendations
11. Topics Not Covered
12. Exercise 2
13. References and More Information
14. Appendix A: Pthread Library Routines Reference
Abstract
In shared memory multiprocessor architectures, threads can be used to implement parallelism. Historically, hardware vendors have implemented their own proprietary versions of threads, making portability a concern for software developers. For UNIX systems, a standardized C language threads programming interface has been specified by the IEEE POSIX 1003.1c standard. Implementations that adhere to this standard are referred to as POSIX threads, or Pthreads.
The tutorial begins with an introduction to concepts, motivations, and design considerations for using Pthreads. Each of the three major classes of routines in the Pthreads API is then covered: Thread Management, Mutex Variables, and Condition Variables. Example codes are used throughout to demonstrate how to use most of the Pthreads routines needed by a new Pthreads programmer. The tutorial concludes with a discussion of LLNL specifics and how to mix MPI with pthreads. A lab exercise, with numerous example codes (C Language), is also included.
Level/Prerequisites: This tutorial is one of the eight tutorials in the 4+ day "Using LLNL's Supercomputers" workshop. It is ideal for those who are new to parallel programming with threads. A basic understanding of parallel programming in C is required. For those who are unfamiliar with Parallel Programming in general, the material covered in EC3500: Introduction To Parallel Computing would be helpful.
Pthreads Overview
What is a Thread?
Technically, a thread is defined as an independent stream of instructions that can be scheduled to run as such by the operating system. But what does this mean?
To the software developer, the concept of a "procedure" that runs independently from its main program may best describe a thread.
https://computing.llnl.gov/tutorials/pthreads/
1/28
10/25/2014
To go one step further, imagine a main program (a.out) that contains a number of procedures. Then imagine all of these procedures being able to be scheduled to run simultaneously and/or independently by the operating system. That would describe a "multi-threaded" program.
How is this accomplished?
Before understanding a thread, one first needs to understand a UNIX process. A process is created by the operating system, and requires a fair amount of "overhead". Processes contain information about program resources and program execution state, including:
Process ID, process group ID, user ID, and group ID
Environment
Working directory
Program instructions
Registers
Stack
Heap
File descriptors
Signal actions
Shared libraries
Inter-process communication tools (such as message queues, pipes, semaphores, or shared memory)
Figures: UNIX PROCESS; THREADS WITHIN A UNIX PROCESS
Threads use and exist within these process resources, yet are able to be scheduled by the operating system and run as independent entities largely because they duplicate only the bare essential resources that enable them to exist as executable code.
This independent flow of control is accomplished because a thread maintains its own:
Stack pointer
Registers
Scheduling properties (such as policy or priority)
Set of pending and blocked signals
Thread specific data
So, in summary, in the UNIX environment a thread:
Exists within a process and uses the process resources
Has its own independent flow of control as long as its parent process exists and the OS supports it
Duplicates only the essential resources it needs to be independently schedulable
May share the process resources with other threads that act equally independently (and dependently)
Dies if the parent process dies, or something similar
Is "lightweight" because most of the overhead has already been accomplished through the creation of its process
Because threads within the same process share resources:
Changes made by one thread to shared system resources (such as closing a file) will be seen by all other threads.
Two pointers having the same value point to the same data.
Reading and writing to the same memory locations is possible, and therefore requires explicit synchronization by the programmer.
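The sharing described above can be sketched in a few lines. This is a minimal illustration, not part of the original tutorial: the names shared_value, set_value and shared_memory_demo are hypothetical, and pthread_join (covered later) is used only so that the creating thread reads the variable after the worker has finished.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical shared variable: it lives in the process's data segment,
   so every thread in the process sees the same storage. */
static int shared_value = 0;

/* Thread start routine: writes through the pointer it was given. */
static void *set_value(void *arg)
{
    int *p = arg;          /* same address the creating thread passed in */
    *p = 42;
    return NULL;
}

/* Returns the value seen by the creating thread after the worker has
   finished, or -1 if a Pthreads call fails. */
int shared_memory_demo(void)
{
    pthread_t worker;

    if (pthread_create(&worker, NULL, set_value, &shared_value) != 0)
        return -1;
    if (pthread_join(worker, NULL) != 0)   /* wait, so the read below is safe */
        return -1;

    printf("value seen by creating thread: %d\n", shared_value);
    return shared_value;
}
```

No data is copied between the two threads; both simply dereference the same address. The join is what makes the final read safe here; without some form of synchronization the read and the write could race.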
Pthreads Overview
What are Pthreads?
Historically, hardware vendors have implemented their own proprietary versions of threads. These implementations differed substantially from each other, making it difficult for programmers to develop portable threaded applications.
In order to take full advantage of the capabilities provided by threads, a standardized programming interface was required.
For UNIX systems, this interface has been specified by the IEEE POSIX 1003.1c standard (1995).
Implementations adhering to this standard are referred to as POSIX threads, or Pthreads.
Most hardware vendors now offer Pthreads in addition to their proprietary APIs.
The POSIX standard has continued to evolve and undergo revisions, including the Pthreads specification.
Some useful links:
standards.ieee.org/findstds/standard/1003.1-2008.html
www.opengroup.org/austin/papers/posix_faq.html
www.unix.org/version3/ieee_std.html
Pthreads are defined as a set of C language programming types and procedure calls, implemented with a pthread.h header/include file and a thread library, though this library may be part of another library, such as libc, in some implementations.
Pthreads Overview
Why Pthreads?
Light Weight:
When compared to the cost of creating and managing a process, a thread can be created with much less operating system overhead. Managing threads requires fewer system resources than managing processes.
For example, the following table compares timing results for the fork() subroutine and the pthread_create() subroutine. Timings reflect 50,000 process/thread creations, were performed with the time utility, and units are in seconds, no optimization flags.
Note: don't expect the system and user times to add up to real time, because these are SMP systems with multiple CPUs/cores working on the problem at the same time. At best, these are approximations run on local machines, past and present.

Platform                                      fork()                  pthread_create()
                                              real    user    sys     real    user    sys
Intel 2.6 GHz Xeon E5-2670 (16 cores/node)     8.1     0.1     2.9     0.9     0.2     0.3
Intel 2.8 GHz Xeon 5660 (12 cores/node)        4.4     0.4     4.3     0.7     0.2     0.5
AMD 2.3 GHz Opteron (16 cores/node)           12.5     1.0    12.5     1.2     0.2     1.3
AMD 2.4 GHz Opteron (8 cores/node)            17.6     2.2    15.7     1.4     0.3     1.3
IBM 4.0 GHz POWER6 (8 cpus/node)               9.5     0.6     8.8     1.6     0.1     0.4
IBM 1.9 GHz POWER5 p5-575 (8 cpus/node)       64.2    30.7    27.6     1.7     0.6     1.1
IBM 1.5 GHz POWER4 (8 cpus/node)             104.5    48.6    47.2     2.1     1.0     1.5
INTEL 2.4 GHz Xeon (2 cpus/node)              54.9     1.5    20.8     1.6     0.7     0.9
INTEL 1.4 GHz Itanium2 (4 cpus/node)          54.5     1.1    22.2     2.0     1.2     0.6

fork_vs_thread.txt
Efficient Communications/Data Exchange:
The primary motivation for considering the use of Pthreads on a multi-processor architecture is to achieve optimum performance. In particular, if an application is using MPI for on-node communications, there is a potential that performance could be improved by using Pthreads instead.
MPI libraries usually implement on-node task communication via shared memory, which involves at least one memory copy operation (process to process).
For Pthreads there is no intermediate memory copy required because threads share the same address space within a single process. There is no data transfer, per se. It can be as efficient as simply passing a pointer.
In the worst case scenario, Pthread communications become more of a cache-to-CPU or memory-to-CPU bandwidth issue. These speeds are much higher than MPI shared memory communications.
For example: some local comparisons, past and present, are shown below:

Platform                      MPI Shared Memory Bandwidth    Pthreads Worst Case Memory-to-CPU Bandwidth
                              (GB/sec)                       (GB/sec)
Intel 2.6 GHz Xeon E5-2670    4.5                            51.2
Intel 2.8 GHz Xeon 5660       5.6                            32
AMD 2.3 GHz Opteron           1.8                            5.3
AMD 2.4 GHz Opteron           1.2                            5.3
IBM 1.9 GHz POWER5 p5-575     4.1                            16
IBM 1.5 GHz POWER4            2.1
Intel 2.4 GHz Xeon            0.3                            4.3
Intel 1.4 GHz Itanium2        1.8                            6.4
Other Common Reasons:
Threaded applications offer potential performance gains and practical advantages over non-threaded applications in several other ways:
Overlapping CPU work with I/O: For example, a program may have sections where it is performing a long I/O operation. While one thread is waiting for an I/O system call to complete, CPU intensive work can be performed by other threads.
Priority/real-time scheduling: tasks which are more important can be scheduled to supersede or interrupt lower priority tasks.
Asynchronous event handling: tasks which service events of indeterminate frequency and duration can be interleaved. For example, a web server can both transfer data from previous requests and manage the arrival of new requests.
A perfect example is the typical web browser, where many interleaved tasks can be happening at the same time, and where tasks can vary in priority.
Another good example is a modern operating system, which makes extensive use of threads. A screenshot of the MS Windows OS and applications using threads is shown below.
Pthreads Overview
Designing Threaded Programs
Parallel Programming:
On modern, multi-core machines, pthreads are ideally suited for parallel programming, and whatever applies to parallel programming in general, applies to parallel pthreads programs.
There are many considerations for designing parallel programs, such as:
What type of parallel programming model to use?
Problem partitioning
Load balancing
Communications
Data dependencies
Synchronization and race conditions
Memory issues
I/O issues
Program complexity
Programmer effort/costs/time
...
Covering these topics is beyond the scope of this tutorial; however, interested readers can obtain a quick overview in the Introduction to Parallel Computing tutorial.
In general though, in order for a program to take advantage of Pthreads, it must be able to be organized into discrete, independent tasks which can execute concurrently. For example, if routine1 and routine2 can be interchanged, interleaved and/or overlapped in real time, they are candidates for threading.
Programs having the following characteristics may be well suited for pthreads:
Work that can be executed, or data that can be operated on, by multiple tasks simultaneously
Block for potentially long I/O waits
Use many CPU cycles in some places but not others
Must respond to asynchronous events
Some work is more important than other work (priority interrupts)
Several common models for threaded programs exist:
Manager/worker: a single thread, the manager, assigns work to other threads, the workers. Typically, the manager handles all input and parcels out work to the other tasks. At least two forms of the manager/worker model are common: static worker pool and dynamic worker pool.
Pipeline: a task is broken into a series of suboperations, each of which is handled in series, but concurrently, by a different thread. An automobile assembly line best describes this model.
Peer: similar to the manager/worker model, but after the main thread creates other threads, it participates in the work.
Shared Memory Model:
All threads have access to the same global, shared memory
Threads also have their own private data
Programmers are responsible for synchronizing access to (protecting) globally shared data.
Thread-safeness:
Thread-safeness, in a nutshell, refers to an application's ability to execute multiple threads simultaneously without "clobbering" shared data or creating "race" conditions.
For example, suppose that your application creates several threads, each of which makes a call to the same library routine:
This library routine accesses/modifies a global structure or location in memory.
As each thread calls this routine it is possible that they may try to modify this global structure/memory location at the same time.
If the routine does not employ some sort of synchronization constructs to prevent data corruption, then it is not thread-safe.
The implication to users of external library routines is that if you aren't 100% certain the routine is thread-safe, then you take your chances with problems that could arise.
Recommendation: Be careful if your application uses libraries or other objects that don't explicitly guarantee thread-safeness. When in doubt, assume that they are not thread-safe until proven otherwise. This can be done by "serializing" the calls to the uncertain routine, etc.
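One way to "serialize" calls to an uncertain routine is to funnel every call through a single mutex (mutexes are covered in detail later in this tutorial). The sketch below is only an illustration under stated assumptions: unsafe_library_call is a hypothetical stand-in for a routine that updates hidden global state without locking, and the wrapper and driver names are likewise invented for this example.

```c
#include <pthread.h>

/* Hypothetical stand-in for a library routine that is NOT thread-safe:
   it updates hidden global state without any locking. */
static long hidden_counter = 0;
static void unsafe_library_call(void)
{
    hidden_counter = hidden_counter + 1;
}

/* One mutex guards every call to the uncertain routine. */
static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;

/* Wrapper: only one thread at a time can be inside the library call. */
static void safe_library_call(void)
{
    pthread_mutex_lock(&lib_lock);
    unsafe_library_call();
    pthread_mutex_unlock(&lib_lock);
}

#define CALLERS 4
#define CALLS_PER_THREAD 100000

static void *caller(void *arg)
{
    (void)arg;
    for (int i = 0; i < CALLS_PER_THREAD; i++)
        safe_library_call();
    return NULL;
}

/* Runs CALLERS threads and returns the final counter, or -1 on error. */
long serialized_calls_demo(void)
{
    pthread_t t[CALLERS];

    hidden_counter = 0;
    for (int i = 0; i < CALLERS; i++)
        if (pthread_create(&t[i], NULL, caller, NULL) != 0)
            return -1;
    for (int i = 0; i < CALLERS; i++)
        pthread_join(t[i], NULL);
    return hidden_counter;
}
```

The wrapper only helps if every thread goes through it; a single direct call to the unprotected routine reintroduces the race. Serializing also removes any parallelism inside the routine, so it is a safety measure rather than a performance one.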
Thread Limits:
Although the Pthreads API is an ANSI/IEEE standard, implementations can, and usually do, vary in ways not specified by the standard.
Because of this, a program that runs fine on one platform may fail or produce wrong results on another platform.
For example, the maximum number of threads permitted and the default thread stack size are two important limits to consider when designing your program.
Several thread limits are discussed in more detail later in this tutorial.
The Pthreads API
The original Pthreads API was defined in the ANSI/IEEE POSIX 1003.1 - 1995 standard. The POSIX standard has continued to evolve and undergo revisions, including the Pthreads specification.
Copies of the standard can be purchased from IEEE or downloaded for free from other sites online.
The subroutines which comprise the Pthreads API can be informally grouped into four major groups:
1. Thread management: Routines that work directly on threads: creating, detaching, joining, etc. They also include functions to set/query thread attributes (joinable, scheduling etc.)
2. Mutexes: Routines that deal with synchronization, called a "mutex", which is an abbreviation for "mutual exclusion". Mutex functions provide for creating, destroying, locking and unlocking mutexes. These are supplemented by mutex attribute functions that set or modify attributes associated with mutexes.
3. Condition variables: Routines that address communications between threads that share a mutex, based upon programmer-specified conditions. This group includes functions to create, destroy, wait and signal based upon specified variable values. Functions to set/query condition variable attributes are also included.
4. Synchronization: Routines that manage read/write locks and barriers.
Naming conventions: All identifiers in the threads library begin with pthread_. Some examples are shown below.

Routine Prefix         Functional Group
pthread_               Threads themselves and miscellaneous subroutines
pthread_attr_          Thread attributes objects
pthread_mutex_         Mutexes
pthread_mutexattr_     Mutex attributes objects
pthread_cond_          Condition variables
pthread_condattr_      Condition attributes objects
pthread_key_           Thread-specific data keys
pthread_rwlock_        Read/write locks
pthread_barrier_       Synchronization barriers

The concept of opaque objects pervades the design of the API. The basic calls work to create or modify opaque objects; the opaque objects can be modified by calls to attribute functions, which deal with opaque attributes.
The Pthreads API contains around 100 subroutines. This tutorial will focus on a subset of these, specifically those which are most likely to be immediately useful to the beginning Pthreads programmer.
For portability, the pthread.h header file should be included in each source file using the Pthreads library.
The current POSIX standard is defined only for the C language. Fortran programmers can use wrappers around C function calls. Some Fortran compilers may provide a Fortran pthreads API.
A number of excellent books about Pthreads are available. Several of these are listed in the References section of this tutorial.
Compiling Threaded Programs
Several examples of compile commands used for pthreads codes are listed in the table below.

Compiler / Platform        Compiler Command        Description
INTEL (Linux)              icc -pthread            C
                           icpc -pthread           C++
PGI (Linux)                pgcc -lpthread          C
                           pgCC -lpthread          C++
GNU (Linux, Blue Gene)     gcc -pthread            GNU C
                           g++ -pthread            GNU C++
IBM (Blue Gene)            bgxlc_r / bgcc_r        C (ANSI / non-ANSI)
                           bgxlC_r, bgxlc++_r      C++
Thread Management
Creating and Terminating Threads
Routines:
pthread_create (thread,attr,start_routine,arg)
pthread_exit (status)
pthread_cancel (thread)
pthread_attr_init (attr)
pthread_attr_destroy (attr)
Creating Threads:
Initially, your main() program comprises a single, default thread. All other threads must be explicitly created by the programmer.
pthread_create creates a new thread and makes it executable. This routine can be called any number of times from anywhere within your code.
pthread_create arguments:
thread: An opaque, unique identifier for the new thread returned by the subroutine.
attr: An opaque attribute object that may be used to set thread attributes. You can specify a thread attributes object, or NULL for the default values.
start_routine: the C routine that the thread will execute once it is created.
arg: A single argument that may be passed to start_routine. It must be passed by reference, as a pointer cast to type (void *). NULL may be used if no argument is to be passed.
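The four arguments can be seen together in a short sketch. This is an illustration rather than the tutorial's own example: print_hello and create_one_thread are hypothetical names, and pthread_join (covered in a later section) is used only so the function does not return before the new thread has run.

```c
#include <pthread.h>
#include <stdio.h>

/* Start routine: must take a void * and return a void *. */
static void *print_hello(void *arg)
{
    const char *name = arg;              /* recover the real type */
    printf("Hello from thread \"%s\"\n", name);
    return NULL;
}

/* Creates one thread with default attributes and waits for it.
   Returns 0 on success, otherwise the Pthreads error number. */
int create_one_thread(void)
{
    static char name[] = "worker";       /* must outlive the new thread */
    pthread_t thread;                    /* filled in by pthread_create */

    int rc = pthread_create(&thread,      /* thread: receives the new ID */
                            NULL,         /* attr: NULL means defaults   */
                            print_hello,  /* start_routine               */
                            name);        /* arg: converted to void *    */
    if (rc != 0) {
        fprintf(stderr, "pthread_create failed with error %d\n", rc);
        return rc;
    }
    return pthread_join(thread, NULL);
}
```

Note that pthread_create reports failure through its return value rather than through errno, so the return code is what should be checked.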
The maximum number of threads that may be created by a process is implementation dependent. Programs that attempt to exceed the limit can fail or produce wrong results.
Querying and setting your implementation's thread limit (Linux example shown). The example demonstrates querying the default (soft) limits and then setting the maximum number of processes (including threads) to the hard limit, then verifying that the limit has been overridden.
bash / ksh / sh

$ ulimit -a
core file size          (blocks, -c) 16
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 255956
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

$ ulimit -Hu
7168

$ ulimit -u 7168

$ ulimit -a
core file size          (blocks, -c) 16
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 255956
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 7168
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

tcsh / csh

% limit
cputime      unlimited
filesize     unlimited
datasize     unlimited
stacksize    unlimited
coredumpsize 16 kbytes
memoryuse    unlimited
vmemoryuse   unlimited
descriptors  1024
memorylocked 64 kbytes
maxproc      1024

% limit maxproc unlimited

% limit
cputime      unlimited
filesize     unlimited
datasize     unlimited
stacksize    unlimited
coredumpsize 16 kbytes
memoryuse    unlimited
vmemoryuse   unlimited
descriptors  1024
memorylocked 64 kbytes
maxproc      7168
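A program can also ask about its thread limit at run time. The sketch below is an assumption-laden illustration, not something the tutorial itself shows: it uses the POSIX sysconf() call with _SC_THREAD_THREADS_MAX, and the function name print_thread_limit is hypothetical. On many systems this query reports no fixed limit, in which case the effective ceiling comes from resource limits such as those listed above and from available memory.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Prints the per-process thread limit reported by sysconf().
   Returns 0 if the query itself worked, -1 if sysconf reported an error. */
int print_thread_limit(void)
{
    errno = 0;
    long max_threads = sysconf(_SC_THREAD_THREADS_MAX);

    if (max_threads == -1) {
        if (errno != 0) {
            perror("sysconf");
            return -1;
        }
        /* -1 with errno untouched means "no fixed limit". */
        printf("no fixed per-process thread limit reported\n");
    } else {
        printf("at most %ld threads per process\n", max_threads);
    }
    return 0;
}
```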
Once created, threads are peers, and may create other threads. There is no implied hierarchy or dependency between threads.
Thread Attributes:
By default, a thread is created with certain attributes. Some of these attributes can be changed by the programmer via the thread attribute object.
pthread_attr_init and pthread_attr_destroy are used to initialize/destroy the thread attribute object.
Other routines are then used to query/set specific attributes in the thread attribute object. Attributes include:
Detached or joinable state
Scheduling inheritance
Scheduling policy
Scheduling parameters
Scheduling contention scope
Stack size
Stack address
Stack guard (overflow) size
Some of these attributes will be discussed later.
Thread Binding and Scheduling:
Question: After a thread has been created, how do you know a) when it will be scheduled to run by the operating system, and b) which processor/core it will run on?
The Pthreads API provides several routines that may be used to specify how threads are scheduled for execution. For example, threads can be scheduled to run FIFO (first-in first-out), RR (round-robin) or OTHER (operating system determines). It also provides the ability to set a thread's scheduling priority value.
These topics are not covered here; however, a good overview of "how things work" under Linux can be found in the sched_setscheduler man page.
The Pthreads API does not provide routines for binding threads to specific cpus/cores. However, local implementations may include this functionality, such as providing the non-standard pthread_setaffinity_np routine. Note that "_np" in the name stands for "non-portable".
Also, the local operating system may provide a way to do this. For example, Linux provides the sched_setaffinity routine.
Terminating Threads & pthread_exit():
There are several ways in which a thread may be terminated:
The thread returns normally from its starting routine. Its work is done.
The thread makes a call to the pthread_exit subroutine, whether its work is done or not.
The thread is canceled by another thread via the pthread_cancel routine.
The entire process is terminated due to making a call to either the exec() or exit()
If main() finishes first, without calling pthread_exit explicitly itself
The pthread_exit() routine allows the programmer to specify an optional termination status parameter. This optional parameter is typically returned to threads "joining" the terminated thread (covered later).
In subroutines that execute to completion normally, you can often dispense with calling pthread_exit(), unless, of course, you want to pass the optional status code back.
Cleanup: the pthread_exit() routine does not close files; any files opened inside the thread will remain open after the thread is terminated.
Discussion on calling pthread_exit() from main():
There is a definite problem if main() finishes before the threads it spawned and you don't call pthread_exit() explicitly. All of the threads it created will terminate because main() is done and no longer exists to support the threads.
By having main() explicitly call pthread_exit() as the last thing it does, main() will block and be kept alive to support the threads it created until they are done.
Example: Pthread Creation and Termination
This simple example code creates 5 threads with the pthread_create() routine. Each thread prints a "Hello World!" message, and then terminates with a call to pthread_exit().
Example Code - Pthread Creation and Termination
#include <pthread.h>
#include <stdio.h>
#define NUM_THREADS
Thread Management
Passing Arguments to Threads
The pthread_create() routine permits the programmer to pass one argument to the thread start routine. For cases where multiple arguments must be passed, this limitation is easily overcome by creating a structure which contains all of the arguments, and then passing a pointer to that structure in the pthread_create() routine.
All arguments must be passed by reference and cast to (void *).
Question: How can you safely pass data to newly created threads, given their non-deterministic start-up and scheduling?
Example 1 - Thread Argument Passing
This code fragment demonstrates how to pass a simple integer to each thread. The calling thread uses a unique data structure for each thread, ensuring that each thread's argument remains intact throughout the program.
long *taskids[NUM_THREADS];

for(t=0; t<NUM_THREADS; t++)
{
   taskids[t] = (long *) malloc(sizeof(long));
   *taskids[t] = t;
   printf("Creating thread %ld\n", t);
   rc = pthread_create(&threads[t], NULL, PrintHello, (void *) taskids[t]);
   ...
}

Example 2 - Thread Argument Passing
This example shows how to setup/pass multiple arguments via a structure. Each thread receives a unique instance of the structure.

struct thread_data{
   int  thread_id;
   int  sum;
   char *message;
};

struct thread_data thread_data_array[NUM_THREADS];

void *PrintHello(void *threadarg)
{
   struct thread_data *my_data;
   ...
   my_data = (struct thread_data *) threadarg;
   taskid = my_data->thread_id;
   sum = my_data->sum;
   hello_msg = my_data->message;
   ...
}

int main (int argc, char *argv[])
{
   ...
   thread_data_array[t].thread_id = t;
   thread_data_array[t].sum = sum;
   thread_data_array[t].message = messages[t];
   rc = pthread_create(&threads[t], NULL, PrintHello,
        (void *) &thread_data_array[t]);
   ...
}

Example 3 - Thread Argument Passing (Incorrect)
This example performs argument passing incorrectly. It passes the address of variable t, which is shared memory space and visible to all threads. As the loop iterates, the value of this memory location changes, possibly before the created threads can access it.

int rc;
long t;

for(t=0; t<NUM_THREADS; t++)
{
   printf("Creating thread %ld\n", t);
   rc = pthread_create(&threads[t], NULL, PrintHello, (void *) &t);
   ...
}
Thread Management
Joining and Detaching Threads
Routines:
pthread_join (threadid,status)
pthread_detach (threadid)
pthread_attr_setdetachstate (attr,detachstate)
pthread_attr_getdetachstate (attr,detachstate)
Joining:
"Joining" is one way to accomplish synchronization between threads.
The pthread_join() subroutine blocks the calling thread until the specified threadid thread terminates.
The programmer is able to obtain the target thread's termination return status if it was specified in the target thread's call to pthread_exit().
A thread can be matched by only one pthread_join() call. It is a logical error to attempt multiple joins on the same thread.
Two other synchronization methods, mutexes and condition variables, will be discussed later.
Joinable or Not?
When a thread is created, one of its attributes defines whether it is joinable or detached. Only threads that are created as joinable can be joined. If a thread is created as detached, it can never be joined.
The final draft of the POSIX standard specifies that threads should be created as joinable.
To explicitly create a thread as joinable or detached, the attr argument in the pthread_create() routine is used. The typical 4 step process is:
1. Declare a pthread attribute variable of the pthread_attr_t data type
2. Initialize the attribute variable with pthread_attr_init()
3. Set the attribute detached status with pthread_attr_setdetachstate()
4. When done, free library resources used by the attribute with pthread_attr_destroy()
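The four steps above, followed by a join that collects the thread's exit status, can be sketched as follows. The names twice and join_demo are hypothetical, and passing a small integer through the void * argument via intptr_t is a common convention assumed here for brevity rather than something the standard requires to work for every value.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

/* Worker: hands twice its argument back through its exit status. */
static void *twice(void *arg)
{
    intptr_t n = (intptr_t)arg;
    pthread_exit((void *)(2 * n));     /* status picked up by pthread_join */
}

/* Returns the worker's status (42 here), or -1 on error. */
int join_demo(void)
{
    pthread_attr_t attr;               /* step 1: declare the attribute */
    pthread_t thread;
    void *status;
    int rc;

    pthread_attr_init(&attr);          /* step 2: initialize it */
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE); /* step 3 */

    rc = pthread_create(&thread, &attr, twice, (void *)(intptr_t)21);
    pthread_attr_destroy(&attr);       /* step 4: free attribute resources */
    if (rc != 0)
        return -1;

    /* Blocks until the worker terminates, then fetches its status. */
    if (pthread_join(thread, &status) != 0)
        return -1;

    printf("worker returned %ld\n", (long)(intptr_t)status);
    return (int)(intptr_t)status;
}
```

The attribute object can be destroyed as soon as pthread_create has returned; the new thread keeps the attributes it was created with.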
Detaching:
The pthread_detach() routine can be used to explicitly detach a thread even though it was created as joinable.
There is no converse routine.
Recommendations:
If a thread requires joining, consider explicitly creating it as joinable. This provides portability as not all implementations may create threads as joinable by default.
If you know in advance that a thread will never need to join with another thread, consider creating it in a detached state. Some system resources may be able to be freed.
Example: Pthread Joining
Example Code - Pthread Joining
This example demonstrates how to "wait" for thread completions by using the Pthread join routine. Since some implementations of Pthreads may not create threads in a joinable state, the threads in this example are explicitly created in a joinable state so that they can be joined later.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#define NUM_THREADS
Thread Management
Stack Management
Routines:
pthread_attr_getstacksize (attr, stacksize)
pthread_attr_setstacksize (attr, stacksize)
pthread_attr_getstackaddr (attr, stackaddr)
pthread_attr_setstackaddr (attr, stackaddr)
Preventing Stack Problems:
The POSIX standard does not dictate the size of a thread's stack. This is implementation dependent and varies.
Exceeding the default stack limit is often very easy to do, with the usual results: program termination and/or corrupted data.
Safe and portable programs do not depend upon the default stack limit, but instead, explicitly allocate enough stack for each thread by using the pthread_attr_setstacksize routine.
The pthread_attr_getstackaddr and pthread_attr_setstackaddr routines can be used by applications in an environment where the stack for a thread must be placed in some particular region of memory. Note that later revisions of POSIX replaced these two routines with pthread_attr_getstack and pthread_attr_setstack.
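A minimal sketch of explicitly sizing a thread's stack is shown here. It is an illustration under assumptions, not the tutorial's own listing: the names fill_matrix and stack_demo, the matrix size N, and the 1 MiB of EXTRA headroom are all arbitrary choices made for this example.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define N 512                       /* N*N doubles = 2 MiB of stack data */
#define EXTRA (1024 * 1024)         /* headroom for everything else      */

static void *fill_matrix(void *arg)
{
    double a[N][N];                 /* large automatic (stack) array */
    (void)arg;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = i + j;
    /* Report whether the last element holds the expected value. */
    return (void *)(intptr_t)(a[N - 1][N - 1] == 2.0 * (N - 1));
}

/* Returns 0 if the worker ran with the requested stack, -1 otherwise. */
int stack_demo(void)
{
    pthread_attr_t attr;
    pthread_t thread;
    size_t wanted = sizeof(double) * N * N + EXTRA;
    size_t actual = 0;
    void *ok = NULL;
    int rc;

    pthread_attr_init(&attr);
    if (pthread_attr_setstacksize(&attr, wanted) != 0) {
        pthread_attr_destroy(&attr);
        return -1;
    }
    pthread_attr_getstacksize(&attr, &actual);
    printf("stack size set in attribute: %zu bytes\n", actual);

    rc = pthread_create(&thread, &attr, fill_matrix, NULL);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return -1;
    if (pthread_join(thread, &ok) != 0)
        return -1;
    return ok != NULL ? 0 : -1;
}
```

The requested size is computed from what the thread actually needs plus a margin, so the program does not rely on whatever default the platform happens to provide.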
Some Practical Examples at LC:
Default thread stack size varies greatly. The maximum size that can be obtained also varies greatly, and may depend upon the number of threads per node.
Both past and present architectures are shown to demonstrate the wide variation in default thread stack size.

Node Architecture                       Default Size (bytes)
Intel Xeon E5-2670      16      32      2,097,152
Intel Xeon 5660         12      24      2,097,152
AMD Opteron             16              2,097,152
Intel IA64                              33,554,432
Intel IA32                              2,097,152
IBM Power5              32              196,608
IBM Power4              16              196,608
IBM Power3              16      16      98,304
Example: Stack Management
Example Code - Stack Management
This example demonstrates how to query and set a thread's stack size.

#include <pthread.h>
#include <stdio.h>
#define NTHREADS 4
#define N 1000
#define MEGEXTRA 1000000

pthread_attr_t attr;

void *dowork(void *threadid)
{
   double A[N][N];   /* about 8 MB of automatic (stack) storage */
   int i,j;
   long tid;
   size_t mystacksize;

   tid = (long)threadid;
   pthread_attr_getstacksize (&attr, &mystacksize);
   /* size_t values are printed with %zu */
   printf("Thread %ld: stack size = %zu bytes \n", tid, mystacksize);
   for (i=0; i<N; i++)
     for (j=0; j<N; j++)
      A[i][j] = ((i*j)/3.452) + (N-i);
   pthread_exit(NULL);
}
Thread Management
Miscellaneous Routines
pthread_self ()
pthread_equal (thread1,thread2)
pthread_self returns the unique, system assigned thread ID of the calling thread.
pthread_equal compares two thread IDs. If the two IDs are different 0 is returned, otherwise a non-zero value is returned.
Note that for both of these routines, the thread identifier objects are opaque and cannot be easily inspected. Because thread IDs are opaque objects, the C language equivalence operator == should not be used to compare two thread IDs against each other, or to compare a single thread ID against another value.
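A short sketch of both routines follows. The names compare_ids and self_equal_demo are hypothetical; the example checks that a thread's ID compares equal to itself and that a newly created thread's ID differs from its creator's.

```c
#include <pthread.h>
#include <stdint.h>

/* Worker: compares its own ID with the creator's ID passed in by address. */
static void *compare_ids(void *arg)
{
    pthread_t *creator = arg;
    /* pthread_equal, not ==, is the portable way to compare thread IDs. */
    int same = pthread_equal(pthread_self(), *creator);
    return (void *)(intptr_t)same;
}

/* Returns 0 if the comparisons behave as described, -1 otherwise. */
int self_equal_demo(void)
{
    pthread_t me = pthread_self();
    pthread_t worker;
    void *same_in_worker;

    /* The calling thread's ID must compare equal to itself. */
    if (!pthread_equal(me, pthread_self()))
        return -1;

    if (pthread_create(&worker, NULL, compare_ids, &me) != 0)
        return -1;
    if (pthread_join(worker, &same_in_worker) != 0)
        return -1;

    /* The worker is a different thread, so it must have seen "not equal". */
    return same_in_worker == NULL ? 0 : -1;
}
```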
pthread_once (once_control, init_routine)
pthread_once executes the init_routine exactly once in a process. The first call to this routine by any thread in the process executes the given init_routine, without parameters. Any subsequent call will have no effect.
The init_routine routine is typically an initialization routine.
The once_control parameter is a synchronization control structure that requires initialization prior to calling pthread_once. For example:
pthread_once_t once_control = PTHREAD_ONCE_INIT;
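To illustrate, the sketch below has several threads all call pthread_once with the same control variable; only one of them actually runs the initialization. The counter init_calls and the function names are hypothetical and exist only to make the effect observable.

```c
#include <pthread.h>

#define NWORKERS 4

static pthread_once_t once_control = PTHREAD_ONCE_INIT;
static int init_calls = 0;          /* how many times init_routine ran */

/* Initialization routine: takes no parameters and returns nothing. */
static void init_routine(void)
{
    init_calls++;
}

static void *worker(void *arg)
{
    (void)arg;
    /* Every worker asks for initialization; only the first call runs it. */
    pthread_once(&once_control, init_routine);
    return NULL;
}

/* Returns the number of times init_routine ran, or -1 on error. */
int once_demo(void)
{
    pthread_t t[NWORKERS];

    for (int i = 0; i < NWORKERS; i++)
        if (pthread_create(&t[i], NULL, worker, NULL) != 0)
            return -1;
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(t[i], NULL);
    return init_calls;
}
```

Callers that arrive while the initialization is still in progress wait until it has finished, so every thread can rely on the initialized state once its own pthread_once call returns.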
Pthread Exercise 1
Getting Started and Thread Management Routines
Overview:
Login to an LC cluster using your workshop username and OTP token
Copy the exercise files to your home directory
Familiarize yourself with LC's Pthreads environment
Write a simple "Hello World" Pthreads program
Successfully compile your program
Successfully run your program several different ways
Review, compile, run and/or debug some related Pthreads programs (provided)
GO TO THE EXERCISE HERE
Mutex Variables
Overview
Mutex is an abbreviation for "mutual exclusion". Mutex variables are one of the primary means of implementing thread synchronization and for protecting shared data when multiple writes occur.
A mutex variable acts like a "lock" protecting access to a shared data resource. The basic concept of a mutex as used in Pthreads is that only one thread can lock (or own) a mutex variable at any given time. Thus, even if several threads try to lock a mutex only one thread will be successful. No other thread can own that mutex until the owning thread unlocks that mutex. Threads must "take turns" accessing protected data.
Mutexes can be used to prevent "race" conditions. An example of a race condition involving a bank transaction is shown below:

Thread 1                          Thread 2                          Balance
Read balance: $1000                                                 $1000
                                  Read balance: $1000               $1000
                                  Deposit $200                      $1000
Deposit $200                                                        $1000
Update balance $1000+$200                                           $1200
                                  Update balance $1000+$200         $1200

In the above example, a mutex should be used to lock the "Balance" while a thread is using this shared data resource.
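The bank transaction can be sketched with a mutex around the whole read, deposit, update sequence. The variable and function names here are hypothetical; the amounts match the table above, so with both deposits applied correctly the final balance is $1400 rather than $1200.

```c
#include <pthread.h>

static long balance = 1000;                     /* shared "Balance" */
static pthread_mutex_t balance_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each depositor performs read, deposit, update as one protected step. */
static void *deposit(void *arg)
{
    long amount = *(long *)arg;

    pthread_mutex_lock(&balance_lock);
    long current = balance;        /* read balance   */
    current += amount;             /* deposit        */
    balance = current;             /* update balance */
    pthread_mutex_unlock(&balance_lock);
    return NULL;
}

/* Two threads each deposit $200; returns the final balance or -1 on error. */
long bank_demo(void)
{
    pthread_t t1, t2;
    long amount = 200;

    balance = 1000;
    if (pthread_create(&t1, NULL, deposit, &amount) != 0)
        return -1;
    if (pthread_create(&t2, NULL, deposit, &amount) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return balance;
}
```

Because the second thread cannot read the balance until the first has written its update, the lost-update interleaving shown in the table cannot occur.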
Very often the action performed by a thread owning a mutex is the updating of global variables. This is a safe way to ensure that when several threads update the same variable, the final value is the same as what it would be if only one thread performed the update. The variables being updated belong to a "critical section".
A typical sequence in the use of a mutex is as follows:
Create and initialize a mutex variable
Several threads attempt to lock the mutex
Only one succeeds and that thread owns the mutex
The owner thread performs some set of actions
The owner unlocks the mutex
Another thread acquires the mutex and repeats the process
Finally the mutex is destroyed
When several threads compete for a mutex, the losers block at that call. A non-blocking call is available with "trylock" instead of the "lock" call.
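The difference between the blocking and non-blocking calls can be sketched as follows. In this hypothetical example (try_once and trylock_demo are invented names), the creating thread holds the mutex while a second thread calls pthread_mutex_trylock; instead of blocking, that call returns immediately with the EBUSY error code. After the mutex is released, the same attempt succeeds.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Tries once to take the lock and reports the result as its exit status. */
static void *try_once(void *arg)
{
    (void)arg;
    int rc = pthread_mutex_trylock(&lock);
    if (rc == 0)
        pthread_mutex_unlock(&lock);   /* got it: release it again */
    return (void *)(intptr_t)rc;
}

/* Returns 0 if trylock reported EBUSY while held and 0 once released. */
int trylock_demo(void)
{
    pthread_t t;
    void *rc_while_held = NULL, *rc_after_release = NULL;
    int created;

    pthread_mutex_lock(&lock);                 /* creator holds the mutex */
    created = pthread_create(&t, NULL, try_once, NULL);
    if (created == 0)
        pthread_join(t, &rc_while_held);
    pthread_mutex_unlock(&lock);
    if (created != 0)
        return -1;

    if (pthread_create(&t, NULL, try_once, NULL) != 0)
        return -1;
    pthread_join(t, &rc_after_release);

    return ((intptr_t)rc_while_held == EBUSY &&
            (intptr_t)rc_after_release == 0) ? 0 : -1;
}
```

A thread that receives EBUSY is free to do other work and try again later, which is what makes trylock useful for avoiding situations where threads would otherwise wait on each other.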
When protecting shared data, it is the programmer's responsibility to make sure every thread that needs to use a mutex does so. For example, if 4 threads are updating the same data, but only one uses a mutex, the data can still be corrupted.
Mutex Variables
Creating and Destroying Mutexes
Routines:
pthread_mutex_init (mutex,attr)
pthread_mutex_destroy (mutex)
pthread_mutexattr_init (attr)
pthread_mutexattr_destroy (attr)
Usage:
Mutex variables must be declared with type pthread_mutex_t, and must be initialized before they can be used. There are two ways to initialize a mutex variable:
1. Statically, when it is declared. For example:
pthread_mutex_t mymutex = PTHREAD_MUTEX_INITIALIZER;
2. Dynamically, with the pthread_mutex_init() routine. This method permits setting mutex object attributes, attr.
The mutex is initially unlocked.
The attr object is used to establish properties for the mutex object, and must be of type pthread_mutexattr_t if used (may be specified as NULL to accept defaults). The Pthreads standard defines three optional mutex attributes:
Protocol: Specifies the protocol used to prevent priority inversions for a mutex.
Prioceiling: Specifies the priority ceiling of a mutex.
Process-shared: Specifies the process sharing of a mutex.
Note that not all implementations may provide the three optional mutex attributes.
The pthread_mutexattr_init() and pthread_mutexattr_destroy() routines are used to create and destroy mutex attribute objects respectively.
pthread_mutex_destroy() should be used to free a mutex object which is no longer needed.
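Both initialization methods can be sketched together. The example below is an illustration under assumptions: the names static_mutex, dynamic_mutex and mutex_lifecycle_demo are hypothetical, and the process-shared attribute is set to PTHREAD_PROCESS_PRIVATE (normally the default) simply to show an attribute being set and read back. As noted above, an implementation is not obliged to support this optional attribute.

```c
#include <pthread.h>

/* Method 1: static initialization with default attributes. */
static pthread_mutex_t static_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Returns 0 if every step succeeded, -1 otherwise. */
int mutex_lifecycle_demo(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t dynamic_mutex;
    int pshared = -1;

    /* Method 2: dynamic initialization through an attribute object. */
    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    if (pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_PRIVATE) != 0 ||
        pthread_mutexattr_getpshared(&attr, &pshared) != 0 ||
        pthread_mutex_init(&dynamic_mutex, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        return -1;
    }
    /* The attribute object is no longer needed once the mutex exists. */
    pthread_mutexattr_destroy(&attr);

    /* Both mutexes start out unlocked and are used the same way. */
    pthread_mutex_lock(&static_mutex);
    pthread_mutex_lock(&dynamic_mutex);
    pthread_mutex_unlock(&dynamic_mutex);
    pthread_mutex_unlock(&static_mutex);

    /* Free the dynamically initialized mutex when it is no longer needed. */
    if (pthread_mutex_destroy(&dynamic_mutex) != 0)
        return -1;
    return pshared == PTHREAD_PROCESS_PRIVATE ? 0 : -1;
}
```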
Mutex Variables
Locking and Unlocking Mutexes
Routines:
pthread_mutex_lock (mutex)
pthread_mutex_trylock (mutex)
pthread_mutex_unlock (mutex)
Usage:
The pthread_mutex_lock() routine is used by a thread to acquire a lock on the specified mutex variable. If the mutex is already locked by another thread, this call will block the calling thread until the mutex is unlocked.
pthread_mutex_trylock() will attempt to lock a mutex. However, if the mutex is already locked, the routine will return immediately with a "busy" error code. This routine may be useful in preventing deadlock conditions, as in a priority-inversion situation.
pthread_mutex_unlock()willunlockamutexifcalledbytheowningthread.Callingthisroutineisrequiredafterathreadhascompleteditsuse
ofprotecteddataifotherthreadsaretoacquirethemutexfortheirworkwiththeprotecteddata.Anerrorwillbereturnedif:
Ifthemutexwasalreadyunlocked
Ifthemutexisownedbyanotherthread
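The difference between the blocking and the "busy" behavior can be seen in a small sketch. The names data_mutex, try_once, trylock_from_other_thread and trylock_demo are hypothetical. One thread holds the mutex while a second thread calls pthread_mutex_trylock(), which is expected to return EBUSY instead of blocking. The unlock errors listed above are not exercised here, since whether they are reported can depend on the mutex type.

```c
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t data_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Tries once to take data_mutex; stores the return code in *arg. */
static void *try_once(void *arg)
{
    int *result = arg;
    *result = pthread_mutex_trylock(&data_mutex);
    if (*result == 0) {
        /* We own the mutex: protected work would go here. */
        pthread_mutex_unlock(&data_mutex);
    }
    return NULL;
}

/* Runs try_once() in a new thread and returns what trylock reported,
   or -1 if the thread could not be created. */
static int trylock_from_other_thread(void)
{
    pthread_t t;
    int result = -1;
    if (pthread_create(&t, NULL, try_once, &result) != 0)
        return -1;
    pthread_join(t, NULL);
    return result;
}

/* Returns 1 if trylock reported EBUSY while the mutex was held and
   succeeded (returned 0) after it was released; returns 0 otherwise. */
int trylock_demo(void)
{
    pthread_mutex_lock(&data_mutex);
    int while_held = trylock_from_other_thread();
    pthread_mutex_unlock(&data_mutex);

    int after_release = trylock_from_other_thread();
    return while_held == EBUSY && after_release == 0;
}
```

A thread that gets EBUSY is free to do other work and try again later, which is what makes trylock useful for avoiding some deadlocks.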
There is nothing "magical" about mutexes... in fact they are akin to a "gentlemen's agreement" between participating threads. It is up to the code writer to ensure that the necessary threads all make the mutex lock and unlock calls correctly. The following scenario demonstrates a logical error:
  Thread 1        Thread 2        Thread 3
  Lock            Lock
  A = 2           A = A+1         A = A*B
  Unlock          Unlock
Question: When more than one thread is waiting for a locked mutex, which thread will be granted the lock first after it is released?
Answer
Example: Using Mutexes
Example Code - Using Mutexes
This example program illustrates the use of mutex variables in a threads program that performs a dot product. The main data is made available to all threads through a globally accessible structure. Each thread works on a different part of the data. The main thread waits for all the threads to complete their computations, and then it prints the resulting sum.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/*
The following structure contains the necessary information
to allow the function "dotprod" to access its input data and
place its output into the structure.
*/
typedef struct
 {
   double      *a;
   double      *b;
   double     sum;
   int     veclen;
 } DOTDATA;

/* Define globally accessible variables and a mutex */
#define NUMTHRDS 4
#define VECLEN 100
DOTDATA dotstr;
pthread_t callThd[NUMTHRDS];
pthread_mutex_t mutexsum;   /* protects dotstr.sum */
/*
The function dotprod is activated when the thread is created.
All input to this routine is obtained from a structure
of type DOTDATA and all output from this function is written into
this structure. The benefit of this approach is apparent for the
multi-threaded program: when a thread is created we pass a single
argument to the activated function - typically this argument
is a thread number. All the other information required by the
function is accessed from the globally accessible structure.
*/
void *dotprod(void *arg)
{
   /* Define and use local variables for convenience */
   int i, start, end, len;
   long offset;
   double mysum, *x, *y;

   offset = (long)arg;
   len = dotstr.veclen;
   start = offset*len;
   end   = start + len;
   x = dotstr.a;
   y = dotstr.b;

   /*
   Perform the dot product and assign result
   to the appropriate variable in the structure.
   */
   mysum = 0;
   for (i=start; i<end; i++)
    {
      mysum += (x[i] * y[i]);
    }

   /*
   Lock a mutex prior to updating the value in the shared
   structure, and unlock it upon updating.
   */
   pthread_mutex_lock (&mutexsum);
   dotstr.sum += mysum;
   pthread_mutex_unlock (&mutexsum);

   pthread_exit((void*) 0);
}
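/*
The main program creates threads which do all the work and then
prints out the result upon completion. Before creating the threads,
the input data is created. Since all threads update a shared structure,
we need a mutex for mutual exclusion. The main thread needs to wait for
all threads to complete, so it waits for each one of the threads. We specify
a thread attribute value that allows the main thread to join with the
threads it creates. Note also that we free up handles when they are
no longer needed.
*/
int main (int argc, char *argv[])
{
   long i;
   double *a, *b;
   void *status;
   pthread_attr_t attr;

   /* Assign storage and initialize values */
   a = (double*) malloc (NUMTHRDS*VECLEN*sizeof(double));
   b = (double*) malloc (NUMTHRDS*VECLEN*sizeof(double));

   for (i=0; i<VECLEN*NUMTHRDS; i++)
    {
      a[i]=1.0;
      b[i]=a[i];
    }

   dotstr.veclen = VECLEN;
   dotstr.a = a;
   dotstr.b = b;
   dotstr.sum=0;

   pthread_mutex_init(&mutexsum, NULL);

   /* Create threads to perform the dotproduct */
   /* The rest of main() follows the description in the comment above:
      create joinable threads, join them, print the sum, clean up. */
   pthread_attr_init(&attr);
   pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

   for (i=0; i<NUMTHRDS; i++)
    {
      /* Each thread works on a different part of the data;
         the part is selected by the offset 'i'. */
      pthread_create(&callThd[i], &attr, dotprod, (void *)i);
    }

   pthread_attr_destroy(&attr);

   /* Wait on the other threads */
   for (i=0; i<NUMTHRDS; i++)
    {
      pthread_join(callThd[i], &status);
    }

   /* After joining, print out the results and clean up */
   printf ("Sum =  %f \n", dotstr.sum);
   free (a);
   free (b);
   pthread_mutex_destroy(&mutexsum);
   pthread_exit(NULL);
}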
Serial version
Pthreads version
Condition Variables
Overview
Condition variables provide yet another way for threads to synchronize. While mutexes implement synchronization by controlling thread access to data, condition variables allow threads to synchronize based upon the actual value of data.
Without condition variables, the programmer would need to have threads continually polling (possibly in a critical section), to check if the condition is met. This can be very resource consuming since the thread would be continuously busy in this activity. A condition variable is a way to achieve the same goal without polling.
A condition variable is always used in conjunction with a mutex lock.
A representative sequence for using condition variables is shown below.
Main Thread
  Declare and initialize global data/variables which require synchronization (such as "count")
  Declare and initialize a condition variable object
  Declare and initialize an associated mutex
  Create threads A and B to do work
Thread A
  Do work up to the point where a certain condition must occur (such as "count" must reach a specified value)
  Lock associated mutex and check value of a global variable
  Call pthread_cond_wait() to perform a blocking wait for signal from Thread B. Note that a call to pthread_cond_wait() automatically and atomically unlocks the associated mutex variable so that it can be used by Thread B.
  When signalled, wake up. Mutex is automatically and atomically locked.
  Explicitly unlock mutex
  Continue
Thread B
  Do work
  Lock associated mutex
  Change the value of the global variable that Thread A is waiting upon.
  Check value of the global Thread A wait variable. If it fulfills the desired condition, signal Thread A.
  Unlock mutex.
  Continue
Main Thread
  Join / Continue
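The representative sequence above might look like the following in C. This is a minimal sketch under stated assumptions: the names count_lock, count_reached, count_target, thread_a, thread_b and run_handoff are hypothetical, and the threshold of 5 is arbitrary. Thread A waits for count to reach the target, Thread B changes count and signals, and the main-thread role creates and joins both.

```c
#include <pthread.h>

static int count = 0;                    /* shared data being watched */
static const int count_target = 5;       /* hypothetical threshold */
static pthread_mutex_t count_lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  count_reached = PTHREAD_COND_INITIALIZER;

/* Thread A: waits until count reaches the target, then records it. */
static void *thread_a(void *arg)
{
    int *seen = arg;
    pthread_mutex_lock(&count_lock);
    while (count < count_target)         /* re-check after every wakeup */
        pthread_cond_wait(&count_reached, &count_lock);
    *seen = count;                       /* the mutex is held again here */
    pthread_mutex_unlock(&count_lock);
    return NULL;
}

/* Thread B: changes count and signals when the condition is fulfilled. */
static void *thread_b(void *arg)
{
    (void)arg;
    for (int i = 0; i < count_target; i++) {
        pthread_mutex_lock(&count_lock);
        count++;
        if (count == count_target)
            pthread_cond_signal(&count_reached);
        pthread_mutex_unlock(&count_lock);
    }
    return NULL;
}

/* Main-thread role: create A and B, join both, report what A observed.
   Returns -1 if a thread could not be created. */
int run_handoff(void)
{
    pthread_t a, b;
    int seen = -1;

    count = 0;
    if (pthread_create(&a, NULL, thread_a, &seen) != 0)
        return -1;
    if (pthread_create(&b, NULL, thread_b, NULL) != 0) {
        /* Release A ourselves so that it can still be joined. */
        pthread_mutex_lock(&count_lock);
        count = count_target;
        pthread_cond_signal(&count_reached);
        pthread_mutex_unlock(&count_lock);
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return seen;
}
```

Because Thread A tests the value of count before waiting, the sketch also behaves correctly when Thread B finishes before Thread A has started: the wait is simply skipped.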
Condition Variables
Creating and Destroying Condition Variables
Routines:
  pthread_cond_init (condition,attr)
  pthread_cond_destroy (condition)
  pthread_condattr_init (attr)
  pthread_condattr_destroy (attr)
Usage:
Condition variables must be declared with type pthread_cond_t, and must be initialized before they can be used. There are two ways to initialize a condition variable:
1. Statically, when it is declared. For example:
   pthread_cond_t myconvar = PTHREAD_COND_INITIALIZER;
2. Dynamically, with the pthread_cond_init() routine. The ID of the created condition variable is returned to the calling thread through the condition parameter. This method permits setting condition variable object attributes, attr.
The optional attr object is used to set condition variable attributes. There is only one attribute defined for condition variables: process-shared, which allows the condition variable to be seen by threads in other processes. The attribute object, if used, must be of type pthread_condattr_t (may be specified as NULL to accept defaults).
Note that not all implementations may provide the process-shared attribute.
The pthread_condattr_init() and pthread_condattr_destroy() routines are used to create and destroy condition variable attribute objects.
pthread_cond_destroy() should be used to free a condition variable that is no longer needed.
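Both initialization styles can be sketched side by side. The names static_cv and cond_lifecycle_demo are hypothetical, and the sketch assumes an implementation that provides the process-shared attribute. PTHREAD_PROCESS_PRIVATE is the default, so it is set here only to show where an attribute would be applied.

```c
#include <pthread.h>

/* Statically initialized: no pthread_cond_init() call is needed. */
static pthread_cond_t static_cv = PTHREAD_COND_INITIALIZER;

/* Dynamically creates and then frees a condition variable.
   Returns 0 on success, otherwise the first error number seen. */
int cond_lifecycle_demo(void)
{
    pthread_condattr_t attr;
    pthread_cond_t dynamic_cv;

    int rc = pthread_condattr_init(&attr);
    if (rc != 0)
        return rc;

    /* Assumption: the process-shared attribute is available here. */
    rc = pthread_condattr_setpshared(&attr, PTHREAD_PROCESS_PRIVATE);
    if (rc == 0)
        rc = pthread_cond_init(&dynamic_cv, &attr);
    pthread_condattr_destroy(&attr);   /* not needed after init */
    if (rc != 0)
        return rc;

    /* No thread is waiting, so these signals wake nobody. */
    pthread_cond_signal(&dynamic_cv);
    pthread_cond_signal(&static_cv);

    return pthread_cond_destroy(&dynamic_cv);
}
```

Passing NULL as the second argument of pthread_cond_init() would accept the defaults and remove the need for the attribute object.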
Condition Variables
Waiting and Signaling on Condition Variables
Routines:
  pthread_cond_wait (condition,mutex)
  pthread_cond_signal (condition)
  pthread_cond_broadcast (condition)
Usage:
pthread_cond_wait() blocks the calling thread until the specified condition is signalled. This routine should be called while mutex is locked, and it will automatically release the mutex while it waits. After signal is received and thread is awakened, mutex will be automatically locked for use by the thread. The programmer is then responsible for unlocking mutex when the thread is finished with it.
Recommendation: Using a WHILE loop instead of an IF statement (see watch_count routine in example below) to check the waited for condition can help deal with several potential problems, such as:
  If several threads are waiting for the same wake up signal, they will take turns acquiring the mutex, and any one of them can then modify the condition they all waited for.
  If the thread received the signal in error due to a program bug
  The Pthreads library is permitted to issue spurious wake ups to a waiting thread without violating the standard.
The pthread_cond_signal() routine is used to signal (or wake up) another thread which is waiting on the condition variable. It should be called after mutex is locked, and the caller must then unlock mutex in order for the pthread_cond_wait() routine to complete.
The pthread_cond_broadcast() routine should be used instead of pthread_cond_signal() if more than one thread is in a blocking wait state.
It is a logical error to call pthread_cond_signal() before calling pthread_cond_wait().
Proper locking and unlocking of the associated mutex variable is essential when using these routines. For example:
  Failing to lock the mutex before calling pthread_cond_wait() may cause it NOT to block.
  Failing to unlock the mutex after calling pthread_cond_signal() may not allow a matching pthread_cond_wait() routine to complete (it will remain blocked).
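The locking rules and the WHILE-loop recommendation can be combined in a short sketch that wakes several waiters at once with pthread_cond_broadcast(). The names go, released, gate_mutex, gate_cv, waiter and run_broadcast_demo are hypothetical, as is the number of waiting threads.

```c
#include <pthread.h>

#define N_WAITERS 3                 /* hypothetical number of waiting threads */

static int go = 0;                  /* the waited-for condition */
static int released = 0;            /* how many waiters got through */
static pthread_mutex_t gate_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  gate_cv    = PTHREAD_COND_INITIALIZER;

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&gate_mutex);     /* lock before calling wait */
    while (!go)                          /* WHILE, not IF: re-test on wakeup */
        pthread_cond_wait(&gate_cv, &gate_mutex);
    released++;                          /* still protected by gate_mutex */
    pthread_mutex_unlock(&gate_mutex);
    return NULL;
}

/* Starts the waiters, opens the gate once, and returns how many
   waiters were released. */
int run_broadcast_demo(void)
{
    pthread_t threads[N_WAITERS];
    int started = 0;

    go = 0;
    released = 0;
    for (int i = 0; i < N_WAITERS; i++) {
        if (pthread_create(&threads[i], NULL, waiter, NULL) != 0)
            break;
        started++;
    }

    pthread_mutex_lock(&gate_mutex);
    go = 1;                              /* change the condition first ... */
    pthread_cond_broadcast(&gate_cv);    /* ... then wake every waiter */
    pthread_mutex_unlock(&gate_mutex);   /* unlock so the waiters can proceed */

    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return released;
}
```

A waiter that starts only after the broadcast finds go already set and never calls pthread_cond_wait(), so a signal sent "too early" is not lost. That is the practical benefit of testing the condition itself rather than relying on the signal.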
Example: Using Condition Variables
Example Code - Using Condition Variables
This simple example code demonstrates the use of several Pthread condition variable routines. The main routine creates three threads. Two of the threads perform work and update a "count" variable. The third thread waits until the count variable reaches a specified value.
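#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_THREADS  3
#define TCOUNT 10
#define COUNT_LIMIT 12

int     count = 0;
int     thread_ids[3] = {0,1,2};
pthread_mutex_t count_mutex;
pthread_cond_t count_threshold_cv;

void *inc_count(void *t)
{
  int i;
  long my_id = (long)t;

  for (i=0; i<TCOUNT; i++) {
    pthread_mutex_lock(&count_mutex);
    count++;

    /*
    Check the value of count and signal waiting thread when condition is
    reached. Note that this occurs while mutex is locked.
    */
    if (count == COUNT_LIMIT) {
      pthread_cond_signal(&count_threshold_cv);
      printf("inc_count(): thread %ld, count = %d Threshold reached.\n",
             my_id, count);
      }
    printf("inc_count(): thread %ld, count = %d, unlocking mutex\n",
           my_id, count);
    pthread_mutex_unlock(&count_mutex);
    }
  pthread_exit(NULL);
}

void *watch_count(void *t)
{
  long my_id = (long)t;

  /*
  Lock mutex and wait for signal. Note that the pthread_cond_wait
  routine will automatically and atomically unlock mutex while it waits.
  Also, note that if COUNT_LIMIT is reached before this routine is run by
  the waiting thread, the loop will be skipped to prevent pthread_cond_wait
  from never returning.
  */
  pthread_mutex_lock(&count_mutex);
  while (count<COUNT_LIMIT) {
    pthread_cond_wait(&count_threshold_cv, &count_mutex);
    printf("watch_count(): thread %ld Condition signal received.\n", my_id);
    count += 125;
    printf("watch_count(): thread %ld count now = %d.\n", my_id, count);
    }
  pthread_mutex_unlock(&count_mutex);
  pthread_exit(NULL);
}

/*
The main routine below is a minimal sketch that follows the description
above: it creates one watching thread and two incrementing threads,
waits for all of them, and then cleans up.
*/
int main(int argc, char *argv[])
{
  int i;
  long t1=1, t2=2, t3=3;
  pthread_t threads[NUM_THREADS];

  /* Initialize mutex and condition variable objects */
  pthread_mutex_init(&count_mutex, NULL);
  pthread_cond_init(&count_threshold_cv, NULL);

  pthread_create(&threads[0], NULL, watch_count, (void *)t1);
  pthread_create(&threads[1], NULL, inc_count, (void *)t2);
  pthread_create(&threads[2], NULL, inc_count, (void *)t3);

  /* Wait for all threads to complete */
  for (i=0; i<NUM_THREADS; i++) {
    pthread_join(threads[i], NULL);
    }

  /* Clean up and exit */
  pthread_mutex_destroy(&count_mutex);
  pthread_cond_destroy(&count_threshold_cv);
  pthread_exit(NULL);
}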
Monitoring, Debugging and Performance Analysis Tools for Pthreads
Monitoring and Debugging Pthreads:
Debuggers vary in their ability to handle Pthreads. The TotalView debugger is LC's recommended debugger for parallel programs. It is well suited for both monitoring and debugging threaded programs.
An example screenshot from a TotalView session using a threaded code is shown below.
1. Stack Trace Pane: Displays the call stack of routines that the selected thread is executing.
2. Status Bars: Show status information for the selected thread and its associated process.
3. Stack Frame Pane: Shows a selected thread's stack variables, registers, etc.
4. Source Pane: Shows the source code for the selected thread.
5. Root Window showing all threads
6. Threads Pane: Shows threads associated with the selected process
See the TotalView Debugger tutorial for details.
The Linux ps command provides several flags for viewing thread information. Some examples are shown below. See the man page for details.
% ps -Lf
UID        PID  PPID   LWP  C NLWP STIME TTY          TIME CMD
blaise   22529 28240 22529  0    5 11:31 pts/53   00:00:00 a.out
blaise   22529 28240 22530 99    5 11:31 pts/53   00:01:24 a.out
blaise   22529 28240 22531 99    5 11:31 pts/53   00:01:24 a.out
blaise   22529 28240 22532 99    5 11:31 pts/53   00:01:24 a.out
blaise   22529 28240 22533 99    5 11:31 pts/53   00:01:24 a.out

% ps -T
  PID  SPID TTY          TIME CMD
22529 22529 pts/53   00:00:00 a.out
22529 22530 pts/53   00:01:49 a.out
22529 22531 pts/53   00:01:49 a.out
22529 22532 pts/53   00:01:49 a.out
22529 22533 pts/53   00:01:49 a.out

% ps -Lm
  PID   LWP TTY          TIME CMD
22529     - pts/53   00:18:56 a.out
    - 22529 -        00:00:00 -
    - 22530 -        00:04:44 -
    - 22531 -        00:04:44 -
    - 22532 -        00:04:44 -
    - 22533 -        00:04:44 -
LC's Linux clusters also provide the top command to monitor processes on a node. If used with the -H flag, the threads contained within a process will be visible. An example of the top -H command is shown below. The parent process is PID 18010 which spawned three threads, shown as PIDs 18012, 18013 and 18014.
Performance Analysis Tools:
There are a variety of performance analysis tools that can be used with threaded programs. Searching the web will turn up a wealth of information.
At LC, the list of supported computing tools can be found at: computing.llnl.gov/code/content/software_tools.php.
These tools vary significantly in their complexity, functionality and learning curve. Covering them in detail is beyond the scope of this tutorial.
Some tools worth investigating, specifically for threaded codes, include:
  Open|SpeedShop
  TAU
  PAPI
  Intel VTune Amplifier
  ThreadSpotter
LLNL Specific Information and Recommendations
This section describes details specific to Livermore Computing's systems.
Implementations:
  All LC production systems include a Pthreads implementation that follows draft 10 (final) of the POSIX standard. This is the preferred implementation.
  Implementations differ in the maximum number of threads that a process may create. They also differ in the default amount of thread stack space.
Compiling:
  LC maintains a number of compilers, and usually several different versions of each; see LC's Supported Compilers web page.
  The compiler commands described in the Compiling Threaded Programs section apply to LC systems.
Mixing MPI with Pthreads:
  This is the primary motivation for using Pthreads at LC.
  Design:
    Each MPI process typically creates and then manages N threads, where N makes the best use of the available cores/node.
    Finding the best value for N will vary with the platform and your application's characteristics.
    In general, there may be problems if multiple threads make MPI calls. The program may fail or behave unexpectedly. If MPI calls must be made from within a thread, they should be made only by one thread.
  Compiling:
    Use the appropriate MPI compile command for the platform and language of choice
    Be sure to include the required Pthreads flag as shown in the Compiling Threaded Programs section.
  An example code that uses both MPI and Pthreads is available below. The serial, threads-only, MPI-only and MPI-with-threads versions demonstrate one possible progression.
    Serial
    Pthreads only
    MPI only
    MPI with pthreads
    makefile
Topics Not Covered
Several features of the Pthreads API are not covered in this tutorial. These are listed below. See the Pthread Library Routines Reference section for more information.
  Thread Scheduling
    Implementations will differ on how threads are scheduled to run. In most cases, the default mechanism is adequate.
    The Pthreads API provides routines to explicitly set thread scheduling policies and priorities which may override the default mechanisms.
    The API does not require implementations to support these features.
  Keys: Thread-Specific Data
    As threads call and return from different routines, the local data on a thread's stack comes and goes.
    To preserve stack data you can usually pass it as an argument from one routine to the next, or else store the data in a global variable associated with a thread.
    Pthreads provides another, possibly more convenient and versatile, way of accomplishing this through keys.
  Mutex Protocol Attributes and Mutex Priority Management for the handling of "priority inversion" problems.
  Condition Variable Sharing across processes
  Thread Cancellation
  Threads and Signals
  Synchronization constructs: barriers and locks
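Although keys are outside the scope of this tutorial, a very small sketch may help make the idea of thread-specific data concrete. It uses pthread_key_create(), pthread_setspecific(), pthread_getspecific() and pthread_once() from the routines reference. All other names (slot_key, make_key, read_my_value, worker, run_key_demo) are hypothetical, and error handling is kept to a minimum.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t  slot_key;    /* one key, one value per thread */
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    /* free() runs as the destructor for each thread's non-NULL value. */
    pthread_key_create(&slot_key, free);
}

/* Any routine can fetch the calling thread's value without an argument. */
static long read_my_value(void)
{
    long *p = pthread_getspecific(slot_key);
    return p ? *p : -1;
}

static void *worker(void *arg)
{
    long *io = arg;
    long *mine = malloc(sizeof *mine);
    if (mine == NULL)
        return NULL;
    *mine = *io;                          /* the id passed in by the caller */
    pthread_once(&key_once, make_key);    /* create the key exactly once */
    pthread_setspecific(slot_key, mine);
    *io = read_my_value() * 10;           /* report back through the argument */
    return NULL;
}

/* Runs two threads; returns 1 if each saw only its own value. */
int run_key_demo(void)
{
    pthread_t t[2];
    long io[2] = {1, 2};
    int started = 0;

    for (int i = 0; i < 2; i++) {
        if (pthread_create(&t[i], NULL, worker, &io[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    return started == 2 && io[0] == 10 && io[1] == 20;
}
```

Both threads use the same key, yet each one reads back only the value it stored, which is the property that distinguishes keys from an ordinary global variable.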
Pthread Exercise 2
Mutexes, Condition Variables and Hybrid MPI with Pthreads
Overview:
  Login to the LC workshop cluster, if you are not already logged in
  Mutexes: review and run the provided example codes
  Condition variables: review and run the provided example codes
  Hybrid MPI with Pthreads: review and run the provided example codes
GO TO THE EXERCISE HERE
This completes the tutorial.
Please complete the online evaluation form unless you are doing the exercise, in which case please complete it at the end of the exercise.
References and More Information
Author: Blaise Barney, Livermore Computing.
POSIX Standard: www.unix.org/version3/ieee_std.html
"Pthreads Programming". B. Nichols et al. O'Reilly and Associates.
"Threads Primer". B. Lewis and D. Berg. Prentice Hall.
"Programming With POSIX Threads". D. Butenhof. Addison Wesley.
"Programming With Threads". S. Kleiman et al. Prentice Hall.
Appendix A: Pthread Library Routines Reference
For convenience, an alphabetical list of Pthread routines, linked to their corresponding man page, is provided below.
pthread_atfork
pthread_attr_destroy
pthread_attr_getdetachstate
pthread_attr_getguardsize
pthread_attr_getinheritsched
pthread_attr_getschedparam
pthread_attr_getschedpolicy
pthread_attr_getscope
pthread_attr_getstack
pthread_attr_getstackaddr
pthread_attr_getstacksize
pthread_attr_init
pthread_attr_setdetachstate
pthread_attr_setguardsize
pthread_attr_setinheritsched
pthread_attr_setschedparam
pthread_attr_setschedpolicy
pthread_attr_setscope
pthread_attr_setstack
pthread_attr_setstackaddr
pthread_attr_setstacksize
pthread_barrier_destroy
pthread_barrier_init
pthread_barrier_wait
pthread_barrierattr_destroy
pthread_barrierattr_getpshared
pthread_barrierattr_init
pthread_barrierattr_setpshared
pthread_cancel
pthread_cleanup_pop
pthread_cleanup_push
pthread_cond_broadcast
pthread_cond_destroy
pthread_cond_init
pthread_cond_signal
pthread_cond_timedwait
pthread_cond_wait
pthread_condattr_destroy
pthread_condattr_getclock
pthread_condattr_getpshared
pthread_condattr_init
pthread_condattr_setclock
pthread_condattr_setpshared
pthread_create
pthread_detach
pthread_equal
pthread_exit
pthread_getconcurrency
pthread_getcpuclockid
pthread_getschedparam
pthread_getspecific
pthread_join
pthread_key_create
pthread_key_delete
pthread_kill
pthread_mutex_destroy
pthread_mutex_getprioceiling
pthread_mutex_init
pthread_mutex_lock
pthread_mutex_setprioceiling
pthread_mutex_timedlock
pthread_mutex_trylock
pthread_mutex_unlock
pthread_mutexattr_destroy
pthread_mutexattr_getprioceiling
pthread_mutexattr_getprotocol
pthread_mutexattr_getpshared
pthread_mutexattr_gettype
pthread_mutexattr_init
pthread_mutexattr_setprioceiling
pthread_mutexattr_setprotocol
pthread_mutexattr_setpshared
pthread_mutexattr_settype
pthread_once
pthread_rwlock_destroy
pthread_rwlock_init
pthread_rwlock_rdlock
pthread_rwlock_timedrdlock
pthread_rwlock_timedwrlock
pthread_rwlock_tryrdlock
pthread_rwlock_trywrlock
pthread_rwlock_unlock
pthread_rwlock_wrlock
pthread_rwlockattr_destroy
pthread_rwlockattr_getpshared
pthread_rwlockattr_init
pthread_rwlockattr_setpshared
pthread_self
pthread_setcancelstate
pthread_setcanceltype
pthread_setconcurrency
pthread_setschedparam
pthread_setschedprio
pthread_setspecific
pthread_sigmask
pthread_spin_destroy
pthread_spin_init
pthread_spin_lock
pthread_spin_trylock
pthread_spin_unlock
pthread_testcancel
https://computing.llnl.gov/tutorials/pthreads/
Last Modified: 07/24/2014 19:56:40 blaiseb@llnl.gov
UCRL-MI-133316