
What is some existing documentation on Linux memory management?

Ulrich Drepper (the ex-glibc maintainer) wrote an article series called
"What every programmer should know about memory":

Part 1: http://lwn.net/Articles/250967/
Part 2: http://lwn.net/Articles/252125/
Part 3: http://lwn.net/Articles/253361/
Part 4: http://lwn.net/Articles/254445/
Part 5: http://lwn.net/Articles/255364/
Part 6: http://lwn.net/Articles/256433/
Part 7: http://lwn.net/Articles/257209/
Part 8: http://lwn.net/Articles/258154/
Part 9: http://lwn.net/Articles/258188/

Mel Gorman's book "Understanding the Linux Virtual Memory Manager" is
available online:
http://kernel.org/doc/gorman/

What is virtual memory?

Virtual memory provides a software-controlled set of memory addresses,
allowing each process to have its own unique view of a computer's memory.

Virtual addresses only make sense within a given context, such as a
specific process. The same virtual address can simultaneously mean
different things in different contexts.

Virtual addresses are the size of a CPU register. On 32-bit systems each
process has 4 gigabytes of virtual address space all to itself, which is
often more memory than the system actually has.

Virtual addresses are interpreted by a processor's Memory Management Unit
(MMU), using data structures called page tables which map virtual address
ranges to associated content.

Virtual memory is used to implement lazy allocation, swapping, file mapping,
copy-on-write shared memory, defragmentation, and more.

For details, see Ulrich Drepper's "What every programmer should know
about memory, Part 3: Virtual Memory":
http://lwn.net/Articles/253361/
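The 4 gigabyte figure follows directly from the register width. A minimal
sketch of the arithmetic (the function name is made up for illustration):

```python
def address_space_bytes(register_bits):
    """Number of distinct byte addresses a register of this width can hold."""
    return 2 ** register_bits

GIB = 1024 ** 3
print(address_space_bytes(32) // GIB)  # 4 (gigabytes on a 32-bit system)
```

The same calculation for 64 bits gives a range far larger than any
installed memory, which is why the limit mostly matters on 32-bit systems.
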

What is physical memory?

Physical memory is storage hardware that records data with low latency and
small granularity. Physical memory addresses are numbers sent across a
memory bus to identify the specific memory cell within a piece of
storage hardware associated with a given read or write operation.

Examples of storage hardware providing physical memory are DIMMs (DRAM),
SD memory cards (flash), video cards (frame buffers and texture memory),
network cards (I/O buffers), and so on.

Only the kernel uses physical memory addresses directly. Userspace
programs exclusively use virtual addresses.

For details, see the Ars Technica DRAM Guide:
http://arstechnica.com/paedia/r/ram_guide/ram_guide.part11.html
http://arstechnica.com/paedia/r/ram_guide/ram_guide.part21.html
http://arstechnica.com/paedia/r/ram_guide/ram_guide.part31.html

And Ulrich Drepper's "What every programmer should know about memory,
Part 1":
http://lwn.net/Articles/250967/

What is a Memory Management Unit (MMU)?

The memory management unit is the part of the CPU that interprets virtual
addresses. Attempts to read, write, or execute memory at virtual addresses
are either translated to corresponding physical addresses, or else
generate an interrupt (page fault) to allow software to respond to the
attempted access.

This gives each process its own virtual memory address range, which is
limited only by address space (4 gigabytes on most 32-bit systems), while
physical memory is limited by the amount of available storage hardware.

Physical memory addresses are unique in the system; virtual memory addresses
are unique per process.
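The translate-or-fault behavior can be modeled in a few lines. This is a toy
sketch, not how any real MMU is built: the flat dictionary standing in for a
page table, the 4096-byte page size, and the names translate() and PageFault
are all assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch

class PageFault(Exception):
    """Raised when the toy MMU cannot translate an access."""

def translate(page_table, vaddr, access):
    """Translate vaddr using a dict of virtual page number -> (frame, perms)."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    entry = page_table.get(vpn)
    if entry is None or access not in entry[1]:
        raise PageFault(f"{access} access to {vaddr:#x}")
    frame, _perms = entry
    return frame * PAGE_SIZE + offset

# Two hypothetical processes map the same virtual page to different frames.
process_a = {1: (7, "rw")}
process_b = {1: (9, "r")}
print(hex(translate(process_a, 0x1010, "r")))  # 0x7010
print(hex(translate(process_b, 0x1010, "r")))  # 0x9010
```

A write through process_b's table raises PageFault, which is the point at
which real hardware would hand control to the page fault handler.
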
What are page tables?

Page tables are data structures which contain a process's list of memory
mappings and track associated resources. Each process has its own
set of page tables, and the kernel also has a few page table entries for
things like disk cache.

32-bit Linux systems use three-level tree structures to record page tables.
The levels are the Page Global Directory (PGD), Page Middle Directory (PMD),
and Page Table Entry (PTE). (64-bit Linux can use 4-level page tables,
which add a Page Upper Directory (PUD) between the PGD and the PMD.)

For details, see:
http://en.wikipedia.org/wiki/Page_table
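Walking a multi-level page table starts by slicing the virtual address into
one index per level plus an offset within the page. The sketch below assumes
a hypothetical 32-bit layout with 2 + 9 + 9 index bits and a 12-bit offset;
actual splits depend on the architecture and kernel configuration.

```python
def split_virtual_address(vaddr, index_bits=(2, 9, 9), offset_bits=12):
    """Split vaddr into per-level table indices (top level first) and a page offset."""
    offset = vaddr & ((1 << offset_bits) - 1)
    rest = vaddr >> offset_bits
    indices = []
    for bits in reversed(index_bits):   # peel off the lowest level first
        indices.append(rest & ((1 << bits) - 1))
        rest >>= bits
    return tuple(reversed(indices)), offset

example = (1 << 30) | (2 << 21) | (3 << 12) | 0x45
print(split_virtual_address(example))   # ((1, 2, 3), 69)
```

Each index selects an entry in the table at that level, and that entry
points at the next table down until the final entry names a physical page.
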
What are memory mappings?

A memory mapping is a set of page table entries describing the properties
of a consecutive virtual address range. Each memory mapping has a
start address and length, permissions (such as whether the program can
read, write, or execute from that memory), and associated resources (such
as physical pages, swap pages, file contents, and so on).

Creating new memory mappings allocates virtual memory, but not physical
memory (except for a small amount of physical memory needed to store the
page table itself). Physical pages are attached to memory mappings later,
in response to page faults. Physical memory is allocated on demand by the
page fault handler as necessary to resolve page faults.

A page table can be thought of as a description of a set of memory
mappings. Each memory mapping can be anonymous, file-backed, device-backed,
shared, or copy-on-write.

The mmap() system call adds a new memory mapping to the current process's
page tables. The munmap() system call discards an existing mapping.

Memory mappings cannot overlap. When the requested address range is already
in use, mmap() treats the address as a hint and places the new mapping
elsewhere, unless MAP_FIXED is given, in which case the new mapping replaces
the overlapped part of the existing ones.

Virtual address ranges for which there is no current memory mapping are said
to be "unmapped", and attempts to access them generate a page fault which
cannot be handled. The page fault handler sends a Segmentation
Violation signal (SIGSEGV) to the program on any access to unmapped
addresses. This is generally the result of following a "wild pointer".

Note that by default, Linux intentionally leaves the first few kilobytes
(or even megabytes) of each process's virtual address space unmapped, so
that attempts to dereference null pointers generate an unhandled page
fault resulting in an immediate SIGSEGV, killing the process.

For details, see the Single Unix Specification version 4 entry for the
mmap() system call:
http://www.opengroup.org/onlinepubs/9699919799/functions/mmap.html

And the glibc mmap() documentation:
http://www.gnu.org/s/libc/manual/html_node/Memory_002dmappedI_002fO.html
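On Linux, the mappings of a running process can be inspected through
/proc/<pid>/maps, one mapping per line. The sketch below parses a line in
that format into the start address, length, and permissions described above.
The Mapping type, the function name, and the sample line are made up for
illustration.

```python
from collections import namedtuple

Mapping = namedtuple("Mapping", "start length perms offset path")

def parse_maps_line(line):
    """Parse one line in the /proc/<pid>/maps format into a Mapping."""
    fields = line.split(None, 5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""  # empty for anonymous mappings
    return Mapping(start, end - start, fields[1], int(fields[2], 16), path)

# Hypothetical sample line; real ones can be read from /proc/self/maps on Linux.
sample = "00400000-0040b000 r-xp 00000000 08:01 131104   /bin/cat"
print(parse_maps_line(sample))
```

The permission field uses r, w, and x for read, write, and execute, followed
by p for a private (copy-on-write) mapping or s for a shared one.
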
What are shared pages?

Multiple Page Table Entries can map the same physical page. Accesses through
these virtual addresses (often in different processes) all show the same
contents, and changes to the page are immediately visible to all users.
(Shared writable mappings are a common high-performance inter-process
communication mechanism.)

The number of PTEs mapping each physical page is tracked, and when the
reference count falls to zero the page is moved to the free memory list.
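A shared writable mapping used for inter-process communication can be
sketched with Python's mmap module, which wraps the mmap() system call. This
assumes a Unix-like system with fork(); the function name is hypothetical.

```python
import mmap
import os

def share_page_with_child(message):
    """Child writes message into a shared mapping; parent returns what it sees."""
    shared = mmap.mmap(-1, mmap.PAGESIZE)  # anonymous; MAP_SHARED is Python's default
    pid = os.fork()
    if pid == 0:                    # child: write through its own PTE, then exit
        shared[:len(message)] = message
        os._exit(0)
    os.waitpid(pid, 0)              # parent: wait until the child has written
    seen = shared[:len(message)]
    shared.close()
    return seen

print(share_page_with_child(b"hello"))  # b'hello'
```

Both processes have their own page table entry for the page, but the entries
point at the same physical page, so the parent sees the child's write.
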
What is an anonymous mapping?

Anonymous memory is a memory mapping with no file or device backing it.
This is how programs allocate memory from the operating system for use
by things like the stack and heap.

Initially, an anonymous mapping only allocates virtual memory. The new
mapping starts with a redundant copy-on-write mapping of the zero page.
(The zero page is a single page of physical memory filled with zeroes,
maintained by the operating system.) Every virtual page of the anonymous
mapping is attached to this existing pre-zeroed page, so attempts to read
from anywhere in the mapping return zeroed memory even though no new
physical memory has been allocated to it yet.

Attempts to write to the page trigger the normal copy-on-write mechanism in
the page fault handler, allocating fresh memory only when needed to allow
the write to proceed. (Note, pre-zeroing optimizations change the
implementation details here, but the theory's the same.) Thus "dirtying"
anonymous pages allocates physical memory, while the actual allocation call
only allocates virtual memory.

Dirty anonymous pages can be written to swap space, but in the absence of
swap they remain "pinned" in physical memory.

Anonymous mappings may be created by passing the MAP_ANONYMOUS flag to
mmap().
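The zero-filled behavior is easy to observe. A minimal sketch using Python's
mmap module on a Unix-like system (the function name is hypothetical):

```python
import mmap

def fresh_anonymous_mapping_is_zeroed(length=4 * mmap.PAGESIZE):
    """Map anonymous memory and check that it reads back as all zeroes."""
    with mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS) as m:
        return m[:] == bytes(length)

print(fresh_anonymous_mapping_is_zeroed())  # True
```

Passing -1 as the file descriptor tells Python there is no backing file; the
flags are passed through to the underlying mmap() call.
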
What is a file-backed mapping?

File-backed mappings mirror the contents of an existing file. The mapping
has some administrative data noting which file to map from, and at which
offset, as well as permission bits indicating whether the pages may be read,
written, or executed.

When page faults attach new physical pages to such a mapping, the contents of
those pages are initialized by reading the contents of the file being mapped,
at the appropriate offset for that page.

These physical pages are usually shared with the page cache, the kernel's
disk cache of file contents. The kernel caches the contents of files
when the page is read, so sharing those cache pages with the process reduces
the total number of physical pages required by the system.

Writes to file mappings created with the MAP_SHARED flag update the page
cache pages, making the updated file contents immediately visible to other
processes using the file, and eventually the cache pages will be flushed
to disk, updating the on-disk copy of the file.

Writes to file mappings created with the MAP_PRIVATE flag perform a
copy-on-write, allocating a new local copy of the page to store the
changes. These changes are not made visible to other processes, and do
not update the on-disk copy of the file.

Note that this means writes to MAP_SHARED pages do not allocate additional
physical pages (the page was already faulted into the page cache by the
read, and the data can be flushed back to the file if the physical page
is needed elsewhere), but writes to MAP_PRIVATE pages do (the copy in
the page cache and the local copy the program needs diverge, so two pages
are needed to store them, and flushing the page cache copy back to disk
won't free up the local copy of the changed contents).
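The difference between the two flags shows up when the file is read back
after a write through the mapping. A minimal sketch using a temporary file
(the function name and file contents are made up for illustration):

```python
import mmap
import os
import tempfile

def write_through_mapping(flags):
    """Write b"NEW" via a mapping created with flags; return the file's contents after."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"old contents")
        with mmap.mmap(fd, 0, flags=flags) as m:
            m[:3] = b"NEW"
            if flags == mmap.MAP_SHARED:
                m.flush()           # ask for the dirty page to be written back
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.close(fd)
        os.unlink(path)

print(write_through_mapping(mmap.MAP_SHARED))   # b'NEW contents'
print(write_through_mapping(mmap.MAP_PRIVATE))  # b'old contents'
```

With MAP_PRIVATE the write lands in a private copy of the page, so the file
is unchanged.
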
What is the page cache?

The page cache is the kernel's cache of file contents. It's the main
user of virtual memory that doesn't belong to a specific process.

See "What is a file-backed mapping" and "What is free memory" for more info.

What is CPU cache?

The CPU cache is a very small amount of very fast memory built into a
processor, containing temporary copies of data to reduce processing latency.

The L1 cache is a tiny amount of memory (generally between 1k and 64k)
wired directly into the processor that can be accessed in a single clock
cycle. The L2 cache is a larger amount of memory (up to several
megabytes) adjacent to the processor, which can be accessed in a small
number of clock cycles. Access to uncached memory (across the memory bus)
can take dozens, hundreds, or even thousands of clock cycles.

(Note that latency is the issue CPU cache addresses, not throughput. The
memory bus can provide a constant stream of memory, but takes a while to
start doing so.)

For details, see Ulrich Drepper's "What every programmer should know
about memory, Part 2":
http://lwn.net/Articles/252125/

What is a Translation Lookaside Buffer (TLB)?

The TLB is a cache for the MMU. All memory in the CPU's L1 cache must
have an associated TLB entry, and invalidating a TLB entry flushes the
associated cache line(s).

The TLB is a small fixed-size array of recently used pages, which the
CPU checks on each memory access. It lists a few of the virtual address
ranges to which physical pages are currently assigned.

Accesses to virtual addresses listed in the TLB go directly through to the
associated physical memory (or cache pages) without generating page faults
(assuming the page permissions allow that category of access). Accesses
to virtual addresses not listed in the TLB (a "TLB miss") trigger a page
table lookup, which is performed either by hardware, or by the page fault
handler, depending on processor type.

For details, see:
http://en.wikipedia.org/wiki/Translation_lookaside_buffer

1995 interview with Linus Torvalds describing the i386, PPC, and Alpha TLBs:
http://www.linuxjournal.com/article/36

What is a page fault handler?

A page fault handler is an interrupt routine, called by the Memory
Management Unit in response to an attempt to access virtual memory which
did not immediately succeed.

When a program attempts to read, write, or execute memory in a page that
hasn't got the appropriate permission bits set in its page table entry
to allow that type of access, the instruction generates an interrupt. This
calls the page fault handler to examine the registers and page tables of
the interrupted process and determine what action to take to handle
the fault.

The page fault handler may respond to a page fault in three ways:

1) The page fault handler can resolve the fault by immediately attaching a
page of physical memory to the appropriate page table entry, adjusting
the entry, and resuming the interrupted instruction. This is called a
"soft fault".

2) When the fault handler can't immediately resolve the fault, it may
suspend the interrupted process and switch to another while the system
works to resolve the issue. This is called a "hard fault", and results
when an I/O operation must be performed to prepare the physical page
needed to resolve the fault.

3) If the page fault handler can't resolve the fault, it sends a signal
(SIGSEGV) to the process, informing it of the failure. Although a process
can install a SIGSEGV handler (debuggers and emulators tend to do this),
the default behavior of an unhandled SIGSEGV is to kill the process with
the message "Segmentation fault".

A page fault that occurs in an interrupt handler is called a "double fault",
and usually panics the kernel. The double fault handler calls the kernel's
panic() function to print error messages to help diagnose the problem.
The process to be killed for accessing memory it shouldn't is the
kernel itself, and thus the system is too confused to continue.
http://en.wikipedia.org/wiki/Double_fault

A triple fault cannot be handled in software. If a page fault occurs in
the double fault handler, the machine immediately reboots.
http://en.wikipedia.org/wiki/Triple_fault
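Soft faults can be counted from userspace. The sketch below uses the
ru_minflt field reported by getrusage(), which counts faults serviced without
I/O, and touches freshly mapped anonymous pages to provoke some. The function
name and the page count are arbitrary choices for illustration, and the exact
number reported varies by system.

```python
import mmap
import resource

def minor_faults_from_touching(pages=256):
    """Return how many minor (soft) faults occur while dirtying fresh anonymous pages."""
    m = mmap.mmap(-1, pages * mmap.PAGESIZE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    for page in range(pages):
        m[page * mmap.PAGESIZE] = 1     # first write to each page faults it in
    after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    m.close()
    return after - before

print(minor_faults_from_touching())
```

On a typical Linux system the result is close to the number of pages
touched, since each first write needs the handler to attach a physical page.
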
How does the page fault handler allocate physical memory?

The Linux kernel uses lazy (on-demand) allocation of physical pages,
deferring the allocation until necessary and avoiding allocating physical
pages which will never actually be used.

Memory mappings generally start out with no physical pages attached. They
define virtual address ranges without any associated physical memory. So
malloc() and similar allocate space, but the actual memory is allocated
later by the page fault handler.

Virtual pages with no associated physical page will have the read,
write, and execute bits disabled in their page table entries. This causes
any access to that address to generate a page fault, interrupting the
program and calling the page fault handler.

When the page fault handler needs to allocate physical memory to handle
a page fault, it zeroes a free physical page (or grabs a page from a pool
of pre-zeroed pages), attaches that page of memory to the Page Table Entry
associated with the fault, updates that PTE to allow the appropriate access,
and resumes the faulting instruction.

Note that implementing this requires two sets of page table flags for
read, write, execute, and share. The VM_READ, VM_WRITE, VM_EXEC, and
VM_SHARED flags (in linux/mm.h) are used by the MMU to generate faults.
The VM_MAYREAD, VM_MAYWRITE, VM_MAYEXEC, and VM_MAYSHARE flags are used by
the page fault handler to determine whether the attempted access was legal
and thus the fault handler should adjust the PTE to resolve the fault and
allow the process to continue.
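Lazy allocation can be watched from userspace by comparing a process's
resident set size before and after touching a new mapping. This sketch is
Linux-only: it assumes /proc/self/statm is available, whose second field is
the resident page count. The function names and the 16 megabyte size are
arbitrary choices for illustration.

```python
import mmap

def resident_bytes():
    """Resident set size of this process, from /proc/self/statm (Linux only)."""
    with open("/proc/self/statm") as f:
        return int(f.read().split()[1]) * mmap.PAGESIZE

def resident_growth(size=16 * 1024 * 1024):
    """Return RSS growth after mapping size bytes, and after dirtying every page."""
    start = resident_bytes()
    m = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    after_map = resident_bytes() - start
    for offset in range(0, size, mmap.PAGESIZE):
        m[offset] = 1               # each first write faults in one physical page
    after_touch = resident_bytes() - start
    m.close()
    return after_map, after_touch

print(resident_growth())
```

Creating the mapping should barely move the resident size, while dirtying
the pages should grow it by roughly the size of the mapping.
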
How does fork work?

The fork() system call creates a new process by copying an existing
process. The new process is created with a copy of the page tables of the
process that calls fork(). These page tables are all copy-on-write mappings
sharing the existing physical pages between parent and child.

How does exec work?

The exec() system call executes a new file in the current process
context. It blanks the process's current page table, discarding all
existing mappings, and replaces them with a fresh page table containing
a small number of new mappings, including an executable mmap() of the new
file passed to the exec() call, a small amount of administrative space
containing the environment variables and command line arguments passed
into the new program, a new process stack, and so on.

The normal way to launch a new process in Unix-like systems is to call
fork(), followed immediately by a call to exec() in the new process.
Thus fork() copies the parent process's existing memory mappings
into the new process, then exec() immediately discards them again.
Because these were shared mappings, the fork() allocates a lot of virtual
space but consumes very few new physical pages.
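The fork-then-exec idiom looks like this from userspace. The sketch uses
Python's thin wrappers around fork(), execv(), pipe(), and dup2(); the
function name is hypothetical, and running the Python interpreter itself as
the new program is just a convenient choice for a self-contained example.

```python
import os
import sys

def fork_then_exec(code):
    """Run Python `code` in a child via fork() + exec(); return (output, exit status)."""
    read_end, write_end = os.pipe()
    pid = os.fork()                     # child starts with copy-on-write mappings
    if pid == 0:
        try:
            os.dup2(write_end, 1)       # route the child's stdout into the pipe
            os.close(read_end)
            # exec discards the inherited mappings and maps the new program
            os.execv(sys.executable, [sys.executable, "-c", code])
        finally:
            os._exit(127)               # only reached if exec failed
    os.close(write_end)
    with os.fdopen(read_end, "rb") as pipe:
        output = pipe.read()
    _, status = os.waitpid(pid, 0)
    return output, os.WEXITSTATUS(status)

print(fork_then_exec("print('hello from the new program')"))
```

Between fork() and exec() the child touches very few pages, so almost none
of the parent's memory is ever actually copied.
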
How do shared libraries work?

Only statically linked executables are executed directly. Dynamically
linked executables are run by the dynamic linker (either ld-linux.so.2 or
ld-uClibc.so.0), which despite the name is a statically linked executable
that works a bit like #!/bin/sh or #!/usr/bin/perl in shell scripts. It's
the binary that's launched to run this program, and the path to this
program is fed to it as its first argument.

The dynamic linker mmap()s the executable file, and any shared libraries
it needs, using the MAP_PRIVATE flag. This allows it to write to those
pages to perform the dynamic linking fixups allowing the executable's calls
out to the shared library code to connect. (It calls mprotect() to set the
pages read-only before handing control over to the linked executable.)
The dynamic linker traces through various lists of calls in the program's
ELF tables, looks up the appropriate function pointer for each one, and
writes that pointer to the call site in the memory mapping.

Pages the dynamic linker writes to are essentially converted to anonymous
pages by breaking the copy-on-write. These new "dirtied" pages allocate
physical memory visible only to this process. Thus keeping to a minimum
the number of dirtied pages allows the rest to remain shared, and thus saves
memory.

Shared libraries are normally compiled with the -fpic flag (Position
Independent Code), which creates an object table containing all the
references to external calls to data and functions. Instead of reaching
out and touching the shared objects directly, the code bounces off this
table. This makes the code slightly larger by inserting an extra jump
or extra load, but the advantage is that all the external references
modified by the linker are grouped together into a small number of pages.

So the binary is slightly larger, but more of the pages are shared since
the number of physical pages dirtied by the dynamic linker is smaller each
time the shared object is used.

Normally only shared libraries are compiled this way, but some programs
(such as busybox) which the system expects to run many instances of may
also benefit from the increased sharing more than they suffer from the
increased size.

Note that statically linked programs have no fixups applied to them, and
thus no private executable pages. Every page of their executable mapping
remains shared. They also spawn faster, since there's no dynamic linker
performing fixups. Thus in some circumstances, static linking is actually
more efficient than dynamic linking. (Strange but true.)

How does copy on write work?

If two page table entries point to the same physical page, the contents
of that page show up in both locations when read. A reference counter
associated with the page tracks how many page table entries point to that
page. Each of those page table entries has read permission (but not
write permission) for the page.

Attempts to write to the page generate a page fault. The page fault handler
allocates a new physical page, copies the contents of the shared page into
the new page, attaches the new page to the faulting page table entry, sets
the updated PTE writeable, decrements the count on the old shared page
(possibly removing its shared status if the reference count falls to 1),
and resumes the faulting process, allowing the write to go through to the
new non-shared copy of the page.

This is a form of lazy allocation, deferring memory allocation until
the new memory is actually used.
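The effect of copy-on-write is visible after a fork(): a write in the child
to a private page does not show up in the parent. A minimal sketch on a
Unix-like system (the function name and the byte strings are made up for
illustration):

```python
import mmap
import os

def copy_on_write_after_fork():
    """Return (parent's view, child's view) of a private page the child wrote to."""
    page = mmap.mmap(-1, mmap.PAGESIZE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    page[:6] = b"parent"
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:                        # child: this write breaks the sharing
        page[:6] = b"child!"
        os.write(write_end, page[:6])   # report what the child now sees
        os._exit(0)
    os.close(write_end)
    child_view = os.read(read_end, 6)
    os.close(read_end)
    os.waitpid(pid, 0)
    parent_view = page[:6]
    page.close()
    return parent_view, child_view

print(copy_on_write_after_fork())  # (b'parent', b'child!')
```

The child's write is redirected to a fresh private copy, so the two
processes end up with different physical pages behind the same virtual
address.
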
What are clean pages?

Clean pages have copies of their data stored elsewhere, such as in swap
space or in a file. Thus the physical memory storing that information may
be reclaimed and reused elsewhere by detaching the physical page from the
associated Page Table Entry. When the page's contents are needed again,
a new physical page may be allocated and its contents read from the stored
copy.

What are active pages?

Active pages are page table entries that have associated physical pages
which have been used recently.

The system can track active pages by removing the read, write, and execute
bits from page table entries but leaving the associated physical page
still attached, then taking a soft fault the next time that page is
accessed. The fault handler can cheaply switch the appropriate access
bit back on and resume the faulting instruction, thereby recording that
the page is currently in use.

Pages which are active are poor candidates for page stealing, even if
they are clean, because the process using them will quickly fault a new
physical page back in again if the current one is reclaimed.

Note that reading a lot of filesystem data marks cache pages as active,
since they're recently used. This not only causes the page cache to
allocate lots of physical pages, but prevents those pages from being
reclaimed since they were used more recently than other pages in the
system.

What are free pages?

When a page's reference count goes to zero, the page is added to the free
list inside the kernel. These free pages are not currently used for any
purpose, and are essentially wasted until some use is found for them.

A freshly booted system starts with lots of free pages. Free pages also
occur when processes exit(), or when munmap() discards a mapping and the
private pages associated with it.

The kernel tries to minimize the number of free pages in the system.
Instead, it tries to find uses for these pages which improve performance of
existing processes, but which leave them easily reclaimable if the memory
is needed for another purpose.

For example, the page cache keeps around copies of file contents long
after they were last accessed, potentially avoiding an expensive disk
access if that file is read again in future. If those pages are needed
for other purposes, the cached contents can easily be discarded and the
pages reallocated.

What is page stealing?

Page stealing is a response to a shortage of free pages, by "stealing"
existing allocated physical pages from their current users. It's a
statistical method of effectively obtaining extra physical pages by
identifying existing allocations unlikely to be used again in the
near future and recycling them.

Page stealing removes existing physical pages from their mappings, disposes
of their current contents (often by writing them to disk), and reuses the
memory elsewhere. If the original user needs their page back, a new
physical page is allocated and the old contents loaded into the new page.

Page stealing looks for inactive pages, since active ones would probably
just be faulted back in again immediately. Clean inactive pages are almost
as good as free pages, because their current contents are already copied
somewhere else and can be discarded without even performing any I/O.
Dirty pages are cleaned by scheduling I/O to write them to backing store
(swap for anonymous pages, to the mapped file for shared file-backed
mappings).

Page stealing attempts to determine which existing physical pages are least
likely to be needed again soon, meaning it's trying to predict the future
actions of the processes using those pages. It does so through various
heuristics, which can never be perfect.
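The preference order described above (inactive before active, clean before
dirty) can be written down as a toy model. This is only an illustration of
the stated priorities, not the kernel's actual reclaim algorithm; the Page
type and the function name are made up.

```python
from dataclasses import dataclass

@dataclass
class Page:
    name: str
    active: bool   # used recently
    dirty: bool    # contents not yet saved to backing store

def pick_page_to_steal(pages):
    """Prefer inactive clean pages, then inactive dirty ones; None if all are active."""
    inactive = [p for p in pages if not p.active]
    if not inactive:
        return None
    # False sorts before True, so clean pages come first; min() keeps the earliest tie.
    return min(inactive, key=lambda p: p.dirty)

pages = [Page("a", active=True, dirty=False),
         Page("b", active=False, dirty=True),
         Page("c", active=False, dirty=False)]
print(pick_page_to_steal(pages).name)  # c
```

Page "c" wins because it can be discarded without any I/O, while "b" would
first have to be written to its backing store.
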
What is a "working set" of pages?

A working set is the set of memory chunks required to complete an
operation. For example, the CPU attempts to keep the set of cache
lines required for tight inner loops in L1 cache until the loop completes.
It attempts to keep the set of frequently used functions from various
parts of a program (including shared libraries) in the L2 cache. It does
so both by prefetching cache lines it predicts it may need soon, and by
making decisions about which cache lines to discard and which to keep when
making space to load new cache lines.

The page fault handler attempts to keep each currently running process's
working set of pages in physical memory until the process blocks awaiting
input or exits. Unused portions of program code may never even be loaded
on a given program run (such as an "options" menu for a program that isn't
currently being configured, or portions of generic shared libraries which
this program doesn't actually use).

The working set is determined dynamically at runtime, and can change
over time as a program does different things.

The objective of page stealing is to keep the "working set" of pages in
fast physical memory, allowing processes to "race to quiescence" where
the system completes its current tasks quickly and settles down into
an idle state waiting for the next thing to do. From this point of view,
physical memory can be seen as a cache both for swap pages and for
executables in the filesystem. The task of keeping the working set in
physical memory (and avoiding page faults that trigger I/O) is analogous
to the CPU's task of keeping the appropriate contents in L1 and L2 caches.

What is thrashing?

In low memory situations, each new allocation involves stealing an in-use
page from elsewhere, saving its current contents, and loading new contents.
When that page is again referenced, another page must be stolen to replace
it, saving the new contents and reloading the old contents.

It essentially means that the working set required to service the main
loops of the programs the system is running is larger than available
physical memory, either because physical memory is tied up doing something
else or because the working set is just that big.

This can lead to a state where the CPU generates a constant stream of
page faults, and spends most of its time sitting idle, waiting for I/O to
service those page faults.

This is often called "swap thrashing", and in some ways is the result of
a failure of the system's swap file.

If the swap file is too small (or entirely absent), the system can only
steal pages from file-backed mappings. Since every executable program
and shared library is a file-backed mapping, this means the system yanks
executable pages, which generally fault back in fairly rapidly since
they tend to get used a lot. This can quickly lead to thrashing.

The other way to encourage swap thrashing is by having too large a swap
file, so that programs that query available memory see huge amounts of swap
space and try to use it. The system's available physical memory and I/O
bandwidth don't change with the size of the swap file, so attempts to use
any significant portion of that swap space result in memory accesses
occurring at disk I/O speed (four orders of magnitude slower than main
memory, stretching each 1/10th of a second out to about seventeen minutes).
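The cliff between "working set fits" and "working set is one page too big"
can be seen in a small simulation. This sketch assumes a least-recently-used
stealing policy, which is a simplification of what the kernel does; the
function name and the access pattern are made up for illustration.

```python
from collections import OrderedDict

def count_page_faults(accesses, frames):
    """Count faults for a page access sequence with `frames` physical pages and LRU stealing."""
    resident = OrderedDict()
    faults = 0
    for page in accesses:
        if page in resident:
            resident.move_to_end(page)       # mark as most recently used
            continue
        faults += 1
        if len(resident) >= frames:
            resident.popitem(last=False)     # steal the least recently used page
        resident[page] = True
    return faults

loop = [0, 1, 2, 3] * 5                      # a working set of four pages, used in a loop
print(count_page_faults(loop, frames=4))     # 4: only the initial faults
print(count_page_faults(loop, frames=3))     # 20: every single access faults
```

With one frame too few, each access steals exactly the page that will be
needed next, so the loop never makes progress without a fault.
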
What is the Out Of Memory (OOM) killer?

If the system ever truly ran out of physical memory, it could reach a state
where every process is waiting for some other process to release a page
before it could continue. This deadlock situation would freeze the system.

Before this happened, the system would start thrashing, where it would
slow itself to a crawl by spending all its time constantly stealing pages
only to steal them back again immediately. This situation is almost as
bad as true deadlock, slowing response time to useless levels (five or ten
minute latency on normally instantaneous responses is not unusual during
swap thrashing; this is assuming your operation does not time out instead).

A system that enters swap thrashing may take hours to recover (assuming
a backlog of demands does not emerge as it fails to service them, preventing
it from ever recovering). Or it can take just as long to proceed to
a true deadlock (where the flood of swap I/O stops because the CPU is pegged
at 100% searching for the next page to steal, never finding one, and thus
stops scheduling new I/O).

To avoid either situation, Linux introduced the OOM killer. When it
detects the system has entered swap thrashing, it heuristically determines
a process to kill to free up pages. It can also be configured to reboot
the entire system instead of selecting a specific process to kill.

Note that the OOM killer doesn't wait for a true memory exhaustion to
deadlock the system, both because the system is effectively down while
thrashing, and because a paralyzed system might not be able to run even
the OOM killer.

The OOM killer's process-killing heuristics are a reasonable way to deal with
runaway processes and "fork bombs", but in the absence of a clearly
malfunctioning process that is truly "at fault", killing any process is
often unacceptable. Developers often argue about the choice of processes
to kill, and exactly when the thrashing is bad enough to trigger the OOM
killer and when to allow the system to attempt to work its way through
to recovery. Both of these heuristics are by their nature imperfect,
because they attempt to predict the future.

In general, developers try to avoid triggering the OOM killer, and treat
its occurrence as the userspace equivalent of a kernel panic(). The system
got into an untenable state; it might be good to find out why and prevent
its recurrence.

Why is "strict overcommit" a dumb idea?

People who don't understand how virtual memory works often insist on
tracking the relationship between virtual and physical memory, and attempting
to enforce some correspondence between them (when there isn't one). Instead
of designing their programs with predictable and controllable behavior, they
try to make the operating system predict the future. This doesn't work.

Many common unix programming idioms create large virtual memory
ranges with the potential to consume a lot of physical memory, but
never realize that potential. Linux allows the system to "overcommit"
memory, creating memory mappings that promise more physical memory than
the system could actually deliver.

For example, the fork/exec combo creates transient virtual memory usage
spikes, which go away again almost immediately without ever breaking the
copy-on-write status of most of the pages in the forked page tables. Thus
if a large process forks off a smaller process, enormous physical memory
demands threaten to happen (as far as overcommit is concerned), but never
materialize.

Dynamic linking raises similar issues: the dynamic linker maps executable
files and shared libraries MAP_PRIVATE, which allows it to write to those
pages to perform the dynamic linking fixups allowing the shared library
calls to connect to the shared library. In theory, there could be a
call to functions or data in a shared library within every page of the
executable (and thus the entire mapping could be converted to anonymous
memory by the copy-on-write actions of the dynamic linker). And since
shared libraries can call other shared libraries, those could require
private physical memory for their entire mapping too.

In reality, that doesn't happen. It would be incredibly inefficient and
defeat the purpose of using shared libraries in the first place. Most
shared libraries are compiled as Position Independent Code (PIC), and
some executables are Position Independent Executables (PIE), which
groups the external references modified by the linker into a small
number of pages.

Strict overcommit encourages the addition of gigabytes (even terabytes)
of swap space, which leads to swap thrashing if it is ever used. Systems
with large amounts of swap space can literally thrash for days, during which
they perform less computation than they could perform in a single minute
during non-thrashing operation (such as if the OOM killer triggered a
reboot).
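On Linux the overcommit policy is selected through the
/proc/sys/vm/overcommit_memory setting, where 0 means heuristic overcommit,
1 means always overcommit, and 2 means strict accounting. A minimal sketch
for checking which one a system is using (the function name and the
description strings are made up for illustration):

```python
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (the default)",
    1: "always overcommit",
    2: "strict accounting (no overcommit)",
}

def current_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Return the description of the current policy, or None if it can't be read."""
    try:
        with open(path) as f:
            return OVERCOMMIT_MODES.get(int(f.read().strip()))
    except (OSError, ValueError):
        return None

print(current_overcommit_mode())
```

The function returns None on systems where the file is missing or
unreadable, so it is safe to call outside Linux.
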
What is the appeal of huge pages?

The CPU has a limited number of TLB entries. A huge page allows a single TLB
entry to translate addresses for a large amount of contiguous physical
memory. (For example, the entire Linux kernel fits within a single 2
megabyte "huge page" on x86 systems. Keeping the entire kernel in a
single TLB entry means that calling into the kernel doesn't flush the
userspace mappings out of the TLB.)

Using huge pages means the MMU spends less time walking page tables to refill
the TLB, and can lead to about a 10% performance increase if used
correctly.
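The benefit is easiest to see as "TLB reach": the amount of memory the TLB
can translate without a page table walk. The sketch below assumes a
hypothetical TLB with 64 entries; real entry counts vary by processor.

```python
def tlb_reach_bytes(entries, page_size):
    """Memory covered by a TLB whose entries each map one page of page_size bytes."""
    return entries * page_size

KIB, MIB = 1024, 1024 ** 2
ENTRIES = 64   # hypothetical TLB size
print(tlb_reach_bytes(ENTRIES, 4 * KIB) // KIB)   # 256 (KiB covered with 4 KiB pages)
print(tlb_reach_bytes(ENTRIES, 2 * MIB) // MIB)   # 128 (MiB covered with 2 MiB pages)
```

The same number of entries covers 512 times more memory with 2 megabyte
pages, so a program with a large working set takes far fewer TLB misses.
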
What are memory zones and "high memory"?

http://kerneltrap.org/node/2450
http://linuxmm.org/HighMemory
http://www.cs.columbia.edu/~smb/classes/s064118/l19.pdf
http://book.opensourceproject.org.cn/kernel/kernelpri/opensource/0131181637/ch04lev1sec2.html
