
Chapter 13: Query Processing

Aug 10, 2006

Database System Concepts, 5th Ed.
Silberschatz, Korth and Sudarshan. See www.db-book.com for conditions on re-use

Chapter 13: Query Processing
- Overview
- Measures of Query Cost
- Selection Operation
- Sorting
- Join Operation
- Other Operations
- Evaluation of Expressions

Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation

Basic Steps in Query Processing (Cont.)
- Parsing and translation
  - Translate the query into its internal form. This is then translated into relational algebra.
  - The parser checks syntax and verifies relations.
- Evaluation
  - The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.

Basic Steps in Query Processing: Optimization
- A relational algebra expression may have many equivalent expressions.
  - E.g., $\sigma_{balance<2500}(\Pi_{balance}(account))$ is equivalent to $\Pi_{balance}(\sigma_{balance<2500}(account))$.
- Each relational algebra operation can be evaluated using one of several different algorithms.
  - Correspondingly, a relational-algebra expression can be evaluated in many ways.
- An annotated expression specifying a detailed evaluation strategy is called an evaluation plan.
  - E.g., can use an index on balance to find accounts with balance < 2500,
  - or can perform a complete relation scan and discard accounts with balance $\geq$ 2500.

Basic Steps: Optimization (Cont.)
- Query Optimization: amongst all equivalent evaluation plans, choose the one with the lowest cost.
  - Cost is estimated using statistical information from the database catalog
    - e.g. number of tuples in each relation, size of tuples, etc.
- In this chapter we study
  - How to measure query costs
  - Algorithms for evaluating relational algebra operations
  - How to combine algorithms for individual operations in order to evaluate a complete expression
- In Chapter 14
  - We study how to optimize queries, that is, how to find an evaluation plan with the lowest estimated cost

Measures of Query Cost
- Cost is generally measured as the total elapsed time for answering a query.
  - Many factors contribute to time cost
    - disk accesses, CPU, or even network communication
- Typically disk access is the predominant cost, and is also relatively easy to estimate. It is measured by taking into account
  - Number of seeks * average seek cost
  - + Number of blocks read * average block-read cost
  - + Number of blocks written * average block-write cost
    - The cost to write a block is greater than the cost to read a block, because data is read back after being written to ensure that the write was successful.
  - Assumption: single disk
    - Can modify the formulae for multiple disks/RAID arrays,
    - or just use the single-disk formulae, but interpret them as measuring resource consumption instead of time.

Measures of Query Cost (Cont.)
- For simplicity we just use the number of block transfers from disk and the number of seeks as the cost measures.
  - $t_T$: time to transfer one block
  - $t_S$: time for one seek
  - Cost for $b$ block transfers plus $S$ seeks: $b \cdot t_T + S \cdot t_S$
- We ignore CPU costs for simplicity.
  - Real systems do take CPU cost into account.
- We do not include the cost of writing output to disk in our cost formulae.
- Several algorithms can reduce disk I/O by using extra buffer space.
  - The amount of real memory available for buffering depends on other concurrent queries and OS processes, and is known only during execution.
    - We often use worst-case estimates, assuming only the minimum amount of memory needed for the operation is available.
- Required data may be buffer resident already, avoiding disk I/O.
  - But this is hard to take into account for cost estimation.
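The cost measure $b \cdot t_T + S \cdot t_S$ can be written as a small helper. This is only a sketch: the function name `io_cost` and the timing values (0.1 ms per transfer, 4 ms per seek) are assumptions chosen for illustration, not figures from the slides.

```python
def io_cost(b: int, S: int, t_T: float, t_S: float) -> float:
    """Estimated elapsed time for b block transfers plus S seeks."""
    return b * t_T + S * t_S


# Assumed, illustrative disk timings in milliseconds.
T_T_MS = 0.1   # time to transfer one block
T_S_MS = 4.0   # time for one seek

# Example: one sequential pass over a 400-block relation (1 seek).
scan_ms = io_cost(b=400, S=1, t_T=T_T_MS, t_S=T_S_MS)
print(f"estimated scan time: {scan_ms:.1f} ms")
```

With these assumed timings one seek costs as much as 40 block transfers, which is why the later formulas track seeks separately.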

Selection Operation
- File scan: search algorithms that locate and retrieve records that fulfill a selection condition.
- Algorithm A1 (linear search). Scan each file block and test all records to see whether they satisfy the selection condition.
  - Cost estimate = $b_r$ block transfers + 1 seek
    - $b_r$ denotes the number of blocks containing records from relation $r$
  - If the selection is on a key attribute, can stop on finding the record
    - cost = $(b_r/2)$ block transfers + 1 seek (on average)
  - Linear search can be applied regardless of
    - selection condition, or
    - ordering of records in the file, or
    - availability of indices

Selection Operation (Cont.)
- A2 (binary search). Applicable if the selection is an equality comparison on the attribute on which the file is ordered.
  - Assume that the blocks of a relation are stored contiguously.
  - Cost estimate (number of disk blocks to be scanned):
    - cost of locating the first tuple by a binary search on the blocks: $\lceil \log_2(b_r) \rceil \cdot (t_T + t_S)$
    - If there are multiple records satisfying the selection, add the transfer cost of the number of blocks containing records that satisfy the selection condition.
      - We will see how to estimate this cost in Chapter 14.

Selections Using Indices
- Index scan: search algorithms that use an index
  - the selection condition must be on the search key of the index.
- A3 (primary index on candidate key, equality). Retrieve a single record that satisfies the corresponding equality condition.
  - Cost = $(h_i + 1) \cdot (t_T + t_S)$, where $h_i$ is the height of the index
- A4 (primary index on nonkey, equality). Retrieve multiple records.
  - Records will be on consecutive blocks
    - Let $b$ = number of blocks containing matching records
  - Cost = $h_i \cdot (t_T + t_S) + t_S + t_T \cdot b$
- A5 (equality on search key of secondary index).
  - Retrieve a single record if the search key is a candidate key
    - Cost = $(h_i + 1) \cdot (t_T + t_S)$
  - Retrieve multiple records if the search key is not a candidate key
    - each of $n$ matching records may be on a different block
    - Cost = $(h_i + n) \cdot (t_T + t_S)$
      - Can be very expensive!
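The selection cost estimates for A1 through A5 can be collected into a few functions for side-by-side comparison. This is a sketch only: the function names are made up for illustration, and the formulas are transcribed from the slides above with $t_T$ and $t_S$ passed in explicitly.

```python
import math


def cost_a1_linear(b_r, t_T, t_S, key_equality=False):
    """A1: full scan, or about half the blocks for equality on a key."""
    transfers = b_r / 2 if key_equality else b_r
    return transfers * t_T + t_S


def cost_a2_binary(b_r, t_T, t_S):
    """A2: binary search over the blocks of a file ordered on the attribute."""
    return math.ceil(math.log2(b_r)) * (t_T + t_S)


def cost_a3_primary_key(h_i, t_T, t_S):
    """A3: primary index, equality on a candidate key (index height h_i)."""
    return (h_i + 1) * (t_T + t_S)


def cost_a4_primary_nonkey(h_i, b, t_T, t_S):
    """A4: primary index, equality on nonkey; b consecutive matching blocks."""
    return h_i * (t_T + t_S) + t_S + t_T * b


def cost_a5_secondary(h_i, n, t_T, t_S):
    """A5: secondary index; n matching records, each possibly on its own block."""
    return (h_i + n) * (t_T + t_S)
```

Comparing `cost_a5_secondary` with `cost_a1_linear` for a large `n` shows why a secondary index on a non-key attribute can lose to a plain linear scan.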

Selections Involving Comparisons
- Can implement selections of the form $\sigma_{A \leq V}(r)$ or $\sigma_{A \geq V}(r)$ by using
  - a linear file scan or binary search,
  - or by using indices in the following ways:
- A6 (primary index, comparison). (Relation is sorted on A)
  - For $\sigma_{A \geq V}(r)$, use the index to find the first tuple $\geq v$ and scan the relation sequentially from there.
  - For $\sigma_{A \leq V}(r)$, just scan the relation sequentially till the first tuple $> v$; do not use the index.
- A7 (secondary index, comparison).
  - For $\sigma_{A \geq V}(r)$, use the index to find the first index entry $\geq v$ and scan the index sequentially from there, to find pointers to records.
  - For $\sigma_{A \leq V}(r)$, just scan the leaf pages of the index finding pointers to records, till the first entry $> v$.
  - In either case, retrieve the records that are pointed to
    - requires an I/O for each record
    - a linear file scan may be cheaper

Implementation of Complex Selections
- Conjunction: $\sigma_{\theta_1 \wedge \theta_2 \wedge \ldots \wedge \theta_n}(r)$
- A8 (conjunctive selection using one index).
  - Select a combination of $\theta_i$ and algorithms A1 through A7 that results in the least cost for $\sigma_{\theta_i}(r)$.
  - Test the other conditions on each tuple after fetching it into the memory buffer.
- A9 (conjunctive selection using multiple-key index).
  - Use an appropriate composite (multiple-key) index if available.
- A10 (conjunctive selection by intersection of identifiers).
  - Requires indices with record pointers.
  - Use the corresponding index for each condition, and take the intersection of all the obtained sets of record pointers.
  - Then fetch the records from the file.
  - If some conditions do not have appropriate indices, apply the test in memory.

Algorithms for Complex Selections
- Disjunction: $\sigma_{\theta_1 \vee \theta_2 \vee \ldots \vee \theta_n}(r)$
- A11 (disjunctive selection by union of identifiers).
  - Applicable if all conditions have available indices.
    - Otherwise use linear scan.
  - Use the corresponding index for each condition, and take the union of all the obtained sets of record pointers.
  - Then fetch the records from the file.
- Negation: $\sigma_{\neg\theta}(r)$
  - Use linear scan on the file.
  - If very few records satisfy $\neg\theta$, and an index is applicable to $\theta$
    - Find the satisfying records using the index and fetch them from the file.

Sorting
- We may build an index on the relation, and then use the index to read the relation in sorted order. This may lead to one disk block access for each tuple.
- For relations that fit in memory, techniques like quicksort can be used. For relations that don't fit in memory, external sort-merge is a good choice.

External Sort-Merge
Let M denote memory size (in pages).
1. Create sorted runs. Let i be 0 initially.
   Repeatedly do the following till the end of the relation:
   (a) Read M blocks of the relation into memory
   (b) Sort the in-memory blocks
   (c) Write the sorted data to run $R_i$; increment i.
   Let the final value of i be N.
2. Merge the runs (next slide).

External Sort-Merge (Cont.)
2. Merge the runs (N-way merge). We assume (for now) that N < M.
   1. Use N blocks of memory to buffer input runs, and 1 block to buffer output. Read the first block of each run into its buffer page.
   2. repeat
      1. Select the first record (in sort order) among all buffer pages.
      2. Write the record to the output buffer. If the output buffer is full, write it to disk.
      3. Delete the record from its input buffer page. If the buffer page becomes empty, then read the next block (if any) of the run into the buffer.
   3. until all input buffer pages are empty.

External Sort-Merge (Cont.)
- If $N \geq M$, several merge passes are required.
  - In each pass, contiguous groups of $M - 1$ runs are merged.
  - A pass reduces the number of runs by a factor of $M - 1$, and creates runs longer by the same factor.
    - E.g. if M = 11, and there are 90 runs, one pass reduces the number of runs to 9, each 10 times the size of the initial runs.
  - Repeated passes are performed till all runs have been merged into one.
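The run-creation and multi-pass merge steps can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not a disk-based implementation: Python lists stand in for runs on disk, `block_size` (records per block) is a hypothetical parameter, and `heapq.merge` plays the role of the buffered merge of up to $M - 1$ runs.

```python
import heapq
from typing import List


def external_sort_merge(records: List[int], block_size: int, M: int) -> List[int]:
    """Sort as if only M blocks of block_size records fit in memory."""
    if M < 3:
        raise ValueError("need at least 3 blocks: 2 input buffers + 1 output")
    run_len = M * block_size
    # Step 1: create sorted runs, each at most M blocks long.
    runs = [sorted(records[i:i + run_len])
            for i in range(0, len(records), run_len)]
    # Step 2: merge contiguous groups of M - 1 runs per pass
    # until a single run remains.
    fan_in = M - 1
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
    return runs[0] if runs else []
```

Each iteration of the `while` loop corresponds to one merge pass, so the number of runs shrinks by a factor of $M - 1$ per iteration.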

Example: External Sorting Using Sort-Merge (figure)

External Merge Sort (Cont.)
- Cost analysis:
  - Total number of merge passes required: $\lceil \log_{M-1}(b_r/M) \rceil$.
  - Block transfers for initial run creation as well as in each pass is $2 b_r$
    - for the final pass, we don't count the write cost
      - we ignore the final write cost for all operations, since the output of an operation may be sent to the parent operation without being written to disk
    - Thus the total number of block transfers for external sorting: $b_r (2 \lceil \log_{M-1}(b_r/M) \rceil + 1)$
  - Seeks: next slide

External Merge Sort (Cont.)
- Cost of seeks
  - During run generation: one seek to read each run and one seek to write each run
    - $2 \lceil b_r/M \rceil$
  - During the merge phase
    - Buffer size: $b_b$ (read/write $b_b$ blocks at a time)
    - Need $2 \lceil b_r/b_b \rceil$ seeks for each merge pass
      - except the final one, which does not require a write
    - Total number of seeks: $2 \lceil b_r/M \rceil + \lceil b_r/b_b \rceil (2 \lceil \log_{M-1}(b_r/M) \rceil - 1)$
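The transfer and seek formulas for external sorting can be evaluated numerically. The sketch below assumes $b_r > M$ (at least one merge pass) and computes the pass count with integer arithmetic instead of a floating-point logarithm; the function names are illustrative.

```python
import math


def merge_passes(b_r: int, M: int) -> int:
    """Smallest p with M * (M - 1)**p >= b_r, i.e. ceil(log_{M-1}(b_r / M))."""
    passes, capacity = 0, M
    while capacity < b_r:
        capacity *= M - 1
        passes += 1
    return passes


def external_sort_cost(b_r: int, M: int, b_b: int = 1):
    """Return (block transfers, seeks) for external sort-merge."""
    if b_r <= M:
        raise ValueError("formulas assume the relation does not fit in memory")
    p = merge_passes(b_r, M)
    transfers = b_r * (2 * p + 1)
    seeks = 2 * math.ceil(b_r / M) + math.ceil(b_r / b_b) * (2 * p - 1)
    return transfers, seeks


# Example with assumed sizes: 10,000 blocks, 11 buffer blocks, b_b = 1.
print(external_sort_cost(10_000, 11))
```

Raising `b_b` lowers the seek count per pass, at the price of fewer runs merged per pass in a real system; that trade-off is not modelled here.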

Join Operation
- Several different algorithms to implement joins
  - Nested-loop join
  - Block nested-loop join
  - Indexed nested-loop join
  - Merge-join
  - Hash-join
- Choice based on cost estimate
- Examples use the following information
  - Number of records of customer: 10,000; depositor: 5000
  - Number of blocks of customer: 400; depositor: 100

Nested-Loop Join
- To compute the theta join $r \bowtie_{\theta} s$:
    for each tuple t_r in r do begin
      for each tuple t_s in s do begin
        test pair (t_r, t_s) to see if they satisfy the join condition θ
        if they do, add t_r · t_s to the result.
      end
    end
- r is called the outer relation and s the inner relation of the join.
- Requires no indices and can be used with any kind of join condition.
- Expensive since it examines every pair of tuples in the two relations.

Nested-Loop Join (Cont.)
- In the worst case, if there is enough memory only to hold one block of each relation, the estimated cost is $n_r \cdot b_s + b_r$ block transfers, plus $n_r + b_r$ seeks.
- If the smaller relation fits entirely in memory, use that as the inner relation.
  - Reduces cost to $b_r + b_s$ block transfers and 2 seeks.
- Assuming worst-case memory availability, the cost estimate is
  - with depositor as the outer relation: 5000 * 400 + 100 = 2,000,100 block transfers, 5000 + 100 = 5100 seeks
  - with customer as the outer relation: 10000 * 100 + 400 = 1,000,400 block transfers and 10,400 seeks
- If the smaller relation (depositor) fits entirely in memory, the cost estimate will be 500 block transfers.
- The block nested-loops algorithm (next slide) is preferable.

Block Nested-Loop Join
- Variant of nested-loop join in which every block of the inner relation is paired with every block of the outer relation.
    for each block B_r of r do begin
      for each block B_s of s do begin
        for each tuple t_r in B_r do begin
          for each tuple t_s in B_s do begin
            check if (t_r, t_s) satisfy the join condition
            if they do, add t_r · t_s to the result.
          end
        end
      end
    end

Block Nested-Loop Join (Cont.)
- Worst-case estimate: $b_r \cdot b_s + b_r$ block transfers + $2 \cdot b_r$ seeks
  - Each block in the inner relation s is read once for each block in the outer relation (instead of once for each tuple in the outer relation).
- Best case: $b_r + b_s$ block transfers + 2 seeks.
- Improvements to nested-loop and block nested-loop algorithms:
  - In block nested-loop, use $M - 2$ disk blocks as the blocking unit for the outer relation, where M = memory size in blocks; use the remaining two blocks to buffer the inner relation and the output.
    - Cost = $\lceil b_r/(M-2) \rceil \cdot b_s + b_r$ block transfers + $2 \lceil b_r/(M-2) \rceil$ seeks
  - If the equi-join attribute forms a key on the inner relation, stop the inner loop on the first match.
  - Scan the inner loop forward and backward alternately, to make use of the blocks remaining in the buffer (with LRU replacement).
  - Use an index on the inner relation if available (next slide).
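The nested-loop and block nested-loop estimates are easy to check against the customer/depositor figures used in the examples. The helper names below are illustrative; each function returns a pair (block transfers, seeks).

```python
import math


def nested_loop_cost(n_r: int, b_r: int, b_s: int):
    """Worst case: one buffer block per relation; r is the outer relation."""
    return n_r * b_s + b_r, n_r + b_r


def block_nested_loop_cost(b_r: int, b_s: int, M=None):
    """Worst case when M is None; otherwise M - 2 blocks buffer the outer relation."""
    if M is None:
        return b_r * b_s + b_r, 2 * b_r
    chunks = math.ceil(b_r / (M - 2))
    return chunks * b_s + b_r, 2 * chunks


# depositor: 5000 tuples in 100 blocks; customer: 10,000 tuples in 400 blocks.
print(nested_loop_cost(5000, 100, 400))      # depositor as outer relation
print(nested_loop_cost(10000, 400, 100))     # customer as outer relation
print(block_nested_loop_cost(100, 400))      # depositor as outer relation
```

Passing a value for `M` (for example 12, an assumed memory size) shows how quickly the block nested-loop cost falls once more than two buffer blocks are available.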

Indexed Nested-Loop Join
- Index lookups can replace file scans if
  - the join is an equi-join or natural join and
  - an index is available on the inner relation's join attribute
    - Can construct an index just to compute a join.
- For each tuple $t_r$ in the outer relation r, use the index to look up tuples in s that satisfy the join condition with tuple $t_r$.
- Worst case: the buffer has space for only one page of r, and, for each tuple in r, we perform an index lookup on s.
- Cost of the join: $b_r (t_T + t_S) + n_r \cdot c$
  - where c is the cost of traversing the index and fetching all matching s tuples for one tuple of r
  - c can be estimated as the cost of a single selection on s using the join condition.
- If indices are available on the join attributes of both r and s, use the relation with fewer tuples as the outer relation.

Example of Nested-Loop Join Costs
- Compute $depositor \bowtie customer$, with depositor as the outer relation.
- Let customer have a primary B+-tree index on the join attribute customer-name, which contains 20 entries in each index node.
- Since customer has 10,000 tuples, the height of the tree is 4, and one more access is needed to find the actual data.
- depositor has 5000 tuples.
- Cost of block nested-loops join
  - 400 * 100 + 100 = 40,100 block transfers + 2 * 100 = 200 seeks
    - assuming worst-case memory
    - may be significantly less with more memory
- Cost of indexed nested-loops join
  - 100 + 5000 * 5 = 25,100 block transfers and seeks.
  - CPU cost is likely to be less than that for block nested-loops join.

Merge-Join
1. Sort both relations on their join attribute (if not already sorted on the join attributes).
2. Merge the sorted relations to join them.
   1. The join step is similar to the merge stage of the sort-merge algorithm.
   2. The main difference is the handling of duplicate values in the join attribute: every pair with the same value on the join attribute must be matched.
   3. Detailed algorithm in book.

Merge-Join (Cont.)
- Can be used only for equi-joins and natural joins.
- Each block needs to be read only once (assuming all tuples for any given value of the join attributes fit in memory).
- Thus the cost of merge join is:
  - $b_r + b_s$ block transfers + $\lceil b_r/b_b \rceil + \lceil b_s/b_b \rceil$ seeks
  - + the cost of sorting if the relations are unsorted.
- Hybrid merge-join: if one relation is sorted, and the other has a secondary B+-tree index on the join attribute
  - Merge the sorted relation with the leaf entries of the B+-tree.
  - Sort the result on the addresses of the unsorted relation's tuples.
  - Scan the unsorted relation in physical address order and merge with the previous result, to replace addresses by the actual tuples.
    - Sequential scan is more efficient than random lookup.
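A compact in-memory version of merge-join shows the duplicate handling mentioned above: every r tuple with a given join value is paired with the whole group of s tuples carrying that value. This is a sketch under assumptions (relations are Python lists of tuples, `key_r` and `key_s` are hypothetical key-extraction functions), not the detailed algorithm from the book.

```python
def merge_join(r, s, key_r, key_s):
    """Equi-join two lists of tuples by sorting and merging on the join key."""
    r = sorted(r, key=key_r)
    s = sorted(s, key=key_s)
    out = []
    i = j = 0
    while i < len(r) and j < len(s):
        k_r, k_s = key_r(r[i]), key_s(s[j])
        if k_r < k_s:
            i += 1
        elif k_r > k_s:
            j += 1
        else:
            # Find the group of s tuples sharing this join value.
            j_end = j
            while j_end < len(s) and key_s(s[j_end]) == k_r:
                j_end += 1
            # Pair every r tuple with that value against the whole group.
            while i < len(r) and key_r(r[i]) == k_r:
                out.extend((r[i], t_s) for t_s in s[j:j_end])
                i += 1
            j = j_end
    return out
```

The slice `s[j:j_end]` is the in-memory analogue of the assumption that all tuples for one join value fit in memory.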

Hash-Join
- Applicable for equi-joins and natural joins.
- A hash function h is used to partition tuples of both relations.
  - Intuition: partitions fit in memory.
- h maps JoinAttrs values to {0, 1, ..., n}, where JoinAttrs denotes the common attributes of r and s used in the natural join.
  - $r_0, r_1, \ldots, r_n$ denote partitions of r tuples
    - Each tuple $t_r \in r$ is put in partition $r_i$ where $i = h(t_r[JoinAttrs])$.
  - $s_0, s_1, \ldots, s_n$ denote partitions of s tuples
    - Each tuple $t_s \in s$ is put in partition $s_i$, where $i = h(t_s[JoinAttrs])$.
- Note: In the book, $r_i$ is denoted as $H_{r_i}$, $s_i$ is denoted as $H_{s_i}$ and n is denoted as $n_h$.

Hash-Join (Cont.) (figure)

Hash-Join (Cont.)
- r tuples in $r_i$ need only to be compared with s tuples in $s_i$. They need not be compared with s tuples in any other partition, since:
  - an r tuple and an s tuple that satisfy the join condition will have the same value for the join attributes.
  - If that value is hashed to some value i, the r tuple has to be in $r_i$ and the s tuple in $s_i$.

Hash-Join Algorithm
The hash-join of r and s is computed as follows.
1. Partition the relation s using hashing function h.
   - When partitioning a relation, one block of memory is reserved as the output buffer for each partition, and one block for input.
   - If extra memory is available, allocate $b_b$ blocks as buffer for input and each output.
2. Partition r similarly.
3. (next slide)

Hash-Join (Cont.)
Hash-Join Algorithm (Cont.)
3. For each partition i:
   (a) Load $s_i$ into memory and build an in-memory hash index on it using the join attribute.
       - This hash index uses a different hash function than the earlier one h.
   (b) Read the tuples in $r_i$ from the disk one by one.
       - For each tuple $t_r$, probe the in-memory hash index to find all matching tuples $t_s$ in $s_i$.
       - For each matching tuple $t_s$ in $s_i$, output the concatenation of the attributes of $t_r$ and $t_s$.
Relation s is called the build input and r is called the probe input.
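The partition, build and probe steps can be sketched in memory as follows. The assumptions are stated loudly: lists stand in for disk partitions, Python's built-in `hash` modulo `n` plays the role of h, a `dict` (with its own internal hashing) plays the role of the second hash function, and partitions are numbered 0 to n - 1 here. The names `key_r` and `key_s` are hypothetical key extractors.

```python
from collections import defaultdict


def hash_join(r, s, key_r, key_s, n=4):
    """Equi-join r (probe input) with s (build input) via n hash partitions."""
    # Partition phase: the same function h is applied to both relations.
    r_parts = [[] for _ in range(n)]
    s_parts = [[] for _ in range(n)]
    for t_s in s:
        s_parts[hash(key_s(t_s)) % n].append(t_s)
    for t_r in r:
        r_parts[hash(key_r(t_r)) % n].append(t_r)

    out = []
    for r_i, s_i in zip(r_parts, s_parts):
        # Build phase: in-memory hash index on partition s_i.
        index = defaultdict(list)
        for t_s in s_i:
            index[key_s(t_s)].append(t_s)
        # Probe phase: each r_i tuple only looks at matching s_i tuples.
        for t_r in r_i:
            for t_s in index.get(key_r(t_r), []):
                out.append(t_r + t_s)   # concatenate the attributes
    return out
```

Because matching tuples always hash to the same partition number, no cross-partition comparison is needed; the order of the output depends on the hash values and should not be relied on.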

Hash-Join Algorithm (Cont.)
- The value n and the hash function h are chosen such that each $s_i$ should fit in memory.
  - Typically n is chosen as $\lceil b_s/M \rceil \cdot f$, where f is a "fudge factor", typically around 1.2.
  - The probe relation partitions $r_i$ need not fit in memory.
- Recursive partitioning is required if the number of partitions n is greater than the number of pages M of memory.
  - Instead of partitioning n ways, use $M - 1$ partitions for s.
  - Further partition the $M - 1$ partitions using a different hash function.
  - Use the same partitioning method on r.
  - Rarely required: e.g., recursive partitioning is not needed for relations of 1 GB or less with a memory size of 2 MB, with a block size of 4 KB.

Handling of Overflows
- Partitioning is said to be skewed if some partitions have significantly more tuples than some others.
- Hash-table overflow occurs in partition $s_i$ if $s_i$ does not fit in memory. Reasons could be
  - Many tuples in s with the same value for the join attributes
  - Bad hash function
- Overflow resolution can be done in the build phase
  - Partition $s_i$ is further partitioned using a different hash function.
  - Partition $r_i$ must be similarly partitioned.
- Overflow avoidance performs partitioning carefully to avoid overflows during the build phase
  - E.g. partition the build relation into many partitions, then combine them.
- Both approaches fail with large numbers of duplicates
  - Fallback option: use block nested-loops join on overflowed partitions.

Cost of Hash-Join
- If recursive partitioning is not required: cost of hash join is
  $3(b_r + b_s) + 4 n_h$ block transfers + $2(\lceil b_r/b_b \rceil + \lceil b_s/b_b \rceil)$ seeks
- If recursive partitioning is required:
  - the number of passes required for partitioning build relation s is $\lceil \log_{M-1}(b_s) - 1 \rceil$
  - best to choose the smaller relation as the build relation.
  - Total cost estimate is:
    $2(b_r + b_s) \lceil \log_{M-1}(b_s) - 1 \rceil + b_r + b_s$ block transfers + $2(\lceil b_r/b_b \rceil + \lceil b_s/b_b \rceil) \lceil \log_{M-1}(b_s) - 1 \rceil$ seeks
- If the entire build input can be kept in main memory, no partitioning is required.
  - Cost estimate goes down to $b_r + b_s$.

Example of Cost of Hash-Join
$customer \bowtie depositor$
- Assume that memory size is 20 blocks.
- $b_{depositor}$ = 100 and $b_{customer}$ = 400.
- depositor is to be used as build input. Partition it into five partitions, each of size 20 blocks. This partitioning can be done in one pass.
- Similarly, partition customer into five partitions, each of size 80. This is also done in one pass.
- Therefore total cost, ignoring the cost of writing partially filled blocks:
  - 3(100 + 400) = 1500 block transfers + $2(\lceil 100/3 \rceil + \lceil 400/3 \rceil)$ = 336 seeks
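The non-recursive hash-join cost formula can be checked against the numbers in this example. In the sketch below, `n_h` defaults to 0 to mirror "ignoring cost of writing partially filled blocks", and $b_b = 3$ is read off the seek computation on the slide; the function name is illustrative.

```python
import math


def hash_join_cost(b_r: int, b_s: int, b_b: int, n_h: int = 0):
    """(block transfers, seeks) for hash-join without recursive partitioning."""
    transfers = 3 * (b_r + b_s) + 4 * n_h
    seeks = 2 * (math.ceil(b_r / b_b) + math.ceil(b_s / b_b))
    return transfers, seeks


# customer (400 blocks) as probe input, depositor (100 blocks) as build input.
print(hash_join_cost(b_r=400, b_s=100, b_b=3))
```

Setting `n_h=5` adds the overhead term for the five partially filled partition blocks per relation that the slide chooses to ignore.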

Hybrid Hash-Join
- Useful when memory sizes are relatively large, and the build input is bigger than memory.
- Main feature of hybrid hash join: keep the first partition of the build relation in memory.
- E.g. with a memory size of 25 blocks, depositor can be partitioned into five partitions, each of size 20 blocks.
  - Division of memory:
    - The first partition occupies 20 blocks of memory.
    - 1 block is used for input, and 1 block each for buffering the other 4 partitions.
- customer is similarly partitioned into five partitions, each of size 80
  - the first is used right away for probing, instead of being written out
- Cost of 3(80 + 320) + 20 + 80 = 1300 block transfers for hybrid hash join, instead of 1500 with plain hash-join.
- Hybrid hash-join is most useful if $M \gg \sqrt{b_s}$.

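Complex Joins
- Join with a conjunctive condition: $r \bowtie_{\theta_1 \wedge \theta_2 \wedge \ldots \wedge \theta_n} s$
  - Either use nested loops/block nested loops, or
  - Compute the result of one of the simpler joins $r \bowtie_{\theta_i} s$
    - the final result comprises those tuples in the intermediate result that satisfy the remaining conditions $\theta_1 \wedge \ldots \wedge \theta_{i-1} \wedge \theta_{i+1} \wedge \ldots \wedge \theta_n$
- Join with a disjunctive condition: $r \bowtie_{\theta_1 \vee \theta_2 \vee \ldots \vee \theta_n} s$
  - Either use nested loops/block nested loops, or
  - Compute as the union of the records in individual joins $r \bowtie_{\theta_i} s$:
    $(r \bowtie_{\theta_1} s) \cup (r \bowtie_{\theta_2} s) \cup \ldots \cup (r \bowtie_{\theta_n} s)$
    (applies only to the set version of union!)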

Other Operations
- Duplicate elimination can be implemented via hashing or sorting.
  - On sorting, duplicates will come adjacent to each other, and all but one copy of each set of duplicates can be deleted.
  - Optimization: duplicates can be deleted during run generation as well as at intermediate merge steps in external sort-merge.
  - Hashing is similar: duplicates will come into the same bucket.
- Projection:
  - perform projection on each tuple
  - followed by duplicate elimination.

Other Operations: Aggregation
- Aggregation can be implemented in a manner similar to duplicate elimination.
  - Sorting or hashing can be used to bring tuples in the same group together, and then the aggregate functions can be applied on each group.
  - Optimization: combine tuples in the same group during run generation and intermediate merges, by computing partial aggregate values.
    - For count, min, max, sum: keep aggregate values on tuples found so far in the group.
      - When combining partial aggregates for count, add up the aggregates.
    - For avg, keep sum and count, and divide sum by count at the end.
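The partial-aggregate idea for avg can be sketched as follows: each run keeps a (sum, count) pair per group, pairs from different runs are combined by addition, and the division happens only at the end. The function names and the (group, value) tuple layout are assumptions made for this illustration.

```python
def partial_aggregate(tuples):
    """Map (group, value) pairs to {group: (sum, count)} for one run."""
    acc = {}
    for group, value in tuples:
        total, count = acc.get(group, (0, 0))
        acc[group] = (total + value, count + 1)
    return acc


def combine(a, b):
    """Merge two partial aggregates; partial counts are added, not recounted."""
    out = dict(a)
    for group, (total, count) in b.items():
        total0, count0 = out.get(group, (0, 0))
        out[group] = (total0 + total, count0 + count)
    return out


def finalize_avg(acc):
    """Divide sum by count once, after all partial aggregates are combined."""
    return {group: total / count for group, (total, count) in acc.items()}
```

Averaging the per-run averages instead would give the wrong answer whenever runs contain different numbers of tuples for a group, which is why sum and count are carried separately.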

Other Operations: Set Operations
- Set operations ($\cup$, $\cap$ and $-$): can either use a variant of merge-join after sorting, or a variant of hash-join.
- E.g., set operations using hashing:
  1. Partition both relations using the same hash function.
  2. Process each partition i as follows.
     1. Using a different hashing function, build an in-memory hash index on $r_i$.
     2. Process $s_i$ as follows:
        - $r \cup s$:
          1. Add tuples in $s_i$ to the hash index if they are not already in it.
          2. At the end of $s_i$, add the tuples in the hash index to the result.
        - $r \cap s$:
          1. Output tuples in $s_i$ to the result if they are already there in the hash index.
        - $r - s$:
          1. For each tuple in $s_i$, if it is there in the hash index, delete it from the index.
          2. At the end of $s_i$, add the remaining tuples in the hash index to the result.
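The three hashed set operations can be sketched in memory. The assumptions: tuples are hashable Python values, `hash(t) % n` stands in for the partitioning function, a Python `set` stands in for the in-memory hash index on $r_i$, and the function name `hashed_set_op` is made up for this illustration.

```python
def hashed_set_op(r, s, op, n=4):
    """Compute r union/intersection/difference s partition by partition."""
    r_parts = [[] for _ in range(n)]
    s_parts = [[] for _ in range(n)]
    for t in r:
        r_parts[hash(t) % n].append(t)
    for t in s:
        s_parts[hash(t) % n].append(t)

    result = []
    for r_i, s_i in zip(r_parts, s_parts):
        index = set(r_i)                      # in-memory hash index on r_i
        if op == "union":
            index.update(s_i)                 # add s_i tuples not already present
            result.extend(index)
        elif op == "intersection":
            result.extend(t for t in set(s_i) if t in index)
        elif op == "difference":
            for t in s_i:
                index.discard(t)              # delete matches from the index
            result.extend(index)
        else:
            raise ValueError(f"unknown set operation: {op}")
    return result
```

The result order follows the partition layout and is not sorted; callers that need an order should sort the output.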

Other Operations: Outer Join
- Outer join can be computed either as
  - a join followed by the addition of null-padded non-participating tuples, or
  - by modifying the join algorithms.
- Modifying merge join to compute r ⟕ s
  - In r ⟕ s, non-participating tuples are those in $r - \Pi_R(r \bowtie s)$.
  - Modify merge-join to compute r ⟕ s: during merging, for every tuple $t_r$ from r that does not match any tuple in s, output $t_r$ padded with nulls.
  - Right outer-join and full outer-join can be computed similarly.
- Modifying hash join to compute r ⟕ s
  - If r is the probe relation, output non-matching r tuples padded with nulls.
  - If r is the build relation, when probing keep track of which r tuples matched s tuples. At the end of $s_i$, output the non-matched r tuples padded with nulls.

Evaluation of Expressions
- So far: we have seen algorithms for individual operations.
- Alternatives for evaluating an entire expression tree
  - Materialization: generate the results of an expression whose inputs are relations or are already computed, materialize (store) it on disk. Repeat.
  - Pipelining: pass on tuples to parent operations even as an operation is being executed.
- We study the above alternatives in more detail.

Materialization
- Materialized evaluation: evaluate one operation at a time, starting at the lowest level. Use intermediate results materialized into temporary relations to evaluate next-level operations.
- E.g., in the expression tree of the accompanying figure, compute and store $\sigma_{balance<2500}(account)$, then compute and store its join with customer, and finally compute the projection on customer-name.

Materialization (Cont.)
- Materialized evaluation is always applicable.
- The cost of writing results to disk and reading them back can be quite high.
  - Our cost formulas for operations ignore the cost of writing results to disk, so
    - Overall cost = Sum of costs of individual operations + cost of writing intermediate results to disk
- Double buffering: use two output buffers for each operation; when one is full, write it to disk while the other is getting filled.
  - Allows overlap of disk writes with computation and reduces execution time.

Pipelining
- Pipelined evaluation: evaluate several operations simultaneously, passing the results of one operation on to the next.
- E.g., in the previous expression tree, don't store the result of $\sigma_{balance<2500}(account)$
  - instead, pass tuples directly to the join. Similarly, don't store the result of the join; pass tuples directly to the projection.
- Much cheaper than materialization: no need to store a temporary relation to disk.
- Pipelining may not always be possible, e.g., sort, hash-join.
- For pipelining to be effective, use evaluation algorithms that generate output tuples even as tuples are received for inputs to the operation.
- Pipelines can be executed in two ways: demand driven and producer driven.

Pipelining (Cont.)
- In demand-driven or lazy evaluation
  - The system repeatedly requests the next tuple from the top-level operation.
  - Each operation requests the next tuple from its children operations as required, in order to output its next tuple.
  - In between calls, an operation has to maintain "state" so it knows what to return next.
- In producer-driven or eager pipelining
  - Operators produce tuples eagerly and pass them up to their parents.
    - A buffer is maintained between operators; the child puts tuples in the buffer, the parent removes tuples from the buffer.
    - If the buffer is full, the child waits till there is space in the buffer, and then generates more tuples.
  - The system schedules operations that have space in their output buffer and can process more input tuples.
- Alternative name: pull and push models of pipelining.

Pipelining (Cont.)
- Implementation of demand-driven pipelining
  - Each operation is implemented as an iterator implementing the following operations
    - open()
      - E.g. file scan: initialize file scan
        - state: pointer to beginning of file
      - E.g. merge join: sort relations;
        - state: pointers to beginning of sorted relations
    - next()
      - E.g. for file scan: output next tuple, and advance and store file pointer
      - E.g. for merge join: continue with merge from earlier state till next output tuple is found. Save pointers as iterator state.
    - close()
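The open()/next()/close() iterator interface can be illustrated with two tiny operators. This is a sketch with assumed names (`Scan`, `Select`, `run`) and an in-memory list standing in for a file; `next()` returns `None` to signal the end of input, which is a convention chosen here rather than one taken from the slides.

```python
class Scan:
    """File scan over an in-memory list; state is the current position."""

    def __init__(self, rows):
        self.rows = rows
        self.pos = None

    def open(self):
        self.pos = 0                      # pointer to beginning of "file"

    def next(self):
        if self.pos >= len(self.rows):
            return None                   # end of input
        row = self.rows[self.pos]
        self.pos += 1                     # advance and store the pointer
        return row

    def close(self):
        self.pos = None


class Select:
    """Selection operator that pulls tuples from its child on demand."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()

    def next(self):
        row = self.child.next()
        while row is not None and not self.predicate(row):
            row = self.child.next()
        return row

    def close(self):
        self.child.close()


def run(op):
    """Top-level driver: repeatedly request the next tuple."""
    op.open()
    out = []
    row = op.next()
    while row is not None:
        out.append(row)
        row = op.next()
    op.close()
    return out
```

For example, `run(Select(Scan(accounts), lambda t: t[1] < 2500))` pulls account tuples one at a time through the selection without storing an intermediate relation.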

Evaluation Algorithms for Pipelining
- Some algorithms are not able to output results even as they get input tuples
  - E.g. merge join, or hash join
  - intermediate results are written to disk and then read back
- Algorithm variants to generate (at least some) results on the fly, as input tuples are read in
  - E.g. hybrid hash join generates output tuples even as probe relation tuples in the in-memory partition (partition 0) are read in
  - Pipelined join technique: hybrid hash join, modified to buffer partition 0 tuples of both relations in memory, reading them as they become available, and output results of any matches between partition 0 tuples
    - When a new $r_0$ tuple is found, match it with existing $s_0$ tuples, output matches, and save it in $r_0$
    - Symmetrically for $s_0$ tuples

End of Chapter

Figure 13.2

Complex Joins
- Join involving three relations: $loan \bowtie depositor \bowtie customer$
- Strategy 1. Compute $depositor \bowtie customer$; use the result to compute $loan \bowtie (depositor \bowtie customer)$.
- Strategy 2. Compute $loan \bowtie depositor$ first, and then join the result with customer.
- Strategy 3. Perform the pair of joins at once. Build an index on loan for loan-number, and on customer for customer-name.
  - For each tuple t in depositor, look up the corresponding tuples in customer and the corresponding tuples in loan.
  - Each tuple of depositor is examined exactly once.
- Strategy 3 combines two operations into one special-purpose operation that is more efficient than implementing two joins of two relations.
