Group Members (in order of presentation)
- [Overview, ID3]
- [C4.5]
- [Paper: SLIQ]
References
Overview, ID3
- http://en.wikipedia.org/wiki/Decision_Trees
- http://www.cise.ufl.edu/~ddd/cap6635/Fall97/Short papers/2.htm
- http://www.cis.temple.edu/~ingargio/cis587/readings/id3c45.html
- http://www.cse.unsw.edu.au/~billw/cs9414/notes/ml/06prop/id3/id3.html
- http://www.autonlab.org/tutorials/dtree.html
- http://www.autonlab.org/tutorials/infogain.html
- http://www.rulequest.com/see5comparison.html
What is a decision tree?
- General
  - A graph/model that helps make decisions
What is a decision tree? [cont.]
- Data Mining
  - A predictive model used to classify data.
  - A set of attributes that are tested to predict the outcome [class].
How do we create one?
- We need data.
- CLS/ID3 requirements:
  - Attribute-value description
  - Predefined classes
  - Discrete classes
  - Sufficient test data
Play Baseball?

Outlook   Temperature  Humidity  Wind    Play ball
Sunny     Hot          High      Weak    No
Sunny     Hot          High      Strong  No
Overcast  Hot          High      Weak    Yes
Rain      Mild         High      Weak    Yes
Rain      Cool         Normal    Weak    Yes
Rain      Cool         Normal    Strong  No
Overcast  Cool         Normal    Strong  Yes
Sunny     Mild         High      Weak    No
Sunny     Cool         Normal    Weak    Yes
Rain      Mild         Normal    Weak    Yes
Sunny     Mild         Normal    Strong  Yes
Overcast  Mild         High      Strong  Yes
Overcast  Hot          Normal    Weak    Yes
Rain      Mild         High      Strong  No
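The same table as Python data, handy for checking the numbers on the later slides (a minimal sketch; the tuple layout is just one convenient encoding):

```python
from collections import Counter

# The 14-record "play ball" training set above.
# Column order: (Outlook, Temperature, Humidity, Wind, PlayBall)
RECORDS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

# Class distribution used in the entropy example later on.
print(Counter(r[-1] for r in RECORDS))   # Counter({'Yes': 9, 'No': 5})
```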
Algorithms
- CLS
- ID3
- C4.5
- C5.0/See5
[Chart: comparison of C4.5 and C5 on the sleep, income, and forest datasets. Blue is C5, gray is C4.5.]
CLS
- C = training data
- Step 1: If all [records] in C are positive, then create a YES node and halt. If all [records] in C are negative, create a NO node and halt. Otherwise select an [attribute] with values v1, ..., vn and create a decision node.
- Step 2: Partition the training [records] in C into subsets C1, C2, ..., Cn according to the values of V [v1, ..., vn].
- Step 3: Apply the algorithm recursively to each of the sets Ci.
- Note: the trainer (the expert) decides which [attribute] to select.
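A minimal Python sketch of the CLS recursion as described above. The attribute choice is passed in as a function, since in CLS the trainer (the expert) picks it; the majority-vote fallback is not part of the original description, just a guard so the recursion always terminates:

```python
from collections import Counter

def cls(records, attributes, choose_attribute):
    # records: list of dicts, e.g. {"Outlook": "Sunny", ..., "class": "No"}
    classes = [r["class"] for r in records]
    if all(c == "Yes" for c in classes):   # Step 1: all positive -> YES node
        return "YES"
    if all(c == "No" for c in classes):    # Step 1: all negative -> NO node
        return "NO"
    if not attributes:                     # guard, not in the original CLS:
        return Counter(classes).most_common(1)[0][0].upper()
    a = choose_attribute(records, attributes)   # the expert's choice
    node = {"split on": a, "branches": {}}
    remaining = [x for x in attributes if x != a]
    for v in {r[a] for r in records}:      # Step 2: partition C by v1..vn
        subset = [r for r in records if r[a] == v]
        node["branches"][v] = cls(subset, remaining, choose_attribute)  # Step 3
    return node
```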
ID3
- Extended from CLS.
- Adds an attribute selection heuristic.
- Searches through the data and selects the best attribute, the one that best separates the data.
ID3 Attribute Selection
- Information Gain
  - Measures how well an attribute separates data into classes.
  - Select the attribute with the highest information gain = most useful for classification = most informative = a good decision.
  - To define information gain we need entropy [from information theory = quantifying information].
- Entropy
  - Measures the amount of information in an attribute.
Entropy
- Entropy(S) = −Σ p(I) log2 p(I)
- S = set of samples
- I = value of the class attribute
- Example
  - S has 14 samples, 9 = YES, 5 = NO
  - Entropy(S) = −Σ p(I) log2 p(I)
    = −(p(YES) log2 p(YES) + p(NO) log2 p(NO))
    = −((9/14) log2(9/14) + (5/14) log2(5/14))
    = −(.643(−.637) + .357(−1.485)) = 0.94
- If S had an equal distribution, 7 = YES and 7 = NO: Entropy(S) = 1, interpreted as totally random.
- If S had 14 = YES, 0 = NO: Entropy(S) = 0, interpreted as perfectly classified.
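The entropy calculation above as a short Python check (a sketch; counts is just a list of per-class sample counts):

```python
from math import log2

def entropy(counts):
    # Entropy(S) = -sum over class values of p(I) * log2 p(I)
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c)

print(entropy([9, 5]))    # ~0.940: the 9-YES / 5-NO example
print(entropy([7, 7]))    # 1.0: equal split, totally random
print(entropy([14, 0]))   # 0.0 (Python prints -0.0): perfectly classified
```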
Information Gain
- Gain(S, A) = information gain of S due to attribute A
- Gain(S, A) = Entropy(S) − Entropy(S, A)
- Entropy(S, A) = Σ ((|Sv|/|S|) · Entropy(Sv)), summed across all possible values v of attribute A
- Sv = subset of S for which attribute A has value v
- |Sv| = number of elements in Sv
- |S| = number of elements in S
Information Gain Example
- Example using the Wind attribute from the dataset
- Wind can be Weak or Strong; |S| = 14
- |S(Wind=Weak)| = 8: YES = 6, NO = 2
- |S(Wind=Strong)| = 6: YES = 3, NO = 3
- Entropy(S, Wind) = Σ ((|Sv|/|S|) · Entropy(Sv)) = (8/14) Entropy(S_weak) + (6/14) Entropy(S_strong)
- Entropy(S_weak) = −((6/8) log2(6/8) + (2/8) log2(2/8)) = 0.811
- Entropy(S_strong) = −((3/6) log2(3/6) + (3/6) log2(3/6)) = 1.0
- Gain(S, Wind) = Entropy(S) − Entropy(S, Wind) = 0.94 − ((8/14)(0.811) + (6/14)(1.0)) = 0.048
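The same Wind computation in Python, reusing the entropy function from the earlier sketch (repeated here so the snippet runs on its own):

```python
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c)

entropy_S = entropy([9, 5])                 # ~0.940 for the full set

# (YES, NO) counts within each value of Wind, from the table
weak, strong = [6, 2], [3, 3]
entropy_S_wind = (8 / 14) * entropy(weak) + (6 / 14) * entropy(strong)

print(round(entropy_S - entropy_S_wind, 3))  # Gain(S, Wind) ~ 0.048
```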
ID3
- Use majority voting [when a leaf still contains records of more than one class]
Gain Ratios
- Gain ratios are used to address the bias of the ID3 algorithm: information gain favors attributes with many distinct values (such as a record id).

GainRatio_A(D) = Gain(A) / SplitInfo_A(D)

where SplitInfo_A(D) = −Σj (|Dj|/|D|) log2(|Dj|/|D|), summed over the subsets Dj that splitting on A produces.
From Professor Anita Wasilewska's example slides. (These worked examples use a different 14-record training set, again with 9 positive [P] and 5 negative [N] records; I(P, N) denotes the entropy of that split.)
The rec (record id) attribute:
- I(P, N) = −(9/14) log2(9/14) − (5/14) log2(5/14) = .643(0.64) + (.357)(1.49) = .944
- Each single-record subset: I(Pi, Ni) = −(0/(0+1)) log2(0/(0+1)) − (1/(0+1)) log2(1/(0+1)) = 0 [taking 0·log2 0 = 0] + (1)(0) = 0
- E(rec) = I(Pr1, Nr1) + I(Pr2, Nr2) + ... = 0
- Gain(rec) = .944 − 0 = .944
- SplitInfo_rec(Root) = −14 · ((1/14) · log2(1/14)) = 14 · 0.271953923 = 3.807354922
- GainRatio_rec(Root) = .944 / 3.807354922 = 0.248
The Student attribute:
- I(P, N) = .944 (as above)
- I(P1, N1) = −(6/(6+1)) log2(6/(6+1)) − (1/(6+1)) log2(1/(6+1)) = .591
- I(P2, N2) = −(3/(3+4)) log2(3/(3+4)) − (4/(3+4)) log2(4/(3+4)) = .987
- E(Student) = ((6+1)/14)·.591 + ((3+4)/14)·.987 = .296 + .493 = .789
- Gain(Student) = .944 − .789 = .155
- SplitInfo_Student(Root) = −(7/14)·log2(7/14) − (7/14)·log2(7/14) = 1
- GainRatio_Student(Root) = .155 / 1 = 0.155
The Income attribute:
- I(P, N) = .944 (as above)
- I(P1, N1) = −(2/(2+2)) log2(2/(2+2)) − (2/(2+2)) log2(2/(2+2)) = 1
- I(P2, N2) = −(4/(4+2)) log2(4/(4+2)) − (2/(4+2)) log2(2/(4+2)) = .918
- I(P3, N3) = −(3/(3+1)) log2(3/(3+1)) − (1/(3+1)) log2(1/(3+1)) = .811
- E(Income) = ((2+2)/14)·1 + ((4+2)/14)·.918 + ((3+1)/14)·.811 = .286 + .393 + .232 = .911
- Gain(Income) = .944 − .911 = .033
- SplitInfo_Income(Root) = −(4/14)·log2(4/14) − (6/14)·log2(6/14) − (4/14)·log2(4/14) = 1.557
- GainRatio_Income(Root) = .033 / 1.557 = 0.0212
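A Python sketch of the whole gain-ratio computation, checked against the Income example. (With unrounded entropies the ratio comes out near 0.019; the slide's 0.0212 reflects the rounded intermediate values above.)

```python
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c)

def gain_ratio(root_counts, partitions):
    # partitions: one (P, N) pair of class counts per attribute value
    n = sum(root_counts)
    e = sum((p + q) / n * entropy([p, q]) for p, q in partitions)
    gain = entropy(root_counts) - e
    # SplitInfo is the entropy formula applied to the subset *sizes*
    split_info = entropy([p + q for p, q in partitions])
    return gain / split_info

# Income: 9 positive / 5 negative at the root, split into (2,2), (4,2), (3,1)
print(gain_ratio([9, 5], [(2, 2), (4, 2), (3, 1)]))   # ~0.019
```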
C4.5
An extension of ID3. The algorithm is very similar, with the following differences:
- Uses Gain Ratio to find the attribute to split on, instead of just Gain.
- Can handle continuous attributes. First the table is sorted by the continuous attribute, A. A threshold h from the attribute's value list is chosen, splitting A into A ≤ h and A > h. The h used is the one that maximizes the Gain Ratio. (A sketch follows this list.)
- Can build with training sets with unknown attribute values. Only records with defined values are considered for the gain ratio.
- Can use test data with unknown attribute values. When an attribute is missing, we estimate its value by the probability of the various results.
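A sketch of the threshold search for a continuous attribute, under two simplifications: it scores candidate thresholds with plain information gain rather than gain ratio, and the humidity readings are hypothetical stand-ins for the High/Normal column:

```python
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

def best_threshold(values, labels):
    pairs = sorted(zip(values, labels))           # sort the table by attribute A
    best_h, best_gain = None, -1.0
    for i in range(1, len(pairs)):
        h = (pairs[i - 1][0] + pairs[i][0]) / 2   # candidate threshold
        left = [c for v, c in pairs if v <= h]    # subset with A <= h
        right = [c for v, c in pairs if v > h]    # subset with A > h
        if not left or not right:
            continue
        n = len(pairs)
        gain = entropy(labels) - (len(left) / n) * entropy(left) \
                               - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_h, best_gain = h, gain
    return best_h, best_gain

humidity = [85, 90, 78, 96, 80, 70, 65, 95, 70, 80]   # hypothetical readings
play = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No", "Yes", "Yes"]
print(best_threshold(humidity, play))
```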
Mean and variance for a Bernoulli trial with success probability p:

mean = p, variance = p(1 − p)

For a chosen confidence c, pick z so that

Pr[−z ≤ X ≤ z] = c
Taken from Dr. Gregory Piatetsky-Shapiro's slides.
For a standard normal X this is equivalent to

Pr[−z ≤ X ≤ z] = 1 − 2·Pr[X ≥ z]

Transforming f
- Transformed value for f (i.e. subtract the mean and divide by the standard deviation):

  (f − p) / √(p(1 − p)/N)

- Resulting equation:

  Pr[−z ≤ (f − p)/√(p(1 − p)/N) ≤ z] = c

- Solving for p:

  p = ( f + z²/2N ± z·√(f/N − f²/N + z²/4N²) ) / (1 + z²/N)
Taken from Dr. Gregory Piatetsky-Shapiro's slides.
C4.5 Methods
- Error estimate for a subtree is the weighted sum of the error estimates for all of its leaves.
- Error estimate for a node (upper bound, taking the + root of the equation for p above):

  e = ( f + z²/2N + z·√(f/N − f²/N + z²/4N²) ) / (1 + z²/N)
Taken from Dr. Gregory Piatetsky-Shapiro's slides.
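The node error estimate as a Python function. The confidence-derived z here is an assumption: z ≈ 0.69, the value usually quoted for C4.5's default 25% confidence level, is not stated on these slides:

```python
from math import sqrt

def error_estimate(f, n, z=0.69):
    # Upper bound e on a node's true error rate, given observed
    # error rate f over n records (the formula above).
    return (f + z * z / (2 * n)
            + z * sqrt(f / n - f * f / n + z * z / (4 * n * n))) \
           / (1 + z * z / n)

print(round(error_estimate(5 / 14, 14), 2))   # ~0.45 for f = 5/14, N = 14
```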
C4.5 Pruning

Color  Class
red    1
red    1
red    1
red    1
red    1
red    1
white  2
blue   1
blue   1
blue   1
blue   1
blue   1
blue   1
blue   1
blue   1
blue   1
Based on numbers from J. R. Quinlan's slides.
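Applying the estimate to this table, with the same error_estimate function and the same assumed z = 0.69 as in the previous sketch: each leaf of the Color split predicts its majority class with zero observed errors, while collapsing the split leaves one error (the white record) among 16. Comparing the two estimates is the pruning decision; under these assumptions the subtree's estimate is lower, so the split would be kept:

```python
from math import sqrt

def error_estimate(f, n, z=0.69):
    return (f + z * z / (2 * n)
            + z * sqrt(f / n - f * f / n + z * z / (4 * n * n))) \
           / (1 + z * z / n)

# (observed errors, records) for the red, white, and blue leaves
leaves = [(0, 6), (0, 1), (0, 9)]
subtree = sum(n / 16 * error_estimate(err / n, n) for err, n in leaves)

collapsed = error_estimate(1 / 16, 16)         # one node: 1 error in 16

print(round(subtree, 3), round(collapsed, 3))  # ~0.076 vs ~0.118
```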
C5.0/See5
Ross Quinlan, the creator of ID3 and C4.5, went on to create a commercial improvement on these algorithms called C5.0 for Unix/Linux and See5 for Windows.
- Speed: C5.0 is orders of magnitude faster than C4.5.
- Memory usage: C5.0 is more memory-efficient than C4.5.
- Smaller decision trees: C5.0 gets similar results to C4.5 with considerably smaller decision trees.
References
- Quinlan, J. R. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, 1993.
- Quinlan, J. R. Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, 4:77-90, 1996.
- Quinlan, J. R. C4.5 and Beyond. www.cs.uvm.edu/~xwu/kdd/Slides/C4.5byRossQuinlanforICDM06.pdf, 2006.
- Wasilewska, Anita: Decision Tree Examples. www.cs.sunysb.edu/~cse634/examplesdtree.pdf
- Wikipedia: C4.5 algorithm. http://en.wikipedia.org/wiki/C4.5_algorithm
- Piatetsky-Shapiro, Gregory: Machine Learning in Real World: C4.5. http://www.kdnuggets.com/data_mining_course/