Slides for
INTRODUCTION TO MACHINE LEARNING, 2nd Edition
ETHEM ALPAYDIN
© The MIT Press, 2010
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e

In preparing these slides, I have benefited from slides prepared by:
E. Alpaydin (Intro. to Machine Learning),
D. Bouchaffra and V. Murino (Pattern Classification and Scene Analysis),
R. Gutierrez-Osuna (Texas A&M),
A. Moore (CMU)
Probability and Inference
Result of tossing a coin is ∈ {Heads, Tails}
Random variable $X \in \{1, 0\}$
Bernoulli: $P\{X = x\} = p_o^{x}\,(1 - p_o)^{1 - x}$
Sample: $\mathcal{X} = \{x^t\}_{t=1}^{N}$
Estimation: $\hat{p}_o = \#\{\text{Heads}\}/\#\{\text{Tosses}\} = \sum_t x^t / N$
Prediction of next toss: Heads if $\hat{p}_o > 1/2$, Tails otherwise
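A minimal sketch of this estimate and decision rule in Python (the sample below is made up for illustration):

```python
# Estimate p_o = #{Heads}/#{Tosses} from a sample and predict the next toss.
tosses = [1, 0, 0, 1, 1, 0, 1]   # hypothetical sample: 1 = Heads, 0 = Tails

p_o = sum(tosses) / len(tosses)                      # maximum likelihood estimate
prediction = "Heads" if p_o > 0.5 else "Tails"
print(f"p_o = {p_o:.2f}, predict {prediction}")
```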
Game
You record the following tosses:
{H, T, T, T, H, T, T, T, T, H, H, T, H, T, T, H, H, T, T, H, ?}
You win if you get the next toss right. What do you guess?
Now suppose you win 10TL if you guess the next toss right and lose 5TL if you guess wrong. How do you compute your expected earnings? What do you guess, based on maximizing your earnings?
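A sketch of the expected-earnings computation (assuming, as read above, +10TL for a correct guess and -5TL for a wrong one):

```python
# Estimate P(Heads) from the recorded tosses, then compare expected earnings.
tosses = "HTTTHTTTTHHTHTTHHTTH"          # the 20 recorded tosses above
p_heads = tosses.count("H") / len(tosses)        # 8/20 = 0.4

win, loss = 10, -5                               # TL for a right / wrong guess
expected = {
    "Heads": win * p_heads + loss * (1 - p_heads),
    "Tails": win * (1 - p_heads) + loss * p_heads,
}
guess = max(expected, key=expected.get)
print(expected, "-> guess", guess)               # Tails maximizes earnings here
```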
Classification
Credit scoring: Inputs are income and savings. Output is low-risk vs. high-risk.
Input: $\mathbf{x} = [x_1, x_2]^T$, Output: $C \in \{0, 1\}$
Prediction:
$$\text{choose } \begin{cases} C = 1 & \text{if } P(C = 1 \mid x_1, x_2) > 0.5 \\ C = 0 & \text{otherwise} \end{cases}$$
or
$$\text{choose } \begin{cases} C = 1 & \text{if } P(C = 1 \mid x_1, x_2) > P(C = 0 \mid x_1, x_2) \\ C = 0 & \text{otherwise} \end{cases}$$
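A small sketch of these two equivalent decision rules (the posterior below is a made-up stand-in for a model fitted to credit data):

```python
def posterior_c1(income, savings):
    # Hypothetical posterior P(C = 1 | x1, x2); a real one would be learned.
    return 0.8 if income < 30_000 and savings < 5_000 else 0.3

def choose(income, savings):
    p1 = posterior_c1(income, savings)
    return 1 if p1 > 0.5 else 0        # equivalently: p1 > 1 - p1

print(choose(20_000, 1_000), choose(60_000, 20_000))   # -> 1 0
```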
Bayes' Rule
$$\underbrace{P(C \mid \mathbf{x})}_{\text{posterior}} = \frac{\overbrace{P(C)}^{\text{prior}}\ \overbrace{p(\mathbf{x} \mid C)}^{\text{likelihood}}}{\underbrace{p(\mathbf{x})}_{\text{evidence}}}$$
$$P(C = 0) + P(C = 1) = 1$$
$$p(\mathbf{x}) = p(\mathbf{x} \mid C = 1)\, P(C = 1) + p(\mathbf{x} \mid C = 0)\, P(C = 0)$$
$$P(C = 0 \mid \mathbf{x}) + P(C = 1 \mid \mathbf{x}) = 1$$
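A quick numeric sketch of the rule (prior and likelihood values are made up):

```python
# Two-class Bayes' rule: posterior = prior * likelihood / evidence.
prior = {0: 0.7, 1: 0.3}                     # hypothetical P(C)
likelihood = {0: 0.2, 1: 0.9}                # hypothetical p(x | C) at some x

evidence = sum(likelihood[c] * prior[c] for c in (0, 1))     # p(x)
posterior = {c: likelihood[c] * prior[c] / evidence for c in (0, 1)}
assert abs(sum(posterior.values()) - 1.0) < 1e-9             # posteriors sum to 1
print(posterior)
```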
Game
P(x|
hamsi)
short tall
white 0.6 0.1 You caught a tall and
gray 0.2 0.1 white fish.
P(x|lufer) Is it hamsi or lufer?
short tall
white 0.05 0.2
gray 0.05 0.7
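A sketch of the comparison (the slide gives no class priors, so equal priors are assumed here):

```python
# Likelihoods P(x | class) for x = (color, height), from the tables above.
likelihood = {
    "hamsi": {("white", "short"): 0.6,  ("white", "tall"): 0.1,
              ("gray",  "short"): 0.2,  ("gray",  "tall"): 0.1},
    "lufer": {("white", "short"): 0.05, ("white", "tall"): 0.2,
              ("gray",  "short"): 0.05, ("gray",  "tall"): 0.7},
}
prior = {"hamsi": 0.5, "lufer": 0.5}    # assumption: equal priors

x = ("white", "tall")
score = {c: likelihood[c][x] * prior[c] for c in likelihood}
print(max(score, key=score.get))        # -> 'lufer' (0.2*0.5 > 0.1*0.5)
```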
Bayes' Rule: K > 2 Classes
$$P(C_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid C_i)\, P(C_i)}{p(\mathbf{x})} = \frac{p(\mathbf{x} \mid C_i)\, P(C_i)}{\sum_{k=1}^{K} p(\mathbf{x} \mid C_k)\, P(C_k)}$$
$$P(C_i) \geq 0 \quad \text{and} \quad \sum_{i=1}^{K} P(C_i) = 1$$
choose $C_i$ if $P(C_i \mid \mathbf{x}) = \max_k P(C_k \mid \mathbf{x})$
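A generic sketch of this K-class rule (all numbers below are made up):

```python
# K-class Bayes classifier: pick the class with the largest posterior.
def bayes_classify(likelihoods, priors):
    """likelihoods[i] = p(x | C_i) at the observed x; priors[i] = P(C_i)."""
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]
    return max(range(len(posteriors)), key=posteriors.__getitem__), posteriors

i, post = bayes_classify([0.2, 0.5, 0.1], [0.5, 0.2, 0.3])
print(i, post)   # index of the max-posterior class, and all posteriors
```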
Losses and Risks
Actions: $\alpha_i$
Loss of $\alpha_i$ when the state is $C_k$: $\lambda_{ik}$
Expected risk (Duda and Hart, 1973):
$$R(\alpha_i \mid \mathbf{x}) = \sum_{k=1}^{K} \lambda_{ik}\, P(C_k \mid \mathbf{x})$$
Choose the action $\alpha_i$ with minimum expected risk.
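A sketch of this computation (the loss matrix and posteriors are hypothetical):

```python
# Expected risk of each action under a loss matrix and class posteriors.
def expected_risks(loss, posteriors):
    """loss[i][k] = loss of action i when the true class is k."""
    return [sum(l_ik * p_k for l_ik, p_k in zip(row, posteriors))
            for row in loss]

loss = [[0, 1], [10, 0]]        # hypothetical asymmetric 2-class losses
post = [0.7, 0.3]               # hypothetical posteriors P(C_k | x)
risks = expected_risks(loss, post)
print(risks, "-> action", risks.index(min(risks)))   # pick the min-risk action
```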
Losses and Risks: 0/1 Loss
$$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ 1 & \text{if } i \neq k \end{cases}$$
$$R(\alpha_i \mid \mathbf{x}) = \sum_{k=1}^{K} \lambda_{ik}\, P(C_k \mid \mathbf{x}) = \sum_{k \neq i} P(C_k \mid \mathbf{x}) = 1 - P(C_i \mid \mathbf{x})$$
So with 0/1 loss, minimizing the expected risk is the same as choosing the class with the highest posterior.
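A one-line check of that equivalence (the posteriors are made up):

```python
# With 0/1 loss, arg-min expected risk equals arg-max posterior.
post = [0.2, 0.5, 0.3]                  # hypothetical posteriors
risks = [1 - p for p in post]           # R(alpha_i | x) = 1 - P(C_i | x)
assert risks.index(min(risks)) == post.index(max(post))
```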
Losses and Risks: Reject
$$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ \lambda & \text{if } i = K + 1 \\ 1 & \text{otherwise} \end{cases} \qquad 0 < \lambda < 1$$
$$R(\alpha_{K+1} \mid \mathbf{x}) = \sum_{k=1}^{K} \lambda\, P(C_k \mid \mathbf{x}) = \lambda$$
So choose $C_i$ if it has the highest posterior and $P(C_i \mid \mathbf{x}) > 1 - \lambda$; reject otherwise.
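A sketch of the reject rule (posteriors and the reject loss are made up):

```python
# Classify-with-reject: reject when even the best class is too uncertain,
# i.e. when 1 - max posterior (risk of choosing) exceeds lambda (risk of reject).
def classify_with_reject(posteriors, lam):
    """lam is the reject loss, 0 < lam < 1."""
    i = max(range(len(posteriors)), key=posteriors.__getitem__)
    return i if posteriors[i] > 1 - lam else "reject"

print(classify_with_reject([0.4, 0.35, 0.25], lam=0.3))  # -> 'reject' (0.4 <= 0.7)
print(classify_with_reject([0.8, 0.1, 0.1], lam=0.3))    # -> 0
```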
Discriminant Functions
choose $C_i$ if $g_i(\mathbf{x}) = \max_k g_k(\mathbf{x})$, where the discriminants $g_i(\mathbf{x})$, $i = 1, \ldots, K$, can be
$$g_i(\mathbf{x}) = \begin{cases} -R(\alpha_i \mid \mathbf{x}) \\ P(C_i \mid \mathbf{x}) \\ p(\mathbf{x} \mid C_i)\, P(C_i) \end{cases}$$
The discriminants partition the input space into decision regions $\mathcal{R}_i = \{\mathbf{x} \mid g_i(\mathbf{x}) = \max_k g_k(\mathbf{x})\}$.
K = 2 Classes
Dichotomizer (K = 2) vs. polychotomizer (K > 2)
$$g(\mathbf{x}) = g_1(\mathbf{x}) - g_2(\mathbf{x})$$
$$\text{choose } \begin{cases} C_1 & \text{if } g(\mathbf{x}) > 0 \\ C_2 & \text{otherwise} \end{cases}$$
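A toy sketch of a dichotomizer (the two discriminants below are made up):

```python
# Dichotomizer: a single function g(x) = g1(x) - g2(x) decides between two classes.
def g1(x): return -(x - 1.0) ** 2      # hypothetical discriminant for C1
def g2(x): return -(x + 1.0) ** 2      # hypothetical discriminant for C2

def choose(x):
    return "C1" if g1(x) - g2(x) > 0 else "C2"

print(choose(0.5), choose(-2.0))       # -> C1 C2
```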
Utility Theory
Probability of state $k$ given evidence $\mathbf{x}$: $P(S_k \mid \mathbf{x})$
Utility of $\alpha_i$ when the state is $k$: $U_{ik}$
Expected utility:
$$EU(\alpha_i \mid \mathbf{x}) = \sum_k U_{ik}\, P(S_k \mid \mathbf{x})$$
Choose $\alpha_i$ if $EU(\alpha_i \mid \mathbf{x}) = \max_j EU(\alpha_j \mid \mathbf{x})$
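This mirrors expected risk, but we maximize instead of minimize. A sketch, reusing the coin game above as the example (payoffs as assumed there):

```python
# Expected utility maximization over actions.
def best_action(utility, posteriors):
    """utility[i][k] = U_ik; posteriors[k] = P(S_k | x)."""
    eu = [sum(u * p for u, p in zip(row, posteriors)) for row in utility]
    return eu.index(max(eu)), eu

# Actions = {guess Heads, guess Tails}; +10TL if right, -5TL if wrong.
print(best_action([[10, -5], [-5, 10]], [0.4, 0.6]))   # -> (1, [1.0, 4.0])
```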
Association Measures
Support ($X \rightarrow Y$):
$$P(X, Y) = \frac{\#\{\text{customers who bought } X \text{ and } Y\}}{\#\{\text{customers}\}}$$
Confidence ($X \rightarrow Y$):
$$P(Y \mid X) = \frac{P(X, Y)}{P(X)} = \frac{\#\{\text{customers who bought } X \text{ and } Y\}}{\#\{\text{customers who bought } X\}}$$
Lift ($X \rightarrow Y$):
$$\frac{P(X, Y)}{P(X)\, P(Y)} = \frac{P(Y \mid X)}{P(Y)}$$
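A sketch of all three measures on a tiny made-up transaction set:

```python
# Support, confidence, and lift of a rule X -> Y over a set of baskets.
baskets = [{"milk", "bread"}, {"milk"}, {"bread"},
           {"milk", "bread", "eggs"}, {"eggs"}]    # hypothetical transactions

def rule_stats(X, Y, baskets):
    n = len(baskets)
    n_x  = sum(X <= b for b in baskets)            # bought all of X
    n_y  = sum(Y <= b for b in baskets)
    n_xy = sum((X | Y) <= b for b in baskets)      # bought X and Y
    support    = n_xy / n
    confidence = n_xy / n_x
    lift       = confidence / (n_y / n)            # = P(Y|X) / P(Y)
    return support, confidence, lift

print(rule_stats({"milk"}, {"bread"}, baskets))    # -> (0.4, 0.666..., 1.111...)
```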
Apriori Algorithm (Agrawal et al., 1996)
For (X, Y, Z), a 3-item set, to be frequent (have enough support), (X, Y), (X, Z), and (Y, Z) must all be frequent.
If (X, Y) is not frequent, none of its supersets can be frequent.
Once we find the frequent k-item sets, we convert them to rules: X, Y → Z, ... and X → Y, Z, ...
See also the FP-Growth algorithm: Jiawei Han, Jian Pei, and Yiwen Yin. Mining frequent patterns without candidate generation. In SIGMOD, 2000.
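A compact sketch of the level-wise Apriori idea under these two observations (a toy implementation, not an efficient one; the baskets are made up):

```python
from itertools import combinations

def apriori(baskets, min_support):
    """Frequent item sets via level-wise candidate generation and pruning."""
    n = len(baskets)
    support = lambda s: sum(s <= b for b in baskets) / n
    # Level 1: frequent single items.
    frequent = {frozenset([i]) for b in baskets for i in b
                if support(frozenset([i])) >= min_support}
    result, k = set(frequent), 1
    while frequent:
        # Join step: combine frequent k-sets into (k+1)-set candidates.
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k + 1}
        # Prune step: every k-subset of a frequent set must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k))}
        frequent = {c for c in candidates if support(c) >= min_support}
        result |= frequent
        k += 1
    return result

baskets = [frozenset(b) for b in
           [{"milk", "bread"}, {"milk", "bread", "eggs"}, {"milk"}, {"bread"}]]
print(apriori(baskets, min_support=0.5))   # -> {milk}, {bread}, {milk, bread}
```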