
Lecture Slides for
ETHEM ALPAYDIN
© The MIT Press, 2010
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e

In preparation of these slides, I have benefited from slides prepared by:
E. Alpaydin (Intro. to Machine Learning),
D. Bouchaffra and V. Murino (Pattern Classification and Scene Analysis),
R. Gutierrez-Osuna (Texas A&M),
A. Moore (CMU)
Probability and Inference
—  Result of tossing a coin is ∈ {Heads, Tails}
—  Random variable X ∈ {1, 0}
    Bernoulli: P{X = x} = pₒˣ (1 − pₒ)^(1−x)
—  Sample: X = {xᵗ}, t = 1, …, N
    Estimation: pₒ = #{Heads} / #{Tosses} = ∑ₜ xᵗ / N
—  Prediction of next toss (see the sketch below):
    Heads if pₒ > ½, Tails otherwise
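A minimal sketch of this estimate and the resulting prediction in Python; the toss sample below is made up for illustration.

```python
# Estimate p_o = P(Heads) from a sample of tosses and predict the next one.
tosses = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]    # hypothetical sample: 1 = Heads, 0 = Tails

p_o = sum(tosses) / len(tosses)            # p_o = #{Heads} / #{Tosses}
prediction = "Heads" if p_o > 0.5 else "Tails"
print(p_o, prediction)                     # 0.6 Heads
```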

Game
—  You record the following tosses:
—  {H, T, T, T, H, T, T, T, T, H, H, T, H, T, T, H, H, T, T, H?}
—  You win if you get the next toss right.
—  What do you guess?

—  You win 10 TL if you guess the next toss right and lose 5 TL if you guess wrong.
—  How do you compute your expected earnings?
—  What do you guess, based on maximizing your expected earnings? (See the sketch below.)
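A minimal sketch of the expected-earnings comparison, assuming one reading of the payoff above: a correct guess wins 10 TL and a wrong one loses 5 TL.

```python
# Expected earnings of guessing Heads vs. Tails, assuming a correct guess
# wins 10 TL and a wrong guess loses 5 TL.
tosses = "HTTTHTTTTHHTHTTHHTT"             # the 19 recorded tosses
p_heads = tosses.count("H") / len(tosses)  # 7/19 ≈ 0.368

win, loss = 10.0, -5.0
ev_heads = p_heads * win + (1 - p_heads) * loss   # earnings if you guess Heads
ev_tails = (1 - p_heads) * win + p_heads * loss   # earnings if you guess Tails
print(round(ev_heads, 2), round(ev_tails, 2))     # 0.53  4.47 -> guess Tails
```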

Classification
—  Credit scoring: Inputs are income and savings.
    Output is low-risk vs. high-risk.
—  Input: x = [x₁, x₂]ᵀ,  Output: C ∈ {0, 1}
—  Prediction:
    choose C = 1 if P(C = 1 | x₁, x₂) > 0.5, and C = 0 otherwise
    or, equivalently,
    choose C = 1 if P(C = 1 | x₁, x₂) > P(C = 0 | x₁, x₂), and C = 0 otherwise

Bayes’ Rule
—  posterior = prior × likelihood / evidence:

    P(C | x) = P(C) p(x | C) / p(x)

    P(C = 0) + P(C = 1) = 1
    p(x) = p(x | C = 1) P(C = 1) + p(x | C = 0) P(C = 0)
    P(C = 0 | x) + P(C = 1 | x) = 1

Game

—  You caught a tall and white fish. Is it hamsi or lufer? (See the sketch below.)

    P(x | hamsi):        short   tall
             white        0.6    0.1
             gray         0.2    0.1

    P(x | lufer):        short   tall
             white        0.05   0.2
             gray         0.05   0.7
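A minimal sketch applying Bayes' rule to this game. The slide gives only the class likelihoods, so the priors below are assumptions (taken as equal); with a different prior the decision can change, which is the point of the game.

```python
# Posterior of each species for x = (white, tall), via Bayes' rule.
likelihood = {                                  # P(x | class), from the tables above
    "hamsi": {("white", "short"): 0.6,  ("white", "tall"): 0.1,
              ("gray",  "short"): 0.2,  ("gray",  "tall"): 0.1},
    "lufer": {("white", "short"): 0.05, ("white", "tall"): 0.2,
              ("gray",  "short"): 0.05, ("gray",  "tall"): 0.7},
}
prior = {"hamsi": 0.5, "lufer": 0.5}            # ASSUMED equal priors, not given on the slide

x = ("white", "tall")
evidence = sum(likelihood[c][x] * prior[c] for c in prior)             # p(x)
posterior = {c: likelihood[c][x] * prior[c] / evidence for c in prior}
print(posterior)   # {'hamsi': 0.333..., 'lufer': 0.666...} -> choose lufer
```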
Bayes’ Rule: K>2 Classes
P(Cᵢ | x) = p(x | Cᵢ) P(Cᵢ) / p(x)
          = p(x | Cᵢ) P(Cᵢ) / ∑ₖ p(x | Cₖ) P(Cₖ),   k = 1, …, K

P(Cᵢ) ≥ 0 and ∑ᵢ P(Cᵢ) = 1

choose Cᵢ if P(Cᵢ | x) = maxₖ P(Cₖ | x)   (see the sketch below)
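The same computation for an arbitrary number of classes, as a small helper; the function name and the numbers are illustrative only.

```python
# Posteriors over K classes and the choice with maximum posterior probability.
def bayes_classify(priors, likelihoods_at_x):
    """priors[i] = P(C_i); likelihoods_at_x[i] = p(x | C_i)."""
    evidence = sum(p * l for p, l in zip(priors, likelihoods_at_x))           # p(x)
    posteriors = [p * l / evidence for p, l in zip(priors, likelihoods_at_x)]
    return posteriors, max(range(len(posteriors)), key=posteriors.__getitem__)

posteriors, chosen = bayes_classify([0.2, 0.5, 0.3], [0.4, 0.1, 0.3])  # toy numbers
print([round(p, 3) for p in posteriors], "-> choose class", chosen)    # class 2
```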

Losses and Risks
—  Actions: αᵢ
—  Loss of αᵢ when the state is Cₖ: λᵢₖ

e.g. cancer prediction, with two possible loss tables (rows: actual class, columns: predicted class):

  0/1 loss:
                Predicted 0   Predicted 1
    Actual 0         0             1
    Actual 1         1             0

  Asymmetric loss (missing an actual cancer costs 100):
                Predicted 0   Predicted 1
    Actual 0         0             1
    Actual 1       100             0

Losses and Risks
—  Actions: αᵢ
—  Loss of αᵢ when the state is Cₖ: λᵢₖ
—  Expected risk (Duda and Hart, 1973):

    R(αᵢ | x) = ∑ₖ λᵢₖ P(Cₖ | x),   k = 1, …, K

    choose αᵢ if R(αᵢ | x) = minₖ R(αₖ | x)   (see the sketch below)
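A minimal sketch of the minimum-risk decision using the two cancer loss tables above; the posterior values are illustrative assumptions.

```python
# Expected risk R(alpha_i | x) = sum_k lambda_ik P(C_k | x); pick the action with
# minimum risk. Here rows of the loss matrix are actions, columns are true classes.
def expected_risks(loss, posterior):
    return [sum(l * p for l, p in zip(row, posterior)) for row in loss]

posterior = [0.8, 0.2]            # assumed P(C = 0 | x), P(C = 1 | x)
loss_01   = [[0, 1], [1, 0]]      # 0/1 loss
loss_skew = [[0, 100], [1, 0]]    # predicting 0 when the actual class is 1 costs 100

for loss in (loss_01, loss_skew):
    risks = expected_risks(loss, posterior)
    print(risks, "-> choose action", risks.index(min(risks)))
# 0/1 loss picks action 0 ("no cancer"); the asymmetric loss picks action 1 despite P = 0.2
```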

Losses and Risks: 0/1 Loss
λᵢₖ = 0 if i = k
      1 if i ≠ k

R(αᵢ | x) = ∑ₖ λᵢₖ P(Cₖ | x)
          = ∑ₖ≠ᵢ P(Cₖ | x)
          = 1 − P(Cᵢ | x)

For minimum risk, choose the most probable class.

Losses and Risks: Reject
λᵢₖ = 0 if i = k
      λ if i = K + 1 (reject action), with 0 < λ < 1
      1 otherwise

R(α_K+1 | x) = ∑ₖ λ P(Cₖ | x) = λ
R(αᵢ | x) = ∑ₖ≠ᵢ P(Cₖ | x) = 1 − P(Cᵢ | x)

choose Cᵢ if P(Cᵢ | x) > P(Cₖ | x) ∀k ≠ i and P(Cᵢ | x) > 1 − λ
reject otherwise   (see the sketch below)
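A minimal sketch of the decision rule with a reject option; the value of λ and the posterior vectors are assumed for illustration.

```python
# Choose the most probable class only if its posterior exceeds 1 - lam;
# otherwise take the reject action (whose expected risk is exactly lam).
def decide_with_reject(posterior, lam=0.1):        # assumed lambda, 0 < lam < 1
    best = max(range(len(posterior)), key=posterior.__getitem__)
    return best if posterior[best] > 1 - lam else "reject"

print(decide_with_reject([0.95, 0.03, 0.02]))      # 0        (confident enough)
print(decide_with_reject([0.55, 0.30, 0.15]))      # 'reject' (0.55 <= 0.9)
```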

Discriminant Functions
—  A set of discriminant functions gᵢ(x), i = 1, …, K:
    choose Cᵢ if gᵢ(x) = maxₖ gₖ(x)

—  gᵢ(x) can be taken as, for example:
    gᵢ(x) = −R(αᵢ | x)
    gᵢ(x) = P(Cᵢ | x)
    gᵢ(x) = p(x | Cᵢ) P(Cᵢ)

—  K decision regions R₁, ..., R_K:
    Rᵢ = {x | gᵢ(x) = maxₖ gₖ(x)}

K=2 Classes
—  Dichotomizer (K = 2) vs. polychotomizer (K > 2)
—  A single discriminant suffices: g(x) = g₁(x) − g₂(x)
    choose C₁ if g(x) > 0, C₂ otherwise
—  Log odds: log [P(C₁ | x) / P(C₂ | x)]   (see the sketch below)
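A minimal sketch using the log odds as the single discriminant g(x); the posterior values are illustrative.

```python
import math

# For K = 2, g(x) = log [P(C1 | x) / P(C2 | x)]: choose C1 when g(x) > 0.
def log_odds(p_c1_given_x):
    return math.log(p_c1_given_x / (1.0 - p_c1_given_x))

for p in (0.7, 0.4):                         # illustrative posteriors P(C1 | x)
    g = log_odds(p)
    print(p, round(g, 3), "C1" if g > 0 else "C2")
# 0.7 -> g ≈  0.847 -> C1
# 0.4 -> g ≈ -0.405 -> C2
```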

Utility Theory
—  Probability of state k given evidence x: P(Sₖ | x)
—  Utility of αᵢ when the state is k: Uᵢₖ
—  Expected utility:
    EU(αᵢ | x) = ∑ₖ Uᵢₖ P(Sₖ | x)
    Choose αᵢ if EU(αᵢ | x) = maxⱼ EU(αⱼ | x)
—  With Uᵢₖ = −λᵢₖ, maximizing expected utility is equivalent to minimizing the expected risk R(αᵢ | x). (See the sketch below.)
—  Depending on the specific problem, other functions might be optimized (e.g., minimize the worst possible loss, maximize money earned, ...).
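A short sketch of that equivalence, reusing the asymmetric cancer losses; the posterior is again an assumption.

```python
# Expected utility EU(alpha_i | x) = sum_k U_ik P(S_k | x); choose the maximum.
# With U_ik = -lambda_ik this picks the same action as minimizing expected risk.
def best_action_by_utility(U, posterior):
    eu = [sum(u * p for u, p in zip(row, posterior)) for row in U]
    return eu.index(max(eu))

loss = [[0, 100], [1, 0]]                      # asymmetric cancer losses (rows = actions)
U = [[-l for l in row] for row in loss]        # utilities as negated losses
print(best_action_by_utility(U, [0.8, 0.2]))   # 1, the same as the minimum-risk choice
```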
Association Rules
—  Association rule: X → Y
—  People who buy/click/visit/enjoy X are also likely to buy/click/visit/enjoy Y.
—  A rule implies association, not necessarily causation.

Association measures
—  Support (X → Y):
    P(X, Y) = #{customers who bought X and Y} / #{customers}
—  Confidence (X → Y):
    P(Y | X) = P(X, Y) / P(X)
             = #{customers who bought X and Y} / #{customers who bought X}
—  Lift (X → Y):
    P(X, Y) / (P(X) P(Y)) = P(Y | X) / P(Y)
    (See the sketch below.)
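A minimal sketch computing the three measures over a toy set of baskets; the basket data and item names are made up.

```python
# Support, confidence, and lift for the rule X -> Y over a toy transaction set.
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"milk"},
           {"bread", "milk", "butter"}, {"butter", "milk"}]   # hypothetical data
X, Y = "bread", "milk"

n    = len(baskets)
n_x  = sum(X in b for b in baskets)
n_y  = sum(Y in b for b in baskets)
n_xy = sum(X in b and Y in b for b in baskets)

support    = n_xy / n                # P(X, Y)
confidence = n_xy / n_x              # P(Y | X)
lift       = confidence / (n_y / n)  # P(Y | X) / P(Y)
print(support, confidence, lift)     # 0.4, 0.666..., 0.833... (lift < 1: negative association)
```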
Apriori algorithm (Agrawal et al., 1996)
—  For (X, Y, Z), a 3-item set, to be frequent (have enough support), (X, Y), (X, Z), and (Y, Z) should all be frequent.
—  If (X, Y) is not frequent, none of its supersets can be frequent.
—  Once we find the frequent k-item sets, we convert them to rules: X, Y → Z, ...  and  X → Y, Z, ...  (a sketch of the idea follows below)

See also the FP-Growth Algorithm:
Jiawei Han, Jian Pei, and Yiwen Yin. Mining frequent patterns without candidate generation. In SIGMOD, 2000.
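A compact sketch of the Apriori idea (level-wise candidate generation with pruning by the subset property), assuming a minimum support count; this is a simplified illustration, not the algorithm exactly as published.

```python
from itertools import combinations

def apriori(baskets, min_count=2):
    """Frequent itemsets, grown level by level with Apriori pruning."""
    singletons = {frozenset([item]) for b in baskets for item in b}
    level = {s for s in singletons if sum(s <= b for b in baskets) >= min_count}
    frequent, k = [], 1
    while level:
        frequent.extend(level)
        k += 1
        # Candidate k-item sets: unions of frequent (k-1)-item sets, kept only if
        # every (k-1)-subset is itself frequent (the Apriori pruning step).
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c for c in candidates if sum(c <= b for b in baskets) >= min_count}
    return frequent

baskets = [frozenset(b) for b in
           [{"bread", "milk"}, {"bread", "butter"},
            {"bread", "milk", "butter"}, {"milk", "butter"}]]
print(apriori(baskets))   # frequent 1- and 2-item sets (support count >= 2)
```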

