
Lecture Slides for
ETHEM ALPAYDIN
© The MIT Press, 2010
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e

In preparation of these slides, I have benefited from slides prepared by:
E. Alpaydin (Intro. to Machine Learning),
D. Bouchaffra and V. Murino (Pattern Classification and Scene Analysis),
R. Gutierrez-Osuna (Texas A&M),
A. Moore (CMU)
Probability and Inference
—  Result of tossing a coin is ∈ {Heads, Tails}
—  Random variable X ∈ {1, 0}
    Bernoulli: P{X = x} = pₒˣ (1 − pₒ)^(1−x)
—  Sample: X = {xᵗ}, t = 1, …, N
    Estimation: pₒ = #{Heads} / #{Tosses} = ∑ₜ xᵗ / N
—  Prediction of next toss (see the sketch below):
    Heads if pₒ > ½, Tails otherwise
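A minimal sketch of this estimate and the resulting prediction in Python; the toss sample below is made up for illustration.

```python
# Estimate p_o = P(Heads) from a sample of tosses and predict the next one.
tosses = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]    # hypothetical sample: 1 = Heads, 0 = Tails

p_o = sum(tosses) / len(tosses)            # p_o = #{Heads} / #{Tosses}
prediction = "Heads" if p_o > 0.5 else "Tails"
print(p_o, prediction)                     # 0.6 Heads
```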

Game
—  You record the following tosses:
—  {H, T, T, T, H, T, T, T, T, H, H, T, H, T, T, H, H, T, T, H?}
—  You win if you get the next toss right.
—  What do you guess?

—  You win 10 TL if you guess the next toss right and lose 5 TL if you guess wrong.
—  How do you compute your expected earnings?
—  What do you guess, based on maximizing your expected earnings? (See the sketch below.)
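A minimal sketch of the expected-earnings comparison, assuming one reading of the payoff above: a correct guess wins 10 TL and a wrong one loses 5 TL.

```python
# Expected earnings of guessing Heads vs. Tails, assuming a correct guess
# wins 10 TL and a wrong guess loses 5 TL.
tosses = "HTTTHTTTTHHTHTTHHTT"             # the 19 recorded tosses
p_heads = tosses.count("H") / len(tosses)  # 7/19 ≈ 0.368

win, loss = 10.0, -5.0
ev_heads = p_heads * win + (1 - p_heads) * loss   # earnings if you guess Heads
ev_tails = (1 - p_heads) * win + p_heads * loss   # earnings if you guess Tails
print(round(ev_heads, 2), round(ev_tails, 2))     # 0.53  4.47 -> guess Tails
```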

Classification
—  Credit scoring: Inputs are income and savings.
    Output is low-risk vs. high-risk.
—  Input: x = [x₁, x₂]ᵀ,  Output: C ∈ {0, 1}
—  Prediction:
    choose C = 1 if P(C = 1 | x₁, x₂) > 0.5, and C = 0 otherwise
    or, equivalently,
    choose C = 1 if P(C = 1 | x₁, x₂) > P(C = 0 | x₁, x₂), and C = 0 otherwise

Bayes’ Rule
—  posterior = prior × likelihood / evidence:

    P(C | x) = P(C) p(x | C) / p(x)

    P(C = 0) + P(C = 1) = 1
    p(x) = p(x | C = 1) P(C = 1) + p(x | C = 0) P(C = 0)
    P(C = 0 | x) + P(C = 1 | x) = 1

Game

—  You caught a tall and white fish. Is it hamsi or lufer? (See the sketch below.)

    P(x | hamsi):        short   tall
             white        0.6    0.1
             gray         0.2    0.1

    P(x | lufer):        short   tall
             white        0.05   0.2
             gray         0.05   0.7
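A minimal sketch applying Bayes' rule to this game. The slide gives only the class likelihoods, so the priors below are assumptions (taken as equal); with a different prior the decision can change, which is the point of the game.

```python
# Posterior of each species for x = (white, tall), via Bayes' rule.
likelihood = {                                  # P(x | class), from the tables above
    "hamsi": {("white", "short"): 0.6,  ("white", "tall"): 0.1,
              ("gray",  "short"): 0.2,  ("gray",  "tall"): 0.1},
    "lufer": {("white", "short"): 0.05, ("white", "tall"): 0.2,
              ("gray",  "short"): 0.05, ("gray",  "tall"): 0.7},
}
prior = {"hamsi": 0.5, "lufer": 0.5}            # ASSUMED equal priors, not given on the slide

x = ("white", "tall")
evidence = sum(likelihood[c][x] * prior[c] for c in prior)             # p(x)
posterior = {c: likelihood[c][x] * prior[c] / evidence for c in prior}
print(posterior)   # {'hamsi': 0.333..., 'lufer': 0.666...} -> choose lufer
```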
Bayes’ Rule: K>2 Classes
P(Cᵢ | x) = p(x | Cᵢ) P(Cᵢ) / p(x)
          = p(x | Cᵢ) P(Cᵢ) / ∑ₖ p(x | Cₖ) P(Cₖ),   k = 1, …, K

P(Cᵢ) ≥ 0 and ∑ᵢ P(Cᵢ) = 1

choose Cᵢ if P(Cᵢ | x) = maxₖ P(Cₖ | x)   (see the sketch below)
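The same computation for an arbitrary number of classes, as a small helper; the function name and the numbers are illustrative only.

```python
# Posteriors over K classes and the choice with maximum posterior probability.
def bayes_classify(priors, likelihoods_at_x):
    """priors[i] = P(C_i); likelihoods_at_x[i] = p(x | C_i)."""
    evidence = sum(p * l for p, l in zip(priors, likelihoods_at_x))           # p(x)
    posteriors = [p * l / evidence for p, l in zip(priors, likelihoods_at_x)]
    return posteriors, max(range(len(posteriors)), key=posteriors.__getitem__)

posteriors, chosen = bayes_classify([0.2, 0.5, 0.3], [0.4, 0.1, 0.3])  # toy numbers
print([round(p, 3) for p in posteriors], "-> choose class", chosen)    # class 2
```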

Losses and Risks
—  Actions: αᵢ
—  Loss of αᵢ when the state is Cₖ: λᵢₖ

e.g. cancer prediction, with two possible loss tables (rows: actual class, columns: predicted class):

  0/1 loss:
                Predicted 0   Predicted 1
    Actual 0         0             1
    Actual 1         1             0

  Asymmetric loss (missing an actual cancer costs 100):
                Predicted 0   Predicted 1
    Actual 0         0             1
    Actual 1       100             0

Losses and Risks
—  Actions: αᵢ
—  Loss of αᵢ when the state is Cₖ: λᵢₖ
—  Expected risk (Duda and Hart, 1973):

    R(αᵢ | x) = ∑ₖ λᵢₖ P(Cₖ | x),   k = 1, …, K

    choose αᵢ if R(αᵢ | x) = minₖ R(αₖ | x)   (see the sketch below)
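A minimal sketch of the minimum-risk decision using the two cancer loss tables above; the posterior values are illustrative assumptions.

```python
# Expected risk R(alpha_i | x) = sum_k lambda_ik P(C_k | x); pick the action with
# minimum risk. Here rows of the loss matrix are actions, columns are true classes.
def expected_risks(loss, posterior):
    return [sum(l * p for l, p in zip(row, posterior)) for row in loss]

posterior = [0.8, 0.2]            # assumed P(C = 0 | x), P(C = 1 | x)
loss_01   = [[0, 1], [1, 0]]      # 0/1 loss
loss_skew = [[0, 100], [1, 0]]    # predicting 0 when the actual class is 1 costs 100

for loss in (loss_01, loss_skew):
    risks = expected_risks(loss, posterior)
    print(risks, "-> choose action", risks.index(min(risks)))
# 0/1 loss picks action 0 ("no cancer"); the asymmetric loss picks action 1 despite P = 0.2
```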

Losses and Risks: 0/1 Loss
λᵢₖ = 0 if i = k
      1 if i ≠ k

R(αᵢ | x) = ∑ₖ λᵢₖ P(Cₖ | x)
          = ∑ₖ≠ᵢ P(Cₖ | x)
          = 1 − P(Cᵢ | x)

For minimum risk, choose the most probable class.

Losses and Risks: Reject
λᵢₖ = 0 if i = k
      λ if i = K + 1 (reject action), with 0 < λ < 1
      1 otherwise

R(α_K+1 | x) = ∑ₖ λ P(Cₖ | x) = λ
R(αᵢ | x) = ∑ₖ≠ᵢ P(Cₖ | x) = 1 − P(Cᵢ | x)

choose Cᵢ if P(Cᵢ | x) > P(Cₖ | x) ∀k ≠ i and P(Cᵢ | x) > 1 − λ
reject otherwise   (see the sketch below)
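A minimal sketch of the decision rule with a reject option; the value of λ and the posterior vectors are assumed for illustration.

```python
# Choose the most probable class only if its posterior exceeds 1 - lam;
# otherwise take the reject action (whose expected risk is exactly lam).
def decide_with_reject(posterior, lam=0.1):        # assumed lambda, 0 < lam < 1
    best = max(range(len(posterior)), key=posterior.__getitem__)
    return best if posterior[best] > 1 - lam else "reject"

print(decide_with_reject([0.95, 0.03, 0.02]))      # 0        (confident enough)
print(decide_with_reject([0.55, 0.30, 0.15]))      # 'reject' (0.55 <= 0.9)
```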

Discriminant Functions
—  A set of discriminant functions gᵢ(x), i = 1, …, K:
    choose Cᵢ if gᵢ(x) = maxₖ gₖ(x)

—  gᵢ(x) can be taken as, for example:
    gᵢ(x) = −R(αᵢ | x)
    gᵢ(x) = P(Cᵢ | x)
    gᵢ(x) = p(x | Cᵢ) P(Cᵢ)

—  K decision regions R₁, ..., R_K:
    Rᵢ = {x | gᵢ(x) = maxₖ gₖ(x)}

K=2 Classes
—  Dichotomizer (K = 2) vs. polychotomizer (K > 2)
—  A single discriminant suffices: g(x) = g₁(x) − g₂(x)
    choose C₁ if g(x) > 0, C₂ otherwise
—  Log odds: log [P(C₁ | x) / P(C₂ | x)]   (see the sketch below)
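A minimal sketch using the log odds as the single discriminant g(x); the posterior values are illustrative.

```python
import math

# For K = 2, g(x) = log [P(C1 | x) / P(C2 | x)]: choose C1 when g(x) > 0.
def log_odds(p_c1_given_x):
    return math.log(p_c1_given_x / (1.0 - p_c1_given_x))

for p in (0.7, 0.4):                         # illustrative posteriors P(C1 | x)
    g = log_odds(p)
    print(p, round(g, 3), "C1" if g > 0 else "C2")
# 0.7 -> g ≈  0.847 -> C1
# 0.4 -> g ≈ -0.405 -> C2
```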

Utility Theory
—  Probability of state k given evidence x: P(Sₖ | x)
—  Utility of αᵢ when the state is k: Uᵢₖ
—  Expected utility:
    EU(αᵢ | x) = ∑ₖ Uᵢₖ P(Sₖ | x)
    Choose αᵢ if EU(αᵢ | x) = maxⱼ EU(αⱼ | x)
—  With Uᵢₖ = −λᵢₖ, maximizing expected utility is equivalent to minimizing the expected risk R(αᵢ | x). (See the sketch below.)
—  Depending on the specific problem, other functions might be optimized (e.g., minimize the worst possible loss, maximize money earned, ...).
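A short sketch of that equivalence, reusing the asymmetric cancer losses; the posterior is again an assumption.

```python
# Expected utility EU(alpha_i | x) = sum_k U_ik P(S_k | x); choose the maximum.
# With U_ik = -lambda_ik this picks the same action as minimizing expected risk.
def best_action_by_utility(U, posterior):
    eu = [sum(u * p for u, p in zip(row, posterior)) for row in U]
    return eu.index(max(eu))

loss = [[0, 100], [1, 0]]                      # asymmetric cancer losses (rows = actions)
U = [[-l for l in row] for row in loss]        # utilities as negated losses
print(best_action_by_utility(U, [0.8, 0.2]))   # 1, the same as the minimum-risk choice
```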
Association Rules
—  Association rule: X → Y
—  People who buy/click/visit/enjoy X are also likely to buy/click/visit/enjoy Y.
—  A rule implies association, not necessarily causation.

Association measures
—  Support (X → Y):
    P(X, Y) = #{customers who bought X and Y} / #{customers}
—  Confidence (X → Y):
    P(Y | X) = P(X, Y) / P(X)
             = #{customers who bought X and Y} / #{customers who bought X}
—  Lift (X → Y):
    P(X, Y) / (P(X) P(Y)) = P(Y | X) / P(Y)
    (See the sketch below.)
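A minimal sketch computing the three measures over a toy set of baskets; the basket data and item names are made up.

```python
# Support, confidence, and lift for the rule X -> Y over a toy transaction set.
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"milk"},
           {"bread", "milk", "butter"}, {"butter", "milk"}]   # hypothetical data
X, Y = "bread", "milk"

n    = len(baskets)
n_x  = sum(X in b for b in baskets)
n_y  = sum(Y in b for b in baskets)
n_xy = sum(X in b and Y in b for b in baskets)

support    = n_xy / n                # P(X, Y)
confidence = n_xy / n_x              # P(Y | X)
lift       = confidence / (n_y / n)  # P(Y | X) / P(Y)
print(support, confidence, lift)     # 0.4, 0.666..., 0.833... (lift < 1: negative association)
```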
Apriori algorithm (Agrawal et al., 1996)
—  For (X, Y, Z), a 3-item set, to be frequent (have enough support), (X, Y), (X, Z), and (Y, Z) should all be frequent.
—  If (X, Y) is not frequent, none of its supersets can be frequent.
—  Once we find the frequent k-item sets, we convert them to rules: X, Y → Z, ...  and  X → Y, Z, ...  (a sketch of the idea follows below)

See also the FP-Growth Algorithm:
Jiawei Han, Jian Pei, and Yiwen Yin. Mining frequent patterns without candidate generation. In SIGMOD, 2000.
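A compact sketch of the Apriori idea (level-wise candidate generation with pruning by the subset property), assuming a minimum support count; this is a simplified illustration, not the algorithm exactly as published.

```python
from itertools import combinations

def apriori(baskets, min_count=2):
    """Frequent itemsets, grown level by level with Apriori pruning."""
    singletons = {frozenset([item]) for b in baskets for item in b}
    level = {s for s in singletons if sum(s <= b for b in baskets) >= min_count}
    frequent, k = [], 1
    while level:
        frequent.extend(level)
        k += 1
        # Candidate k-item sets: unions of frequent (k-1)-item sets, kept only if
        # every (k-1)-subset is itself frequent (the Apriori pruning step).
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c for c in candidates if sum(c <= b for b in baskets) >= min_count}
    return frequent

baskets = [frozenset(b) for b in
           [{"bread", "milk"}, {"bread", "butter"},
            {"bread", "milk", "butter"}, {"milk", "butter"}]]
print(apriori(baskets))   # frequent 1- and 2-item sets (support count >= 2)
```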

