Fish Sorting Example Revisited
Prior Probabilities
The prior probabilities of the two classes sum to one:

$$P(w_1) + P(w_2) = 1.$$
Making a Decision
Class-Conditional Probabilities
Posterior Probabilities
$$P(w_j \mid x) = \frac{p(x \mid w_j)\,P(w_j)}{p(x)}$$

where $p(x) = \sum_{j=1}^{2} p(x \mid w_j)\,P(w_j)$.
Making a Decision
- $p(x \mid w_j)$ is called the likelihood and $p(x)$ is called the evidence.
- How can we make a decision after observing the value of $x$?

$$\text{Decide } \begin{cases} w_1 & \text{if } P(w_1 \mid x) > P(w_2 \mid x) \\ w_2 & \text{otherwise.} \end{cases}$$
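As a quick illustration, here is a minimal Python sketch of Bayes' formula and this decision rule; the priors and Gaussian class-conditional densities below are made-up numbers, not values from the fish example:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical two-class setup (illustrative numbers only).
priors = np.array([0.6, 0.4])                      # P(w1), P(w2)
likelihoods = [norm(loc=10.0, scale=2.0),          # p(x | w1)
               norm(loc=14.0, scale=2.0)]          # p(x | w2)

def posteriors(x):
    """Bayes' formula: P(w_j | x) = p(x | w_j) P(w_j) / p(x)."""
    joint = np.array([pdf.pdf(x) for pdf in likelihoods]) * priors
    return joint / joint.sum()                     # divide by the evidence p(x)

x = 11.5
post = posteriors(x)
decision = 1 if post[0] > post[1] else 2           # decide w1 if P(w1|x) > P(w2|x)
print(post, "-> decide w%d" % decision)
```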
Bayesian Decision Theory
Generalizing to $c$ classes, Bayes' formula becomes

$$P(w_j \mid x) = \frac{p(x \mid w_j)\,P(w_j)}{p(x)}$$

where $p(x) = \sum_{j=1}^{c} p(x \mid w_j)\,P(w_j)$.
Conditional Risk
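Suppose taking action $\alpha_i$ when the true class is $w_j$ incurs a loss $\lambda(\alpha_i \mid w_j)$. The expected loss of taking action $\alpha_i$ after observing $x$ is the conditional risk

$$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid w_j)\,P(w_j \mid x).$$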
Minimum-Risk Classification
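Decide the action with the minimum conditional risk,

$$\alpha^* = \arg\min_i R(\alpha_i \mid x).$$

The resulting overall risk is the smallest achievable and is called the Bayes risk.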
Two-Category Classification
- Define
  - $\alpha_1$: deciding $w_1$,
  - $\alpha_2$: deciding $w_2$,
  - $\lambda_{ij} = \lambda(\alpha_i \mid w_j)$.
- Conditional risks can be written as
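$$R(\alpha_1 \mid x) = \lambda_{11}\,P(w_1 \mid x) + \lambda_{12}\,P(w_2 \mid x),$$
$$R(\alpha_2 \mid x) = \lambda_{21}\,P(w_1 \mid x) + \lambda_{22}\,P(w_2 \mid x).$$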
Minimum-Error-Rate Classification
- Define the zero-one loss function
$$\lambda(\alpha_i \mid w_j) = \begin{cases} 0 & \text{if } i = j \\ 1 & \text{if } i \neq j \end{cases} \qquad i, j = 1, \dots, c.$$
- The conditional risk then reduces to
$$R(\alpha_i \mid x) = \sum_{j \neq i} P(w_j \mid x) = 1 - P(w_i \mid x).$$
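A small numerical check, using illustrative posteriors, that minimizing the conditional risk under the zero-one loss is the same as picking the class with the largest posterior:

```python
import numpy as np

# Hypothetical posteriors P(w_j | x) for c = 3 classes (illustrative numbers).
post = np.array([0.2, 0.5, 0.3])                   # sums to 1

# Zero-one loss: 0 on the diagonal, 1 elsewhere.
c = len(post)
loss = np.ones((c, c)) - np.eye(c)                 # loss[i, j] = lambda(alpha_i | w_j)

# Conditional risk R(alpha_i | x) = sum_j lambda(alpha_i | w_j) P(w_j | x)
risk = loss @ post
print(risk)                                        # equals 1 - post under zero-one loss

# Minimizing the risk is the same as maximizing the posterior.
assert np.argmin(risk) == np.argmax(post)
```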
- The resulting error is called the Bayes error and is the best performance that can be achieved.
Discriminant Functions
- A useful way of representing classifiers is through discriminant functions $g_i(x)$, $i = 1, \dots, c$, where the classifier assigns a feature vector $x$ to class $w_i$ if
$$g_i(x) > g_j(x) \quad \text{for all } j \neq i.$$
The Gaussian Density
- The Gaussian can be viewed as a model in which the feature vectors for a given class are continuous-valued, randomly corrupted versions of a single typical or prototype vector.
- Some properties of the Gaussian:
  - Analytically tractable.
  - Completely specified by the 1st and 2nd moments.
  - Has the maximum entropy of all distributions with a given mean and variance.
  - Many processes are asymptotically Gaussian (Central Limit Theorem).
  - Linear transformations of a Gaussian are also Gaussian.
  - Uncorrelatedness implies independence.
Univariate Gaussian
- For $x \in \mathbb{R}$:
$$p(x) = N(\mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right]$$
where
$$\mu = E[x] = \int_{-\infty}^{\infty} x\, p(x)\, dx,$$
$$\sigma^2 = E[(x - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2\, p(x)\, dx.$$
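As a sanity check, the density formula can be coded directly and compared against a library implementation; a minimal sketch:

```python
import numpy as np
from scipy.stats import norm

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density, written directly from the formula above."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

x = np.linspace(-3, 3, 7)
# Cross-check against scipy's reference implementation.
assert np.allclose(gaussian_pdf(x, mu=0.0, sigma=1.0), norm.pdf(x, loc=0.0, scale=1.0))
```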
Multivariate Gaussian
- For $x \in \mathbb{R}^d$:
$$p(x) = N(\mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right]$$
where
$$\mu = E[x] = \int x\, p(x)\, dx,$$
$$\Sigma = E[(x - \mu)(x - \mu)^T] = \int (x - \mu)(x - \mu)^T p(x)\, dx.$$
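The multivariate formula can be checked the same way; a minimal sketch with an illustrative mean and covariance:

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_pdf(x, mu, cov):
    """Multivariate Gaussian density, written directly from the formula above."""
    d = len(mu)
    diff = x - mu
    norm_const = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(cov))
    mahal = diff @ np.linalg.inv(cov) @ diff       # (x - mu)^T Sigma^{-1} (x - mu)
    return np.exp(-0.5 * mahal) / norm_const

mu = np.array([0.0, 1.0])
cov = np.array([[2.0, 0.3], [0.3, 1.0]])           # illustrative parameters
x = np.array([0.5, 0.5])
assert np.isclose(mvn_pdf(x, mu, cov), multivariate_normal.pdf(x, mean=mu, cov=cov))
```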
Linear Transformations
The whitening transform

$$A_w = \Phi \Lambda^{-1/2},$$

where
- $\Phi$ is the matrix whose columns are the orthonormal eigenvectors of $\Sigma$,
- $\Lambda$ is the diagonal matrix of the corresponding eigenvalues,

gives a covariance matrix equal to the identity matrix $I$.
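A minimal numerical sketch of the whitening transform; the covariance matrix below is illustrative:

```python
import numpy as np

Sigma = np.array([[4.0, 1.5], [1.5, 1.0]])         # illustrative covariance

# Eigendecomposition: columns of Phi are orthonormal eigenvectors,
# lam holds the corresponding eigenvalues.
lam, Phi = np.linalg.eigh(Sigma)
A_w = Phi @ np.diag(lam ** -0.5)                   # whitening transform A_w = Phi Lam^{-1/2}

# Under y = A_w^T x, the covariance A_w^T Sigma A_w becomes the identity.
assert np.allclose(A_w.T @ Sigma @ A_w, np.eye(2))
```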
Discriminant Functions for the Gaussian Density

For $p(x \mid w_i) = N(\mu_i, \Sigma_i)$, the minimum-error-rate discriminant $g_i(x) = \ln p(x \mid w_i) + \ln P(w_i)$ becomes

$$g_i(x) = -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |\Sigma_i| + \ln P(w_i).$$
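A minimal sketch of a classifier built from this discriminant; the means, covariances, and priors below are illustrative:

```python
import numpy as np

def gaussian_discriminant(x, mu, Sigma, prior):
    """g_i(x) for p(x|w_i) = N(mu, Sigma), from the expression above."""
    d = len(mu)
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(Sigma) @ diff
            - 0.5 * d * np.log(2 * np.pi)
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

# Two illustrative classes in R^2.
params = [
    (np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 1.0]]), 0.5),  # w1
    (np.array([3.0, 3.0]), np.array([[2.0, 0.5], [0.5, 1.0]]), 0.5),  # w2
]

x = np.array([2.0, 1.0])
scores = [gaussian_discriminant(x, m, S, p) for m, S, p in params]
print("decide w%d" % (int(np.argmax(scores)) + 1))  # assign x to the largest g_i(x)
```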
Case 1: $\Sigma_i = \sigma^2 I$

Dropping the terms that are identical for all classes, the discriminant functions reduce to the linear form $g_i(x) = w_i^T x + w_{i0}$, where

$$w_i = \frac{1}{\sigma^2}\,\mu_i,$$
$$w_{i0} = -\frac{1}{2\sigma^2}\,\mu_i^T \mu_i + \ln P(w_i)$$

($w_{i0}$ is the threshold or bias for the $i$'th category).
The decision boundary between classes $w_i$ and $w_j$ is the hyperplane $w^T(x - x_0) = 0$, where

$$w = \mu_i - \mu_j,$$
$$x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\sigma^2}{\|\mu_i - \mu_j\|^2} \ln\frac{P(w_i)}{P(w_j)}\,(\mu_i - \mu_j).$$
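A short numerical illustration of how unequal priors shift the boundary point $x_0$ away from the midpoint; the parameters are illustrative:

```python
import numpy as np

# Illustrative Case 1 setup: equal spherical covariances sigma^2 I.
mu_i, mu_j = np.array([0.0, 0.0]), np.array([4.0, 0.0])
sigma2 = 1.0
P_i, P_j = 0.7, 0.3                                 # unequal priors

w = mu_i - mu_j
diff = mu_i - mu_j
x0 = 0.5 * (mu_i + mu_j) - sigma2 / np.dot(diff, diff) * np.log(P_i / P_j) * diff

# With equal priors x0 would be the midpoint [2, 0]; the more probable
# class w_i pushes the boundary away from mu_i, enlarging its region.
print(x0)
```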
Case 2: $\Sigma_i = \Sigma$

The discriminant functions are again linear, $g_i(x) = w_i^T x + w_{i0}$, where

$$w_i = \Sigma^{-1} \mu_i,$$
$$w_{i0} = -\frac{1}{2}\,\mu_i^T \Sigma^{-1} \mu_i + \ln P(w_i).$$
The decision boundary is again a hyperplane, $w^T(x - x_0) = 0$, where

$$w = \Sigma^{-1}(\mu_i - \mu_j),$$
$$x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\ln(P(w_i)/P(w_j))}{(\mu_i - \mu_j)^T \Sigma^{-1} (\mu_i - \mu_j)}\,(\mu_i - \mu_j).$$
Case 3: arbitrary $\Sigma_i$

- The discriminant functions are quadratic, $g_i(x) = x^T W_i x + w_i^T x + w_{i0}$, where

$$W_i = -\frac{1}{2}\,\Sigma_i^{-1},$$
$$w_i = \Sigma_i^{-1} \mu_i,$$
$$w_{i0} = -\frac{1}{2}\,\mu_i^T \Sigma_i^{-1} \mu_i - \frac{1}{2} \ln|\Sigma_i| + \ln P(w_i).$$
Error Probabilities and Integrals
Let $R_i$ denote the region where we decide $w_i$. In the two-category case,

$$P(\text{error}) = P(x \in R_2, w_1) + P(x \in R_1, w_2)$$
$$= P(x \in R_2 \mid w_1)\,P(w_1) + P(x \in R_1 \mid w_2)\,P(w_2)$$
$$= \int_{R_2} p(x \mid w_1)\,P(w_1)\, dx + \int_{R_1} p(x \mid w_2)\,P(w_2)\, dx.$$
Generalizing to $c$ classes,

$$P(\text{error}) = 1 - P(\text{correct}) = 1 - \sum_{i=1}^{c} P(x \in R_i, w_i)$$
$$= 1 - \sum_{i=1}^{c} P(x \in R_i \mid w_i)\,P(w_i) = 1 - \sum_{i=1}^{c} \int_{R_i} p(x \mid w_i)\,P(w_i)\, dx.$$
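For a concrete two-class example the Bayes error can be estimated numerically: with minimum-error-rate regions, the integrand at each $x$ is the smaller of the two joint densities, so the error is the integral of their pointwise minimum. A sketch with illustrative parameters:

```python
import numpy as np
from scipy.stats import norm

# Two-class example with Gaussian class-conditionals (illustrative numbers).
priors = (0.5, 0.5)
p1, p2 = norm(0.0, 1.0), norm(2.0, 1.0)

# Integrate min( P(w1) p(x|w1), P(w2) p(x|w2) ) on a fine grid.
x = np.linspace(-10.0, 12.0, 200001)
joint = np.minimum(priors[0] * p1.pdf(x), priors[1] * p2.pdf(x))
bayes_error = np.sum(joint) * (x[1] - x[0])        # Riemann-sum approximation

# For equal priors and equal variances the boundary is the midpoint x = 1,
# so the error also equals Phi(-1) (a closed-form cross-check).
print(bayes_error, norm.cdf(-1.0))
```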
[Figure 9: Components of the probability of error for equal priors and the non-optimal decision point $x^*$. The optimal point $x_B$ minimizes the total shaded area and gives the Bayes error rate.]
Receiver Operating Characteristics
- Consider the two-category case and define
  - $w_1$: target is present,
  - $w_2$: target is not present.

                  Assigned w1           Assigned w2
    True w1       correct detection     mis-detection
    True w2       false alarm           correct rejection
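A sketch of how an ROC curve arises in this setting, assuming Gaussian models for the two hypotheses and a simple threshold rule (both are illustrative assumptions, not from the slides): sweeping the decision threshold traces out pairs of detection and false-alarm probabilities.

```python
import numpy as np
from scipy.stats import norm

# Illustrative signal-detection setup: x | w1 ~ N(d', 1) (target present),
# x | w2 ~ N(0, 1) (target absent); decide w1 when x exceeds a threshold.
d_prime = 1.5
thresholds = np.linspace(-4, 6, 101)

hit_rate = 1 - norm.cdf(thresholds, loc=d_prime)   # P(decide w1 | w1): correct detection
fa_rate = 1 - norm.cdf(thresholds, loc=0.0)        # P(decide w1 | w2): false alarm

# Each threshold gives one (false-alarm, detection) point on the ROC curve.
for t, h, f in zip(thresholds[::25], hit_rate[::25], fa_rate[::25]):
    print(f"threshold {t:+.1f}: detection {h:.3f}, false alarm {f:.3f}")
```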
Summary