James L. Crowley
Contents
Notation
Source:
"Pattern Recognition and Machine Learning", C. M. Bishop, Springer Verlag, 2006.
Notation
x                  a variable
X                  a random variable (unpredictable value)
\vec{x}            a vector of D variables
\vec{X}            a vector of D random variables
D                  the number of dimensions for the vector \vec{x} or \vec{X}
E                  an observation; an event
k                  class index
K                  total number of classes
\omega_k           the statement (assertion) that E \in T_k
M_k                number of examples for the class k (think M = Mass)
M                  total number of examples: M = \sum_{k=1}^{K} M_k
\{X_m^k\}          a set of M_k examples for the class k
\{X_m\} = \bigcup_{k=1}^{K} \{X_m^k\}
\{t_m\}            a set of class labels (indicators) for the samples
\mu = E\{X_m\}     the expected value, or average, of the M samples
\hat{\sigma}^2     estimated variance
\tilde{\sigma}^2   true variance
N(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}   the Gaussian (normal) density function
Expected Values and Moments
E\{X\} = \frac{1}{M} \sum_{m=1}^{M} X_m

\mu_x = E\{X\} is the first moment (or center of gravity) of the values of \{X_m\}.
This can be seen from the histogram h(x). The mass of the histogram is the zeroth moment:

M = \sum_{n=1}^{N} h(n)
Thus the center of gravity of the histogram is the expected value of the random variable:

\mu = E\{X_m\} = \frac{1}{M} \sum_{m=1}^{M} X_m = \frac{1}{M} \sum_{n=1}^{N} h(n) \cdot n
The second moment is the expected deviation from the first moment:

\sigma^2 = E\{(X - E\{X\})^2\} = \frac{1}{M} \sum_{m=1}^{M} (X_m - \mu)^2 = \frac{1}{M} \sum_{n=1}^{N} h(n) \cdot (n - \mu)^2
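As a quick check of these two routes to the moments, the following minimal Python sketch computes \mu and \sigma^2 both directly from the samples and from the histogram h(n); the integer feature range [1, N] and the sample values are made-up assumptions:

# Minimal check: moments from the samples equal moments of the histogram.
# The feature range [1, N] and the random samples are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
N = 10
X = rng.integers(1, N + 1, size=1_000)           # M = 1000 samples in [1, N]

mu_samples = X.mean()                            # first moment from samples
var_samples = ((X - mu_samples) ** 2).mean()     # second moment from samples

h = np.bincount(X, minlength=N + 1)[1:]          # histogram h(n), n = 1..N
n = np.arange(1, N + 1)
M = h.sum()                                      # zeroth moment: the mass
mu_hist = (h * n).sum() / M                      # first moment: center of gravity
var_hist = (h * (n - mu_hist) ** 2).sum() / M    # second central moment

assert np.isclose(mu_samples, mu_hist) and np.isclose(var_samples, var_hist)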
Probability Density Functions
In many cases, the number of possible feature values, N, or the number of features, D, makes a histogram-based approach infeasible.
In such cases we can replace h(X) with a probability density function (pdf).
We will use the "likelihood" to determine the parameters for parametric models of
probability density functions. To do this, we first need to define probability density
functions.
A probability density function, p(\vec{X}), is a function of a continuous variable or vector of random variables, \vec{X} \in R^D, such that:

1) p(\vec{X}) \ge 0 for all \vec{X}
2) \int p(\vec{x}) \, d\vec{x} = 1

Bayes' rule then relates the posterior probability of a class to its class-conditional density:

p(\omega_k \mid \vec{X}) = \frac{p(\vec{X} \mid \omega_k)}{p(\vec{X})} \, p(\omega_k)
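For concreteness, here is a hedged sketch of this rule with two classes and assumed one-dimensional Gaussian class-conditional densities; the priors, means, and standard deviations are illustrative values, not from the lecture:

# Bayes' rule for two classes with assumed Gaussian class-conditional
# densities p(X | omega_k). All parameter values are made-up illustrations.
import math

def gaussian(x, mu, sigma):
    """N(x; mu, sigma): the one-dimensional normal density."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

priors = [0.5, 0.5]                    # p(omega_k), assumed equal
params = [(0.0, 1.0), (3.0, 1.0)]      # assumed (mu_k, sigma_k) per class

x = 1.2                                # an observed feature value
likelihoods = [gaussian(x, mu, s) for mu, s in params]        # p(X | omega_k)
evidence = sum(L * P for L, P in zip(likelihoods, priors))    # p(X)
posteriors = [L * P / evidence for L, P in zip(likelihoods, priors)]
print(posteriors)                      # p(omega_k | X); sums to 1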
The probability for a random variable to fall within a given set is given by the
integral of its density.
p(X \in [A, B]) = \int_{A}^{B} p(x) \, dx
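A short numerical sketch of this integral for the standard normal density; A and B are arbitrary example bounds, and a trapezoid-rule approximation stands in for the exact integral:

# P(X in [A, B]) as the integral of a density, here the standard normal.
import numpy as np

def p(x):
    """The standard normal density N(x; 0, 1)."""
    return np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

A, B = -1.0, 1.0
x = np.linspace(A, B, 100_001)
prob = np.trapz(p(x), x)    # trapezoid rule; ~0.6827 for [-1, 1]
print(prob)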
Probability density functions are a primary tool for designing recognition machines.
There is one more tool we need: the Gaussian (normal) density function.
Just as with histograms, the expected value is the first moment of a pdf. For the continuous random variable \{X_m\} and the pdf p(x):

E\{X\} = \frac{1}{M} \sum_{m=1}^{M} X_m = \int_{-\infty}^{\infty} p(x) \cdot x \, dx

Note that

\tilde{\sigma}^2 = \frac{1}{M-1} \sum_{m=1}^{M} (X_m - \mu)^2

Assume the samples are drawn from a normal density: \{X_m\} \sim N(x; \mu, \tilde{\sigma}).
When we compute the moments, we obtain

\mu = E\{X_m\} = \frac{1}{M} \sum_{m=1}^{M} X_m

and

\hat{\sigma}^2 = \frac{1}{M} \sum_{m=1}^{M} (X_m - \mu)^2, \quad \text{where} \quad \tilde{\sigma}^2 = \frac{M}{M-1}\, \hat{\sigma}^2
The expectation underestimates the true variance by a fraction 1/M, which the factor M/(M-1) corrects.
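This bias is easy to demonstrate numerically. The following sketch compares the average of \hat{\sigma}^2 and \tilde{\sigma}^2 over many repeated draws; the sample size M, the number of trials, and the true variance are arbitrary choices:

# Over many draws of M samples, the average of sigma_hat^2 is close to
# ((M-1)/M) * sigma^2, while sigma_tilde^2 averages to sigma^2.
import numpy as np

rng = np.random.default_rng(1)
M, trials, sigma2 = 5, 100_000, 1.0

samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, M))
var_hat = samples.var(axis=1, ddof=0)    # (1/M) sum (X_m - mu)^2
var_tilde = samples.var(axis=1, ddof=1)  # (1/(M-1)) sum (X_m - mu)^2

print(var_hat.mean())    # ~0.8 = ((M-1)/M) * sigma2  (biased low)
print(var_tilde.mean())  # ~1.0 = sigma2              (bias corrected)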
Later we will see that the expected RMS error for estimating p(X) from M samples is
related to the bias. But first, we need to examine the Gaussian (or normal) density
function.
The Normal (Gaussian) Density Function
[Figure: the one-dimensional Gaussian (normal) density as a function of x, with tick marks at \mu - \sigma, \mu, and \mu + \sigma.]

As N \to \infty, p(X) \cdot N \to N(x; \mu, \sigma).
If the features are mapped onto integers from [1, N]: \{\vec{X}_m\} \to \{\vec{n}_m\}, we can build a multi-dimensional histogram using a D-dimensional table:

\forall m = 1, \ldots, M : \quad h(\vec{n}_m) \leftarrow h(\vec{n}_m) + 1

As before, the average feature vector, \vec{\mu}, is the center of gravity (first moment) of the histogram.
\mu_d = E\{n_d\} = \frac{1}{M} \sum_{m=1}^{M} n_{dm} = \frac{1}{M} \sum_{n_1=1}^{N} \sum_{n_2=1}^{N} \cdots \sum_{n_D=1}^{N} h(n_1, n_2, \ldots, n_D) \cdot n_d = \frac{1}{M} \sum_{\vec{n}=1}^{N} h(\vec{n}) \cdot n_d

\vec{\mu} = E\{\vec{n}\} = \frac{1}{M} \sum_{m=1}^{M} \vec{n}_m = \frac{1}{M} \sum_{\vec{n}=1}^{N} h(\vec{n}) \cdot \vec{n} = \begin{pmatrix} \frac{1}{M} \sum_{\vec{n}=1}^{N} h(\vec{n}) \cdot n_1 \\ \frac{1}{M} \sum_{\vec{n}=1}^{N} h(\vec{n}) \cdot n_2 \\ \vdots \\ \frac{1}{M} \sum_{\vec{n}=1}^{N} h(\vec{n}) \cdot n_D \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_D \end{pmatrix}
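As an illustration, a sketch that accumulates a D-dimensional histogram table and recovers \vec{\mu} as its center of gravity; D, N, M, and the integer-valued samples are made-up assumptions:

# Accumulate h(n_vec) over quantized samples, then recover the mean vector
# as the histogram's center of gravity. All sizes here are illustrative.
import numpy as np

rng = np.random.default_rng(2)
D, N, M = 2, 8, 5_000
n_m = rng.integers(1, N + 1, size=(M, D))   # feature vectors in [1, N]^D

h = np.zeros((N,) * D)                      # the D-dimensional table
for v in n_m:
    h[tuple(v - 1)] += 1                    # h(n_m) <- h(n_m) + 1

# mu_d = (1/M) * sum over all cells of h(n_vec) * n_d
grids = np.meshgrid(*[np.arange(1, N + 1)] * D, indexing="ij")
mu = np.array([(h * g).sum() / h.sum() for g in grids])

assert np.allclose(mu, n_m.mean(axis=0))    # matches the sample average
print(mu)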
In any case:
! " E{X1 } % " µ1 %
$ ' $ '
! ! $ E{X2 } ' $ µ2 '
µ = E{ X } = =
$ ... ' $ ... '
$ ' $ '
# E{X D }& #µ D &
and the second moment gives the covariance matrix:

\Sigma = \begin{pmatrix} \sigma_{11}^2 & \sigma_{12}^2 & \cdots & \sigma_{1D}^2 \\ \sigma_{21}^2 & \sigma_{22}^2 & \cdots & \sigma_{2D}^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{D1}^2 & \sigma_{D2}^2 & \cdots & \sigma_{DD}^2 \end{pmatrix}
This provides the parameters for

p(\vec{X}) = N(\vec{X} \mid \vec{\mu}, \Sigma) = \frac{1}{(2\pi)^{D/2} \det(\Sigma)^{1/2}} \, e^{-\frac{1}{2} (\vec{X} - \vec{\mu})^T \Sigma^{-1} (\vec{X} - \vec{\mu})}
The quadratic form in the exponent is positive and second order. It is known as the Mahalanobis distance:

d^2(\vec{X}; \vec{\mu}, \Sigma) = (\vec{X} - \vec{\mu})^T \Sigma^{-1} (\vec{X} - \vec{\mu})
This is a distance normalized by the covariance. In this case, the covariance is said to
provide the distance metric. This is very useful when the components of X have
different units.
[Figure: equi-probability contours of p(\vec{X} \mid \vec{\mu}, \Sigma) in the (x_1, x_2) plane.]
For most people, height and weight vary together, and so \sigma_{12}^2 would be positive.
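To make this concrete, here is a hedged sketch that evaluates the squared Mahalanobis distance and the density N(\vec{X} \mid \vec{\mu}, \Sigma); the height/weight mean and covariance are made-up illustration values with a positive off-diagonal term \sigma_{12}^2:

# Evaluate the squared Mahalanobis distance and the multivariate normal
# density. The mean and covariance below are illustrative only.
import numpy as np

def mahalanobis2(x, mu, sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    d = x - mu
    return float(d @ np.linalg.solve(sigma, d))

def normal_density(x, mu, sigma):
    """The D-dimensional Gaussian density N(x | mu, Sigma)."""
    D = len(mu)
    norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * mahalanobis2(x, mu, sigma)) / norm

mu = np.array([170.0, 70.0])        # e.g. mean height (cm) and weight (kg)
sigma = np.array([[100.0, 40.0],    # positive sigma_12^2: height and weight
                  [40.0, 64.0]])    # vary together

x = np.array([180.0, 80.0])
print(mahalanobis2(x, mu, sigma), normal_density(x, mu, sigma))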