• Objectives:
Sufficient Statistics
Dimensionality
Complexity
Overfitting
• Resources:
DHS – Chap. 3 (Part 2)
Rice – Sufficient Statistics
Ellem – Sufficient Statistics
TAMU – Dimensionality
• URL: .../publications/courses/ece_8443/lectures/current/lecture_14.ppt
14: SUFFICIENT STATISTICS
DEFINITION
• Direct computation of p(D|θ) and p(θ|D) for large data sets
is challenging (e.g., neural networks).
• We need a parametric form for p(x|θ) (e.g., Gaussian).
• Gaussian case: the sample mean and covariance, which were
straightforward to compute, contained all the information relevant
to estimating the unknown population mean and covariance.
• This property also holds for other distributions.
• A sufficient statistic is a function s of the samples D that
contains all the information relevant to a parameter, θ.
• A statistic s is said to be sufficient for θ if p(D|s,θ) is
independent of θ. Then p(D|s,θ) = p(D|s), and Bayes' rule gives:
$$p(\theta|s,D) = \frac{p(D|s,\theta)\,p(\theta|s)}{p(D|s)} = p(\theta|s)$$
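To make the definition concrete, here is a minimal sketch assuming i.i.d. Bernoulli samples (an illustration not taken from the slides). For this model the count s = Σ x_k is sufficient for θ, so two data sets with the same count give identical posteriors, even though the likelihood below is computed sample by sample without ever forming s. The grid, the uniform prior over it, and the helper names are assumptions made for illustration.

```python
import math

def log_likelihood(data, theta):
    """log p(D|theta) for i.i.d. Bernoulli(theta) samples, summed sample by sample."""
    return sum(math.log(theta if x == 1 else 1.0 - theta) for x in data)

def posterior_on_grid(data, grid):
    """p(theta|D) on a grid of theta values, assuming a uniform prior over the grid."""
    logs = [log_likelihood(data, t) for t in grid]
    m = max(logs)
    weights = [math.exp(v - m) for v in logs]   # subtract the max for numerical stability
    total = sum(weights)
    return [w / total for w in weights]

grid = [i / 100 for i in range(1, 100)]   # theta in (0, 1), endpoints excluded
d1 = [1, 0, 0, 1, 1, 0, 1, 0]             # n = 8, s = 4
d2 = [0, 0, 0, 0, 1, 1, 1, 1]             # different samples, same s = 4
d3 = [1, 1, 1, 1, 1, 0, 0, 0]             # n = 8, s = 5

post1 = posterior_on_grid(d1, grid)
post2 = posterior_on_grid(d2, grid)
post3 = posterior_on_grid(d3, grid)
print(max(abs(a - b) for a, b in zip(post1, post2)))   # essentially zero: same s
print(max(abs(a - b) for a, b in zip(post1, post3)))   # clearly nonzero: different s
```

The comparison with d3 shows that it is the value of s, not the particular samples, that determines the posterior.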
14: SUFFICIENT STATISTICS
FACTORIZATION THEOREM
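• A statistic s is sufficient for θ if and only if the likelihood
factors as p(D|θ) = g(s,θ) h(D), where h(D) does not depend on θ.
• The kernel density is g normalized over θ:
$$\tilde{g}(s,\theta) = \frac{g(s,\theta)}{\int g(s,\theta')\,d\theta'}$$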
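A small numerical check of the kernel definition, again assuming Bernoulli samples (an illustrative choice): with s ones in n samples, g(s,θ) = θ^s (1-θ)^(n-s) and h(D) = 1. Normalizing g over θ by midpoint-rule integration should reproduce the Beta(s+1, n-s+1) density. The function names and the number of integration steps are assumptions.

```python
import math

def g(s, n, theta):
    """Factor g(s, theta) for n Bernoulli samples containing s ones (here h(D) = 1)."""
    return theta ** s * (1.0 - theta) ** (n - s)

def kernel_numeric(s, n, theta, steps=20_000):
    """g~(s, theta) = g(s, theta) / integral of g(s, theta') d theta', midpoint rule on [0, 1]."""
    h = 1.0 / steps
    integral = sum(g(s, n, (i + 0.5) * h) for i in range(steps)) * h
    return g(s, n, theta) / integral

def kernel_closed_form(s, n, theta):
    """The same kernel written as a Beta(s + 1, n - s + 1) density."""
    log_beta = math.lgamma(s + 1) + math.lgamma(n - s + 1) - math.lgamma(n + 2)
    return math.exp(s * math.log(theta) + (n - s) * math.log(1.0 - theta) - log_beta)

s, n = 4, 8
for theta in (0.2, 0.5, 0.7):
    print(theta, kernel_numeric(s, n, theta), kernel_closed_form(s, n, theta))
```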
14: SUFFICIENT STATISTICS
GAUSSIAN DISTRIBUTION
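$$\begin{aligned}
p(D|\mu) &= \prod_{k=1}^{n} \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(x_k-\mu)^t\Sigma^{-1}(x_k-\mu)\right] \\
&= \frac{1}{(2\pi)^{nd/2}|\Sigma|^{n/2}} \exp\left[-\frac{1}{2}\sum_{k=1}^{n}\left(\mu^t\Sigma^{-1}\mu - 2\mu^t\Sigma^{-1}x_k + x_k^t\Sigma^{-1}x_k\right)\right] \\
&= \exp\left[-\frac{n}{2}\mu^t\Sigma^{-1}\mu + \mu^t\Sigma^{-1}\sum_{k=1}^{n}x_k\right] \times \frac{1}{(2\pi)^{nd/2}|\Sigma|^{n/2}} \exp\left[-\frac{1}{2}\sum_{k=1}^{n}x_k^t\Sigma^{-1}x_k\right]
\end{aligned}$$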
• This isolates the dependence on μ in the first factor, which
depends on the data only through Σ_k x_k = n μ̂_n; hence, for known Σ,
the sample mean μ̂_n is a sufficient statistic for μ.
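The factorization can be checked numerically. The sketch below (NumPy, with a made-up known covariance and random data, both assumptions) compares the log-likelihood computed term by term with log g + log h, then builds a second data set with the same sample mean but three times the spread: the log-likelihood difference between two candidate means is the same for both, because only the first factor involves μ.

```python
import numpy as np

def log_lik_direct(X, mu, Sigma):
    """Sum over samples of log N(x_k; mu, Sigma), computed term by term."""
    n, d = X.shape
    Sinv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    diffs = X - mu
    quad = np.einsum("ki,ij,kj->k", diffs, Sinv, diffs)   # (x_k - mu)^t Sigma^{-1} (x_k - mu)
    return float(np.sum(-0.5 * quad - 0.5 * d * np.log(2 * np.pi) - 0.5 * logdet))

def log_lik_factored(X, mu, Sigma):
    """The same quantity split as log g(s, mu) + log h(D), with s = sum_k x_k."""
    n, d = X.shape
    Sinv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    s = X.sum(axis=0)                                      # n times the sample mean
    log_g = -0.5 * n * (mu @ Sinv @ mu) + mu @ Sinv @ s    # the only factor involving mu
    log_h = (-0.5 * n * d * np.log(2 * np.pi) - 0.5 * n * logdet
             - 0.5 * np.einsum("ki,ij,kj->", X, Sinv, X))
    return float(log_g + log_h)

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])    # assumed known covariance
X = rng.normal(size=(50, 2))
mu = np.array([0.5, -1.0])
direct = log_lik_direct(X, mu, Sigma)
factored = log_lik_factored(X, mu, Sigma)

# Same sample mean, three times the spread around it.
Y = X.mean(axis=0) + 3.0 * (X - X.mean(axis=0))
mu_a, mu_b = np.array([0.0, 0.0]), np.array([1.0, 0.5])
ratio_X = log_lik_direct(X, mu_a, Sigma) - log_lik_direct(X, mu_b, Sigma)
ratio_Y = log_lik_direct(Y, mu_a, Sigma) - log_lik_direct(Y, mu_b, Sigma)
print(direct, factored, ratio_X, ratio_Y)
```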
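• The kernel is:
$$\tilde{g}(\hat{\mu}_n,\mu) = \frac{1}{(2\pi)^{d/2}\left|\frac{1}{n}\Sigma\right|^{1/2}} \exp\left[-\frac{1}{2}(\mu-\hat{\mu}_n)^t\left(\frac{1}{n}\Sigma\right)^{-1}(\mu-\hat{\mu}_n)\right]$$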
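In one dimension the kernel is a normal density in μ with mean μ̂_n and variance σ²/n. The sketch below (made-up data and an assumed known variance) normalizes g(s,μ) over a wide grid in μ and compares the result with that closed form; the integration range and step count are assumptions chosen so the tails are negligible.

```python
import math

def log_g(mu, s, n, var):
    """log g(s, mu) for a 1-D Gaussian with known variance var, where s = sum of samples."""
    return -0.5 * n * mu * mu / var + mu * s / var

def kernel_formula(mu, mu_hat, n, var):
    """Normal density with mean mu_hat and variance var / n (the d = 1 case of the kernel)."""
    v = var / n
    return math.exp(-0.5 * (mu - mu_hat) ** 2 / v) / math.sqrt(2 * math.pi * v)

def kernel_numeric(mu, s, n, var, lo=-10.0, hi=10.0, steps=50_000):
    """g(s, mu) / integral g(s, mu') d mu', midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    pts = [lo + (i + 0.5) * h for i in range(steps)]
    ref = max(log_g(p, s, n, var) for p in pts)          # rescale to avoid overflow
    z = sum(math.exp(log_g(p, s, n, var) - ref) for p in pts) * h
    return math.exp(log_g(mu, s, n, var) - ref) / z

data = [1.2, 0.4, 2.1, 1.7, 0.9]
n, var = len(data), 1.5
s = sum(data)
mu_hat = s / n
for mu in (0.5, mu_hat, 2.0):
    print(mu, kernel_numeric(mu, s, n, var), kernel_formula(mu, mu_hat, n, var))
```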
14: SUFFICIENT STATISTICS
EXPONENTIAL FAMILY
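• This result generalizes to the exponential family:
$$p(x|\theta) = \alpha(x)\exp\left[a(\theta) + b(\theta)^t c(x)\right]$$
and:
$$p(D|\theta) = \exp\left[n\,a(\theta) + b(\theta)^t\sum_{k=1}^{n} c(x_k)\right]\prod_{k=1}^{n}\alpha(x_k) = g(s,\theta)\,h(D)$$
where s = (1/n) Σ_k c(x_k) is the sufficient statistic and h(D) = Π_k α(x_k).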
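As an illustration (the choice of distribution is an assumption, not from the slides), the Poisson distribution fits this form with α(x) = 1/x!, a(θ) = -θ, b(θ) = ln θ and c(x) = x. The sketch checks that the direct log-likelihood equals log g(s,θ) + log h(D) with s the sample average of c(x_k); the data and function names are hypothetical.

```python
import math

# Poisson(theta) in exponential-family form (illustrative choice):
#   alpha(x) = 1 / x!,  a(theta) = -theta,  b(theta) = ln(theta),  c(x) = x
def log_alpha(x): return -math.lgamma(x + 1)
def a(theta): return -theta
def b(theta): return math.log(theta)
def c(x): return x

def log_lik_direct(data, theta):
    """Sum of Poisson log-pmfs: x ln(theta) - theta - ln(x!)."""
    return sum(x * math.log(theta) - theta - math.lgamma(x + 1) for x in data)

def log_lik_exp_family(data, theta):
    """log g(s, theta) + log h(D), with s = (1/n) sum_k c(x_k)."""
    n = len(data)
    s = sum(c(x) for x in data) / n                 # sufficient statistic
    log_g = n * (a(theta) + b(theta) * s)
    log_h = sum(log_alpha(x) for x in data)         # does not depend on theta
    return log_g + log_h

data = [3, 0, 2, 5, 1, 4]
for theta in (0.5, 2.5, 4.0):
    print(theta, log_lik_direct(data, theta), log_lik_exp_family(data, theta))
```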
• Examples:
14: PROBLEMS OF DIMENSIONALITY
DIRECTIONS OF DISCRIMINATION
• If features are statistically independent, in theory we can
get excellent performance.
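• Recall the Bayes error rate for a two-class multivariate
normal problem (equal covariances Σ, equal priors):
$$P(e) = \frac{1}{\sqrt{2\pi}}\int_{r/2}^{\infty} e^{-u^2/2}\,du$$
where r² is the squared Mahalanobis distance:
$$r^2 = (\mu_1-\mu_2)^t\,\Sigma^{-1}(\mu_1-\mu_2)$$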
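The integral is the standard normal upper tail evaluated at r/2, so it can be written with the complementary error function as P(e) = ½ erfc(r / (2√2)). A minimal sketch, with made-up means and covariance; the function names are assumptions:

```python
import math
import numpy as np

def mahalanobis_sq(mu1, mu2, Sigma):
    """r^2 = (mu1 - mu2)^t Sigma^{-1} (mu1 - mu2), via a linear solve instead of an explicit inverse."""
    diff = np.asarray(mu1, dtype=float) - np.asarray(mu2, dtype=float)
    return float(diff @ np.linalg.solve(Sigma, diff))

def bayes_error(r):
    """P(e) = (1/sqrt(2 pi)) * integral_{r/2}^inf exp(-u^2/2) du = 0.5 * erfc(r / (2 sqrt 2))."""
    return 0.5 * math.erfc(r / (2.0 * math.sqrt(2.0)))

Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
r2 = mahalanobis_sq([0.0, 0.0], [1.0, 1.0], Sigma)
print(r2, bayes_error(math.sqrt(r2)))
```

At r = 0 the error is 0.5 (the classes coincide), and it falls toward zero as the classes separate.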
• For conditionally independent features, Σ = diag(σ_1², ..., σ_d²) and:
$$r^2 = \sum_{i=1}^{d}\left(\frac{\mu_{i1}-\mu_{i2}}{\sigma_i}\right)^2$$
The most useful features are those for which the difference between
the means is large relative to the standard deviation.
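A short sketch of this sum with three hypothetical features: each feature adds a nonnegative term to r², so under these assumptions the Bayes error never increases as features are added. The second feature, whose mean difference is small relative to its standard deviation, barely helps, while the third contributes most.

```python
import math

def r_squared_independent(mu1, mu2, sigma):
    """r^2 = sum_i ((mu_i1 - mu_i2) / sigma_i)^2 for a diagonal covariance."""
    return sum(((m1 - m2) / s) ** 2 for m1, m2, s in zip(mu1, mu2, sigma))

def bayes_error(r):
    """Normal upper tail at r/2 (equal covariances, equal priors)."""
    return 0.5 * math.erfc(r / (2.0 * math.sqrt(2.0)))

mu1 = [0.0, 0.0, 0.0]        # hypothetical class-1 feature means
mu2 = [1.0, 0.5, 2.0]        # hypothetical class-2 feature means
sigma = [1.0, 2.0, 1.0]      # per-feature standard deviations

r2_values, errors = [], []
for k in range(1, 4):        # use only the first k features
    r2 = r_squared_independent(mu1[:k], mu2[:k], sigma[:k])
    r2_values.append(r2)
    errors.append(bayes_error(math.sqrt(r2)))
    print(k, r2, errors[-1])
```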
14: PROBLEMS OF DIMENSIONALITY
COMPUTATIONAL COMPLEXITY
• “Big Oh” notation is used to describe complexity:
if f(x) = 2 + 3x + 4x², f(x) has computational complexity O(x²).
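• Recall the discriminant function, annotated with the complexity of
estimating each term from n training samples:
$$g(x) = -\frac{1}{2}\big(x-\underbrace{\hat{\mu}}_{O(dn)}\big)^t\,\underbrace{\hat{\Sigma}^{-1}}_{O(nd^2)}\,(x-\hat{\mu})\;\underbrace{-\,\frac{d}{2}\ln 2\pi}_{O(1)}\;\underbrace{-\,\frac{1}{2}\ln|\hat{\Sigma}|}_{O(d^2 n)}\;+\;\underbrace{\ln \hat{P}(\omega)}_{O(n)}$$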
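A minimal NumPy sketch of training and evaluating this discriminant, with the cost of each step noted in comments. It assumes the maximum-likelihood (1/n) covariance estimate and synthetic two-class data; the names fit_class, discriminant and classify are hypothetical. Note that the determinant and inverse add an O(d³) step on top of forming Σ̂, which is dominated by the O(nd²) covariance estimate when n > d.

```python
import numpy as np

def fit_class(X, n_total):
    """Estimate (mu_hat, Sigma_hat^{-1}, ln|Sigma_hat|, P_hat) for one class; X is n x d."""
    n, d = X.shape
    mu = X.mean(axis=0)                      # sample mean: O(dn)
    diffs = X - mu
    Sigma = diffs.T @ diffs / n              # ML covariance estimate: O(nd^2)
    _, logdet = np.linalg.slogdet(Sigma)     # log-determinant: O(d^3) once Sigma is formed
    Sigma_inv = np.linalg.inv(Sigma)         # inverse: O(d^3)
    prior = n / n_total                      # class frequency (counting labels is O(n))
    return mu, Sigma_inv, logdet, prior

def discriminant(x, mu, Sigma_inv, logdet, prior):
    """g(x) = -1/2 (x-mu)^t Sigma^{-1} (x-mu) - d/2 ln 2pi - 1/2 ln|Sigma| + ln P."""
    d = mu.shape[0]
    diff = x - mu
    return (-0.5 * (diff @ Sigma_inv @ diff)   # O(d^2) per test point
            - 0.5 * d * np.log(2 * np.pi)      # constant: O(1)
            - 0.5 * logdet
            + np.log(prior))

# Synthetic two-class data (illustrative only)
rng = np.random.default_rng(1)
XA = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
XB = rng.normal(loc=[4.0, 4.0], scale=1.0, size=(200, 2))
n_total = len(XA) + len(XB)
models = [fit_class(XA, n_total), fit_class(XB, n_total)]

def classify(x):
    """Return the index of the class with the largest discriminant value."""
    return int(np.argmax([discriminant(x, *m) for m in models]))

print(classify(np.array([0.2, -0.1])), classify(np.array([3.8, 4.3])))
```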