PCA
What is PCA ?
- The most common answer would be 'an algorithm for dimensionality reduction'
- Yes, but:
  - Where does the algorithm come from?
  - What is the underlying model?
- PCA is actually many different things (models):
  - a latent variable model (Hotelling, 1930s)
  - variance maximization directions (Pearson, 1901)
  - optimal linear reconstruction (Kosambi-Karhunen-Loève transform in signal processing)
- It just turns out that these different models lead to the same algorithm (in the linear Gaussian case)
PCA
What is PCA ?
Goal of PCA
The main goal of PCA is to express a complex data set in a new set of basis vectors that 'best' explain the data.
PCA algorithm
Given a set of N data samples x_i ∈ R^d such that Σ_i x_i = 0

1. Compute the sample covariance matrix C = (1/N) Σ_{i=1}^N x_i x_i^T.
   Note that C is a d × d matrix.
2. Compute the eigen-decomposition of C: C = U Λ U^T, where
   U is an orthogonal d × d matrix and Λ is a diagonal matrix.
3. Since C is symmetric, its eigenvectors u_1, u_2, ..., u_d form a basis of R^d.

- The eigenvectors u_1, u_2, ..., u_d are called the principal components.
- The corresponding eigenvalues λ_1 > λ_2 > ... > λ_d give the importance of each principal axis.
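As a concrete illustration, here is a minimal NumPy sketch of these three steps; the toy data, array shapes and variable names are assumptions for the example, not part of the slides.

import numpy as np

# Toy data: N samples in d = 3 dimensions, with very different spreads per axis
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ np.diag([3.0, 1.0, 0.2])
X -= X.mean(axis=0)                      # step 0: center the data so that sum_i x_i = 0

# Step 1: sample covariance matrix C = (1/N) * sum_i x_i x_i^T  (d x d)
N, d = X.shape
C = (X.T @ X) / N

# Step 2: eigen-decomposition C = U Lambda U^T (eigh because C is symmetric)
eigvals, U = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]        # sort eigenvalues in decreasing order
eigvals, U = eigvals[order], U[:, order]

# Step 3: the columns of U are the principal components;
#         eigvals[k] is the variance along the k-th principal axis
print(eigvals)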
PCA algorithm

The PCA algorithm is pretty simple:

- First, center the data (if it is not): Σ_i x_i = 0
- Then, compute the sample covariance matrix and its eigenvectors
- Finally, each sample point x_i can be represented in the new basis (projection onto the eigenspace) as y_i = U^T x_i
- We claim that the new representation makes the data uncorrelated, i.e. Cov(y_i, y_j) = 0 if i ≠ j.
PCA algorithm
We claim that the new representation makes the data uncorrelated.

Why?

The sample covariance of the transformed data is

  C_new = (1/N) Σ_{i=1}^N y_i y_i^T
        = (1/N) Σ_{i=1}^N (U^T x_i)(U^T x_i)^T
        = U^T ( (1/N) Σ_{i=1}^N x_i x_i^T ) U
        = U^T C U = U^T (U Λ U^T) U
        = Λ

Hence, when projected onto the principal components, the data is decorrelated.
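This claim is easy to check numerically. Continuing the hypothetical NumPy sketch above (X, U, N and eigvals as defined there), the covariance of the projected data y_i = U^T x_i should be the diagonal matrix Λ.

# Project every (centered) sample onto the principal components: y_i = U^T x_i
Y = X @ U                                     # rows of Y are the y_i

# Sample covariance of the projected data
C_new = (Y.T @ Y) / N

# C_new should equal Lambda: diagonal, with the eigenvalues on the diagonal
print(np.round(C_new, 6))
print(np.allclose(C_new, np.diag(eigvals)))   # expected: True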
PCA algorithm
Dimensionality reduction
- We usually want to represent our data in a lower dimensional space R^k, with k ≪ d.
- We achieve this by projecting onto the k principal axes which preserve most of the variance in the data
- From the previous analysis, we see that those axes correspond to the eigenvectors associated with the k largest eigenvalues

  U = [u_1 u_2 ... u_d]  (d × d)   ⇒   U_k = [u_1 u_2 ... u_k]  (d × k)
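In code, dimensionality reduction simply keeps the first k columns of U. A small sketch, continuing the hypothetical variables (X, U) from the example above:

k = 2

# Keep only the k eigenvectors with the largest eigenvalues (d x k matrix U_k)
U_k = U[:, :k]

# Low-dimensional representation of the data: each sample becomes a k-vector
Y_k = X @ U_k                      # shape (N, k)

# Optional: map back to R^d to see what information was kept
X_reconstructed = Y_k @ U_k.T      # best rank-k linear reconstruction of X
print(np.mean((X - X_reconstructed) ** 2))   # reconstruction error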
Dual PCA
- Let X be the d × N data matrix X = [x_1, x_2, ..., x_N], x_i ∈ R^d
- The sample covariance can be computed as C = (1/N) X X^T
- If N ≪ d, then it is better to work with C' = (1/N) X^T X
- C' is an N × N matrix
- Let C' = U' Λ' U'^T be the eigen-decomposition of C'
- We have Λ = Λ', i.e. the non-zero eigenvalues of C and C' are equal
- We have u_i = X u'_i (up to normalization), for all i
- Working with C' is computationally less expensive if N ≪ d.
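A minimal sketch of the dual formulation under the same assumption (few samples, many dimensions); the data, dimensions and variable names are made up for the example. The eigenvectors of C are recovered from those of C' via u_i = X u'_i and then normalized to unit length.

import numpy as np

rng = np.random.default_rng(1)
d, N = 10_000, 50                         # many dimensions, few samples (N << d)
X = rng.normal(size=(d, N))
X -= X.mean(axis=1, keepdims=True)        # center (columns are the samples x_i)

# Dual covariance C' = (1/N) X^T X is only N x N, much cheaper than d x d
C_dual = (X.T @ X) / N
lam, U_dual = np.linalg.eigh(C_dual)
order = np.argsort(lam)[::-1]
lam, U_dual = lam[order], U_dual[:, order]

# Recover the principal components of the full covariance: u_i proportional to X u'_i
U = X @ U_dual
U /= np.linalg.norm(U, axis=0)            # normalize each column to unit length

# The non-zero eigenvalues match those of C = (1/N) X X^T
print(lam[:5])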
Algebraic Interpretation
• Given m points in an n-dimensional space, for large n, how does one project onto a low-dimensional space while preserving broad trends in the data and allowing it to be visualized?
Algebraic Interpretation – 1D
• Given m points in an n-dimensional space, for large n, how does one project onto a low-dimensional space while preserving broad trends in the data and allowing it to be visualized?
• Choose a line that fits the data so the points are spread out well
along the line
Algebraic Interpretation – 1D
[Diagram: the m data points (Point 1, ..., Point m) projected onto a line; collecting the points as the columns of a matrix B, the spread of the projections along a unit direction x is x^T B B^T x.]
Algebraic Interpretation – 1D
• Rewriting this (with x a unit vector, so x^T x = 1):

  x^T B B^T x = e = e x^T x = x^T (e x)
  ⇔  x^T (B B^T x − e x) = 0

• So, find the largest e and associated x such that the matrix B B^T, when applied to x, yields a new vector which is in the same direction as x, only scaled by a factor e.
Algebraic Interpretation – 1D
(BBT)x
ex=(BBT)x
x
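This eigenvalue relation suggests a simple way to compute the best 1D direction: repeatedly apply B B^T to a vector and renormalize (power iteration). A minimal sketch, assuming B is a matrix whose columns are the centered data points (this particular algorithm is an illustration, not taken from the slides):

import numpy as np

def largest_direction(B, n_iter=200):
    """Power iteration: return a unit vector x maximizing x^T (B B^T) x."""
    rng = np.random.default_rng(0)
    x = rng.normal(size=B.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(n_iter):
        x = B @ (B.T @ x)            # apply B B^T without forming it explicitly
        x /= np.linalg.norm(x)       # renormalize: only the direction matters
    e = x @ (B @ (B.T @ x))          # Rayleigh quotient = largest eigenvalue e
    return x, e

# Example: 2D points mostly spread along the direction (1, 1) / sqrt(2)
rng = np.random.default_rng(2)
B = rng.normal(size=(2, 300)) * np.array([[3.0], [0.3]])   # anisotropic cloud
R = np.array([[1.0, -1.0], [1.0, 1.0]]) / np.sqrt(2)       # 45-degree rotation
B = R @ B
B = B - B.mean(axis=1, keepdims=True)                      # center the points
x, e = largest_direction(B)
print(x, e)    # x should be close to +/- (0.707, 0.707)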
[Figure: scatter plot of the data with the 1st principal component y1 and the 2nd principal component y2 drawn as axes.]

PCA Scores

[Figure: the same data in the (x_i1, x_i2) plane with its scores Y_i,1 and Y_i,2 along the principal axes.]

PCA Eigenvalues

[Figure: the eigenvalues λ1 and λ2 shown as the spread of the data along each principal axis.]
LDA
Recognition pipeline
Hand-designed features
Many highly qualified researchers have spent years designing those features
SIFT, SURF, HOG, LBP, BRIEF, DAISY, ORB, ...
some are class/problem specific
Main objectives
An introduction to an active research area in computer vision and
pattern recognition
An overview of the main ideas and principles
Some examples of applications
Expected outcomes
To know a bit more about sparse coding
To think about how to use it in your own works
To get ideas for extensions and/or application domains
Retinal image classification
Feature extraction
Sampling strategy
Keypoints detection
Detect a set of keypoints (Harris, SIFT, etc)
Extract local descriptors around each keypoint
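As an illustration of this keypoint-based strategy, a short OpenCV sketch; the image path is a placeholder and a recent OpenCV build with SIFT support is assumed.

import cv2

# Load one training image in grayscale (path is a placeholder)
img = cv2.imread("example_image.png", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and extract a 128-D SIFT descriptor around each of them
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

# descriptors is an (N_i x 128) array: one row per detected keypoint
print(len(keypoints), descriptors.shape)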
BoW representation
Sampling strategy
Dense sampling
Divide image into local patches
Extract local features from each patch
BoW representation
Clustering/Quantization
For each image I_i we extract a set of low-level descriptors and represent them as a feature matrix X_i:

  X_i = [f_i^1  f_i^2  ...  f_i^{N_i}],

where f_i^1, ..., f_i^{N_i} are the N_i descriptors extracted from I_i.

We then put together all descriptors from all training images to form a big training matrix X:

  X = [X_1 ... X_N].

X is a matrix of size d × M, with M = Σ_{i=1}^N N_i and d the dimension of the descriptors.
BoW representation
Clustering/Quantization
To simplify the notation, we will just write the set of descriptors from the training images as

  X = [f_1  f_2  ...  f_M].

The dictionary D = [d_1 ... d_K] is learned by solving

  min_D  Σ_{m=1}^M  min_{k=1...K} ||f_m − d_k||²
Clustering/Quantization

The optimization problem

  min_D  Σ_{m=1}^M  min_{k=1...K} ||f_m − d_k||²
K-means
1 Initialize the K centers (randomly)
2 Assign each data point to its closest center
3 Update each center as the mean of the points assigned to it
4 Iterate until convergence
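A minimal NumPy sketch of these four steps; the descriptor matrix F (one descriptor per row), the dictionary size K and the fixed iteration budget are assumptions for the example.

import numpy as np

def kmeans(F, K, n_iter=50, seed=0):
    """Plain K-means on the rows of F; returns the K cluster centers."""
    rng = np.random.default_rng(seed)
    # 1. Initialize the K centers by picking K descriptors at random
    centers = F[rng.choice(len(F), size=K, replace=False)]
    for _ in range(n_iter):
        # 2. Assign each descriptor to its closest center
        dists = np.linalg.norm(F[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update each center as the mean of the descriptors assigned to it
        for k in range(K):
            if np.any(labels == k):
                centers[k] = F[labels == k].mean(axis=0)
        # 4. Iterate (here: a fixed number of steps instead of a convergence test)
    return centers

# Example: 1000 random 128-D descriptors quantized into a dictionary of 64 atoms
F = np.random.default_rng(1).normal(size=(1000, 128))
D = kmeans(F, K=64)      # one atom per row here (the slides store atoms as columns)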
BoW representation
Clustering/Quantization
The K-means algorithm results in a set of K cluster centers which form the dictionary

  D = [d_1  d_2  ...  d_K]   (d × K)
BoW representation
Features coding
Given the dictionary D
Given a set of low-level features Xi from image Ii
  X_i = [f_i^1  f_i^2  ...  f_i^{N_i}]
Features coding
Encode each local descriptor f_i^l using the dictionary D
BoW representation
Features pooling
x̂_i = pooling(A)
BoW representation
Features pooling
4. Finally, compute the final representation of I_i:

  x̂_i = (1/N_i) Σ_{l=1}^{N_i} a_l
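Putting coding and pooling together, a small sketch using hard assignment (each descriptor is coded as a one-hot vector a_l pointing to its nearest dictionary atom) followed by average pooling; the choice of hard assignment, the random data and the variable names are assumptions for the example.

import numpy as np

def bow_representation(Xi, D):
    """Encode the descriptors of one image and pool them into a single vector.

    Xi: (d x N_i) matrix of local descriptors (columns f_i^l)
    D:  (d x K) dictionary of cluster centers (columns d_k)
    Returns the pooled K-dimensional representation of the image.
    """
    # Coding: hard-assign each descriptor to its nearest dictionary atom
    dists = np.linalg.norm(Xi[:, :, None] - D[:, None, :], axis=0)   # (N_i x K)
    nearest = dists.argmin(axis=1)
    A = np.zeros((Xi.shape[1], D.shape[1]))
    A[np.arange(Xi.shape[1]), nearest] = 1.0      # rows of A are the codes a_l

    # Pooling: average the codes, (1/N_i) * sum_l a_l  (a normalized histogram)
    return A.mean(axis=0)

# Example: 200 descriptors of dimension 128, dictionary of 64 atoms
rng = np.random.default_rng(0)
Xi = rng.normal(size=(128, 200))
D = rng.normal(size=(128, 64))
x_hat = bow_representation(Xi, D)
print(x_hat.shape, x_hat.sum())     # (64,) and the histogram sums to 1.0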
Summary