
Dimension Reduction from PCA (and beyond) to BoW


Some slide credits: Désiré Sidibé, Assoc. Prof., University of Burgundy
PCA

•  PCA (principal component analysis) is one of the most widely used techniques for
   •  data analysis
   •  data visualization
   •  dimensionality reduction
•  Applications of PCA include:
   •  data compression, image processing, pattern recognition, etc.

PCA

What is PCA?
•  The most common answer would be 'an algorithm for dimensionality reduction'
•  Yes, but:
   •  Where does the algorithm come from?
   •  What is the underlying model?
•  PCA is actually many different things (models):
   •  a latent variable model (Hotelling, 1930s)
   •  variance-maximization directions (Pearson, 1901)
   •  optimal linear reconstruction (the Kosambi-Karhunen-Loève transform in signal processing)
•  It just turns out that these different models lead to the same algorithm (in the linear Gaussian case)

PCA

What is PCA?

Goal of PCA
The main goal of PCA is to express a complex data set in a new set of basis vectors that 'best' explain the data.

•  So, PCA is essentially a change of basis
•  We want to find the most meaningful basis to re-express the data, such that
   •  the new basis reveals hidden structure
   •  the new basis removes redundancy
•  Most of the time, we would like a lower-dimensional space.
PCA algorithm

Given a set of N data samples $x_i \in \mathbb{R}^d$ such that $\sum_i x_i = 0$:

1. Compute the sample covariance matrix $C = \frac{1}{N} \sum_{i=1}^{N} x_i x_i^T$.
   Note that C is a $d \times d$ matrix.
2. Compute the eigen-decomposition of C: $C = U \Lambda U^T$,
   where U is an orthogonal $d \times d$ matrix and $\Lambda$ is a diagonal matrix.
3. Since C is symmetric, its eigenvectors $u_1, u_2, \ldots, u_d$ form a basis of $\mathbb{R}^d$.

•  The eigenvectors $u_1, u_2, \ldots, u_d$ are called the principal components.
•  The corresponding eigenvalues $\lambda_1 > \lambda_2 > \cdots > \lambda_d$ give the importance of each principal axis.
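As a concrete illustration, here is a minimal NumPy sketch of these three steps (the function and variable names are illustrative, not from the slides; the data matrix is taken as $d \times N$ with the samples as columns, as above):

```python
import numpy as np

def pca(X):
    """PCA of the d x N data matrix X whose columns are the samples x_i."""
    Xc = X - X.mean(axis=1, keepdims=True)   # center the data: now sum_i x_i = 0
    N = Xc.shape[1]
    C = (Xc @ Xc.T) / N                      # sample covariance matrix, d x d
    lam, U = np.linalg.eigh(C)               # C = U Lambda U^T (eigh returns ascending order)
    order = np.argsort(lam)[::-1]            # re-order so that lambda_1 >= lambda_2 >= ...
    return lam[order], U[:, order]

# toy usage: 200 samples in R^5
X = np.random.randn(5, 200)
lam, U = pca(X)
Y = U.T @ (X - X.mean(axis=1, keepdims=True))   # new representation y_i = U^T x_i
```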
PCA algorithm

The PCA algorithm is pretty simple:
•  First, center the data (if it is not), so that $\sum_i x_i = 0$.
•  Then, compute the sample covariance matrix and its eigenvectors.
•  Finally, each sample point $x_i$ can be represented in the new basis (projection onto the eigenspace) as

   $y_i = U^T x_i$

•  We claim that the new representation makes the data uncorrelated, i.e. $\mathrm{Cov}(y_i, y_j) = 0$ if $i \neq j$.
PCA algorithm
We claim that the new representation makes the data uncorrelated.

Why?
The sample covariance of the transformed data is

$C_{new} = \frac{1}{N} \sum_{i=1}^{N} y_i y_i^T = \frac{1}{N} \sum_{i=1}^{N} (U^T x_i)(U^T x_i)^T$
$\qquad\;\; = U^T \Big( \frac{1}{N} \sum_{i=1}^{N} x_i x_i^T \Big) U = U^T C U = U^T (U \Lambda U^T) U = \Lambda$

Hence, when projected onto the principal components, the data is decorrelated.
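A quick numerical check of this claim, as a sketch (the correlated toy data and all names here are my own, not from the slides):

```python
import numpy as np

# correlated 2-D data: the second coordinate depends strongly on the first
rng = np.random.default_rng(0)
x1 = rng.normal(size=1000)
X = np.vstack([x1, 0.8 * x1 + 0.2 * rng.normal(size=1000)])   # 2 x N data matrix

Xc = X - X.mean(axis=1, keepdims=True)
C = Xc @ Xc.T / Xc.shape[1]
lam, U = np.linalg.eigh(C)
Y = U.T @ Xc                                  # projected data y_i = U^T x_i

C_new = Y @ Y.T / Y.shape[1]                  # covariance of the projected data
print(np.round(C_new, 6))                     # off-diagonal entries are ~0: decorrelated
```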

PCA algorithm
Dimensionality reduction
•  We usually want to represent our data in a lower-dimensional space $\mathbb{R}^k$, with $k \ll d$.
•  We achieve this by projecting onto the k principal axes which preserve most of the variance in the data.
•  From the previous analysis, we see that those axes correspond to the eigenvectors associated with the k largest eigenvalues:

   $U = [\, u_1 \; u_2 \; \ldots \; u_d \,]_{d \times d} \;\Rightarrow\; U_k = [\, u_1 \; u_2 \; \ldots \; u_k \,]_{d \times k}$

•  The projected data is then $y_i = U_k^T x_i$, with $y_i \in \mathbb{R}^k$.
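In practice one often picks k from the spectrum of C, e.g. so that the retained eigenvalues account for most of the variance. A small sketch (the 95% threshold and all names are illustrative choices, not from the slides):

```python
import numpy as np

def project_k(X, k):
    """Project the d x N data matrix X onto its first k principal axes."""
    Xc = X - X.mean(axis=1, keepdims=True)
    lam, U = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    order = np.argsort(lam)[::-1]
    lam, U = lam[order], U[:, order]
    Uk = U[:, :k]                        # d x k matrix of the top-k eigenvectors
    return Uk.T @ Xc, lam                # y_i = Uk^T x_i in R^k

X = np.random.randn(10, 500)
Y, lam = project_k(X, k=2)
explained = np.cumsum(lam) / lam.sum()   # fraction of variance kept by the first k axes
k95 = int(np.searchsorted(explained, 0.95)) + 1   # smallest k retaining ~95% of the variance
```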
PCA algorithm

Dual PCA
•  Let X be the $d \times N$ data matrix $X = [x_1, x_2, \ldots, x_N]$, with $x_i \in \mathbb{R}^d$.
•  The sample covariance can be computed as $C = \frac{1}{N} X X^T$.
•  If $N \ll d$, then it is better to work with $C' = \frac{1}{N} X^T X$:
   •  $C'$ is an $N \times N$ matrix.
   •  Let $C' = U' \Lambda' U'^T$ be the eigen-decomposition of $C'$.
   •  We have $\Lambda = \Lambda'$, i.e. the (non-zero) eigenvalues of C and $C'$ are equal.
   •  We have $u_i = X u'_i$ for all i.
•  Working with $C'$ is computationally less expensive if $N \ll d$.
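A sketch of the dual computation when $N \ll d$. One detail the slide leaves implicit is made explicit here as an assumption: the recovered vectors $X u'_i$ are re-normalised to unit length, and only directions with non-zero eigenvalues are kept.

```python
import numpy as np

d, N = 10000, 50                       # many dimensions, few samples (N << d)
X = np.random.randn(d, N)
X -= X.mean(axis=1, keepdims=True)

C_dual = X.T @ X / N                   # C' = (1/N) X^T X, only N x N
lam, U_dual = np.linalg.eigh(C_dual)
order = np.argsort(lam)[::-1]
lam, U_dual = lam[order], U_dual[:, order]

keep = lam > 1e-10                     # discard near-zero eigenvalues (rank <= N - 1 after centering)
U = X @ U_dual[:, keep]                # u_i = X u'_i : principal axes of the original d x d problem
U /= np.linalg.norm(U, axis=0)         # normalise each recovered eigenvector
```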
PCA algorithm

Connection with SVD

PCA & SVD
There is a direct link between PCA and SVD.

•  Let X be the $d \times N$ data matrix $X = [x_1, x_2, \ldots, x_N]$.
•  The sample covariance can be computed as $C = \frac{1}{N} X X^T$.
•  The eigenvectors of C are the principal components.
•  The SVD of X is given as $X = U \Sigma V^T$,
   where U is an orthogonal $d \times d$ matrix and V is an orthogonal $N \times N$ matrix.
•  The columns of U are eigenvectors of $X X^T$.
•  So, the columns of U are the principal components.
•  The singular values of X are ordered as the eigenvalues of C, since (with $C = \frac{1}{N} X X^T$) $\sigma_i^2 = N \lambda_i$.
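The same principal components can therefore be read off an SVD of the centered data matrix, which is how PCA is often implemented in practice. A small sketch checking the correspondence (random data, illustrative names):

```python
import numpy as np

X = np.random.randn(8, 300)
X -= X.mean(axis=1, keepdims=True)
N = X.shape[1]

# eigen-decomposition of the sample covariance
lam = np.sort(np.linalg.eigvalsh(X @ X.T / N))[::-1]

# SVD of the data matrix: X = U Sigma V^T
U_svd, s, Vt = np.linalg.svd(X, full_matrices=False)

print(np.allclose(s**2 / N, lam))      # sigma_i^2 = N * lambda_i
# the columns of U_svd span the same principal directions as the eigenvectors of C (up to sign)
```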

Other facts about PCA

•  It can be shown that the principal axes found as described above (i.e. the matrix U) form the best set of orthogonal basis vectors, in the sense that they minimize the average reconstruction error:

   $U = \underset{W}{\arg\min} \; \frac{1}{N} \sum_{i=1}^{N} \| x_i - W W^T x_i \|$

•  For each data point $x_i$, the projection $y_i = U_k^T x_i$ is the best k-dimensional approximation to $x_i$ (best in the mean-square-error sense).
•  The principal axes are axes of maximum variance.
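A short sketch illustrating the optimal-reconstruction property: reconstruct from the top-k principal axes and compare the mean squared error against a random orthonormal basis (the comparison basis is my own illustrative addition, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 1000))
X -= X.mean(axis=1, keepdims=True)

lam, U = np.linalg.eigh(X @ X.T / X.shape[1])
U = U[:, np.argsort(lam)[::-1]]

def recon_error(W, X):
    """Average squared reconstruction error when projecting onto the columns of W."""
    return np.mean(np.sum((X - W @ (W.T @ X)) ** 2, axis=0))

k = 5
Uk = U[:, :k]
Q, _ = np.linalg.qr(rng.standard_normal((20, k)))    # random orthonormal basis, for comparison
print(recon_error(Uk, X) <= recon_error(Q, X))        # True: the PCA basis gives the smallest error
```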
Algebraic Interpretation
•  Given m points in an n-dimensional space, for large n, how does one project onto a low-dimensional space while preserving broad trends in the data and allowing it to be visualized?
Algebraic Interpretation – 1D

•  Choose a line that fits the data so the points are spread out well along the line.
Algebraic Interpretation – 1D

•  Formally, minimize the sum of squares of the distances from the points to the line.

•  Why the sum of squares? Because it allows fast minimization, assuming the line passes through 0.
Algebraic Interpretation – 1D

•  Minimizing the sum of squares of the distances to the line is the same as maximizing the sum of squares of the projections onto that line, thanks to Pythagoras.
Algebraic Interpretation – 1D

•  How is the sum of squares of projection lengths expressed in algebraic terms?

•  Collect the m points as the columns of an n × m matrix B, and let x be the unit vector along the line. The sum of squared projection lengths is then $x^T B B^T x$.
Algebraic Interpretation – 1D

•  In algebraic terms, the problem is:

   $\max_x \; x^T B B^T x \quad \text{subject to} \quad x^T x = 1$
Algebraic Interpretation – 1D

•  Rewriting this: $x^T B B^T x = e = e\, x^T x = x^T (e x)$, which is equivalent to $x^T (B B^T x - e x) = 0$.

•  One can show that the maximum value of $x^T B B^T x$ is obtained for x satisfying $B B^T x = e x$.

•  So, find the largest e and associated x such that the matrix $B B^T$, when applied to x, yields a new vector which is in the same direction as x, only scaled by a factor e.
Algebraic Interpretation – 1D

•  For a generic vector x, $(B B^T) x$ points in some other direction.

•  x is an eigenvector and e an eigenvalue if $e x = (B B^T) x$.
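A small numerical illustration of this statement, as a sketch: among many random unit vectors, none beats the top eigenvector of $B B^T$ on the sum of squared projections (the data and names are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 200))            # n x m matrix whose columns are the m points

eigvals, eigvecs = np.linalg.eigh(B @ B.T)
x_best = eigvecs[:, -1]                      # eigenvector of the largest eigenvalue e

def ssq_proj(x, B):
    """Sum of squared projection lengths of the columns of B onto the unit vector x."""
    return float(x @ B @ B.T @ x)

random_dirs = rng.standard_normal((3, 1000))
random_dirs /= np.linalg.norm(random_dirs, axis=0)      # random unit vectors
print(all(ssq_proj(x_best, B) >= ssq_proj(random_dirs[:, j], B) for j in range(1000)))  # True
```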
[Figure: 2-D data cloud with the 1st principal component axis, y1, and the 2nd principal component axis, y2, overlaid.]

PCA Scores

[Figure: each sample (xi1, xi2) is represented by its scores Yi,1 and Yi,2 along the principal axes.]

PCA Eigenvalues

[Figure: the eigenvalues λ1 and λ2 measure the spread of the data along each principal axis.]
LDA

•  Recall that PCA does not consider class memberships.

•  We need to define a new criterion to optimize, one that accounts for the class memberships of our training data.
LDA Summary
1. Compute total mean.
2. Compute class means.
3. Compute within-class and between-class
scatter matrices.
4. Solve generalized eigenvector problem.
5. Project data onto lower-dimensional
subspace.
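A minimal sketch of these five steps (two Gaussian classes as toy data; scipy.linalg.eigh is used for the generalized eigenvector problem $S_B v = \lambda S_W v$; all names and the toy data are illustrative assumptions, not from the slides):

```python
import numpy as np
from scipy.linalg import eigh

def lda(X, y, k):
    """Fisher LDA: project the d-dimensional rows of X onto a k-dimensional subspace."""
    mu = X.mean(axis=0)                                  # 1. total mean
    d = X.shape[1]
    Sw = np.zeros((d, d))                                # within-class scatter
    Sb = np.zeros((d, d))                                # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)                           # 2. class means
        Sw += (Xc - mu_c).T @ (Xc - mu_c)                # 3. scatter matrices
        diff = (mu_c - mu).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    w, V = eigh(Sb, Sw)                                  # 4. generalized eigenproblem Sb v = w Sw v
    W = V[:, np.argsort(w)[::-1][:k]]                    # keep the top-k directions
    return X @ W                                         # 5. project the data

# toy usage: two classes in R^4
X = np.vstack([np.random.randn(50, 4) + 2, np.random.randn(50, 4) - 2])
y = np.array([0] * 50 + [1] * 50)
Z = lda(X, y, k=1)
```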
Object Bag of ‘words’
Features

Many problems and applications in computer vision require extracting "good" features from images/videos:
•  image matching
•  object detection/recognition
•  object tracking
•  3D reconstruction
•  image segmentation
•  classification
•  etc.

We can represent an image in several ways.

How do we find good representations?




Features

Recognition pipeline

[Figure: recognition pipeline, from H. Lee 2010]


Feature Extraction

Hand-designed features
Many highly qualified researchers have spent years designing these features:
SIFT, SURF, HOG, LBP, BRIEF, DAISY, ORB, ...
Some are class- or problem-specific.

Can we find a better representation?

Can we learn the features directly from the data?
How?
This talk

Main objectives
An introduction to an active research area in computer vision and pattern recognition
An overview of the main ideas and principles
Some examples of applications

Expected outcomes
To know a bit more about sparse coding
To think about how to use it in your own work
To get ideas for extensions and/or application domains
[Figure slides: retinal image classification and feature extraction examples]


BoW representation

Sampling strategy

Keypoint detection
Detect a set of keypoints (Harris, SIFT, etc.)
Extract local descriptors around each keypoint
BoW representation

Sampling strategy

Dense sampling
Divide image into local patches
Extract local features from each patch
BoW representation
Clustering/Quantization
For each image $I_i$ we extract a set of low-level descriptors and represent them as a feature matrix $X_i$:

$X_i = [\, f_i^1 \; f_i^2 \; \ldots \; f_i^{N_i} \,],$

where $f_i^1, \ldots, f_i^{N_i}$ are the $N_i$ descriptors extracted from $I_i$.

We then put together all descriptors from all training images to form a big training matrix X:

$X = [\, X_1 \; \ldots \; X_N \,].$

X is a matrix of size $d \times M$, with $M = \sum_{i=1}^{N} N_i$ and d the dimension of the descriptors.
BoW representation
Clustering/Quantization
To simplify the notation, we will just write the set of descriptors from the training images as

$X = [\, f_1 \; f_2 \; \ldots \; f_M \,].$

Create a dictionary by solving the following optimization problem:

$\min_{D} \sum_{m=1}^{M} \min_{k=1,\ldots,K} \| f_m - d_k \|^2,$

where $D = [d_1, \ldots, d_K]$ are the K cluster centers to be found and $\|\cdot\|$ is the L2 norm of vectors.
D is the visual dictionary or codebook.
BoW representation

Clustering/Quantization
The optimization problem

$\min_{D} \sum_{m=1}^{M} \min_{k=1,\ldots,K} \| f_m - d_k \|^2$

is solved iteratively with the K-means algorithm.

K-means
1 Initialize the K centers (randomly)
2 Assign each data point to one of the K centers
3 Update the centers
4 Iterate until convergence
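A sketch of this dictionary-learning step using scikit-learn's KMeans (the choice K = 100 and the random descriptors are placeholders, not values from the slides; scikit-learn expects descriptors as rows, so the codebook is transposed at the end to match the d × K convention used here):

```python
import numpy as np
from sklearn.cluster import KMeans

# all training descriptors stacked as rows (M x d); e.g. SIFT descriptors would give d = 128
M, d, K = 5000, 128, 100
X = np.random.rand(M, d)                       # placeholder for real local descriptors

kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X)
D = kmeans.cluster_centers_.T                  # d x K visual dictionary / codebook
```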
BoW representation

Clustering/Quantization
The K-means algorithm results in a set of K cluster centers which form the dictionary

$D = [\, d_1 \; d_2 \; \ldots \; d_K \,]_{d \times K}$
BoW representation

Features coding
Given the dictionary D and a set of low-level features $X_i$ from image $I_i$,

$X_i = [\, f_i^1 \; f_i^2 \; \ldots \; f_i^{N_i} \,],$

encode each local descriptor $f_i^l$ using the dictionary D: find $a_l$ such that

$\min_{a_l} \| f_i^l - D a_l \|^2 \quad \text{s.t.} \quad \|a_l\|_0 = 1, \; a_l \succeq 0$
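With $\|a_l\|_0 = 1$ and $a_l \succeq 0$, the code $a_l$ is simply a one-hot indicator of the nearest codeword. A sketch of this hard assignment (vectorised distance computation; the function name and toy sizes are illustrative):

```python
import numpy as np

def encode(Xi, D):
    """Hard-assign each descriptor (column of Xi, d x Ni) to its nearest codeword in D (d x K)."""
    # squared Euclidean distances between every descriptor and every codeword
    dists = (np.sum(Xi**2, axis=0)[:, None]        # ||f||^2
             - 2 * Xi.T @ D                        # -2 f.d
             + np.sum(D**2, axis=0)[None, :])      # ||d||^2
    nearest = np.argmin(dists, axis=1)             # index of the closest codeword
    A = np.zeros((D.shape[1], Xi.shape[1]))        # codes, K x Ni
    A[nearest, np.arange(Xi.shape[1])] = 1.0       # one-hot columns: ||a_l||_0 = 1, a_l >= 0
    return A

# usage: 40 descriptors of dimension 128, dictionary of 100 codewords
A = encode(np.random.rand(128, 40), np.random.rand(128, 100))
```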
BoW representation

Features pooling

The coding of image $I_i$ results in a matrix of codes A,

$A = [\, a_1 \; a_2 \; \ldots \; a_{N_i} \,]_{K \times N_i},$

where each $a_l$ satisfies $\|a_l\|_0 = 1$, $a_l \succeq 0$.

The pooling step transforms A into a single signature vector $\hat{x}_i$:

$\hat{x}_i = \text{pooling}(A)$
BoW representation
Features pooling

A popular choice for pooling is to compute a histogram:

$\hat{x}_i = \frac{1}{N_i} \sum_{l=1}^{N_i} a_l$

The final vector simply encodes the frequency of occurrence of each visual word.
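Continuing the coding sketch above, average pooling of the one-hot codes is just a row mean, which indeed yields a histogram of visual-word frequencies (illustrative sketch):

```python
import numpy as np

def pool(A):
    """Average-pool a K x Ni matrix of one-hot codes into a K-dimensional BoW histogram."""
    return A.mean(axis=1)            # hat{x}_i = (1/Ni) * sum_l a_l

# with the encode() sketch above:
# A = encode(Xi, D)
# x_hat = pool(A)                    # entries sum to 1: frequency of each visual word
```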
BoW representation
Summary: Basic BoW framework
1 Extract a set of local features from all images:

  $X = [\, f_1 \; f_2 \; \ldots \; f_M \,]_{d \times M}$

2 Create a visual dictionary by clustering the set of local features:

  $D = [\, d_1 \; d_2 \; \ldots \; d_K \,]_{d \times K}$

3 Given D, encode each local feature from an image $I_i$ by assigning it to its closest visual word:

  $A = [\, a_1 \; a_2 \; \ldots \; a_{N_i} \,]_{K \times N_i}$

4 Finally, compute the final representation of $I_i$:

  $\hat{x}_i = \frac{1}{N_i} \sum_{l=1}^{N_i} a_l$
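Putting the four steps together, here is a compact end-to-end sketch of the basic BoW pipeline. It assumes some local-descriptor extractor `extract_descriptors`, which is not specified in the slides (SIFT keypoints or dense patches are both compatible with the sampling strategies above); all other names and parameters are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def bow_signatures(images, extract_descriptors, K=100):
    """Compute a K-bin BoW histogram for each image.

    extract_descriptors(image) must return an (Ni x d) array of local descriptors;
    how they are obtained (SIFT keypoints, dense patches, ...) is left open here.
    """
    # 1. extract local features from all images
    per_image = [extract_descriptors(img) for img in images]
    X = np.vstack(per_image)                               # M x d training matrix

    # 2. visual dictionary by clustering
    kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X)

    signatures = []
    for F in per_image:
        # 3. hard-assign each descriptor to its closest visual word
        words = kmeans.predict(F)                          # Ni indices in {0, ..., K-1}
        # 4. pooling: histogram of word occurrences, normalised by Ni
        hist = np.bincount(words, minlength=K) / len(F)
        signatures.append(hist)
    return np.array(signatures)                            # one K-dimensional signature per image
```

The resulting signatures can then be fed to any standard classifier, which is how the BoW representation is typically used for image classification.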
