
Karhunen-Loève Transform (KLT)

JanKees van der Poel


D.Sc. Student, Mechanical
Engineering
Karhunen-Loève Transform

Has many names cited in the literature:
Karhunen-Loève Transform (KLT);
Karhunen-Loève Decomposition (or Expansion);
Principal (sometimes spelled Principle) Component Analysis (PCA);
Principal (sometimes spelled Principle) Factor Analysis (PFA);
Singular Value Decomposition (SVD);
Proper Orthogonal Decomposition (POD);
Karhunen-Loève Transform

Has many names cited in the literature:
Galerkin Method (this variation is used to find solutions to certain types of Partial Differential Equations, PDEs, especially in the field of Mechanical Engineering and electromechanically coupled systems);
Hotelling Transform; and
Collective Coordinates.
Karhunen-Loève Transform
The Karhunen-Loève Transform (KLT) takes a given collection of data (an input collection) and creates an orthogonal basis (the KLT basis) for the data.
An orthogonal basis for a space V is a set of mutually orthogonal vectors {b_i} (which are therefore linearly independent) that span the space V.
This presentation provides an overview of the KLT for some specific types of input collections.
Karhunen-Loève Transform

Pearson (1901), Hotelling (1933), Kosambi (1943), Loève (1945), Karhunen (1946), Pougachev (1953) and Obukhov (1954) have all been independently credited with the discovery of the KLT under one of its many names.
KLT has applications in almost any scientific field.
Karhunen-Loève Transform
KLT has been widely used in:
Studies of turbulence;
Thermal/chemical reactions;
Feed-forward and feedback control design
applications (KLT is used to obtain a reduced order
model for simulations or control design);
Data analysis or compression (characterization of
human faces, map generation by robots and freight
traffic prediction);
Karhunen-Loève Transform
One of the most important matrix factorizations in mathematics is the Singular Value Decomposition (SVD).
The Singular Value Decomposition has many properties that are desirable in a wide range of applications.
Principal Component Analysis (PCA) is an application of the SVD.
It identifies patterns in data, expressing the data in a way that highlights its similarities and differences.
Karhunen-Loève Transform
To make things easy, the name Principal Component Analysis (PCA) will be used from now on, instead of KLT or SVD.
In our field of signal/image processing, this is the name by which the Karhunen-Loève Transform is commonly known.
What is Principal Component Analysis?
Patterns can be hard to find in high dimensional data, where the luxury of graphical representation is not available.
Principal Component Analysis
So, use PCA to analyze the data.
Once the data patterns are found, reduce the number of data dimensions (without much loss of information) by compressing the data; this makes it easier to visualize the hidden data patterns.
PCA basically analyzes the data in order to reduce its dimensionality, eliminate superpositions, and describe it better using linear combinations obtained from the original variables.
Data Presentation
Example:
53 blood and urine measurements from 65 people
(33 alcoholics, 32 non-alcoholics).
       H-WBC    H-RBC    H-Hgb     H-Hct    H-MCV    H-MCH    H-MCHC
A1 8.0000 4.8200 14.1000 41.0000 85.0000 29.0000 34.0000
A2 7.3000 5.0200 14.7000 43.0000 86.0000 29.0000 34.0000
A3 4.3000 4.4800 14.1000 41.0000 91.0000 32.0000 35.0000
A4 7.5000 4.4700 14.9000 45.0000 101.0000 33.0000 33.0000
A5 7.3000 5.5200 15.4000 46.0000 84.0000 28.0000 33.0000
A6 6.9000 4.8600 16.0000 47.0000 97.0000 33.0000 34.0000
A7 7.8000 4.6800 14.7000 43.0000 92.0000 31.0000 34.0000
A8 8.6000 4.8200 15.8000 42.0000 88.0000 33.0000 37.0000
A9 5.1000 4.7100 14.0000 43.0000 92.0000 30.0000 32.0000
[Figure: the measurements displayed in matrix format and in spectral format (measurement index vs. value)]
Data Presentation

[Figure: univariate (H-Bands vs. person), bivariate (C-LDH vs. C-Triglycerides) and trivariate (M-EPI vs. C-Triglycerides and C-LDH) views of the data]
Data Presentation
Is there a better presentation than the common
Cartesian axes?
That is, do we really need a space with 53
dimensions to view the data?

This raises the question of how to find the best low-dimensional space that conveys maximum useful information.
The answer is: find the Principal Components!
Principal Components
All of the Principal Components (PCs) start at the origin of the coordinate axes.
The first PC is the direction of maximum variance from the origin.
Each subsequent PC is orthogonal to the previous PCs and describes the maximum residual variance.
Algebraic Interpretation nD Case

Let's say that m points in a space with n dimensions (n large) are given.

Now, how does one project these m points onto a low-dimensional space while preserving broad trends in the data and still allowing it to be visualized?
Algebraic Interpretation 1D Case
Given m points in an n-dimensional space (n large), how does one project these m points onto a one-dimensional space?
Simply choose a line that fits the data so the points are spread out well along the line.
Algebraic Interpretation 1D Case
Formally, minimize the sum of squares of the distances from the points to the line.
Why the sum of squares? Because it allows fast minimization, assuming the line passes through the origin!
Algebraic Interpretation 1D Case
Minimizing the sum of squares of the distances to the line is the same as maximizing the sum of squares of the projections onto that line.
Many thanks to Pythagoras!
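In symbols (assuming, as above, that the line passes through the origin): for each point x_i, let p_i be the length of its projection onto the line and d_i its distance to the line. Pythagoras gives

    ||x_i||^2 = p_i^2 + d_i^2

and, since the sum of the ||x_i||^2 is fixed by the data, minimizing the sum of the d_i^2 is exactly the same as maximizing the sum of the p_i^2.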
Basic Mathematical Concepts
Before getting to a description of PCA, this
tutorial first introduces mathematical concepts
that will be used in PCA:
Standard deviation, covariance, and eigenvectors and
eigenvalues

This background knowledge is meant to make
the PCA section very easy, but can be skipped if
the concepts are already familiar.
Standard Deviation

The Standard Deviation (SD) of a data set is a measure of how spread out the data is.
It is, roughly, the average distance from the mean of the data set to a point.
The datasets [0, 8, 12, 20] and [8, 9, 11, 12] have
the same mean (that is 10) but are quite
different.
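A quick Matlab check of this example (a minimal sketch; std uses the sample normalization by N-1):

    a = [0 8 12 20];
    b = [8 9 11 12];
    mean(a), mean(b)    % both means are 10
    std(a),  std(b)     % roughly 8.33 vs. 1.83: a is far more spread out than b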
Standard Deviation

By means of the standard deviation it is possible to differentiate these two sets.
As expected, the first set ([0, 8, 12, 20]) has a
much larger standard deviation than the second
set ([8, 9, 11, 12]) due to the fact that the data is
much more spread out from the mean.
Variance

Variance is another measure of the spread of
data in a data set.
In fact it is almost identical to the standard deviation.
The only difference is that the variance is simply
the standard deviation squared.
Variance, in addition to Standard Deviation, was
introduced to provide a solid platform from which
the next section, covariance, can be launched.
Covariance

Both standard deviation and variance are purely
one dimensional measures.
However many data sets have more than one
dimension.
The aim of the statistical analysis of these kinds of data sets is usually to see if there is any relationship between their dimensions.
Covariance
Standard deviation and variance only operate on
one dimensional data, so it is only possible to
calculate the standard deviation for each
dimension of the data set independently of the
other dimensions.

However, it is useful to have a similar measure to
find out how much the dimensions vary from the
mean with respect to each other.
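For reference, the sample covariance between two dimensions X and Y (the formula is not spelled out on the slide) is

    cov(X, Y) = ( 1 / (n - 1) ) * sum_{i=1..n} ( X_i - mean(X) ) * ( Y_i - mean(Y) )

which reduces to the variance of X when Y = X.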
Covariance
Covariance is always calculated between two
dimensions.
With 3D data (X, Y, Z), covariance is calculated between (X, Y), (X, Z) and (Y, Z).
With an n-dimensional data set, n!/(2·(n-2)!) = n(n-1)/2 different covariance values can be calculated.
The covariance calculated between a dimension and
itself gives the variance.
The covariance between (X, X), (Y, Y) and (Z, Z) gives the
variance of the X, Y and Z dimensions.
Covariance Matrix
As an example, let's build the covariance matrix for an imaginary three-dimensional data set, with the usual dimensions x, y and z.
In this case, the covariance matrix has three rows and three columns with these values:

        | cov(x,x)  cov(x,y)  cov(x,z) |
    C = | cov(y,x)  cov(y,y)  cov(y,z) |
        | cov(z,x)  cov(z,y)  cov(z,z) |
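As a minimal Matlab sketch (the data here is made up purely for illustration), this matrix is exactly what the built-in cov function returns for a three-column data set:

    X = randn(100, 3);    % made-up data: 100 samples of the dimensions x, y and z
    C = cov(X);           % 3-by-3 covariance matrix; diag(C) holds the variances
    % C is symmetric, since cov(a,b) = cov(b,a)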
Covariance Matrix

Down the main diagonal, one can see that the covariance value is computed between one of the dimensions and itself; these entries are the variances of each dimension.

Since cov(a,b) = cov(b,a), the covariance matrix is symmetrical about the main diagonal.
Eigenvectors and Eigenvalues

A vector v is an eigenvector of a square matrix (m by m) M if M*v (multiplication of the matrix M by the vector v) gives a multiple of v, i.e., λ*v (multiplication of the scalar λ by the vector v).
In this case, λ is called the eigenvalue of M that is associated with the eigenvector v.
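A small Matlab illustration of this definition, using a made-up 2-by-2 matrix:

    M = [2 1; 1 2];
    [V, D] = eig(M);          % columns of V are eigenvectors, diag(D) the eigenvalues
    v = V(:, 1);
    lambda = D(1, 1);
    M * v - lambda * v        % essentially zero: M*v is just lambda times v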
Eigenvector Properties

Eigenvectors can only be found for square
matrices.
Not every square matrix has eigenvectors.
An m by m matrix has m eigenvectors, given that
they exist.
For example, given a 3 by 3 matrix that has
eigenvectors, there are three of them.
Eigenvector Properties


Even if the eigenvector is scaled by some amount before being multiplied, one still gets the same multiple of it as a result.
This is because scaling a vector only makes it longer (or shorter), without changing its direction.
Eigenvector Properties
All the eigenvectors of a symmetric matrix (such as a covariance matrix) are perpendicular (orthogonal), i.e., at right angles to each other, no matter how many dimensions the matrix has.

This is important because it means that the data can be expressed in terms of these perpendicular eigenvectors, instead of in terms of the original axes.
The PCA Method

Step 1: Get some data to use in a simple
example.
I am going to use my own two dimensional data set.
I have chosen a two dimensional data set because I can
provide plots of the data to show what the PCA analysis is
doing at each step.
The data I have used is found in the next slide.
The PCA Method
The data used in this example is shown here.

    Data =   heights   weights
               183        79
               173        69
               120        45
               168        70
               188        81
               158        61
               201        98
               163        63
               193        79
               167        71
               178        73
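As a minimal sketch, the data set above can be typed into Matlab as (the variable name Data is mine):

    % Heights and weights from the slide, one row per person
    Data = [183 79; 173 69; 120 45; 168 70; 188 81; 158 61; ...
            201 98; 163 63; 193 79; 167 71; 178 73];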
The PCA Method
Step 2: Subtract the mean.
For PCA to work properly, you have to subtract the
mean from each of the data dimensions.
The mean subtracted is the average across each
dimension.
All the x values have the x mean subtracted from them, and all the y values have the y mean subtracted from them.
This produces a data set whose mean is zero.
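Step 2 in Matlab, reusing the Data matrix defined earlier (a sketch):

    DataMean = mean(Data);                                % mean of each column (dimension)
    DataAdj  = Data - repmat(DataMean, size(Data, 1), 1); % subtract it from every row
    % In recent Matlab versions, Data - mean(Data) works directly via implicit expansion.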
The PCA Method
The data with its
mean subtracted
(adjusted data) is
shown here.
Both the data and the
adjusted data are
plotted in the next
slide.
    Adjusted Data =   heights   weights
                        11       7.27
                         1      -2.72
                       -52     -26.72
                        -4      -1.72
                        16       9.27
                       -14     -10.72
                        29      26.27
                        -9      -8.72
                        21       7.27
                        -5      -0.72
                         6       1.27
The PCA Method

[Figure: plots of the original data and of the mean-adjusted data]
The PCA Method
Step 3: Calculate the covariance matrix.
Since the data is two dimensional, the covariance
matrix will have two rows and two columns:

        | 471.80   277.70 |
    C = | 277.70   180.02 |

As the non-diagonal elements in this covariance matrix are positive, we should expect that both the x and y variables increase together.
One should notice that heights and weights do normally increase together.
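With the mean-adjusted data from Step 2, the covariance matrix above can be reproduced (up to rounding) in one line; note that cov normalizes by N-1:

    C = cov(DataAdj);    % same as (DataAdj' * DataAdj) / (size(DataAdj, 1) - 1)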
The PCA Method
Step 4: Calculate the eigenvectors and
eigenvalues of the data matrix.
In Matlab, this step is performed using the eig command (square matrices only) or the svd command (matrices of any shape).
Since the data matrix itself is not square, the svd command is used on it here; eig could equally be applied to the square covariance matrix.
The eigenvectors and eigenvalues are rather important, giving useful information about the data.
The PCA Method
Step 4: Calculate the eigenvectors and
eigenvalues of the data matrix.
Here are the eigenvalues, which are found along the diagonal of the matrix S (diag(S) in Matlab), and the eigenvectors:

    eigenvalues  =  | 623.1194 |
                    |  16.0392 |

    eigenvectors =  | 0.9220   0.3871 |
                    | 0.3871   0.9220 |
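Step 4 as a Matlab sketch. The exact numbers depend on which matrix is decomposed and on the normalization used, so they may not match the values on the slide digit for digit:

    % Option 1: eigen-decomposition of the (square) covariance matrix
    [V, D] = eig(cov(DataAdj));
    [eigenvalues, order] = sort(diag(D), 'descend');   % largest eigenvalue first
    eigenvectors = V(:, order);                        % one eigenvector per column

    % Option 2: SVD of the (non-square) mean-adjusted data matrix
    [~, S, Vsvd] = svd(DataAdj, 'econ');
    % diag(S).^2 / (size(DataAdj,1) - 1) equals the eigenvalues above, and the
    % columns of Vsvd span the same directions as the eigenvectors.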
The PCA Method
Looking at the plot of the
adjusted data shown
here, one can see how it
has quite a strong
pattern.
As expected from the covariance matrix (and from common sense), both of the variables increase together.
The PCA Method
On top of the adjusted data I have plotted both
eigenvectors as well (appearing as a red and a
green line).
As stated earlier, they are perpendicular to each
other.
More important than this is that they provide
information about the data patterns.
One of the eigenvectors goes right through the
middle of the points, drawing a line of best fit.
The PCA Method
The first eigenvector (the one plotted in green)
shows us that these two data sets are very
related to each other along that line.

The second eigenvector (the one plotted in red)
gives the other, and less important, pattern in
the data.
It shows that all the points follow the main line, but
are off to its side by some amount.
The PCA Method

By the process of taking the eigenvectors of the
covariance matrix, we have been able to extract
lines that characterize the data.

The rest of the steps involve transforming the
data so that this data is expressed in terms of
these lines.
The PCA Method

Recalling the important aspects from the
previous figure:
The two lines are perpendicular to each other, i.e., mutually orthogonal;
The eigenvectors provide us a way to see hidden patterns in the data;
One of the eigenvectors draws the line that best fits the data.
The PCA Method

Step 5: Choosing components and forming a
feature vector.
Here comes the notion of data compression and
reduced dimensionality.
Eigenvalues have different values: the highest one
corresponds to the eigenvector that is the principal
component of the data set (the most significant
relationship between the data dimensions).
The PCA Method
Once the eigenvectors are found from the data
matrix, they are ordered by their eigenvalues,
from the highest to the lowest.
This gives the components in order of significance.

The components which are less significant can
be ignored.
Some information is lost but, if the eigenvalues are
small, the amount lost is not too much.
The PCA Method

If some components are left out, the final data
set will have less dimensions than the original.
If the original data set has n dimensions and n
eigenvectors are calculated (together with their
eigenvalues) and only the first p eigenvectors are
chosen, then the final data set will have only p
dimensions.
The PCA Method
Now, what needs to be done is to form a feature vector (a fancy name for a matrix of vectors).
This feature vector is constructed by taking the eigenvectors that are to be kept from the list of eigenvectors and forming a matrix with them in the columns:

    Feature_Vector = [ eigenvector_1   eigenvector_2   ...   eigenvector_n ]
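Step 5 as a sketch, reusing the sorted eigenvectors from Step 4:

    p = 1;                                  % keep only the most significant component
    FeatureVector = eigenvectors(:, 1:p);   % kept eigenvectors, one per column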
The PCA Method
Using the data set seen before, and the fact that
there are two eigenvectors, there are two
choices.
One is to form a feature vector with both of the
eigenvectors:

    eigenvectors =  | 0.9220   0.3871 |
                    | 0.3871   0.9220 |
The PCA Method
The other is to form a feature vector leaving out
the smaller, less significant, component and only
have a single column:

    eigenvalues  =  | 623.1194 |   (most significant eigenvalue)
                    |  16.0392 |   (less significant eigenvalue)

    eigenvectors =  | 0.9220   0.3871 |
                    | 0.3871   0.9220 |
    (first column: most significant eigenvector; second column: less significant eigenvector)
The PCA Method
In other words, the result is a feature vector with
p vectors, selected from n eigenvectors (where p
< n).
This is the most common option.

    eigenvalue  =  623.1194          (most significant eigenvalue)

    eigenvector =  | 0.9220 |        (most significant eigenvector)
                   | 0.3871 |
The PCA Method

Step 6: Deriving the new data set.
This is the final step in PCA (and the easiest one).
Choose the components (eigenvectors) to be kept in the data set and form a feature vector.
Just remember that the eigenvector with the highest eigenvalue is the principal component of the data set.
Take the transpose of the feature vector and multiply it on the left of the transposed, mean-adjusted data set.
The PCA Method


The matrix called RowFeatureVector is the feature vector transposed, so the eigenvectors are now in its rows, with the most significant one at the top.
The matrix called RowDataAdjusted is the mean-adjusted data transposed, so the data items are in the columns, with each row holding a separate dimension.

    Final_Data = RowFeatureVector * RowDataAdjusted
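Step 6 as a sketch, following the equation above:

    RowFeatureVector = FeatureVector';   % eigenvectors in rows, most significant at the top
    RowDataAdjusted  = DataAdj';         % one data item per column, one dimension per row
    Final_Data       = RowFeatureVector * RowDataAdjusted;   % p-by-N transformed data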
The PCA Method
This sudden transpose of all the data may be confusing, but the equations from now on are easier if the transpose of the feature vector and of the data is taken first.
Better than having to always carry a little T symbol above their names!

Final_Data is the final data set, with data items
in columns, and dimensions along rows.
The PCA Method
The original data is now only given in terms of
the chosen vectors.
The original data set was written in terms of the x
and y axes.
The data can be expressed in terms of any axes,
but the expression is most efficient if these axes
are perpendicular.
This is why it was important that eigenvectors are
always perpendicular to each other.
The PCA Method

So, the original data (expressed in terms of the x
and y axes ) is now expressed in terms of the
eigenvectors found.
If a reduced dimension is needed (throwing
some of the eigenvectors out), the new data will
be expressed in terms of the vectors that were
kept.
The PCA Method

[Figure]
The PCA Method
Among all possible orthogonal transforms, PCA is
optimal in the following sense:
KLT completely decorrelates the signal; and
KLT maximally compacts the energy (in other words,
the information) contained in the signal.
But the PCA is computationally expensive and is
not supposed to be used carelessly.
Instead, one can use the Discrete Cosine Transform,
DCT, which approaches the KLT in this sense.
The PCA Method Examples
Here, we switch to Matlab in order to run some examples that (I sincerely hope) may clarify things for you:
Project the data into the principal component axis,
show the rank one approximation, and compress an
image by reducing the number of its coefficients
(PCA.m), pretty much as by using the DCT.
Show the difference between the least squares and
the PCA and do the alignment of 3D models using the
PCA properties (SVD.m).
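I do not reproduce the author's PCA.m or SVD.m here, but a minimal sketch of the first demonstration (projecting onto the principal axis and forming the rank-one approximation of the height/weight data) could look like this:

    scores = DataAdj * eigenvectors(:, 1);              % 1-D coordinates along the first PC
    approx = scores * eigenvectors(:, 1)' ...
             + repmat(mean(Data), size(Data, 1), 1);    % rank-one approximation, original units
    plot(Data(:,1), Data(:,2), 'o', approx(:,1), approx(:,2), 'x-');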
The PCA Method Examples
Some things should be noticed about the power of PCA to compress an image (as seen in the PCA.m example).
The amount of memory required to store an uncompressed image of size m × n is M_image = m*n.
So, notice that the amount of memory needed to store an image grows with the product of its dimensions (quadratically, when both dimensions grow).
The PCA Method Examples
But the amount of memory required to store a rank-k SVD approximation of the image (also of size m × n) is M_approx = k*(m + n + 1).
So, notice that the amount of memory required increases only linearly as the dimensions get larger, rather than with their product.
Thus, as the image gets larger, more memory is saved by using the SVD.
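A hedged sketch of the rank-k idea (A stands for any m-by-n grayscale image matrix, loaded however you prefer):

    k = 20;                                        % rank of the approximation
    [U, S, V] = svd(double(A), 'econ');
    Ak = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';     % best rank-k approximation of A
    memFull   = numel(A);                          % m*n values to store
    memApprox = k * (size(A, 1) + size(A, 2) + 1); % k*(m+n+1) values to store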
The PCA Method Examples
Perform face recognition using the Principal
Component Analysis approach!
This is accomplished using a technique known in the literature as the Eigenface Technique.
We will see an example of how to do it using a
well known Face Database called The AT & T
Faces Database.
Two Matlab functions: faceRecognitionExample.m
and loadFaceDatabase.m.
What is the Eigenface Technique?

The idea is that face images can be economically
represented by their projection onto a small
number of basis images derived by finding the
most significant eigenvectors of the pixel wise
covariance matrix for a set of training images.
A lot of people like to play with this technique, but in
my tutorial I will simply show how to get some
eigenfaces and play with them in Matlab.
AT&T Database of Faces
AT&T Database of Faces contains a set of face
images.
Database used in the context of a face recognition
project.
Ten different images of 40 distinct subjects taken
at different times (varying lighting, facial details
and expressions) and against a dark homogeneous
background with subjects in an upright, frontal
position (some side movement was tolerated).
AT&T Database of Faces
The images have a size of 92x112 pixels (in other
words, 10304 pixels) and 256 grey levels per pixel,
organized in 40 directories (one for each subject) and
each directory contains ten different images of a
subject.
Matlab can read PNG files and other formats
without help.
So, it is relatively easy to load the whole face database into Matlab's workspace and process it.
Getting The Faces Into One Big Matrix
First of all, we need to put all the faces of the database into one huge matrix with 112*92 = 10304 rows and 400 columns.

This step is done by the function called loadFaceDatabase.m.
It reads a bunch of images, makes a column vector out of each of them, puts them all together and returns the result.
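I do not reproduce loadFaceDatabase.m itself; the following is only a sketch of the idea, and the directory layout and file names are assumptions about how the AT&T database is usually distributed:

    faces = zeros(112*92, 400);            % one 10304-pixel column per image
    col = 0;
    for s = 1:40                           % 40 subjects
        for k = 1:10                       % 10 images per subject
            img = imread(sprintf('orl_faces/s%d/%d.pgm', s, k));   % assumed path/format
            col = col + 1;
            faces(:, col) = double(img(:));                        % stack the pixels into a column
        end
    end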
Getting the Recognition to Work

Here we switch to Matlab directly, because the steps we take to perform the face recognition task are better explained by looking at the function called faceRecognitionExample.m.
All the steps necessary to perform this task are done in this function, which is ready to be executed and is commented.
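Again, this is not the author's faceRecognitionExample.m; it is only a sketch of the usual eigenface pipeline, with the number of kept eigenfaces and the variable names chosen by me:

    meanFace = mean(faces, 2);
    A = faces - repmat(meanFace, 1, size(faces, 2));   % mean-centred face columns
    [U, S, ~] = svd(A, 'econ');                        % columns of U are the eigenfaces
    E = U(:, 1:50);                                    % keep the 50 most significant eigenfaces
    W = E' * A;                                        % weights (projections) of every training face
    % To recognise a new face x (a 10304-by-1 column vector):
    % w = E' * (x - meanFace); then pick the training column of W closest to w.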
Cases When PCA Fails (1)
PCA projects data onto a set of orthogonal vectors (the principal components).
This restricts the new input components to be linear combinations of the old ones.
However, there are cases where the intrinsic freedom of the data cannot be expressed as a linear combination of the input components.
In such cases PCA will overestimate the input dimensionality.
Cases When PCA Fails (1)
So, PCA is not capable of finding the non-linear intrinsic dimension of the data (like the angle between the two vectors in the example above).
Instead, it will find two components with equal importance.
Cases When PCA Fails (2)
In cases when components with small variability
really matter, PCA will make mistakes due to its
unsupervised nature.
In such cases, if only the projections onto the leading components are kept, the two classes of data will become indistinguishable.
Any (Reasonable) Doubts?
