
PRINCIPAL COMPONENT ANALYSIS

Korea, January 2017

An introduction to PCA
PCA is a method of extracting some of the most
important trends in high-dimensional data.
We will perform PCA on a simple data set as an example.
To understand how PCA works, we will need:
- Basic statistics
- Matrix algebra

An introduction to PCA
The technique is quite old, dating back to Pearson (1901) and Hotelling
(1933), but it remains one of the most widely used multivariate
techniques today.

Main idea:
Start with variables X1, . . . , Xp.
Find a rotation of these variables, say Y1, . . . , Yp (called
principal components), so that:

- Y1, . . . , Yp are uncorrelated.
  Idea: they measure different dimensions of the data.
- Var(Y1) >= Var(Y2) >= . . . >= Var(Yp).
  Idea: Y1 is most important, then Y2, etc.
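The rotation described above can be sketched numerically. This is a minimal illustration with numpy on made-up data (the matrix of mixing weights and all variable names here are ours, not from the slides): after rotating onto the eigenvectors of the covariance matrix, the new variables Y1, Y2, Y3 are uncorrelated and their variances come out in decreasing order.

```python
import numpy as np

rng = np.random.default_rng(0)
# toy correlated data: 200 samples of 3 variables (illustrative only)
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.5, 1.0, 0.3],
                                          [0.0, 0.3, 0.5]])
Xc = X - X.mean(axis=0)                 # centre each variable
S = np.cov(Xc, rowvar=False)            # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)    # eigh: for symmetric matrices
order = np.argsort(eigvals)[::-1]       # sort by descending variance
E = eigvecs[:, order]
Y = Xc @ E                              # principal components Y1, Y2, Y3
```

The covariance matrix of Y is (numerically) diagonal with decreasing entries, which is exactly the "uncorrelated, ordered by variance" property stated above.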


BASIC STATISTICS
Statistics bring out the trends wherever there is
randomness and uncertainty.
We may not be able to predict exact values, but
statistics of the data may be predictable.
We shall introduce the following statistics:
- The mean: the middle of the data
- The variance: the spread of the data
- The covariance: the degree of co-dependence of
  two variables
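The three statistics above can be made concrete with a short numpy sketch (the numbers below are arbitrary illustration, not the slide data; sample versions with the n - 1 divisor are used):

```python
import numpy as np

# toy numbers to make the three definitions concrete (not the slide data)
x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 4.0])

mean_x = x.mean()                                        # middle of the data
var_x = ((x - mean_x) ** 2).sum() / (len(x) - 1)         # spread (sample variance)
cov_xy = ((x - mean_x) * (y - y.mean())).sum() / (len(x) - 1)  # co-dependence
```

A positive covariance, as here, means x and y tend to move together; a negative value would mean they move in opposite directions.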


Suppose that we have a random vector X = (X1, X2, . . . , Xp)'
with population variance-covariance matrix Var(X) = Σ.

Consider the linear combinations

Yi = ei1 X1 + ei2 X2 + . . . + eip Xp,  for i = 1, . . . , p.

First Principal Component (PCA1): Y1

The first principal component is the linear combination of the x-variables
that has maximum variance (among all linear combinations), so it
accounts for as much variation in the data as possible.
Specifically, we will define coefficients e11, e12, ... , e1p for that
component in such a way that its variance is maximized, subject to the
constraint that the sum of the squared coefficients is equal to one. This
constraint is required so that a unique answer may be obtained.

Select e11, e12, ... , e1p that maximize

Var(Y1) = Σk Σl e1k e1l σkl

subject to the constraint that

e11^2 + e12^2 + ... + e1p^2 = 1.
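This constrained maximization can be checked numerically. A sketch, assuming nothing beyond the slides except the toy data generated below: the variance of e'X along a unit-length coefficient vector e is e' S e, and the maximizer is the leading eigenvector of the covariance matrix S, so no random unit direction beats it.

```python
import numpy as np

rng = np.random.default_rng(1)
# illustrative 2-D correlated data, not from the slides
X = rng.normal(size=(500, 2)) @ np.array([[1.5, 0.8],
                                          [0.0, 0.4]])
S = np.cov(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(S)
e1 = eigvecs[:, np.argmax(eigvals)]     # leading eigenvector, unit length
best = float(e1 @ S @ e1)               # variance along e1

# no random unit-length direction achieves a larger variance
others = rng.normal(size=(1000, 2))
others /= np.linalg.norm(others, axis=1, keepdims=True)
for v in others:
    assert v @ S @ v <= best + 1e-12
```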

First Principal Component (PCA1): Y1

PCA is a technique for dimensionality reduction from p
dimensions to k < p dimensions.
It tries to find, in order, the most informative k linear
combinations of a set of variables: Y1, Y2, . . . , Yk.
Here information will be interpreted as a percentage of the
total variation (as previously defined) in Σ.
The k sample PCs that "explain" x% of the total variation in a
sample covariance matrix S may be similarly defined.
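The "explains x% of the total variation" criterion is simply each eigenvalue divided by the trace of S. A sketch with an illustrative 2x2 sample covariance matrix (the numbers are ours, not from the slides), choosing the smallest k that reaches 95%:

```python
import numpy as np

# illustrative 2x2 sample covariance matrix (not from the slides)
S = np.array([[0.62, 0.61],
              [0.61, 0.72]])
eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]       # descending order
explained = eigvals / eigvals.sum()                  # fraction per component
cumulative = np.cumsum(explained)
k = int(np.searchsorted(cumulative, 0.95) + 1)       # smallest k reaching 95%
```

Here the two variables are strongly correlated, so the first component alone already explains more than 95% of the total variation and k = 1 suffices.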

Coordinate transformations: an
example of matrix algebra
Two-dimensional space is described by coordinates x1 and x2.
Points in space are described by column vectors with
elements x1 and x2: x = [x1; x2].
e.g. the vector [1; 2] represents the point (1, 2).
Coordinate transformations: an
example of matrix algebra
An alternative coordinate system is described by the
coordinates x1' and x2'.
The same point then has a different column vector x'
describing it.
The two coordinate systems are related by a
transformation matrix T, such that x' = Tx.
Important: an orthogonal matrix
is the kind of matrix which
performs rotated-axis coordinate
transformations
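A small numeric check of the statement above: a 2-D rotation matrix T is orthogonal (T'T = I), so applying x' = Tx changes the coordinates of a point but not its length. The rotation angle below is an arbitrary choice for illustration.

```python
import numpy as np

theta = np.pi / 6                       # arbitrary illustrative angle
T = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])   # rotation matrix

x = np.array([1.0, 2.0])    # a point in the original coordinates
x_new = T @ x               # the same point in the rotated coordinates
```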

Eigenvectors and Eigenvalues


Special equation, where a matrix maps a vector to
a multiple of itself:

Tx = λx

The λ (lambda) values are the eigenvalues, and the
vectors x are the eigenvectors (MATLAB can be used
to find these).
A matrix formed from the eigenvectors placed in
the columns is orthogonal, and so generates a
rotated-axis coordinate system.
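The slides mention MATLAB; as a sketch, numpy's `linalg.eigh` plays the same role for a symmetric matrix (the matrix T below is an arbitrary symmetric example):

```python
import numpy as np

# Tx = lambda x in code, on an illustrative symmetric matrix
T = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eigh(T)    # eigenvectors are the columns

# each column x satisfies T x = lambda x
for lam, x in zip(eigvals, eigvecs.T):
    assert np.allclose(T @ x, lam * x)
```

The eigenvector matrix is orthogonal, as the slide states, so its columns define a rotated-axis coordinate system.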

PCA by simple example


We will perform PCA on the
following data.
This might be, for example, several
measurements of the expression
of two genes.
The goal is to bring out underlying trends
in this 2-D data.

X1   X2
2.5  2.4
0.5  0.7
2.2  2.9
1.9  2.2
3.1  3.0
2.3  2.7
2.1  1.6
1.0  1.1
1.5  1.6
1.1  0.9

PCA by simple example


We will perform PCA
on the following data.
Plot the data: trends are already
apparent because the data
is simple, but this is not
usually the case.
The first statistical step is to calculate the means.

PCA by simple example


Calculate the mean of each of the two variables:

Subtract the means:
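The two steps above can be sketched in numpy on the slide's data table (the variable names are ours): compute the mean of each variable, then subtract it so the centred data have mean zero.

```python
import numpy as np

# the slide's data set: rows are (X1, X2) pairs
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.1, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

means = X.mean(axis=0)    # mean of X1 and of X2
Xc = X - means            # centred data: each variable now has mean zero
```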

PCA by simple example


Covariance matrix:

For 2 variables:

C = [ var(X1)       cov(X1, X2)
      cov(X2, X1)   var(X2)    ]

For our data:


Calculate the eigenvalues of
this matrix.
Most of the variation lies in one direction.
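The covariance matrix and its eigenvalues for the slide's data can be computed as a sketch in numpy (variable names ours); the larger eigenvalue dominates, confirming that most of the variation lies along one direction.

```python
import numpy as np

# the slide's data set: rows are (X1, X2) pairs
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.1, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

S = np.cov(X, rowvar=False)        # 2x2 sample covariance matrix
eigvals = np.linalg.eigvalsh(S)    # eigenvalues, ascending order

share = eigvals.max() / eigvals.sum()   # fraction along the main direction
```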

PCA by simple example


The corresponding eigenvectors
are:
Place them in a matrix in descending order of
eigenvalues.
The transpose of this matrix performs our coordinate
transformation.
This is an orthogonal matrix, and so performs a
rotated-axis coordinate transformation.

PCA by simple example

The matrix just derived

provides two new coordinates
for our data, shown in the plot.

We can transform our data

matrix so that the data are
represented in the new
coordinates.
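Putting the whole procedure together on the slide's data, as a sketch in numpy (variable names ours): centre the data, form the matrix of eigenvectors with the largest eigenvalue first, and rotate into the new coordinates. The transformed components are uncorrelated, the first carries the most variance, and the total variance is preserved by the rotation.

```python
import numpy as np

# the slide's data set: rows are (X1, X2) pairs
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.1, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
Xc = X - X.mean(axis=0)                     # centre the data

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
E = eigvecs[:, np.argsort(eigvals)[::-1]]   # columns in descending order
Y = Xc @ E                                  # data in the new coordinates
```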

PCA by simple example


PCA is a method of revealing underlying trends in
large amounts of data.
A new coordinate system is constructed by rotating
the axes.
The first coordinate is the direction in which the data
varies most, and so on.
Select the few components which contain most of the
variation; these can be visualized.

THANK YOU
FOR YOUR
ATTENTION!