
DIMENSIONALITY  REDUCTION

by  PCA

Motivation
n Doing  Exhaustive  search  is  very  expensive  
– non-­feasible
n Doing  Wrapper  based  feature  selection  
(SFS,SBS,  SFFS,  etc )  again  very  
expensive
n Doing  filter  based  feature  selection  is  
suboptimal
n Solution???  Try  to  automate this  method

Spread  of  Data  
n Often  data  varies  in  only  some  limited  
directions  
n Can’t  spot  low  dimensional  data  by  looking  
at  numbers  

Example

[Figures: scatter plots of example data whose variation lies mostly along one direction]
Dimensionality  Reduction
n Reduce  dimensions  by  projecting  onto  low  
dimensional  subspace  with  maximum  
variation
n You  can  consider  this  as  dropping  non-­
necessary  axis  and  rotating  the  remaining  
axis

Data  Compression

Reduce data from 2D to 1D

PCA is not linear regression
(Linear regression minimises the vertical errors when predicting y from x; PCA minimises the orthogonal distances of the points to the line and treats all dimensions alike.)

Principal  Component  Analysis


n Most  common  form  of  dimensionality  
reduction
n The  new  variables/dimensions
n Are  linear  combinations  of  the  original  ones
n Are  uncorrelated  with  one  another
n Orthogonal in  original dimension space
n Capture  as  much  of  the  original  variance in  
the  data  as  possible
n Are  called  Principal  Components

[Figure: orthogonal axes that capture the maximum variance of the data]

What are the new axes?

• Orthogonal directions of greatest variance in the data

• Projections along PC1 discriminate the data most along any one axis

Principal  Components
• The first principal component is the direction of greatest variability (variance) in the data

• The second is the next orthogonal (uncorrelated) direction of greatest variability
  – So first remove all the variability along the first component, and then find the next direction of greatest variability

• And so on …

Principal  Components  Analysis  (PCA)


n Principle
n Linear  p rojection method  to  reduce  the  n umber  o f  p arameters
n Transfer  a  set  o f  correlated  variables  into  a  n ew  set  o f  
uncorrelated  variables
n Map  the  d ata  into  a  space  o f  lower  d imensionality
n Form  of  u nsupervised learning

n Properties
n It  can  b e  viewed  a s  a  rotation  o f  the  e xisting  a xes  to  n ew  
positions  in  the  space  d efined  b y  o riginal  variables
n New  axes  a re  o rthogonal  a nd  represent  the  d irections  with  
maximum  variability

SOME STATISTICAL BACKGROUND

1st order  statistics


n Mean: 1 X
m
µi = Xi
m i=1

n What  is  the  diff  bw these  two  sets


[0  8  12  20]  and  [8  9  11  12]
n Std Dev:
n The  average  distance  from  the  mean  of  the  
data  set  to  a  point  
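A minimal Octave/MATLAB sketch comparing the two sets above (same language as the projection snippet later in these slides):

a = [0 8 12 20];
b = [8 9 11 12];
mean(a), mean(b)     % both are 10
std(a),  std(b)      % approx. 8.33 vs. 1.83: same mean, very different spread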

1st order  statistics
n Std Dev:
sP
n
i=1 (Xi µ)2
=
(n 1)
n Variance  (s2)
Pn
2 i=1 (Xi µ)2
=
(n 1)

Covariance
n Covariance  always  between  two  
dimensions  cov (x,y)
n Covariance  with  itself  is  variance
n Covariance  of  3-­dimensional  data  set  
(x,y,z),
n Measure  cov between  (x,y),  (x,z)  and  (y,z)

n Variance:  
Pn
i=1 (Xi µ)(Xi µ)
var(X) =
(n 1)
n Covariance:
Pn
i=1 (Xi µX )(Yi µY )
cov(X, Y ) =
(n 1)

In  English
n For  each  data  item,  multiply  the  difference  
between  the  x  value  and  the  mean  of  x,  by  
the  the  difference  between  the  y  value  and  
the  mean  of  y.  Add  all  these  up,  and  divide  
by  (n-­1)
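A minimal sketch of this computation in Octave/MATLAB (x and y here are made-up example vectors, not data from the slides):

x = [2.1 2.5 3.6 4.0];
y = [8 10 12 14];
n = length(x);
covxy = sum((x - mean(x)) .* (y - mean(y))) / (n - 1);   % the formula above, literally
C = cov(x, y);                                           % built-in: 2x2 matrix, C(1,2) equals covxy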

Question
n Is  cov(X,Y)  equal  to  cov(Y,X)  ??
n (Xi-­μx )(Yi-­μy ) and  (Yi-­μy )  (Xi-­μx )  and  
multiplication  is  commutative

Covariance  Matrix
n For  dataset  with  dimensions  more  than  2,  
n!
you  can  calculate                                              different  
covariance  values (n 2)! ⇤ 2
n Calculate  for  n=3
0 1
cov(x, x) cov(x, y) cov(x, z)
C = @ cov(y, x) cov(y, y) cov(y, z) A
cov(z, x) cov(z, y) cov(z, z)
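A short sketch of the full covariance matrix for a 3-dimensional data set (the numbers are made up for illustration):

M = [1 2 1; 2 3 4; 3 5 7; 4 7 9];   % rows = observations, columns = x, y, z
C = cov(M)                          % 3x3 covariance matrix laid out as above
% C is symmetric: C(1,2) = cov(x,y) = cov(y,x) = C(2,1)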

Matrix  Algebra
n Matrix  *  vector  =  rotated  and  scaled  vector
n Matrix  *  vector  =  ONLY  scaled  vector  and

NO  rotation
Vector  =  Eigen  vector

Example and Example 2

[Figures: worked matrix-vector multiplications contrasting a general vector with an eigenvector; see the sketch below.]
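A sketch of what the two examples show; the particular 2x2 matrix is an assumption, chosen because it matches the eigenvector (3, 2) and the eigenvalue 4 used on the following slides:

A = [2 3; 2 1];
A * [1; 3]     % = [11; 5]  -> not a multiple of [1; 3]: rotated and scaled
A * [3; 2]     % = [12; 8]  = 4 * [3; 2]: only scaled, so [3; 2] is an eigenvector of A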

Eigen  Vectors
n Can  only  be  found  for  Square  Matrices
n Give  nxn matrix,  there  are  n  Eigen vectors
n Even  if  we  scale  the  Eigen  vector,  you  get  
same  multiple  as  a  result—cuz you  are  
only  scaling  the  vector,  its  direction  
remains  the  same
n All  Eigen  vectors  of  a  matrix  are  
orthogonal
n Usually  Eigen  vectors  are  calculated  as  
unit  vectors:  magnitude  is  exactly  one

Eigen Vector

Normalising the eigenvector $\begin{pmatrix} 3 \\ 2 \end{pmatrix}$ to unit length:

$\sqrt{3^2 + 2^2} = \sqrt{13}$, so the unit eigenvector is $\begin{pmatrix} 3/\sqrt{13} \\ 2/\sqrt{13} \end{pmatrix}$

Eigen  Value
n In  both  those  examples,  the  amount  by  
which  the  original  vector  was  scaled  after  
multiplication  by  the  square  matrix  was  the  
same  
n Eigen  value  of  this  Eigen  vector  is  4

Principal  Components  Analysis
n Step  1:  Get  some  data
DATA:
x             y
2.5   2.4
0.5   0.7
2.2   2.9
1.9   2.2
3.1   3.0
2.3   2.7
2   1.6
1   1.1
1.5   1.6
1.1   0.9

n Step  2
ZERO  MEAN  DATA:
n Subtract  Mean
x   y        
.69   .49
-­1.31   -­1.21
.39   .99
.09   .29
1.29   1.09
.49   .79
.19   -­.31
-­.81   -­.81
-­.31   -­.31
-­.71   -­1.01

n Step  3
nCalculate  Covariance  matrix
cov =              .616555556        .615444444
.615444444        .716555556

n since  the  non-­diagonal  elements  in  this  


covariance  matrix  are  positive,  we  should  expect  
that  both  the  x  and  y  variable  increase  together.

n Step  4:
n Calculate  the  eigenvectors  and  
eigenvalues  of  the  covariance  matrix
eigenvalues  =  .0490833989
1.28402771
eigenvectors  =  -­.735178656      -­.677873399
.677873399    -­.735178656  
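A sketch of Steps 1-4 in Octave/MATLAB for the data above (the eigen-decomposition may list the eigenvalues in a different order and with flipped signs on the eigenvectors):

X = [2.5 2.4; 0.5 0.7; 2.2 2.9; 1.9 2.2; 3.1 3.0;
     2.3 2.7; 2.0 1.6; 1.0 1.1; 1.5 1.6; 1.1 0.9];
mu = mean(X);                          % [1.81 1.91]
Xc = X - repmat(mu, size(X,1), 1);     % zero-mean data (Step 2)
C  = cov(Xc);                          % covariance matrix (Step 3)
[V, D] = eig(C);                       % eigenvectors in the columns of V, eigenvalues on diag(D) (Step 4)
diag(D)'                               % approx. 0.0490834 and 1.28402771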

Eigen  Vectors

n Sign  of  the  Eigen  vector  does  not  really  


mater
n Vector  in  opposite  direction
n Sort  them  according  to  eigen value

Note:
• Note that the two eigenvectors are perpendicular to each other.
• Note that one of the eigenvectors goes through the middle of the points, like drawing a line of best fit.
• The second eigenvector gives us the other, less important, pattern in the data: all the points follow the main line, but are off to the side of it by some amount.

PCA Example – STEP 5

Now, if you like, you can decide to ignore the components of lesser significance.

You do lose some information, but if the eigenvalues are small, you don't lose much.

• n dimensions in your data
• calculate n eigenvectors and eigenvalues
• choose only the first k eigenvectors (those with the largest eigenvalues)
• the final data set has only k dimensions

PCA Example – STEP 5
• Feature Vector

  FeatureVector = (eig1  eig2  eig3  ...  eign)

We can either form a feature vector with both of the eigenvectors:

  -.677873399   -.735178656
  -.735178656    .677873399

or we can choose to leave out the smaller, less significant component and keep only a single column:

  -.677873399
  -.735178656

Eigen Vectors

Ureduce = U(:, 1:k);    % keep the k eigenvectors (columns of U) with the largest eigenvalues
z = Ureduce' * x;       % projected data (x is zero-mean, one example per column)

PCA Example – STEP 5
• Deriving the new data (see the sketch below)

FinalData = RowFeatureVector x RowZeroMeanData

RowFeatureVector is the matrix with the eigenvectors in the columns, transposed so that the eigenvectors are now in the rows, with the most significant eigenvector at the top.

RowZeroMeanData is the mean-adjusted data transposed, i.e. the data items are in the columns, with each row holding a separate dimension.
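A sketch of this step, continuing the variables (Xc, V, D) from the earlier sketch:

[lambda, idx] = sort(diag(D), 'descend');
RowFeatureVector = V(:, idx)';         % eigenvectors as rows, most significant first
RowZeroMeanData  = Xc';                % one data item per column
FinalData = RowFeatureVector * RowZeroMeanData;
FinalData'                             % rows correspond to the table on the next slide
                                       % (signs may be flipped; the sign of an eigenvector is arbitrary)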

PCA Example – STEP 5

FinalData transposed: dimensions along the columns

  x (along PC1)    y (along PC2)
  -.827970186      -.175115307
  1.77758033        .142857227
  -.992197494       .384374989
  -.274210416       .130417207
 -1.67580142       -.209498461
  -.912949103       .175282444
   .0991094375     -.349824698
  1.14457216        .0464172582
   .438046137       .0177646297
  1.22382056       -.162675287

PCA Example – STEP 5

Reconstruction of the original data

• If we reduced the dimensionality then, obviously, when reconstructing the data we lose the dimensions we chose to discard. In our example let us assume that we kept only the first component (the new x dimension)…

n Z  =  UT *  X
n X  =  (U-­1 *  Z)  +  originalMean

Reconstruction  of  original  Data

x (projection onto the first PC only)
 -.827970186
 1.77758033
 -.992197494
 -.274210416
-1.67580142
 -.912949103
  .0991094375
 1.14457216
  .438046137
 1.22382056

HOW TO  SELECT  K  PCS

PCs, Variance and Least-Squares

• The first PC retains the greatest amount of variation in the sample

• The kth PC retains the kth greatest fraction of the variation in the sample

• The kth largest eigenvalue of the covariance (or correlation) matrix C is the variance in the sample along the kth PC

• The least-squares view: PCs are a series of linear least-squares fits to a sample, each orthogonal to all previous ones

Dimensionality Reduction

We can ignore the components of lesser significance, based upon the percentage of variance covered by the retained PCs.

$S = \begin{pmatrix} s_{11} & 0 & \cdots & 0 \\ 0 & s_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & s_{nn} \end{pmatrix}$

Choose the smallest k such that

$\dfrac{\sum_{i=1}^{k} s_{ii}}{\sum_{i=1}^{n} s_{ii}} \ge 0.99$

i.e. at least 99% of the variance is retained (see the sketch below).
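A sketch of this selection rule, continuing from the earlier sketches (here the diagonal of S from an SVD of the covariance matrix is used; for a covariance matrix these values equal its eigenvalues):

[Us, Ss, Vs] = svd(cov(Xc));
s = diag(Ss);
retained = cumsum(s) / sum(s);          % fraction of variance retained by the first k PCs
k = find(retained >= 0.99, 1)           % smallest k that retains at least 99% of the variance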

