Dimensionality Reduction
MOTIVATION
PRINCIPAL METHODS
PCA
What and How?
Executing in Python
AIMS
• FEATURE SELECTION:
Choosing the p < k most important of the k original features.
• FEATURE EXTRACTION:
Mapping the data from the original k-dimensional space to a
lower-dimensional space.
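The difference between the two aims can be sketched on synthetic data. This is a minimal illustration, not the slides' own method: the data, the variable names, and the use of per-feature variance as the "importance" criterion for selection are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 200 samples, k = 5 features with very different scales (assumed).
X = rng.normal(size=(200, 5)) * np.array([5.0, 0.1, 3.0, 0.2, 1.0])
p = 2

# Feature selection: keep p of the ORIGINAL columns.
# "Importance" is taken to be the sample variance here, purely for illustration.
variances = X.var(axis=0, ddof=1)
selected = np.sort(np.argsort(variances)[::-1][:p])
X_selected = X[:, selected]

# Feature extraction: build p NEW features, each a linear combination of all k columns.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X_extracted = Xc @ Vt[:p].T

print("selected columns:", selected)
print("shapes:", X_selected.shape, X_extracted.shape)
```

Both results have p columns, but only the selected ones are columns of the original data.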
PRINCIPAL METHODS
• Principal Component Analysis (PCA)
• Linear Discriminant Analysis (LDA)
• Quadratic Discriminant Analysis (QDA)
• Regularized Discriminant Analysis (RDA)
• Multidimensional Scaling (MDS)
What is PCA?
• PCA reduces the dimension of the data to a specified smaller
dimension.
• It is an unsupervised learning algorithm.
• It quantifies the variability of the data.
• It helps to visualize high-dimensional data.
$$\mathrm{Cov}(X) = \frac{1}{n-1} X^T X$$
where the columns of $X$ are mean-centered and $n$ is the number of samples.
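As a quick check of the covariance formula, the sketch below computes it by hand and compares it with NumPy's built-in estimate. The data are synthetic and the variable names are chosen only for this example; the formula assumes each column has been centered first.

```python
import numpy as np

rng = np.random.default_rng(0)
X_raw = rng.normal(size=(100, 3))      # n = 100 samples, 3 features (synthetic)

n = X_raw.shape[0]
X = X_raw - X_raw.mean(axis=0)         # center each column before applying the formula
cov = (X.T @ X) / (n - 1)              # Cov(X) = X^T X / (n - 1)

# np.cov treats rows as variables by default, hence rowvar=False.
matches = np.allclose(cov, np.cov(X_raw, rowvar=False))
print("matches np.cov:", matches)
```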
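Let $z_1 = x^T v_1 = \theta_{11} x_1 + \theta_{12} x_2 + \theta_{13} x_3 + \dots + \theta_{1k} x_k$ be the first principal component (PC), where $v_1 = (\theta_{11}, \dots, \theta_{1k})^T$ and $v_1^T v_1 = 1$.
$\Sigma v_1 = \lambda_1 v_1$
$v_1$ is an eigenvector of $\Sigma = \mathrm{Cov}(X)$.
Choose the eigenvector with the largest eigenvalue so that $\mathrm{Var}(z_1)$ is maximal:
$\mathrm{var}(x^T v_1) \approx v_1^T \Sigma v_1 = v_1^T \lambda_1 v_1 = \lambda_1 v_1^T v_1 = \lambda_1$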
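The step from "maximize the variance" to "$v_1$ is an eigenvector" can be sketched with a Lagrange multiplier. The unit-norm constraint $v_1^T v_1 = 1$ is assumed here; it is what makes the last equality in the variance line above hold.

```latex
\begin{aligned}
&\max_{v_1}\; v_1^T \Sigma v_1 \quad \text{subject to } v_1^T v_1 = 1 \\
&L(v_1, \lambda_1) = v_1^T \Sigma v_1 - \lambda_1 \left( v_1^T v_1 - 1 \right) \\
&\frac{\partial L}{\partial v_1} = 2 \Sigma v_1 - 2 \lambda_1 v_1 = 0
\;\Longrightarrow\; \Sigma v_1 = \lambda_1 v_1
\end{aligned}
```

Since the objective then equals $\lambda_1$, the maximum is attained at the eigenvector with the largest eigenvalue.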
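For the second principal component, require $v_2^T v_2 = 1$ and $v_2^T v_1 = 0$.
$\Sigma v_2 - \lambda_2 v_2 = 0$
$\Sigma v_2 = \lambda_2 v_2$
where $v_2$ is another eigenvector of $\Sigma$.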
$\mathrm{var}(x^T v_2) \approx v_2^T \Sigma v_2 = v_2^T \lambda_2 v_2 = \lambda_2 (v_2^T v_2) = \lambda_2$
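The two results above can be checked numerically. The following is a minimal sketch on synthetic correlated data (the mixing matrix `A` and all variable names are assumptions for the example); it uses `np.linalg.eigh`, which is intended for symmetric matrices such as a covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic correlated data: 500 samples, 3 features.
A = np.array([[3.0, 0.0, 0.0], [1.0, 2.0, 0.0], [0.5, 0.5, 0.5]])
X = rng.normal(size=(500, 3)) @ A
X = X - X.mean(axis=0)
sigma = np.cov(X, rowvar=False)

# eigh returns eigenvalues in ascending order, so sort them in descending order.
eigvals, eigvecs = np.linalg.eigh(sigma)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

v1, v2 = eigvecs[:, 0], eigvecs[:, 1]
lam1, lam2 = eigvals[0], eigvals[1]

var_z1 = np.var(X @ v1, ddof=1)   # sample variance of the first component
var_z2 = np.var(X @ v2, ddof=1)   # sample variance of the second component

print("var(z1), lambda1:", var_z1, lam1)
print("var(z2), lambda2:", var_z2, lam2)
print("v2 . v1:", v2 @ v1)
```

With the sample covariance (divisor n − 1) and `ddof=1`, the variances agree with the eigenvalues up to floating-point error.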
PROCEEDING ONWARDS
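• The eigenvalue $\lambda_1$ and its corresponding eigenvector $v_1$ of $\Sigma$
satisfy $\mathrm{var}(x^T v_1) \approx v_1^T \Sigma v_1 = \lambda_1$.
• The eigenvalue $\lambda_2 \le \lambda_1$ and its corresponding eigenvector $v_2$
of $\Sigma$ satisfy $\mathrm{var}(x^T v_2) \approx v_2^T \Sigma v_2 = \lambda_2$.
• The eigenvalue $\lambda_3 \le \lambda_2$ and its corresponding eigenvector $v_3$
of $\Sigma$ satisfy $\mathrm{var}(x^T v_3) \approx v_3^T \Sigma v_3 = \lambda_3$.
$\vdots$
• The eigenvalue $\lambda_p \le \lambda_{p-1}$ and its corresponding eigenvector
$v_p$ of $\Sigma$ satisfy $\mathrm{var}(x^T v_p) \approx v_p^T \Sigma v_p = \lambda_p$.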
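Taken together, these statements say that projecting onto all eigenvectors at once gives components whose covariance matrix is diagonal, with the eigenvalues in decreasing order on the diagonal. A small sketch on synthetic data (names and data are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))   # synthetic, 4 features
X = X - X.mean(axis=0)
sigma = np.cov(X, rowvar=False)

eigvals, V = np.linalg.eigh(sigma)
eigvals, V = eigvals[::-1], V[:, ::-1]   # reorder so lambda_1 >= lambda_2 >= ...

Z = X @ V                                # column j holds x^T v_j for every sample
cov_Z = np.cov(Z, rowvar=False)          # expected: diag(lambda_1, ..., lambda_p)

print(np.round(cov_Z, 6))
```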
• If the objective is to visualize the data, picking the first two or
three principal components makes sense.
• Typically, choose K to be the smallest value such that 99% of the
variance is retained, as below:
$$\frac{\frac{1}{m} \sum_{i=1}^{m} \lVert x^{(i)} - x_{\text{approx}}^{(i)} \rVert^2}{\frac{1}{m} \sum_{i=1}^{m} \lVert x^{(i)} \rVert^2} \le 0.01$$
For a given K,
$$1 - \frac{\sum_{i=1}^{K} S_{ii}}{\sum_{i=1}^{n} S_{ii}} \le 0.01$$
• Plot of PoV (proportion of variance) vs. K
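The 99% rule can be sketched as a small helper. The function name `smallest_k` and the example eigenvalues are hypothetical, and the $S_{ii}$ are assumed to be the eigenvalues of $\Sigma$; the returned `pov` array holds the values that a PoV-versus-K plot would display.

```python
import numpy as np

def smallest_k(eigvals, retained=0.99):
    """Return the smallest K whose leading eigenvalues hold at least
    `retained` of the total variance, plus the PoV for every K."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    pov = np.cumsum(eigvals) / eigvals.sum()      # PoV for K = 1, 2, ...
    k = int(np.searchsorted(pov, retained)) + 1   # first K with PoV >= retained
    return min(k, len(eigvals)), pov              # clamp against rounding at PoV = 1

# Made-up eigenvalues, for illustration only.
k, pov = smallest_k([6.0, 3.0, 0.95, 0.05])
print("K =", k)
print("PoV per K:", pov)
```

Here K is the first index at which `1 - pov[K - 1]` drops to 0.01 or below, which is the condition stated above.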
Executing in Python
• To run PCA in Python, import the PCA class from the scikit-learn
library (sklearn.decomposition).
• Let's apply this tool to the handwritten digits recognition
data set.
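A minimal sketch of that workflow follows. It assumes the digits data set bundled with scikit-learn (`load_digits`), which may not be the exact data used in the original presentation, and it takes the 99% threshold from the rule discussed earlier.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()      # 8x8 grayscale digit images, flattened to 64 features
X = digits.data

# A float n_components in (0, 1) asks scikit-learn to keep enough components
# to explain at least that fraction of the variance (needs the full SVD solver).
pca = PCA(n_components=0.99, svd_solver="full")
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)
print("components kept:", pca.n_components_)
print("variance retained:", pca.explained_variance_ratio_.sum())
```

For visualization, `PCA(n_components=2)` would instead keep only the first two components, which can then be drawn as a scatter plot colored by `digits.target`.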