
DIMENSIONALITY  REDUCTION

by  PCA

Motivation
n Doing  Exhaustive  search  is  very  expensive  
– non-­feasible
n Doing  Wrapper  based  feature  selection  
(SFS,SBS,  SFFS,  etc )  again  very  
expensive
n Doing  filter  based  feature  selection  is  
suboptimal
n Solution???  Try  to  automate this  method

Spread  of  Data  
n Often  data  varies  in  only  some  limited  
directions  
n Can’t  spot  low  dimensional  data  by  looking  
at  numbers  

Example

[Figures: scatter plots of example data whose variation lies mostly along one direction]
Dimensionality  Reduction
n Reduce  dimensions  by  projecting  onto  low  
dimensional  subspace  with  maximum  
variation
n You  can  consider  this  as  dropping  non-­
necessary  axis  and  rotating  the  remaining  
axis

Data  Compression

Reduce data from 2D to 1D

PCA is not linear regression
(Linear regression minimises the vertical errors when predicting y from x; PCA minimises the orthogonal distances of the points to the line and treats all dimensions alike.)

Principal  Component  Analysis


n Most  common  form  of  dimensionality  
reduction
n The  new  variables/dimensions
n Are  linear  combinations  of  the  original  ones
n Are  uncorrelated  with  one  another
n Orthogonal in  original dimension space
n Capture  as  much  of  the  original  variance in  
the  data  as  possible
n Are  called  Principal  Components

[Figure: orthogonal axes that capture the maximum variance of the data]

What are the new axes?

• Orthogonal directions of greatest variance in the data

• Projections along PC1 discriminate the data most along any one axis

Principal  Components
• The first principal component is the direction of greatest variability (variance) in the data

• The second is the next orthogonal (uncorrelated) direction of greatest variability
  – So first remove all the variability along the first component, and then find the next direction of greatest variability

• And so on …

Principal  Components  Analysis  (PCA)


n Principle
n Linear  p rojection method  to  reduce  the  n umber  o f  p arameters
n Transfer  a  set  o f  correlated  variables  into  a  n ew  set  o f  
uncorrelated  variables
n Map  the  d ata  into  a  space  o f  lower  d imensionality
n Form  of  u nsupervised learning

n Properties
n It  can  b e  viewed  a s  a  rotation  o f  the  e xisting  a xes  to  n ew  
positions  in  the  space  d efined  b y  o riginal  variables
n New  axes  a re  o rthogonal  a nd  represent  the  d irections  with  
maximum  variability

SOME STATISTICAL BACKGROUND

1st order  statistics


n Mean: 1 X
m
µi = Xi
m i=1

n What  is  the  diff  bw these  two  sets


[0  8  12  20]  and  [8  9  11  12]
n Std Dev:
n The  average  distance  from  the  mean  of  the  
data  set  to  a  point  
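A minimal Octave/MATLAB sketch comparing the two sets above (same language as the projection snippet later in these slides):

a = [0 8 12 20];
b = [8 9 11 12];
mean(a), mean(b)     % both are 10
std(a),  std(b)      % approx. 8.33 vs. 1.83: same mean, very different spread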

1st order  statistics
n Std Dev:
sP
n
i=1 (Xi µ)2
=
(n 1)
n Variance  (s2)
Pn
2 i=1 (Xi µ)2
=
(n 1)

Covariance
n Covariance  always  between  two  
dimensions  cov (x,y)
n Covariance  with  itself  is  variance
n Covariance  of  3-­dimensional  data  set  
(x,y,z),
n Measure  cov between  (x,y),  (x,z)  and  (y,z)

n Variance:  
Pn
i=1 (Xi µ)(Xi µ)
var(X) =
(n 1)
n Covariance:
Pn
i=1 (Xi µX )(Yi µY )
cov(X, Y ) =
(n 1)

In  English
n For  each  data  item,  multiply  the  difference  
between  the  x  value  and  the  mean  of  x,  by  
the  the  difference  between  the  y  value  and  
the  mean  of  y.  Add  all  these  up,  and  divide  
by  (n-­1)
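A minimal sketch of this computation in Octave/MATLAB (x and y here are made-up example vectors, not data from the slides):

x = [2.1 2.5 3.6 4.0];
y = [8 10 12 14];
n = length(x);
covxy = sum((x - mean(x)) .* (y - mean(y))) / (n - 1);   % the formula above, literally
C = cov(x, y);                                           % built-in: 2x2 matrix, C(1,2) equals covxy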

Question
n Is  cov(X,Y)  equal  to  cov(Y,X)  ??
n (Xi-­μx )(Yi-­μy ) and  (Yi-­μy )  (Xi-­μx )  and  
multiplication  is  commutative

Covariance  Matrix
n For  dataset  with  dimensions  more  than  2,  
n!
you  can  calculate                                              different  
covariance  values (n 2)! ⇤ 2
n Calculate  for  n=3
0 1
cov(x, x) cov(x, y) cov(x, z)
C = @ cov(y, x) cov(y, y) cov(y, z) A
cov(z, x) cov(z, y) cov(z, z)
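A short sketch of the full covariance matrix for a 3-dimensional data set (the numbers are made up for illustration):

M = [1 2 1; 2 3 4; 3 5 7; 4 7 9];   % rows = observations, columns = x, y, z
C = cov(M)                          % 3x3 covariance matrix laid out as above
% C is symmetric: C(1,2) = cov(x,y) = cov(y,x) = C(2,1)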

Matrix  Algebra
n Matrix  *  vector  =  rotated  and  scaled  vector
n Matrix  *  vector  =  ONLY  scaled  vector  and

NO  rotation
Vector  =  Eigen  vector

Example and Example 2

[Figures: worked matrix-vector multiplications contrasting a general vector with an eigenvector; see the sketch below.]
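A sketch of what the two examples show; the particular 2x2 matrix is an assumption, chosen because it matches the eigenvector (3, 2) and the eigenvalue 4 used on the following slides:

A = [2 3; 2 1];
A * [1; 3]     % = [11; 5]  -> not a multiple of [1; 3]: rotated and scaled
A * [3; 2]     % = [12; 8]  = 4 * [3; 2]: only scaled, so [3; 2] is an eigenvector of A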

Eigen  Vectors
n Can  only  be  found  for  Square  Matrices
n Give  nxn matrix,  there  are  n  Eigen vectors
n Even  if  we  scale  the  Eigen  vector,  you  get  
same  multiple  as  a  result—cuz you  are  
only  scaling  the  vector,  its  direction  
remains  the  same
n All  Eigen  vectors  of  a  matrix  are  
orthogonal
n Usually  Eigen  vectors  are  calculated  as  
unit  vectors:  magnitude  is  exactly  one

Eigen Vector

Normalising the eigenvector $\begin{pmatrix} 3 \\ 2 \end{pmatrix}$ to unit length:

$\sqrt{3^2 + 2^2} = \sqrt{13}$, so the unit eigenvector is $\begin{pmatrix} 3/\sqrt{13} \\ 2/\sqrt{13} \end{pmatrix}$

Eigen  Value
n In  both  those  examples,  the  amount  by  
which  the  original  vector  was  scaled  after  
multiplication  by  the  square  matrix  was  the  
same  
n Eigen  value  of  this  Eigen  vector  is  4

Principal  Components  Analysis
n Step  1:  Get  some  data
DATA:
x             y
2.5   2.4
0.5   0.7
2.2   2.9
1.9   2.2
3.1   3.0
2.3   2.7
2   1.6
1   1.1
1.5   1.6
1.1   0.9

n Step  2
ZERO  MEAN  DATA:
n Subtract  Mean
x   y        
.69   .49
-­1.31   -­1.21
.39   .99
.09   .29
1.29   1.09
.49   .79
.19   -­.31
-­.81   -­.81
-­.31   -­.31
-­.71   -­1.01

n Step  3
nCalculate  Covariance  matrix
cov =              .616555556        .615444444
.615444444        .716555556

n since  the  non-­diagonal  elements  in  this  


covariance  matrix  are  positive,  we  should  expect  
that  both  the  x  and  y  variable  increase  together.

n Step  4:
n Calculate  the  eigenvectors  and  
eigenvalues  of  the  covariance  matrix
eigenvalues  =  .0490833989
1.28402771
eigenvectors  =  -­.735178656      -­.677873399
.677873399    -­.735178656  
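A sketch of Steps 1-4 in Octave/MATLAB for the data above (the eigen-decomposition may list the eigenvalues in a different order and with flipped signs on the eigenvectors):

X = [2.5 2.4; 0.5 0.7; 2.2 2.9; 1.9 2.2; 3.1 3.0;
     2.3 2.7; 2.0 1.6; 1.0 1.1; 1.5 1.6; 1.1 0.9];
mu = mean(X);                          % [1.81 1.91]
Xc = X - repmat(mu, size(X,1), 1);     % zero-mean data (Step 2)
C  = cov(Xc);                          % covariance matrix (Step 3)
[V, D] = eig(C);                       % eigenvectors in the columns of V, eigenvalues on diag(D) (Step 4)
diag(D)'                               % approx. 0.0490834 and 1.28402771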

Eigen  Vectors

n Sign  of  the  Eigen  vector  does  not  really  


mater
n Vector  in  opposite  direction
n Sort  them  according  to  eigen value

Note:
• Note that the two eigenvectors are perpendicular to each other.
• Note that one of the eigenvectors goes through the middle of the points, like drawing a line of best fit.
• The second eigenvector gives us the other, less important, pattern in the data: all the points follow the main line, but are off to the side of it by some amount.

PCA Example – STEP 5

Now, if you like, you can decide to ignore the components of lesser significance.

You do lose some information, but if the eigenvalues are small, you don't lose much.

• n dimensions in your data
• calculate n eigenvectors and eigenvalues
• choose only the first k eigenvectors (those with the largest eigenvalues)
• the final data set has only k dimensions

PCA Example – STEP 5
• Feature Vector

  FeatureVector = (eig1  eig2  eig3  ...  eign)

We can either form a feature vector with both of the eigenvectors:

  -.677873399   -.735178656
  -.735178656    .677873399

or we can choose to leave out the smaller, less significant component and keep only a single column:

  -.677873399
  -.735178656

Eigen Vectors

Ureduce = U(:, 1:k);    % keep the k eigenvectors (columns of U) with the largest eigenvalues
z = Ureduce' * x;       % projected data (x is zero-mean, one example per column)

PCA Example – STEP 5
• Deriving the new data (see the sketch below)

FinalData = RowFeatureVector x RowZeroMeanData

RowFeatureVector is the matrix with the eigenvectors in the columns, transposed so that the eigenvectors are now in the rows, with the most significant eigenvector at the top.

RowZeroMeanData is the mean-adjusted data transposed, i.e. the data items are in the columns, with each row holding a separate dimension.
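A sketch of this step, continuing the variables (Xc, V, D) from the earlier sketch:

[lambda, idx] = sort(diag(D), 'descend');
RowFeatureVector = V(:, idx)';         % eigenvectors as rows, most significant first
RowZeroMeanData  = Xc';                % one data item per column
FinalData = RowFeatureVector * RowZeroMeanData;
FinalData'                             % rows correspond to the table on the next slide
                                       % (signs may be flipped; the sign of an eigenvector is arbitrary)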

PCA Example – STEP 5

FinalData transposed: dimensions along the columns

  x (along PC1)    y (along PC2)
  -.827970186      -.175115307
  1.77758033        .142857227
  -.992197494       .384374989
  -.274210416       .130417207
 -1.67580142       -.209498461
  -.912949103       .175282444
   .0991094375     -.349824698
  1.14457216        .0464172582
   .438046137       .0177646297
  1.22382056       -.162675287

PCA Example – STEP 5

Reconstruction of the original data

• If we reduced the dimensionality then, obviously, when reconstructing the data we lose the dimensions we chose to discard. In our example let us assume that we kept only the first component (the new x dimension)…

n Z  =  UT *  X
n X  =  (U-­1 *  Z)  +  originalMean

Reconstruction  of  original  Data

x (projection onto the first PC only)
 -.827970186
 1.77758033
 -.992197494
 -.274210416
-1.67580142
 -.912949103
  .0991094375
 1.14457216
  .438046137
 1.22382056

HOW TO  SELECT  K  PCS

PCs, Variance and Least-Squares

• The first PC retains the greatest amount of variation in the sample

• The kth PC retains the kth greatest fraction of the variation in the sample

• The kth largest eigenvalue of the covariance (or correlation) matrix C is the variance in the sample along the kth PC

• The least-squares view: PCs are a series of linear least-squares fits to a sample, each orthogonal to all previous ones

Dimensionality Reduction

We can ignore the components of lesser significance, based upon the percentage of variance covered by the retained PCs.

$S = \begin{pmatrix} s_{11} & 0 & \cdots & 0 \\ 0 & s_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & s_{nn} \end{pmatrix}$

Choose the smallest k such that

$\dfrac{\sum_{i=1}^{k} s_{ii}}{\sum_{i=1}^{n} s_{ii}} \ge 0.99$

i.e. at least 99% of the variance is retained (see the sketch below).
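A sketch of this selection rule, continuing from the earlier sketches (here the diagonal of S from an SVD of the covariance matrix is used; for a covariance matrix these values equal its eigenvalues):

[Us, Ss, Vs] = svd(cov(Xc));
s = diag(Ss);
retained = cumsum(s) / sum(s);          % fraction of variance retained by the first k PCs
k = find(retained >= 0.99, 1)           % smallest k that retains at least 99% of the variance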

