CSE446
Dimensionality Reduction and PCA
Winter 2012
Slides adapted from Carlos Guestrin & Luke Zettlemoyer
Machine Learning
(diagram: Supervised Learning, Unsupervised Learning, Reinforcement Learning; Parametric vs. Non-parametric)
Fri  K-means & Agglomerative Clustering
Mon  Expectation Maximization (EM)
Wed  Principal Component Analysis (PCA)
EM: another iterative clustering algorithm
Pick K random cluster models
Alternate:
  Assign data instances proportionately to the different models
  Revise each cluster model based on its (proportionately) assigned points
Stop when no changes
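A minimal sketch of this loop for a mixture of univariate Gaussians with a shared, fixed variance (matching the simplification used a few slides below); the NumPy code, the function name em_gaussians, and the default sigma are illustrative, not the lecture's implementation.

import numpy as np

def em_gaussians(x, k, sigma=1.0, iters=20, seed=0):
    # x: (m,) 1-d data; k cluster means with a shared, fixed variance sigma^2.
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False).astype(float)   # pick K random cluster models
    pi = np.full(k, 1.0 / k)                                   # mixing weights P(y=i)
    for _ in range(iters):
        # E-step: assign data instances proportionately to the different models
        logp = -0.5 * ((x[:, None] - mu[None, :]) / sigma) ** 2 + np.log(pi)
        r = np.exp(logp - logp.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: revise each cluster model from its (proportionately) assigned points
        mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
        pi = r.mean(axis=0)
    return mu, pi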
Gaussian Mixture Example (figures): Start, After first iteration, After 20th iteration
Can we do EM with hard assignments?
(univariate case for simplicity)
Iterate: on the t'th iteration let our estimates be
  \theta_t = \{ \mu_1^{(t)}, \mu_2^{(t)}, \ldots, \mu_k^{(t)} \}
E-step
  Compute "expected" classes of all data points:
    P(y=i \mid x_j, \mu_1 \ldots \mu_k) \propto \exp\!\left( -\tfrac{1}{2\sigma^2} (x_j - \mu_i)^2 \right) P(y=i)
M-step
  Compute most likely new \mu's given class expectations:
    \mu_i = \frac{ \sum_{j=1}^{m} P(y=i \mid x_j)\, x_j }{ \sum_{j=1}^{m} P(y=i \mid x_j) }
  With hard assignments this becomes
    \mu_i = \frac{ \sum_{j=1}^{m} \delta(y=i, x_j)\, x_j }{ \sum_{j=1}^{m} \delta(y=i, x_j) }
  where \delta represents hard assignment to the most likely (nearest) cluster
Equivalent to k-means clustering algorithm!!!
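A sketch of the hard-assignment version in NumPy, which is exactly the k-means update; the function name is illustrative.

import numpy as np

def hard_em_kmeans(x, k, iters=20, seed=0):
    # Hard-assignment EM for the univariate case: delta(y=i, x_j) selects the nearest mean.
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False).astype(float)
    for _ in range(iters):
        # "E-step": hard-assign each point to its most likely (nearest) cluster
        assign = np.argmin((x[:, None] - mu[None, :]) ** 2, axis=1)
        # "M-step": each mean becomes the average of its assigned points
        for i in range(k):
            if np.any(assign == i):
                mu[i] = x[assign == i].mean()
    return mu, assign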
Summary
K-means for clustering:
  algorithm
  converges because it's coordinate ascent
Hierarchical agglomerative clustering
EM for mixture of Gaussians:
  viewed as coordinate ascent in parameter space (\mu_i, \Sigma_i, \pi_i)
  how to learn maximum likelihood parameters (locally max. like.) in the case of unlabeled data
Relation to K-means
  hard / soft clustering
  probabilistic model
Remember, E.M. can (& does) get stuck in local minima
Dimensionality Reduction
Input data may have thousands or millions of dimensions!
  e.g., images have ??? And text data has ???
  "Dnl S. Wld s Thms J. Cbl / WRF Prfssr f Cmptr Scnc nd ngnrng t th nvrsty f Wshngtn. ftr gng t schl t Phllps cdmy, h rcvd bchlr's dgrs n bth Cmptr Scnc nd Bchmstry t Yl nvrsty n 1982."
Are all those dimensions (letters, pixels) necessary?
Dimensionality Reduction
Represent data with fewer dimensions!
  Easier learning: fewer parameters
    |Features| >> |training examples|??
  Better visualization
    hard to understand more than 3D or 4D
  Discover intrinsic dimensionality of data
    high dimensional data that is truly lower dimensional
Feature Selection
Want to learn f: X → Y
  X = <X_1, ..., X_n>
  but some features are more important than others
Approach: select a subset of features to be used by the learning algorithm
  Score each feature (or sets of features)
  Select the set of features with the best score
Greedy forward feature selection algorithm
Pick a dictionary of features
  e.g., polynomials for linear regression
Greedy: start from an empty (or simple) set of features F_0 = ∅
  Run the learning algorithm for the current set of features F_t
    Obtain h_t
  Select the next best feature X_i
    e.g., the X_j giving lowest held-out error when learning with F_t ∪ {X_j}
  F_{t+1} ← F_t ∪ {X_i}
  Repeat
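A sketch of the forward loop, assuming a hypothetical fit_and_heldout_error(features) helper that trains on the listed features and returns held-out error; the helper, names, and stopping rule are illustrative.

def greedy_forward_selection(all_features, fit_and_heldout_error, max_features):
    F = []                                         # F_0 = empty set of features
    best_err = fit_and_heldout_error(F)
    for _ in range(max_features):
        # try each remaining X_j, keep the one with lowest held-out error
        candidates = [(fit_and_heldout_error(F + [xj]), xj)
                      for xj in all_features if xj not in F]
        err, best_x = min(candidates, key=lambda c: c[0])
        if err >= best_err:                        # stop when no feature helps
            break
        F, best_err = F + [best_x], err            # F_{t+1} <- F_t U {X_i}
    return F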
Greedy backward feature selection algorithm
Pick a dictionary of features
  e.g., polynomials for linear regression
Greedy: start with all features F_0 = F
  Run the learning algorithm for the current set of features F_t
    Obtain h_t
  Select the next worst feature X_i
    e.g., the X_j giving lowest held-out error when learning with F_t − {X_j}
  F_{t+1} ← F_t − {X_i}
  Repeat
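The backward variant changes only the starting set and the update; a brief sketch reusing the same hypothetical helper as above.

def greedy_backward_selection(all_features, fit_and_heldout_error, min_features=1):
    F = list(all_features)                         # F_0 = all features
    while len(F) > min_features:
        # drop the "worst" feature: the X_j whose removal gives lowest held-out error
        candidates = [(fit_and_heldout_error([x for x in F if x != xj]), xj) for xj in F]
        err, worst = min(candidates, key=lambda c: c[0])
        F = [x for x in F if x != worst]           # F_{t+1} <- F_t - {X_i}
    return F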
Impact of feature selection on classification of fMRI data [Pereira et al. '05] (figure)
Feature Selection through Regularization
Previously, we discussed regularization with a squared norm: \lambda \|w\|_2^2
What if we used an L_1 norm, \lambda \|w\|_1, instead?
What about L_\infty?
These norms work, but are harder to optimize! And, it can be tricky to set \lambda!!!
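As one illustration of optimizing the L_1 penalty (which is not differentiable at 0), here is a minimal proximal-gradient (ISTA) sketch for L_1-regularized least squares; this is an assumed example, not from the slides, and the step size is arbitrary.

import numpy as np

def lasso_ista(X, y, lam, lr=0.01, iters=1000):
    # Minimize (1/(2m)) * ||Xw - y||^2 + lam * ||w||_1 by proximal gradient descent.
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / m                            # gradient of the smooth part
        w = w - lr * grad
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-threshold: zeros small weights
    return w

The soft-threshold step drives many weights exactly to zero, which is why the L_1 penalty acts as a form of feature selection.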
Regularization
(figure: loss contours around the unregularized optimum w* in the (w_1, w_2) plane, together with the L1 and L2 constraint regions)
Lower Dimensional Projections
Rather than picking a subset of the features, we can make new ones by combining existing features x_1 ... x_n
  New features are linear combinations of old ones
  Reduces dimension when k < n
Let's see this in the unsupervised setting
  just X, but no Y
Linear projection and reconstruction
(figure: 2-d points in the (x_1, x_2) plane projected into 1 dimension, giving coordinate z_1)
Reconstruction: knowing only z_1 and the projection weights, what was (x_1, x_2)?
Principal component analysis - basic idea
Project n-dimensional data into a k-dimensional space while preserving information:
  e.g., project a space of 10,000 words into 3 dimensions
  e.g., project 3-d into 2-d
Choose the projection with minimum reconstruction error
Linear projections, a review
Project a point into a (lower dimensional) space:
  point: x = (x_1, ..., x_n)
  select a basis - a set of unit (length 1) basis vectors (u_1, ..., u_k)
    we consider an orthonormal basis:
      u_i \cdot u_i = 1, and u_i \cdot u_j = 0 for i ≠ j
  select a center - \bar{x}, which defines the offset of the space
  the best coordinates in the lower dimensional space are given by dot products: (z_1, ..., z_k), with z_i = (x - \bar{x}) \cdot u_i
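A small NumPy sketch of these formulas; the 3-d point and the two-vector orthonormal basis are made-up values for illustration.

import numpy as np

x     = np.array([3.0, 4.0, 5.0])       # point x = (x_1, ..., x_n)
x_bar = np.array([1.0, 1.0, 1.0])       # center of the space
U     = np.array([[1.0, 0.0, 0.0],      # rows u_1, u_2: an orthonormal basis, k = 2
                  [0.0, 1.0, 0.0]])

z     = U @ (x - x_bar)                 # z_i = (x - x_bar) . u_i        ->  [2. 3.]
x_hat = x_bar + U.T @ z                 # reconstruction from k coordinates  ->  [3. 4. 1.]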
3D Data (figure)
Projection to 2D (figure)
PCA finds the projection that minimizes reconstruction error
Given m data points: x^i = (x_1^i, ..., x_n^i), i = 1...m
Will represent each point as a projection:
  \hat{x}^i = \bar{x} + \sum_{j=1}^{k} z_j^i u_j, \quad \text{where } z_j^i = (x^i - \bar{x}) \cdot u_j
PCA:
  Given k < n, find (u_1, ..., u_k) minimizing the reconstruction error:
    \mathrm{error}_k = \sum_{i=1}^{m} \| x^i - \hat{x}^i \|^2
(figure: points in the (x_1, x_2) plane with the principal direction u_1)
Understanding the reconstruction error
Note that x^i can be represented exactly by an n-dimensional projection:
  x^i = \bar{x} + \sum_{j=1}^{n} z_j^i u_j
Given k < n, find (u_1, ..., u_k) minimizing the reconstruction error:
  \mathrm{error}_k = \sum_{i=1}^{m} \| x^i - \hat{x}^i \|^2
Rewriting the error:
  \mathrm{error}_k = \sum_{i=1}^{m} \sum_{j=k+1}^{n} (z_j^i)^2
Aha! The error is the sum of squared weights that would have been used for the dimensions that are cut!!!
Reconstruction error and covariance matrix
Now, to find the u_j, we minimize:
  \sum_{i=1}^{m} \sum_{j=k+1}^{n} \left( (x^i - \bar{x}) \cdot u_j \right)^2 \;=\; m \sum_{j=k+1}^{n} u_j^T \Sigma u_j
with a Lagrange multiplier per u_j to ensure the basis stays orthonormal (u_j^T u_j = 1)
Take the derivative, set it equal to 0; the solutions are eigenvectors of the covariance matrix \Sigma
Minimizing reconstruction error and eigenvectors
Minimizing the reconstruction error is equivalent to picking an orthonormal basis (u_1, ..., u_n) minimizing:
  \mathrm{error}_k = \sum_{i=1}^{m} \sum_{j=k+1}^{n} \left( (x^i - \bar{x}) \cdot u_j \right)^2
Solutions: eigenvectors
So, minimizing the reconstruction error is equivalent to picking (u_{k+1}, ..., u_n) to be the eigenvectors with the smallest eigenvalues
And, our projection should be onto the (u_1, ..., u_k) with the largest eigenvalues
Basic PCA algorithm
Start from the m by n data matrix X
Recenter: subtract the mean from each row of X
  X_c ← X − \bar{X}
Compute the covariance matrix:
  \Sigma ← \frac{1}{m} X_c^T X_c
Find the eigenvectors and eigenvalues of \Sigma
Principal components: the k eigenvectors with the highest eigenvalues
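A minimal NumPy sketch of this algorithm; the function name is an assumption, and reconstruction would be x_bar + Z @ U.

import numpy as np

def pca_eig(X, k):
    # X: m x n data matrix, one row per data point.
    x_bar = X.mean(axis=0)
    Xc = X - x_bar                         # recenter: subtract mean from each row
    cov = Xc.T @ Xc / X.shape[0]           # covariance matrix Sigma (n x n)
    vals, vecs = np.linalg.eigh(cov)       # eigh: symmetric matrix, eigenvalues ascending
    order = np.argsort(vals)[::-1][:k]     # k eigenvectors with highest eigenvalues
    U = vecs[:, order].T                   # principal components, one per row
    Z = Xc @ U.T                           # coordinates of each point in the new basis
    return U, Z, x_bar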
PCA example (figure: Data, Projection, Reconstruction)
Handwriting (figure)
Two Principal Eigenvectors (figure)
Eigenfaces [Turk, Pentland '91]
Input images:
  N images
  each 50×50 pixels
  2500 features
Misleading figure. Best to think of it as an N × 2500 matrix: |Examples| × |Features|
Reduce Dimensionality 2500 → 15
(figure: the average face, the first principal component, and other components; for all except the average, gray = 0, white > 0, black < 0)
Using PCA for Compression
Store each face as the coefficients of its projection onto the first few principal components:
  \text{image} = \sum_{i=0}^{\max} a_i \cdot \text{Eigenface}_i
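A short sketch of encoding and decoding a face this way, assuming components U, mean face x_bar, and coordinates as produced by a PCA routine like the one above; treating the average face as a separate additive term is an assumption.

import numpy as np

def compress(face, U, x_bar):
    # store only the coefficients a_i of the projection onto the principal components
    return U @ (face - x_bar)

def decompress(coeffs, U, x_bar):
    # image = average face + sum_i a_i * Eigenface_i
    return x_bar + U.T @ coeffs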
Using PCA for Recognition
Compute the projections of the target image, compare to the database (nearest neighbor classifier)
Eigenfaces reconstruction
Each image corresponds to adding together the principal components (figure)
Scaling up
The covariance matrix can be really big!
  \Sigma is n × n
  10,000 features can be common!
  Finding eigenvectors is very slow...
Use singular value decomposition (SVD)
  finds the top k eigenvectors
  great implementations available, e.g., Matlab svd
SVD
Write X = W S V^T
  X - data matrix, one row per data point
  W - weight matrix, one row per data point - the coordinates of x^i in eigenspace
  S - singular value matrix, a diagonal matrix; in our setting each entry is an eigenvalue \lambda_j
  V^T - singular vector matrix; in our setting each row is an eigenvector v_j
SVD - notation change
  A = U \, \mathrm{diag}(w_1, \ldots, w_n) \, V^T
  (figure: the data matrix A written as the product of U, a diagonal matrix of singular values w_1 ... w_n, and V^T)
Treat as a black box: code widely available
In Matlab: [U,W,V] = svd(A,0)
PCA using SVD algorithm
Start from the m by n data matrix X
Recenter: subtract the mean from each row of X
  X_c ← X − \bar{X}
Call the SVD algorithm on X_c
  ask for k singular vectors
Principal components: the k singular vectors with the highest singular values (rows of V^T)
Coefficients: project each point onto the new vectors
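A minimal sketch of the SVD route using numpy.linalg.svd in place of Matlab's svd; the function name is illustrative.

import numpy as np

def pca_svd(X, k):
    # X: m x n data matrix, one row per data point.
    x_bar = X.mean(axis=0)
    Xc = X - x_bar                                     # recenter
    W, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # singular values come back sorted high to low
    components = Vt[:k]                                # k rows of V^T with the highest singular values
    coeffs = Xc @ components.T                         # project each point onto the new vectors
    return components, coeffs, x_bar

The eigenvalues of the covariance matrix from the previous algorithm are recovered as s**2 / m.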
Singular Value Decomposition (SVD)
Handy mathematical technique that has application to many problems
Given any m × n matrix A, an algorithm to find matrices U, V, and W such that
  A = U W V^T
  U is m × n and orthonormal
  W is n × n and diagonal
  V is n × n and orthonormal
SVD
The w_i are called the singular values of A
If A is singular, some of the w_i will be 0
In general rank(A) = the number of nonzero w_i
SVD is mostly unique (up to permutation of singular values, or if some w_i are equal)
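A quick NumPy check of the rank property; the 3×3 matrix is an arbitrary example with one linearly dependent row.

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],              # a multiple of the first row
              [1.0, 0.0, 1.0]])
w = np.linalg.svd(A, compute_uv=False)      # singular values only, sorted high to low
print(np.sum(w > 1e-10))                    # 2 nonzero singular values ...
print(np.linalg.matrix_rank(A))             # ... matching rank(A) = 2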
What you need to know
Dimensionality reduction
  why and when it's important
Simple feature selection
Regularization as a type of feature selection
Principal component analysis
  minimizing reconstruction error
  relationship to covariance matrix and eigenvectors
  using SVD