CSE446
Dimensionality Reduction and PCA
Winter 2012
Slides adapted from Carlos Guestrin & Luke Zettlemoyer
Machine Learning
(diagram: Supervised Learning, Unsupervised Learning, Reinforcement Learning; Parametric vs. Non-parametric)
Fri  K-means & Agglomerative Clustering
Mon  Expectation Maximization (EM)
Wed  Principal Component Analysis (PCA)
EM: another iterative clustering algorithm
Pick K random cluster models
Alternate:
  Assign data instances proportionately to the different models
  Revise each cluster model based on its (proportionately) assigned points
Stop when no changes
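A minimal sketch of this loop for a mixture of univariate Gaussians with a shared, fixed variance (matching the simplification used a few slides below); the NumPy code, the function name em_gaussians, and the default sigma are illustrative, not the lecture's implementation.

import numpy as np

def em_gaussians(x, k, sigma=1.0, iters=20, seed=0):
    # x: (m,) 1-d data; k cluster means with a shared, fixed variance sigma^2.
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False).astype(float)   # pick K random cluster models
    pi = np.full(k, 1.0 / k)                                   # mixing weights P(y=i)
    for _ in range(iters):
        # E-step: assign data instances proportionately to the different models
        logp = -0.5 * ((x[:, None] - mu[None, :]) / sigma) ** 2 + np.log(pi)
        r = np.exp(logp - logp.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: revise each cluster model from its (proportionately) assigned points
        mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
        pi = r.mean(axis=0)
    return mu, pi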
Gaussian Mixture Example (figures): Start, After first iteration, After 20th iteration
Can we do EM with hard assignments?
(univariate case for simplicity)
Iterate: on the t'th iteration let our estimates be
  \theta_t = \{ \mu_1^{(t)}, \mu_2^{(t)}, \ldots, \mu_k^{(t)} \}
E-step
  Compute "expected" classes of all data points:
    P(y=i \mid x_j, \mu_1 \ldots \mu_k) \propto \exp\!\left( -\tfrac{1}{2\sigma^2} (x_j - \mu_i)^2 \right) P(y=i)
M-step
  Compute most likely new \mu's given class expectations:
    \mu_i = \frac{ \sum_{j=1}^{m} P(y=i \mid x_j)\, x_j }{ \sum_{j=1}^{m} P(y=i \mid x_j) }
  With hard assignments this becomes
    \mu_i = \frac{ \sum_{j=1}^{m} \delta(y=i, x_j)\, x_j }{ \sum_{j=1}^{m} \delta(y=i, x_j) }
  where \delta represents hard assignment to the most likely (nearest) cluster
Equivalent to k-means clustering algorithm!!!
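A sketch of the hard-assignment version in NumPy, which is exactly the k-means update; the function name is illustrative.

import numpy as np

def hard_em_kmeans(x, k, iters=20, seed=0):
    # Hard-assignment EM for the univariate case: delta(y=i, x_j) selects the nearest mean.
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False).astype(float)
    for _ in range(iters):
        # "E-step": hard-assign each point to its most likely (nearest) cluster
        assign = np.argmin((x[:, None] - mu[None, :]) ** 2, axis=1)
        # "M-step": each mean becomes the average of its assigned points
        for i in range(k):
            if np.any(assign == i):
                mu[i] = x[assign == i].mean()
    return mu, assign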
Summary
K-means for clustering:
  algorithm
  converges because it's coordinate ascent
Hierarchical agglomerative clustering
EM for mixture of Gaussians:
  viewed as coordinate ascent in parameter space (\mu_i, \Sigma_i, \pi_i)
  how to learn maximum likelihood parameters (locally max. like.) in the case of unlabeled data
Relation to K-means
  hard / soft clustering
  probabilistic model
Remember, E.M. can (& does) get stuck in local minima
Dimensionality Reduction
Input data may have thousands or millions of dimensions!
  e.g., images have ??? And text data has ???
  "Dnl S. Wld s Thms J. Cbl / WRF Prfssr f Cmptr Scnc nd ngnrng t th nvrsty f Wshngtn. ftr gng t schl t Phllps cdmy, h rcvd bchlr's dgrs n bth Cmptr Scnc nd Bchmstry t Yl nvrsty n 1982."
Are all those dimensions (letters, pixels) necessary?
Dimensionality Reduction
Represent data with fewer dimensions!
  Easier learning: fewer parameters
    |Features| >> |training examples|??
  Better visualization
    hard to understand more than 3D or 4D
  Discover intrinsic dimensionality of data
    high dimensional data that is truly lower dimensional
Feature Selection
Want to learn f: X → Y
  X = <X_1, ..., X_n>
  but some features are more important than others
Approach: select a subset of features to be used by the learning algorithm
  Score each feature (or sets of features)
  Select the set of features with the best score
Greedy forward feature selection algorithm
Pick a dictionary of features
  e.g., polynomials for linear regression
Greedy: start from an empty (or simple) set of features F_0 = ∅
  Run the learning algorithm for the current set of features F_t
    Obtain h_t
  Select the next best feature X_i
    e.g., the X_j giving lowest held-out error when learning with F_t ∪ {X_j}
  F_{t+1} ← F_t ∪ {X_i}
  Repeat
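A sketch of the forward loop, assuming a hypothetical fit_and_heldout_error(features) helper that trains on the listed features and returns held-out error; the helper, names, and stopping rule are illustrative.

def greedy_forward_selection(all_features, fit_and_heldout_error, max_features):
    F = []                                         # F_0 = empty set of features
    best_err = fit_and_heldout_error(F)
    for _ in range(max_features):
        # try each remaining X_j, keep the one with lowest held-out error
        candidates = [(fit_and_heldout_error(F + [xj]), xj)
                      for xj in all_features if xj not in F]
        err, best_x = min(candidates, key=lambda c: c[0])
        if err >= best_err:                        # stop when no feature helps
            break
        F, best_err = F + [best_x], err            # F_{t+1} <- F_t U {X_i}
    return F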
Greedy backward feature selection algorithm
Pick a dictionary of features
  e.g., polynomials for linear regression
Greedy: start with all features F_0 = F
  Run the learning algorithm for the current set of features F_t
    Obtain h_t
  Select the next worst feature X_i
    e.g., the X_j giving lowest held-out error when learning with F_t − {X_j}
  F_{t+1} ← F_t − {X_i}
  Repeat
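The backward variant changes only the starting set and the update; a brief sketch reusing the same hypothetical helper as above.

def greedy_backward_selection(all_features, fit_and_heldout_error, min_features=1):
    F = list(all_features)                         # F_0 = all features
    while len(F) > min_features:
        # drop the "worst" feature: the X_j whose removal gives lowest held-out error
        candidates = [(fit_and_heldout_error([x for x in F if x != xj]), xj) for xj in F]
        err, worst = min(candidates, key=lambda c: c[0])
        F = [x for x in F if x != worst]           # F_{t+1} <- F_t - {X_i}
    return F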
Impact of feature selection on classification of fMRI data [Pereira et al. '05] (figure)
Feature Selection through Regularization
Previously, we discussed regularization with a squared norm: \lambda \|w\|_2^2
What if we used an L_1 norm, \lambda \|w\|_1, instead?
What about L_\infty?
These norms work, but are harder to optimize! And, it can be tricky to set \lambda!!!
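As one illustration of optimizing the L_1 penalty (which is not differentiable at 0), here is a minimal proximal-gradient (ISTA) sketch for L_1-regularized least squares; this is an assumed example, not from the slides, and the step size is arbitrary.

import numpy as np

def lasso_ista(X, y, lam, lr=0.01, iters=1000):
    # Minimize (1/(2m)) * ||Xw - y||^2 + lam * ||w||_1 by proximal gradient descent.
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / m                            # gradient of the smooth part
        w = w - lr * grad
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-threshold: zeros small weights
    return w

The soft-threshold step drives many weights exactly to zero, which is why the L_1 penalty acts as a form of feature selection.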
Regularization
(figure: loss contours around the unregularized optimum w* in the (w_1, w_2) plane, together with the L1 and L2 constraint regions)
Lower Dimensional Projections
Rather than picking a subset of the features, we can make new ones by combining existing features x_1 ... x_n
  New features are linear combinations of old ones
  Reduces dimension when k < n
Let's see this in the unsupervised setting
  just X, but no Y
Linear projection and reconstruction
(figure: 2-d points in the (x_1, x_2) plane projected into 1 dimension, giving coordinate z_1)
Reconstruction: knowing only z_1 and the projection weights, what was (x_1, x_2)?
Principal component analysis - basic idea
Project n-dimensional data into a k-dimensional space while preserving information:
  e.g., project a space of 10,000 words into 3 dimensions
  e.g., project 3-d into 2-d
Choose the projection with minimum reconstruction error
Linear projections, a review
Project a point into a (lower dimensional) space:
  point: x = (x_1, ..., x_n)
  select a basis - a set of unit (length 1) basis vectors (u_1, ..., u_k)
    we consider an orthonormal basis:
      u_i \cdot u_i = 1, and u_i \cdot u_j = 0 for i ≠ j
  select a center - \bar{x}, which defines the offset of the space
  the best coordinates in the lower dimensional space are given by dot products: (z_1, ..., z_k), with z_i = (x - \bar{x}) \cdot u_i
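A small NumPy sketch of these formulas; the 3-d point and the two-vector orthonormal basis are made-up values for illustration.

import numpy as np

x     = np.array([3.0, 4.0, 5.0])       # point x = (x_1, ..., x_n)
x_bar = np.array([1.0, 1.0, 1.0])       # center of the space
U     = np.array([[1.0, 0.0, 0.0],      # rows u_1, u_2: an orthonormal basis, k = 2
                  [0.0, 1.0, 0.0]])

z     = U @ (x - x_bar)                 # z_i = (x - x_bar) . u_i        ->  [2. 3.]
x_hat = x_bar + U.T @ z                 # reconstruction from k coordinates  ->  [3. 4. 1.]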
3D Data (figure)
Projection to 2D (figure)
PCA finds the projection that minimizes reconstruction error
Given m data points: x^i = (x_1^i, ..., x_n^i), i = 1...m
Will represent each point as a projection:
  \hat{x}^i = \bar{x} + \sum_{j=1}^{k} z_j^i u_j, \quad \text{where } z_j^i = (x^i - \bar{x}) \cdot u_j
PCA:
  Given k < n, find (u_1, ..., u_k) minimizing the reconstruction error:
    \mathrm{error}_k = \sum_{i=1}^{m} \| x^i - \hat{x}^i \|^2
(figure: points in the (x_1, x_2) plane with the principal direction u_1)
Understanding the reconstruction error
Note that x^i can be represented exactly by an n-dimensional projection:
  x^i = \bar{x} + \sum_{j=1}^{n} z_j^i u_j
Given k < n, find (u_1, ..., u_k) minimizing the reconstruction error:
  \mathrm{error}_k = \sum_{i=1}^{m} \| x^i - \hat{x}^i \|^2
Rewriting the error:
  \mathrm{error}_k = \sum_{i=1}^{m} \sum_{j=k+1}^{n} (z_j^i)^2
Aha! The error is the sum of squared weights that would have been used for the dimensions that are cut!!!
Reconstruction error and covariance matrix
Now, to find the u_j, we minimize:
  \sum_{i=1}^{m} \sum_{j=k+1}^{n} \left( (x^i - \bar{x}) \cdot u_j \right)^2 \;=\; m \sum_{j=k+1}^{n} u_j^T \Sigma u_j
with a Lagrange multiplier per u_j to ensure the basis stays orthonormal (u_j^T u_j = 1)
Take the derivative, set it equal to 0; the solutions are eigenvectors of the covariance matrix \Sigma
Minimizing reconstruction error and eigenvectors
Minimizing the reconstruction error is equivalent to picking an orthonormal basis (u_1, ..., u_n) minimizing:
  \mathrm{error}_k = \sum_{i=1}^{m} \sum_{j=k+1}^{n} \left( (x^i - \bar{x}) \cdot u_j \right)^2
Solutions: eigenvectors
So, minimizing the reconstruction error is equivalent to picking (u_{k+1}, ..., u_n) to be the eigenvectors with the smallest eigenvalues
And, our projection should be onto the (u_1, ..., u_k) with the largest eigenvalues
Basic PCA algorithm
Start from the m by n data matrix X
Recenter: subtract the mean from each row of X
  X_c ← X − \bar{X}
Compute the covariance matrix:
  \Sigma ← \frac{1}{m} X_c^T X_c
Find the eigenvectors and eigenvalues of \Sigma
Principal components: the k eigenvectors with the highest eigenvalues
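A minimal NumPy sketch of this algorithm; the function name is an assumption, and reconstruction would be x_bar + Z @ U.

import numpy as np

def pca_eig(X, k):
    # X: m x n data matrix, one row per data point.
    x_bar = X.mean(axis=0)
    Xc = X - x_bar                         # recenter: subtract mean from each row
    cov = Xc.T @ Xc / X.shape[0]           # covariance matrix Sigma (n x n)
    vals, vecs = np.linalg.eigh(cov)       # eigh: symmetric matrix, eigenvalues ascending
    order = np.argsort(vals)[::-1][:k]     # k eigenvectors with highest eigenvalues
    U = vecs[:, order].T                   # principal components, one per row
    Z = Xc @ U.T                           # coordinates of each point in the new basis
    return U, Z, x_bar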
PCA example (figure: Data, Projection, Reconstruction)
Handwriting (figure)
Two Principal Eigenvectors (figure)
Eigenfaces [Turk, Pentland '91]
Input images:
  N images
  each 50×50 pixels
  2500 features
Misleading figure. Best to think of it as an N × 2500 matrix: |Examples| × |Features|
Reduce Dimensionality 2500 → 15
(figure: the average face, the first principal component, and other components; for all except the average, gray = 0, white > 0, black < 0)
Using PCA for Compression
Store each face as the coefficients of its projection onto the first few principal components:
  \text{image} = \sum_{i=0}^{\max} a_i \cdot \text{Eigenface}_i
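A short sketch of encoding and decoding a face this way, assuming components U, mean face x_bar, and coordinates as produced by a PCA routine like the one above; treating the average face as a separate additive term is an assumption.

import numpy as np

def compress(face, U, x_bar):
    # store only the coefficients a_i of the projection onto the principal components
    return U @ (face - x_bar)

def decompress(coeffs, U, x_bar):
    # image = average face + sum_i a_i * Eigenface_i
    return x_bar + U.T @ coeffs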
Using PCA for Recognition
Compute the projections of the target image, compare to the database (nearest neighbor classifier)
Eigenfaces reconstruction
Each image corresponds to adding together the principal components (figure)
Scaling up
The covariance matrix can be really big!
  \Sigma is n × n
  10,000 features can be common!
  Finding eigenvectors is very slow...
Use singular value decomposition (SVD)
  finds the top k eigenvectors
  great implementations available, e.g., Matlab svd
SVD
Write X = W S V^T
  X - data matrix, one row per data point
  W - weight matrix, one row per data point - the coordinates of x^i in eigenspace
  S - singular value matrix, a diagonal matrix; in our setting each entry is an eigenvalue \lambda_j
  V^T - singular vector matrix; in our setting each row is an eigenvector v_j
SVD - notation change
  A = U \, \mathrm{diag}(w_1, \ldots, w_n) \, V^T
  (figure: the data matrix A written as the product of U, a diagonal matrix of singular values w_1 ... w_n, and V^T)
Treat as a black box: code widely available
In Matlab: [U,W,V] = svd(A,0)
PCA using SVD algorithm
Start from the m by n data matrix X
Recenter: subtract the mean from each row of X
  X_c ← X − \bar{X}
Call the SVD algorithm on X_c
  ask for k singular vectors
Principal components: the k singular vectors with the highest singular values (rows of V^T)
Coefficients: project each point onto the new vectors
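A minimal sketch of the SVD route using numpy.linalg.svd in place of Matlab's svd; the function name is illustrative.

import numpy as np

def pca_svd(X, k):
    # X: m x n data matrix, one row per data point.
    x_bar = X.mean(axis=0)
    Xc = X - x_bar                                     # recenter
    W, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # singular values come back sorted high to low
    components = Vt[:k]                                # k rows of V^T with the highest singular values
    coeffs = Xc @ components.T                         # project each point onto the new vectors
    return components, coeffs, x_bar

The eigenvalues of the covariance matrix from the previous algorithm are recovered as s**2 / m.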
Singular Value Decomposition (SVD)
Handy mathematical technique that has application to many problems
Given any m × n matrix A, an algorithm to find matrices U, V, and W such that
  A = U W V^T
  U is m × n and orthonormal
  W is n × n and diagonal
  V is n × n and orthonormal
SVD
The w_i are called the singular values of A
If A is singular, some of the w_i will be 0
In general rank(A) = the number of nonzero w_i
SVD is mostly unique (up to permutation of singular values, or if some w_i are equal)
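A quick NumPy check of the rank property; the 3×3 matrix is an arbitrary example with one linearly dependent row.

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],              # a multiple of the first row
              [1.0, 0.0, 1.0]])
w = np.linalg.svd(A, compute_uv=False)      # singular values only, sorted high to low
print(np.sum(w > 1e-10))                    # 2 nonzero singular values ...
print(np.linalg.matrix_rank(A))             # ... matching rank(A) = 2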
What you need to know
Dimensionality reduction
  why and when it's important
Simple feature selection
Regularization as a type of feature selection
Principal component analysis
  minimizing reconstruction error
  relationship to covariance matrix and eigenvectors
  using SVD