
Non-Negative Matrix

Factorization
Marshall Tappen
6.899
Problem Statement
Given a set of images:
1. Create a set of basis images that can be
linearly combined to create new images
2. Find the set of weights to reproduce
every input image from the basis images
One set of weights for each input image
Three ways to do this are discussed:
Vector Quantization
Principal Components Analysis
Non-negative Matrix Factorization

Each method optimizes a different aspect
Vector Quantization
The reconstructed image is the basis image
that is closest to the input image.
What's wrong with VQ?
Limited by the number of basis images
Not very useful for analysis
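A minimal NumPy sketch of the VQ reconstruction idea (assuming images are stored as rows of a matrix; the helper name vq_reconstruct is mine, not from the paper):

```python
import numpy as np

def vq_reconstruct(images, basis):
    """Reconstruct each image as the single closest basis image."""
    # Squared Euclidean distance from every image to every basis image
    dists = ((images[:, None, :] - basis[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)   # index of the closest basis image
    return basis[nearest]            # the reconstruction IS that basis image
```

This makes the limitation visible: with k basis images there are only k possible reconstructions.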
PCA
Find a set of orthogonal basis images
The reconstructed image is a linear combination
of the basis images
What don't we like about PCA?
PCA involves adding up some basis
images then subtracting others
Basis images aren't physically intuitive
Subtracting doesn't make sense in the context of some applications
How do you subtract a face?
What does subtraction mean in the context of
document classification?
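A quick numerical illustration of the subtraction problem, using random non-negative data standing in for images (not the paper's face data):

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((100, 64))            # 100 "images" of 64 pixels, all non-negative

# PCA via SVD of the mean-centered data
U, S, Wt = np.linalg.svd(V - V.mean(axis=0), full_matrices=False)
coeffs = U * S                       # per-image reconstruction coefficients

print((coeffs < 0).any())            # True: some basis images get subtracted
print((Wt < 0).any())                # True: basis images themselves have negative pixels
```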

Non-negative Matrix Factorization
Like PCA, except the coefficients in the linear
combination cannot be negative
NMF Basis Images
Allowing only the addition of basis images makes intuitive sense
Has physical analogue in neurons
Forcing the reconstruction coefficients to
be positive leads to nice basis images
To reconstruct images, all you can do is add
in more basis images
This leads to basis images that represent
parts
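As a sketch of how this looks in practice with scikit-learn's NMF (random data stands in for the face images; note that sklearn factors V ≈ WH with images as rows, so its W holds the per-image weights and components_ holds the basis images):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
V = rng.random((100, 64))                  # 100 images as rows, pixels >= 0

model = NMF(n_components=10, init="random", random_state=0, max_iter=500)
weights = model.fit_transform(V)           # non-negative weights, one row per image
parts = model.components_                  # 10 non-negative basis images

recon = weights @ parts                    # reconstruction: weighted sums only
assert (weights >= 0).all() and (parts >= 0).all()
```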

PCA vs NMF
PCA: designed for producing basis images that are optimal in some sense
Just because it's optimal doesn't mean it's good for your application
NMF: designed for producing coefficients with a specific property
Forcing the coefficients to behave induces nice basis images
No SI unit for "nice"
The cool idea
By constraining the weights, we can
control how the basis images wind up

In this case, constraining the weights
leads to parts-based basis images
Objective function
Let the value of a pixel in an original input image be V_i, and let (WH)_i
be the reconstructed pixel. If we consider V_i to be a noisy version of
(WH)_i, then the PDF of V_i is

$$P(V_i \mid (WH)_i) = \frac{(WH)_i^{V_i} \, e^{-(WH)_i}}{V_i!}$$

Now we will maximize the log probability of this PDF over W and H, leaving
the relevant objective function to be:

$$F = \sum_i \left[ V_i \log (WH)_i - (WH)_i \right]$$
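A direct transcription of this objective in NumPy (the function name and the small eps guard against log 0 are my additions):

```python
import numpy as np

def nmf_objective(V, W, H, eps=1e-12):
    """F = sum_i [ V_i * log(WH)_i - (WH)_i ], dropping the constant log(V_i!)."""
    R = W @ H                              # reconstructed pixels (WH)_i
    return np.sum(V * np.log(R + eps) - R)
```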
How do we derive the update rules?
This is in the NIPS paper.
(I'm going to change the error function to

$$F = \|V - WH\|^2$$

to match the NIPS paper.)
Use gradient descent to find a local minimum
The gradient descent update rule is:

$$H_{a\mu} \leftarrow H_{a\mu} - \eta_{a\mu} \frac{\partial F}{\partial H_{a\mu}}$$
Deriving Update Rules
Gradient Descent Rule:

$$H_{a\mu} \leftarrow H_{a\mu} + \eta_{a\mu} \left[ (W^T V)_{a\mu} - (W^T W H)_{a\mu} \right]$$

Set

$$\eta_{a\mu} = \frac{H_{a\mu}}{(W^T W H)_{a\mu}}$$

The update rule becomes

$$H_{a\mu} \leftarrow H_{a\mu} \frac{(W^T V)_{a\mu}}{(W^T W H)_{a\mu}}$$
What's significant about this?
This is a multiplicative update instead of
an additive update.
If the initial values of W and H are all non-negative, then W and H can never become negative.
This lets us produce a non-negative
factorization
(See NIPS Paper for full proof that this will
converge)
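A sketch of the full multiplicative-update loop for the squared error, following the rule above (the symmetric update for W, the eps terms, and the random initialization are standard choices, not spelled out on this slide):

```python
import numpy as np

def nmf_multiplicative(V, r, n_iter=200, eps=1e-12, seed=0):
    """Factor V ~= W @ H with W, H >= 0 via Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], r))            # non-negative start, so updates stay non-negative
    H = rng.random((r, V.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # multiply, never subtract
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```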
How do we know that this will converge?
If F is the objective function, let G be an auxiliary function, i.e. a function satisfying

$$G(h, h') \ge F(h), \qquad G(h, h) = F(h)$$

If G is an auxiliary function of F, then F is non-increasing under the update

$$h^{t+1} = \arg\min_h G(h, h^t)$$

[Figure: the auxiliary function G(h, h^t) lies above F(h) and touches it at h = h^t]
How do we know that this will converge?
Let the auxiliary function be

$$G(h, h^t) = F(h^t) + (h - h^t)^T \nabla F(h^t) + \frac{1}{2} (h - h^t)^T K(h^t) (h - h^t)$$

where $K(h^t)$ is the diagonal matrix with $K_{ab}(h^t) = \delta_{ab} \, (W^T W h^t)_a / h_a^t$.

Then the update is

$$h^{t+1} = \arg\min_h G(h, h^t) = h^t - K(h^t)^{-1} \nabla F(h^t)$$

Which results in the update rule:

$$h_a^{t+1} = h_a^t \, \frac{(W^T v)_a}{(W^T W h^t)_a}$$
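The monotonicity is easy to check empirically; a small test on random data (the 1e-9 slack allows for floating-point noise):

```python
import numpy as np

rng = np.random.default_rng(1)
V, W, H = rng.random((50, 40)), rng.random((50, 5)), rng.random((5, 40))

prev = np.inf
for _ in range(100):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    loss = np.sum((V - W @ H) ** 2)
    assert loss <= prev + 1e-9         # non-increasing, as the proof guarantees
    prev = loss
```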
Main Contributions
Idea that representations which allow
negative weights do not make sense in
some applications
A simple system for producing basis
images with non-negative weights
Points out that this leads to basis images
that are based on parts
A larger point here is that by constraining the
problem in new ways, we can induce nice
properties
Mel's Commentary
Most significant point:
NMF's non-negativity constraint is well-matched to our intuitive ideas about decomposition into parts
Second: basis images are better
Third: simple system
Mel's Caveats
Comparison of NMF to PCA is subjective
Basis images don't necessarily correspond to parts as we think of them.
Subtraction may actually occur in the brain
Some neurons are negative versions of
others

A Quick Review of Linear Algebra
Every vector can be expressed as a linear combination of basis vectors



Can think of images as big vectors
(Raster scan the image into a vector)


This means we can express an image as a linear combination of a set of basis images:
$$0 \begin{pmatrix} 2 \\ 6 \end{pmatrix} + 2 \begin{pmatrix} 3 \\ 1 \end{pmatrix} = \begin{pmatrix} 6 \\ 2 \end{pmatrix} \qquad \text{or} \qquad 6 \begin{pmatrix} 1 \\ 0 \end{pmatrix} + 2 \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 6 \\ 2 \end{pmatrix}$$
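The same decomposition checked in NumPy (a sketch; the basis matrix B with the basis vectors as columns is my notation):

```python
import numpy as np

v = np.array([6.0, 2.0])
B = np.array([[2.0, 3.0],
              [6.0, 1.0]])             # basis vectors (2,6) and (3,1) as columns
c = np.linalg.solve(B, v)              # -> [0., 2.], the coefficients above
print(B @ c)                           # [6., 2.] -- recovers the original vector

# "Raster scan the image into a vector": flattening makes any image a big vector
img = np.arange(12).reshape(3, 4)
print(img.reshape(-1).shape)           # (12,)
```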
Where do those update rules
come from???
How this is a matrix problem
V = W × H
Basis Images (W): the basis pixels that will be combined into the reconstructed pixel
Reconstruction Weights (H): the weights used to combine the basis images into the reconstructed pixel
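A shapes-only sketch of V = W × H under this slide's convention (images as columns of V; the sizes here are made up):

```python
import numpy as np

n_pixels, n_images, r = 64, 100, 10
rng = np.random.default_rng(0)
W = rng.random((n_pixels, r))          # each column of W is one basis image
H = rng.random((r, n_images))          # each column of H weights the basis images
V = W @ H                              # each column of V is one reconstructed image
print(V.shape)                         # (64, 100)
```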
