
CSE291D Lecture 8

Latent Linear Models

Latent variable models

[Diagram: graphical model with parameters Φ, latent variables Z, and observed data X, replicated over the data points]

Dimensionality(X) >> dimensionality(Z)

Z is a bottleneck, which finds a compressed, low-dimensional representation of X.
Mixture models

Discrete latent variables Z: cluster assignments

[Diagram: parameters Φ, latent Z, observed data X, replicated over the data points]
Latent linear models

Continuous latent variables Z: embeddings

[Diagram: parameters Φ, latent Z, observed data X, replicated over the data points]
Example: embedding of cars

Learning outcomes
By the end of the lesson, you should be able to:

• Apply a variety of latent linear models to solve modeling tasks

• Train these models using techniques such as EM
Principal components analysis (PCA)
[Diagram: the data matrix X (data points × features) factorized into a rank-K product of a score matrix Z and a loading matrix W]
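As a concrete illustration of this factorization view of PCA (a minimal sketch, not from the slides; the synthetic data and variable names are assumptions):

```python
import numpy as np

# PCA as matrix factorization: approximate the centered N x D data matrix X
# by Z @ W, where Z is N x K (scores) and W is K x D (principal directions).

rng = np.random.default_rng(0)
N, D, K = 200, 10, 2

# Synthetic data that lies near a K-dimensional subspace.
Z_true = rng.normal(size=(N, K))
W_true = rng.normal(size=(K, D))
X = Z_true @ W_true + 0.1 * rng.normal(size=(N, D))

mu = X.mean(axis=0)
Xc = X - mu                          # center the data

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:K]                           # K x D loading matrix (top principal directions)
Z = Xc @ W.T                         # N x K low-dimensional scores

X_hat = Z @ W + mu                   # rank-K reconstruction
print("relative reconstruction error:", np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```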
Latent variable models
as matrix factorization
[Diagram: the data matrix X as a rank-K factorization of a latent matrix Z and a weight matrix W]
Latent variable models
as matrix factorization
[Diagram: X ≈ f(Z W)]

• Flexible modeling via likelihood model f
• Corresponds to appropriate loss function
• Latent representations can control the semantics
• Priors, modeling of posterior uncertainty
• Missing data
• Principled extensions to the model
Latent variable models
as matrix factorization
[Diagram: X ≈ f(Z W)]

Do something linear, do something non-linear, slap on some probabilistic noise, and call it a day!
Gaussian mixture models
as matrix factorization
[Diagram: X ≈ f(Z W)]

Zi = binary 1-of-K indicator vector
Wj = cluster mean
f = Gaussian for each data point
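A minimal numeric sketch of this reading of a GMM (names and values are assumptions, not from the slides):

```python
import numpy as np

# A Gaussian mixture as the factorization X ~ f(Z W): each row of Z is a
# one-hot cluster indicator, each row of W is a cluster mean, so Z @ W picks
# out the mean of the assigned cluster and f adds Gaussian noise around it.

rng = np.random.default_rng(0)
N, K, D = 6, 3, 2

W = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])   # K x D cluster means
assignments = rng.integers(0, K, size=N)               # one cluster per data point
Z = np.eye(K)[assignments]                              # N x K one-hot indicators

X = Z @ W + 0.5 * rng.normal(size=(N, D))               # f = isotropic Gaussian noise
print(np.round(X, 2))
```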
Generalized linear models
as matrix factorization
[Diagram: y ≈ f(X w), with y a length-N vector of responses and w a length-D weight vector]

We observe one of the factors, X.
W is a vector, not a whole matrix.
Most of machine learning
as matrix factorization …

Social network models
as matrix factorization …

Deep neural networks
• Do something linear, do something non-linear, slap on some probabilistic noise, and call it a day; then do it again a few more times!

Figure from http://numericinsight.blogspot.com/2014/07/a-gentle-introduction-to-backpropagation.html


Continuous latent variables

[Diagram: X ≈ f(Z W)]

Zi = real-valued latent vector
Factor analysis
[Diagram: X ≈ f(Z W)]

Gaussian prior on z: p(zi) = N(zi | 0, I)

Gaussian likelihood: p(xi | zi) = N(xi | W zi + μ, Ψ), with Ψ diagonal
Factor analysis

[Figure: (1) draw a low-dimensional representation z; (2) a linear mapping from z-space to x-space; (3) the resulting marginal distribution of x, the "spray-can" picture]
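A minimal sketch of sampling from this generative process, assuming the standard factor analysis parameterization (z ~ N(0, I), x | z ~ N(Wz + μ, Ψ) with diagonal Ψ); the particular parameter values are illustrative:

```python
import numpy as np

# Sample from the factor analysis generative model: draw a low-dimensional z,
# map it linearly into x-space, then spray Gaussian noise around it
# (the "spray-can" picture).

rng = np.random.default_rng(0)
D, K, N = 2, 1, 500

W = np.array([[2.0], [1.0]])        # D x K factor loading matrix (assumed values)
mu = np.zeros(D)                    # mean of the observations
Psi = np.diag([0.1, 0.3])           # diagonal observation noise covariance

z = rng.normal(size=(N, K))                              # low-dimensional draws
noise = rng.multivariate_normal(np.zeros(D), Psi, size=N)
x = z @ W.T + mu + noise                                 # observed draws

# The implied marginal covariance of x is W W^T + Psi (low-rank + diagonal).
print("empirical cov:\n", np.cov(x.T))
print("model cov:\n", W @ W.T + Psi)
```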
Observations, latent variables are
jointly Gaussian

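Under that parameterization, z and x are jointly Gaussian (a standard result, written out here because the slide's equation is not reproduced in this transcript):

```latex
p\!\begin{pmatrix} z \\ x \end{pmatrix}
= \mathcal{N}\!\left(
\begin{pmatrix} 0 \\ \mu \end{pmatrix},\;
\begin{pmatrix} I & W^{\top} \\ W & W W^{\top} + \Psi \end{pmatrix}
\right)
```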
Factor analysis is a low-rank
parameterization of an MVN

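Marginalizing out z gives a Gaussian whose covariance is low-rank plus diagonal, so the model needs only O(DK) covariance parameters instead of the O(D²) of a full MVN (standard result):

```latex
p(x) = \int \mathcal{N}(x \mid W z + \mu,\, \Psi)\, \mathcal{N}(z \mid 0, I)\, dz
     = \mathcal{N}\!\left(x \mid \mu,\; W W^{\top} + \Psi\right)
```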
Probabilistic PCA
• A special case of factor analysis, where:
– the covariance of the likelihood is isotropic (spherical), Ψ = σ²I
– W is constrained to be orthonormal (as in PCA)
Probabilistic PCA
• Theorem: As σ² → 0, the model approaches PCA, in the sense that the MLE of W approaches the PCA solution.
Maximum likelihood for PPCA
• Log-likelihood (with C = W Wᵀ + σ²I and S the sample covariance):
  log p(X) = −(N/2) [ D log(2π) + log|C| + tr(C⁻¹ S) ]

• Incredibly, a closed-form solution exists! (Tipping and Bishop, 1999)
  W_ML = V_K (Λ_K − σ²I)^(1/2) R,
  where V_K holds the top-K eigenvectors of S, Λ_K the corresponding eigenvalues, and R is an arbitrary K×K rotation matrix.
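A minimal NumPy sketch of this closed-form estimator (function and variable names are assumptions; the synthetic data is only for demonstration):

```python
import numpy as np

# Closed-form ML estimate for probabilistic PCA (Tipping & Bishop, 1999):
# W_ML = V_K (Lambda_K - sigma^2 I)^(1/2) R, with V_K / Lambda_K the top-K
# eigenvectors / eigenvalues of the sample covariance and R an arbitrary
# rotation (taken to be the identity here).

def ppca_mle(X, K):
    N, D = X.shape
    mu = X.mean(axis=0)
    S = np.cov(X - mu, rowvar=False)              # D x D sample covariance
    eigvals, eigvecs = np.linalg.eigh(S)          # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

    sigma2 = eigvals[K:].mean()                   # MLE of the noise variance:
                                                  # average of the discarded eigenvalues
    V_K = eigvecs[:, :K]
    lambda_K = eigvals[:K]
    W = V_K @ np.diag(np.sqrt(lambda_K - sigma2))
    return mu, W, sigma2

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 5)) + 0.2 * rng.normal(size=(500, 5))
mu, W, sigma2 = ppca_mle(X, K=2)
print("sigma^2 =", sigma2)
print("model covariance:\n", W @ W.T + sigma2 * np.eye(5))
```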
EM for factor analysis
• The closed form solution
– can be expensive in high dimensions
– doesn’t work for general factor analysis
– doesn’t handle missing data

• Instead, we can use EM

EM for factor analysis
• E-step: Complete data log-likelihood

• The trace trick

EM for factor analysis
• E-step: Complete data log-likelihood

• Need to compute the first and second moments of the posterior over the z's, which is Gaussian
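Assuming the factor analysis parameterization used above, the posterior over each zᵢ and the moments needed for the E-step are (standard results):

```latex
p(z_i \mid x_i) = \mathcal{N}(z_i \mid m_i, \Sigma), \qquad
\Sigma = \left(I + W^{\top} \Psi^{-1} W\right)^{-1}, \qquad
m_i = \Sigma\, W^{\top} \Psi^{-1} (x_i - \mu)
```

so that E[zᵢ] = mᵢ and E[zᵢ zᵢᵀ] = Σ + mᵢ mᵢᵀ.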
EM for factor analysis
• M-step: take derivatives and set to 0

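A minimal EM sketch for factor analysis that uses the posterior moments above, assuming diagonal Ψ and taking μ to be the data mean; names and the synthetic data are illustrative, not from the slides:

```python
import numpy as np

# EM for factor analysis (minimal sketch).
# E-step: Gaussian posterior moments of each z_i.
# M-step: closed-form updates for W and the diagonal noise Psi.

def fa_em(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    W = rng.normal(size=(D, K))
    psi = np.ones(D)                                           # diagonal of Psi

    for _ in range(n_iter):
        # E-step: posterior N(m_i, Sigma) for each z_i
        Psi_inv = np.diag(1.0 / psi)
        Sigma = np.linalg.inv(np.eye(K) + W.T @ Psi_inv @ W)   # K x K
        M = Xc @ Psi_inv @ W @ Sigma                           # rows are the m_i
        Ezz = N * Sigma + M.T @ M                              # sum_i E[z_i z_i^T]

        # M-step: set derivatives of the expected complete-data log-likelihood to 0
        W = (Xc.T @ M) @ np.linalg.inv(Ezz)
        psi = np.mean(Xc**2, axis=0) - np.mean(Xc * (M @ W.T), axis=0)
        psi = np.maximum(psi, 1e-6)                            # keep variances positive
    return mu, W, np.diag(psi)

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 8)) + 0.3 * rng.normal(size=(500, 8))
mu, W, Psi = fa_em(X, K=3)
print("fitted marginal covariance:\n", W @ W.T + Psi)
```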
Mixtures of factor analyzers

qi = latent cluster assignment, drawn from the mixture proportions
Mixtures of factor analyzers

[Figure: data fit with 1 factor analyzer vs. a mixture of 10 FA's]
PCA for categorical data

Gaussian prior on z:

Discrete likelihood:

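One common way to write such a model (a sketch only; the exact likelihood on the slide is not recoverable, and the softmax link and offset b here are assumptions):

```latex
z_i \sim \mathcal{N}(0, I_K), \qquad
p(x_i \mid z_i) = \mathrm{Cat}\!\left(x_i \mid \mathrm{softmax}(W z_i + b)\right)
```

The softmax of a Gaussian latent variable is exactly what gives rise to the logistic normal distribution on the next slide.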
Logistic normal distribution

[Figure: three logistic normal distributions: (a) diagonal covariance, non-zero mean; (b) negative correlation between the bottom two states; (c) positive correlation between the bottom two states]
Independent component analysis (ICA)
• You are at a cocktail party and a bunch of
people are speaking at once
• You have two microphones (your ears!).
– Goal: separate the voices into different signals

Blind source separation
• Acoustic signal processing (e.g. Siri!)
• EEG data
• Financial data

PCA struggles for
blind source separation

PCA identifies the linear subspace, but cannot distinguish the correct rotation.
Independent component analysis
[Diagram: source signals Z (sources × time) are mixed into sensor signals X (sensors × time) through a mixing matrix W]

• Key assumptions
– 1) the latent variables are independent
– 2) the prior on the latent variables is non-Gaussian, to break rotational symmetry
Independent component analysis
• Prior for the latent variables

• A common choice is the following heavy-tailed distribution:

• Corresponds to a Gaussian, with a non-linear mapping
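A frequently used heavy-tailed choice in the ICA literature is the logistic (sech²) density, whose cdf is the sigmoid; whether this is the exact distribution on the slide is an assumption:

```latex
p(z_j) = \sigma(z_j)\,\bigl(1 - \sigma(z_j)\bigr)
       = \tfrac{1}{4}\,\mathrm{sech}^{2}\!\left(z_j / 2\right),
\qquad \sigma(a) = \frac{1}{1 + e^{-a}}
```

Its tails decay only exponentially (heavier than a Gaussian), and its negative score, 2σ(z) − 1 = tanh(z/2), is the kind of non-linearity that appears in the ICA training rule later.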
Independent component analysis
• Likelihood: linear-Gaussian

• Often the noise level is chosen to be 0, which simplifies the analysis
PCA struggles for
blind source separation

Training ICA
• Gradient ascent (possible in noise-free case)
• Newton’s method
• Non model-based estimation principles
– Maximize non-Gaussianity
– Minimize mutual information

• Natural gradient

Training ICA
• Natural gradient ascent:
– Operate on "recognition weights" W
• Algorithm (a minimal code sketch follows this list):
1. Put x through a linear mapping: a = W x
2. Put a through a non-linear map
3. Put a back through W
4. Adjust the weights
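A minimal sketch of these four steps for the square, noise-free case, using tanh as the non-linear map and synthetic Laplace sources; all names, constants, and the choice of non-linearity are assumptions, not from the slides:

```python
import numpy as np

# Natural-gradient ICA sketch (noise-free, square case). W is the recognition
# (unmixing) matrix; the update is W <- W + eta * (I + z a^T) W averaged over a
# mini-batch, with z = -tanh(a), i.e. natural-gradient ascent on the likelihood.

rng = np.random.default_rng(0)
T, K = 20000, 2

S = rng.laplace(size=(K, T))                 # two independent heavy-tailed sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])       # unknown mixing matrix
X = A @ S                                    # sensors x time

W = np.eye(K)                                # recognition weights
eta = 0.02

for _ in range(3000):
    idx = rng.integers(0, T, size=256)       # mini-batch of time steps
    x = X[:, idx]
    a = W @ x                                # 1) put x through a linear mapping
    z = -np.tanh(a)                          # 2) put a through a non-linear map
    xp = W.T @ a                             # 3) put a back through W
    W += eta * (W + (z @ xp.T) / idx.size)   # 4) adjust the weights

# If the sources were separated, W @ A is approximately a scaled permutation matrix.
print(np.round(W @ A, 2))
```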
Think-pair-share
• You are a financial analyst and you want to understand the
behavior of the stock market by finding interpretable latent
structure in stock prices (natural disasters, improving
economy, growth sectors,…). Design a latent linear modeling
system to accomplish this. Consider questions such as:

– Prior
– Likelihood
– Any constraints on the latent representation (sparse, binary, sums to one, …)
– Data, preprocessing, post-processing
– How to evaluate the model
– Can you find a way to include time dependence?

