
CSE291D Lecture 8

Latent Linear Models

Latent variable models

[Diagram: graphical model with parameters Φ, latent variables Z, and observed data X, replicated over the data points]

Dimensionality(X) >> dimensionality(Z)

Z is a bottleneck, which finds a compressed, low-dimensional representation of X.
Mixture models

Discrete latent variables Z: cluster assignments

[Diagram: parameters Φ, latent Z, observed data X, replicated over the data points]
Latent linear models

Continuous latent variables Z: embeddings

[Diagram: parameters Φ, latent Z, observed data X, replicated over the data points]
Example: embedding of cars

Learning outcomes
By the end of the lesson, you should be able to:

• Apply a variety of latent linear models to solve modeling tasks

• Train these models using techniques such as EM
Principal components analysis (PCA)
[Diagram: the data matrix X (data points × features) factorized into a rank-K product of a score matrix Z and a loading matrix W]
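As a concrete illustration of this factorization view of PCA (a minimal sketch, not from the slides; the synthetic data and variable names are assumptions):

```python
import numpy as np

# PCA as matrix factorization: approximate the centered N x D data matrix X
# by Z @ W, where Z is N x K (scores) and W is K x D (principal directions).

rng = np.random.default_rng(0)
N, D, K = 200, 10, 2

# Synthetic data that lies near a K-dimensional subspace.
Z_true = rng.normal(size=(N, K))
W_true = rng.normal(size=(K, D))
X = Z_true @ W_true + 0.1 * rng.normal(size=(N, D))

mu = X.mean(axis=0)
Xc = X - mu                          # center the data

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:K]                           # K x D loading matrix (top principal directions)
Z = Xc @ W.T                         # N x K low-dimensional scores

X_hat = Z @ W + mu                   # rank-K reconstruction
print("relative reconstruction error:", np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```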
Latent variable models
as matrix factorization
[Diagram: the data matrix X as a rank-K factorization of a latent matrix Z and a weight matrix W]
Latent variable models
as matrix factorization
[Diagram: X ≈ f(Z W)]

• Flexible modeling via likelihood model f
• Corresponds to appropriate loss function
• Latent representations can control the semantics
• Priors, modeling of posterior uncertainty
• Missing data
• Principled extensions to the model
Latent variable models
as matrix factorization
[Diagram: X ≈ f(Z W)]

Do something linear, do something non-linear, slap on some probabilistic noise, and call it a day!
Gaussian mixture models
as matrix factorization
[Diagram: X ≈ f(Z W)]

Zi = binary 1-of-K indicator vector
Wj = cluster mean
f = Gaussian for each data point
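A minimal numeric sketch of this reading of a GMM (names and values are assumptions, not from the slides):

```python
import numpy as np

# A Gaussian mixture as the factorization X ~ f(Z W): each row of Z is a
# one-hot cluster indicator, each row of W is a cluster mean, so Z @ W picks
# out the mean of the assigned cluster and f adds Gaussian noise around it.

rng = np.random.default_rng(0)
N, K, D = 6, 3, 2

W = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])   # K x D cluster means
assignments = rng.integers(0, K, size=N)               # one cluster per data point
Z = np.eye(K)[assignments]                              # N x K one-hot indicators

X = Z @ W + 0.5 * rng.normal(size=(N, D))               # f = isotropic Gaussian noise
print(np.round(X, 2))
```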
Generalized linear models
as matrix factorization
[Diagram: y ≈ f(X w), with y a length-N vector of responses and w a length-D weight vector]

We observe one of the factors, X.
W is a vector, not a whole matrix.
Most of machine learning
as matrix factorization …

Social network models
as matrix factorization …

Deep neural networks
• Do something linear, do something non-linear, slap on some probabilistic noise, and call it a day; then do it again a few more times!

Figure from http://numericinsight.blogspot.com/2014/07/a-gentle-introduction-to-backpropagation.html


Continuous latent variables

[Diagram: X ≈ f(Z W)]

Zi = real-valued latent vector
Factor analysis
[Diagram: X ≈ f(Z W)]

Gaussian prior on z: p(zi) = N(zi | 0, I)

Gaussian likelihood: p(xi | zi) = N(xi | W zi + μ, Ψ), with Ψ diagonal
Factor analysis

[Figure: (1) draw a low-dimensional representation z; (2) a linear mapping from z-space to x-space; (3) the resulting marginal distribution of x, the "spray-can" picture]
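A minimal sketch of sampling from this generative process, assuming the standard factor analysis parameterization (z ~ N(0, I), x | z ~ N(Wz + μ, Ψ) with diagonal Ψ); the particular parameter values are illustrative:

```python
import numpy as np

# Sample from the factor analysis generative model: draw a low-dimensional z,
# map it linearly into x-space, then spray Gaussian noise around it
# (the "spray-can" picture).

rng = np.random.default_rng(0)
D, K, N = 2, 1, 500

W = np.array([[2.0], [1.0]])        # D x K factor loading matrix (assumed values)
mu = np.zeros(D)                    # mean of the observations
Psi = np.diag([0.1, 0.3])           # diagonal observation noise covariance

z = rng.normal(size=(N, K))                              # low-dimensional draws
noise = rng.multivariate_normal(np.zeros(D), Psi, size=N)
x = z @ W.T + mu + noise                                 # observed draws

# The implied marginal covariance of x is W W^T + Psi (low-rank + diagonal).
print("empirical cov:\n", np.cov(x.T))
print("model cov:\n", W @ W.T + Psi)
```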
Observations, latent variables are
jointly Gaussian

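Under that parameterization, z and x are jointly Gaussian (a standard result, written out here because the slide's equation is not reproduced in this transcript):

```latex
p\!\begin{pmatrix} z \\ x \end{pmatrix}
= \mathcal{N}\!\left(
\begin{pmatrix} 0 \\ \mu \end{pmatrix},\;
\begin{pmatrix} I & W^{\top} \\ W & W W^{\top} + \Psi \end{pmatrix}
\right)
```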
Factor analysis is a low-rank
parameterization of an MVN

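Marginalizing out z gives a Gaussian whose covariance is low-rank plus diagonal, so the model needs only O(DK) covariance parameters instead of the O(D²) of a full MVN (standard result):

```latex
p(x) = \int \mathcal{N}(x \mid W z + \mu,\, \Psi)\, \mathcal{N}(z \mid 0, I)\, dz
     = \mathcal{N}\!\left(x \mid \mu,\; W W^{\top} + \Psi\right)
```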
Probabilistic PCA
• A special case of factor analysis, where:
– the covariance of the likelihood is isotropic (spherical), Ψ = σ²I
– W is constrained to be orthonormal (as in PCA)
Probabilistic PCA
• Theorem: As σ² → 0, the model approaches PCA, in the sense that the MLE of W approaches the PCA solution.
Maximum likelihood for PPCA
• Log-likelihood (with C = W Wᵀ + σ²I and S the sample covariance):
  log p(X) = −(N/2) [ D log(2π) + log|C| + tr(C⁻¹ S) ]

• Incredibly, a closed-form solution exists! (Tipping and Bishop, 1999)
  W_ML = V_K (Λ_K − σ²I)^(1/2) R,
  where V_K holds the top-K eigenvectors of S, Λ_K the corresponding eigenvalues, and R is an arbitrary K×K rotation matrix.
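A minimal NumPy sketch of this closed-form estimator (function and variable names are assumptions; the synthetic data is only for demonstration):

```python
import numpy as np

# Closed-form ML estimate for probabilistic PCA (Tipping & Bishop, 1999):
# W_ML = V_K (Lambda_K - sigma^2 I)^(1/2) R, with V_K / Lambda_K the top-K
# eigenvectors / eigenvalues of the sample covariance and R an arbitrary
# rotation (taken to be the identity here).

def ppca_mle(X, K):
    N, D = X.shape
    mu = X.mean(axis=0)
    S = np.cov(X - mu, rowvar=False)              # D x D sample covariance
    eigvals, eigvecs = np.linalg.eigh(S)          # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

    sigma2 = eigvals[K:].mean()                   # MLE of the noise variance:
                                                  # average of the discarded eigenvalues
    V_K = eigvecs[:, :K]
    lambda_K = eigvals[:K]
    W = V_K @ np.diag(np.sqrt(lambda_K - sigma2))
    return mu, W, sigma2

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 5)) + 0.2 * rng.normal(size=(500, 5))
mu, W, sigma2 = ppca_mle(X, K=2)
print("sigma^2 =", sigma2)
print("model covariance:\n", W @ W.T + sigma2 * np.eye(5))
```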
EM for factor analysis
• The closed form solution
– can be expensive in high dimensions
– doesn’t work for general factor analysis
– doesn’t handle missing data

• Instead, we can use EM

EM for factor analysis
• E-step: Complete data log-likelihood

• The trace trick

EM for factor analysis
• E-step: Complete data log-likelihood

• Need to compute the first and second moments of the posterior over the z's, which is Gaussian
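Assuming the factor analysis parameterization used above, the posterior over each zᵢ and the moments needed for the E-step are (standard results):

```latex
p(z_i \mid x_i) = \mathcal{N}(z_i \mid m_i, \Sigma), \qquad
\Sigma = \left(I + W^{\top} \Psi^{-1} W\right)^{-1}, \qquad
m_i = \Sigma\, W^{\top} \Psi^{-1} (x_i - \mu)
```

so that E[zᵢ] = mᵢ and E[zᵢ zᵢᵀ] = Σ + mᵢ mᵢᵀ.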
EM for factor analysis
• M-step: take derivatives and set to 0

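A minimal EM sketch for factor analysis that uses the posterior moments above, assuming diagonal Ψ and taking μ to be the data mean; names and the synthetic data are illustrative, not from the slides:

```python
import numpy as np

# EM for factor analysis (minimal sketch).
# E-step: Gaussian posterior moments of each z_i.
# M-step: closed-form updates for W and the diagonal noise Psi.

def fa_em(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    W = rng.normal(size=(D, K))
    psi = np.ones(D)                                           # diagonal of Psi

    for _ in range(n_iter):
        # E-step: posterior N(m_i, Sigma) for each z_i
        Psi_inv = np.diag(1.0 / psi)
        Sigma = np.linalg.inv(np.eye(K) + W.T @ Psi_inv @ W)   # K x K
        M = Xc @ Psi_inv @ W @ Sigma                           # rows are the m_i
        Ezz = N * Sigma + M.T @ M                              # sum_i E[z_i z_i^T]

        # M-step: set derivatives of the expected complete-data log-likelihood to 0
        W = (Xc.T @ M) @ np.linalg.inv(Ezz)
        psi = np.mean(Xc**2, axis=0) - np.mean(Xc * (M @ W.T), axis=0)
        psi = np.maximum(psi, 1e-6)                            # keep variances positive
    return mu, W, np.diag(psi)

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 8)) + 0.3 * rng.normal(size=(500, 8))
mu, W, Psi = fa_em(X, K=3)
print("fitted marginal covariance:\n", W @ W.T + Psi)
```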
Mixtures of factor analyzers

qi = latent cluster assignment, drawn from the mixture proportions
Mixtures of factor analyzers

[Figure: data fit with 1 factor analyzer vs. a mixture of 10 FA's]
PCA for categorical data

Gaussian prior on z:

Discrete likelihood:

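One common way to write such a model (a sketch only; the exact likelihood on the slide is not recoverable, and the softmax link and offset b here are assumptions):

```latex
z_i \sim \mathcal{N}(0, I_K), \qquad
p(x_i \mid z_i) = \mathrm{Cat}\!\left(x_i \mid \mathrm{softmax}(W z_i + b)\right)
```

The softmax of a Gaussian latent variable is exactly what gives rise to the logistic normal distribution on the next slide.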
Logistic normal distribution

[Figure: three logistic normal distributions: (a) diagonal covariance, non-zero mean; (b) negative correlation between the bottom two states; (c) positive correlation between the bottom two states]
Independent component analysis (ICA)
• You are at a cocktail party and a bunch of
people are speaking at once
• You have two microphones (your ears!).
– Goal: separate the voices into different signals

Blind source separation
• Acoustic signal processing (e.g. Siri!)
• EEG data
• Financial data

PCA struggles for
blind source separation

PCA identifies the linear subspace, but cannot distinguish the correct rotation.
Independent component analysis
[Diagram: source signals Z (sources × time) are mixed into sensor signals X (sensors × time) through a mixing matrix W]

• Key assumptions
– 1) the latent variables are independent
– 2) the prior on the latent variables is non-Gaussian, to break rotational symmetry
Independent component analysis
• Prior for the latent variables

• A common choice is the following heavy-tailed distribution:

• Corresponds to a Gaussian, with a non-linear mapping
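A frequently used heavy-tailed choice in the ICA literature is the logistic (sech²) density, whose cdf is the sigmoid; whether this is the exact distribution on the slide is an assumption:

```latex
p(z_j) = \sigma(z_j)\,\bigl(1 - \sigma(z_j)\bigr)
       = \tfrac{1}{4}\,\mathrm{sech}^{2}\!\left(z_j / 2\right),
\qquad \sigma(a) = \frac{1}{1 + e^{-a}}
```

Its tails decay only exponentially (heavier than a Gaussian), and its negative score, 2σ(z) − 1 = tanh(z/2), is the kind of non-linearity that appears in the ICA training rule later.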
Independent component analysis
• Likelihood: linear-Gaussian

• Often the noise level is chosen to be 0, which simplifies the analysis
PCA struggles for
blind source separation

Training ICA
• Gradient ascent (possible in noise-free case)
• Newton’s method
• Non model-based estimation principles
– Maximize non-Gaussianity
– Minimize mutual information

• Natural gradient

Training ICA
• Natural gradient ascent:
– Operate on "recognition weights" W
• Algorithm (a minimal code sketch follows this list):
1. Put x through a linear mapping: a = W x
2. Put a through a non-linear map
3. Put a back through W
4. Adjust the weights
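A minimal sketch of these four steps for the square, noise-free case, using tanh as the non-linear map and synthetic Laplace sources; all names, constants, and the choice of non-linearity are assumptions, not from the slides:

```python
import numpy as np

# Natural-gradient ICA sketch (noise-free, square case). W is the recognition
# (unmixing) matrix; the update is W <- W + eta * (I + z a^T) W averaged over a
# mini-batch, with z = -tanh(a), i.e. natural-gradient ascent on the likelihood.

rng = np.random.default_rng(0)
T, K = 20000, 2

S = rng.laplace(size=(K, T))                 # two independent heavy-tailed sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])       # unknown mixing matrix
X = A @ S                                    # sensors x time

W = np.eye(K)                                # recognition weights
eta = 0.02

for _ in range(3000):
    idx = rng.integers(0, T, size=256)       # mini-batch of time steps
    x = X[:, idx]
    a = W @ x                                # 1) put x through a linear mapping
    z = -np.tanh(a)                          # 2) put a through a non-linear map
    xp = W.T @ a                             # 3) put a back through W
    W += eta * (W + (z @ xp.T) / idx.size)   # 4) adjust the weights

# If the sources were separated, W @ A is approximately a scaled permutation matrix.
print(np.round(W @ A, 2))
```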
Think-pair-share
• You are a financial analyst and you want to understand the
behavior of the stock market by finding interpretable latent
structure in stock prices (natural disasters, improving
economy, growth sectors,…). Design a latent linear modeling
system to accomplish this. Consider questions such as:

– Prior
– Likelihood
– Any constraints on the latent representation (sparse, binary, sums to one, …)
– Data, preprocessing, post-processing
– How to evaluate the model
– Can you find a way to include time dependence?

