Latent variable models
• Discrete latent variables z: cluster assignments
Latent linear models
• Continuous latent variables z: embeddings
Example: embedding of cars
Learning outcomes
By the end of the lesson, you should be able to:
Principal components analysis (PCA)
[Diagram: X (Features × Data points) ≈ W (Features × K) · Z (K × Data points)]
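A minimal numpy sketch of this factorization (shapes follow the diagram; the variable names and toy data are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, K = 5, 200, 2                       # features, data points, latent dim

X = rng.normal(size=(D, N))               # toy data, columns = data points
Xc = X - X.mean(axis=1, keepdims=True)    # center each feature

# The rank-K truncated SVD gives the best rank-K factorization in squared error
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
W = U[:, :K]                              # D x K principal directions
Z = np.diag(s[:K]) @ Vt[:K]               # K x N latent coordinates (embeddings)

X_hat = W @ Z                             # rank-K reconstruction
print("reconstruction error:", np.linalg.norm(Xc - X_hat))
```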
Latent variable models
as matrix factorization
[Diagram: X ≈ W Z, with X (Features × Data points), W (Features × K), Z (K × Data points)]
Gaussian mixture models
as matrix factorization
[Diagram: X ≈ W Z, where each column of Z is a one-hot cluster assignment and the columns of W are the cluster means]
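A small sketch of this reading of the diagram, where Z's columns are one-hot cluster assignments (fit with scikit-learn; the toy data are my own):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
K, D = 3, 2
X = np.concatenate([rng.normal(loc=c, scale=0.5, size=(100, D))
                    for c in ([0, 0], [5, 0], [0, 5])])   # N x D toy clusters

gmm = GaussianMixture(n_components=K, random_state=0).fit(X)

W = gmm.means_.T                   # D x K: cluster means as columns
Z = np.eye(K)[gmm.predict(X)].T    # K x N: one-hot cluster assignments

# Each point is reconstructed by its cluster mean: X^T is approximately W Z
X_hat = (W @ Z).T
print("mean reconstruction error:", np.mean(np.linalg.norm(X - X_hat, axis=1)))
```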
Generalized linear models
as matrix factorization
[Diagram: y (1 × Data points) ≈ wᵀ X, with X (D × Data points)]
• We observe one of the factors, X
• w is a vector, not a whole matrix
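A quick sketch of the same picture for linear regression (a GLM with Gaussian likelihood), where the factorization is rank one (toy data and names are my own):

```python
import numpy as np

rng = np.random.default_rng(2)
D, N = 4, 100
X = rng.normal(size=(D, N))                  # observed factor: D x N
w_true = rng.normal(size=D)
y = w_true @ X + 0.1 * rng.normal(size=N)    # responses: 1 x N

# Least squares recovers the weight *vector* w (D x 1, not D x K)
w_hat, *_ = np.linalg.lstsq(X.T, y, rcond=None)
print("max |w^T X - y|:", np.max(np.abs(w_hat @ X - y)))
```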
Most of machine learning
as matrix factorization …
Social network models
as matrix factorization …
Deep neural networks
• Do something linear, do something non-linear, slap on some probabilistic noise, and call it a day... then do it again a few more times!
[Diagram: the same X ≈ W Z factorization, repeated at each layer]
Factor analysis
[Diagram: X (Features × Data points) ≈ W (Features × K) · Z (K × Data points)]
• Gaussian prior on z: p(z) = N(z | 0, I)
• Gaussian likelihood: p(x | z) = N(x | W z + μ, Ψ), with Ψ diagonal
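A sketch of ancestral sampling from this generative model, using the parameterization above (the toy parameters are my own):

```python
import numpy as np

rng = np.random.default_rng(3)
D, K, N = 4, 2, 2000

W = rng.normal(size=(D, K))                      # factor loadings
mu = np.zeros(D)
Psi = np.diag(rng.uniform(0.1, 0.5, size=D))     # diagonal observation noise

Z = rng.normal(size=(K, N))                      # prior: z ~ N(0, I)
noise = rng.multivariate_normal(np.zeros(D), Psi, size=N).T
X = W @ Z + mu[:, None] + noise                  # likelihood: x|z ~ N(Wz + mu, Psi)

# Sanity check: the marginal covariance should approach W W^T + Psi
print("max |cov(X) - (W W^T + Psi)|:",
      np.max(np.abs(np.cov(X) - (W @ W.T + Psi))))
```

The sanity check previews the next slides: marginalizing out z leaves a Gaussian with covariance W Wᵀ + Ψ.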
Factor analysis
• Intuition: a Gaussian "spray-can" sprays observation noise around each point W z as it moves along the latent subspace
Observations and latent variables are jointly Gaussian
• Marginalizing out z: p(x) = N(x | μ, W Wᵀ + Ψ)
Factor analysis is a low-rank
parameterization of an MVN
• The covariance W Wᵀ + Ψ uses O(DK) parameters rather than the O(D²) of a general MVN
Probabilistic PCA
• A special case of factor analysis, where:
– the covariance of the likelihood is isotropic (spherical): Ψ = σ² I
Probabilistic PCA
• Theorem: as σ² → 0, the model approaches PCA, in the sense that the MLE of W approaches the PCA solution (the top-K principal directions)
Maximum likelihood for PPCA
• Log-likelihood: log p(X | W, σ²) = −(N/2) [ D log(2π) + log|C| + tr(C⁻¹ S) ], where C = W Wᵀ + σ² I and S is the sample covariance
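The MLE has a closed form (Tipping & Bishop): keep the top-K eigenvectors of the sample covariance, and set σ² to the average of the discarded eigenvalues. A sketch, taking the arbitrary rotation factor to be the identity:

```python
import numpy as np

rng = np.random.default_rng(4)
D, K, N = 6, 2, 1000
X = rng.normal(size=(D, N)) * np.array([3, 2, 1, .5, .5, .5])[:, None]

S = np.cov(X)                         # sample covariance (D x D)
lam, U = np.linalg.eigh(S)
lam, U = lam[::-1], U[:, ::-1]        # eigenvalues in descending order

sigma2 = lam[K:].mean()               # MLE of sigma^2: mean of discarded eigenvalues
W = U[:, :K] @ np.diag(np.sqrt(lam[:K] - sigma2))   # MLE of W (rotation R = I)

# Evaluate the log-likelihood above at the MLE
C = W @ W.T + sigma2 * np.eye(D)
ll = -0.5 * N * (D * np.log(2 * np.pi)
                 + np.linalg.slogdet(C)[1]
                 + np.trace(np.linalg.solve(C, S)))
print("log-likelihood at the MLE:", ll)
```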
EM for factor analysis
• E-step: take the expectation of the complete-data log-likelihood under the posterior, which is Gaussian:
– p(zᵢ | xᵢ) = N(mᵢ, Σ), with Σ = (I + Wᵀ Ψ⁻¹ W)⁻¹ and mᵢ = Σ Wᵀ Ψ⁻¹ (xᵢ − μ)
EM for factor analysis
• M-step: take derivatives of the expected complete-data log-likelihood and set to 0, e.g. W = (Σᵢ xᵢ E[zᵢ]ᵀ)(Σᵢ E[zᵢ zᵢᵀ])⁻¹
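A compact sketch of both updates (assumes centered data so μ = 0; variable names are my own):

```python
import numpy as np

def fa_em(X, K, n_iters=100, seed=0):
    """EM for factor analysis on centered data X (D x N)."""
    D, N = X.shape
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(D, K))
    psi = np.ones(D)                              # diagonal of Psi
    for _ in range(n_iters):
        # E-step: posterior p(z_i | x_i) = N(m_i, Sigma)
        Sigma = np.linalg.inv(np.eye(K) + W.T @ (W / psi[:, None]))
        M = Sigma @ W.T @ (X / psi[:, None])      # K x N posterior means
        Ezz = N * Sigma + M @ M.T                 # sum_i E[z_i z_i^T]
        # M-step: closed-form updates from setting derivatives to 0
        W = (X @ M.T) @ np.linalg.inv(Ezz)
        psi = np.mean(X * X - (W @ M) * X, axis=1)
    return W, psi
```

Usage: `W, psi = fa_em(X - X.mean(axis=1, keepdims=True), K=2)`.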
Mixtures of factor analyzers
• qᵢ = latent cluster assignment, drawn from the mixture proportions π
• Each cluster k gets its own factor analyzer (Wₖ, μₖ)
Mixtures of factor analyzers
[Figure: density fit with a single FA vs. a mixture of 10 FAs]
PCA for categorical data
• Gaussian prior on z: p(z) = N(z | 0, I)
• Discrete likelihood: p(x | z) = Cat(x | softmax(W z + b))
Logistic normal distribution
• If z ~ N(μ, Σ), then softmax(z) follows a logistic normal distribution on the probability simplex
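A small sketch of the construction: push Gaussian samples through a softmax to land on the simplex (the parameters are my own):

```python
import numpy as np

rng = np.random.default_rng(5)
mu = np.array([0.5, 0.0, -0.5])
Sigma = 0.4 * np.eye(3)

z = rng.multivariate_normal(mu, Sigma, size=1000)     # z ~ N(mu, Sigma)
p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)  # softmax -> simplex

print("rows sum to 1:", np.allclose(p.sum(axis=1), 1.0))
print("mean point on the simplex:", p.mean(axis=0))
```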
Independent component analysis (ICA)
• You are at a cocktail party and a bunch of people are speaking at once
• You have two microphones (your ears!)
– Goal: separate the voices into different signals
Blind source separation
• Acoustic signal processing (e.g. Siri!)
• EEG data
• Financial data
PCA struggles for blind source separation
[Figure: PCA applied to mixed signals fails to recover the original sources]
Independent component analysis
[Diagram: X (Sensors × Time) ≈ W (Sensors × Sources) · Z (Sources × Time)]
• Key assumptions
– 1) the latent variables (sources) are independent
– 2) the latent variables are non-Gaussian
Independent component analysis
• Prior for latent variables: factorized and non-Gaussian, p(z) = ∏ⱼ pⱼ(zⱼ) (e.g., heavy-tailed)
Independent component analysis
• Likelihood: linear-Gaussian, p(x | z) = N(x | W z, Ψ); in the noise-free case, x = W z
Training ICA
• Gradient ascent (possible in noise-free case)
• Newton’s method
• Non model-based estimation principles
– Maximize non-Gaussianity
– Minimize mutual information
• Natural gradient
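A sketch of the cocktail-party setup using scikit-learn's FastICA, which follows the maximize-non-Gaussianity principle above (the toy sources and mixing matrix are my own):

```python
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)

# Two independent, non-Gaussian sources ("voices")
S = np.c_[np.sign(np.sin(3 * t)),      # square wave
          np.mod(2 * t, 2) - 1]        # sawtooth wave

# Two microphones record linear mixtures of the sources
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])             # unknown mixing matrix
X = S @ A.T                            # observed signals (time x sensors)

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)           # recovered sources, up to order and scale
print("recovered sources shape:", S_hat.shape)
```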
Training ICA
• Natural gradient ascent:
– Operate on "recognition weights" V = W⁻¹, so z = V x
• Algorithm:
1. Put x through the linear mapping: z = V x
2. Update with the natural gradient: ΔV ∝ (I − g(z) zᵀ) V, where g(z) = −∂ log p(z)/∂z
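A minimal sketch of this update, assuming a heavy-tailed prior for which g(z) = tanh(z); an illustration of the update rule, not a tuned implementation:

```python
import numpy as np

def natural_gradient_ica(X, n_iters=5000, lr=0.01, seed=0):
    """X: mixed signals (D x N). Returns recognition weights V with z = V x."""
    D, N = X.shape
    rng = np.random.default_rng(seed)
    V = np.eye(D)
    for _ in range(n_iters):
        x = X[:, rng.integers(N)][:, None]    # 1. put x through the linear mapping
        z = V @ x
        g = np.tanh(z)                        # g(z) = -d log p(z)/dz, assumed prior
        V += lr * (np.eye(D) - g @ z.T) @ V   # 2. natural-gradient step
    return V
```

Applied to the mixed signals X from the FastICA sketch (transposed to D × N), V @ X then approximates the sources up to permutation and scaling.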
Think-pair-share
• You are a financial analyst and you want to understand the
behavior of the stock market by finding interpretable latent
structure in stock prices (natural disasters, improving
economy, growth sectors,…). Design a latent linear modeling
system to accomplish this. Consider questions such as:
– Prior
– Likelihood
– Any constraints on the latent representation (sparse, binary, sums to
one,…)
– Data, preprocessing, post-processing
– How to evaluate the model
– Can you find a way to include time dependence?