
BAYESIAN LEARNING - LECTURE 4

Mattias Villani

Division of Statistics and Machine Learning


Department of Computer and Information Science
Linköping University



LECTURE OVERVIEW

- Prediction
  - Normal model
  - More complex examples
- Decision theory
  - The elements of a decision problem
  - The Bayesian way
  - Point estimation as a decision problem



PREDICTION / FORECASTING

- Posterior predictive distribution for future ỹ given observed data y:

      p(ỹ|y) = ∫ p(ỹ|θ, y) p(θ|y) dθ

- If p(ỹ|θ, y) = p(ỹ|θ) [not true for time series], then

      p(ỹ|y) = ∫ p(ỹ|θ) p(θ|y) dθ

- The parameter uncertainty is represented in p(ỹ|y) by averaging over p(θ|y).





PREDICTION - NORMAL DATA, KNOWN VARIANCE

- Under the uniform prior p(θ) ∝ c,

      p(ỹ|y) = ∫ p(ỹ|θ) p(θ|y) dθ

  where

      θ|y ∼ N(ȳ, σ²/n)
      ỹ|θ ∼ N(θ, σ²)

- Simulation algorithm (a Python sketch follows below):
  1. Generate a posterior draw of θ (call it θ(1)) from N(ȳ, σ²/n).
  2. Generate a draw of ỹ (call it ỹ(1)) from N(θ(1), σ²) (note the mean).
  3. Repeat steps 1 and 2 a large number of times (N), giving:
     - a sequence of posterior draws: θ(1), ..., θ(N)
     - a sequence of predictive draws: ỹ(1), ..., ỹ(N)
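A minimal sketch of this recipe in Python; ȳ, σ, n and the seed are assumed example values, not numbers from the lecture:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
ybar, sigma, n, N = 2.5, 1.0, 30, 10_000  # assumed data summaries and number of draws

theta = rng.normal(ybar, sigma / np.sqrt(n), size=N)  # step 1: theta ~ N(ybar, sigma^2/n)
ytilde = rng.normal(theta, sigma)                     # step 2: ytilde ~ N(theta, sigma^2)

# The draws approximate the predictive distribution derived on the next slide.
print(ytilde.mean(), ytilde.var())
```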



PREDICTIVE DISTRIBUTION - NORMAL MODEL AND UNIFORM PRIOR

- θ(1) = ȳ + ε(1), where ε(1) ∼ N(0, σ²/n). (Step 1)
- ỹ(1) = θ(1) + υ(1), where υ(1) ∼ N(0, σ²). (Step 2)
- Hence ỹ(1) = ȳ + ε(1) + υ(1).
- ε(1) and υ(1) are independent.
- The sum of two independent normal random variables is normal, so

      E(ỹ|y) = ȳ
      V(ỹ|y) = σ²/n + σ² = σ²(1 + 1/n)

  and therefore

      ỹ|y ∼ N(ȳ, σ²(1 + 1/n))



PREDICTIVE DISTRIBUTION - NORMAL MODEL AND NORMAL PRIOR

- It is easy to see that the predictive distribution is normal.
- The mean can be obtained from

      E_{ỹ|θ}(ỹ) = θ

  and then removing the conditioning on θ by averaging over θ:

      E(ỹ|y) = E_{θ|y}(θ) = µ_n (posterior mean of θ).

- The predictive variance of ỹ (conditional variance formula):

      V(ỹ|y) = E_{θ|y}[V_{ỹ|θ}(ỹ)] + V_{θ|y}[E_{ỹ|θ}(ỹ)]
             = E_{θ|y}(σ²) + V_{θ|y}(θ)
             = σ² + τ_n²
             = (population variance + posterior variance of θ).

- In summary:

      ỹ|y ∼ N(µ_n, σ² + τ_n²).
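A short sketch computing µ_n and τ_n² with the standard conjugate normal updates (assumed known from earlier lectures) and forming the predictive distribution; all numbers are made-up examples:

```python
import numpy as np
from scipy import stats

ybar, sigma, n = 2.5, 1.0, 30   # assumed data summaries (sigma known)
mu0, tau0 = 0.0, 2.0            # assumed prior: theta ~ N(mu0, tau0^2)

# standard conjugate normal updates for mu_n and tau_n^2 (from earlier lectures)
tau_n2 = 1.0 / (n / sigma**2 + 1.0 / tau0**2)
mu_n = tau_n2 * (n * ybar / sigma**2 + mu0 / tau0**2)

pred = stats.norm(mu_n, np.sqrt(sigma**2 + tau_n2))  # ytilde | y ~ N(mu_n, sigma^2 + tau_n^2)
print(pred.mean(), pred.std())
print(pred.interval(0.95))      # 95% predictive interval
```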


BAYESIAN PREDICTION IN MORE COMPLEX MODELS

- Autoregressive process:

      y_t = φ_1(y_{t−1} − µ) + ... + φ_p(y_{t−p} − µ) + ε_t,   ε_t ∼ iid N(0, σ²)

- Simulate a draw from p(φ_1, φ_2, ..., φ_p, µ, σ | y).
- Conditional on that draw θ(1) = (φ_1(1), φ_2(1), ..., φ_p(1), µ(1), σ(1)), simulate
  - ỹ_{T+1} ∼ p(y_{T+1} | y_T, y_{T−1}, ..., y_{T−p}, θ(1))
  - ỹ_{T+2} ∼ p(y_{T+2} | ỹ_{T+1}, y_T, ..., y_{T−p}, θ(1))
  - and so on.
- Repeat for new θ draws (a code sketch follows below).
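A minimal sketch of one simulated future path, conditional on a single assumed posterior draw of (φ, µ, σ); all numbers are placeholders, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
phi = np.array([0.5, 0.2])   # assumed posterior draw of (phi_1, phi_2), so p = 2
mu, sigma = 1.0, 0.5         # assumed posterior draws of mu and sigma
y_obs = [0.8, 1.1, 0.9]      # assumed observed tail ..., y_{T-1}, y_T
H = 12                       # forecast horizon

path = list(y_obs)
for _ in range(H):
    lags = np.array(path[-1:-len(phi) - 1:-1])  # p most recent values, newest first
    mean = phi @ (lags - mu)                    # conditional mean from the AR equation above
    path.append(mean + rng.normal(0.0, sigma))  # one draw of the next ytilde

print(path[len(y_obs):])  # one simulated future path; repeat over new theta draws
```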

- Regression trees:
  - Uncertainty about which variables to split on, and about the split points.
  - For a given draw of splitting variables and split points, simulate a response. Repeat for many different draws.
PREDICTING AUCTION PRICES ON EBAY

- Problem: predicting the auctioned price in eBay coin auctions.
- Data: bids from 1000 auctions on eBay.
  - The highest bid is not observed.
  - The lowest bids are also not observed because of the seller's reservation price.
- Covariates: auction-specific, e.g. book value from catalog, seller's reservation price, quality of the sold object, rating of the seller, powerseller status, verified seller ID, etc.
- Buyers are strategic. Their bids do not fully reflect their valuations. Game theory. Very complicated likelihood.



SIMULATING AUCTION PRICES ON EBAY, CONT.

- A draw from the posterior predictive distribution of an auction's price:
  1. Simulate a draw θ(1) from the posterior of the model parameters θ (using MCMC).
  2. Simulate the number of bidders conditional on θ (Poisson process).
  3. Simulate the bidders' valuations.
  4. Simulate a complete auction bid sequence, b(1), conditional on the valuations and θ = θ(1).
  5. For the bid sequence b(1), return the next-to-largest bid (eBay's proxy bidding system).
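A deliberately simplified stand-in for steps 2-5, conditional on one parameter draw. The lecture's actual model involves strategic bidding and a full bid sequence; the Poisson rate, the lognormal valuations, and all numbers here are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(seed=3)

def simulate_price(lam, reservation):
    n_bidders = rng.poisson(lam)          # stand-in for step 2: number of bidders
    if n_bidders == 0:
        return None                       # auction receives no bids
    # stand-in for step 3: independent lognormal valuations (assumed)
    vals = np.sort(rng.lognormal(mean=3.0, sigma=0.4, size=n_bidders))[::-1]
    # stand-in for steps 4-5: under proxy bidding the price is roughly the
    # second-highest valuation (or the reservation price with a single bidder)
    price = vals[1] if n_bidders > 1 else reservation
    return price if price >= reservation else None

prices = [simulate_price(lam=4.0, reservation=20.0) for _ in range(10_000)]
print(sum(p is None for p in prices) / len(prices))  # estimated Pr(no sale)
```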



PREDICTING AUCTION PRICES ON EBAY, CONT.

[Figure: posterior predictive densities of the auction price for two example auctions (x-axis: auction price, y-axis: density). Annotations: Pr(No bids) = 0.014; Pr(Price = No bids) = 0.042; Pr(Price = reservation) = 0.067.]



DECISION THEORY

- Let θ be an unknown quantity (state of nature). Examples: future inflation, global temperature, disease.
- Let a ∈ A be an action. Examples: interest rate, energy tax, surgery.
- Choosing action a when the state of nature turns out to be θ gives utility

      U(a, θ)

- Alternatively, loss L(a, θ) = −U(a, θ).
- Loss table:

                 θ1           θ2
      a1     L(a1, θ1)    L(a1, θ2)
      a2     L(a2, θ1)    L(a2, θ2)

- Example (an expected-loss computation follows below):

                     Rainy   Sunny
      Umbrella         20      10
      No umbrella      50       0
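A tiny sketch of the Bayesian decision for this table; the posterior probability of rain is an assumed value, not from the lecture:

```python
# assumed posterior probability of rain
p_rain = 0.3

losses = {"Umbrella": {"Rainy": 20, "Sunny": 10},
          "No umbrella": {"Rainy": 50, "Sunny": 0}}

for action, L in losses.items():
    expected_loss = p_rain * L["Rainy"] + (1 - p_rain) * L["Sunny"]
    print(action, expected_loss)  # Umbrella: 13.0, No umbrella: 15.0 -> take the umbrella
```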
DECISION THEORY, CONT.

- Example loss functions when both a and θ are continuous:
  - Linear: L(a, θ) = |a − θ|
  - Quadratic: L(a, θ) = (a − θ)²
  - Lin-Lin:

        L(a, θ) = c1 · |a − θ|  if a ≤ θ
                  c2 · |a − θ|  if a > θ

- Example:
  - θ is the number of items demanded of a product
  - a is the number of items in stock
  - Utility:

        U(a, θ) = p · θ − c1 · (a − θ)   if a > θ  [too much stock]
                  p · a − c2 · (θ − a)²  if a ≤ θ  [too little stock]



OPTIMAL DECISION

- Ad hoc decision rules:
  - Minimax: choose the decision that minimizes the maximum loss.
  - Minimax-regret: choose the decision that minimizes the maximum regret (loss relative to the best action under each state).
- Bayesian theory: just maximize the posterior expected utility:

      a_bayes = argmax_{a∈A} E_{p(θ|y)}[U(a, θ)],

  where E_{p(θ|y)} denotes the posterior expectation.
- Using simulated draws θ(1), θ(2), ..., θ(N) from p(θ|y):

      E_{p(θ|y)}[U(a, θ)] ≈ N⁻¹ ∑_{i=1}^{N} U(a, θ(i))

- Separation principle (a code sketch follows below):
  1. First obtain p(θ|y),
  2. then form U(a, θ), and finally
  3. choose the a that maximizes E_{p(θ|y)}[U(a, θ)].
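A sketch of the separation principle applied to the stocking example from the previous slide; the posterior draws of demand and all constants are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=4)
p, c1, c2 = 10.0, 2.0, 1.0              # assumed price and stock-penalty constants
theta = rng.poisson(50, size=5_000)     # stand-in posterior draws of demand theta

def utility(a, th):
    # utility from the previous slide: linear overstock cost, quadratic understock cost
    return np.where(a > th, p * th - c1 * (a - th), p * a - c2 * (th - a) ** 2)

actions = np.arange(30, 81)                          # candidate stock levels a
exp_u = [utility(a, theta).mean() for a in actions]  # Monte Carlo estimate of E[U(a, theta)]
print(actions[int(np.argmax(exp_u))])                # a_bayes
```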
CHOOSING A POINT ESTIMATE IS A DECISION

- Choosing a point estimator is a decision problem.
- Which to choose: posterior median, mean, or mode?
- It depends on your loss function (see the sketch below):
  - Linear loss → posterior median is optimal
  - Quadratic loss → posterior mean is optimal
  - Lin-Lin loss → the c2/(c1 + c2) quantile of the posterior is optimal
  - Zero-one loss → posterior mode is optimal
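A sketch computing each optimal point estimate from posterior draws (stand-in gamma draws and assumed Lin-Lin costs; the mode is approximated by the tallest histogram bin):

```python
import numpy as np

rng = np.random.default_rng(seed=5)
theta = rng.gamma(shape=3.0, scale=2.0, size=100_000)  # stand-in posterior draws
c1, c2 = 1.0, 3.0                                      # assumed Lin-Lin costs

print(np.median(theta))                    # linear loss -> posterior median
print(theta.mean())                        # quadratic loss -> posterior mean
print(np.quantile(theta, c2 / (c1 + c2)))  # Lin-Lin loss -> c2/(c1+c2) quantile
counts, edges = np.histogram(theta, bins=200)
print(edges[np.argmax(counts)])            # zero-one loss -> approximate posterior mode
```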

