
BAYESIAN LEARNING - LECTURE 4

Mattias Villani

Division of Statistics and Machine Learning


Department of Computer and Information Science
Linköping University



LECTURE OVERVIEW

- Prediction
  - Normal model
  - More complex examples
- Decision theory
  - The elements of a decision problem
  - The Bayesian way
  - Point estimation as a decision problem



PREDICTION / FORECASTING

- Posterior predictive distribution for future ỹ given observed data y:

      p(ỹ|y) = ∫ p(ỹ|θ, y) p(θ|y) dθ

- If p(ỹ|θ, y) = p(ỹ|θ) [not true for time series], then

      p(ỹ|y) = ∫ p(ỹ|θ) p(θ|y) dθ

- The parameter uncertainty is represented in p(ỹ|y) by averaging over p(θ|y).





PREDICTION - NORMAL DATA, KNOWN VARIANCE

- Under the uniform prior p(θ) ∝ c,

      p(ỹ|y) = ∫ p(ỹ|θ) p(θ|y) dθ

  where

      θ|y ∼ N(ȳ, σ²/n)
      ỹ|θ ∼ N(θ, σ²)

- Simulation algorithm (a Python sketch follows below):
  1. Generate a posterior draw of θ (call it θ(1)) from N(ȳ, σ²/n).
  2. Generate a draw of ỹ (call it ỹ(1)) from N(θ(1), σ²) (note the mean).
  3. Repeat steps 1 and 2 a large number of times (N), giving:
     - a sequence of posterior draws: θ(1), ..., θ(N)
     - a sequence of predictive draws: ỹ(1), ..., ỹ(N)
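A minimal sketch of this recipe in Python; ȳ, σ, n and the seed are assumed example values, not numbers from the lecture:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
ybar, sigma, n, N = 2.5, 1.0, 30, 10_000  # assumed data summaries and number of draws

theta = rng.normal(ybar, sigma / np.sqrt(n), size=N)  # step 1: theta ~ N(ybar, sigma^2/n)
ytilde = rng.normal(theta, sigma)                     # step 2: ytilde ~ N(theta, sigma^2)

# The draws approximate the predictive distribution derived on the next slide.
print(ytilde.mean(), ytilde.var())
```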



PREDICTIVE DISTRIBUTION - NORMAL MODEL AND UNIFORM PRIOR

- θ(1) = ȳ + ε(1), where ε(1) ∼ N(0, σ²/n). (Step 1)
- ỹ(1) = θ(1) + υ(1), where υ(1) ∼ N(0, σ²). (Step 2)
- Hence ỹ(1) = ȳ + ε(1) + υ(1).
- ε(1) and υ(1) are independent.
- The sum of two independent normal random variables is normal, so

      E(ỹ|y) = ȳ
      V(ỹ|y) = σ²/n + σ² = σ²(1 + 1/n)

  and therefore

      ỹ|y ∼ N(ȳ, σ²(1 + 1/n))



PREDICTIVE DISTRIBUTION - NORMAL MODEL AND NORMAL PRIOR

- It is easy to see that the predictive distribution is normal.
- The mean can be obtained from

      E_{ỹ|θ}(ỹ) = θ

  and then removing the conditioning on θ by averaging over θ:

      E(ỹ|y) = E_{θ|y}(θ) = µ_n (posterior mean of θ).

- The predictive variance of ỹ (conditional variance formula):

      V(ỹ|y) = E_{θ|y}[V_{ỹ|θ}(ỹ)] + V_{θ|y}[E_{ỹ|θ}(ỹ)]
             = E_{θ|y}(σ²) + V_{θ|y}(θ)
             = σ² + τ_n²
             = (population variance + posterior variance of θ).

- In summary:

      ỹ|y ∼ N(µ_n, σ² + τ_n²).
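A short sketch computing µ_n and τ_n² with the standard conjugate normal updates (assumed known from earlier lectures) and forming the predictive distribution; all numbers are made-up examples:

```python
import numpy as np
from scipy import stats

ybar, sigma, n = 2.5, 1.0, 30   # assumed data summaries (sigma known)
mu0, tau0 = 0.0, 2.0            # assumed prior: theta ~ N(mu0, tau0^2)

# standard conjugate normal updates for mu_n and tau_n^2 (from earlier lectures)
tau_n2 = 1.0 / (n / sigma**2 + 1.0 / tau0**2)
mu_n = tau_n2 * (n * ybar / sigma**2 + mu0 / tau0**2)

pred = stats.norm(mu_n, np.sqrt(sigma**2 + tau_n2))  # ytilde | y ~ N(mu_n, sigma^2 + tau_n^2)
print(pred.mean(), pred.std())
print(pred.interval(0.95))      # 95% predictive interval
```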


BAYESIAN PREDICTION IN MORE COMPLEX MODELS

- Autoregressive process:

      y_t = φ_1(y_{t−1} − µ) + ... + φ_p(y_{t−p} − µ) + ε_t,   ε_t ∼ iid N(0, σ²)

- Simulate a draw from p(φ_1, φ_2, ..., φ_p, µ, σ | y).
- Conditional on that draw θ(1) = (φ_1(1), φ_2(1), ..., φ_p(1), µ(1), σ(1)), simulate
  - ỹ_{T+1} ∼ p(y_{T+1} | y_T, y_{T−1}, ..., y_{T−p}, θ(1))
  - ỹ_{T+2} ∼ p(y_{T+2} | ỹ_{T+1}, y_T, ..., y_{T−p}, θ(1))
  - and so on.
- Repeat for new θ draws (a code sketch follows below).
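A minimal sketch of one simulated future path, conditional on a single assumed posterior draw of (φ, µ, σ); all numbers are placeholders, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
phi = np.array([0.5, 0.2])   # assumed posterior draw of (phi_1, phi_2), so p = 2
mu, sigma = 1.0, 0.5         # assumed posterior draws of mu and sigma
y_obs = [0.8, 1.1, 0.9]      # assumed observed tail ..., y_{T-1}, y_T
H = 12                       # forecast horizon

path = list(y_obs)
for _ in range(H):
    lags = np.array(path[-1:-len(phi) - 1:-1])  # p most recent values, newest first
    mean = phi @ (lags - mu)                    # conditional mean from the AR equation above
    path.append(mean + rng.normal(0.0, sigma))  # one draw of the next ytilde

print(path[len(y_obs):])  # one simulated future path; repeat over new theta draws
```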

- Regression trees:
  - Uncertainty about which variables to split on, and about the split points.
  - For a given draw of splitting variables and split points, simulate a response. Repeat for many different draws.
PREDICTING AUCTION PRICES ON EBAY

- Problem: predicting the auctioned price in eBay coin auctions.
- Data: bids from 1000 auctions on eBay.
  - The highest bid is not observed.
  - The lowest bids are also not observed because of the seller's reservation price.
- Covariates: auction-specific, e.g. book value from catalog, seller's reservation price, quality of the sold object, rating of the seller, powerseller status, verified seller ID, etc.
- Buyers are strategic. Their bids do not fully reflect their valuations. Game theory. Very complicated likelihood.



SIMULATING AUCTION PRICES ON EBAY, CONT.

- A draw from the posterior predictive distribution of an auction's price:
  1. Simulate a draw θ(1) from the posterior of the model parameters θ (using MCMC).
  2. Simulate the number of bidders conditional on θ (Poisson process).
  3. Simulate the bidders' valuations.
  4. Simulate a complete auction bid sequence, b(1), conditional on the valuations and θ = θ(1).
  5. For the bid sequence b(1), return the next-to-largest bid (eBay's proxy bidding system).
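A deliberately simplified stand-in for steps 2-5, conditional on one parameter draw. The lecture's actual model involves strategic bidding and a full bid sequence; the Poisson rate, the lognormal valuations, and all numbers here are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(seed=3)

def simulate_price(lam, reservation):
    n_bidders = rng.poisson(lam)          # stand-in for step 2: number of bidders
    if n_bidders == 0:
        return None                       # auction receives no bids
    # stand-in for step 3: independent lognormal valuations (assumed)
    vals = np.sort(rng.lognormal(mean=3.0, sigma=0.4, size=n_bidders))[::-1]
    # stand-in for steps 4-5: under proxy bidding the price is roughly the
    # second-highest valuation (or the reservation price with a single bidder)
    price = vals[1] if n_bidders > 1 else reservation
    return price if price >= reservation else None

prices = [simulate_price(lam=4.0, reservation=20.0) for _ in range(10_000)]
print(sum(p is None for p in prices) / len(prices))  # estimated Pr(no sale)
```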



PREDICTING AUCTION PRICES ON EBAY, CONT.

[Figure: posterior predictive densities of the auction price for two example auctions (x-axis: auction price, y-axis: density). Annotations: Pr(No bids) = 0.014; Pr(Price = No bids) = 0.042; Pr(Price = reservation) = 0.067.]



DECISION THEORY

- Let θ be an unknown quantity (state of nature). Examples: future inflation, global temperature, disease.
- Let a ∈ A be an action. Examples: interest rate, energy tax, surgery.
- Choosing action a when the state of nature turns out to be θ gives utility

      U(a, θ)

- Alternatively, loss L(a, θ) = −U(a, θ).
- Loss table:

                 θ1           θ2
      a1     L(a1, θ1)    L(a1, θ2)
      a2     L(a2, θ1)    L(a2, θ2)

- Example (an expected-loss computation follows below):

                     Rainy   Sunny
      Umbrella         20      10
      No umbrella      50       0
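A tiny sketch of the Bayesian decision for this table; the posterior probability of rain is an assumed value, not from the lecture:

```python
# assumed posterior probability of rain
p_rain = 0.3

losses = {"Umbrella": {"Rainy": 20, "Sunny": 10},
          "No umbrella": {"Rainy": 50, "Sunny": 0}}

for action, L in losses.items():
    expected_loss = p_rain * L["Rainy"] + (1 - p_rain) * L["Sunny"]
    print(action, expected_loss)  # Umbrella: 13.0, No umbrella: 15.0 -> take the umbrella
```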
DECISION THEORY, CONT.

- Example loss functions when both a and θ are continuous:
  - Linear: L(a, θ) = |a − θ|
  - Quadratic: L(a, θ) = (a − θ)²
  - Lin-Lin:

        L(a, θ) = c1 · |a − θ|  if a ≤ θ
                  c2 · |a − θ|  if a > θ

- Example:
  - θ is the number of items demanded of a product
  - a is the number of items in stock
  - Utility:

        U(a, θ) = p · θ − c1 · (a − θ)   if a > θ  [too much stock]
                  p · a − c2 · (θ − a)²  if a ≤ θ  [too little stock]



OPTIMAL DECISION

- Ad hoc decision rules:
  - Minimax: choose the decision that minimizes the maximum loss.
  - Minimax-regret: choose the decision that minimizes the maximum regret (loss relative to the best action under each state).
- Bayesian theory: just maximize the posterior expected utility:

      a_bayes = argmax_{a∈A} E_{p(θ|y)}[U(a, θ)],

  where E_{p(θ|y)} denotes the posterior expectation.
- Using simulated draws θ(1), θ(2), ..., θ(N) from p(θ|y):

      E_{p(θ|y)}[U(a, θ)] ≈ N⁻¹ ∑_{i=1}^{N} U(a, θ(i))

- Separation principle (a code sketch follows below):
  1. First obtain p(θ|y),
  2. then form U(a, θ), and finally
  3. choose the a that maximizes E_{p(θ|y)}[U(a, θ)].
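A sketch of the separation principle applied to the stocking example from the previous slide; the posterior draws of demand and all constants are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=4)
p, c1, c2 = 10.0, 2.0, 1.0              # assumed price and stock-penalty constants
theta = rng.poisson(50, size=5_000)     # stand-in posterior draws of demand theta

def utility(a, th):
    # utility from the previous slide: linear overstock cost, quadratic understock cost
    return np.where(a > th, p * th - c1 * (a - th), p * a - c2 * (th - a) ** 2)

actions = np.arange(30, 81)                          # candidate stock levels a
exp_u = [utility(a, theta).mean() for a in actions]  # Monte Carlo estimate of E[U(a, theta)]
print(actions[int(np.argmax(exp_u))])                # a_bayes
```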
CHOOSING A POINT ESTIMATE IS A DECISION

- Choosing a point estimator is a decision problem.
- Which to choose: posterior median, mean, or mode?
- It depends on your loss function (see the sketch below):
  - Linear loss → posterior median is optimal
  - Quadratic loss → posterior mean is optimal
  - Lin-Lin loss → the c2/(c1 + c2) quantile of the posterior is optimal
  - Zero-one loss → posterior mode is optimal
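A sketch computing each optimal point estimate from posterior draws (stand-in gamma draws and assumed Lin-Lin costs; the mode is approximated by the tallest histogram bin):

```python
import numpy as np

rng = np.random.default_rng(seed=5)
theta = rng.gamma(shape=3.0, scale=2.0, size=100_000)  # stand-in posterior draws
c1, c2 = 1.0, 3.0                                      # assumed Lin-Lin costs

print(np.median(theta))                    # linear loss -> posterior median
print(theta.mean())                        # quadratic loss -> posterior mean
print(np.quantile(theta, c2 / (c1 + c2)))  # Lin-Lin loss -> c2/(c1+c2) quantile
counts, edges = np.histogram(theta, bins=200)
print(edges[np.argmax(counts)])            # zero-one loss -> approximate posterior mode
```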

