Applied Bayesian Inference

Outline:
Introduction
Introduction to R
Introduction to WinBUGS
Likelihood Principle
Normal approximation
Non-iterative Simulation
Computing
Hypothesis tests

1 Introduction
Example 1.1
Light travels fast, but it is not transmitted instantaneously. Light takes over a second to reach us from the moon and over 10 billion years to reach us from the most distant objects yet observed in the expanding universe. Because radio and radar signals also travel at the speed of light, an accurate value for that speed is important in communicating with astronauts and orbiting satellites. An accurate value for the speed of light is also important to computer designers because electrical signals travel only at light speed.

The first reasonably accurate measurements of the speed of light were made by Simon Newcomb between July and September 1882. He measured the time in seconds that a light signal took to pass from his laboratory on the Potomac River to a mirror at the base of the Washington Monument and back, a total distance of 7400 m. His first measurement was 24.828 millionths of a second.
Because, approximately, $\bar{X} \sim N(\mu, \sigma^2/n)$, we have
$$\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1),$$
so that
$$P\left(-1.96 < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$$
and therefore
$$P\left(\bar{X} - 1.96\,\sigma/\sqrt{n} < \mu < \bar{X} + 1.96\,\sigma/\sqrt{n}\right) = 0.95.$$
This yields the 95% confidence interval $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$.
After collecting the data and computing the CI, this interval either
contains the true mean or it does not. Its coverage probability is not
0.95 but either 0 or 1.
Then where does our 95% confidence come from?
Let us do an experiment:
We repeatedly draw samples from $N(24.828, 0.005^2)$, compute the 95% CI from each sample, and record the proportion of intervals so far that contain the true mean 24.828:

Sample   Coverage to date
1st      100%
2nd      100%
3rd      100%
4th      100%
5th      100%
6th      100%
7th      100%
8th      100%
9th      88.9%
10th     90.0%
...
100th    94.0%
...
991st    95.2%
...
1000th   95.2%
1 Introduction
1 Introduction
We take comfort in the fact that the method works 95% of the time in the long run, i.e. over repeated use it produces a CI that contains the unknown mean 95% of the time.
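The long-run reading of "confidence" can be checked directly by simulation. A minimal Python sketch of the experiment above; the per-sample size n = 10 and the seed are my own arbitrary choices, not values from the text:

```python
import math
import random

def ci_coverage(n_reps=1000, n=10, mu=24.828, sigma=0.005, seed=1):
    """Repeatedly sample from N(mu, sigma^2) and count how often the
    interval x_bar +/- 1.96*sigma/sqrt(n) contains the true mean."""
    rng = random.Random(seed)
    half = 1.96 * sigma / math.sqrt(n)   # half-width (sigma known)
    hits = 0
    for _ in range(n_reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        hits += (xbar - half <= mu <= xbar + half)
    return hits / n_reps
```

Running `ci_coverage()` gives an empirical coverage close to 0.95, mirroring the table above.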
$H_0: \mu = \mu_0\ (= 24.828)$ versus $H_1: \mu > \mu_0$

Test statistic:
$$T = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}} \sim N(0,1) \quad \text{if } \mu = \mu_0$$

P-value: the probability, computed under $H_0$, of a test statistic at least as extreme as the one observed.

Again, we take comfort in the fact that this test works in the long run: it rejects $H_0$ when $H_0$ is in fact true in only 5% of the cases in which the method is used.
P-values depend not only on the observed data but also on the sampling probability of certain unobserved data points. This violates the Likelihood Principle.

This has serious practical implications, for instance for the analysis of clinical trials, where interim analyses and unexpected drug toxicities often change the original trial design.
Prof. Dr. Renate Meyer
Historical Overview
Inverse Probability
Thomas Bayes
Bayes Biography
20th Century
Sir R.A. Fisher (1890-1962) was a lifelong critic of inverse probability and one of the most important figures in its demise.
A total of 69 patients in 11 clinical centers.
Unit  Treatm.  Time     Unit  Treatm.  Time     Unit  Treatm.  Time
-     1        74+      B     2        4+       F     1        6
-     2        248      B     1        156+     F     2        16+
-     1        272+     C     2        20+      F     1        76
-     2        244      E     1        50+      F     2        80
-     2        20+      E     2        64+      F     2        202
-     2        64       E     2        82       F     1        258+
-     2        88       E     1        186+     F     1        268+
-     2        148+     E     1        214+     F     2        368+
-     1        162+     E     1        214      F     1        380+
-     1        184+     E     2        228+     F     1        424+
-     1        188+     E     2        262      F     2        428+
-     1        198+     H     2        22+      F     2        436+
-     1        382+     H     1        22+      I     2        8
-     1        436+     H     1        74+      I     2        16+
-     2        32+      H     1        88+      I     2        40
-     1        64+      H     1        148+     I     1        120+
-     2        102      H     2        162      I     1        168+
-     2        162+     K     1        28+      I     2        174+
-     2        182+     K     1        70+      I     1        268+
-     1        364+     K     2        106+     I     2        276
-     1        18+                              I     1        286+
-     1        36+                              I     1        366
-     2        160+                             I     2        396+
-     2        254                              I     2        466+
                                                I     1        468+
On the basis of the stratified analysis, the Board would have had to continue the trial. The P-value of the unstratified analysis was small enough to convince the Board to stop the trial.
Why does the stratified analysis fail to detect the treatment difference?

Stratified: $h_i(t) = h_{0i}(t)\exp(\beta' x)$
Unstratified: $h_i(t) = h_0(t)\exp(\beta' x)$ (cf. frailty model)

Contribution of the $i$th stratum to the partial likelihood:
$$L_i(\beta) = \prod_{k=1}^{d_i} \frac{e^{\beta' x_{ik}}}{\sum_{j \in R_{ik}} e^{\beta' x_{ij}}}$$

If the largest time in the $i$th stratum is a death, then the partial likelihood derives no information from this event: its risk set contains only that subject, so the corresponding factor equals 1. This is the case in the study: 4 deaths have the largest survival time in their stratum, and these are all in treatment group 2.
Theorem 1.2
Let $A_1, A_2, \ldots, A_n$ be a set of mutually exclusive and exhaustive events. Then
$$P(A_i|B) = \frac{P(A_i)P(B|A_i)}{\sum_{j=1}^n P(A_j)P(B|A_j)}.$$

Chess Example

Example 1.3
You are in a chess tournament and will play your next game against either Jun or Martha, depending on the results of some other games. Suppose your probability of beating Jun is $\frac{7}{10}$, but of beating Martha is only $\frac{2}{10}$. You assess your probability of playing Jun as $\frac{1}{4}$.
Chess Example

Now suppose that you tell me you won your next chess game. Who was your opponent? With $W$ the event that you win, $J$ that you play Jun and $M$ that you play Martha,
$$P(J|W) = \frac{P(W|J)P(J)}{P(W|J)P(J) + P(W|M)P(M)} = \frac{\frac{7}{10}\cdot\frac{1}{4}}{\frac{7}{10}\cdot\frac{1}{4} + \frac{2}{10}\cdot\frac{3}{4}} = \frac{7}{13}.$$

Diagnostic Testing

Example 1.4
A new home HIV test is claimed to have 95% sensitivity and 98% specificity. In a population with an HIV prevalence of 1/1000, what is the chance that someone testing positive actually has HIV? Let $A$ be the event that the individual is truly HIV positive and $\bar{A}$ the event that the individual is truly HIV negative, so $P(A) = 0.001$. Let $B$ be the event that the test is positive. We want $P(A|B)$.

"95% sensitivity" means that $P(B|A) = 0.95$. "98% specificity" means that $P(\bar{B}|\bar{A}) = 0.98$, or $P(B|\bar{A}) = 0.02$.
Diagnostic Testing

$$P(A|B) = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|\bar{A})P(\bar{A})} = \frac{0.95 \times 0.001}{0.95 \times 0.001 + 0.02 \times 0.999} = 0.045.$$
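The arithmetic of this Bayes'-theorem update is easy to reproduce; a short Python sketch of the same calculation:

```python
# Bayes' theorem for the HIV screening example: sensitivity P(B|A) = 0.95,
# false-positive rate P(B|not A) = 0.02, prevalence P(A) = 0.001.
sens, false_pos, prev = 0.95, 0.02, 0.001
p_pos = sens * prev + false_pos * (1 - prev)   # P(B), total probability
p_hiv_given_pos = sens * prev / p_pos          # P(A|B)
print(round(p_hiv_given_pos, 3))  # 0.045
```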
Thus, over 95% of those testing positive will, in fact, not have HIV.

Example 1.5

The following example caused a stir in 1991 after a US columnist, Marilyn vos Savant, used it in her column. She gave the correct answer. A surprising number of mathematicians wrote to her saying that she was wrong.

You are a contestant on the TV show "Let's Make a Deal" and given the choice of three doors. Two of the doors have a goat behind them and one a car. You choose a door, say door 2, but before opening the chosen door, the emcee, Monty Hall, opens a door that has a goat behind it (e.g. door 1). He gives you the option of revising your choice or sticking to your first choice. What do you do?

Since either door 2 or door 3 must hide the car, it was claimed that the probability of winning had increased to 1/2.

Obviously, choose door 3. The probability of finding the car behind either door 1 or door 3 is 2/3. As the emcee showed you that it is not behind door 1, the probability that it is behind door 3 is 2/3.
Let $A_i$ be the event that the car is behind door $i$, and let $B$ be the event that Monty opens door 1. The likelihoods are $P(B|A_1) = 0$, $P(B|A_2) = \frac{1}{2}$ and $P(B|A_3) = 1$, so
$$P(A_3|B) = \frac{P(B|A_3)P(A_3)}{P(B|A_1)P(A_1) + P(B|A_2)P(A_2) + P(B|A_3)P(A_3)} = \frac{1\cdot\frac{1}{3}}{0\cdot\frac{1}{3} + \frac{1}{2}\cdot\frac{1}{3} + 1\cdot\frac{1}{3}} = \frac{2}{3}.$$
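The 2/3 answer can also be checked by brute-force simulation; a small Python sketch (game count and seed are arbitrary):

```python
import random

def switch_win_rate(n_games=20000, seed=0):
    """Simulate the game: you pick door 2 (index 1), Monty opens a goat
    door that is neither your pick nor the car, you switch to the
    remaining closed door."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_games):
        car = rng.randrange(3)
        pick = 1
        # Monty opens some valid goat door; his tie-break when the car
        # is behind your pick does not affect the switching win rate.
        opened = next(d for d in range(3) if d != pick and d != car)
        switched = next(d for d in range(3) if d not in (pick, opened))
        wins += (switched == car)
    return wins / n_games
```

`switch_win_rate()` comes out close to 2/3, in agreement with the Bayes'-theorem calculation above.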
Example 1.6
$D$ = event that I look through my window and see a tall, branched thing with green blobs covering its branches.

$H_1$ = tree
$H_2$ = man
$H_3$ = something else
2 Bayesian Inference

Overview
parameter estimation
hypothesis testing
likelihood function
information criteria
score function
Fisher information

Likelihood Function

Definition 2.2
The likelihood function of $\theta$ is the function that associates the value $f(x|\theta)$ to each $\theta$. This function is denoted by $l(\theta; x)$. Other common notations are $l_x(\theta)$, $l(\theta|x)$ and $l(\theta)$. It is defined by
$$l(\theta; x) = f(x|\theta) \quad (\theta \in \Theta). \qquad (2.1)$$
Definition 2.3
Any vector $\hat{\theta}$ maximizing (2.1) as a function of $\theta$, with $x$ fixed, provides a maximum likelihood (ML) estimate of $\theta$.

In intuitive terms, this gives the realization of $\theta$ most likely to have given rise to the current data set, an important finite sample property.

Note that $\int f(x|\theta)\,dx = 1$, but $\int l(\theta; x)\,d\theta \neq 1$, in general.

AIC = Akaike Information Criterion = $\log l(\hat{\theta}; x) - p$ (the log-likelihood penalised by the number of parameters $p$)
Binomial Example

Example 2.4
$X \sim \text{Binomial}(2, \theta)$. Then
$$f(x|\theta) = l(\theta; x) = \binom{2}{x}\theta^x(1-\theta)^{2-x}, \quad x = 0, 1, 2;\ \theta \in \Theta = (0,1).$$
Note that:
$$\sum_{x} f(x|\theta) = 1 \quad\text{but}\quad \int_0^1 \binom{2}{x}\theta^x(1-\theta)^{2-x}\,d\theta = \binom{2}{x}B(x+1, 3-x) = \frac{1}{3} \neq 1.$$
[Figure: the likelihood functions $l(\theta; x=0)$, $l(\theta; x=1)$ and $l(\theta; x=2)$ of Example 2.4, plotted for $\theta \in (0,1)$.]

Geometric Example

Example 2.5
$f(X_i = x_i|\theta) = (1-\theta)^{x_i-1}\theta$, so
$$l(\theta; x) = P(X_1 = x_1, \ldots, X_n = x_n|\theta) = \prod_{i=1}^n f(x_i|\theta) = \prod_{i=1}^n (1-\theta)^{x_i-1}\theta = \theta^n(1-\theta)^{\sum_{i=1}^n(x_i-1)} = \theta^n(1-\theta)^{n(\bar{x}-1)}.$$
Geometric Example

$$\frac{d}{d\theta}\log l(\theta; x) = \frac{n}{\theta} - \frac{n(\bar{x}-1)}{1-\theta} = 0 \quad\Longleftrightarrow\quad 1-\hat{\theta} = \hat{\theta}(\bar{x}-1) \quad\Longleftrightarrow\quad \hat{\theta} = \frac{1}{\bar{x}}$$

$$\frac{d^2}{d\theta^2}\log l(\theta; x) = -\frac{n}{\theta^2} - \frac{n(\bar{x}-1)}{(1-\theta)^2} < 0,$$
so $\hat{\theta} = 1/\bar{x}$ is indeed a maximum.
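The closed-form MLE $\hat{\theta} = 1/\bar{x}$ can be checked against a crude grid maximisation of the log-likelihood; a Python sketch with a hypothetical sample of my own (not data from the text):

```python
import math

x = [3, 1, 4, 2, 5, 1, 2]            # hypothetical geometric sample
n, xbar = len(x), sum(x) / len(x)

def loglik(theta):
    # log l(theta; x) = n*log(theta) + n*(xbar - 1)*log(1 - theta)
    return n * math.log(theta) + n * (xbar - 1) * math.log(1 - theta)

theta_grid = max((i / 10000 for i in range(1, 10000)), key=loglik)
print(round(1 / xbar, 4), round(theta_grid, 4))  # 0.3889 0.3889
```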
Exponential Example

Example 2.6
Let $X_1, X_2, \ldots, X_n$ denote a random sample from the exponential distribution with unknown location parameter $\theta$, unknown scale parameter $\beta$, and pdf
$$f(x|\theta, \beta) = \beta\exp\{-\beta(x-\theta)\} \quad (\theta < x < \infty),$$
so that
$$f(x_1, \ldots, x_n|\theta, \beta) = \prod_{i=1}^n f(x_i|\theta, \beta) = \prod_{i=1}^n \beta\exp\{-\beta(x_i-\theta)\}\,I(\theta \le x_i).$$

Defining $z = \min(x_1, \ldots, x_n)$,
$$l(\theta, \beta; x_1, \ldots, x_n) = \begin{cases}\beta^n\exp\{-n\beta(\bar{x}-\theta)\}, & \theta \le z,\\ 0, & \text{otherwise.}\end{cases}$$
As a function of $\theta$ this is increasing, so $\hat{\theta} = z$. Then with $g(\beta) = \beta^n\exp\{-a\beta\}$ and $a = n(\bar{x}-z) > 0$ (if $x_1, \ldots, x_n$ are not all equal),
$$\log g(\beta) = n\log\beta - a\beta, \qquad \frac{d}{d\beta}\log g(\beta) = \frac{n}{\beta} - a = 0 \quad\Longrightarrow\quad \hat{\beta} = \frac{n}{a} = \frac{1}{\bar{x}-z}.$$
The resulting estimates of mean and variance are
$$\hat{\mu} = \hat{\theta} + \frac{1}{\hat{\beta}} = z + (\bar{x}-z) = \bar{x}, \qquad \hat{\sigma}^2 = \frac{1}{\hat{\beta}^2} = (\bar{x}-z)^2.$$
Fisher Information

Definition 2.7
Let $X$ be a random vector with pdf $f(x|\theta)$ depending on a 1-dim. parameter $\theta$. The expected Fisher information measure of $\theta$ through $X$ is defined by
$$I(\theta) = E_{X|\theta}\left[-\frac{\partial^2\log f(X|\theta)}{\partial\theta^2}\right].$$
If $\theta = (\theta_1, \ldots, \theta_p)$ is a vector then the expected Fisher information matrix of $\theta$ through $X$ is defined by
$$I(\theta) = E_{X|\theta}\left[-\frac{\partial^2\log f(X|\theta)}{\partial\theta\,\partial\theta'}\right]$$
with elements $I_{ij}(\theta)$ given by
$$I_{ij}(\theta) = E_{X|\theta}\left[-\frac{\partial^2\log f(X|\theta)}{\partial\theta_i\,\partial\theta_j}\right], \quad i, j = 1, \ldots, p.$$

The information measure defined this way is related to the mean value of the curvature of the likelihood. The larger this curvature is, the larger is the information content summarized in the likelihood function and so the larger will $I(\theta)$ be. Since the curvature is expected to be negative, the information value is taken as minus the curvature. The expectation is taken with respect to the sample distribution. The observed Fisher information corresponds to minus the second derivative of the log likelihood:
$$J_X(\theta) = -\frac{\partial^2\log f(X|\theta)}{\partial\theta\,\partial\theta'}.$$
Fisher Information

Example 2.8
Let $X \sim N(\theta, \sigma^2)$ with $\sigma^2$ known. It is easy to get $I(\theta) = J_X(\theta) = \sigma^{-2}$, the normal precision. Verify!
$$\log f(X|\theta) = \log\left\{\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{1}{2\sigma^2}(X-\theta)^2}\right\} = \text{const.} - \frac{1}{2\sigma^2}(X-\theta)^2$$
$$\frac{d}{d\theta}\log f(X|\theta) = \frac{1}{\sigma^2}(X-\theta) = \frac{X-\theta}{\sigma^2}$$
$$\frac{d^2}{d\theta^2}\log f(X|\theta) = -\frac{1}{\sigma^2}$$
$$I(\theta) = E\left[-\frac{d^2}{d\theta^2}\log f(X|\theta)\right] = E\left[\frac{1}{\sigma^2}\right] = \frac{1}{\sigma^2} = J_X(\theta)$$

For a random sample, the information is additive:
$$I(\theta) = \sum_{i=1}^n I_i(\theta).$$
Score Function

Definition 2.9
The score function of $X$ is defined as
$$U(X; \theta) = \frac{\partial\log f(X|\theta)}{\partial\theta}.$$

Let $X_i|\theta \stackrel{iid}{\sim} \text{Binomial}(1, \theta)$ with $E(X_i) = \theta$ and $\text{Var}(X_i) = \theta(1-\theta)$. Then
$$l(\theta; x_1, \ldots, x_n) = \prod_{i=1}^n f(x_i|\theta) = \prod_{i=1}^n \theta^{x_i}(1-\theta)^{1-x_i} = \theta^{\sum x_i}(1-\theta)^{n-\sum x_i} = \theta^x(1-\theta)^{n-x},$$
where $x = \sum_{i=1}^n x_i$.
$$\log l(\theta; x_1, \ldots, x_n) = x\log\theta + (n-x)\log(1-\theta)$$
$$\frac{d}{d\theta}\log l(\theta; x_1, \ldots, x_n) = \frac{x}{\theta} - \frac{n-x}{1-\theta} = 0 \quad\Longrightarrow\quad \hat{\theta} = \frac{x}{n} = \bar{x}$$
The score of a single observation is
$$U(X_i; \theta) = \frac{d}{d\theta}\log f(X_i|\theta) = \frac{X_i}{\theta} - \frac{1-X_i}{1-\theta} = \frac{X_i-\theta}{\theta(1-\theta)},$$
so
$$U^2(X_i; \theta) = \frac{(X_i-\theta)^2}{\theta^2(1-\theta)^2},$$
$$I_i(\theta) = E[U^2(X_i; \theta)] = \frac{\text{Var}(X_i)}{\theta^2(1-\theta)^2} = \frac{\theta(1-\theta)}{\theta^2(1-\theta)^2} = \frac{1}{\theta(1-\theta)},$$
$$I(\theta) = \sum_{i=1}^n I_i(\theta) = \frac{n}{\theta(1-\theta)}.$$

Definition 2.11
A Bayesian statistical model consists of a parametric statistical model (the sampling distribution or likelihood), $f(x|\theta)$, and a prior distribution on the parameters, $f(\theta)$.
Bayes Theorem

Essential Distributions

Given a complete Bayesian model, we can construct:
a) the joint distribution of $(\theta, X)$,
$$f(\theta, x) = f(x|\theta)f(\theta);$$

Theorem 2.12
Continuous version of Bayes theorem:
$$f(\theta|x) = \frac{f(\theta)f(x|\theta)}{\int f(\theta)f(x|\theta)\,d\theta} = \frac{f(\theta)f(x|\theta)}{f(x)} \propto \text{prior} \times \text{likelihood}.$$

After seeing the data $x$, what do we now know about the parameter $\theta$? This is summarized by a plot of the posterior density function.
3 Conjugate Distributions

Conjugate Distributions

Assume a drug may have response rate $\theta$ of 0.2, 0.4, 0.6, 0.8, each of equal prior probability. If we observe a single positive response ($x = 1$), how is our prior revised?

Likelihood: $f(x|\theta) = \theta^x(1-\theta)^{1-x}$, so $f(x=1|\theta) = \theta$ and $f(x=0|\theta) = 1-\theta$.

Posterior:
$$f(\theta|x) = \frac{f(x|\theta)f(\theta)}{\sum_j f(x|\theta_j)f(\theta_j)}$$

theta   prior f(theta)   likelihood f(x=1|theta)   posterior f(theta|x=1)
.2      0.25             .2                        0.1
.4      0.25             .4                        0.2
.6      0.25             .6                        0.3
.8      0.25             .8                        0.4
Sum     1.0                                        1.0

The prior predictive probability is a weighted average of the likelihoods under the 4 possible values of $\theta$:
$$f(x) = \sum_j w_j f(x|\theta_j), \qquad f(x=1) = 0.25(.2+.4+.6+.8) = 0.5,$$
$$P(X=0) = f(x=0) = 1 - f(x=1) = 0.5.$$

Note: a single positive response makes it 4 times as likely that the true response rate is 80% rather than 20%.
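The discrete prior-to-posterior update in the table above is a few lines of code; a Python sketch:

```python
thetas = [0.2, 0.4, 0.6, 0.8]
prior = [0.25] * 4
like = list(thetas)                                   # f(x=1|theta) = theta
evidence = sum(p * l for p, l in zip(prior, like))    # prior predictive f(x=1)
post = [p * l / evidence for p, l in zip(prior, like)]
print(round(evidence, 3), [round(p, 3) for p in post])  # 0.5 [0.1, 0.2, 0.3, 0.4]
```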
Predictive distribution of a further observation $z$:
$$f(z|x) = \sum_j f(z|\theta_j)f(\theta_j|x)$$

Likelihood for $r$ successes in $n$ trials:
$$f(x = r|\theta) = \binom{n}{r}\theta^r(1-\theta)^{n-r} \propto \theta^r(1-\theta)^{n-r}$$
Suppose $n = 20$, $r = 15$:
$$f(x = 15|\theta) \propto \theta^{15}(1-\theta)^5$$
Let $X|\theta \sim \text{Binomial}(n, \theta)$ (or $X_j|\theta \sim \text{Bernoulli}(\theta)$ conditionally independent for $j = 1, \ldots, n$), where the unknown parameter $\theta$ can attain $I$ different values $\theta_i$, with a priori probabilities $f(\theta_i)$, $i = 1, \ldots, I$, respectively.

Prior predictive:
$$f(x) = \sum_{i=1}^I f(x|\theta_i)f(\theta_i) \quad\text{for } x = 0, 1, \ldots, n$$

Posterior pdf of $\theta$:
$$f(\theta_i|x) = \frac{f(\theta_i)f(x|\theta_i)}{f(x)} = \frac{f(\theta_i)f(x|\theta_i)}{\sum_{j=1}^I f(\theta_j)f(x|\theta_j)}, \quad i = 1, \ldots, I$$

Posterior predictive:
$$f(y|x) = \sum_{i=1}^I f(y|\theta_i)f(\theta_i|x)$$

For the drug example with $x = 15$ successes in $n = 20$ trials, the predictive probability of one further success is
$$f(z=1|x=15) = \sum_i f(z=1|\theta_i)f(\theta_i|x=15) = 0.7384, \qquad f(z=0|x=15) = 1 - f(z=1|x=15).$$
Calculating the Posterior

Prior: $\theta \sim \text{Beta}(\alpha, \beta)$,
$$f(\theta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\theta^{\alpha-1}(1-\theta)^{\beta-1} \propto \theta^{\alpha-1}(1-\theta)^{\beta-1}.$$
Likelihood:
$$f(x|\theta) = \binom{n}{x}\theta^x(1-\theta)^{n-x} \propto \theta^x(1-\theta)^{n-x}.$$
Posterior:
$$f(\theta|x) \propto f(x|\theta)f(\theta) \propto \theta^x(1-\theta)^{n-x}\,\theta^{\alpha-1}(1-\theta)^{\beta-1} = \theta^{\alpha+x-1}(1-\theta)^{\beta+n-x-1},$$
i.e. $\theta|x \sim \text{Beta}(\alpha+x, \beta+n-x)$.

Note: the Binomial and Beta distributions are conjugate distributions.
Posterior Moments

For $\theta \sim \text{Beta}(\alpha, \beta)$: mean $= \alpha/(\alpha+\beta)$.

Suppose our prior estimate of the response rate is 0.4 with a standard deviation of 0.1. Matching this mean and standard deviation gives the Beta(9.2, 13.8) prior, and with $x = 15$ successes and $n - x = 5$ failures the posterior is Beta(24.2, 18.8):

            successes   failures
prior       9.2         13.8
likelihood  15          5
posterior   24.2        18.8

[Figure: prior, likelihood and posterior densities plotted against theta.]
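The Beta(9.2, 13.8) elicitation follows from the Beta mean and variance formulas; a Python sketch of the solve:

```python
m, s = 0.4, 0.1                       # desired prior mean and sd
# From var = m*(1-m)/(alpha+beta+1):
total = m * (1 - m) / s**2 - 1        # alpha + beta
alpha, beta = m * total, (1 - m) * total
print(round(alpha, 1), round(beta, 1))  # 9.2 13.8
```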
Compromise

In general, the posterior mean is a compromise between prior mean and data mean, i.e. for some $w$, $0 \le w \le 1$:
$$\frac{\alpha+x}{\alpha+\beta+n} = w\,\frac{\alpha}{\alpha+\beta} + (1-w)\,\frac{x}{n}.$$
Solving for $w$:
$$w = \frac{\alpha+\beta}{\alpha+\beta+n}, \qquad 1-w = \frac{n}{\alpha+\beta+n}.$$

prior mean: $9.2/23 = 0.4$
data mean: $15/20 = 0.75$
posterior mean: $24.2/43 = 0.56$

Note that $\frac{\alpha+\beta}{\alpha+\beta+n} \to 0$ and $\frac{n}{\alpha+\beta+n} \to 1$ for $n \to \infty$: with increasing sample size the data dominate the prior.
Applied Bayesian Inference
Hypothesis Test

$H_0: \theta > \theta_0 = 0.4$

Calculate prior and posterior probability of $H_0$:
$$P(\theta > \theta_0) = \int_{\theta_0}^1 f(\theta)\,d\theta = 1 - \int_0^{\theta_0} f(\theta)\,d\theta = 1 - F_{\text{Beta}(\alpha,\beta)}(\theta_0)$$
$$P(\theta > \theta_0|x) = \int_{\theta_0}^1 f(\theta|x)\,d\theta = 1 - F_{\text{Beta}(\alpha+x,\,\beta+n-x)}(\theta_0)$$
Credible Interval

An equal-tailed 95% credible interval $(l, u)$ satisfies $0.95 = \int_l^u f(\theta|x)\,d\theta$ with 2.5% posterior probability in each tail. In R:

> l=qbeta(0.025,24.2,18.8)
> l
[1] 0.4142266
> u=qbeta(0.975,24.2,18.8)
> u
[1] 0.7058181

Posterior mean:
$$E[\theta|x] = \int_0^1 \theta\,\frac{\Gamma(n+\alpha+\beta)}{\Gamma(\alpha+x)\Gamma(\beta+n-x)}\theta^{\alpha+x-1}(1-\theta)^{\beta+n-x-1}\,d\theta$$
$$= \frac{\Gamma(n+\alpha+\beta)}{\Gamma(\alpha+x)\Gamma(\beta+n-x)}\int_0^1 \theta^{\alpha+x}(1-\theta)^{\beta+n-x-1}\,d\theta$$
$$= \frac{\Gamma(n+\alpha+\beta)}{\Gamma(\alpha+x)\Gamma(\beta+n-x)}\cdot\frac{\Gamma(\alpha+x+1)\Gamma(\beta+n-x)}{\Gamma(n+\alpha+\beta+1)}$$
$$= \frac{\alpha+x}{\alpha+\beta+n} = \frac{9.2+15}{23+20} = 0.562797$$
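The same quantities can be approximated without R; a Python sketch that checks the posterior mean exactly and the qbeta interval by Monte Carlo (draw count and seed are arbitrary):

```python
import random

rng = random.Random(42)
draws = sorted(rng.betavariate(24.2, 18.8) for _ in range(100000))
lo, hi = draws[2499], draws[97499]       # empirical 2.5% / 97.5% quantiles
post_mean = (9.2 + 15) / (23 + 20)
print(round(post_mean, 4))  # 0.5628
print(round(lo, 2), round(hi, 2))
```

The empirical quantiles land very close to R's qbeta values 0.4142 and 0.7058.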
Posterior Predictive

$$f(y|x) = \int_0^1 f(y|\theta)f(\theta|x)\,d\theta$$
$$= \int_0^1 \binom{N}{y}\theta^y(1-\theta)^{N-y}\,\frac{\Gamma(n+\alpha+\beta)}{\Gamma(\alpha+x)\Gamma(\beta+n-x)}\theta^{\alpha+x-1}(1-\theta)^{\beta+n-x-1}\,d\theta$$
$$= \binom{N}{y}\frac{\Gamma(n+\alpha+\beta)}{\Gamma(\alpha+x)\Gamma(\beta+n-x)}\int_0^1 \theta^{y+\alpha+x-1}(1-\theta)^{N-y+\beta+n-x-1}\,d\theta$$
$$= \binom{N}{y}\frac{\Gamma(n+\alpha+\beta)}{\Gamma(\alpha+x)\Gamma(\beta+n-x)}\cdot\frac{\Gamma(\alpha+x+y)\Gamma(\beta+n-x+N-y)}{\Gamma(\alpha+\beta+n+N)},$$
the Beta-Binomial distribution.
3.4 Exchangeability

Independence?

If $f(\theta) = \text{Unif}(0,1)$, then for two Bernoulli observations
$$f(x_1, x_2) = \int_0^1 \theta^{x_1+x_2}(1-\theta)^{2-x_1-x_2}\,d\theta = \frac{\Gamma(x_1+x_2+1)\Gamma(3-x_1-x_2)}{\Gamma(4)}.$$
It is true that $f(x_1, x_2) = f(x_2, x_1)$: the observations are exchangeable. But they are not independent, e.g. $f(x_1=1, x_2=1) = 1/3 \neq 1/4 = f(x_1=1)f(x_2=1)$.
Sequential Inference

Suppose we obtain an observation $x_1$ and form the posterior $f(\theta|x_1) \propto f(x_1|\theta)f(\theta)$, and then we obtain a further observation $x_2$ which is conditionally independent of $x_1$ given $\theta$. The posterior based on $x_1, x_2$ is given by:
$$f(\theta|x_1, x_2) \propto f(x_2|\theta, x_1)\,f(\theta|x_1) = f(x_2|\theta)\,f(\theta|x_1).$$
Today's posterior is tomorrow's prior!

The resulting posterior is the same as if we had obtained the data $x_1, x_2$ together:
$$f(\theta|x_1, x_2) \propto f(x_1, x_2|\theta)\,f(\theta) = f(x_2|\theta)\,f(x_1|\theta)\,f(\theta).$$
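The sequential-equals-batch identity is easy to see in the conjugate Beta-Bernoulli case, since each update just adds counts; a Python sketch (the Beta(1,1) starting prior and the two observations are hypothetical choices):

```python
a, b = 1.0, 1.0      # Beta(1,1) prior (hypothetical starting point)
x1, x2 = 1, 0        # two conditionally independent Bernoulli observations

a_seq, b_seq = a + x1, b + 1 - x1            # posterior after x1 ...
a_seq, b_seq = a_seq + x2, b_seq + 1 - x2    # ... reused as prior for x2

a_batch, b_batch = a + x1 + x2, b + 2 - x1 - x2   # both at once
print((a_seq, b_seq) == (a_batch, b_batch))  # True
```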
Point Estimation

A single statistic is calculated from the sample data and used to estimate the unknown parameter. The statistic depends on the random sample, so it is random, and its distribution is called its sampling distribution. We call the statistic an estimator of the parameter and the value it takes for the actual sample data an estimate.

There are various frequentist approaches for finding estimators, such as least squares (LS), maximum likelihood (ML) and uniformly minimum variance unbiased (UMVU) estimation. For estimating the binomial parameter $\theta$, the LS, ML and UMVU estimators of the population proportion all coincide with the sample proportion.
Bias

An estimator $\hat{\theta}$ is unbiased if $E[\hat{\theta}] = \theta$; its bias is
$$\text{bias}(\hat{\theta}) = E[\hat{\theta}] - \theta.$$
The mean squared error combines bias and variance:
$$\text{MSE}(\hat{\theta}) = \text{bias}(\hat{\theta})^2 + \text{Var}(\hat{\theta}).$$
MSE Comparison

We will now compare the mean squared error of the Bayesian and the frequentist estimator of the population proportion $\theta$.

The frequentist estimator for $\theta$ is $\hat{\theta}_f = X/n$, with
$$E[\hat{\theta}_f] = \theta, \qquad \text{Var}(\hat{\theta}_f) = \frac{\theta(1-\theta)}{n}, \qquad \text{MSE}(\hat{\theta}_f) = 0^2 + \frac{\theta(1-\theta)}{n}.$$
The Bayesian estimator (posterior mean under the uniform prior) is $\hat{\theta}_B = (X+1)/(n+2)$, with
$$E[\hat{\theta}_B] = \frac{n\theta+1}{n+2}, \qquad \text{Var}(\hat{\theta}_B) = \frac{\text{Var}(X)}{(n+2)^2} = \frac{n\theta(1-\theta)}{(n+2)^2}.$$
For example, suppose $\theta = 0.4$ and the sample size is $n = 10$. Then
$$\text{MSE}(\hat{\theta}_f) = \frac{0.4 \cdot 0.6}{10} = 0.024 \quad\text{and}\quad \text{MSE}(\hat{\theta}_B) = 0.0169.$$
For $\theta = 0.5$ and $n = 10$,
$$\text{MSE}(\hat{\theta}_f) = 0.025 \quad\text{and}\quad \text{MSE}(\hat{\theta}_B) = 0.01736.$$
Figure 8 shows the mean squared error for the Bayesian and the frequentist estimator as a function of $\theta$. Over most (but not all) of the range, the Bayesian estimator (using a uniform prior) performs better than the frequentist estimator.

[Figure 8: MSE of the Bayesian and frequentist estimators plotted against theta.]
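The two worked cases can be verified numerically; a Python sketch of the MSE formulas above:

```python
def mse_freq(theta, n):
    return theta * (1 - theta) / n          # unbiased, so variance only

def mse_bayes(theta, n):
    bias = (n * theta + 1) / (n + 2) - theta
    var = n * theta * (1 - theta) / (n + 2) ** 2
    return bias**2 + var

print(round(mse_freq(0.4, 10), 4), round(mse_bayes(0.4, 10), 4))   # 0.024 0.0169
print(round(mse_freq(0.5, 10), 5), round(mse_bayes(0.5, 10), 5))   # 0.025 0.01736
```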
Interval Estimation

A frequentist $(1-\alpha)\times 100\%$ confidence interval $(l, u)$ satisfies
$$P(l \le \theta \le u) = 1 - \alpha.$$
Often, the sampling distribution of the estimator is approximately normal or $t_{n-1}$ distributed with mean equal to the true value, and the interval has the form estimate $\pm$ critical value $\times$ standard error, where the critical value comes from the normal or t table. For the sample proportion, an approximate $(1-\alpha)\times 100\%$ confidence interval for $\theta$ is given by:
$$\hat{\theta}_f \pm t_{n-1}(\alpha/2)\sqrt{\frac{\hat{\theta}_f(1-\hat{\theta}_f)}{n}}.$$
In the frequentist interpretation, the parameter is fixed but unknown and, before the sample is taken, the interval endpoints are random because they depend on the data. After the sample is taken and the endpoints are calculated, there is nothing random, so the interval is called a confidence interval for the parameter. Under the frequentist paradigm, the correct interpretation of a $(1-\alpha)\times 100\%$ confidence interval is that $(1-\alpha)\times 100\%$ of the random intervals calculated this way will contain the true value.

A Bayesian credible interval for the parameter, on the other hand, has the natural interpretation that we want. Because it is found from the posterior distribution of $\theta$, it has the coverage probability we want for this specific data set.
Hypothesis Testing

Example 3.1
Out of a random sample of 100 Hamilton residents, $x = 26$ said they support a casino in Hamilton. Compare the frequentist 95% confidence interval with the Bayesian credible interval (using a uniform prior).

Example 3.2
Suppose we wish to determine whether a new treatment is better than the standard treatment. If so, $\theta$, the proportion of patients who benefit from the new treatment, should be higher than $\theta_0$, the proportion who benefit from the standard treatment. It is known from historical records that $\theta_0 = 0.6$. A random group of 10 patients are given the new treatment. $X$, the number who benefit from the treatment, will be $\text{Binomial}(n, \theta)$. We observe that $x = 8$ patients benefit. This is better than we would expect if $\theta = 0.6$. But is it sufficiently better for us to conclude that $\theta > 0.6$ at the 5% level of significance?
The following table gives the null distribution of $X$:

x        0      1      2      3      4      5      6      7      8      9      10
f(x|θ0)  .0001  .0016  .0106  .0425  .1115  .2007  .2508  .2150  .1209  .0403  .0060
Frequentist Test

$H_0: \theta \le 0.6$ versus $H_1: \theta > 0.6$. From the table, the P-value is $P(X \ge 8|\theta_0 = 0.6) = .1209 + .0403 + .0060 = .1672 > 0.05$, so $H_0$ cannot be rejected at the 5% level.

Bayesian Test

prior: Beta(1,1)
data: $x = 8$, $n - x = 2$
posterior: Beta(9,3)
$$P(H_0|x=8) = P(\theta \le 0.6|x=8) = \text{pbeta}(0.6, 9, 3) = 0.1189$$
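Both quantities can be checked in Python; the P-value exactly, and the Beta posterior probability by Monte Carlo since the standard library has no Beta cdf (R's pbeta gives 0.1189):

```python
import math
import random

# Frequentist P-value P(X >= 8 | theta = 0.6) from Binomial(10, 0.6)
p_value = sum(math.comb(10, x) * 0.6**x * 0.4**(10 - x) for x in range(8, 11))
print(round(p_value, 4))  # 0.1673

# Posterior probability of H0 under the Beta(9, 3) posterior
rng = random.Random(7)
p_h0 = sum(rng.betavariate(9, 3) <= 0.6 for _ in range(200000)) / 200000
print(round(p_h0, 2))
```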
Exponential Data, Gamma Prior

Sampling distribution: $f(x|\theta) = \theta e^{-\theta x}$ for $x > 0$, so for a random sample $f(x_1, \ldots, x_n|\theta) = \theta^n e^{-\theta n\bar{x}}$.

Prior: $\theta \sim \text{Gamma}(\alpha, \beta)$,
$$f(\theta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\theta^{\alpha-1}\exp(-\beta\theta).$$
Posterior density:
$$f(\theta|x) \propto \theta^{\alpha+n-1}\exp(-\theta(n\bar{x}+\beta)), \quad\text{i.e.}\quad \theta|x \sim \text{Gamma}(\alpha+n, \beta+n\bar{x}).$$

Exponential Example

Example 3.3
Poisson Data, Gamma Prior

$$f(x|\theta) = \frac{\theta^x e^{-\theta}}{x!}$$
Any process producing events which satisfies the above three axioms is called a Poisson process, and $X$, the number of events in a unit time interval, is distributed as $\text{Poisson}(\theta)$.

Prior: $\theta \sim \text{Gamma}(\alpha, \beta)$,
$$f(\theta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\theta^{\alpha-1}\exp(-\beta\theta).$$

Prior predictive:
$$f(x) = \int_0^\infty f(x|\theta)f(\theta)\,d\theta = \int_0^\infty \frac{\theta^x e^{-\theta}}{x!}\cdot\frac{\beta^\alpha}{\Gamma(\alpha)}\theta^{\alpha-1}e^{-\beta\theta}\,d\theta$$
$$= \frac{\beta^\alpha}{\Gamma(\alpha)\,x!}\int_0^\infty \theta^{\alpha+x-1}e^{-(\beta+1)\theta}\,d\theta = \frac{\beta^\alpha}{\Gamma(\alpha)\,x!}\cdot\frac{\Gamma(\alpha+x)}{(\beta+1)^{\alpha+x}}$$
$$= \binom{\alpha+x-1}{x}\left(\frac{\beta}{\beta+1}\right)^\alpha\left(\frac{1}{\beta+1}\right)^x$$
Negative Binomial

i.e. $X \sim \text{Negative-Binomial}(\alpha, \beta)$, the number of Bernoulli failures obtained before the $\alpha$th success when the success probability is $\beta/(\beta+1)$, which shows
$$\text{Neg-bin}(x|\alpha, \beta) = \int_0^\infty \text{Poisson}(x|\theta)\,\text{Gamma}(\theta|\alpha, \beta)\,d\theta.$$

Likelihood for a random sample:
$$f(x|\theta) = \prod_{i=1}^n f(x_i|\theta) = \prod_{i=1}^n \frac{\theta^{x_i}e^{-\theta}}{x_i!} = \frac{1}{\prod_{i=1}^n x_i!}\,\theta^{\sum_{i=1}^n x_i}e^{-n\theta} \propto \theta^{n\bar{x}}e^{-n\theta}$$
Poisson Example

Example 3.4
Suppose that causes of death are reviewed in detail for a city in the US for a single year. It is found that 3 persons, out of a population of 200,000, died of asthma, giving a crude estimated asthma mortality rate in the city of 1.5 per 100,000 persons per year. A Poisson sampling model is often used for epidemiological data of this form. Let $\theta$ represent the true underlying long-term asthma mortality rate in the city (measured in cases per 100,000 persons per year). Reviews of asthma mortality rates around the world suggest that mortality rates above 1.5 per 100,000 people are rare in Western countries, with typical asthma mortality rates around 0.6 per 100,000.

Prior: $f(\theta) \propto \theta^{\alpha-1}\exp(-\beta\theta)$

Posterior density:
$$f(\theta|x) \propto f(\theta)f(x|\theta) \propto \theta^{\alpha-1}e^{-\beta\theta}\,\theta^{n\bar{x}}e^{-n\theta} = \theta^{\alpha+n\bar{x}-1}e^{-(\beta+n)\theta},$$
i.e. $\theta|x \sim \text{Gamma}(\alpha+n\bar{x}, \beta+n)$.
Poisson Example

b) What is the posterior probability that the long-term death rate from asthma in the city is more than 1.0 per 100,000 per year?
c) What is the posterior predictive distribution of a future observation $Y$?
d) To consider the effect of additional data, suppose that ten years of data are obtained for the city in this example, with $y = 30$ deaths over 10 years. Assuming the population is constant at 200,000, and assuming the outcomes in the ten years are independent with constant long-term rate $\theta$, derive the posterior distribution of $\theta$.
e) What is the posterior probability that the long-term death rate from asthma in the city is more than 1.0 per 100,000 per year, given the ten years of data?
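As a sketch of how part b) would be computed: the text does not fix the Gamma prior parameters, so the Gamma(3, 5) choice below is purely hypothetical (its mean of 0.6 matches the typical rate quoted above). With 3 deaths observed at an exposure of 2 hundred-thousand person-years, the posterior is Gamma(6, 7), and for an integer shape the tail probability has a closed Poisson-sum form:

```python
import math

a0, b0 = 3, 5                    # hypothetical Gamma(shape, rate) prior
y, exposure = 3, 2.0             # 3 deaths; 200,000 person-years = 2 units
a1, b1 = a0 + y, b0 + exposure   # posterior Gamma(6, 7)

# P(theta > t) for Gamma(k, rate) with integer k equals the probability
# that a Poisson(rate*t) count is below k.
t = 1.0
p_gt_1 = math.exp(-b1 * t) * sum((b1 * t)**i / math.factorial(i)
                                 for i in range(int(a1)))
print(round(p_gt_1, 3))  # 0.301
```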
Normal Example

Example 3.5
According to Kennett and Ross (1983), Geochronology, London: Longmans, the first apparently reliable datings for the age of Ennerdale granophyre were obtained from the K/Ar method (which depends on observing the relative proportions of potassium-40 and argon-40 in the rock) in the 1960s and early 1970s, and these resulted in an estimate of 370 ± 20 million years. Later in the 1970s, measurements based on the Rb/Sr method (depending on the relative proportions of rubidium-87 and strontium-87) gave an age of 421 ± 8 million years. It appears that the errors marked are meant to be standard deviations, and it seems plausible that the errors are normally distributed. If, then, a scientist had the K/Ar measurements available in the early 1970s, these could be the basis of her prior beliefs about the age of these rocks.
Normal Prior

Conjugate prior: $\theta \sim N(\mu_0, \tau_0^2)$,
$$f(\theta) = \frac{1}{\sqrt{2\pi}\tau_0}\exp\left(-\frac{1}{2\tau_0^2}(\theta-\mu_0)^2\right) \propto \exp\left(-\frac{1}{2\tau_0^2}(\theta-\mu_0)^2\right),$$
where $\mu_0$ and $\tau_0^2$ are hyperparameters.

Posterior: $\theta|x \sim N(\mu_1, \tau_1^2)$, where
$$\mu_1 = \frac{\frac{1}{\tau_0^2}\mu_0 + \frac{1}{\sigma^2}x}{\frac{1}{\tau_0^2} + \frac{1}{\sigma^2}} \qquad\text{and}\qquad \frac{1}{\tau_1^2} = \frac{1}{\tau_0^2} + \frac{1}{\sigma^2}.$$
NB: the precisions add, and the posterior mean is a precision-weighted average of the prior mean and the observation.
Calculating the Posterior

$$f(\theta|x) \propto f(x|\theta)f(\theta) \propto \exp\left(-\frac{1}{2}\left[\frac{(x-\theta)^2}{\sigma^2} + \frac{(\theta-\mu_0)^2}{\tau_0^2}\right]\right)$$
$$= \exp\left(-\frac{1}{2}\left[\frac{x^2 - 2x\theta + \theta^2}{\sigma^2} + \frac{\theta^2 - 2\theta\mu_0 + \mu_0^2}{\tau_0^2}\right]\right)$$
$$\propto \exp\left(-\frac{1}{2}\left[\theta^2\left(\frac{1}{\sigma^2}+\frac{1}{\tau_0^2}\right) - 2\theta\left(\frac{x}{\sigma^2}+\frac{\mu_0}{\tau_0^2}\right)\right]\right).$$
Completing the square in $\theta$ gives
$$f(\theta|x) \propto \exp\left(-\frac{1}{2\tau_1^2}(\theta-\mu_1)^2\right),$$
a normal density with the posterior mean $\mu_1$ and variance $\tau_1^2$ stated above.

Prior Predictive

Using $E[U] = E[E[U|V]]$:
$$E[X] = E[E[X|\theta]] = E[\theta] = \mu_0$$
$$\text{Var}(X) = E[\text{Var}(X|\theta)] + \text{Var}(E[X|\theta]) = E[\sigma^2] + \text{Var}(\theta) = \sigma^2 + \tau_0^2$$
Normal Example

Now back to Example 3.5: the posterior density is
$$f(\theta|x) \propto \exp\left(-\frac{1}{2\tau_1^2}(\theta-\mu_1)^2\right).$$

[Figure: prior, posterior and likelihood densities for the rock-dating example, plotted against mu from 300 to 450.]
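Plugging the numbers of Example 3.5 into the conjugate formulas (taking, as the example suggests, the K/Ar result N(370, 20^2) as the prior and the Rb/Sr value 421 with sd 8 as a single observation) gives the posterior shown in the figure; a Python sketch:

```python
import math

mu0, tau0 = 370.0, 20.0     # prior: K/Ar dating
x, sigma = 421.0, 8.0       # observation: Rb/Sr dating

post_prec = 1 / tau0**2 + 1 / sigma**2
mu1 = (mu0 / tau0**2 + x / sigma**2) / post_prec
tau1 = math.sqrt(1 / post_prec)
print(round(mu1, 1), round(tau1, 2))  # 414.0 7.43
```

The posterior is pulled strongly towards the more precise Rb/Sr measurement.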
Normal Example

Example 3.6
Measurements of the weight of NB10, with observed frequencies:

weight  frequency     weight  frequency
375     1             406     12
392     1             407     8
393     1             408     5
397     1             409     5
398     2             410     4
399     7             411     1
400     4             412     3
401     12            413     1
402     8             415     1
403     6             418     1
404     9             423     1
405     5             437     1
Normal Example

Questions:
1. How much does NB10 really weigh?
2. How certain are you, given the data, that the true weight of NB10 is less than 405.25 (micrograms below 10 g)?
3. What is the underlying accuracy of the NB10 measuring process?
4. How accurately can you predict the 101st measurement?

Calculating the Posterior

If $X_1, \ldots, X_n|\theta \stackrel{iid}{\sim} N(\theta, \sigma^2)$, the posterior is $\theta|x \sim N(\mu_n, \tau_n^2)$ with
$$\mu_n = \frac{\frac{1}{\tau_0^2}\mu_0 + \frac{n}{\sigma^2}\bar{x}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}} \qquad\text{and}\qquad \frac{1}{\tau_n^2} = \frac{1}{\tau_0^2} + \frac{n}{\sigma^2}.$$
Why? Reduction to the case of a single data point of the previous section:
$$f(x_1, \ldots, x_n|\theta) = \prod_{i=1}^n f(x_i|\theta) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\sigma}\exp\left[-\frac{1}{2}\left(\frac{x_i-\theta}{\sigma}\right)^2\right] = \text{const.}\times\exp\left[-\frac{1}{2}\sum_{i=1}^n\left(\frac{x_i-\theta}{\sigma}\right)^2\right]$$
$$\propto \exp\left[-\frac{1}{2}\left(\frac{\bar{x}-\theta}{\sigma/\sqrt{n}}\right)^2\right],$$
i.e. only $\bar{x}$ matters, and $\bar{X}|\theta \sim N(\theta, \sigma^2/n)$.
Remarks

1. If $\tau_0^2 = \sigma^2$ then
$$\mu_n = \frac{\mu_0 + n\bar{x}}{n+1} = \frac{\mu_0 + \sum x_i}{n+1}, \qquad \tau_n^2 = \frac{\sigma^2}{n+1},$$
i.e. the prior mean acts like one additional observation.

2. Equivalently,
$$\mu_n = \frac{\frac{\sigma^2}{\tau_0^2}\mu_0 + \sum x_i}{\frac{\sigma^2}{\tau_0^2} + n}.$$
[Figure: prior, posterior and likelihood densities, plotted against mu from 380 to 440.]

Example 3.7
Changes in blood pressure (in mmHg) were recorded for each of 100 patients, where negative numbers are decreases while on the drug and positive numbers are increases.
Calculating the Posterior

Let us assume that we don't know anything about the mean change in blood pressure induced by the new drug, and thus assume that $\theta$ can attain any real value with equal probability. This gives a flat prior distribution for $\theta$ on $(-\infty, \infty)$, i.e.
$$f(\theta) \propto 1.$$
(There is no proper continuous uniform distribution on $(-\infty, \infty)$, but you can think of $\theta$ being uniform on some finite interval $(-a, a)$ for some large $a$, and ignore the normalization constant, as it is not needed for the application of Bayes theorem.)

With this prior,
$$f(\theta|x) \propto \exp\left[-\frac{1}{2}\left(\frac{\bar{x}-\theta}{\sigma/\sqrt{n}}\right)^2\right].$$
Credible Intervals

The posterior is $\text{Normal}(\mu_n, \tau_n^2)$ with $\mu_n = \bar{x}$ and $\tau_n^2 = \sigma^2/n$. In Example 3.7,
$$\mu_n = -7.99, \qquad \tau_n^2 = 4.33^2/100 = 0.187489.$$
95% posterior probability interval for $\theta$, in R:

> lu=qnorm(c(0.025,0.975),-7.99,sqrt(0.187489))
> lu
[1] -8.838664 -7.141336

Hypothesis Test

The posterior probability that the mean change is below $-7$ mmHg:

> p=pnorm(-7,-7.99,sqrt(0.187489))
> p
[1] 0.9888838
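The same qnorm/pnorm values can be reproduced with the Python standard library; a short sketch:

```python
import math

mu_n, sd_n = -7.99, math.sqrt(0.187489)

def norm_cdf(x, mu, sd):
    # Standard normal cdf via the error function
    return 0.5 * (1 + math.erf((x - mu) / (sd * math.sqrt(2))))

# 95% equal-tail interval, using the normal quantile 1.959964
lo = mu_n - 1.959964 * sd_n
hi = mu_n + 1.959964 * sd_n
print(round(lo, 3), round(hi, 3))          # -8.839 -7.141
print(round(norm_cdf(-7, mu_n, sd_n), 4))  # 0.9889
```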
Normal Data, Unknown Mean and Variance

Prior distribution:
$$\mu|\sigma^2 \sim N(\mu_0, \sigma^2/\kappa_0), \qquad \sigma^2 \sim \text{Inv-}\chi^2(\nu_0, \sigma_0^2),$$
jointly denoted $\text{N-Inv-}\chi^2(\mu_0, \sigma_0^2/\kappa_0, \nu_0, \sigma_0^2)$.

The joint posterior is proportional to prior times likelihood:
$$f(\mu, \sigma^2|x) \propto (\sigma^2)^{-(\nu_0/2+1)}\sigma^{-1}\exp\left(-\frac{1}{2\sigma^2}[\nu_0\sigma_0^2 + \kappa_0(\mu_0-\mu)^2]\right)\times(\sigma^2)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}[(n-1)s^2 + n(\bar{x}-\mu)^2]\right),$$
which is again $\text{N-Inv-}\chi^2(\mu_n, \sigma_n^2/\kappa_n, \nu_n, \sigma_n^2)$, where
$$\mu_n = \frac{\kappa_0}{\kappa_0+n}\mu_0 + \frac{n}{\kappa_0+n}\bar{x}, \qquad \kappa_n = \kappa_0 + n, \qquad \nu_n = \nu_0 + n,$$
$$\nu_n\sigma_n^2 = \nu_0\sigma_0^2 + (n-1)s^2 + \frac{\kappa_0 n}{\kappa_0+n}(\bar{x}-\mu_0)^2.$$

Marginal posterior of $\sigma^2$:
$$\sigma^2|x \sim \text{Inv-}\chi^2(\nu_n, \sigma_n^2)$$
Marginal posterior of $\mu$:
$$f(\mu|x) \propto \left[1 + \frac{\kappa_n(\mu-\mu_n)^2}{\nu_n\sigma_n^2}\right]^{-(\nu_n+1)/2} \propto t_{\nu_n}(\mu|\mu_n, \sigma_n^2/\kappa_n)$$
Bayesian Linear Regression

Sampling distribution:
$$Y_i|\mu_i, \sigma^2 \sim N(\mu_i, \sigma^2), \quad i = 1, \ldots, n, \qquad\text{i.e.}\qquad \mathbf{Y}|\beta, \sigma^2 \sim N_n(X\beta, \sigma^2 I_n),$$
where $X$ is the design matrix.

Prior (writing $\mu_\beta$ for the prior mean of $\beta$):
$$\beta|\sigma^2 \sim N_p(\mu_\beta, \sigma^2 V), \qquad \sigma^2 \sim \text{Inv-Gamma}(a, b).$$
The posterior is $\text{NIG}(\tilde{\beta}, \tilde{V}, \tilde{a}, \tilde{b})$ with
$$\tilde{V} = (X'X + V^{-1})^{-1}, \qquad \tilde{\beta} = \tilde{V}(X'y + V^{-1}\mu_\beta),$$
$$\tilde{a} = \frac{n}{2} + a, \qquad \tilde{b} = \frac{SS}{2} + b, \qquad SS = y'y - \tilde{\beta}'\tilde{V}^{-1}\tilde{\beta} + \mu_\beta'V^{-1}\mu_\beta.$$

Weighted Average

The posterior mean of $\beta$ is a weighted average of the least squares estimate and the prior mean:
$$\tilde{\beta} = W\hat{\beta}_{LS} + (I-W)\mu_\beta \qquad\text{with}\qquad W = (X'X + V^{-1})^{-1}X'X.$$
4 WinBUGS Applications

For models with nonconjugate priors or multiple parameters, the posterior can usually not be computed in closed form. The most successful approach, for reasons that we will discuss in the subsequent sections, is based on simulation. That means, instead of explicitly calculating the posterior and performing integrations, we generate a sample from the posterior distribution and use that sample to approximate any quantity of interest, e.g. approximate the posterior mean by the sample mean etc.

WinBUGS Handouts
Example 4.1
Predictor variables $X_1, \ldots, X_p$.
Model Assumptions

The explanatory variables are assumed fixed, their values denoted by $x_{i1}, \ldots, x_{ip}$ for $i = 1, \ldots, n$. Given the values of the explanatory variables, the observations of the response variable are assumed independent, normally distributed with
$$\mu_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} \quad\text{for } i = 1, \ldots, n,$$
or in matrix notation:
$$\mathbf{Y}|x \sim N_n(\mu, \sigma^2 I) \quad\text{with}\quad \mu = X\beta,$$
where $\sigma^2$ is the error variance, $\beta = (\beta_0, \beta_1, \ldots, \beta_p)$ is the set of regression parameters, $I$ denotes the identity matrix, $\mathbf{Y}$ the vector of observations and $X = (x_{ij})$ the $n \times (p+1)$ design matrix.

The covariate data (25 deliveries):

Cases  Distance
7      560
3      220
3      340
4      80
6      150
7      330
2      110
7      210
30     1460
5      605
16     688
10     215
4      255
6      462
9      448
10     776
6      200
7      132
3      36
17     770
10     140
26     810
9      450
8      635
4      150
Prior Specification

In normal regression models, the simplest approach is to assume that all parameters are a priori independent, i.e.
$$f(\beta, \tau) = \prod_{j=0}^p f(\beta_j)\,f(\tau),$$
with $\beta_j \sim N(\mu_j, c_j^2)$ for $j = 0, \ldots, p$ and $\tau \sim \text{Gamma}(a, b)$, so that
$$E(\sigma^2) = \frac{b}{a-1} \quad\text{and}\quad \text{Var}(\sigma^2) = \frac{b^2}{(a-1)^2(a-2)}.$$

The likelihood in WinBUGS:

for (i in 1:n){
  y[i] ~ dnorm(mu[i],tau)
  mu[i] <- beta0 + beta1*x1[i] + ... + betap*xp[i]
}
sigma2 <- 1/tau
sigma <- sqrt(sigma2)

And the priors:

beta0 ~ dnorm(0.0,1.0E-4)
beta1 ~ dnorm(0.0,1.0E-4)
....
betap ~ dnorm(0.0,1.0E-4)
tau   ~ dgamma(0.001,0.001)
Interpretation of $\beta_0$

With centred covariates,
$$\mu_i = \beta_0^c + \beta_1^c(x_{i1}-\bar{x}_1) + \cdots + \beta_p^c(x_{ip}-\bar{x}_p),$$
$\beta_0^c$ = expected value of $Y$ when all covariates are equal to their means.

Answers:
2. Calculate the posterior probabilities $P(\beta_j > 0)$ and $P(\beta_j < 0)$. In WinBUGS, use the step function:
p.betaj <- step(betaj)
Regression Example in R
time[]  distance[]
16.68   560
11.5    220
12.03   340
14.88   80
13.75   150
...
35.1    770
17.9    140
52.32   810
18.75   450
19.83   635
10.75   150
END
For some odd reason (bug in WinBUGS?), make sure there is a blank
line after END.
Regression Output in R

Call:
lm(formula = time ~ cases_cent + distance_cent)

Residuals:
    Min      1Q  Median      3Q     Max
-5.7880 -0.6629  0.4364  1.1566  7.4197

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)   22.384000   0.651895  34.337  < 2e-16 ***
cases_cent     1.615907   0.170735   9.464 3.25e-09 ***
distance_cent  0.014385   0.003613   3.981 0.000631 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.259 on 22 degrees of freedom
Multiple R-squared: 0.9596,  Adjusted R-squared: 0.9559
F-statistic: 261.2 on 2 and 22 DF, p-value: 4.687e-16
model
{
# likelihood
for (i in 1:n){
time[i] ~ dnorm( mu[i],tau)
mu[i] <- beta0 + beta1*(cases[i]-mean(cases[])) +
beta2* (distance[i]-mean(distance[]))
}
# prior distributions
tau ~ dgamma(0.001,0.001)
beta0 ~ dnorm(0.0,1.0E-4)
beta1 ~ dnorm(0.0,1.0E-4)
beta2 ~ dnorm(0.0,1.0E-4)
}
node     mean     sd        MC error   2.5%      median   97.5%    start
R2B      0.9516   0.01732   7.742E-4   0.9063    0.9551   0.9737   1001
beta0    22.37    0.6681    0.02255    21.15     22.35    23.78    1001
beta1    1.61     0.1851    0.005237   1.254     1.606    1.992    1001
beta2    0.01447  0.003931  1.263E-4   0.006683  0.0144   0.02251  1001
p.beta0  1.0      0.0       3.162E-12  1.0       1.0      1.0      1001
p.beta1  1.0      0.0       3.162E-12  1.0       1.0      1.0      1001
p.beta2  1.0      0.0       3.162E-12  1.0       1.0      1.0      1001
sigma2   11.67    4.175     0.1866     6.364     10.82    22.7     1001
A high value of the precision (low $\sigma^2$) indicates that the model can
accurately predict the expected value of Y. We can rescale this
quantity using the sample variance of the response variable Y, $s_Y^2$,
using the $R_B^2$ statistic given by:

$R_B^2 = 1 - \frac{\sigma^2}{s_Y^2}$,

analogous to the classical adjusted coefficient of determination

$R_{adj}^2 = 1 - \frac{\hat{\sigma}^2}{s_Y^2}$ with $\hat{\sigma}^2 = \frac{1}{n-p} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
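As a small numeric sketch, the $R_B^2$ statistic can be computed from a posterior summary of $\sigma^2$; the value of $s_Y^2$ below is a hypothetical figure chosen to be consistent with the output table (posterior mean of sigma2 about 11.67, R2B about 0.95):

```python
# Sketch of the R_B^2 computation; s2_y (the sample variance of the
# response) is a hypothetical value consistent with the reported output.
def r2_b(sigma2, s2_y):
    """Bayesian analogue of the adjusted coefficient of determination."""
    return 1 - sigma2 / s2_y

print(round(r2_b(11.67, 241.0), 3))  # → 0.952
```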
Prediction in WinBUGS

$\hat{y}_i = \beta_0 + \sum_{j=1}^{p} x_{ij} \beta_j$

Missing Data

node      mean   MC error  2.5%   median  97.5%  start  sample
time[21]  21.06  0.0821    13.71  21.02   28.4   1001   2000
Prediction in WinBUGS
In the linear regression Example 4.1, this means defining another
variable in the code with the same distribution as the original data and
values for the predictor variables for which we want to forecast, e.g.
cases=20 and distance=1000, and including this variable in the
dataset with value NA:
pred.time ~ dnorm( pmu,tau)
pmu<- beta0 + beta1*(20-mean(cases[])) +
beta2* (1000-mean(distance[]))
Running the model again and monitoring pred.time gives the
posterior predictive summary:
node       mean   sd    MC error  2.5%   median  97.5%  start  sample
pred.time  48.98  3.71  0.07796   41.73  49.0    56.56  1001   2000
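The predictive mean for this new observation can be sketched from a single parameter draw; the beta and sigma values below are illustrative (taken near the posterior means above), not actual MCMC output:

```python
# Sketch: posterior predictive draw for cases=20, distance=1000 from one
# (hypothetical) parameter draw, mirroring the pred.time node above.
import random

cases = [7,3,3,4,6,7,2,7,30,5,16,10,4,6,9,10,6,7,3,17,10,26,9,8,4]
distance = [560,220,340,80,150,330,110,210,1460,605,688,215,255,462,448,
            776,200,132,36,770,140,810,450,635,150]
beta0, beta1, beta2, sigma = 22.37, 1.61, 0.01447, 3.4  # illustrative draw
pmu = (beta0
       + beta1 * (20 - sum(cases) / len(cases))
       + beta2 * (1000 - sum(distance) / len(distance)))
print(round(pmu, 2))             # → 49.01 (close to the reported mean 48.98)
pred = random.gauss(pmu, sigma)  # one predictive draw
```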
Model Assessment
Model Checking
2. the standardised residuals: $\frac{y_i - E[Y_i^{rep}]}{\sqrt{\mathrm{Var}(Y_i^{rep})}}$
3. the chance of getting a more extreme observation:
   $\min(P(Y_i^{rep} < y_i),\, P(Y_i^{rep} \ge y_i))$
4. the chance of getting a more surprising observation:
   $P(Y_i^{rep} : f(Y_i^{rep}) \le f(y_i))$
5. the predictive ordinate of the observation: $f(y_i^{rep})$

Assume the data has been divided into a training set z and an
evaluation set y. Then the posterior distribution of $\theta$ is based on z and
the predictive distribution above is given by

$f(y_i|z) = \int f(y_i|z, \theta)\, f(\theta|z)\, d\theta$

with residuals $r_i = y_i - E[Y_i^{rep}|z]$ and standardised residuals
$sr_i = \frac{y_i - E[Y_i^{rep}|z]}{\sqrt{\mathrm{Var}(Y_i^{rep}|z)}}$.
Cross-Validation Approach

WinBUGS Cross-Validation
for (i in 1:n){
   r[i]<- time[i]-mu[i]
   sr[i]<- (time[i]-mu[i])*sqrt(tau)
}

[Box plot of the standardised residuals sr[1]-sr[25]; observation 9 stands out near 4.]
node           mean    sd      MC error
p.smaller[1]   0.077   0.2666  0.005964
p.smaller[2]   0.626   0.4839  0.01051
p.smaller[3]   0.4875  0.4998  0.01109
p.smaller[4]   0.9275  0.2593  0.006629
p.smaller[5]   0.449   0.4974  0.009629
p.smaller[6]   0.459   0.4983  0.009853
p.smaller[7]   0.5915  0.4916  0.01047
p.smaller[8]   0.6325  0.4821  0.01033
p.smaller[9]   0.9555  0.2062  0.004386
p.smaller[10]  0.7575  0.4286  0.01117
p.smaller[12]  0.431   0.4952  0.009716
p.smaller[13]  0.631   0.4825  0.009968
p.smaller[14]  0.633   0.482   0.0116
p.smaller[15]  0.591   0.4916  0.009021
p.smaller[16]  0.4285  0.4949  0.012
p.smaller[17]  0.571   0.4949  0.01115
p.smaller[18]  0.8505  0.3566  0.007033
p.smaller[19]  0.712   0.4528  0.009266
p.smaller[20]  0.052   0.222   0.004984
p.smaller[21]  0.235   0.424   0.008387
p.smaller[22]  0.175   0.38    0.008043
p.smaller[23]  0.093   0.2904  0.006644
p.smaller[24]  0.09    0.2862  0.007328
p.smaller[25]  0.4685  0.499   0.01068
For the conditional predictive ordinate, note that

$f(y_i|y_{(i)})^{-1} = \frac{f(y_{(i)})}{f(y)} = \int \frac{f(y_{(i)}|\theta)\, f(\theta)}{f(y)}\, d\theta = \int \frac{1}{f(y_i|\theta)} \cdot \frac{f(y|\theta)\, f(\theta)}{f(y)}\, d\theta = \int \frac{1}{f(y_i|\theta)}\, f(\theta|y)\, d\theta = E_{\theta|y}\!\left[\frac{1}{f(y_i|\theta)}\right]$
In WinBUGS:
like[i] <- sqrt(tau/(2*PI))*exp(-0.5*pow(sr[i],2))
p.inv[i] <- 1/like[i]

The CPO can thus be estimated from N posterior draws $\theta^{(1)}, \ldots, \theta^{(N)}$ by

$\left[ \frac{1}{N} \sum_{n=1}^{N} \frac{1}{f(y_i|\theta^{(n)})} \right]^{-1}$

which is the harmonic mean of the likelihood function. But note that
harmonic means are notoriously unstable so care is required regarding
convergence!

node       mean    sd        MC error
p.inv[1]   34.12   27.03     0.7646
p.inv[2]   9.383   1.698     0.04268
p.inv[3]   8.959   1.627     0.03359
p.inv[4]   31.2    20.32     0.4766
p.inv[5]   8.929   1.512     0.03761
p.inv[6]   8.712   1.41      0.03228
p.inv[7]   9.184   1.669     0.042
p.inv[8]   9.37    1.669     0.0396
p.inv[9]   6273.0  154700.0  3500.0
p.inv[10]  13.03   6.565     0.1362
p.inv[11]  11.38   2.956     0.0671
p.inv[12]  9.211   1.792     0.04563
p.inv[13]  9.213   1.586     0.03934
p.inv[14]  9.338   1.699     0.0409
p.inv[15]  8.846   1.423     0.03416
p.inv[16]  9.538   2.268     0.0458
p.inv[17]  8.844   1.473     0.03562
p.inv[18]  16.44   6.838     0.1572
p.inv[19]  10.51   2.532     0.06173
p.inv[20]  53.19   49.06     1.06
p.inv[21]  13.66   7.111     0.1599
p.inv[22]  30.14   53.81     1.025
p.inv[23]  24.4    9.003     0.237
p.inv[24]  27.73   21.34     0.5858
p.inv[25]  8.82    1.473     0.03519
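The harmonic-mean estimator can be sketched in a few lines; the likelihood values below are hypothetical, standing in for $f(y_i|\theta^{(n)})$ evaluated at posterior draws:

```python
# Sketch: Monte Carlo estimate of CPO_i as the harmonic mean of the
# per-observation likelihood over posterior draws (values are hypothetical).
def cpo_harmonic(likelihoods):
    """CPO_i = [ (1/N) * sum_n 1/f(y_i | theta^(n)) ]^(-1)."""
    n = len(likelihoods)
    return 1.0 / (sum(1.0 / f for f in likelihoods) / n)

draws = [0.12, 0.10, 0.15, 0.11]  # hypothetical f(y_i | theta^(n)) values
print(round(cpo_harmonic(draws), 4))  # → 0.1173
```

A single very small likelihood value dominates the sum of reciprocals, which is exactly why the estimator is unstable for outlying observations such as observation 9 above.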
$\sum_{i=1}^{n} \frac{(y_i - E[Y_i|\theta])^2}{\mathrm{Var}(Y_i|\theta)}$
for (i in 1:n){
#residuals and moments for observed data
r[i]<- time[i]-mu[i]
sr[i]<- (time[i]-mu[i])*sqrt(tau)
m3[i] <- pow(sr[i],3)
m4[i] <- pow(sr[i],4)
# Bayesian p-value:

node          mean      sd      MC error
skew.obs      0.09787   0.8858  0.0185
skew.rep      -0.02244  0.7959  0.01879
p.skew        0.4685    0.499   0.01028
kurtosis.obs  3.783     2.754   0.05979
kurtosis.rep  3.045     2.023   0.04379
p.kurtosis    0.417     0.4931  0.01081
Information criteria combine a measure of fit with a measure of complexity, e.g.

$AIC = -2 \log p(y|\hat{\theta}) + 2p$
$BIC = -2 \log p(y|\hat{\theta}) + p \log n$
Deviance

$D(\theta) = -2 \log p(y|\theta) + 2 \log p(y|\theta_{sat})$

measure of fit:
$\bar{D} = E_{\theta|y}[D]$   (posterior mean of deviance)

measure of complexity:
$p_D = \bar{D} - D(\bar{\theta})$   (effective no. of parameters)

$DIC = \bar{D} + p_D = D(\bar{\theta}) + 2 p_D$

The model with the smallest DIC value is preferred. DIC calculation is
implemented in WinBUGS.
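The DIC bookkeeping is simple arithmetic on WinBUGS' reported Dbar and Dhat; a sketch using the Cases + Distance row of the DIC table for Example 4.1 (Dbar = 131.289, Dhat = 127.030):

```python
# Sketch: DIC from Dbar (posterior mean deviance) and Dhat (deviance at
# the posterior means), as reported by WinBUGS.
def dic(dbar, dhat):
    pd = dbar - dhat          # effective number of parameters
    return pd, dbar + pd      # DIC = Dbar + pD

pd, d = dic(131.289, 127.030)
print(round(pd, 3), round(d, 3))  # → 4.259 135.548
```

The slides report 135.547 for this model, the small difference coming from rounding of the tabulated Dbar and Dhat.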
DIC Output
We will illustrate the use of DIC for comparing four different models for
the softdrink Example 4.1:

Model             Dbar     Dhat     pD     DIC
Intercept         209.092  207.061  2.031  211.123
Cases             143.549  140.477  3.072  146.622
Distance          170.575  167.503  3.072  173.647
Cases + Distance  131.289  127.030  4.259  135.547
ANOVA Models

ANOVA Model:
$Y_{ij} \sim N(\mu_i, \sigma^2), \quad i = 1, \ldots, I, \quad j = 1, \ldots, n_i$

where
$\mu_i = \mu_0 + \alpha_i$
with $\mu_0$ the overall common mean and $\alpha_i$ a group-specific parameter.

Corner Constraint:
Effect of baseline level (or reference category) is set to 0: $\alpha_1 = 0$, so that
$\mu_1 = \mu_0$ and $\mu_i = \mu_0 + \alpha_i$, $i = 2, \ldots, I$.

Sum-to-zero Constraint:
$\sum_{i=1}^{I} \alpha_i = 0$, or equivalently $\alpha_1 = -\sum_{i=2}^{I} \alpha_i$, with
$\mu_0 = \frac{1}{I} \sum_{i=1}^{I} \mu_i$ the overall mean effect and $\alpha_i$ the deviation of
each level from this overall mean effect.
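The two constraints can be checked numerically; a minimal sketch using the location effects from the classical fit below (loc2, loc3, loc4 estimates -4.2, -5.0, -8.2):

```python
# Sketch of the two identifiability constraints for one-way ANOVA effects.
alphas_corner = [0.0, -4.2, -5.0, -8.2]   # corner constraint: alpha_1 = 0
a_rest = [-4.2, -5.0, -8.2]
a1 = -sum(a_rest)                          # sum-to-zero: alpha_1 = -sum(rest)
print(round(a1, 1), round(sum([a1] + a_rest), 1))  # → 17.4 0.0
```

Both parameterisations describe the same group means; they differ only in which linear combination of the effects is fixed.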
ANOVA in WinBUGS

Assume data are given in pairs $(x_i, y_i)$, $i = 1, \ldots, n$ ($n = \sum n_i$):

#likelihood
for (i in 1:n){
   y[i] ~ dnorm(mu[i],tau)
   mu[i] <- beta0 + beta[x[i]]
}
#corner constraint
beta[1] <- 0.0
#sum-to-zero constraint
#beta[1] <- - sum( beta[2:I] )
#prior
beta0 ~ dnorm(0.0,1.0E-4)
for (i in 2:I){
   beta[i] ~ dnorm(0.0,1.0E-4)
}

ANOVA Example
Example 4.2
McCarthy (2007) describes a dataset of weights of starlings at four
different locations.
Location 1  Location 2  Location 3  Location 4
78          78          79          77
88          78          73          69
87          83          79          75
88          81          75          70
83          78          77          74
82          81          78          83
81          81          80          80
80          82          78          75
80          76          83          76
89          76          84          75
Classical ANOVA
R-Output

Response: Y
          Df Sum Sq Mean Sq F value    Pr(>F)
loc        3 341.90  113.97  9.0053 0.0001390 ***
Residuals 36 455.60   12.66
---

> summary.lm(star.aov)$coef
            Estimate Std. Error   t value     Pr(>|t|)
(Intercept)     83.6   1.124969 74.313150 5.325939e-41
loc2            -4.2   1.590947 -2.639938 1.218170e-02
loc3            -5.0   1.590947 -3.142783 3.342926e-03
loc4            -8.2   1.590947 -5.154164 9.372412e-06

WinBUGS Code

model
{ for (i in 1:40) {
    mu[i] <- beta0 + beta[location[i]]
    Y[i] ~ dnorm(mu[i], tau)
  }
  #prior, corner constraint
  beta[1] <- 0
  beta0 ~ dnorm(0.0,1.0E-4)
  for (i in 2:4){
    beta[i] ~ dnorm(0.0, 1.0E-6)
  }
  tau ~ dgamma(0.001, 0.001) # uninformative prior on precision
}
#inits
list(beta0=70, beta=c(NA, 70, 70, 70), tau=1)
#data
location[] Y[]
1 78
...
1 89
2 78
...
2 76
3 79
...
3 84
4 77
...
4 75
END
WinBUGS Results

node     mean     sd       MC error  2.5%     median   97.5%    start  sample
beta[2]  -4.204   1.65     0.03838   -7.302   -4.162   -0.9981  1001   2000
beta[3]  -4.963   1.597    0.04041   -7.977   -4.964   -1.699   1001   2000
beta[4]  -8.143   1.61     0.03213   -11.26   -8.168   -5.014   1001   2000
beta0    83.58    1.142    0.02757   81.31    83.59    85.7     1001   2000
tau      0.07878  0.01887  4.333E-4  0.04582  0.07712  0.1183   1001   2000

Using the comparison tool of the Inference menu and clicking on
"boxplot" for beta:

[Box plot of beta[2], beta[3], beta[4]; all three posterior intervals lie below 0.]
Model Comparison
Let us compare the fit of this one-way ANOVA model with a model that
assumes no differences in the expected weights at the different
locations:
for (i in 1:40) {
Y[i] ~ dnorm(beta0, tau)
}
Model      Dbar     Dhat     pD     DIC
ANOVA      216.156  211.053  5.103  221.259
Same Mean  235.316  233.229  2.087  237.402
3 components of a LM:
Generalized Linear Models (GLMs) are a generalization of the linear
model for modelling of random variables from the exponential family,
thus including the Normal, Binomial, Poisson, Exponential and Gamma
distributions.
GLMs are one of the most important components of modern statistical
theory, providing a unifying approach to statistical modelling.
Details on GLMs can be found in McCullagh and Nelder (1989),
Fahrmeir and Tutz (2001), and Dey, Ghosh, Mallick (2000)
3 components of a GLM:
Example 4.3
Fahrmeir and Tutz (1994) describe data provided by the Klinikum
Grosshadern, Munich, on infection from births by Caesarean section.
The response variable of interest is the occurrence or nonoccurrence
of infection, with three dichotomous covariates: whether the
Caesarean section was planned or not, whether any risk factors such
as diabetes, being overweight etc were present or not and whether
antibiotics were given as a prophylaxis. The aim was to analyse the
effects of the covariates on the risk of infection, especially whether
antibiotics can decrease the risk of infection.
                    Caesarean planned     Not planned
                    Infection             Infection
                    yes      no           yes      no
Antibiotics
  Risk factors      1        17           11       87
  No risk factors   0        2            0        0
No antibiotics
  Risk factors      28       30           23       3
  No risk factors   8        32           0        9
$Y_i | x_i, \pi_i \sim \mathrm{Bernoulli}(\pi_i)$

logit model: $\pi = \frac{e^\eta}{1+e^\eta}$   (logistic cdf)
probit model: $g(\pi) = \Phi^{-1}(\pi)$, $\pi = \Phi(\eta)$   (Normal cdf)
complementary log-log model: $g(\pi) = \log(-\log(1-\pi))$, $\pi = 1 - \exp(-\exp(\eta))$   (extreme-minimal-value cdf)

For the logit link with $\log\frac{\pi}{1-\pi} = \beta_0 + \beta_1 x$, the odds are

$\frac{\pi}{1-\pi} = \exp(\beta_0)\exp(\beta_1 x)$

and the odds ratio for a one-unit increase in x is

$OR_{x,x+1} = \frac{odds(x+1)}{odds(x)} = \frac{\exp(\beta_0)\exp(\beta_1(x+1))}{\exp(\beta_0)\exp(\beta_1 x)} = \exp(\beta_1)$
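The odds-ratio identity is easy to verify numerically; the coefficient values below are illustrative, not estimates from the Caesarean data:

```python
# Numeric check: with log-odds beta0 + beta1*x, a one-unit increase in x
# multiplies the odds by exp(beta1), regardless of x (illustrative values).
import math

beta0, beta1 = -1.0, 0.7

def odds(x):
    return math.exp(beta0 + beta1 * x)

print(abs(odds(3.0) / odds(2.0) - math.exp(beta1)) < 1e-12)  # → True
```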
WinBUGS Output

[Trace plots of beta[1], beta[2], beta[3] and beta0, iterations 1001-3000.]
WinBUGS Output

Centered Covariates
[Trace plots (iterations 1001-3000) and autocorrelation plots (lags 0-40) for beta[1], beta[2], beta[3], beta0.]

Uncentered Covariates
[Trace plots (iterations 1001-3000) and autocorrelation plots (lags 0-40) for beta[1], beta[2], beta[3], beta0.]
WinBUGS Output

mean     sd       MC error  2.5%     median   97.5%
-1.116   0.4392   0.02788   -1.993   -1.114   -0.2388
2.069    0.4982   0.03463   1.157    2.055    3.057
-3.333   0.4921   0.02534   -4.346   -3.316   -2.393
-0.8242  0.5331   0.04337   -1.961   -0.8118  0.1738
0.3604   0.1639   0.009911  0.1362   0.3282   0.7878
8.988    4.894    0.3246    3.181    7.804    21.26
0.04017  0.02009  0.001003  0.01295  0.03628  0.09139

Dbar     Dhat     pD     DIC
230.621  226.588  4.033  234.654
231.221  227.041  4.180  235.400
228.101  224.152  3.949  232.050

The complementary log-log link seems to give a better fit but there are
only minor differences in the values of DIC.
Hierarchical Models

$Y_{ij} | \theta_j \sim f(y_{ij} | \theta_j)$
$\theta_j | \phi \sim f(\theta_j | \phi)$
$\phi \sim f(\phi)$

Assuming all survival probabilities are the same will ignore potential
treatment differences between hospitals and will not fit the data
accurately.
Example 4.4
This example in the context of drug evaluation for possible clinical trial
application is taken from Gelman et al. (2004). A control group of 14
laboratory rats of type F344 is given a zero dose of a certain drug.
The aim is to estimate the probability of developing endometrial
stromal polyps (a certain tumor). The outcome is that 4 out of 14 rats
developed this tumor.
0/20  0/18  1/18  1/10  2/13   10/48  5/19
0/20  0/18  2/25  5/49  9/48   4/19   6/22
0/20  0/17  2/24  2/19  10/50  4/19   6/20
0/20  1/20  2/23  5/46  4/20   4/19   6/20
0/20  1/20  2/20  3/37  4/20   5/22   6/20
0/20  1/20  2/20  2/17  4/20   11/46  16/52
0/19  1/20  2/20  7/49  4/20   12/49  15/47
0/19  1/19  2/20  7/47  4/20   5/20   15/46
0/19  1/19  2/20  3/20  4/20   5/20   9/24
Matching the mean and variance of the historical tumor rates to a Beta($\alpha$, $\beta$) prior:

$0.136 = \frac{\alpha}{\alpha+\beta}, \qquad 0.103^2 = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$

yields $\alpha = 1.4$ and $\beta = 8.6$.
Assumptions:
- no time trend

Questions:
- Can we use the same prior to make inference about the tumor
  probabilities in the first 70 groups?
$f(\phi, \theta) = f(\phi) f(\theta|\phi) = f(\phi) \prod_{j=1}^{J} f(\theta_j|\phi)$

By integration, the joint (unconditional or marginal) distribution is

$f(\theta_1, \ldots, \theta_J) = \int \left[ \prod_{j=1}^{J} f(\theta_j|\phi) \right] f(\phi)\, d\phi$

The joint posterior is

$f(\phi, \theta | y) \propto f(y|\phi, \theta)\, f(\phi, \theta) = f(y|\theta)\, f(\theta|\phi)\, f(\phi)$
Hyperprior Distribution:

$\theta_i = \frac{\exp(\mu_i)}{1+\exp(\mu_i)}, \qquad \mu_i \sim N(\nu, \tau)$

and specify the following diffuse hyperprior distribution for mean $\nu$ and
precision $\tau$:

$\nu \sim N(0, 0.001), \qquad \tau \sim \mathrm{Gamma}(0.001, 0.001)$
# rat example
model
{ for (i in 1:71){
y[i] ~ dbin(theta[i],n[i])
theta[i] <- exp(mu[i])/(1+exp(mu[i]))
mu[i] ~ dnorm(nu,tau)
r[i]<-y[i]/n[i]
}
nu ~ dnorm(0.0,0.001)
tau ~ dgamma(0.001,0.001)
mtheta<-exp(nu)/(1+exp(nu))
}
#inits
list(nu=0,tau=1)
node       mean    sd       MC error  2.5%    median  97.5%
mtheta     0.1261  0.01336  3.035E-4  0.1002  0.126   0.1526
nu         -1.941  0.1224   0.002774  -2.195  -1.937  -1.715
tau        2.399   1.134    0.03409   1.052   2.184   4.891
theta[71]  0.2059  0.077    7.983E-4  0.0827  0.1965  0.3825
From the boxplot and the "model fit" plot of $\theta_j$ estimates against sample
proportions $r_j$, we see that the rates $\theta_j$ are shrunk from their sample point
estimates $r_j = y_j/n_j$ towards the population distribution with mean
0.126. Experiments with fewer observations are shrunk more and have
higher posterior variances. In contrast to the model with fixed prior
parameters, this full Bayesian hierarchical analysis has taken the
uncertainty in the hyperparameters into account.
[Box plot of the posterior distributions of theta[1]-theta[71].]
[Model fit plot: posterior estimates of theta[j] plotted against the sample proportions r[j].]
Pump  t_i     x_i
1     94.50   5
2     15.70   1
3     62.90   5
4     126.00  14
5     5.24    3
6     31.40   19
7     1.05    1
8     1.05    1
9     2.10    4
10    10.50   22

$x_i \sim \mathrm{Poisson}(\lambda_i), \quad \lambda_i = \theta_i t_i, \quad i = 1, \ldots, 10$
$\theta_i \sim \mathrm{Gamma}(\alpha, \beta), \quad i = 1, \ldots, 10.$
model
{
for (i in 1 : N) {
theta[i] ~ dgamma(alpha, beta)
lambda[i] <- theta[i] * t[i]
x[i] ~ dpois(lambda[i])
}
alpha ~ dexp(1)
beta ~ dgamma(0.1, 1.0)
}
list(t=c(94.3,15.7,62.9,126,5.24,31.4,1.05,1.05,2.1,10.5),
x=c(5,1,5,14,3,19,1,1,4,22), N=10) #data
list(alpha = 1, beta = 1) #inits
node       mean     sd       MC error  2.5%     median   97.5%
alpha      0.6874   0.2723   0.007535  0.2806   0.6456   1.338
beta       0.9126   0.5411   0.01506   0.1771   0.8161   2.222
theta[1]   0.0599   0.02496  3.49E-4   0.02099  0.05683  0.1184
theta[2]   0.1012   0.07978  0.001012  0.00801  0.08247  0.3089
theta[3]   0.08922  0.03818  5.284E-4  0.03137  0.08349  0.1786
theta[4]   0.1148   0.03023  3.901E-4  0.06324  0.1121   0.1829
theta[5]   0.5964   0.3127   0.004145  0.1508   0.5445   1.338
theta[6]   0.6067   0.137    0.001753  0.3761   0.595    0.9082
theta[7]   0.9106   0.7541   0.01089   0.07487  0.7165   2.845
theta[8]   0.8997   0.7396   0.01236   0.07952  0.7016   2.732
theta[9]   1.599    0.7679   0.01115   0.4925   1.467    3.444
theta[10]  1.995    0.4327   0.00605   1.254    1.966    2.917
failures  MLE     Bayesian
5         0.0530  0.0599
1         0.0637  0.1012
5         0.0795  0.08922
14        0.1111  0.1148
3         0.5725  0.5964
19        0.6051  0.6067
1         0.9524  0.9106
1         0.9524  0.8997
4         1.9048  1.599
22        2.0952  1.995
$\theta_i$'s far from the common mean (0.7389) are shrunk more than
those near it.
[Plot of the theta[i] for pumps 1-10: classical MLEs versus posterior means, illustrating the shrinkage.]
Survival Analysis

Hazard Function
Since $f(t) = -\frac{d}{dt} S(t)$, Definition 4.6 implies that

$h(t) = -\frac{d}{dt} \log S(t)$   (4.1)
For the Weibull distribution,

$h(t) = \gamma t^{\gamma-1}, \quad H(t) = t^{\gamma}, \quad S(t) = \exp(-t^{\gamma}).$

The ratio of hazards for two individuals is constant over time. Often,
the effect of the covariates is assumed to be multiplicative, leading to
the hazard function

$h(t, x) = h_0(t) \exp(x'\beta)$
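The relation $h(t) = -\frac{d}{dt}\log S(t)$ can be checked numerically for the Weibull case with a finite-difference derivative (gamma and t below are arbitrary illustrative values):

```python
# Numeric check of h(t) = gamma*t^(gamma-1) against -d/dt log S(t)
# for the Weibull survival function S(t) = exp(-t^gamma).
import math

gamma = 1.5
t, eps = 2.0, 1e-6
S = lambda u: math.exp(-u**gamma)
h_numeric = -(math.log(S(t + eps)) - math.log(S(t - eps))) / (2 * eps)
h_closed = gamma * t**(gamma - 1)
print(abs(h_numeric - h_closed) < 1e-5)  # → True
```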
Partial Likelihood

Assumptions:
- n individuals, d have distinct event times, n - d have right
  censored survival times
- no ties, ordered survival times: $y_{(1)}, \ldots, y_{(d)}$
- $R_j$ = set of individuals who are at risk at time $y_{(j)}$, jth risk set

$\prod_{i=1}^{n} \frac{\exp(x'_{(i)}\beta)}{\sum_{l \in R_i} \exp(x'_l\beta)}$   (4.4)

and

$\delta_i = \begin{cases} 0 & \text{if } t_i \le c_i, \\ 1 & \text{if } t_i > c_i. \end{cases}$
Censoring in WinBUGS

A right-censored observation with censoring time t.cen[i] is specified
via the I(,) construct, e.g. for a Weibull survival time:

dweib(rho,gamma)I(t.cen[i],)
The next page gives survival times (in half-days) from the MAC
treatment trial, where "+" indicates a censored observation
Treatm.  Time
1        74+
2        248
1        272+
2        244
2        20+
2        64
2        88
2        148+
1        162+
1        184+
1        188+
1        198+
1        382+
1        436+
2        32+
1        64+
2        102
2        162+
2        182+
1        364+
1        18+
1        36+
2        160+
2        254

Unit  Treatm.  Time
B     2        4+
B     1        156+
C     2        20+
E     1        50+
E     2        64+
E     2        82
E     1        186+
E     1        214+
E     1        214
E     2        228+
E     2        262
H     2        22+
H     1        22+
H     1        74+
H     1        88+
H     1        148+
H     2        162
K     1        28+
K     1        70+
K     2        106+

Unit  Treatm.  Time
F     1        6
F     2        16+
F     1        76
F     2        80
F     2        202
F     1        258+
F     1        268+
F     2        368+
F     1        380+
F     1        424+
F     2        428+
F     2        436+
I     2        8
I     2        16+
I     2        40
I     1        120+
I     1        168+
I     2        174+
I     1        268+
I     2        276
I     1        286+
I     1        366
I     2        396+
I     2        466+
I     1        468+
$\mu_{ij} = \exp(\beta_0 + \beta_1 x_{ij})$

so that

$T_{ij} \sim \mathrm{Weibull}(\rho_i, \mu_{ij})$, with $\rho_i \overset{iid}{\sim} \mathrm{Gamma}(\alpha, \alpha)$.

Thus, the mean of the $\rho_i$ is one, corresponding to a constant baseline
hazard, and their variance is $1/\alpha$. We put a proper but low-information
Gamma(3.0, 0.1) prior on $\alpha$, reflecting a prior guess for the standard
deviation of $\rho_i$ of $30^{-1/2} \approx 0.18$, allowing a fairly broad region of values
centered around one.
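The prior-guess calculation above is quick to reproduce: with $\alpha \sim$ Gamma(3.0, 0.1), the prior mean of $\alpha$ is 3.0/0.1 = 30, and evaluating the frailty standard deviation $1/\sqrt{\alpha}$ at that value gives roughly 0.18:

```python
# Check of the prior guess: E[alpha] = 3.0/0.1 = 30 under Gamma(3.0, 0.1),
# and sd(rho_i) = 1/sqrt(alpha) evaluated at 30 is about 0.18.
import math

e_alpha = 3.0 / 0.1
print(round(1 / math.sqrt(e_alpha), 2))  # → 0.18
```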
model{
for (i in 1 : 69) {
   t[i] ~ dweib(rho[unit[i]], mu[i]) I(t.cen[i], )
   mu[i] <-exp(beta0+beta1*x[i])
}
for (k in 1:11){
   rho[k] ~ dgamma(alpha,alpha)
}
alpha ~ dgamma(3.0,0.1)
beta0 ~ dnorm(0.0,0.001)
beta1 ~ dnorm(0.0,0.001)
r <- exp(2.0*beta1)
}

WinBUGS Output
Based on 10,000 iterations and burn-in of 5,000:
node     mean    sd       MC error  2.5%     median  97.5%
alpha    48.45   20.12    0.3892    18.47    45.61   95.32
beta0    -6.788  0.4114   0.01758   -7.626   -6.78   -6.006
beta1    0.5973  0.2805   0.009956  0.06683  0.5894  1.189
r        3.887   2.515    0.08594   1.143    3.251   10.78
rho[1]   1.028   0.1078   0.002538  0.8111   1.029   1.237
rho[2]   0.9848  0.1456   0.003415  0.704    0.9794  1.289
rho[3]   0.972   0.1414   0.002471  0.7016   0.9696  1.255
rho[4]   0.999   0.1108   0.004363  0.7739   1.0     1.214
rho[5]   1.066   0.1024   0.002894  0.8667   1.064   1.273
rho[6]   0.9642  0.08855  0.002924  0.7894   0.9654  1.133
rho[7]   0.9724  0.1169   0.00354   0.748    0.9709  1.204
rho[8]   1.038   0.1273   0.003974  0.7931   1.038   1.296
rho[9]   0.9756  0.09325  0.003106  0.7885   0.9763  1.158
rho[10]  1.008   0.12     0.002795  0.7667   1.006   1.248
rho[11]  0.9616  0.1386   0.003722  0.6873   0.96    1.242
State-space models are among the most powerful tools for dynamic
modeling and forecasting of time series and longitudinal data.
Overviews can be found in Fahrmeir and Tutz (1994) and Kuensch
(2001).
Observation equation:
$y_t = h_t(\theta_t) + v_t$

State equation:
$\theta_t = g_t(\theta_{t-1}) + u_t$

The state equation gives the Markovian transition of state $\theta_{t-1}$ to $\theta_t$,
where $u_t$ denotes an error term. The ability to include knowledge of the
system behaviour in the statistical model is largely what makes
state-space modeling so attractive for biologists, economists, engineers
and physicists.
The data available for stock assessment purposes quite often consist
of a time series of annual catches Ct , t = 1, . . . , N, and relative
abundance indices It , t = 1, . . . , N, such as research survey catch
rates or catch-per-unit-effort (CPUE) indices from commercial
fisheries.
For example, the next table gives an historical dataset of catch-effort
data of South Atlantic albacore tuna (Thunnus alalunga) from 1967 to
1989. Catch is in thousands of tons and CPUE in (kg/100 hooks).
Year (t)  Catch (C_t)  CPUE (I_t)
1967      15.9         61.89
1968      25.7         78.98
1969      28.5         55.59
1970      23.7         44.61
1971      25.0         56.89
...
1987      37.5         23.36
1988      25.9         22.36
1989      25.3         21.91
Age-composition data are not available for this stock. This dataset has
previously been analysed by Polacheck et al. (1993).

new biomass = old biomass + growth + recruitment - natural mortality - catch

Objectives: estimation of the model parameters and of management
quantities such as the maximum surplus production.
State equations:
$P_1 | \sigma^2 = e^{u_1}$,
$P_t | P_{t-1}, K, r, \sigma^2 = \left(P_{t-1} + r P_{t-1}(1 - P_{t-1}) - C_{t-1}/K\right) e^{u_t}, \quad t = 2, \ldots, N$   (4.7)

Observation equations:
$I_t | P_t, q, \tau^2 = q K P_t\, e^{v_t}, \quad t = 1, \ldots, N$,   (4.8)

where $u_t$ are iid normal with mean 0 and variance $\sigma^2$, and $v_t$ are iid
normal with mean 0 and variance $\tau^2$.

The joint prior distribution is

$p(K, r, q, \sigma^2, \tau^2, P_1, \ldots, P_N) = p(K)\,p(r)\,p(q)\,p(\sigma^2)\,p(\tau^2)\,p(P_1|\sigma^2) \prod_{t=2}^{N} p(P_t | P_{t-1}, K, r, \sigma^2).$
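One deterministic step of the state equation can be sketched directly; the K and r values below are taken near the posterior medians reported later in the output table, and the noise factor $e^{u_t}$ is omitted:

```python
# Sketch: one noise-free step of the surplus production state equation
# P_t = P_{t-1} + r*P_{t-1}*(1 - P_{t-1}) - C_{t-1}/K (e^{u_t} omitted).
K, r = 260.4, 0.3031        # near the reported posterior medians
P, C = 1.0, 15.9            # P_1 = 1 (unexploited) and the 1967 catch
P_next = P + r * P * (1 - P) - C / K
print(round(P_next, 4))  # → 0.9389
```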
The likelihood is

$p(I_1, \ldots, I_N | K, r, q, \sigma^2, \tau^2, P_1, \ldots, P_N) = \prod_{t=1}^{N} p(I_t | P_t, q, \tau^2).$   (4.10)

Priors:
$p(q) \propto 1/q$,
$\sigma^2 \sim \text{inverse-gamma}(3.79, 0.0102)$,
$\tau^2 \sim \text{inverse-gamma}(1.71, 0.0086)$.
model {
# lognormal prior on K
K ~ dlnorm(5.042905,3.7603664)I(10,1000)
# lognormal prior on r
r ~ dlnorm(-1.151293,1.239084233)I(0.005,1.0)
# instead of improper (prop. to 1/q) use just proper IG
iq ~ dgamma(0.001,0.001)I(0.5,200)
q <- 1/iq
# inverse gamma on isigma2
isigma2 ~ dgamma(a0,b0)
sigma2 <- 1/isigma2
# inverse gamma on itau2
itau2
~ dgamma(c0,d0)
tau2
<- 1/itau2
Pmean[1] <- 0
P[1] ~ dlnorm(Pmean[1],isigma2) I(0.05,1.6)
for (i in 2:N) {
Pmean[i]<-log(max(P[i-1] + r*P[i-1]*(1-P[i-1]) - C[i-1]/K,0.01))
P[i] ~ dlnorm(Pmean[i],isigma2)I(0.05,1.5)
}
for (i in 1:N) {
Imean[i] <- log(q*K*P[i])
I[i] ~ dlnorm(Imean[i],itau2)
}
P24 ~ dlnorm(Pmean24, isigma2)I(0.05,1.5)
Pmean24<- log(max(P[23] + r*P[23]*(1-P[23]) - C[23]/K,0.01))
MSP<- r*K/4
B_MSP<- K/2
E_MSP<- r/(2*q)
}
[DAG of the surplus production model: P[t-1], C[t-1] and isigma2 determine Pmed[t] and hence P[t]; P[t], iq and K determine Imed[t], observed as I[t] with precision itau2; for t IN 2:N.]

node    mean      sd        MC error  2.5%      median    97.5%
BMSP    135.5     32.44     1.272     87.2      130.2     212.1
EMSP    0.6154    0.09112   0.001935  0.4346    0.6148    0.8002
K       271.0     64.88     2.544     174.4     260.4     424.2
MSP     19.52     2.537     0.05968   13.9      19.76     23.94
P[1]    1.018     0.05427   8.062E-4  0.919     1.016     1.133
P[2]    0.9944    0.07386   0.001368  0.8737    0.986     1.164
P[3]    0.8772    0.06548   0.001485  0.7616    0.8726    1.019
P[4]    0.7825    0.06205   0.001524  0.6711    0.779     0.9144
P[21]   0.4175    0.03452   8.162E-4  0.3545    0.4156    0.491
P[22]   0.353     0.03519   9.208E-4  0.292     0.35      0.4296
P[23]   0.3271    0.03964   0.00103   0.2573    0.3241    0.4123
P24     0.2964    0.04939   0.001221  0.2093    0.2926    0.4028
q       0.2486    0.06136   0.002411  0.1449    0.244     0.3777
r       0.3088    0.09576   0.003559  0.1416    0.3031    0.5104
sigma2  0.003105  0.001912  2.22E-5   0.001132  0.00261   0.008057
tau2    0.01225   0.004516  2.778E-5  0.005832  0.01145   0.02327
The SV model used for analyzing these data can be written in the form
of a nonlinear state-space model:
Observation equations:
$y_t | \theta_t = \exp\left(\tfrac{1}{2}\theta_t\right) u_t, \quad u_t \overset{iid}{\sim} N(0, 1), \quad t = 1, \ldots, n.$   (4.12)

State equations:
$\theta_t | \theta_{t-1}, \mu, \phi, \tau^2 = \mu + \phi(\theta_{t-1} - \mu) + v_t, \quad v_t \overset{iid}{\sim} N(0, \tau^2), \quad t = 1, \ldots, n,$   (4.13)

with $\theta_0 \sim N(\mu, \tau^2)$.

returns.dat
-0.320221363079782
1.46071929942995
-0.408629619810947
1.06096027386685
1.71288920763163
0.404314365893326
-0.905699012715806
...
2.22371628398118
The joint prior of the states follows from the state equations (4.13) and
the Markov property:

$p(\theta_0, \ldots, \theta_n | \mu, \phi, \tau^2) = p(\theta_0 | \mu, \tau^2) \prod_{t=1}^{n} p(\theta_t | \theta_{t-1}, \mu, \phi, \tau^2).$   (4.14)

The likelihood $p(y_1, \ldots, y_n | \mu, \phi, \tau^2, \theta_0, \ldots, \theta_n)$ is specified by the
observation equations (4.12) and the conditional independence
assumption:

$p(y_1, \ldots, y_n | \mu, \phi, \tau^2, \theta_0, \ldots, \theta_n) = \prod_{t=1}^{n} p(y_t | \theta_t).$   (4.15)
[DAG of the SV model: mu, phi and theta[t-1] determine thmean[t]; thmean[t] and itau2 determine theta[t], which determines y[t]; for t IN 1:n.]

The solid arrows indicate that given its parent nodes, each node v is
independent of all other nodes except descendants of v.
mean     sd       MC error  2.5%    median  97.5%
0.7163   0.1244   0.00958   0.5554  0.6925  1.005
-0.6927  0.3074   0.02252   -1.176  -0.735  0.01074
0.9805   0.01081  8.306E-4  0.9552  0.9823  0.9962
0.1493   0.03052  0.002965  0.1033  0.1435  0.2196
4.10 Copulas

Copulas

Applications of Copulas

Definition of a Copula
Definition 4.7
Theorem 4.8
$C(u_1, \ldots, u_d) = P(U_1 \le u_1, \ldots, U_d \le u_d)$   (4.17)
Copula Density

$c(u_1, u_2) = \frac{\partial^2 C(u_1, u_2)}{\partial u_1 \partial u_2}$   (4.18)

Frank Copula
$C(u, v) = -\frac{1}{\alpha} \log\left( 1 + \frac{(e^{-\alpha u} - 1)(e^{-\alpha v} - 1)}{e^{-\alpha} - 1} \right), \quad \alpha \in (-\infty, \infty) \setminus \{0\}$

Gumbel Copula
$\alpha \in [1, \infty)$

Gaussian Copula
$C(u, v) = \Phi_\rho\left( \Phi^{-1}(u), \Phi^{-1}(v) \right)$
where $\Phi_\rho$ is the standard bivariate normal distribution function with
correlation $\rho$, and $\Phi$ is the standard normal distribution function.
or equivalently: a pair of observations is concordant if
$(x_i - x_j)(y_i - y_j) > 0$ (discordant if $(x_i - x_j)(y_i - y_j) < 0$).
Parameter Estimation

Kendall's tau in terms of the copula parameter:

Clayton: $\tau = \frac{\alpha}{2 + \alpha}$

Frank: $\tau = 1 - \frac{4}{\alpha}\left[ 1 - \frac{1}{\alpha} \int_0^\alpha \frac{t}{e^t - 1}\, dt \right]$

Gumbel: $\tau = 1 - \frac{1}{\alpha}$

Gauss: $\tau = \frac{2}{\pi} \arcsin(\rho)$

Simulation Study
library(copula)
library(R2WinBUGS)
p <- 2 # copula dimension
tau <- 0.8 # value of Kendalls tau
alpha<-2*tau/(1-tau) #relationship between tau and alpha
c.clayton<-archmCopula(family="clayton",dim=p,param=alpha)
347
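The Clayton relationship used in the R code above is invertible; a quick numeric check that alpha = 2*tau/(1-tau) and tau = alpha/(alpha+2) are consistent:

```python
# Numeric check of the Clayton tau/alpha relationship from the R code:
# alpha = 2*tau/(1 - tau)  <=>  tau = alpha/(alpha + 2).
tau = 0.8
alpha = 2 * tau / (1 - tau)
print(round(alpha, 6))                          # → 8.0
print(abs(alpha / (alpha + 2) - tau) < 1e-12)   # → True
```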
Simulation Study

$f(y_1, \ldots, y_n | \theta) = \prod_{i=1}^{n} f(y_i | \theta)$
4 WinBUGS Applications
To ensure that the Poisson means are all positive, we may have to add
a positive constant C to each l(i). This is equivalent to multiplying
the likelihood by a constant term enC . With this approach, the original
likelihood can be written as the product of Poisson likelihoods with
observations all equal to zero:
$f(y|\theta) = \prod_{i=1}^{n} e^{l(i)} = e^{nC} \prod_{i=1}^{n} \frac{(-l(i)+C)^0}{0!}\, e^{-(-l(i)+C)}$

[Figure 22: Scatterplot of 500 simulated values from Clayton copula with
Exp(0.0001) marginals.]

In WinBUGS:

C <- 10000
for (i in 1:n){
   zeros[i]<-0
   zeros[i]~ dpois(zeros.mean[i])
   zeros.mean[i]<- -l[i]+C
   l[i]<- ...#expression of log-likelihood for obs. i
}

Alternatively, the likelihood can be written as

$f(y|\theta) = \prod_{i=1}^{n} e^{l(i)} \left(1 - e^{l(i)}\right)^0 = \prod_{i=1}^{n} f_{\mathrm{Bernoulli}}(1 \mid e^{l(i)})$

i.e. the product of Bernoulli densities with success probability $e^{l(i)}$ and
all observations equal to 1.
To keep the success probabilities below one, a constant C can again be
subtracted from each $l(i)$:

$\prod_{i=1}^{n} e^{l(i)-C} \left(1 - e^{l(i)-C}\right)^0 = \prod_{i=1}^{n} f_{\mathrm{Bernoulli}}(1 \mid e^{l(i)-C})$
#Call WinBUGS
data=list(N=500,x=w[,1],y=w[,2])
inits=list(list(lambda1=0.001,lambda2=0.002,alpha=5))
parameters=c("lambda1","lambda2","alpha")
clayton.sim<-bugs(data,inits,parameters.to.save=parameters,
model.file="model_clayton.odc", n.chains=1,
n.iter=2000,n.burnin=1000,working.directory=getwd())
model
{lambda1 ~ dgamma(0.001,0.001) #Jeffreys prior
lambda2 ~ dgamma(0.001,0.001) #Jeffreys prior
alpha ~ dunif(0,100) #Uniform prior on alpha
# likelihood specification using zeros trick
C<-10000
for(i in 1:N) {
zeros[i] <-0
zeros[i] ~ dpois(mu[i])
mu[i]<- - l[i] +C
u[i] <- 1-exp(-lambda1*x[i])
v[i] <- 1-exp(-lambda2*y[i])
l[i]<-log((1+alpha)*
pow(pow(u[i],-alpha)+pow(v[i],-alpha)-1,-1/alpha-2)
*pow(u[i],-alpha-1)*pow(v[i],-alpha-1)*
lambda1*exp(-lambda1*x[i])*lambda2*exp(-lambda2*y[i])) }}
node      mean      sd        MC error  2.5%      median    97.5%
alpha     8.001     0.3863    0.02022   7.279     8.007     8.789
deviance  1.002E+7  2.507     0.1517    1.002E+7  1.002E+7  1.002E+7
lambda1   9.434E-5  3.815E-6  4.306E-7  8.75E-5   9.401E-5  1.018E-4
lambda2   9.415E-5  3.813E-6  4.298E-7  8.723E-5  9.383E-5  1.017E-4
5 References

Carlin, B.P., Polson, N.G., and Stoffer, D.S. (1992). A Monte Carlo
approach to nonnormal and nonlinear state-space modeling. J.
Amer. Statist. Assoc. 87, 493-500.

Carlin, B.P. and Louis, Th.A. (2008). Bayesian Methods for Data
Analysis. Chapman & Hall.

Gelman, A., Carlin, J., Stern, H., Rubin, D. (2004). Bayesian Data
Analysis, Texts in Statistical Science, 2nd ed., Chapman & Hall,
London.

Kuensch, H.R. (2001). State space and hidden Markov models. In:
Barndorff-Nielsen et al. (Ed.), Complex Stochastic Systems,
Chapman & Hall, London, 109-174.

Lawless, J.F. (1982). Statistical Models and Methods for Life Time
Data. New York, Wiley.