Statistics 512 Notes 9: The Monte Carlo Method, Continued
The Monte Carlo method:
Consider a function $g(X)$ of a random vector $X$, where $X$ has
density $f(X)$. Consider the expected value of $g(X)$:
\[ E[g(X)] = \int g(x) f(x)\,dx. \]
Suppose we take an iid random sample $X_1,\ldots,X_m$ from
the density $f(X)$.
Then by the law of large numbers,
\[ \frac{1}{m}\sum_{i=1}^{m} g(X_i) \xrightarrow{P} E[g(X)]. \]
The Monte Carlo method is to do a simulation to draw
$X_1,\ldots,X_m$ from the density $f(X)$ and estimate $E[g(X)]$ by
\[ \hat{E}[g(X)] = \frac{1}{m}\sum_{i=1}^{m} g(X_i). \]
In a simulation, we can make $m$ as large as we want.
The standard error of the estimate is
\[ S_{\hat{E}[g(X)]} = \frac{1}{m}\sqrt{\sum_{i=1}^{m}\left( g(X_i) - \frac{1}{m}\sum_{j=1}^{m} g(X_j) \right)^2}. \]
By the Central Limit Theorem, an approximate 95%
confidence interval for $E[g(X)]$ is
\[ \hat{E}[g(X)] \pm 1.96\, S_{\hat{E}[g(X)]}. \]
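The estimate, its standard error, and the approximate 95% interval described above can be sketched in R as follows. This is a minimal illustration, not code from the notes: the helper name mc_estimate and its arguments g and rdraw are assumptions introduced here, and g is assumed to be vectorized.

```r
# Hypothetical helper (not from the notes): Monte Carlo estimate of
# E[g(X)] with standard error and approximate 95% confidence interval.
# rdraw(m) must return m iid draws from f; g must be vectorized.
mc_estimate <- function(g, rdraw, m) {
  gx <- g(rdraw(m))                    # g(X_1), ..., g(X_m)
  est <- mean(gx)                      # estimate of E[g(X)]
  se <- sqrt(sum((gx - est)^2)) / m    # standard error of the estimate
  list(estimate = est, se = se,
       lci = est - 1.96 * se, uci = est + 1.96 * se)
}

# Usage: estimate E[X^2] for X ~ N(0,1), whose true value is 1
set.seed(512)
res <- mc_estimate(function(x) x^2, rnorm, 100000)
res
```

Passing the sampler and the function g as arguments keeps the estimator separate from the particular problem, so the same sketch covers the examples that follow.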
Example: Monte Carlo estimation of $\pi$
Define the unit square as the square centered at (0.5,0.5) with
sides of length 1 and the unit circle as the circle centered at
the origin with a radius of length 1. The ratio of the area of
the unit circle that lies in the first quadrant to the area of the
unit square is $\pi/4$.
Let $U_1$ and $U_2$ be iid uniform (0,1) random variables. Let
$g(U_1,U_2)=1$ if $(U_1,U_2)$ is in the unit circle and 0
otherwise. Then
\[ E[g(U_1,U_2)] = \frac{\pi}{4}. \]
Monte Carlo method: Repeat the experiment of drawing
$X=(U_1,U_2)$, with $U_1$ and $U_2$ iid uniform (0,1) random
variables, $m$ times and estimate $\pi$ by
\[ \hat{\pi} = 4\cdot\frac{1}{m}\sum_{i=1}^{m} g(U_{1i},U_{2i}). \]
An approximate 95% confidence interval for $\pi$ is
\[ \hat{\pi} \pm 1.96\cdot 4\cdot\frac{1}{m}\sqrt{\sum_{i=1}^{m}\left( g(U_{1i},U_{2i}) - \frac{1}{m}\sum_{j=1}^{m} g(U_{1j},U_{2j}) \right)^2}. \qquad (1) \]
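Because $g(U_1,U_2)$ = 0 or 1, (1) is equivalent to
\[ \hat{\pi} \pm 1.96\cdot 4\sqrt{\frac{\hat{p}(1-\hat{p})}{m}}, \]
where $\hat{p}=\hat{\pi}/4$ is the proportion of the $m$ points that fall in the unit circle.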
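The step behind this equivalence can be written out as follows. The shorthand $g_i$ for $g(U_{1i},U_{2i})$ and $\hat{p}$ for the sample mean of the $g_i$ is introduced here for brevity and is not notation from the notes.

```latex
% Since each g_i is 0 or 1, g_i^2 = g_i, so
\[
\sum_{i=1}^{m} (g_i - \hat{p})^2
  = \sum_{i=1}^{m} g_i^2 - m\hat{p}^2
  = m\hat{p} - m\hat{p}^2
  = m\,\hat{p}(1-\hat{p}),
\]
% and therefore
\[
\frac{1}{m}\sqrt{\sum_{i=1}^{m} (g_i - \hat{p})^2}
  = \sqrt{\frac{\hat{p}(1-\hat{p})}{m}} .
\]
```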
In R, the command runif(n) draws n iid uniform (0,1)
random variables.
Here is a function for estimating pi:
piest=function(m){
#
# Obtains the estimate of pi and an approximate 95% confidence
# interval for the simulation discussed in Example 5.8.1
#
# m is the number of simulations
#
# Draw u1, u2 iid uniform (0,1) random variables
u1=runif(m);
u2=runif(m);
cnt=rep(0,m);
# chk=Vector which checks if (u1,u2) is in the unit circle
chk=u1^2+u2^2-1;
# cnt[i]=1 if (u1,u2) is in unit circle
cnt[chk<0]=1;
# Estimate of pi
est=4*mean(cnt);
# Standard error of the estimate of pi
se=4*(mean(cnt)*(1-mean(cnt))/m)^.5;
# Lower and upper endpoints of approximate 95% confidence interval
lci=est-1.96*se;
uci=est+1.96*se;
list(estimate=est,lci=lci,uci=uci);
}
> piest(100000)
$estimate
[1] 3.13912
$lci
[1] 3.128931
$uci
[1] 3.149309
Back to Example 5.8.5:
What is the true size of the nominal 0.05 size t-test for random
samples of size 20 from the contaminated normal distribution A?
We want to estimate
\[ E[I\{t(x_1,\ldots,x_{20}) > 1.729\}]. \]
Monte Carlo method:
\[ \hat{E}[I\{t(x_1,\ldots,x_{20}) > 1.729\}] = \frac{1}{m}\sum_{i=1}^{m} I\{t(x_{i,1},\ldots,x_{i,20}) > 1.729\}, \]
where $(x_{i,1},\ldots,x_{i,20})$ is a random sample of size 20 from the
contaminated normal distribution A.
[Here $X=(X_1,\ldots,X_{20})$, $f(X)$ is the density of a
random sample of size 20 from the contaminated normal
distribution A, and $g(X)=I\{t(X_1,\ldots,X_{20}) > 1.729\}$.]
How to draw a random observation from the contaminated
normal distribution A?
(1) Draw a Bernoulli random variable B with p=0.25;
(2) If B=0, draw a random observation from the
standard normal distribution. If B=1, draw a
random observation from the normal distribution
with mean 0 and standard deviation 25.
In R, the command rnorm(n,mean=0,sd=1) draws a random
sample of size n from the normal distribution with the
specified mean and SD. The command rbinom(n,size=1,p)
draws a random sample of size n from Bernoulli
distribution with probability of success p.
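Steps (1) and (2) can be sketched in vectorized form as below. The function name rcontnorm and its argument names are illustrative assumptions, not part of the notes; the defaults follow the values stated above (p=0.25, contaminated SD 25).

```r
# Hypothetical helper (name and arguments are illustrative): draws n
# observations from the contaminated normal distribution A.
rcontnorm <- function(n, probcont = 0.25, sigmac = 25) {
  # Step (1): Bernoulli indicators of contamination
  b <- rbinom(n, size = 1, prob = probcont)
  # Step (2): N(0,1) when b=0, N(0, sigmac^2) when b=1
  rnorm(n, mean = 0, sd = ifelse(b == 1, sigmac, 1))
}

set.seed(512)
x <- rcontnorm(100000)
# The theoretical variance is 0.75 * 1 + 0.25 * 25^2 = 157
var(x)
```

Because rnorm accepts a vector of standard deviations, all n observations are drawn in one call rather than in a loop.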
R function for obtaining the Monte Carlo estimate
$\hat{E}[I\{t(x_1,\ldots,x_{20}) > 1.729\}]$:
empalphacn=function(nsims){
#
# Obtains the empirical level of the test discussed in
# Example 5.8.5
#
# nsims is the number of simulations
#
sigmac=25; # SD when observation is contaminated
probcont=.25; # Probability of contamination
alpha=.05; # Significance level for t-test
n=20; # Sample size
tc=qt(1-alpha,n-1); # Critical value for t-test
ic=0; # ic will count the number of times t-test is rejected
for(i in 1:nsims){
# Bernoulli random variable which determines whether
# each observation in sample is from standard normal or
# normal with SD sigmac
b=rbinom(n,size=1,prob=probcont);
# Sample observations from standard normal when b=0 and
# normal with SD sigmac when b=1
samp=rnorm(n,mean=0,sd=1+b*(sigmac-1));
# Calculate t-statistics for testing mu=0 based on sample
tstat=mean(samp)/(var(samp)^.5/n^.5);
# Check if we reject the null hypothesis for the t-test
if(tstat>tc){
ic=ic+1;
}
}
# Estimated true significance level equals proportion of
# rejections
empalp=ic/nsims;
# Standard error for estimate of true significance level
se=((empalp*(1-empalp))/nsims)^.5;
# Lower and upper endpoints of approximate 95% confidence interval
lci=empalp-1.96*se;
uci=empalp+1.96*se;
list(empiricalalpha=empalp,lci=lci,uci=uci);
}
> empalphacn(100000)
$empiricalalpha
[1] 0.04086
$lci
[1] 0.039633
$uci
[1] 0.042087
Based on these results the nominal 0.05 size t-test appears
to be slightly conservative when a sample of size 20 is
drawn from this contaminated normal distribution.
Generating random observations with given cdf F
Theorem 5.8.1: Suppose the random variable U has a
uniform (0,1) distribution. Let F be the cdf of a random
variable that is strictly increasing on some interval I, where
F=0 to the left of I and F=1 to the right of I. Then the
random variable $X=F^{-1}(U)$ has cdf F, where $F^{-1}(0)$ = left
endpoint of I and $F^{-1}(1)$ = right endpoint of I.
Proof: A uniform distribution on (0,1) has the CDF
$F_U(u)=u$ for $u \in (0,1)$. Using the fact that the CDF F is a
strictly monotone increasing function on the interval I, for $x$ in I,
\[
\begin{aligned}
P[X \le x] &= P[F^{-1}(U) \le x] \\
           &= P[F(F^{-1}(U)) \le F(x)] \\
           &= P[U \le F(x)] \\
           &= F(x).
\end{aligned}
\]
This method is difficult to use when simulating random
variables whose inverse CDF cannot be obtained in closed
form.
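As an illustration of Theorem 5.8.1 in a case where the inverse CDF is available in closed form, consider the exponential distribution with rate $\lambda$: $F(x)=1-e^{-\lambda x}$ on $I=(0,\infty)$, so $F^{-1}(u)=-\log(1-u)/\lambda$. The sketch below is an example chosen here, not one from the notes, and the function name rexp_inverse is hypothetical.

```r
# Inverse-CDF sampling for the exponential distribution with rate lambda.
# F(x) = 1 - exp(-lambda * x), so F^{-1}(u) = -log(1 - u) / lambda.
rexp_inverse <- function(n, lambda = 1) {
  u <- runif(n)            # U ~ uniform(0,1)
  -log(1 - u) / lambda     # X = F^{-1}(U) has cdf F
}

set.seed(512)
x <- rexp_inverse(100000, lambda = 2)
mean(x)                    # should be near 1 / lambda = 0.5
# Largest gap between the empirical CDF and the target CDF at a few points
pts <- c(0.1, 0.5, 1)
max(abs(ecdf(x)(pts) - pexp(pts, rate = 2)))
```

In practice rexp does this job directly; the point of the sketch is only to show the theorem at work.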
Other methods for simulating a random variable:
(1) Accept-Reject Algorithm (Chapter 5.8.1)
(2) Markov chain Monte Carlo Methods (Chapter 11.4)
R commands for generating random variables
runif -- uniform random variables
rbinom -- binomial random variables
rnorm -- normal random variables
rt -- t random variables
rpois -- Poisson random variables
rexp -- exponential random variables
rgamma -- gamma random variables
rbeta -- beta random variables
rcauchy -- Cauchy random variables
rchisq -- chi-squared random variables
rf -- F random variables
rgeom -- geometric random variables
rnbinom -- negative binomial random variables
Bootstrap Procedures
Bootstrap standard errors
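$X_1,\ldots,X_n$ iid with CDF F and variance $\sigma^2$.
\[ Var(\bar{X}) = Var\left(\frac{X_1+\cdots+X_n}{n}\right) = \frac{1}{n^2}\,Var(X_1+\cdots+X_n) = \frac{\sigma^2}{n}. \]
\[ SD(\bar{X}) = \frac{\sigma}{\sqrt{n}}. \]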
We estimate $SD(\bar{X})$ by $SE(\bar{X}) = s/\sqrt{n}$, where $s$ is the
sample standard deviation.
What about $SD\{Median(X_1,\ldots,X_n)\}$? This SD depends in
a complicated way on the distribution F of the Xs. How to
approximate it?
Real World: $F \Rightarrow X_1,\ldots,X_n \Rightarrow T_n = Median(X_1,\ldots,X_n)$.
The bootstrap principle is to approximate the real world by
assuming that $F=\hat{F}_n$, where $\hat{F}_n$ is the empirical CDF, i.e.,
the distribution that puts probability $1/n$ on each of
$X_1,\ldots,X_n$. We simulate from $\hat{F}_n$ by drawing one point at
random from the original data set.
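A small sketch of what simulating from the empirical CDF looks like in R. The five data values are made up for illustration and do not come from the notes; ecdf and sample are standard R functions.

```r
# Toy data set (made up for illustration, not from the notes)
x <- c(2.1, 0.4, 3.7, 1.5, 0.9)
n <- length(x)

# The empirical CDF puts probability 1/n on each data point
Fhat <- ecdf(x)
Fhat(1.5)    # proportion of observations <= 1.5 (3 of the 5 points)

# n iid draws from Fhat: each draw picks one data point at random,
# which amounts to sampling from the data with replacement
set.seed(512)
xstar <- sample(x, size = n, replace = TRUE)
xstar
```

A bootstrap sample typically repeats some of the original points and omits others, which is what makes the statistic vary from one bootstrap sample to the next.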
Bootstrap World: $\hat{F}_n \Rightarrow X_1^*,\ldots,X_n^* \Rightarrow T_n^* = Median(X_1^*,\ldots,X_n^*)$.
The bootstrap estimate of $SD\{Median(X_1,\ldots,X_n)\}$ is
$SD\{Median(X_1^*,\ldots,X_n^*)\}$, where $X_1^*,\ldots,X_n^*$ are iid draws
from $\hat{F}_n$.
How to approximate $SD\{Median(X_1^*,\ldots,X_n^*)\}$?
The Monte Carlo method.
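\[
\begin{aligned}
\frac{1}{m}\sum_{i=1}^{m}\left( g(X_i) - \frac{1}{m}\sum_{j=1}^{m} g(X_j) \right)^2
  &= \frac{1}{m}\sum_{i=1}^{m} g(X_i)^2 - \left( \frac{1}{m}\sum_{i=1}^{m} g(X_i) \right)^2 \\
  &\xrightarrow{P} E\left( g(X)^2 \right) - \{E[g(X)]\}^2 = Var[g(X)].
\end{aligned}
\]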
Bootstrap Standard Error Estimation for Statistic
$T_n = g(X_1,\ldots,X_n)$:
1. Draw $X_1^*,\ldots,X_n^*$ iid from $\hat{F}_n$.
2. Compute $T_n^* = g(X_1^*,\ldots,X_n^*)$.
3. Repeat steps 1 and 2 $m$ times to get $T_{n,1}^*,\ldots,T_{n,m}^*$.
4. Let
\[ se_{boot} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left( T_{n,i}^* - \frac{1}{m}\sum_{r=1}^{m} T_{n,r}^* \right)^2}. \]
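The bootstrap involves two approximations:
\[ SD_F(T_n) \approx SD_{\hat{F}_n}(T_n) \approx se_{boot}. \]
The first approximation (replacing $F$ by $\hat{F}_n$) has a not so small
approximation error; the second (the Monte Carlo step) has a small
approximation error.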
R function for bootstrap estimate of SE(Median)
bootstrapmedianfunc=function(X,bootreps){
medianX=median(X);
# vector that will store the bootstrapped medians
bootmedians=rep(0,bootreps);
for(i in 1:bootreps){
# Draw a sample of size n from X with replacement and
# calculate median of sample
Xstar=sample(X,size=length(X),replace=TRUE);
bootmedians[i]=median(Xstar);
}
seboot=var(bootmedians)^.5;
list(medianX=medianX,seboot=seboot);
}
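The same four steps apply to any statistic, not only the median. Below is a hedged sketch of a generic version; the name bootse and the argument statfun are illustrative assumptions, not from the notes. It uses the divisor m from step 4, whereas var() in the function above uses m-1; the difference is negligible for large m.

```r
# Hypothetical generalization (names are illustrative): bootstrap
# standard error of an arbitrary statistic statfun, using m replications.
bootse <- function(X, statfun, m) {
  n <- length(X)
  # Steps 1-3: draw m bootstrap samples and compute the statistic on each
  tstar <- replicate(m, statfun(sample(X, size = n, replace = TRUE)))
  # Step 4: SD of the bootstrap statistics, with divisor m
  sqrt(mean((tstar - mean(tstar))^2))
}

# Check on a case with a known answer: for the sample mean, the
# bootstrap SE should be close to s / sqrt(n)
set.seed(512)
X <- rnorm(50, mean = 10, sd = 2)
se_mean_boot <- bootse(X, mean, 5000)
se_mean_formula <- sd(X) / sqrt(length(X))
c(se_mean_boot, se_mean_formula)
```

Comparing against the closed-form standard error of the mean is a quick sanity check before trusting the procedure on statistics, such as the median, where no simple formula is available.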
Example: In a study of the natural variability of rainfall, the
rainfall of summer storms was measured by a network of
rain gauges in southern Illinois for the year 1960.
> rainfall=c(.02,.01,.05,.21,.003,.45,.001,.01,2.13,.07,.01,.01,
  .001,.003,.04,.32,.19,.18,.12,.001,1.1,.24,.002,.67,
  .08,.003,.02,.29,.01,.003,.42,.27,.001,.001,.04,.01,
  1.72,.001,.14,.29,.002,.04,.05,.06,.08,1.13,.07,.002)
> median(rainfall)
[1] 0.045
> bootstrapmedianfunc(rainfall,10000)
$medianX
[1] 0.045
$seboot
[1] 0.02167736
