Coursework 1
Emily Lawes
Tutorial Group 6
eml27
Section 1
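$X_1, \dots, X_n$ are independent random variables all following the same Gamma($\theta$, 1) distribution ($\theta > 0$) with corresponding density
$$f(x \mid \theta) = \begin{cases} \dfrac{1}{\Gamma(\theta)}\, x^{\theta-1} e^{-x}, & x > 0 \\ 0, & x \le 0 \end{cases}$$
where $\Gamma(\theta)$ is the gamma function.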
1.1
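Let $X \sim \text{Gamma}(\theta, 1)$ and $k > 0$. By definition,
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx.$$
However, by using
$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$$
with $g(x) = x^k$, we observe
$$E[X^k] = \int_{-\infty}^{\infty} x^k f(x \mid \theta)\,dx = \int_0^{\infty} x^k\, \frac{1}{\Gamma(\theta)}\, x^{\theta-1} e^{-x}\,dx = \frac{1}{\Gamma(\theta)} \int_0^{\infty} x^{(\theta+k)-1} e^{-x}\,dx.$$
In the last integral we can notice the definition of $\Gamma(\theta+k)$, so
$$E[X^k] = \frac{1}{\Gamma(\theta)}\,\Gamma(\theta+k) = \frac{\Gamma(\theta+k)}{\Gamma(\theta)}.$$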
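As a quick numerical sanity check of the moment identity in this section, the sketch below compares the closed form $\Gamma(\theta+k)/\Gamma(\theta)$ with direct numerical integration in R. The function names `moment.exact` and `moment.numeric` are illustrative assumptions and are not part of the coursework code.

```r
## Closed form: E[X^k] = Gamma(theta + k) / Gamma(theta)
moment.exact <- function(theta, k) {
  gamma(theta + k) / gamma(theta)
}

## Same moment by numerically integrating x^k * f(x | theta) over (0, Inf)
moment.numeric <- function(theta, k) {
  integrand <- function(x) x^k * dgamma(x, shape = theta, rate = 1)
  integrate(integrand, lower = 0, upper = Inf)$value
}

## The identity does not need k to be an integer, e.g. k = 1/n with n = 100
c(exact = moment.exact(3, 0.01), numeric = moment.numeric(3, 0.01))
c(exact = moment.exact(3, 2),    numeric = moment.numeric(3, 2))
```

The two columns should agree to within the tolerance of `integrate()`.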
1.2
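The arithmetic mean estimator is
$$\hat\theta_{AM} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
Now to find an analytical expression for the mean square error (MSE) of this estimator using just $\theta$ and $n$, define the mean square error of an estimator $T$ as
$$MSE(T) = E\left[(T - \theta)^2\right].$$
Thus,
$$MSE(\hat\theta_{AM}) = E\left[(\hat\theta_{AM} - \theta)^2\right] = E\left[\left(\hat\theta_{AM} - E(\hat\theta_{AM}) + E(\hat\theta_{AM}) - \theta\right)^2\right] = Var(\hat\theta_{AM}) + \left[E(\hat\theta_{AM}) - \theta\right]^2,$$
where the cross term vanishes because $E[\hat\theta_{AM} - E(\hat\theta_{AM})] = 0$. Since $\hat\theta_{AM} = \bar{X}$, note that:
$$MSE(\hat\theta_{AM}) = Var(\bar{X}) + \left[E(\bar{X}) - \theta\right]^2.$$
Important results from lectures:
$$E(\bar{X}) = E(X), \qquad Var(\bar{X}) = \frac{Var(X)}{n},$$
and, from 1.1,
$$E(X^1) = \frac{\Gamma(\theta+1)}{\Gamma(\theta)}; \qquad E(X^2) = \frac{\Gamma(\theta+2)}{\Gamma(\theta)},$$
where $\Gamma$ is the gamma function. Using $Var(X) = E(X^2) - (E(X))^2$,
$$MSE(\hat\theta_{AM}) = \frac{Var(X)}{n} + \left[E(X) - \theta\right]^2 = \frac{E(X^2) - (E(X))^2}{n} + \left[E(X) - \theta\right]^2,$$
so that
$$MSE(\hat\theta_{AM}) = \frac{1}{n}\left(\frac{\Gamma(\theta+2)}{\Gamma(\theta)} - \left(\frac{\Gamma(\theta+1)}{\Gamma(\theta)}\right)^2\right) + \left(\frac{\Gamma(\theta+1)}{\Gamma(\theta)} - \theta\right)^2,$$
where $X_1, \dots, X_n \sim \text{Gamma}(\theta, 1)$. This is a function of $\theta$ and $n$ only.
Because we are working with integer values of $k$ here, what I can now do is compute the above mean square error by using a fundamental property of the gamma function, that is
$$\frac{\Gamma(\theta+1)}{\Gamma(\theta)} = \theta \qquad \text{and} \qquad \frac{\Gamma(\theta+2)}{\Gamma(\theta)} = \theta(\theta+1),$$
which importantly holds for every $\theta > 0$.
Thus,
$$MSE(\hat\theta_{AM}) = \frac{1}{n}\left(\theta(\theta+1) - \theta^2\right) + (\theta - \theta)^2 = \frac{1}{n}\left(\theta^2 + \theta - \theta^2\right) + 0^2 = \frac{\theta}{n}.$$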
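The simplification to $\theta/n$ can be checked numerically. The sketch below evaluates the unsimplified gamma-function form of the MSE and compares it with $\theta/n$; the function name `mse.AM.gamma` and the chosen test values are assumptions made for illustration.

```r
## MSE of the arithmetic mean written with gamma functions (before simplification)
mse.AM.gamma <- function(theta, n) {
  m1 <- gamma(theta + 1) / gamma(theta)   # E(X)
  m2 <- gamma(theta + 2) / gamma(theta)   # E(X^2)
  (m2 - m1^2) / n + (m1 - theta)^2        # Var(X)/n + bias^2
}

theta <- c(0.5, 2, 10, 30)
n <- 100
cbind(theta, gamma.form = mse.AM.gamma(theta, n), simplified = theta / n)
```

The last two columns should match up to floating-point rounding.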
1.3
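The geometric mean estimator is
$$\hat\theta_{GM} = \left(\prod_{i=1}^{n} X_i\right)^{1/n}.$$
As in 1.2,
$$MSE(\hat\theta_{GM}) = E\left[(\hat\theta_{GM} - \theta)^2\right] = Var\left[\left(\prod_{i=1}^{n} X_i\right)^{1/n}\right] + \left(E\left[\left(\prod_{i=1}^{n} X_i\right)^{1/n}\right] - \theta\right)^2.$$
Expanding the variance and the squared bias, the terms in $\left(E[\hat\theta_{GM}]\right)^2$ cancel, leaving
$$MSE(\hat\theta_{GM}) = E\left[\left(\prod_{i=1}^{n} X_i\right)^{2/n}\right] - 2\theta\, E\left[\left(\prod_{i=1}^{n} X_i\right)^{1/n}\right] + \theta^2.$$
To write $MSE(\hat\theta_{GM})$ in terms of a function of $\theta, n$, we need these two expectations. Now, because $X_1, \dots, X_n$ are independent,
$$E(X_i X_j) = E(X_i)\,E(X_j), \quad i \ne j,$$
and that
$$E\left[\prod_{i=1}^{n} X_i^{k}\right] = \prod_{i=1}^{n} E\left[X_i^{k}\right].$$
The Xs are i.i.d., thus all of their expected values are the same, thus
$$E\left[\prod_{i=1}^{n} X_i^{k}\right] = \prod_{i=1}^{n} E\left[X_i^{k}\right] = \left(E[X^k]\right)^n = \left(\frac{\Gamma(\theta+k)}{\Gamma(\theta)}\right)^n.$$
Taking $k = 1/n$ and $k = 2/n$ gives
$$E\left[\left(\prod_{i=1}^{n} X_i\right)^{1/n}\right] = \left(\frac{\Gamma(\theta + \frac{1}{n})}{\Gamma(\theta)}\right)^n, \qquad E\left[\left(\prod_{i=1}^{n} X_i\right)^{2/n}\right] = \left(\frac{\Gamma(\theta + \frac{2}{n})}{\Gamma(\theta)}\right)^n.$$
Therefore
$$MSE(\hat\theta_{GM}) = \left(\frac{\Gamma(\theta + \frac{2}{n})}{\Gamma(\theta)}\right)^n - 2\theta \left(\frac{\Gamma(\theta + \frac{1}{n})}{\Gamma(\theta)}\right)^n + \theta^2.$$
Here, because $k$ in this example is not an integer value, the useful property of the gamma function used in the above question cannot be applied, thus we can leave the answer like this.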
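The exact expression for the geometric mean can be evaluated directly in R. The sketch below is one possible implementation; the function name `mse.GM.exact` and the use of `lgamma()` instead of `gamma()` are assumptions made here for numerical convenience, not part of the coursework code.

```r
## Exact MSE of the geometric mean estimator for a Gamma(theta, 1) sample.
## lgamma() forms the ratio of gamma functions on the log scale, which
## avoids overflow for large theta (gamma(172) is already Inf in R).
mse.GM.exact <- function(theta, n) {
  m1 <- exp(n * (lgamma(theta + 1 / n) - lgamma(theta)))  # E[GM]
  m2 <- exp(n * (lgamma(theta + 2 / n) - lgamma(theta)))  # E[GM^2]
  m2 - 2 * theta * m1 + theta^2
}

mse.GM.exact(theta = c(1, 10, 30), n = 100)
```

A useful special case is $n = 1$, where the geometric mean is just $X_1$ and the formula reduces to $\theta(\theta+1) - 2\theta^2 + \theta^2 = \theta$.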
1.4
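Show that
$$\hat\theta_{ML} = \psi^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} \log(X_i)\right)$$
where
$$\psi(\theta) = \frac{d}{d\theta}\log\Gamma(\theta) = \frac{\Gamma'(\theta)}{\Gamma(\theta)}$$
is the digamma function. In this case, $\psi^{-1}$ is inv.digamma in R.
Recall the density $f(x)$:
$$f(x \mid \theta) = \begin{cases} \dfrac{1}{\Gamma(\theta)}\, x^{\theta-1} e^{-x}, & x > 0 \\ 0, & x \le 0 \end{cases}$$
The log likelihood function is defined as
$$\log L(\theta \mid x) = \sum_{i=1}^{n} \log\left(\frac{1}{\Gamma(\theta)}\, x_i^{\theta-1} e^{-x_i}\right) = \sum_{i=1}^{n} \left(\log\frac{1}{\Gamma(\theta)} + \log\left(x_i^{\theta-1}\right) + \log\left(e^{-x_i}\right)\right) = \sum_{i=1}^{n} \left(-\log\Gamma(\theta) + (\theta - 1)\log(x_i) - x_i\right).$$
Differentiating with respect to $\theta$,
$$\frac{d \log L(\theta \mid x)}{d\theta} = -n\,\frac{\Gamma'(\theta)}{\Gamma(\theta)} + \sum_{i=1}^{n} \log(x_i).$$
Now setting this equation to be zero, we get
$$\sum_{i=1}^{n} \log(x_i) = n\,\frac{\Gamma'(\theta)}{\Gamma(\theta)} \quad\Longrightarrow\quad \frac{1}{n}\sum_{i=1}^{n} \log(x_i) = \frac{\Gamma'(\theta)}{\Gamma(\theta)} \quad\Longrightarrow\quad \frac{1}{n}\sum_{i=1}^{n} \log(x_i) = \psi(\theta).$$
Now substituting
$$\theta = \hat\theta_{ML} = \psi^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} \log(X_i)\right)$$
into the right-hand side gives
$$\psi\left(\psi^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} \log(x_i)\right)\right) = \frac{1}{n}\sum_{i=1}^{n} \log(x_i),$$
implying that the derivative of the log likelihood equals zero at this point. The second derivative, $-n\psi'(\theta)$, is negative because $\psi$ is strictly increasing on $(0, \infty)$, so this stationary point is a maximum, thus
$$\psi^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} \log(X_i)\right)$$
is the maximum likelihood estimator.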
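The coursework relies on a supplied `inv.digamma` function. For readers without that file, the sketch below shows one way the inverse digamma could be computed with base R's `digamma()` and `uniroot()`. The names `inv.digamma.sketch` and `mle.sketch`, the search interval and the tolerance are all assumptions; this is a stand-in, not the course's implementation.

```r
## Hypothetical stand-in for inv.digamma: solve digamma(theta) = y by root
## finding. digamma is strictly increasing on (0, Inf), so the root is unique.
inv.digamma.sketch <- function(y, lower = 1e-8, upper = 1e6) {
  uniroot(function(theta) digamma(theta) - y,
          lower = lower, upper = upper, tol = 1e-10)$root
}

## MLE for a Gamma(theta, 1) sample: psi^{-1} of the mean log observation
mle.sketch <- function(x) {
  inv.digamma.sketch(mean(log(x)))
}

set.seed(1)
x <- rgamma(n = 100, shape = 5, rate = 1)
mle.sketch(x)   # an estimate of the true shape, 5
```

The default interval only covers values of `y` between `digamma(1e-8)` and `digamma(1e6)`, which is wide enough for the range of $\theta$ considered here.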
1.5
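Maximum likelihood estimator
$$\hat\theta_{ML} = \psi^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} \log(X_i)\right) = \text{inv.digamma}\left(\frac{1}{n}\sum_{i=1}^{n} \log(X_i)\right)$$
where inv.digamma is the inverse digamma function $\psi^{-1}$ in R.
Harmonic Mean
$$\hat\theta_{HM} = n\left(\sum_{i=1}^{n} X_i^{-1}\right)^{-1} = \frac{1}{\text{Arithmetic Mean}\left(\frac{1}{X}\right)}$$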
Code
harmonic.mean<-function(x){
1/mean(1/x)
}
mse.HM.theta<-function(theta0,r,n){
x.samples<-matrix(rgamma(n=n*r,scale=1,shape=theta0),nrow=n,ncol=r)
theta.hat.sample<-apply(X=x.samples,MARGIN=2,harmonic.mean)
mean((theta.hat.sample-theta0)^2)
}
Example
> mse.HM.theta(30,1000,100)
[1] 1.306678
Root-mean square
$$\hat\theta_{RMS} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} X_i^2} = \sqrt{\text{Arithmetic Mean}(X^2)}$$
Code
rms<-function(x){
sqrt(mean(x^2))
}
mse.RMS.theta<-function(theta0,r,n){
x.samples<-matrix(rgamma(n=n*r,scale=1,shape=theta0),nrow=n,ncol=r)
theta.hat.sample<-apply(X=x.samples,MARGIN=2,rms)
mean((theta.hat.sample-theta0)^2)
}
1.6
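For n=100 and r=1000, I will plot the simulated mean square error of the harmonic mean, root-mean-square and maximum likelihood estimators against $\theta$ on the same graph. Then I will superimpose the calculated exact mean square error for the Arithmetic and Geometric mean discovered in sections 1.2 and 1.3.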
Code
n<-100
n.theta<-100
theta<-seq(0.1,50,length.out=n.theta)
r<-1000
## Define the graph space as a 2x2 matrix
par(mfrow=c(2,2))
load(url("http://people.bath.ac.uk/kai21/Stats2A/CW/inv.digamma.R"))
## We define the functions that will be passed to apply(). A harmonic mean
## function may already be available in some R setups, but it is defined
## below so the code also runs on my personal computer without it.
root.mean.square<-function(x){
sqrt(mean(x^2))
}
mle<-function(x){
inv.digamma(mean(log(x)))
}
harmonic.mean<-function(x){
1/mean(1/x)
}
## Define the exact MSE of the geometric mean here to avoid confusion in the line-drawing code
geometric.mean.actual <- (gamma(theta+2/n)/gamma(theta))^n - 2*theta*(gamma(theta+1/n)/gamma(theta))^n + theta^2
## Define M for each estimator, i.e. a vector in which to store n.theta MSE values
M.hm<-rep(0,n.theta)
M.rms<-rep(0,n.theta)
M.mle<-rep(0,n.theta)
for (i in 1:n.theta){
  ## Draw n*r values from the Gamma distribution with rate 1 and shape theta[i],
  ## which is the distribution of all the X_i, and store them in an n x r matrix.
  ## We then apply the estimator function to each column of the matrix
  ## and compute the MSE for this value of theta[i].
  x.samples.hm<-matrix(rgamma(n=n*r,rate=1,shape=theta[i]),nrow=n,ncol=r)
  theta.hat.sample.hm<-apply(X=x.samples.hm,MARGIN=2,harmonic.mean)
  M.hm[i]<-mean((theta.hat.sample.hm-theta[i])^2)
  x.samples.rms<-matrix(rgamma(n=n*r,rate=1,shape=theta[i]),nrow=n,ncol=r)
  theta.hat.sample.rms<-apply(X=x.samples.rms,MARGIN=2,root.mean.square)
  M.rms[i]<-mean((theta.hat.sample.rms-theta[i])^2)
  x.samples.mle<-matrix(rgamma(n=n*r,rate=1,shape=theta[i]),nrow=n,ncol=r)
  theta.hat.sample.mle<-apply(X=x.samples.mle,MARGIN=2,mle)
  M.mle[i]<-mean((theta.hat.sample.mle-theta[i])^2)
}
## Plot all three on the same graph; additional lines are added with the lines function
plot(theta,M.hm,type="l",xlab=expression(theta),ylab="MSE",main=paste("All MSE for r=",r," and n=",n,sep=""),ylim=c(0,2),col="blue")
legend(0.1,2,c("Harmonic","RMS","MLE"),lty=c(1,1,1),col=c("blue","green","red"),cex=0.8,x.intersp=1,xjust=-0.1,y.intersp=0.2,bty="n")
lines(theta,M.rms,col="green")
lines(theta,M.mle,col="red")
## On a new plot (i.e. the second), plot the same three with the additional exact MSE for the Arithmetic Mean
plot(theta,M.hm,type="l",xlab=expression(theta),ylab="MSE",main=paste("All MSE for r=",r," and n=",n," with the \n addition of the MSE for the Arithmetic Mean",sep=""),ylim=c(0,2),col="blue")
legend(0.1,2,c("Harmonic","RMS","MLE","AM"),lty=c(1,1,1,1),lwd=c(1,1,1,2),col=c("blue","green","red","black"),cex=0.8,x.intersp=1,xjust=0.1,y.intersp=0.2,bty="n")
lines(theta,M.rms,col="green")
lines(theta,M.mle,col="red")
lines(theta,theta/n,col="black",lwd=2)
Notice in the diagrams overleaf that the set of MSE values for the harmonic mean, RMS and MLE estimators is the same in each graph. This is because the code reuses the vectors M.hm, M.rms and M.mle, which allows easy and consistent comparison between the graphs. When the code is run again, however, the graphs will look slightly different, as the values would be computed from a new set of random samples.
Looking at my code above, I could have increased the value of n.theta to plot the MSE at more values of theta; however, beyond a certain value the graph can look very concentrated and busy. I chose 100 because it gives a good visual description of the MSE and shows the variation between neighbouring values (i.e. how the increase of r affects the variation), without taking too long for the code to compute and produce the graphs, so we shouldn't pick a value of n.theta that is too large.
1.7
Section 2
Consider the following approximation to the sampling distribution of the
maximum likelihood estimator
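$$\hat\theta_{ML} = \psi^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} \log(X_i)\right),$$
namely
$$\hat\theta_{ML} \overset{\text{approx}}{\sim} N\left(\theta, \frac{\theta}{n}\right), \quad \text{for large } n,$$
so that the approximation is better when the size of the random sample, $n$, is large.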
2.1
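Assuming the approximation above is exact, derive analytical expressions for the end-points of the corresponding 95% confidence interval for $\theta$.
Standardising $\hat\theta_{ML}$ gives
$$Z = \frac{\hat\theta_{ML} - \theta}{\sqrt{\theta/n}}.$$
Thus,
$$Z = \frac{\hat\theta_{ML} - \theta}{\sqrt{\theta/n}} \sim N(0,1).$$
To work out the 95% confidence interval, we want to find the end-points, $c_1$ and $c_2$, which satisfy
$$P(c_1 < Z < c_2) = 0.95.$$
Since
$$P(Z \le 1.96) = 0.975$$
and $P(Z \le -1.96) = 0.025$, these give the end points for $Z$. However we would also like to pivot this away from $Z$, i.e. have $\theta$ in the middle of the inequality to find the end points for $\theta$.
We can notice that because we are following a standard normal distribution, which is symmetric about zero, the values of $c_1$ and $c_2$ satisfy $c_1 = -c_2$.
Hence,
$$P(|Z| < |c_1|) = 0.95, \qquad P\left(\left|\frac{\hat\theta_{ML} - \theta}{\sqrt{\theta/n}}\right| < |1.96|\right) = 0.95.$$
I can notice that to solve for theta in this case may be fairly tricky, thus I can square both sides of the inequality to obtain a quadratic in $\theta$. This will also follow the same probability:
$$P\left(\frac{(\hat\theta_{ML} - \theta)^2}{\theta/n} < 1.96^2\right) = 0.95$$
$$P\left(\frac{\hat\theta_{ML}^2 - 2\theta\hat\theta_{ML} + \theta^2}{\theta} < \frac{1.96^2}{n}\right) = 0.95$$
$$P\left(\hat\theta_{ML}^2 - 2\theta\hat\theta_{ML} + \theta^2 < \frac{1.96^2\,\theta}{n}\right) = 0.95$$
$$P\left(\theta^2 - \left(\frac{1.96^2}{n} + 2\hat\theta_{ML}\right)\theta + \hat\theta_{ML}^2 < 0\right) = 0.95$$
Thus to solve this, we can make use of the quadratic formula, with
$$a = 1, \qquad b = -\left(\frac{1.96^2}{n} + 2\hat\theta_{ML}\right), \qquad c = \hat\theta_{ML}^2.$$
Because $\hat\theta_{ML}$ and $n$ are positive, the discriminant $b^2 - 4ac$ is positive and the quadratic has two real roots:
$$\theta = \frac{\left(\frac{1.96^2}{n} + 2\hat\theta_{ML}\right) \pm \sqrt{\left(\frac{1.96^2}{n} + 2\hat\theta_{ML}\right)^2 - 4\hat\theta_{ML}^2}}{2} = \frac{1.96^2}{2n} + \hat\theta_{ML} \pm \frac{1}{2}\sqrt{\left(\frac{1.96^2}{n} + 2\hat\theta_{ML}\right)^2 - 4\hat\theta_{ML}^2}.$$
The quadratic is negative between its two roots, so the 95% confidence interval for $\theta$ is
$$\left(\frac{1.96^2}{2n} + \hat\theta_{ML} - \frac{1}{2}\sqrt{\left(\frac{1.96^2}{n} + 2\hat\theta_{ML}\right)^2 - 4\hat\theta_{ML}^2}\;,\;\; \frac{1.96^2}{2n} + \hat\theta_{ML} + \frac{1}{2}\sqrt{\left(\frac{1.96^2}{n} + 2\hat\theta_{ML}\right)^2 - 4\hat\theta_{ML}^2}\right).$$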
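The two end-points translate directly into R. The following is a minimal sketch, assuming the quadratic derived in this section; the function name `ci.theta` and its argument names are illustrative, and `z` defaults to the rounded value 1.96 used in the derivation.

```r
## End-points (a1, a2) of the 95% interval obtained from the quadratic in theta.
## theta.hat: value(s) of the estimator; n: sample size; z: normal quantile.
ci.theta <- function(theta.hat, n, z = 1.96) {
  centre <- z^2 / (2 * n) + theta.hat
  half   <- 0.5 * sqrt((z^2 / n + 2 * theta.hat)^2 - 4 * theta.hat^2)
  cbind(a1 = centre - half, a2 = centre + half)
}

ci.theta(theta.hat = 10, n = 100)
```

Each end-point $a$ returned by the function satisfies $(\hat\theta_{ML} - a)^2 = 1.96^2\, a / n$, which is the boundary case of the squared inequality.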
2.2
Now we would like to produce a function in R which will return the coverage error for a value of theta, a value of n (the size of the random sample) and a value of r (the Monte Carlo precision).
Coverage error is defined as the difference between the actual coverage probability of the interval and the nominal level of 95%:
$$CE(\theta) = P\left(a_1(X_1, \dots, X_n) < \theta < a_2(X_1, \dots, X_n)\right) - 0.95$$
where $a_1$ and $a_2$ are the end-points of the confidence interval we found in question 2.1, for a value $\theta$.
Hence, to compute the coverage error, we have to work with a sample of size r that follows the distribution of the estimator $\hat\theta_{ML}$, i.e. a normal distribution with mean $\theta$ and variance $\theta/n$.
Rather than working with the digamma function as in section one, we are assuming this new distribution to be true.
The value of
this function.
Hence, the code to obtain this coverage error is below:
Code
CE.theta.mle<-function(theta0,n,r){
## The samples of theta.hat follow a normal distribution with mean theta0 and standard deviation sqrt(theta0/n)
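The rest of this listing is not reproduced here. As a rough guide only, the sketch below shows one way a Monte Carlo coverage error could be computed under the stated normal approximation. The function name `CE.theta.sketch`, the use of `qnorm(0.975)` in place of 1.96 and the choice to return the result in percent are assumptions, and this is not the author's code.

```r
## Sketch: Monte Carlo coverage error (in %) of the 95% interval at theta0,
## assuming theta.hat ~ N(theta0, theta0 / n) exactly.
CE.theta.sketch <- function(theta0, n, r, z = qnorm(0.975)) {
  ## r draws of the estimator under the assumed sampling distribution
  theta.hat <- rnorm(r, mean = theta0, sd = sqrt(theta0 / n))
  ## theta0 lies between the two roots exactly when the quadratic is negative,
  ## i.e. when (theta.hat - theta0)^2 < z^2 * theta0 / n
  covered <- (theta.hat - theta0)^2 < z^2 * theta0 / n
  100 * (mean(covered) - 0.95)
}

set.seed(1)
CE.theta.sketch(theta0 = 10, n = 100, r = 1000)
```

Testing the sign of the quadratic avoids computing the square root in the end-point formula, so the function does not produce NaN when a simulated `theta.hat` happens to be negative.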
2.3
Now replicate this for all values of theta between 0 and 50, and graph these for
different values of n.
To do this, we would like to take values of theta between 0 and 50, under a
certain amount of precision (n.theta) and run the code from section 2.2 for all of
these values, to compute the coverage error and display it on a graph. We would
also like to run it for several values of n, namely 100, 500, 1000 to see the
difference (if any) that n makes to the values of the coverage error.
Code
n.theta<-200 # Number of values of theta we would like to show the CE for
## Avoid very small values of theta here, which may break the intervals (sqrt
## returns NaN if its argument is negative, which may happen due to the
## randomness of the distribution). Starting at 1 shouldn't affect the result
## much, as we run up to 50, so the difference is insignificant.
theta<-seq(1,50,length.out=n.theta)
r<-1000 # Monte Carlo precision
par(mfrow=c(1,1)) # We would like a single combined figure, so reset the plot layout to 1x1
CE<-rep(0,n.theta) # Create a vector for CE with size n.theta
n.vals<-c(100,500,1000) # Values of n we would like to use
colours<-c("blue","green","red") # Colours used to distinguish the lines for different values of n
## Start the colour index off at zero so we can step through it: using 'i' or 'n'
## to index wouldn't work here, as these take the values 1..n.theta and
## 100,500,1000 respectively
col.index<-0
plot(theta,0*theta,type="l",ylab="Coverage error (%)",main=paste("Coverage error for r =",r,"and \n n =",paste(n.vals,collapse=", ",sep="")),xlab=expression(theta),ylim=c(-6,6),col="black")
legend(0.1,5,c(paste("n=",n.vals[1]),paste("n=",n.vals[2]),paste("n=",n.vals[3])),lty=c(1,1,1),lwd=c(1,1,1),col=c(colours[1],colours[2],colours[3]),cex=0.8,x.intersp=1,xjust=-0.1,y.intersp=0.5,bty="n")
for (n in n.vals){ # run through each value in n.vals
  for (i in c(1:n.theta)){
    CE[i]<-CE.theta.mle(theta[i],n,r) # run the function as before, but for all values of theta[i] across [1,50]
  }
  col.index=col.index+1 # step through the colour index; this can be extended if we want more n's plotted on the graph
2.5
Looking at the curves produced, I notice that the coverage error fluctuates randomly around 0. A value of zero means that the interval covers theta exactly 95% of the time, so values near zero, in either a positive or negative direction, mean that theta lies within the interval 95% of the time, plus or minus some random noise. Because of this noise, every time I run the simulation we get a different plot of coverage error. The samples of theta.hat are drawn from a normal distribution whose standard deviation tends towards zero as n increases, so the values fall within a very small range; however, there is always some small error, as the theta.hat values are random variables.
The intervals depend on the value of n and on the sampled values of theta.hat. Because theta.sample follows a normal distribution, which is symmetric, decreasing the standard deviation does not change the balance between values below and above the mean. As we take the average over the samples, the value of n won't significantly affect the result, as whatever values lie below the mean may cancel out with those above.
I can see on my graph that increasing the value of n doesn't seem to affect the result very much, if at all. Looking at the code I had produced, I can see that, whilst the standard deviation of theta.hat decreases with n, the width of the interval in which we want theta to lie decreases as well: both shrink at roughly the same rate, proportional to 1/sqrt(n). This means that as the values of theta.hat become more concentrated around the mean (smaller standard deviation) when n increases, theta also has to lie within a correspondingly narrower confidence interval. Graphically this would explain why n doesn't seem to affect the result. If we increase the Monte Carlo precision number, r, i.e. produce more samples of theta.hat, then the random noise in the coverage error does decrease (because there are more values to take the average over), but there will always be natural fluctuations across all values of theta due to the randomness of the normal distribution.
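The size of these fluctuations can be quantified. If the true coverage is $p$, a Monte Carlo estimate based on r independent draws has standard error $\sqrt{p(1-p)/r}$. The sketch below evaluates this in percent; the function name `mc.se.percent` is an assumption introduced for illustration.

```r
## Standard error (in %) of a Monte Carlo coverage estimate based on r draws
## when the true coverage is p: 100 * sqrt(p * (1 - p) / r)
mc.se.percent <- function(r, p = 0.95) {
  100 * sqrt(p * (1 - p) / r)
}

mc.se.percent(c(1000, 10000, 100000))
```

For r = 1000 this is about 0.69 percentage points. The expression does not involve n, and the noise only halves when r is multiplied by four.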
I can see that the value of theta also doesn't particularly affect its coverage error value. This may be because there are no asymptotes at any value of theta between 1 and 50, and we will only see problems arising when we take theta very small (close to zero, as the random element of theta.hat.samples could make the value less than 0), but I avoided these by suitably choosing my initial value of theta.
Below I created a graph in R that displays the value of the interval end-point a1 in terms of theta, to display how, despite the equation being quite complicated, it is actually very close to being linear for values between 0 and 50.
The above shows that because the end-point is (almost) linear, and in fact maps theta to a value very close to theta, changing theta across 0 to 50 shouldn't affect the graph of the coverage error very much, i.e. we aren't going to get any values of theta whose coverage error is always substantially different, such as the large negative values closer to -95 that we observe when looking at different distributions (notably the binomial, with a mean estimator shown in the coursework background section).
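The near-linearity of the lower end-point can also be checked numerically. The sketch below assumes the end-point formula from 2.1 with n = 100 and evaluates it over a grid of values; the function name `a1`, the grid and the use of a straight-line fit as a measure of linearity are illustrative choices.

```r
## Lower end-point a1 of the 95% interval as a function of theta.hat (from 2.1)
a1 <- function(theta.hat, n, z = 1.96) {
  z^2 / (2 * n) + theta.hat -
    0.5 * sqrt((z^2 / n + 2 * theta.hat)^2 - 4 * theta.hat^2)
}

theta.grid <- seq(1, 50, length.out = 200)
lower <- a1(theta.grid, n = 100)

## How much of the variation in a1 a straight-line fit explains
fit <- lm(lower ~ theta.grid)
summary(fit)$r.squared

## Distance between theta.hat and a1 over the grid
range(theta.grid - lower)
```

An R-squared very close to 1 and a small, slowly growing gap between the two would support the observation that a1 tracks theta almost linearly on this range.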