
MA20226

Coursework 1
Emily Lawes
Tutorial Group 6
eml27

Section 1

X_1, ..., X_n are independent random variables, all following the same Gamma(θ, 1) distribution (θ > 0), with corresponding density

f(x | θ) = (1/Γ(θ)) x^(θ−1) e^(−x)  for x > 0,   and   f(x | θ) = 0  for x ≤ 0,

where Γ is the Gamma function, defined as

Γ(θ) = ∫_0^∞ u^(θ−1) e^(−u) du,  for any θ > 0.
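As a quick sanity check, this density can be compared against R's built-in dgamma with shape θ and rate 1; a minimal sketch (the value of θ and the grid of x values are arbitrary illustrative choices):

## Check that x^(theta-1) * exp(-x) / gamma(theta) matches dgamma(x, shape = theta, rate = 1)
theta <- 2.5                                   # illustrative value
x <- seq(0.1, 10, by = 0.1)
manual <- x^(theta - 1) * exp(-x) / gamma(theta)
builtin <- dgamma(x, shape = theta, rate = 1)
all.equal(manual, builtin)                     # TRUE up to floating-point error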

1.1

Let X ~ Gamma(θ, 1). For any real k > 0 we want E(X^k). By definition, E(X) = ∫ x f(x) dx; more generally, by the Law of the Unconscious Statistician,

E[g(X)] = ∫ g(x) f(x) dx.

Thus, to work out E[X^k], we apply this rule with g(x) = x^k:

E[X^k] = ∫_0^∞ x^k f(x | θ) dx
       = ∫_0^∞ x^k · (1/Γ(θ)) x^(θ−1) e^(−x) dx
       = (1/Γ(θ)) ∫_0^∞ x^((θ+k)−1) e^(−x) dx.

By the definition of the Gamma function with parameter θ + k, the remaining integral equals Γ(θ + k), so

E[X^k] = Γ(θ + k) / Γ(θ).
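This closed form can be spot-checked by simulation; a minimal sketch (the values of θ and k below are arbitrary illustrative choices):

## Monte Carlo check of E[X^k] = gamma(theta + k) / gamma(theta)
set.seed(1)
theta <- 3; k <- 1.7                 # illustrative values
x <- rgamma(1e6, shape = theta, rate = 1)
mean(x^k)                            # simulated k-th moment
gamma(theta + k) / gamma(theta)      # analytical k-th moment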

1.2

Consider the estimator of θ defined by the arithmetic mean, that is

θ̂_AM = X̄ = (1/n) Σ_{i=1}^n X_i.

To find an analytical expression for the mean square error (MSE) of this estimator in terms of just θ and n, recall that the mean square error of an estimator T is defined as

MSE(T) = E( (T − θ)^2 ).

Thus,

MSE(θ̂_AM) = E[ (θ̂_AM − θ)^2 ]
           = E[ (θ̂_AM − E(θ̂_AM) + E(θ̂_AM) − θ)^2 ]
           = Var(θ̂_AM) + (E(θ̂_AM) − θ)^2.

As we are using the convention that θ̂_AM = X̄, note that

MSE(θ̂_AM) = Var(X̄) + (E(X̄) − θ)^2.

Important results from lectures:

E(X̄) = E(X)
Var(X̄) = Var(X)/n

From 1.1 (with k = 1 and k = 2) we have, for a single observation X,

E(X) = Γ(θ+1)/Γ(θ);   E(X^2) = Γ(θ+2)/Γ(θ),

and Var(X) = E(X^2) − (E(X))^2. Thus, following on from the above,

MSE(θ̂_AM) = Var(X)/n + (E(X) − θ)^2
           = [E(X^2) − (E(X))^2]/n + (E(X) − θ)^2
           = (1/n)[ Γ(θ+2)/Γ(θ) − (Γ(θ+1)/Γ(θ))^2 ] + ( Γ(θ+1)/Γ(θ) − θ )^2,

which is an expression for the MSE in terms of θ and n, where X_1, ..., X_n ~ Gamma(θ, 1) are i.i.d.

Because we are working with integer values of k here, we can now simplify the above mean square error by using a fundamental property of the Gamma function, namely

Γ(θ+1)/Γ(θ) = θ   and   Γ(θ+2)/Γ(θ) = (θ+1)θ,   for θ > 0.

Thus,

MSE(θ̂_AM) = (1/n)( (θ+1)θ − θ^2 ) + (θ − θ)^2 = (1/n)(θ^2 + θ − θ^2) + 0 = θ/n.
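The result MSE(θ̂_AM) = θ/n can be checked against a Monte Carlo estimate in the same style as the simulation functions in 1.5; a minimal sketch (the values of θ, n and r are illustrative only):

## Monte Carlo check that the MSE of the arithmetic mean is approximately theta/n
set.seed(1)
theta0 <- 30; n <- 100; r <- 1000            # illustrative values
x.samples <- matrix(rgamma(n*r, shape = theta0, rate = 1), nrow = n, ncol = r)
theta.hat <- apply(x.samples, MARGIN = 2, mean)
mean((theta.hat - theta0)^2)                 # simulated MSE
theta0/n                                     # analytical MSE = 0.3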

1.3

Consider now the estimator of θ defined by the geometric mean, that is

θ̂_GM = ( ∏_{i=1}^n X_i )^(1/n).

We want an analytical expression for MSE(θ̂_GM) in terms of θ and n. As in 1.2,

MSE(θ̂_GM) = Var( (∏_{i=1}^n X_i)^(1/n) ) + ( E[ (∏_{i=1}^n X_i)^(1/n) ] − θ )^2
           = E[ (∏_{i=1}^n X_i)^(2/n) ] − ( E[ (∏_{i=1}^n X_i)^(1/n) ] )^2 + ( E[ (∏_{i=1}^n X_i)^(1/n) ] − θ )^2.

Now, because X_1, ..., X_n are i.i.d., we can use the fact that

E(X_i X_j) = E(X_i) E(X_j),   i ≠ j,

so the expectation of a product of functions of distinct X_i factorises; and because the X_i all have the same distribution, each factor is the same. Hence, for any exponent k/n,

E[ (∏_{i=1}^n X_i)^(k/n) ] = ∏_{i=1}^n E[ X_i^(k/n) ] = ( E[ X^(k/n) ] )^n.

Thus, from 1.1 we obtain

E[ X^(k/n) ] = Γ(θ + k/n)/Γ(θ),

so

E[ (∏ X_i)^(1/n) ] = ( Γ(θ + 1/n)/Γ(θ) )^n.

This is one part of the expression; for the other,

E[ ( (∏ X_i)^(1/n) )^2 ] = E[ (∏ X_i)^(2/n) ] = ( Γ(θ + 2/n)/Γ(θ) )^n.

Thus, compiling this together, we obtain

MSE(θ̂_GM) = ( Γ(θ + 2/n)/Γ(θ) )^n − ( Γ(θ + 1/n)/Γ(θ) )^(2n) + ( ( Γ(θ + 1/n)/Γ(θ) )^n − θ )^2.

Expanding out the final bracket we get

MSE(θ̂_GM) = ( Γ(θ + 2/n)/Γ(θ) )^n − 2θ ( Γ(θ + 1/n)/Γ(θ) )^n + θ^2.

Here, because the exponents 1/n and 2/n are not integers, the simplification used in the previous question cannot be applied, so we leave the answer in this form.
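Since this expression only involves ratios of Gamma functions, it is easy to evaluate numerically in R and compare with a simulated MSE of the geometric mean; a minimal sketch (θ, n and r values are illustrative only):

## Compare the analytical MSE of the geometric mean with a Monte Carlo estimate
set.seed(1)
theta0 <- 30; n <- 100; r <- 1000            # illustrative values
## analytical expression from 1.3
(gamma(theta0 + 2/n)/gamma(theta0))^n - 2*theta0*(gamma(theta0 + 1/n)/gamma(theta0))^n + theta0^2
## simulated MSE; exp(mean(log(x))) is the geometric mean
geometric.mean <- function(x){ exp(mean(log(x))) }
x.samples <- matrix(rgamma(n*r, shape = theta0, rate = 1), nrow = n, ncol = r)
theta.hat <- apply(x.samples, MARGIN = 2, geometric.mean)
mean((theta.hat - theta0)^2)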

1.4
Show that

θ̂_ML = ψ^(−1)( (1/n) Σ_{i=1}^n log(X_i) )

is the maximum likelihood estimator of θ, where ψ is the digamma function, defined as

ψ(θ) = (d/dθ) log Γ(θ) = Γ'(θ)/Γ(θ).

In this case, ψ^(−1) is the inverse digamma function and can be found using inv.digamma in R.

Step 1: Find the log-likelihood expression for the density

f(x | θ) = (1/Γ(θ)) x^(θ−1) e^(−x) for x > 0, and 0 for x ≤ 0.

The log-likelihood function is defined as

log L(θ | x) = Σ_{i=1}^n log f(x_i; θ)
             = Σ_{i=1}^n log( (1/Γ(θ)) x_i^(θ−1) e^(−x_i) )
             = Σ_{i=1}^n [ log(1/Γ(θ)) + log(x_i^(θ−1)) + log(e^(−x_i)) ]

by the property that log(ab) = log(a) + log(b). From basic properties of the logarithmic function this equals

Σ_{i=1}^n [ −log Γ(θ) + (θ − 1) log(x_i) − x_i ].

Step 2: Maximise. To maximise this function, we need to find a θ such that the derivative of the log-likelihood, evaluated at that θ, is zero:

d log L(θ | x)/dθ = −n Γ'(θ)/Γ(θ) + Σ_{i=1}^n log(x_i).

Setting this equation to zero, we get

Σ_{i=1}^n log(x_i) = n Γ'(θ)/Γ(θ)

(1/n) Σ_{i=1}^n log(x_i) = Γ'(θ)/Γ(θ) = ψ(θ).

Now substituting θ = θ̂_ML and applying the inverse digamma function ψ^(−1) to both sides,

θ̂_ML = ψ^(−1)( (1/n) Σ_{i=1}^n log(X_i) ).

This point satisfies the score equation (the derivative of the log-likelihood is zero there), and since ψ is strictly increasing (its derivative, the trigamma function, is positive), the second derivative of the log-likelihood, −n ψ'(θ), is negative, so this stationary point is a maximum. Hence ψ^(−1)( (1/n) Σ_{i=1}^n log(X_i) ) is the maximum likelihood estimator of θ.
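If the provided inv.digamma function were not available, the inversion could also be done numerically, since ψ is strictly increasing; a minimal sketch using uniroot (the function name inv.digamma.numeric and the search bracket are illustrative assumptions, chosen wide enough for the θ values used in this coursework):

## Numerically invert the digamma function (base R's digamma);
## uniroot works because digamma is strictly increasing
inv.digamma.numeric <- function(y, lower = 1e-6, upper = 1e3){
  uniroot(function(theta) digamma(theta) - y, lower = lower, upper = upper)$root
}
inv.digamma.numeric(digamma(5))   # recovers approximately 5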

1.5

The purpose of the following is to create functions in R for three different estimators of the unknown θ: the harmonic mean, the root-mean-square, and the maximum likelihood estimator described in question 1.4.

Maximum Likelihood Estimator

θ̂_ML = ψ^(−1)( (1/n) Σ_{i=1}^n log(X_i) ) = inv.digamma(mean(log(X)))

where ψ^(−1) is the inverse digamma function defined in question 1.4 above. We can use the R function given in the Appendix, located at

http://people.bath.ac.uk/kai21/Stats2A/CW/inv.digamma.R

We can note here that the log function in R uses the natural base by default.
Code
load(url("http://people.bath.ac.uk/kai21/Stats2A/CW/inv.digamma.R"))
mle<-function(x){
inv.digamma(mean(log(x)))
}
mse.MLE.theta<-function(theta0,r,n){
x.samples<-matrix(rgamma(n=n*r,scale=1,shape=theta0),nrow=n,ncol=r)
theta.hat.sample<-apply(X=x.samples,MARGIN=2,mle)
mean((theta.hat.sample-theta0)^2)
}
Example
> mse.MLE.theta(30,1000,100)
[1] 0.3095646

Harmonic Mean

θ̂_HM = n / ( Σ_{i=1}^n X_i^(−1) ) = 1 / Arithmetic Mean(1/X)

Code
harmonic.mean<-function(x){
1/mean(1/x)
}

mse.HM.theta<-function(theta0,r,n){
x.samples<-matrix(rgamma(n=n*r,scale=1,shape=theta0),nrow=n,ncol=r)

theta.hat.sample<-apply(X=x.samples,MARGIN=2,harmonic.mean)
mean((theta.hat.sample-theta0)^2)
}
Example
> mse.HM.theta(30,1000,100)
[1] 1.306678

Root-mean-square

θ̂_RMS = sqrt( (1/n) Σ_{i=1}^n X_i^2 ) = sqrt( Arithmetic Mean(X^2) )

Code

rms<-function(x){
sqrt(mean(x^2))
}
mse.RMS.theta<-function(theta0,r,n){
x.samples<-matrix(rgamma(n=n*r,scale=1,shape=theta0),nrow=n,ncol=r)
theta.hat.sample<-apply(X=x.samples,MARGIN=2,rms)
mean((theta.hat.sample-theta0)^2)
}
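Example (as with the other two estimators; the numerical output is not quoted here since it depends on the random seed)
mse.RMS.theta(30,1000,100)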

1.6

Now using the above functions, for n = 100 and θ ∈ (0, 50), I will plot the estimated mean square errors as functions of θ on the same graph. Then I will superimpose the exact mean square errors for the arithmetic and geometric means calculated in sections 1.2 and 1.3.
Code
n<-100
n.theta<-100
theta<-seq(0.1,50,length.out=n.theta)
r<-1000
## Define the graph space as a 2x2 matrix
par(mfrow=c(2,2))
load(url("http://people.bath.ac.uk/kai21/Stats2A/CW/inv.digamma.R"))

## Define the functions that will be used in the apply function; note that a harmonic
## mean function exists in some R packages, but I have defined it below so the code
## runs on my personal computer without the inbuilt function
root.mean.square<-function(x){
sqrt(mean(x^2))
}
mle<-function(x){
inv.digamma(mean(log(x)))
}
harmonic.mean<-function(x){
1/mean(1/x)
}

## Define the exact MSE of the geometric mean (from 1.3) here, to keep the line-drawing code below easier to read
geometric.mean.actual<-(gamma(theta+2/n)/gamma(theta))^n-2*theta*(gamma(theta+1/n)/gamma(theta))^n+theta^2
## Define a vector M for each estimator, in which we store the n.theta MSE values
M.hm<-rep(0,n.theta)
M.rms<-rep(0,n.theta)
M.mle<-rep(0,n.theta)
for (i in 1:n.theta){
## Draw r samples of size n from the Gamma distribution with shape theta[i] and
## rate/scale 1 and store them in a matrix; this matches the distribution of the X_i.
## We then apply each estimator to every column and compute the MSE at theta[i].

x.samples.hm<-matrix(rgamma(n=n*r,rate=1,shape=theta[i]),nrow=n,ncol=r)
theta.hat.sample.hm<-apply(X=x.samples.hm,MARGIN=2,harmonic.mean)
M.hm[i]<-mean((theta.hat.sample.hm-theta[i])^2)

x.samples.rms<-matrix(rgamma(n=n*r,rate=1,shape=theta[i]),nrow=n,ncol=r)
theta.hat.sample.rms<-apply(X=x.samples.rms,MARGIN=2,root.mean.square)
M.rms[i]<-mean((theta.hat.sample.rms-theta[i])^2)

x.samples.mle<-matrix(rgamma(n=n*r,rate=1,shape=theta[i]),nrow=n,ncol=r)
theta.hat.sample.mle<-apply(X=x.samples.mle,MARGIN=2,mle)
M.mle[i]<-mean((theta.hat.sample.mle-theta[i])^2)
}
## Plot all three on the same graph; additional lines can be added with the lines function
plot(theta,M.hm,type="l",xlab=expression(theta),ylab="MSE",
  main=paste("All MSE for r=",r," and n=",n,sep=""),ylim=c(0,2),col="blue")
legend(0.1,2,c("Harmonic","RMS","MLE"),lty=c(1,1,1),col=c("blue","green","red"),
  cex=0.8,x.intersp=1,xjust=-0.1,y.intersp=0.2,bty="n")
lines(theta,M.rms,col="green")
lines(theta,M.mle,col="red")

## On a new plot (i.e. the second), plot the same three with the exact MSE of the
## Arithmetic Mean superimposed
plot(theta,M.hm,type="l",xlab=expression(theta),ylab="MSE",
  main=paste("All MSE for r=",r," and n=",n," with the \n addition of the MSE for the Arithmetic Mean",sep=""),
  ylim=c(0,2),col="blue")
legend(0.1,2,c("Harmonic","RMS","MLE","AM"),lty=c(1,1,1,1),lwd=c(1,1,1,2),
  col=c("blue","green","red","black"),cex=0.8,x.intersp=1,xjust=0.1,y.intersp=0.2,bty="n")
lines(theta,M.rms,col="green")
lines(theta,M.mle,col="red")
lines(theta,theta/n,col="black",lwd=2)

## Repeat for the Geometric Mean, whose exact MSE is defined above
plot(theta,M.hm,type="l",xlab=expression(theta),ylab="MSE",
  main=paste("All MSE for r=",r," and n=",n," with the \n addition of the MSE for the Geometric Mean",sep=""),
  ylim=c(0,2),col="blue")
legend(0.1,2,c("Harmonic","RMS","MLE","GM"),lty=c(1,1,1,1),lwd=c(1,1,1,2),
  col=c("blue","green","red","black"),cex=0.8,x.intersp=1,xjust=0.1,y.intersp=0.2,bty="n")
lines(theta,M.rms,col="green")
lines(theta,M.mle,col="red")
lines(theta,geometric.mean.actual,col="black",lwd=2)

## Legend arguments used above: location on the graph, legend names, line type (i.e. solid),
## line width, colour, text scaling, the x and y line spacing, the justification of x,
## and finally removal of the legend box/border.

Notice in the diagrams overleaf that the set of simulated MSE values is the same in each graph; this is because the code reuses the vectors M.hm, M.rms and M.mle, which makes comparison between the graphs easier and keeps them consistent. When the code is run again, the graphs will look slightly different, as the values would then come from a new set of random samples.
Looking at my code above, I could have increased the value of n.theta to plot the MSE at more values of θ, but beyond a certain point the graph looks very concentrated and busy. I chose 100 because it gives a good visual description of the MSE and of the variation between different values of θ (i.e. how increasing r affects the variation), while not taking too long to compute and produce the graphs; so we should not pick a value of n.theta that is too large.
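If a larger n.theta were wanted, the run time could be checked first; a minimal sketch using system.time (the helper name time.one.theta is introduced here only for illustration, and assumes the estimator functions defined above are already loaded):

## Rough timing of the simulation for a single theta value, to judge whether
## a larger n.theta is affordable; total run time scales roughly linearly in n.theta
time.one.theta<-function(theta0,n,r){
system.time({
x<-matrix(rgamma(n=n*r,rate=1,shape=theta0),nrow=n,ncol=r)
apply(X=x,MARGIN=2,harmonic.mean)
apply(X=x,MARGIN=2,root.mean.square)
apply(X=x,MARGIN=2,mle)
})
}
time.one.theta(30,100,1000)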

1.7

Looking at the graphs produced above, to determine whether there is a uniformly best estimator, I should consider how the MSE curves of each estimator behave as n increases. I can see straight away that the two estimators with the lowest MSE for all values of theta are the arithmetic mean and the maximum likelihood estimator. For one of these we have an analytical expression for the MSE; for the other we do not, so its MSE had to be simulated using the Monte Carlo estimator.
In RStudio, I have looked into what happens when we increase n. Apart from the spike at low values of theta, the harmonic mean plateaus to 1 for larger theta, while the RMS, AM and MLE all converge to zero uniformly, with the AM and MLE having the lower MSE for all values of theta. Looking into both the AM and MLE, I notice that, whatever the value of theta or n, their values are very similar and their curves lie on top of one another. This means that either of these estimators is best, and the choice depends on whether we require an exact analytical expression for the MSE or not.
For an estimator to be good, we would like it to be unbiased and to have a small mean square error, though a biased estimator isn't always bad if its MSE stays uniformly small: a biased estimator doesn't always imply a high mean square error, because there is of course also random error. We know that the arithmetic mean is unbiased, but as we don't have a specific calculation for the mean square error of the MLE, it's not clear whether it is biased or not; it may have small variance but be biased. Thus, between the two, I would say that the arithmetic mean, which is known to be unbiased, is the better estimator, though as both give the same curve for the MSE, they are equally good estimators in the sense that their MSEs converge to zero uniformly and at the same rate as one another.
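Whether the MLE is noticeably biased can itself be explored by simulation; a minimal sketch (θ, n and r values are illustrative, and the mle function and inv.digamma from 1.5 are assumed to be loaded):

## Estimate the bias of the MLE and of the arithmetic mean at a single theta
set.seed(1)
theta0<-30; n<-100; r<-1000                    # illustrative values
x.samples<-matrix(rgamma(n=n*r,rate=1,shape=theta0),nrow=n,ncol=r)
mean(apply(X=x.samples,MARGIN=2,mle))-theta0   # estimated bias of the MLE
mean(apply(X=x.samples,MARGIN=2,mean))-theta0  # estimated bias of the AM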

Section 2
Consider the following approximation to the sampling distribution of the maximum likelihood estimator

θ̂_ML = ψ^(−1)( (1/n) Σ_{i=1}^n log(X_i) ):

θ̂_ML ≈ N(θ, θ/n), for large n,

so that the approximation is better when the size of the random sample, n, is large.

2.1
Assuming the approximation above is exact, derive analytical expressions for the end-points of the corresponding 95% confidence interval for θ.

As we usually work with the N(0,1) (standard normal) distribution, I will transform this distribution into the standard normal for ease of use. We can't think about using the t distribution here, because the variance is also unknown and is expressed in terms of the mean.

The following standardisation can be applied to a Normal distribution N(μ, σ^2): Z = (X − μ)/σ. Thus,

Z = (θ̂_ML − θ) / sqrt(θ/n) ~ N(0, 1).

To work out the 95% confidence interval, we want to find the end-points, c1 and c2, which satisfy

P(c1 < Z < c2) = 0.95.

Using the z-value tables for the N(0,1) distribution, we obtain P(Z ≤ 1.96) = 0.975 and P(Z ≥ −1.96) = 0.975, so these give our end-points for Z. However, we would also like to pivot this so that θ sits in the middle of the inequality, in order to find the end-points for θ. We can notice that, because we are following a N(0,1) distribution, which is symmetric, the values of c1 and c2 are negatives of one another, i.e. c1 = −c2. Hence,

P(|Z| < 1.96) = 0.95

P( |θ̂_ML − θ| / sqrt(θ/n) < 1.96 ) = 0.95.

Solving this directly for θ would be fairly tricky, because θ appears both in the numerator and inside the square root, so I square both sides of the inequality to obtain a quadratic in θ (the squared standardised quantity follows a χ^2(1) distribution, chi-squared with 1 degree of freedom):

P( (θ̂_ML − θ)^2 / (θ/n) < 1.96^2 ) = 0.95

P( (θ̂_ML − θ)^2 < 1.96^2 θ / n ) = 0.95

P( θ̂_ML^2 − 2θ θ̂_ML + θ^2 < 1.96^2 θ / n ) = 0.95

P( θ^2 − (1.96^2/n + 2 θ̂_ML) θ + θ̂_ML^2 < 0 ) = 0.95.

Thus, to solve this, we can make use of the quadratic formula, with

a = 1,   b = −( 1.96^2/n + 2 θ̂_ML ),   c = θ̂_ML^2.

Because θ̂_ML and n are observed values, there is no problem using them in the end-points of the interval. We can solve for θ here to work out the interval in which the inequality holds with probability 95%. That is,

θ = ( 1.96^2/(2n) + θ̂_ML ) ± (1/2) sqrt( ( 1.96^2/n + 2 θ̂_ML )^2 − 4 θ̂_ML^2 ).

Our 95% confidence interval for θ is then

( 1.96^2/(2n) + θ̂_ML − (1/2) sqrt( (1.96^2/n + 2 θ̂_ML)^2 − 4 θ̂_ML^2 ),
  1.96^2/(2n) + θ̂_ML + (1/2) sqrt( (1.96^2/n + 2 θ̂_ML)^2 − 4 θ̂_ML^2 ) ).

2.2
Now we would like to produce a function in R which returns the coverage error for a value of theta, a value of n (the size of the random sample) and a value of r (the Monte Carlo precision).
Coverage error is defined as

CE(a1, a2) = ( β(a1, a2) − (1 − α) ) × 100

where

β(a1, a2) = P( a1(X_1, ..., X_n) < θ < a2(X_1, ..., X_n) )

and a1 and a2 are the confidence interval end-points we found in question 2.1, for a value of α of 0.05, i.e. 95% confidence.

Hence, to compute the coverage error, we work with a sample of size r drawn from the distribution of the estimator θ̂_ML, which we are told, for large values of n, follows a normal distribution with mean θ and variance θ/n. Rather than working with the digamma function as in Section 1, we assume this approximating distribution to be true.
The value of β for r Monte Carlo simulations can be taken as the proportion of simulations in which the true value of θ lies within the interval, i.e. mean((theta0 > a1) & (theta0 < a2)), as we only want one value computed from this function.
Hence, the code to obtain this coverage error is below:

Code
CE.theta.mle<-function(theta0,n,r){
## The samples of theta.hat follow a normal distribution with mean theta0 and
## standard deviation sqrt(theta0/n). We draw r values directly; no estimator needs
## to be applied to obtain the theta.hat samples
theta.hat.samples<-rnorm(r,mean=theta0,sd=sqrt(theta0/n))
## Now apply the interval end-points to the values of theta.hat.samples;
## note that 3.8416 is 1.96 squared
a1<-3.8416/(2*n)+theta.hat.samples-0.5*sqrt(((3.8416/n)+2*theta.hat.samples)^2-4*theta.hat.samples^2)
a2<-3.8416/(2*n)+theta.hat.samples+0.5*sqrt(((3.8416/n)+2*theta.hat.samples)^2-4*theta.hat.samples^2)
## Now compute the coverage error: the proportion of the r simulations in which theta0
## lies between a1 and a2, minus 0.95, multiplied by 100
(mean((theta0<a2)&(theta0>a1))-0.95)*100
}
Example:
> CE.theta.mle(10,100,1000)
[1] -0.8

2.3
Now replicate this for all values of theta between 0 and 50, and graph the results for different values of n.
To do this, we take values of theta between 0 and 50 at a chosen resolution (n.theta of them), run the function from section 2.2 for each of these values to compute the coverage error, and display the results on a graph. We also run it for several values of n, namely 100, 500 and 1000, to see the difference (if any) that n makes to the values of the coverage error.

Code
n.theta<-200 # Number of values of theta we would like to show the CE for
theta<-seq(1,50,length.out=n.theta)
# Avoid very small values of theta here, which may break the intervals (the square roots
# can receive negative arguments because of the randomness of the normal samples);
# starting at 1 rather than near 0 shouldn't affect the result much, as we run up to 50,
# so the difference is insignificant
r<-1000 # Monte Carlo precision
par(mfrow=c(1,1)) # We would like a single combined figure, so set the plot screen to 1x1
CE<-rep(0,n.theta) # Create a vector for CE of length n.theta
n.vals<-c(100,500,1000) # Values of n we would like to use
colours<-c("blue","green","red") # Vector of colours used to change the line colour for different values of n
col.index<-0 # Start the colour index at zero (it is incremented before use); indexing by 'i' or 'n'
# would not work here, as those values run over (1,50) and 100,500,1000 respectively
plot(theta,0*theta,type="l",ylab="Coverage error (%)",
  main=paste("Coverage error for r =",r,"and \n n =",paste(n.vals,collapse=", ",sep="")),
  xlab=expression(theta),ylim=c(-6,6),col="black")
legend(0.1,5,c(paste("n=",n.vals[1]),paste("n=",n.vals[2]),paste("n=",n.vals[3])),
  lty=c(1,1,1),lwd=c(1,1,1),col=c(colours[1],colours[2],colours[3]),
  cex=0.8,x.intersp=1,xjust=-0.1,y.intersp=0.5,bty="n")
for (n in n.vals){
# Run through each value in n.vals
for (i in c(1:n.theta)){
CE[i]<-CE.theta.mle(theta[i],n,r) # Run the function as before, for all values of theta[i]
}
col.index<-col.index+1 # Move to the next colour; this can be extended if more n's are plotted
lines(theta,CE,type="l",col=colours[col.index]) # Draw the CE line for this n in the chosen colour
}

Graph of the coverage error for the maximum likelihood estimator, approximated as N(θ, θ/n), for n = 100, 500, 1000, plotted in different colours to distinguish them. The Monte Carlo precision r is 1000 and the range of theta is (1, 50).

2.5
Looking at the curves produced, I notice that the coverage error fluctuates randomly around 0. A value of around zero means that we are close to the nominal 95% confidence level, in both the positive and negative direction, i.e. the true value of theta lies within the interval roughly 95% of the time, plus or minus some random noise. This also means that, every time I run the simulation, we get a different plot of the coverage error. This is because we are sampling from a normal distribution whose standard deviation tends towards zero as n increases, so the values fall within a very small range; however, there is always some small error when we sample from a normal distribution, as the theta.hat values are random variables.
The intervals depend on the value of n and on the values of the theta samples. Because theta.hat.samples follows a normal distribution, which is symmetric, even though increasing n decreases the standard deviation, taking the average of these errors means that the value of n won't significantly affect the coverage error, as whatever values lie below the mean may cancel out with those above.
I can see on my graph that increasing the value of n doesn't seem to affect the result very much, if at all. Looking at the code I produced, I can see that, while the standard deviation of the theta.hat samples decreases with n, the width of the interval that theta must lie in also decreases, since both are inversely related to n. This means that while the theta.hat values become more concentrated around the mean (smaller standard deviation) as n increases, the value of theta also has to lie within a narrower confidence interval. Graphically, this explains why n doesn't seem to affect the result. If we increase the Monte Carlo precision r, i.e. produce more samples of theta.hat, then the coverage error does decrease in magnitude (because there are more values to take the average over), but there will always be natural fluctuations across all values of theta due to the randomness of the normal distribution.
I can also see that the value of theta does not particularly affect its coverage error. This may be because there are no asymptotes at any value of theta between 1 and 50; problems only arise when theta is very small (close to zero, since the random element of theta.hat.samples could then make the value negative), but I avoided these by suitably choosing my initial value of theta.
Below I created a graph in R that displays the value of the lower end-point a1 as a function of theta, to show that, despite the equation being quite complicated, it is actually very close to linear for values between 0 and 50.
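The code for that figure is not reproduced in this document; a minimal sketch of the kind of code that could produce it (treating a1 as a function of the observed theta.hat, with n = 100 as an illustrative choice) is:

## Sketch: lower CI end-point a1 as a function of the observed theta.hat, for n = 100;
## the curve lies very close to the line a1 = theta.hat
n<-100
theta.hat<-seq(0,50,length.out=200)
a1<-3.8416/(2*n)+theta.hat-0.5*sqrt(((3.8416/n)+2*theta.hat)^2-4*theta.hat^2)
plot(theta.hat,a1,type="l",xlab=expression(hat(theta)),ylab="a1")
abline(0,1,lty=2)   # reference line a1 = theta.hat for comparison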

The above shows that, because the end-point is (almost) linear, and in fact maps theta.hat to a value very close to theta.hat, changing theta across 0 to 50 shouldn't affect the graph of the coverage error very much, i.e. we aren't going to get any values of theta whose coverage error always takes substantially different values, such as the large negative values close to -95 that we observe for some other distributions (notably the binomial, with the mean estimator shown in the coursework background section).
