
Problems to hand in:

1. There is a file containing data on the lengths (in minutes) of 272 eruptions
of the Old Faithful geyser in Yellowstone National Park. Using R and some of
the methods discussed in class, answer the following questions.
(a) Do the data appear to be normal? If not, do they appear to be unimodal?
(b) Use the density function in R to estimate the density. Choose a variety of
bandwidths (the parameter bw) and describe how the estimates change as
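the bandwidth changes.
(c) Now assume that the data come from a mixture of two normal distributions,
so that the density has the form

$$f(x) = \frac{\lambda}{\sigma_1}\,\phi\!\left(\frac{x-\mu_1}{\sigma_1}\right) + \frac{1-\lambda}{\sigma_2}\,\phi\!\left(\frac{x-\mu_2}{\sigma_2}\right),$$

where $\phi$ is the standard normal density and $\lambda, \mu_1, \mu_2, \sigma_1, \sigma_2$
are unknown parameters. Use the density estimates from
part (b) and any other appropriate methods to come up with educated
guesses of the values of these parameters. (Don't worry too much about
your final answers; the process is much more important here.)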
(d) (Not to hand in) Play around with the function den.splines in the spacings
handout; the code for this function is on Blackboard. In particular, see how
the density estimates vary as the parameter p varies in this function.
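
Parts (a) to (c) can be explored along the following lines. This is only a sketch under stated assumptions: it uses R's built-in faithful data set as a stand-in for the course file (both hold 272 eruption lengths), the bandwidth grid is arbitrary, and the helper count_modes, the search window for the dip between the two peaks, and the names lambda, mu1, sigma1, mu2, sigma2 are illustrative choices rather than anything taken from the handout.

```r
# Assumption: the built-in 'faithful' data stand in for the course file.
x <- faithful$eruptions

# (a) A formal normality check; qqnorm(x); qqline(x) gives the visual version.
sw <- shapiro.test(x)
print(sw$p.value)

# (b) Kernel density estimates over an (arbitrary) grid of bandwidths.
bws <- c(0.05, 0.1, 0.3, 0.6, 1)
fits <- lapply(bws, function(b) density(x, bw = b))

# Hypothetical helper: count local maxima of an estimate on its grid.
count_modes <- function(d) sum(diff(sign(diff(d$y))) == -2)
modes <- sapply(fits, count_modes)
print(data.frame(bw = bws, modes = modes))

# (c) Crude starting guesses: split the sample at the dip between the peaks.
d <- density(x, bw = 0.3)
inner <- d$x > 2.5 & d$x < 3.5          # assumed window containing the dip
cutpt <- d$x[inner][which.min(d$y[inner])]
low <- x[x <= cutpt]
high <- x[x > cutpt]
guess <- c(lambda = length(low) / length(x),
           mu1 = mean(low), sigma1 = sd(low),
           mu2 = mean(high), sigma2 = sd(high))
print(round(guess, 3))
```

Drawing the elements of fits on a common axis (plot for the first, lines for the rest) makes the effect of bw visible: small bandwidths give many narrow bumps, while large ones progressively smooth the two peaks towards each other.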
2. Suppose that $X_1, \ldots, X_n$ are independent random variables with common
density $f(x - \theta)$, where $f$ is symmetric around 0 (i.e. $f(x) = f(-x)$) and $\theta$ is an
unknown location parameter. If $\mathrm{Var}(X_i)$ is finite then the sample mean $\bar{X}$ will
be a reasonable estimator of $\theta$; however, if $f$ has heavy tails then $\bar{X}$ will be
less efficient than other estimators of $\theta$, for example, the sample median.
A useful alternative to the sample mean is the $\alpha$-trimmed mean, which
trims the smallest $\alpha$ and largest $\alpha$ fractions of the data and averages the
middle order statistics. Specifically, if we define $r = \lfloor n\alpha \rfloor$ (where $\lfloor x \rfloor$ is the
integer part of $x$) then the $\alpha$-trimmed mean, $\hat{\theta}(\alpha)$, is defined by

$$\hat{\theta}(\alpha) = \frac{1}{n-2r} \sum_{i=r+1}^{n-r} X_{(i)}.$$
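
As a quick check of the definition, the trimmed mean can be computed directly from the order statistics and compared with R's mean(x, trim = ...). A minimal sketch: the function name trimmed_mean is made up for illustration, and the simulated t sample is just a convenient heavy-tailed example.

```r
# Hypothetical helper: alpha-trimmed mean built from the order statistics.
trimmed_mean <- function(x, alpha) {
  n <- length(x)
  r <- floor(n * alpha)            # number of points trimmed from each end
  xs <- sort(x)
  sum(xs[(r + 1):(n - r)]) / (n - 2 * r)
}

set.seed(1)
x <- rt(50, df = 3)                # heavy-tailed sample, true location 0
alpha <- 0.2

tm_manual <- trimmed_mean(x, alpha)
tm_builtin <- mean(x, trim = alpha)
print(c(manual = tm_manual, builtin = tm_builtin))
```

The two values agree because mean() with a trim argument below 0.5 also drops floor(n * trim) observations from each end before averaging.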

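(a) Suppose (for simplicity) that $\lfloor n\alpha \rfloor = \lfloor (n-1)\alpha \rfloor$ and define
$\hat{\theta}_{-i}(\alpha)$ to be the trimmed mean with $X_{(i)}$ deleted from the sample. Find expressions for
$\hat{\theta}_{-1}(\alpha), \ldots, \hat{\theta}_{-n}(\alpha)$; in particular, note that
$\hat{\theta}_{-1}(\alpha) = \cdots = \hat{\theta}_{-(r+1)}(\alpha)$ and
$\hat{\theta}_{-(n-r)}(\alpha) = \cdots = \hat{\theta}_{-n}(\alpha)$.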
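(b) Using the setup in part (a), show that the pseudo-values $\{\Phi_i\}$, where
$\Phi_i = n\hat{\theta}(\alpha) - (n-1)\hat{\theta}_{-i}(\alpha)$, are given by

$$\Phi_i = \begin{cases}
\dfrac{(n-1)X_{(r+1)} - 2r\,\hat{\theta}(\alpha)}{n-2r-1} & \text{for } i = 1, \ldots, r+1 \\[2ex]
\dfrac{(n-1)X_{(i)} - 2r\,\hat{\theta}(\alpha)}{n-2r-1} & \text{for } i = r+2, \ldots, n-r-1 \\[2ex]
\dfrac{(n-1)X_{(n-r)} - 2r\,\hat{\theta}(\alpha)}{n-2r-1} & \text{for } i = n-r, \ldots, n
\end{cases}$$

and give a formula for the jackknife estimator of variance of $\hat{\theta}(\alpha)$. (Think
about how you might use this variance estimator to choose an optimal
value of r.)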
(c) There is a simple R function (called jackknife) to compute the jackknife
variance estimator of a trimmed mean. For a given sample, we can use the
interval

$$\hat{\theta}(\alpha) \pm 1.96\sqrt{\widehat{\mathrm{Var}}(\hat{\theta}(\alpha))}$$

as an approximate 95% confidence interval for $\theta$. Use the following R code to
estimate the coverage of this interval when $n = 50$ and $X_1, \ldots, X_{50}$ have a
Student's t distribution with 3 degrees of freedom.
> cover <- 0
> for (i in 1:1000) {
+   x <- rt(50,3) # true theta is 0
+   xbar <- mean(x,trim=0.20)
+   jackvar <- jackknife(x,trim=0.2)
+   lower <- xbar - 1.96*sqrt(jackvar)
+   upper <- xbar + 1.96*sqrt(jackvar)
+   if (upper*lower<=0) cover <- cover + 1
+ }
> coverage <- cover/1000 # estimate of coverage


Repeat the procedure using a sample size of 200. Compare the results for
n = 50 and n = 200.
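
To compare n = 50 and n = 200 without editing the loop by hand, the simulation can be wrapped in a function. A sketch only: coverage_sim and jack_var are hypothetical names, jack_var is a compact stand-in for the handout's jackknife function (same pseudo-value formula) included so that the block runs on its own, and the number of replications is cut to 200 here to keep the run short.

```r
# Compact stand-in for the handout's jackknife(): jackknife variance
# estimate of the trimmed mean via leave-one-out pseudo-values.
jack_var <- function(x, trim = 0) {
  n <- length(x)
  loo <- sapply(seq_len(n), function(i) mean(x[-i], trim = trim))
  pseudo <- n * mean(x, trim = trim) - (n - 1) * loo
  var(pseudo) / n
}

# Hypothetical wrapper: Monte Carlo coverage of the nominal 95% interval
# for t(3) samples, whose true location is 0.
coverage_sim <- function(n, trim = 0.2, nsim = 200) {
  hits <- replicate(nsim, {
    x <- rt(n, df = 3)
    est <- mean(x, trim = trim)
    half <- 1.96 * sqrt(jack_var(x, trim = trim))
    (est - half) <= 0 && 0 <= (est + half)   # does the interval cover 0?
  })
  mean(hits)
}

set.seed(355)
cov50 <- coverage_sim(50)
cov200 <- coverage_sim(200)
print(c(n50 = cov50, n200 = cov200))
```

With only 200 replications each estimate carries a Monte Carlo standard error of about 0.015 when the true coverage is near 0.95, so nsim = 1000, as in the loop above, gives a sharper comparison.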

den.splines <- function(x,p=5) {
  library(splines)
  n <- length(x)
  x <- sort(x)
  x1 <- c(NA,x)
  x2 <- c(x,NA)
  sp <- (x2-x1)[2:n]        # spacings between adjacent order statistics
  mid <- 0.5*(x1+x2)[2:n]   # midpoints of the spacings
  y <- n*sp
  xx <- bs(mid,df=p) # create b-spline basis
  r <- glm(y~xx,family=quasi(link="log",variance="mu^2"))
  density <- exp(-r$linear.predictors)
  r <- list(x=mid,density=density)
  r
}
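
One way to play with p, sketched on simulated data. Assumptions: the sample is standard normal (so the true density is known and there are no ties), the grid of p values is arbitrary, and the function body is a condensed copy of the listing above, repeated only so that the block runs on its own.

```r
library(splines)

# Condensed copy of den.splines from the handout listing (same model).
den.splines <- function(x, p = 5) {
  n <- length(x)
  x <- sort(x)
  sp <- diff(x)                       # spacings between order statistics
  mid <- 0.5 * (x[-1] + x[-n])        # midpoints of the spacings
  y <- n * sp
  xx <- bs(mid, df = p)               # b-spline basis
  r <- glm(y ~ xx, family = quasi(link = "log", variance = "mu^2"))
  list(x = mid, density = exp(-r$linear.predictors))
}

set.seed(2)
x <- rnorm(200)                       # assumed test sample with known density
ps <- c(3, 5, 10)                     # arbitrary grid of values for p
fits <- lapply(ps, function(p) den.splines(x, p = p))

# Mean absolute error against the true N(0,1) density at the midpoints.
err <- sapply(fits, function(f) mean(abs(f$density - dnorm(f$x))))
print(data.frame(p = ps, mean_abs_error = err))
```

Plotting each fit with plot(f$x, f$density, type = "l") shows how the flexibility of the estimate grows with p, in much the same way that a smaller bw makes a kernel estimate wigglier.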

jackknife <- function(x,trim=0) {
  loo <- NULL
  n <- length(x)
  xbar <- mean(x,trim=trim)
  for (i in 1:n) {
    xi <- x[-i]                        # sample with the i-th observation deleted
    loo <- c(loo,mean(xi,trim=trim))   # leave-one-out trimmed means
  }
  pseudo <- n*xbar - (n-1)*loo         # pseudo-values
  jackvar <- var(pseudo)/n
  jackvar
}
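
The structure of the pseudo-values is easy to sanity-check numerically. Deleting any one of the r + 1 smallest observations removes the same order statistic from the trimmed average (and likewise for the r + 1 largest), so the brute-force pseudo-values should be constant over those two groups. A sketch under assumptions: simulated t data, illustrative variable names, and trim = 0.25 with n = 50, chosen so that floor(n * trim) equals floor((n - 1) * trim) as part (a) of Problem 2 requires.

```r
set.seed(3)
n <- 50
trim <- 0.25
# Simplifying assumption of part (a): same r for samples of size n and n - 1.
stopifnot(floor(n * trim) == floor((n - 1) * trim))

x <- sort(rt(n, df = 3))             # sorted, so x[i] is the i-th order statistic
r <- floor(n * trim)

theta_hat <- mean(x, trim = trim)
loo <- sapply(seq_len(n), function(i) mean(x[-i], trim = trim))
pseudo <- n * theta_hat - (n - 1) * loo      # brute-force pseudo-values

# Closed form obtained from the definitions: the order statistic that
# leaves the average is x[i] clamped to the range [x[r + 1], x[n - r]].
dropped <- pmin(pmax(x, x[r + 1]), x[n - r])
closed <- ((n - 1) * dropped - 2 * r * theta_hat) / (n - 2 * r - 1)

print(all.equal(pseudo, closed))
jackvar <- var(pseudo) / n                   # the quantity jackknife(x, trim) returns
print(jackvar)
```

Evaluating jackvar over a grid of trim values is one way to act on the hint in part (b) about choosing r.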

Old Faithful eruption lengths (in minutes), 272 values:

3.600
4.200
4.533
1.867
4.633
2.233
4.500
4.067
4.667
4.850
2.617
2.800
4.333
4.500
1.933
1.883
4.800
4.600
2.400
3.917
4.150

1.800
1.750
3.600
4.833
2.000
4.500
4.000
4.933
3.750
3.683
4.067
4.333
1.983
4.083
4.617
4.583
4.100
1.783
4.800
4.550
2.350

3.333
4.700
1.967
1.833
4.800
1.750
1.983
3.950
1.867
4.733
4.250
1.833
4.633
1.800
1.917
4.250
3.966
4.367
2.000
4.083
4.933

2.283
2.167
4.083
4.783
4.716
4.800
5.067
4.517
4.900
2.300
1.967
4.383
2.017
3.967
2.083
3.767
4.233
3.850
4.150
2.417
2.900

4.533
1.750
3.850
4.350
1.833
1.817
2.017
2.167
2.483
4.900
4.600
1.883
5.100
2.200
4.583
2.033
3.500
1.933
1.867
4.183
4.583

2.883
4.800
4.433
1.883
4.833
4.400
4.567
4.000
4.367
4.417
3.767
4.933
1.800
4.150
3.333
4.433
4.366
4.500
4.267
2.217
3.833

4.700
1.600
4.300
4.567
1.733
4.167
3.883
2.200
2.100
1.700
1.917
2.033
5.033
2.000
4.167
4.083
2.250
2.383
1.750
4.450
2.083

3.600
4.250
4.467
1.750
4.883
4.700
3.600
4.333
4.500
4.633
4.500
3.733
4.000
3.833
4.333
1.833
4.667
4.700
4.483
1.883
4.367

1.950
1.800
3.367
4.533
3.717
2.067
4.133
1.867
4.050
2.317
2.267
4.233
2.400
3.500
4.500
4.417
2.100
1.867
4.000
1.850
2.133

4.350
1.750
4.033
3.317
1.667
4.700
4.333
4.817
1.867
4.600
4.650
2.233
4.600
4.583
2.417
2.183
4.350
3.833
4.117
4.283
4.350

1.833
3.450
3.833
3.833
4.567
4.033
4.100
1.833
4.700
1.817
1.867
4.533
3.567
2.367
4.000
4.800
4.133
3.417
4.083
3.950
2.200

3.917
3.067
2.017
2.100
4.317
1.967
2.633
4.300
1.783
4.417
4.167
4.817
4.000
5.000
4.167
1.833
1.867
4.233
4.267
2.333
4.450

3.567 4.500 4.150 3.817 3.917 4.450 2.000 4.283 4.767 4.533 1.850 4.250
1.983 2.250 4.750 4.117 2.150 4.417 1.817 4.467
