1. There is a file containing data on the lengths (in minutes) of 272 eruptions
of the Old Faithful geyser in Yellowstone National Park. Using R and some of
the methods discussed in class, answer the following questions.
(a) Do the data appear to be normal? If not, do they appear to be unimodal?
(b) Use the density function in R to estimate the density. Choose a variety of
bandwidths (the parameter bw) and describe how the estimates change as
the bandwidth changes.
(c) Now assume that the data come from a mixture of two normal distributions
so that the density has the form
$$f(x) = \frac{p}{\sigma_1}\,\phi\!\left(\frac{x-\mu_1}{\sigma_1}\right) + \frac{1-p}{\sigma_2}\,\phi\!\left(\frac{x-\mu_2}{\sigma_2}\right),$$
where $\phi$ is the standard normal density and $p$, $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$
are unknown parameters. Use the density estimates from
part (b) and any other appropriate methods to come up with educated
guesses of the values of these parameters. (Don't worry too much about
your final answers; the process is much more important here.)
(d) (Not to hand in) Play around with the function den.splines in the spacings
handout; the code for this function is on Blackboard. In particular, see how
the density estimates vary as the parameter p varies in this function.
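One way to approach parts (a) to (c) of Problem 1 is sketched below. It is only a starting point, under several assumptions: R's built-in `faithful` data set is used as a stand-in for the course file, the bandwidth grid is arbitrary, the helper `count_modes` is hypothetical, and the mixture is parametrized with weight `p` on the lower component (mean `mu1`, standard deviation `sigma1`) and `1 - p` on the upper one (`mu2`, `sigma2`). The window used to locate the dip between the two humps is read off the plot by eye.

```r
# Assumption: the built-in `faithful` data (272 eruptions) stand in for the course file.
x <- faithful$eruptions

# (a) Informal normality check: Shapiro-Wilk test, plus a QQ plot when run interactively.
sw <- shapiro.test(x)
if (interactive()) { qqnorm(x); qqline(x) }

# (b) Kernel density estimates for a range of bandwidths (arbitrary grid).
bws <- c(0.05, 0.1, 0.3, 0.6, 1)
fits <- lapply(bws, function(b) density(x, bw = b))

# Hypothetical helper: count local maxima of an estimate, ignoring negligible bumps.
count_modes <- function(d, min_rel_height = 0.01) {
  y <- d$y
  peaks <- which(diff(sign(diff(y))) == -2) + 1  # grid points where the slope turns from + to -
  sum(y[peaks] > min_rel_height * max(y))
}
modes <- sapply(fits, count_modes)
print(data.frame(bw = bws, modes = modes))

if (interactive()) {
  plot(fits[[1]], main = "Density estimates for several bandwidths")
  for (k in seq_along(fits)[-1]) lines(fits[[k]], col = k)
  legend("topleft", legend = paste("bw =", bws), col = seq_along(bws), lty = 1)
}

# (c) Rough parameter guesses: split the sample at the dip between the two humps
# of a moderately smoothed estimate and summarize each part.
d <- density(x, bw = 0.3)
inner <- d$x > 2.5 & d$x < 3.5          # assumption: the dip lies in this window
split_pt <- d$x[inner][which.min(d$y[inner])]
lo <- x[x <= split_pt]
hi <- x[x > split_pt]
guess <- c(p = mean(x <= split_pt),
           mu1 = mean(lo), sigma1 = sd(lo),
           mu2 = mean(hi), sigma2 = sd(hi))
print(round(guess, 2))
```

Splitting at the dip ignores the overlap between the two components, so the standard deviation guesses are likely to be somewhat too small; they are meant as educated starting values, not estimates.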
2. Suppose that $X_1, \ldots, X_n$ are independent random variables with common
density $f(x - \theta)$ where $f$ is symmetric around 0 (i.e. $f(x) = f(-x)$) and $\theta$ is an
unknown location parameter. If $\mathrm{Var}(X_i)$ is finite then the sample mean $\bar{X}$ will
be a reasonable estimator of $\theta$; however, if $f$ has heavy tails then $\bar{X}$ will be
less efficient than other estimators of $\theta$, for example, the sample median.
A useful alternative to the sample mean is the $\alpha$-trimmed mean, which
trims the smallest $\alpha$ and largest $\alpha$ fractions of the data and averages the
middle order statistics. Specifically, if we define $r = \lfloor n\alpha \rfloor$ (where $\lfloor x \rfloor$ is the
integer part of $x$) then the $\alpha$-trimmed mean, $\hat{\theta}(\alpha)$, is defined by
$$\hat{\theta}(\alpha) = \frac{1}{n - 2r} \sum_{k=r+1}^{n-r} X_{(k)},$$
where $X_{(1)} \le \cdots \le X_{(n)}$ are the order statistics.
for $i = r + 2, \ldots, n - r - 1$
for $i = n - r, \ldots, n$
and give a formula for the jackknife estimator of variance of $\hat{\theta}(\alpha)$. (Think
about how you might use this variance estimator to choose an optimal
value of $r$.)
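The idea of using the jackknife variance to choose r can be sketched by brute force, recomputing the trimmed mean with each observation left out. This is a minimal illustration, not the course's own function: the names `trim_mean` and `jack_var_trim`, the simulated heavy-tailed sample, and the grid of r values are all hypothetical, and r is assumed to be held fixed when an observation is deleted (which is what the index ranges in the problem suggest).

```r
# Trimmed mean that drops the r smallest and r largest observations.
trim_mean <- function(x, r) {
  n <- length(x)
  stopifnot(r >= 0, n - 2 * r >= 1)
  mean(sort(x)[(r + 1):(n - r)])
}

# Jackknife variance estimate, computed by recomputing the estimator n times.
# Assumption: the same r is used for each leave-one-out sample of size n - 1.
jack_var_trim <- function(x, r) {
  n <- length(x)
  loo <- sapply(seq_len(n), function(i) trim_mean(x[-i], r))
  (n - 1) / n * sum((loo - mean(loo))^2)
}

# Toy heavy-tailed sample (hypothetical): t distribution with 2 degrees of freedom.
set.seed(1)
x <- rt(50, df = 2)

# Evaluate the variance estimate over a grid of r and pick the minimizer.
r_grid <- 0:20
v <- sapply(r_grid, function(r) jack_var_trim(x, r))
r_best <- r_grid[which.min(v)]
print(data.frame(r = r_grid, jack_var = v))
```

With r = 0 the trimmed mean is the sample mean and the jackknife estimate reduces to var(x) / n, which gives a quick sanity check on the code. Because the minimizing r is itself chosen from the data, the variance estimate at that r will tend to be optimistic.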
(c) There is a simple R function (called jackknife) to compute the jackknife
variance estimator of a trimmed mean. For a given sample, we can use the
interval
1.800
1.750
3.600
4.833
2.000
4.500
4.000
4.933
3.750
3.683
4.067
4.333
1.983
4.083
4.617
4.583
4.100
1.783
4.800
4.550
2.350
3.333
4.700
1.967
1.833
4.800
1.750
1.983
3.950
1.867
4.733
4.250
1.833
4.633
1.800
1.917
4.250
3.966
4.367
2.000
4.083
4.933
2.283
2.167
4.083
4.783
4.716
4.800
5.067
4.517
4.900
2.300
1.967
4.383
2.017
3.967
2.083
3.767
4.233
3.850
4.150
2.417
2.900
4.533
1.750
3.850
4.350
1.833
1.817
2.017
2.167
2.483
4.900
4.600
1.883
5.100
2.200
4.583
2.033
3.500
1.933
1.867
4.183
4.583
2.883
4.800
4.433
1.883
4.833
4.400
4.567
4.000
4.367
4.417
3.767
4.933
1.800
4.150
3.333
4.433
4.366
4.500
4.267
2.217
3.833
4.700
1.600
4.300
4.567
1.733
4.167
3.883
2.200
2.100
1.700
1.917
2.033
5.033
2.000
4.167
4.083
2.250
2.383
1.750
4.450
2.083
3.600
4.250
4.467
1.750
4.883
4.700
3.600
4.333
4.500
4.633
4.500
3.733
4.000
3.833
4.333
1.833
4.667
4.700
4.483
1.883
4.367
1.950
1.800
3.367
4.533
3.717
2.067
4.133
1.867
4.050
2.317
2.267
4.233
2.400
3.500
4.500
4.417
2.100
1.867
4.000
1.850
2.133
4.350
1.750
4.033
3.317
1.667
4.700
4.333
4.817
1.867
4.600
4.650
2.233
4.600
4.583
2.417
2.183
4.350
3.833
4.117
4.283
4.350
1.833
3.450
3.833
3.833
4.567
4.033
4.100
1.833
4.700
1.817
1.867
4.533
3.567
2.367
4.000
4.800
4.133
3.417
4.083
3.950
2.200
3.917
3.067
2.017
2.100
4.317
1.967
2.633
4.300
1.783
4.417
4.167
4.817
4.000
5.000
4.167
1.833
1.867
4.233
4.267
2.333
4.450
3.567
4.500
4.150
3.817
3.917
4.450
2.000
4.283
4.767
4.533
1.850
4.250
1.983
2.250
4.750
4.117
2.150
4.417
1.817
4.467