
Spring 2014

Stat 431

Homework 1

Due on Friday, January 31 at 5 pm.


Please either turn in a hardcopy in class or use the STAT 431 box in the Statistics Department
(400 JMHH). For problems involving computation with R, please report results only. Do not turn
in your code.
1. Body temperatures (in °F) were measured on 30 healthy individuals, resulting in the data
given in bodytemp.txt.
(a) Calculate the median and the IQR of the data.
(b) What is the inter-quartile range of a $N(0, 1)$ distribution? What is the inter-quartile
range of a $N(\mu, \sigma^2)$ distribution? (Hint: You can use the normal table in the appendix
of the textbook to find the quartiles of the standard normal distribution.)
(c) Using part (b), report an approximate SD of the data based on the value of the IQR.
How does it compare with the actual sample SD?
(d) Make a normal QQ plot of the data. Is the normal assumption made in part (c) plausible
for this dataset?
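
As an illustration of the quantities in parts (a) to (c), here is a minimal sketch. It is written in Python rather than R, and the `temps` list is made up for the example (it is not the contents of bodytemp.txt). The `method="inclusive"` option is an assumption, chosen because it is expected to match the default quantile rule in R; other quantile conventions give slightly different quartiles.

```python
from statistics import NormalDist, median, quantiles, stdev

# Hypothetical body temperatures in degrees F (NOT the values in bodytemp.txt).
temps = [97.8, 98.6, 98.2, 99.1, 98.4, 97.9, 98.7, 98.0, 98.8, 98.3]

# Sample quartiles, median, and inter-quartile range (Q3 - Q1).
q1, q2, q3 = quantiles(temps, n=4, method="inclusive")
sample_median = median(temps)
sample_iqr = q3 - q1

# IQR of N(0, 1): distance between its 0.25 and 0.75 quantiles.
std_normal = NormalDist(mu=0.0, sigma=1.0)
iqr_std_normal = std_normal.inv_cdf(0.75) - std_normal.inv_cdf(0.25)

# If the data were normal, the IQR would scale with sigma, so dividing the
# sample IQR by the standard normal IQR gives a rough SD estimate.
sd_from_iqr = sample_iqr / iqr_std_normal

print("median:", sample_median)
print("IQR:", sample_iqr)
print("SD from IQR:", sd_from_iqr, "sample SD:", stdev(temps))
```

The comparison in the last line is only meaningful to the extent that the normal assumption holds, which is what part (d) asks you to examine.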
2. The idea of a normal QQ plot can be extended to other distributions. Let $x_1, \ldots, x_n$
be the data from a random sample. To check if a specified distribution with distribution
function $F$ fits this sample, we plot the sample quantiles against the theoretical quantiles.
The $i$th order statistic $x_{(i)}$ (possibly after rescaling) is the $\frac{i}{n+1}$th sample
quantile, and $F^{-1}\left(\frac{i}{n+1}\right)$ is the corresponding theoretical quantile. Let
us apply this method to check the exponentiality of a dataset. The resulting plot is called an
exponential QQ plot.
(a) For an exponential distribution Exp($\lambda$) with c.d.f. $F(x) = 1 - \exp(-\lambda x)$,
show that the $\frac{i}{n+1}$th theoretical quantile is

$$F^{-1}\left(\frac{i}{n+1}\right) = \frac{1}{\lambda} \ln\left(\frac{n+1}{n+1-i}\right).$$

(b) Usually, $\lambda$ is unknown. The typical solution is to obtain the theoretical quantiles
from the Exp(1) distribution function $F(x) = 1 - \exp(-x)$, and pair them with the sample
quantiles of rescaled data. [Recall that in the normal case, we obtain the theoretical
quantiles from the $N(0, 1)$ distribution, and pair them with the sample quantiles of the
z-scores of the data.] As before, we check whether the plotted points are close to the
line $y = x$.
What would be a proper rescaling for the exponential QQ plot? Why?
(Hint: If $X$ follows an exponential distribution with parameter $\lambda$, then $\lambda X$
follows an exponential distribution with parameter 1.)
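
To make the construction concrete, the sketch below builds the coordinates of an exponential QQ plot for simulated data. It is in Python rather than R, and it treats the rate as known (the value `rate = 2.5`, the seed, and the helper names are all hypothetical), so it only illustrates the hint; it does not say what to do when the rate has to be estimated. It also checks numerically that the closed form from part (a) agrees with inverting the c.d.f. directly.

```python
import math
import random

def exp_theoretical_quantiles(n, rate=1.0):
    """F^{-1}(i/(n+1)) for i = 1..n under Exp(rate), using the closed form in part (a)."""
    return [math.log((n + 1) / (n + 1 - i)) / rate for i in range(1, n + 1)]

def exp_inverse_cdf(p, rate=1.0):
    """Invert F(x) = 1 - exp(-rate * x) directly."""
    return -math.log(1.0 - p) / rate

random.seed(431)
rate = 2.5  # hypothetical rate, treated as KNOWN for this illustration only
n = 200
sample = [random.expovariate(rate) for _ in range(n)]

# Per the hint, rate * X follows Exp(1), so the rescaled order statistics
# are paired with the Exp(1) theoretical quantiles.
rescaled_order_stats = sorted(rate * x for x in sample)
theoretical = exp_theoretical_quantiles(n)
qq_points = list(zip(theoretical, rescaled_order_stats))

# Numerical check of the closed form against direct inversion of the c.d.f.
closed_form = exp_theoretical_quantiles(n, rate)
direct = [exp_inverse_cdf(i / (n + 1), rate) for i in range(1, n + 1)]
max_gap = max(abs(a - b) for a, b in zip(closed_form, direct))
print("largest disagreement:", max_gap)
```

Plotting `qq_points` and comparing them with the line y = x gives the exponential QQ plot described above.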
3. Three different random samples are generated from three different underlying distributions.
For each random sample, the histogram, the boxplot and the normal QQ plot are all made.
However, the plots have been shuffled, so the three plots in each column are not necessarily
from the same sample. Please group the three plots generated from the same sample together
for all three samples.
[Figure: nine shuffled panels labeled (A), (B), (C); (a), (b), (c); and (1), (2), (3). They
comprise three histograms (vertical axis: Frequency), three boxplots, and three normal QQ plots
(Sample Quantiles against Theoretical Quantiles).]

4. HousePrices.txt records the sale price [price] and square footage [BLDSQFT] for 439 houses.
(a) Make a normal QQ plot and a boxplot of the price variable. Is the empirical distribution
positively or negatively skewed?
(b) According to the direction of skewness, which transformations would be appropriate
for the price variable? [Please name at least two possible candidates.] Apply the two
possible transformations to the data, and use the normal QQ plot to judge which one is
more suitable for the current data.
(c) Make a scatter plot of the properly transformed price [as obtained in part (b)] vs. the
building square footage, with the least squares line added on top of the data cloud. Report
the correlation coefficient for this data cloud.
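
As a rough illustration of how a transformation can change skewness, the sketch below compares a moment-based skewness coefficient before and after two common transformations. It is in Python rather than R and uses simulated data: the log-normal "prices", the seed, and the choice of square root and logarithm are assumptions made for the example, and it presumes right-skewed data, which may or may not match HousePrices.txt.

```python
import math
import random

def sample_skewness(values):
    """Moment-based skewness: mean cubed deviation over the cubed (population) SD."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

random.seed(2014)
# Hypothetical right-skewed "prices": simulated log-normal draws, NOT HousePrices.txt.
prices = [random.lognormvariate(12.0, 0.5) for _ in range(439)]

skew_raw = sample_skewness(prices)
skew_sqrt = sample_skewness([math.sqrt(p) for p in prices])
skew_log = sample_skewness([math.log(p) for p in prices])

print("skewness raw / sqrt / log:", skew_raw, skew_sqrt, skew_log)
```

A skewness coefficient is only a numerical summary; the assignment asks you to make the judgment from the normal QQ plots.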
5. Let $X_1, \ldots, X_{10}$ be a random sample of size 10 from a $N(\mu, 1)$ distribution.
(a) Consider testing $H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$ using the following
confidence interval approach: we reject $H_0$ if and only if $\mu_0$ does not belong to the 95%
confidence interval

$$\left[\bar{X} - 1.96/\sqrt{10},\ \bar{X} + 1.96/\sqrt{10}\right].$$

Show that the resulting test has significance level $\alpha = 0.05$.
(b) Now suppose we have observed $x_1, \ldots, x_{10}$, and have computed the sample mean
$\bar{x} = 1.6$ out of these numbers. We reject $H_0$ when $|\bar{x} - \mu_0| > 1.96/\sqrt{10}$.
Find all the values of $\mu_0$ such that we do not reject $H_0: \mu = \mu_0$ based on the above
rejection rule. Compare the values of $\mu_0$ you find with the realization of the 95% CI for
$\mu$ on this dataset.
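
The following sketch is a numerical sanity check for this problem, not a substitute for the derivation. It is in Python rather than R; the grid of candidate values for the null mean and the helper name `rejects` are hypothetical choices made for the example. It evaluates the two-sided tail probability of a standard normal beyond 1.96 and compares the rejection rule with interval membership on the grid.

```python
import math
from statistics import NormalDist

n = 10
sigma = 1.0          # known SD from the N(mu, 1) model
z = 1.96
half_width = z * sigma / math.sqrt(n)

# Under H0, sqrt(n) * (X-bar - mu0) / sigma is N(0, 1), so the probability of
# rejecting is the two-sided tail probability P(|Z| > 1.96).
std_normal = NormalDist()
alpha = 2.0 * (1.0 - std_normal.cdf(z))

# Realized interval for the observed sample mean in part (b).
x_bar = 1.6
ci_low, ci_high = x_bar - half_width, x_bar + half_width

def rejects(mu0):
    """Rejection rule from part (b): |x-bar - mu0| > 1.96 / sqrt(10)."""
    return abs(x_bar - mu0) > half_width

# Hypothetical grid of candidate null values from 0.00 to 4.00.
grid = [k / 100 for k in range(0, 401)]
agree = all((not rejects(m)) == (ci_low <= m <= ci_high) for m in grid)

print("alpha:", alpha)
print("interval:", ci_low, ci_high)
print("rule and interval agree on grid:", agree)
```

Agreement on a finite grid only suggests the pattern; the problem asks for the argument that holds for every value of the null mean.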
