
Chapter 5 Properties of a Random Sample

5.1 Population and sample (CB 5.1)

Suppose we are interested in the distribution or certain features of a collection of data. We call this collection a population.

Suppose that for some reason these data are not well documented or easily accessible, so the distribution or features we are interested in cannot be readily computed. A simple example is household income: if we wish to know the true average U.S. household income for this month, we face a very big task, because we would have to gather information from hundreds of millions of households.

A solution is to draw a sample from the population, in other words to select a subset of the population, and to use the sample information to make inferences about the truth. How best to do this and how to handle sampling variability are among the most important issues in statistics.

The population features that might be of interest include: the shape of the distribution (is it symmetric or skewed, does it have a single peak or multiple peaks, etc.), whether a standard distribution (normal, gamma, Weibull, Poisson, etc.) could serve as a reasonable approximation, and the values of the mean, variance, percentiles, etc.

Any number which can be computed from the population is called a parameter.
Common parameters of interest are the mean, variance, percentiles, mode
(most probable value).

A statistic is any number calculated from the sample data. Suppose the sample data are $X_1, \ldots, X_n$. Examples of statistics are the sample mean $\bar X = n^{-1}\sum_{i=1}^n X_i$, the sample variance $S^2 = (n-1)^{-1}\sum_{i=1}^n (X_i - \bar X)^2$, sample percentiles, and the sample range (= sample maximum − sample minimum).
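The following is a minimal sketch (not part of the original notes) of how these statistics can be computed for a simulated sample; the N(10, 4) population and the sample size are arbitrary illustrative choices.

```python
# Hypothetical sample and the statistics named above, using NumPy.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=50)   # simulated sample X_1, ..., X_n

xbar = x.mean()                                # sample mean
s2 = x.var(ddof=1)                             # sample variance, divisor n - 1
q25, q75 = np.percentile(x, [25, 75])          # first and third sample quartiles
sample_range = x.max() - x.min()               # sample range

print(xbar, s2, q25, q75, sample_range)
```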

Consider the experiment of drawing at random a sample from a population. Before the sample is drawn, we can think of the sample values to be observed as random variables. In that sense we can also think of any statistic computed from these values as a random variable and speak about its distribution, called the sampling distribution. After the sample is drawn, we see the value of the statistic and there is no longer a distribution to speak of.

Typically we will assume that the population size is much bigger than the
sample size and that the sample observations are drawn from the population
independently of one another under very similar sampling conditions. As
such the random variables in the sample will be approximately independent
and have very similar distributions.

We say that a collection of rv's $X_1, \ldots, X_n$ forms a random sample if they are iid. We will for the most part assume that this is the case. In practice, this assumption is of course often violated. The iid theory is nevertheless relevant since the iid model can be used as a fundamental building block for complicated models of dependence.

5.2 Basic tools (CB 5.1)

In this section we review some basic tools that are useful for studying the
distributional properties of statistics.

Assume that X1 , . . . , Xn are iid.


(a) If E(X1 ) = µ then E(X̄) = µ.
(b) If Var(X1 ) = σ 2 then Var(X̄) = σ 2 /n.

Proof. In general, given rv's $X_1, \ldots, X_n$ with means $\mu_1, \ldots, \mu_n$,

$$\mathrm{Var}\Big(\sum_{i=1}^n X_i\Big) = E\Big(\sum_{i=1}^n (X_i - \mu_i)\Big)^2 = \sum_{i=1}^n\sum_{j=1}^n E(X_i-\mu_i)(X_j-\mu_j) = \sum_{i=1}^n \mathrm{Var}(X_i) + \sum_{1\le i\ne j\le n} \mathrm{Cov}(X_i, X_j).$$

For iid rv's the covariance terms vanish, so $\mathrm{Var}(\bar X) = n^{-2}\sum_{i=1}^n \mathrm{Var}(X_i) = \sigma^2/n$. □

(c) If $E(X_1) = \mu$ and $\mathrm{Var}(X_1) = \sigma^2$ then $E(S^2) = \sigma^2$.

Proof.
$$E S^2 = \frac{1}{n-1}\,E\sum_{i=1}^n (X_i - \bar X)^2 = \frac{1}{n-1}\,E\Big[\sum_{i=1}^n X_i^2 - n\bar X^2\Big] = \frac{n}{n-1}\Big[\sigma^2 + \mu^2 - \frac{\sigma^2}{n} - \mu^2\Big] = \sigma^2.$$

(d) If $X_1$ has mgf $M(t)$ then $\bar X$ has mgf $[M(t/n)]^n$.

(e) If $X_1$ has cdf $F$ then $\max_{1\le i\le n} X_i$ has cdf $F^n$.

If $X, Y$ have a joint pdf $f_{X,Y}$ then the joint pdf of $X$ and $Z = X + Y$ is
$$f_{X,Z}(x,z) = f_{X,Y}(x, z-x),$$
and hence the pdf of $Z$ is
$$f_Z(z) = \int f_{X,Y}(x, z-x)\,dx.$$
If $X, Y$ are independent then
$$f_Z(z) = \int f_X(x) f_Y(z-x)\,dx,$$
which is called the convolution of $f_X$ and $f_Y$.
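As a numerical sanity check (my own illustration, not from the notes), the convolution formula can be evaluated on a grid for two independent Exp(1) variables, whose sum is known to be Gamma(2, 1); the grid points are arbitrary.

```python
# Evaluate the convolution integral numerically and compare with the Gamma(2, 1) pdf.
import numpy as np
from scipy import stats
from scipy.integrate import quad

def f_Z(z):
    # f_Z(z) = integral of f_X(x) f_Y(z - x) dx for X, Y iid Exp(1)
    return quad(lambda x: stats.expon.pdf(x) * stats.expon.pdf(z - x), 0, z)[0]

zs = np.linspace(0.1, 5.0, 5)
numeric = np.array([f_Z(z) for z in zs])
exact = stats.gamma.pdf(zs, a=2)        # Gamma(2, 1) density z * exp(-z)
print(np.max(np.abs(numeric - exact)))  # should be essentially 0
```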

The mgf of a random vector $(X_1, \ldots, X_n)$ is defined by
$$M(t_1,\ldots,t_n) = E\,e^{\sum_{i=1}^n t_i X_i}.$$
As in the one-variable case, mgf's are unique. Let $(X, Y)$ be bivariate normal with pdf
$$f(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\, \exp\Big\{-\frac{1}{2(1-\rho^2)}\Big[\Big(\frac{x-\mu_x}{\sigma_x}\Big)^2 + \Big(\frac{y-\mu_y}{\sigma_y}\Big)^2 - 2\rho\Big(\frac{x-\mu_x}{\sigma_x}\Big)\Big(\frac{y-\mu_y}{\sigma_y}\Big)\Big]\Big\}.$$
Then
$$M(t_1, t_2) = \exp\Big\{\mu_x t_1 + \mu_y t_2 + \tfrac{1}{2}\big(\sigma_x^2 t_1^2 + 2\rho\sigma_x\sigma_y t_1 t_2 + \sigma_y^2 t_2^2\big)\Big\}.$$

5.3 Sampling from the normal distribution (CB 5.3)

Theorem 5.3.1. Let $X_1, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Then

(a) $\bar X$ and $S^2$ are independent rv's,
(b) $\bar X \sim N(\mu, \sigma^2/n)$,
(c) $(n-1)S^2/\sigma^2 \sim \chi^2$ with $n-1$ degrees of freedom. (Recall that $\chi^2(p) = \mathrm{gamma}(p/2, 2)$.)

Proof. (b) is obvious, so we focus on (a) and (c). We assume without loss of generality that $\mu = 0$, $\sigma^2 = 1$. We first prove (a). Write
$$S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)^2 = \frac{1}{n-1}\Big((X_1-\bar X)^2 + \sum_{i=2}^n (X_i-\bar X)^2\Big) = \frac{1}{n-1}\Big[\Big(\sum_{i=2}^n (X_i-\bar X)\Big)^2 + \sum_{i=2}^n (X_i-\bar X)^2\Big]$$
(using that $\sum_{i=1}^n (X_i-\bar X) = 0$), which shows that $S^2$ is a function of $X_2 - \bar X, \ldots, X_n - \bar X$. If we can show that these rv's are jointly independent of $\bar X$ then we are done. So this is what we do now. Consider the transformation
$$U_1 = \bar X, \qquad U_j = X_j - \bar X, \quad 2 \le j \le n.$$
The transformation is one-to-one from $\mathbb{R}^n$ to $\mathbb{R}^n$. The inverse transformation is
$$X_1 = U_1 - \sum_{i=2}^n U_i, \qquad X_j = U_j + U_1, \quad 2 \le j \le n,$$
and the Jacobian is equal to $n$. Thus,
$$f_U(u) = \frac{n}{(2\pi)^{n/2}}\, e^{-\frac{1}{2}(u_1 - \sum_{i=2}^n u_i)^2}\, e^{-\frac{1}{2}\sum_{i=2}^n (u_i + u_1)^2} = \frac{n}{(2\pi)^{n/2}}\, e^{-\frac{n}{2}u_1^2}\, e^{-\frac{1}{2}[(\sum_{i=2}^n u_i)^2 + \sum_{i=2}^n u_i^2]}.$$
The joint pdf factors into a function of $u_1$ alone times a function of $(u_2, \ldots, u_n)$ alone, so $\bar X = U_1$ is independent of $(U_2, \ldots, U_n) = (X_2 - \bar X, \ldots, X_n - \bar X)$.
Thus, (a) is proved. We now prove (c). Let
$$\bar X_n = \frac{1}{n}\sum_{i=1}^n X_i, \qquad S_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X_n)^2.$$
Write
$$(n-1)S_n^2 = \sum_{i=1}^{n-1}(X_i - \bar X_{n-1} + \bar X_{n-1} - \bar X_n)^2 + (X_n - \bar X_n)^2$$
$$= \sum_{i=1}^{n-1}(X_i - \bar X_{n-1})^2 + 2\sum_{i=1}^{n-1}(X_i - \bar X_{n-1})(\bar X_{n-1} - \bar X_n) + (n-1)(\bar X_{n-1} - \bar X_n)^2 + (X_n - \bar X_n)^2,$$
where the cross term vanishes since $\sum_{i=1}^{n-1}(X_i - \bar X_{n-1}) = 0$. Note that
$$(n-1)(\bar X_{n-1} - \bar X_n)^2 = \frac{n-1}{n^2}\Big(\frac{n}{n-1}\sum_{i=1}^{n-1}X_i - \sum_{i=1}^n X_i\Big)^2 = \frac{n-1}{n^2}\Big(\Big(\frac{n}{n-1}-1\Big)\sum_{i=1}^{n-1}X_i - X_n\Big)^2 = \frac{n-1}{n^2}(\bar X_{n-1} - X_n)^2$$
and
$$(X_n - \bar X_n)^2 = \Big(X_n - \frac{1}{n}\sum_{i=1}^n X_i\Big)^2 = \Big(\frac{n-1}{n}X_n - \frac{n-1}{n}\cdot\frac{1}{n-1}\sum_{i=1}^{n-1}X_i\Big)^2 = \Big(\frac{n-1}{n}\Big)^2 (X_n - \bar X_{n-1})^2.$$
Thus,
$$(n-1)S_n^2 = (n-2)S_{n-1}^2 + \frac{n-1}{n^2}(X_n - \bar X_{n-1})^2 + \Big(\frac{n-1}{n}\Big)^2(X_n - \bar X_{n-1})^2 = (n-2)S_{n-1}^2 + \frac{n-1}{n}(X_n - \bar X_{n-1})^2.$$
If $n = 2$ then
$$(2-1)S_2^2 = \frac{1}{2}(X_2 - X_1)^2 \sim \chi^2_1$$
since $\frac{1}{\sqrt 2}(X_2 - X_1) \sim N(0,1)$. Now suppose that $(k-1)S_k^2 \sim \chi^2_{k-1}$ (induction assumption); we show that $kS_{k+1}^2 \sim \chi^2_k$. By the identity above,
$$kS_{k+1}^2 = (k-1)S_k^2 + \frac{k}{k+1}(X_{k+1} - \bar X_k)^2.$$
Since $S_k^2$ is independent of $\bar X_k$ and $X_{k+1}$, the two summands are independent. The first term is $\sim \chi^2_{k-1}$ by assumption and the second term is $\sim \chi^2_1$ since $\sqrt{\tfrac{k}{k+1}}(X_{k+1} - \bar X_k) \sim N(0,1)$. Hence $kS_{k+1}^2 \sim \chi^2_k$ and (c) is proved by induction. □
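A small Monte Carlo sketch (my own check, with the arbitrary choices µ = 0, σ = 1, n = 10) illustrates (a) and (c): the sample mean and sample variance are empirically uncorrelated, and the scaled sample variance passes a goodness-of-fit test against $\chi^2_{n-1}$.

```python
# Simulation check of Theorem 5.3.1 (a) and (c) for normal samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 10, 20000
samples = rng.normal(size=(reps, n))
xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)

print(np.corrcoef(xbar, s2)[0, 1])                        # near 0, consistent with independence
print(stats.kstest((n - 1) * s2, "chi2", args=(n - 1,)))  # large p-value expected
```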

Definition. The Student's t distribution with $p$ degrees of freedom, where $p$ is any positive integer, has the pdf
$$f(x) = \frac{\Gamma\big(\frac{p+1}{2}\big)}{\Gamma\big(\frac{p}{2}\big)(p\pi)^{1/2}}\cdot\frac{1}{(1 + x^2/p)^{(p+1)/2}}.$$
The F distribution with $p, q$ degrees of freedom, where $p, q$ are any positive integers, has the pdf
$$f(x) = \frac{\Gamma\big(\frac{p+q}{2}\big)}{\Gamma\big(\frac{p}{2}\big)\Gamma\big(\frac{q}{2}\big)}\Big(\frac{p}{q}\Big)^{p/2}\frac{x^{p/2-1}}{[1 + (p/q)x]^{(p+q)/2}}.$$

Theorem 5.3.2. Let $X, Y$ be independent rv's, $X \sim N(0, 1)$ and $Y \sim \chi^2_p$. Then $X/\sqrt{Y/p} \sim t_p$.

Proof. Let
$$U = X/\sqrt{Y/p}, \qquad V = Y.$$
The transformation is one-to-one with inverse transformation
$$X = U\sqrt{V/p}, \qquad Y = V,$$
and Jacobian $\sqrt{v/p}$. Hence
$$f_{U,V}(u,v) = f_X(u\sqrt{v/p})\, f_Y(v)\, \sqrt{v/p} = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}u^2 v/p}\; \frac{1}{\Gamma(p/2)2^{p/2}}\, v^{p/2-1} e^{-v/2}\, \sqrt{v/p}.$$
Thus,
$$f_U(u) = \int_0^\infty f_{U,V}(u,v)\,dv = \frac{1}{\sqrt{2\pi p}\,\Gamma(p/2)2^{p/2}} \int_0^\infty e^{-\frac{1}{2}(1+u^2/p)v}\, v^{(p+1)/2-1}\,dv = \frac{1}{\sqrt{2\pi p}\,\Gamma(p/2)2^{p/2}}\cdot \frac{\Gamma((p+1)/2)}{[\tfrac{1}{2}(1+u^2/p)]^{(p+1)/2}},$$
which simplifies to the $t_p$ pdf. □

Corollary 5.3.3. Let $X_1, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Then $\dfrac{\bar X - \mu}{S/\sqrt n} \sim t_{n-1}$.

Proof. By Theorem 5.3.1, $\bar X$ and $S^2$ are independent and
$$\frac{\bar X - \mu}{\sigma/\sqrt n} \sim N(0,1), \qquad \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}.$$
Hence it follows from Theorem 5.3.2 that
$$\frac{\bar X - \mu}{S/\sqrt n} = \frac{(\bar X - \mu)/(\sigma/\sqrt n)}{\sqrt{[(n-1)S^2/\sigma^2]/(n-1)}} \sim t_{n-1}. \qquad\Box$$
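A quick simulation sketch of Corollary 5.3.3 (my own illustration; the values µ = 5, σ = 3, n = 8 are arbitrary):

```python
# The studentized mean of a normal sample should follow t with n - 1 df.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 8, 20000
x = rng.normal(5.0, 3.0, size=(reps, n))
t_stat = (x.mean(axis=1) - 5.0) / (x.std(axis=1, ddof=1) / np.sqrt(n))
print(stats.kstest(t_stat, "t", args=(n - 1,)))   # should not reject t_{n-1}
```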

Theorem 5.3.4. Let $X, Y$ be independent rv's, $X \sim \chi^2_p$ and $Y \sim \chi^2_q$. Then $\dfrac{X/p}{Y/q} \sim F_{p,q}$.

Proof. Let $U = \dfrac{X/p}{Y/q}$, $V = X$. Find the joint pdf of $U, V$ and integrate $v$ out. □

Corollary 5.3.5. Let $X_1, \ldots, X_n$ be a random sample from $N(\mu_X, \sigma_X^2)$ and $Y_1, \ldots, Y_m$ be a random sample from $N(\mu_Y, \sigma_Y^2)$, where the two random samples are independent. Let $S_X^2$ and $S_Y^2$ be the sample variances of the two random samples. Then
$$\frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} \sim F_{n-1,m-1}.$$

Proof. By Theorem 5.3.1,
$$\frac{(n-1)S_X^2}{\sigma_X^2} \sim \chi^2_{n-1}, \qquad \frac{(m-1)S_Y^2}{\sigma_Y^2} \sim \chi^2_{m-1}.$$
It follows from Theorem 5.3.4 that
$$\frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} = \frac{[(n-1)S_X^2/\sigma_X^2]/(n-1)}{[(m-1)S_Y^2/\sigma_Y^2]/(m-1)} \sim F_{n-1,m-1}. \qquad\Box$$

5.4 Order statistics (CB 5.4)

Definition. The order statistics of a sample $X_1, \ldots, X_n$ are the sample values placed in ascending order. The $i$-th order statistic is denoted by $X_{(i)}$.

One can define a variety of statistics using the order statistics. The sample median is
$$M = \begin{cases} X_{((n+1)/2)} & \text{if } n \text{ is odd,}\\ \big(X_{(n/2)} + X_{(n/2+1)}\big)/2 & \text{if } n \text{ is even.}\end{cases}$$
The median is a measure of central tendency which is robust against “outliers”. More generally, the sample $(100p)$-th percentile is equal to $X_{(\{np\})}$ if $\frac{1}{2n} < p < .5$ and $X_{(n+1-\{n(1-p)\})}$ if $.5 < p < 1 - \frac{1}{2n}$, where $\{b\}$ denotes $b$ rounded to the nearest integer. The 25-th percentile is called the first sample quartile and the 75-th percentile is called the third sample quartile. Sample percentiles are estimates of the population percentiles.

Theorem 5.4.1. Let $X_1, \ldots, X_n$ be a random sample from a distribution with cdf $F$. Then
$$P(X_{(j)} \le x) = \sum_{k=j}^n \binom{n}{k} F^k(x)(1 - F(x))^{n-k} \quad\text{for all } x.$$

Proof. Observe that the $j$-th smallest value is $\le x$ if and only if the total number of observations $\le x$ is at least $j$. Since the number of observations $\le x$ is binomial$(n, F(x))$, the latter probability is
$$\sum_{k=j}^n \binom{n}{k} F^k(x)(1 - F(x))^{n-k}. \qquad\Box$$
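The binomial-sum formula can be checked by simulation; the sketch below (my own, with an Exp(1) population and arbitrary n = 7, j = 3, x = 0.5) compares it with the empirical frequency of the event $\{X_{(j)} \le x\}$.

```python
# Monte Carlo check of the order-statistic cdf formula in Theorem 5.4.1.
import numpy as np
from scipy import stats
from math import comb

rng = np.random.default_rng(3)
n, j, x = 7, 3, 0.5
F = stats.expon.cdf(x)
formula = sum(comb(n, k) * F**k * (1 - F)**(n - k) for k in range(j, n + 1))

samples = np.sort(rng.exponential(size=(100_000, n)), axis=1)
empirical = np.mean(samples[:, j - 1] <= x)   # X_(j) is column j - 1 after sorting
print(formula, empirical)                     # the two numbers should be close
```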

As an application,
$$P\Big(\max_{1\le i\le n} X_i \le x\Big) = P(X_{(n)} \le x) = F^n(x),$$
and
$$P\Big(\min_{1\le i\le n} X_i \le x\Big) = P(X_{(1)} \le x) = 1 - (1 - F(x))^n.$$

Another application of this is that if the population distribution is discrete and $P(X_1 = x) > 0$ then
$$P(X_{(j)} = x) = P(X_{(j)} \le x) - P(X_{(j)} \le x-) = \sum_{k=j}^n \binom{n}{k}\big[F^k(x)(1-F(x))^{n-k} - F^k(x-)(1-F(x-))^{n-k}\big].$$
If the distribution of the $X_i$ is continuous then $P(X_i = X_j) = 0$ for $i \ne j$, and hence the probability of having ties in the order statistics is 0.

Theorem 5.4.2. Let $X_1, \ldots, X_n$ be a random sample from a distribution with pdf $f$ and cdf $F$. Then
(a) $f_{X_{(1)},\ldots,X_{(n)}}(x_1,\ldots,x_n) = n!\, f(x_1)\cdots f(x_n)$, for $x_1 < x_2 < \cdots < x_n$,
(b) $f_{X_{(j)}}(x) = \frac{n!}{(j-1)!(n-j)!}\, f(x)\, F^{j-1}(x)(1-F(x))^{n-j}$, $1 \le j \le n$,
(c) $f_{X_{(i)},X_{(j)}}(u,v) = \frac{n!}{(i-1)!(j-i-1)!(n-j)!}\, f(u) f(v)\, F^{i-1}(u)(F(v)-F(u))^{j-i-1}(1-F(v))^{n-j}$, for $u < v$, $1 \le i < j \le n$.

Proof.
(a) For $x_1 < x_2 < \cdots < x_n$,
$$f_{X_{(1)},\ldots,X_{(n)}}(x_1,\ldots,x_n) = \lim_{\epsilon_i \downarrow 0,\,1\le i\le n} \frac{P(x_1-\epsilon_1 < X_{(1)} \le x_1+\epsilon_1, \ldots, x_n-\epsilon_n < X_{(n)} \le x_n+\epsilon_n)}{(2\epsilon_1)(2\epsilon_2)\cdots(2\epsilon_n)}.$$
Observe that for small $\epsilon_1,\ldots,\epsilon_n > 0$ the intervals $(x_1-\epsilon_1, x_1+\epsilon_1], (x_2-\epsilon_2, x_2+\epsilon_2], \ldots, (x_n-\epsilon_n, x_n+\epsilon_n]$ are mutually exclusive, and so
$$P(x_1-\epsilon_1 < X_{(1)} \le x_1+\epsilon_1, \ldots, x_n-\epsilon_n < X_{(n)} \le x_n+\epsilon_n)$$
$$= \sum_{\text{all permutations } i_1,\ldots,i_n} P(x_1-\epsilon_1 < X_{i_1} \le x_1+\epsilon_1, \ldots, x_n-\epsilon_n < X_{i_n} \le x_n+\epsilon_n)$$
$$= n!\, P(x_1-\epsilon_1 < X_1 \le x_1+\epsilon_1, \ldots, x_n-\epsilon_n < X_n \le x_n+\epsilon_n) = n! \prod_{i=1}^n P(x_i-\epsilon_i < X_1 \le x_i+\epsilon_i)$$
since the $X_i$ are iid. Thus,
$$f_{X_{(1)},\ldots,X_{(n)}}(x_1,\ldots,x_n) = \lim_{\epsilon_i \downarrow 0,\,1\le i\le n} n! \prod_{i=1}^n \frac{P(x_i-\epsilon_i < X_1 \le x_i+\epsilon_i)}{2\epsilon_i} = n!\, f(x_1)\cdots f(x_n).$$

(b) For $\epsilon > 0$, write
$$P(x-\epsilon < X_{(j)} \le x+\epsilon) = A(\epsilon) + B(\epsilon)$$
where
$$A(\epsilon) = P\big(x-\epsilon < X_{(j)} \le x+\epsilon,\ \text{exactly one of } X_1,\ldots,X_n \text{ is in } (x-\epsilon, x+\epsilon]\big)$$
and
$$B(\epsilon) = P\big(x-\epsilon < X_{(j)} \le x+\epsilon,\ \text{two or more of } X_1,\ldots,X_n \text{ are in } (x-\epsilon, x+\epsilon]\big).$$
Now
$$A(\epsilon) = \frac{n!}{(j-1)!(n-j)!}\, P(X_s \le x-\epsilon,\ 1\le s\le j-1;\ x-\epsilon < X_j \le x+\epsilon;\ X_t > x+\epsilon,\ j+1\le t\le n)$$
$$= \frac{n!}{(j-1)!(n-j)!}\, F^{j-1}(x-\epsilon)\big[F(x+\epsilon) - F(x-\epsilon)\big]\big[1 - F(x+\epsilon)\big]^{n-j}.$$
Hence
$$\lim_{\epsilon\to 0}\frac{A(\epsilon)}{2\epsilon} = \frac{n!}{(j-1)!(n-j)!}\, F^{j-1}(x) f(x)\big[1-F(x)\big]^{n-j}.$$
It is clear (exercise) that
$$\lim_{\epsilon\to 0}\frac{B(\epsilon)}{2\epsilon} = 0.$$
The two combined give
$$\lim_{\epsilon\to 0}\frac{P(x-\epsilon < X_{(j)} \le x+\epsilon)}{2\epsilon} = \lim_{\epsilon\to 0}\Big(\frac{A(\epsilon)}{2\epsilon} + \frac{B(\epsilon)}{2\epsilon}\Big) = \frac{n!}{(j-1)!(n-j)!}\, f(x)F^{j-1}(x)(1-F(x))^{n-j}.$$

(c) The proof is similar to that of (b) and is left as an exercise. □

Example. Let $X_1, \ldots, X_n$ be a random sample from a distribution with pdf $f$ and cdf $F$. The joint pdf of $X_{(1)}, X_{(n)}$ is
$$f_{X_{(1)},X_{(n)}}(x_1, x_2) = n(n-1)\, f(x_1) f(x_2)\big(F(x_2) - F(x_1)\big)^{n-2}, \qquad x_1 < x_2.$$
Hence the joint pdf of $X_{(1)}$ and $X_{(n)} - X_{(1)}$ is
$$f_{X_{(1)},X_{(n)}-X_{(1)}}(u, v) = f_{X_{(1)},X_{(n)}}(u, u+v) = n(n-1)\, f(u) f(u+v)\big(F(u+v) - F(u)\big)^{n-2}, \qquad u \in \mathbb{R},\ v > 0.$$
Hence the pdf of the range $X_{(n)} - X_{(1)}$ is obtained by integrating out $u$:
$$f_{X_{(n)}-X_{(1)}}(v) = n(n-1)\int_{-\infty}^{\infty} f(u) f(u+v)\big(F(u+v) - F(u)\big)^{n-2}\,du, \qquad v > 0.$$
In special cases this integral has a closed form; for example, CB derives it for the uniform distribution. □

5.5 Convergence concepts (CB 5.5)

Definition. A sequence of random variables $X_1, X_2, \ldots$ converges in probability to a random variable $X$, denoted by $X_n \xrightarrow{p} X$, if for every $\epsilon > 0$,
$$\lim_{n\to\infty} P(|X_n - X| > \epsilon) = 0.$$

Note that the random variables $X_1, X_2, \ldots$ are not assumed independent; in fact, in order to have convergence to a non-degenerate $X$ they have to be dependent. The target random variable is sometimes nonrandom, i.e. $P(X = c) = 1$ for some constant $c$, in which case we say that $X_n$ converges in probability to $c$ ($X_n \xrightarrow{p} c$).

There are numerous ways to prove convergence in probability. One of them is to use the convergence of moments: if we can show that
$$\lim_{n\to\infty} E|X_n - X|^p = 0 \quad\text{for some } p > 0, \tag{1}$$
then by Chebychev's inequality,
$$P(|X_n - X| > \epsilon) = P(|X_n - X|^p > \epsilon^p) \le \frac{1}{\epsilon^p}\, E|X_n - X|^p \to 0.$$
Actually, if (1) holds then we say that $X_n$ converges to $X$ in $L^p$, denoted by $X_n \xrightarrow{L^p} X$. So $L^p$ convergence implies convergence in probability.

Theorem 5.5.1. (Weak Law of Large Numbers) Let $X_1, \ldots, X_n$ be iid rv's with mean $\mu$ and variance $\sigma^2 < \infty$. Then $\bar X_n$ converges to $\mu$ in $L^2$ and in probability.

Proof.
$$E|\bar X_n - \mu|^2 = \mathrm{Var}(\bar X_n) = \frac{\sigma^2}{n} \to 0 \quad\text{as } n \to \infty. \qquad\Box$$
Let $X_n$ be a statistic and assume that the population distribution has a parameter $\theta$. $X_n$ is said to be a (weakly) consistent estimator of $\theta$ if $X_n \xrightarrow{p} \theta$. Thus, the sample mean is a consistent estimator of the population mean.

Theorem 5.5.2. Suppose that $\mathbf{X}_n = (X_{n,1}, \ldots, X_{n,k})$ and $\mathbf{X} = (X_1, \ldots, X_k)$ are such that $X_{n,j} \xrightarrow{p} X_j$, $1 \le j \le k$. If $g : \mathbb{R}^k \to \mathbb{R}$ is continuous then $g(\mathbf{X}_n) \xrightarrow{p} g(\mathbf{X})$.

Proof. Recall that a continuous mapping is uniformly continuous on any closed bounded set. Let $B$ be a fixed positive constant. For each $\epsilon > 0$ there exists $\delta$ such that for any $\mathbf{x}, \mathbf{y}$ with $\max_{1\le j\le k}|x_j| \le B$ and $\max_{1\le j\le k}|x_j - y_j| \le \delta$, we have
$$|g(\mathbf{x}) - g(\mathbf{y})| \le \epsilon.$$
Now write
$$P(|g(\mathbf{X}_n) - g(\mathbf{X})| > \epsilon) = P\big(|g(\mathbf{X}_n) - g(\mathbf{X})| > \epsilon,\ \max_{1\le j\le k}|X_j| \le B\big) + P\big(|g(\mathbf{X}_n) - g(\mathbf{X})| > \epsilon,\ \max_{1\le j\le k}|X_j| > B\big)$$
$$\le P\big(\max_{1\le j\le k}|X_{n,j} - X_j| > \delta,\ \max_{1\le j\le k}|X_j| \le B\big) + P\big(|g(\mathbf{X}_n) - g(\mathbf{X})| > \epsilon,\ \max_{1\le j\le k}|X_j| > B\big)$$
$$\le P\big(\max_{1\le j\le k}|X_{n,j} - X_j| > \delta\big) + P\big(\max_{1\le j\le k}|X_j| > B\big) \le \sum_{j=1}^k P(|X_{n,j} - X_j| > \delta) + \sum_{j=1}^k P(|X_j| > B).$$
The first term tends to 0 by assumption, and so
$$\lim_{n\to\infty} P(|g(\mathbf{X}_n) - g(\mathbf{X})| > \epsilon) \le \sum_{j=1}^k P(|X_j| > B).$$
Since the lhs is independent of $B$, we can take $B$ on the rhs as big as we please, and hence the lhs is 0. □

Example. Let $X_1, X_2, \ldots, X_n$ be iid with finite 4th moment. Then the sample variance is consistent for the population variance.

Proof. Write
$$S_n^2 = \frac{1}{n-1}\sum_{i=1}^n X_i^2 - \frac{n}{n-1}\bar X_n^2.$$
By the WLLN and Theorem 5.5.2,
$$\frac{1}{n-1}\sum_{i=1}^n X_i^2 = \frac{n}{n-1}\cdot\frac{1}{n}\sum_{i=1}^n X_i^2 \xrightarrow{p} E(X^2).$$
Also, since $\bar X_n \xrightarrow{p} E(X)$, we have
$$\frac{n}{n-1}\bar X_n^2 \xrightarrow{p} E^2(X).$$
Thus
$$S_n^2 \xrightarrow{p} E(X^2) - E^2(X) = \mathrm{Var}(X). \qquad\Box$$

Let's revisit the notion of convergence in distribution:

Definition. Let $X, X_1, X_2, \ldots$ be rv's with cdf's, respectively, $F_X, F_{X_1}, F_{X_2}, \ldots$. Say that $X_n$ converges in distribution to $X$, denoted by $X_n \xrightarrow{d} X$, if
$$\lim_{n\to\infty} F_{X_n}(x) = F_X(x) \quad\text{at every point } x \text{ where } F_X \text{ is continuous.}$$

Convergence in distribution can be proved

(a) by verifying the definition,
(b) by showing that the pmf/pdf of $X_n$ converges to a limiting pmf/pdf (Scheffé's Theorem),
(c) by showing that the mgf of $X_n$ converges to a limiting mgf.

Example. Let $U_1, U_2, \ldots$ be iid uniform(0, 1). Show that $X_n = n\min_{1\le i\le n} U_i$ converges in distribution and identify the limit.

Proof. For $x > 0$ and $n > x$,
$$P(X_n \le x) = 1 - P(X_n > x) = 1 - P\Big(\min_{1\le i\le n} U_i > x/n\Big) = 1 - P(U_1 > x/n)^n = 1 - (1 - x/n)^n \to 1 - e^{-x}.$$
This shows that $X_n$ converges in distribution to the exponential distribution with mean 1. □
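A simulation sketch of this limit (my own; n = 200 is an arbitrary moderately large choice):

```python
# n * min(U_1, ..., U_n) for uniform U_i is approximately Exp(1) for large n.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, reps = 200, 20_000
u = rng.uniform(size=(reps, n))
x_n = n * u.min(axis=1)
print(stats.kstest(x_n, "expon"))   # approximately exponential with mean 1
```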

Example. In the previous example, show that $\min_{1\le i\le n} U_i \xrightarrow{p} 0$.

Proof. We need to show that
$$P\Big(\Big|\min_{1\le i\le n} U_i - 0\Big| > \varepsilon\Big) \to 0, \qquad \varepsilon > 0.$$
For $0 < \varepsilon < 1$ the lhs is equal to
$$P\Big(\min_{1\le i\le n} U_i > \varepsilon\Big) = P(U_i > \varepsilon \text{ for all } i = 1,\ldots,n) = (1 - \varepsilon)^n \to 0. \qquad\Box$$

Theorem 5.5.3. If $X_n \xrightarrow{p} X$ then $X_n \xrightarrow{d} X$. The converse is true if $P(X = c) = 1$ for some constant $c$.

Proof. Assume that $X_n \xrightarrow{p} X$ and let $x$ be a point of continuity of $P(X \le x)$. Then
$$|P(X_n \le x) - P(X \le x)| = |P(X_n \le x, X \le x) + P(X_n \le x, X > x) - P(X_n \le x, X \le x) - P(X_n > x, X \le x)|$$
$$\le P(X_n \le x, X > x) + P(X_n > x, X \le x).$$
For any $\varepsilon > 0$,
$$0 \le P(X_n \le x, X > x) = P(X_n \le x, X \in (x, x+\varepsilon]) + P(X_n \le x, X > x+\varepsilon)$$
$$\le P(X \in (x, x+\varepsilon]) + P(|X_n - X| > \varepsilon) \to P(X \in (x, x+\varepsilon]) \quad\text{as } n\to\infty$$
by convergence in probability. Since $\varepsilon > 0$ is arbitrary, we have
$$\lim_{n\to\infty} P(X_n \le x, X > x) \le \lim_{\varepsilon\downarrow 0} P(X \in (x, x+\varepsilon]) = 0.$$
Similarly, one can show
$$\lim_{n\to\infty} P(X_n > x, X \le x) = 0. \qquad\Box$$

Theorem 5.5.4. (Central Limit Theorem) Let $X_1, X_2, \ldots, X_n$ be iid with mean $\mu$ and variance $\sigma^2$. Define
$$Z_n = \frac{\bar X_n - \mu}{\sigma/\sqrt n}.$$
Then
$$\lim_{n\to\infty} P(Z_n \le z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt.$$

Proof. The result is true as stated; however, for simplicity let's assume that the mgf $M(t)$ of the $X_i$ exists. Without loss of generality assume that $\mu = 0$ and $\sigma^2 = 1$. By the Taylor expansion,
$$M(t/\sqrt n) = M(0) + M'(0)\frac{t}{\sqrt n} + \frac{1}{2}M''(\Delta_n)\Big(\frac{t}{\sqrt n}\Big)^2,$$
where $\Delta_n$ is between 0 and $t/\sqrt n$. Clearly
$$M(0) = 1, \qquad M'(0) = 0,$$
so that
$$M(t/\sqrt n) = 1 + \frac{1}{n}\cdot\frac{t^2}{2}M''(\Delta_n).$$
Since
$$Z_n = \frac{1}{\sqrt n}\sum_{i=1}^n X_i,$$
we have
$$M_{Z_n}(t) = \big[M(t/\sqrt n)\big]^n = \Big[1 + \frac{1}{n}\cdot\frac{t^2}{2}M''(\Delta_n)\Big]^n.$$
Since $\Delta_n \to 0$, we have $M''(\Delta_n) \to M''(0) = 1$. Hence
$$M_{Z_n}(t) \to e^{t^2/2} = \text{mgf of } N(0,1). \qquad\Box$$

Suppose we wish to estimate $\mu$ by $\bar X_n$. The probability that the estimate is off by at most $\delta$ is
$$P(|\bar X_n - \mu| \le \delta) = P\Big(-\sqrt n\,\delta/\sigma \le \frac{\bar X_n - \mu}{\sigma/\sqrt n} \le \sqrt n\,\delta/\sigma\Big) \approx P\big(-\sqrt n\,\delta/\sigma \le Z \le \sqrt n\,\delta/\sigma\big).$$
If we want this probability to be, say, 95%, then we can solve
$$.95 = P\big(-\sqrt n\,\delta/\sigma \le Z \le \sqrt n\,\delta/\sigma\big),$$
requiring
$$\delta = 1.96\,\sigma/\sqrt n.$$
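In practice one solves this relation for the sample size. A minimal sketch (my own, with illustrative values σ = 12 and δ = 2):

```python
# Smallest n with 1.96 * sigma / sqrt(n) <= delta, i.e. ~95% chance that |Xbar - mu| <= delta.
import math

sigma, delta = 12.0, 2.0
n = math.ceil((1.96 * sigma / delta) ** 2)
print(n)
```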

Example. Suppose $Y_n \sim B(n, p)$. Then
$$\frac{Y_n - np}{\sqrt{np(1-p)}} \xrightarrow{d} Z \sim N(0, 1).$$

Proof. One can write
$$Y_n = \sum_{j=1}^n X_j$$
where the $X_j$ are iid $B(1, p)$. Since $EX_j = p =: \mu$ and $\mathrm{Var}(X_j) = p(1-p) =: \sigma^2$,
$$\frac{Y_n - np}{\sqrt{np(1-p)}} = \frac{\sum_{j=1}^n (X_j - \mu)}{\sqrt{n\sigma^2}} = \frac{\bar X_n - \mu}{\sigma/\sqrt n}.$$
Hence the result follows readily from the CLT. □

Example. Suppose $Y_n \sim$ Poisson$(n)$. Then
$$\frac{Y_n - n}{\sqrt n} \xrightarrow{d} Z \sim N(0, 1).$$

Proof. We know that $Y_n$ has the same distribution as $\sum_{i=1}^n X_i$ where $X_1, X_2, \ldots, X_n$ are iid Poisson(1). Hence
$$\frac{Y_n - n}{\sqrt n} \stackrel{d}{=} \frac{\sum_{i=1}^n (X_i - 1)}{\sqrt n} \xrightarrow{d} Z$$
by the CLT. □

Theorem 5.5.5. (Slutsky's Theorem) If $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{p} a$ for some constant $a$, then
(a) $X_n Y_n \xrightarrow{d} aX$,
(b) $X_n + Y_n \xrightarrow{d} X + a$. □

Example. Let $X_1, X_2, \ldots, X_n$ be iid with mean $\mu$ and variance $\sigma^2$. Show that
$$\sqrt n(\bar X_n - \mu)/S \xrightarrow{d} Z \sim N(0, 1).$$

Proof. Write
$$\sqrt n(\bar X_n - \mu)/S = \frac{\bar X_n - \mu}{\sigma/\sqrt n}\cdot\frac{\sigma}{S},$$
where the first factor $\xrightarrow{d} Z$ by the CLT and the second factor $\xrightarrow{p} 1$ by the WLLN and Theorem 5.5.2. The desired result follows from Slutsky's Theorem. □
Theorem 5.5.6. (Continuous mapping theorem) If $X_n \xrightarrow{d} X$ and $g$ is a continuous function then $g(X_n) \xrightarrow{d} g(X)$. □

Example. Let $X_1, X_2, \ldots, X_n$ be iid with finite 4th moment. Derive the asymptotic distribution of the sample variance.

Proof. Write
$$S_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X_n)^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \mu)^2 - \frac{n}{n-1}(\bar X_n - \mu)^2.$$
Hence
$$S_n^2 - \sigma^2 = \frac{1}{n-1}\sum_{i=1}^n \big[(X_i - \mu)^2 - \sigma^2\big] + \frac{1}{n-1}\sigma^2 - \frac{n}{n-1}(\bar X_n - \mu)^2.$$
Thus,
$$\frac{n-1}{\sqrt n}(S_n^2 - \sigma^2) = \frac{1}{\sqrt n}\sum_{i=1}^n \big[(X_i - \mu)^2 - \sigma^2\big] + \frac{1}{\sqrt n}\sigma^2 - \frac{1}{\sqrt n}\big[\sqrt n(\bar X_n - \mu)\big]^2.$$
By the CLT,
$$\sqrt n(\bar X_n - \mu) \xrightarrow{d} N(0, \sigma^2).$$
By the continuous mapping theorem,
$$\big[\sqrt n(\bar X_n - \mu)\big]^2 \xrightarrow{d} \sigma^2\chi^2_1.$$
Hence,
$$\frac{1}{\sqrt n}\big[\sqrt n(\bar X_n - \mu)\big]^2 \xrightarrow{p} 0.$$
Also by the CLT,
$$\frac{1}{\sqrt n}\sum_{i=1}^n \big[(X_i - \mu)^2 - \sigma^2\big] \xrightarrow{d} N(0, \mu_4 - \sigma^4),$$
where $\mu_4$ is the 4th central moment of $X$, since $\mathrm{Var}\big((X-\mu)^2\big) = \mu_4 - \sigma^4$. It follows from Slutsky's Theorem that
$$\frac{n-1}{\sqrt n}(S_n^2 - \sigma^2) \xrightarrow{d} N(0, \mu_4 - \sigma^4),$$
and equivalently,
$$\sqrt n(S_n^2 - \sigma^2) \xrightarrow{d} N(0, \mu_4 - \sigma^4). \qquad\Box$$
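A quick simulation sketch of this limit (my own; the Exp(1) population is an arbitrary choice with $\sigma^2 = 1$ and fourth central moment $\mu_4 = 9$, so $\mu_4 - \sigma^4 = 8$):

```python
# Empirical variance of sqrt(n) * (S_n^2 - sigma^2) should be near mu_4 - sigma^4 = 8.
import numpy as np

rng = np.random.default_rng(6)
n, reps = 1000, 10_000
x = rng.exponential(size=(reps, n))
stat = np.sqrt(n) * (x.var(axis=1, ddof=1) - 1.0)
print(stat.var())   # should be close to 8
```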

Theorem 5.5.7. (Delta method) Let $Y_n$ be a sequence of rv's and $\theta$ be a constant such that
$$\sqrt n(Y_n - \theta) \xrightarrow{d} N(0, \sigma^2).$$
Let $g$ be a function with a non-zero first derivative at $\theta$. Then
$$\sqrt n\big[g(Y_n) - g(\theta)\big] \xrightarrow{d} N(0, \sigma^2[g'(\theta)]^2).$$

Proof. By the Taylor expansion,
$$g(Y_n) = g(\theta) + g'(\theta)(Y_n - \theta) + R_n,$$
where the remainder $R_n = o(Y_n - \theta)$. It then follows from the assumption $\sqrt n(Y_n - \theta) \xrightarrow{d} N(0, \sigma^2)$ that
$$\sqrt n R_n = \sqrt n(Y_n - \theta)\, o(1) \xrightarrow{p} 0.$$
As a result,
$$\sqrt n\big[g(Y_n) - g(\theta)\big] = g'(\theta)\sqrt n(Y_n - \theta) + \sqrt n R_n \xrightarrow{d} g'(\theta)Z \sim N(0, \sigma^2[g'(\theta)]^2). \qquad\Box$$

Example. Let $X_1, X_2, \ldots, X_n$ be iid exponential with mean $1/\theta$ (i.e. the pdf of $X_1$ is $f(x) = \theta e^{-\theta x} I_{(0,\infty)}(x)$). Then a natural estimator of $\theta$ is $1/\bar X_n$. What is a $(1-\alpha)100\%$ confidence interval for $\theta$? By the CLT,
$$\sqrt n(\bar X_n - \theta^{-1}) \xrightarrow{d} N(0, \theta^{-2}).$$
Let $g(x) = 1/x$, $x > 0$. Then $g'(\theta^{-1}) = -\theta^2$, and by the delta method,
$$\sqrt n\Big(\frac{1}{\bar X_n} - \theta\Big) = \sqrt n\big[g(\bar X_n) - g(\theta^{-1})\big] \xrightarrow{d} N(0, \theta^2).$$
Further, since $\bar X_n \xrightarrow{p} \theta^{-1}$, by Slutsky's Theorem we have
$$\sqrt n\,\bar X_n\Big(\frac{1}{\bar X_n} - \theta\Big) \xrightarrow{d} N(0, 1).$$
Hence an approximate $(1-\alpha)100\%$ confidence interval for $\theta$ is
$$\frac{1}{\bar X_n} \pm z_{\alpha/2}\,\frac{1}{\sqrt n\,\bar X_n}. \qquad\Box$$
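A sketch of the interval for simulated data (my own illustration; the true θ = 2, n = 400, and α = 0.05 are arbitrary):

```python
# Delta-method confidence interval 1/Xbar +- z_{alpha/2} / (sqrt(n) * Xbar) for theta.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
theta, n, alpha = 2.0, 400, 0.05
x = rng.exponential(scale=1.0 / theta, size=n)   # exponential with mean 1/theta
xbar = x.mean()
z = stats.norm.ppf(1 - alpha / 2)
half_width = z / (np.sqrt(n) * xbar)
print(1 / xbar - half_width, 1 / xbar + half_width)   # should cover theta ~95% of the time
```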

Theorem 5.5.8. (Second-order delta method) Let $Y_n$ be a sequence of rv's and $\theta$ be a constant such that
$$\sqrt n(Y_n - \theta) \xrightarrow{d} N(0, \sigma^2).$$
Let $g$ be a function such that $g'(\theta) = 0$ and $g''(\theta) > 0$. Then
$$n\big[g(Y_n) - g(\theta)\big] \xrightarrow{d} \sigma^2\,\frac{g''(\theta)}{2}\,\chi^2_1.$$

Proof. By the Taylor expansion,
$$g(Y_n) = g(\theta) + g''(\theta)\,\frac{(Y_n - \theta)^2}{2} + R_n,$$
where the remainder $R_n = o\big((Y_n - \theta)^2\big)$. As before,
$$nR_n = \big[\sqrt n(Y_n - \theta)\big]^2 o(1) \xrightarrow{p} 0.$$
Hence
$$n\big[g(Y_n) - g(\theta)\big] = g''(\theta)\,\frac{\big[\sqrt n(Y_n - \theta)\big]^2}{2} + nR_n \xrightarrow{d} g''(\theta)\,\sigma^2\,\frac{Z^2}{2},$$
where $Z \sim N(0,1)$, i.e. the limit is $\sigma^2\frac{g''(\theta)}{2}\chi^2_1$. □

Example. Suppose that $X_1, \ldots, X_n$ are iid with mean $\mu$ and variance $\sigma^2$. What is the asymptotic distribution of $\bar X_n^2 - \mu^2$?

Solution. First assume that $\mu \ne 0$. Applying the delta method with $g(x) = x^2$, we have
$$\sqrt n(\bar X_n^2 - \mu^2) \xrightarrow{d} N(0, \sigma^2[2\mu]^2).$$
If $\mu = 0$, then by the second-order delta method,
$$n(\bar X_n^2 - 0) \xrightarrow{d} \sigma^2\chi^2_1. \qquad\Box$$
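A simulation sketch of the µ = 0 case (my own; the N(0, σ²) data with σ = 3 and n = 500 are arbitrary choices):

```python
# n * Xbar_n^2 / sigma^2 should be approximately chi-squared with 1 df when mu = 0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
sigma, n, reps = 3.0, 500, 20_000
xbar = rng.normal(0.0, sigma, size=(reps, n)).mean(axis=1)
print(stats.kstest(n * xbar**2 / sigma**2, "chi2", args=(1,)))   # large p-value expected
```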
Example. Suppose that $X_1, \ldots, X_n$ are iid with a continuous distribution. Derive the asymptotic distribution of the empirical distribution function
$$F_n(x) = \frac{1}{n}\sum_{i=1}^n I(X_i \le x).$$

Solution. The mean and variance of the random variable $I(X_i \le x)$ are $F(x)$ and $F(x)(1 - F(x))$, respectively. By the CLT,
$$\sqrt n\big(F_n(x) - F(x)\big) \xrightarrow{d} N\big(0, F(x)(1 - F(x))\big). \qquad\Box$$

Example. Suppose that $X_1, \ldots, X_n$ are iid with a continuous distribution. Derive the asymptotic distribution of $X_{([np])}$.

Solution. Assume first that the $X_i$ are uniform(0, 1). First, it is clear that $X_{([np])} \xrightarrow{p}$ the $p$-th population percentile, which is $p$ here. Hence one should center $X_{([np])}$ at $p$. The question is what $a_n$ should be so that $a_n(X_{([np])} - p)$ converges in distribution. Observe that
$$P\big(a_n(X_{([np])} - p) \le x\big) = P\big(X_{([np])} \le p + x/a_n\big) = P\Big(\sum_{i=1}^n I(X_i \le p + x/a_n) \ge [np]\Big).$$
Let
$$Y_n(x) = \frac{1}{\sqrt n}\sum_{i=1}^n \big[I(X_i \le p + x/a_n) - (p + x/a_n)\big],$$
and write
$$Y_n(x) = Y_n(0) + [Y_n(x) - Y_n(0)].$$
By the previous example,
$$Y_n(0) \xrightarrow{d} N(0, p(1-p)).$$
Now (taking $x > 0$ for concreteness),
$$Y_n(x) - Y_n(0) = \frac{1}{\sqrt n}\sum_{i=1}^n \big[I(p < X_i \le p + x/a_n) - x/a_n\big],$$
and hence
$$\mathrm{Var}\big(Y_n(x) - Y_n(0)\big) = \frac{1}{n}\cdot n\,\mathrm{Var}\big(I(p < X_1 \le p + x/a_n)\big) = (x/a_n)(1 - x/a_n) \to 0 \quad\text{as } n\to\infty.$$
As a result,
$$Y_n(x) \xrightarrow{d} N(0, p(1-p)) \quad\text{for each fixed } x.$$
Now
$$P\Big(\sum_{i=1}^n I(X_i \le p + x/a_n) \ge [np]\Big) = P\big(\sqrt n\,Y_n(x) + n(p + x/a_n) \ge [np]\big) = P\Big(Y_n(x) \ge \frac{[np] - n(p + x/a_n)}{\sqrt n}\Big).$$
In order for this to converge, we need $\frac{[np] - n(p + x/a_n)}{\sqrt n}$ to converge, which is equivalent to $a_n = c\sqrt n$ for some constant $c$. Taking $a_n = \sqrt n$, we have
$$\frac{[np] - n(p + x/a_n)}{\sqrt n} \to -x,$$
and therefore
$$\lim_{n\to\infty} P\Big(Y_n(x) \ge \frac{[np] - n(p + x/a_n)}{\sqrt n}\Big) = P\big(\sqrt{p(1-p)}\,Z \ge -x\big) = P\big(\sqrt{p(1-p)}\,Z \le x\big) = \Phi\big(x/\sqrt{p(1-p)}\big).$$
Thus,
$$\sqrt n(X_{([np])} - p) \xrightarrow{d} N(0, p(1-p)).$$
Next assume that the $X_i$ are absolutely continuous with a pdf $f$. For convenience assume that $F$ is one-to-one with inverse function $F^{-1}$. We wish to find constants $a_n$ such that $a_n(X_{([np])} - F^{-1}(p))$ converges in distribution. Note that our sample $X_1, \ldots, X_n$ has the same joint distribution as $F^{-1}(U_1), \ldots, F^{-1}(U_n)$ where the $U_i$ are iid uniform(0, 1). Hence
$$a_n\big(X_{([np])} - F^{-1}(p)\big) \stackrel{d}{=} a_n\big(F^{-1}(U_{([np])}) - F^{-1}(p)\big).$$
By the previous part and the delta method,
$$\sqrt n\big(F^{-1}(U_{([np])}) - F^{-1}(p)\big) \xrightarrow{d} N\big(0, c^2 p(1-p)\big)$$
where
$$c = (F^{-1})'(p) = \frac{1}{f(F^{-1}(p))}.$$
This concludes the derivation. □
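A simulation sketch of the final result for the sample median of N(0, 1) data (my own; here p = 1/2, $F^{-1}(p) = 0$, and the asymptotic variance is $p(1-p)/f(F^{-1}(p))^2 = (1/4)/\phi(0)^2 = \pi/2$; n and the number of replications are arbitrary):

```python
# n * Var(sample median) for N(0, 1) data should be near pi/2.
import numpy as np

rng = np.random.default_rng(9)
n, reps = 801, 10_000
x = rng.normal(size=(reps, n))
med = np.median(x, axis=1)
print(n * med.var(), np.pi / 2)   # the two numbers should be close
```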

