Suppose for some reason these data are not well documented or easily accessible, and the distribution or features that we are interested in cannot be readily computed. A simple example of this is household incomes. If we wish to know the true average U.S. household income for this month, then we have a very big task at hand because we have to gather the information from hundreds of millions of families.
Any number which can be computed from the population is called a parameter. Common parameters of interest are the mean, variance, percentiles, and the mode (most probable value).
A statistic is any number calculated from the sample data. Suppose the sample data are $X_1, \dots, X_n$. Examples of statistics are the sample mean $\bar X = n^{-1} \sum_{i=1}^n X_i$, the sample variance $S^2 = (n-1)^{-1} \sum_{i=1}^n (X_i - \bar X)^2$, sample percentiles, and the sample range (= sample maximum − sample minimum).
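As a quick numerical sketch of these definitions (a minimal example; the data below are an arbitrary illustrative sample and numpy is assumed available):

```python
import numpy as np

x = np.array([3.1, 4.7, 2.9, 5.5, 4.0, 3.8])  # arbitrary illustrative sample

sample_mean = x.mean()                 # n^{-1} * sum of X_i
sample_var = x.var(ddof=1)             # (n-1)^{-1} * sum of (X_i - Xbar)^2
sample_median = np.percentile(x, 50)   # a sample percentile (the median)
sample_range = x.max() - x.min()       # sample maximum - sample minimum

print(sample_mean, sample_var, sample_median, sample_range)
```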
Before the sample is drawn, we may regard a statistic computed from these values as a random variable and speak about its distribution, called the sampling distribution. After the sample is drawn, we see the value of the statistic and there is no distribution to speak of.
Typically we will assume that the population size is much bigger than the
sample size and that the sample observations are drawn from the population
independently of one another under very similar sampling conditions. As
such the random variables in the sample will be approximately independent
and have very similar distributions.
In this section we review some basic tools that are useful for studying the
distributional properties of statistics.
$$\mathrm{Var}\Big(\sum_{i=1}^n X_i\Big) = E\Big(\sum_{i=1}^n (X_i - \mu_i)\Big)^2 = \sum_{i=1}^n E(X_i - \mu_i)^2 + \sum_{1 \le i \ne j \le n} E(X_i - \mu_i)(X_j - \mu_j) = \sum_{i=1}^n \mathrm{Var}(X_i) + \sum_{1 \le i \ne j \le n} \mathrm{Cov}(X_i, X_j). \qquad \Box$$
Proof.
$$ES^2 = E\Big[\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2\Big] = \frac{1}{n-1} E\Big[\sum_{i=1}^n X_i^2 - n\bar X^2\Big] = \frac{n}{n-1}\Big[\sigma^2 + \mu^2 - \frac{\sigma^2}{n} - \mu^2\Big] = \sigma^2. \qquad \Box$$
As in the one-variable case, mgf's are unique. Let (X, Y) be bivariate normal with pdf
$$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\Big\{-\frac{1}{2(1-\rho^2)}\Big[\Big(\frac{x-\mu_x}{\sigma_x}\Big)^2 + \Big(\frac{y-\mu_y}{\sigma_y}\Big)^2 - 2\rho\Big(\frac{x-\mu_x}{\sigma_x}\Big)\Big(\frac{y-\mu_y}{\sigma_y}\Big)\Big]\Big\}.$$
Then
$$M(t_1, t_2) = e^{\mu_1 t_1 + \mu_2 t_2 + \frac{1}{2}(\sigma_1^2 t_1^2 + 2\rho\sigma_1\sigma_2 t_1 t_2 + \sigma_2^2 t_2^2)}.$$
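A minimal simulation sketch checking the mgf formula at a single point (t1, t2); all parameter values below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
mu1, mu2, s1, s2, rho = 0.5, -1.0, 1.2, 0.8, 0.6
t1, t2 = 0.3, -0.2

cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
xy = rng.multivariate_normal([mu1, mu2], cov, size=1_000_000)

mc = np.mean(np.exp(t1 * xy[:, 0] + t2 * xy[:, 1]))  # Monte Carlo E[e^{t1 X + t2 Y}]
formula = np.exp(mu1 * t1 + mu2 * t2
                 + 0.5 * (s1**2 * t1**2 + 2 * rho * s1 * s2 * t1 * t2 + s2**2 * t2**2))
print(mc, formula)  # the two numbers should agree to roughly three decimals
```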
Proof. (b) is obvious so we focus on (a) and (c). We assume without loss of generality that $\mu = 0$, $\sigma^2 = 1$. We first prove (a). Write
$$S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2 = \frac{1}{n-1}\Big[(X_1 - \bar X)^2 + \sum_{i=2}^n (X_i - \bar X)^2\Big] = \frac{1}{n-1}\Big[\Big(\sum_{i=2}^n (X_i - \bar X)\Big)^2 + \sum_{i=2}^n (X_i - \bar X)^2\Big],$$
since $\sum_{i=1}^n (X_i - \bar X) = 0$. Let
$$U_1 = \bar X, \qquad U_j = X_j - \bar X, \quad 2 \le j \le n.$$
The transformation is one-to-one from $\mathbb{R}^n$ to $\mathbb{R}^n$. The inverse transformation is
$$X_1 = U_1 - \sum_{i=2}^n U_i, \qquad X_j = U_j + U_1, \quad 2 \le j \le n.$$
Write
$$(n-1)S_n^2 = \sum_{i=1}^{n-1} (X_i - \bar X_{n-1} + \bar X_{n-1} - \bar X_n)^2 + (X_n - \bar X_n)^2 = \sum_{i=1}^{n-1} (X_i - \bar X_{n-1})^2 + 2\sum_{i=1}^{n-1} (X_i - \bar X_{n-1})(\bar X_{n-1} - \bar X_n) + (n-1)(\bar X_{n-1} - \bar X_n)^2 + (X_n - \bar X_n)^2.$$
The middle sum vanishes because $\sum_{i=1}^{n-1} (X_i - \bar X_{n-1}) = 0$.
Note that
$$(n-1)(\bar X_{n-1} - \bar X_n)^2 = \frac{n-1}{n^2}\Big(\frac{n}{n-1}\sum_{i=1}^{n-1} X_i - \sum_{i=1}^n X_i\Big)^2 = \frac{n-1}{n^2}\Big(\Big(\frac{n}{n-1} - 1\Big)\sum_{i=1}^{n-1} X_i - X_n\Big)^2 = \frac{n-1}{n^2}(\bar X_{n-1} - X_n)^2$$
and
$$(X_n - \bar X_n)^2 = \Big(X_n - \frac{1}{n}\sum_{i=1}^n X_i\Big)^2 = \Big(\frac{n-1}{n} X_n - \frac{n-1}{n}\cdot\frac{1}{n-1}\sum_{i=1}^{n-1} X_i\Big)^2 = \Big(\frac{n-1}{n}\Big)^2 (X_n - \bar X_{n-1})^2.$$
Thus,
$$(n-1)S_n^2 = (n-2)S_{n-1}^2 + \frac{n-1}{n^2}(X_n - \bar X_{n-1})^2 + \Big(\frac{n-1}{n}\Big)^2 (X_n - \bar X_{n-1})^2 = (n-2)S_{n-1}^2 + \frac{n-1}{n}(X_n - \bar X_{n-1})^2.$$
If n = 2 then
$$(2-1)S_2^2 = \frac{1}{2}(X_2 - X_1)^2 \sim \chi^2_1$$
since
$$\frac{1}{\sqrt 2}(X_2 - X_1) \sim N(0, 1).$$
Now suppose that $(k-1)S_k^2 \sim \chi^2_{k-1}$ (induction assumption); we show that $kS_{k+1}^2 \sim \chi^2_k$. By the identity above,
$$kS_{k+1}^2 = (k-1)S_k^2 + \frac{k}{k+1}(X_{k+1} - \bar X_k)^2.$$
Since $S_k^2$ is independent of $\bar X_k$ and $X_{k+1}$, the two summands are independent. The first term is $\sim \chi^2_{k-1}$ by assumption and the second term is $\sim \chi^2_1$ since $\sqrt{k/(k+1)}\,(X_{k+1} - \bar X_k) \sim N(0, 1)$. Hence $kS_{k+1}^2 \sim \chi^2_k$ and (c) is proved by induction. $\Box$
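A minimal simulation sketch of both the updating identity and the chi-square claim (sample size and seed are arbitrary; the population is N(0,1), matching the WLOG normalization):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
x = rng.normal(size=n)

# Updating identity: (n-1) S_n^2 = (n-2) S_{n-1}^2 + ((n-1)/n) (X_n - Xbar_{n-1})^2.
lhs = (n - 1) * x.var(ddof=1)
rhs = (n - 2) * x[:-1].var(ddof=1) + (n - 1) / n * (x[-1] - x[:-1].mean())**2
print(np.isclose(lhs, rhs))  # True

# (n-1) S_n^2 ~ chi^2_{n-1}: its mean and variance should be n-1 and 2(n-1).
samples = rng.normal(size=(200_000, n))
q = (n - 1) * samples.var(axis=1, ddof=1)
print(q.mean(), q.var())  # ~7 and ~14 for n = 8
```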
Definition. The Student's t distribution with p degrees of freedom, where p is any positive integer, has the pdf
$$f(x) = \frac{\Gamma\big(\frac{p+1}{2}\big)}{\Gamma\big(\frac{p}{2}\big)(p\pi)^{1/2}} \cdot \frac{1}{(1 + x^2/p)^{(p+1)/2}}.$$
The F distribution with p, q degrees of freedom, where p, q are any positive integers, has the pdf
$$f(x) = \frac{\Gamma\big(\frac{p+q}{2}\big)}{\Gamma\big(\frac{p}{2}\big)\Gamma\big(\frac{q}{2}\big)} \Big(\frac{p}{q}\Big)^{p/2} \frac{x^{p/2-1}}{[1 + (p/q)x]^{(p+q)/2}}.$$
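A minimal sketch checking these pdf formulas against scipy.stats; the evaluation points and degrees of freedom below are arbitrary:

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import t, f

def t_pdf(x, p):
    # Student's t pdf, written exactly as in the definition above.
    logc = gammaln((p + 1) / 2) - gammaln(p / 2) - 0.5 * np.log(p * np.pi)
    return np.exp(logc) / (1 + x**2 / p) ** ((p + 1) / 2)

def f_pdf(x, p, q):
    # F pdf, written exactly as in the definition above.
    logc = gammaln((p + q) / 2) - gammaln(p / 2) - gammaln(q / 2)
    return (np.exp(logc) * (p / q) ** (p / 2) * x ** (p / 2 - 1)
            / (1 + (p / q) * x) ** ((p + q) / 2))

xs = np.linspace(0.1, 4.0, 5)
print(np.allclose(t_pdf(xs, 5), t.pdf(xs, 5)))        # True
print(np.allclose(f_pdf(xs, 3, 7), f.pdf(xs, 3, 7)))  # True
```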
Proof. Let
$$U = \frac{X}{\sqrt{Y/p}}, \qquad V = Y.$$
Corollary 5.3.3. Let $X_1, \dots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Then
$$\frac{\bar X - \mu}{S/\sqrt n} \sim t_{n-1}.$$
Proof. Let $U = \frac{X/p}{Y/q}$, $V = X$. Find the joint pdf of U, V and integrate v out. $\Box$
Corollary 5.3.5. Let $X_1, \dots, X_n$ be a random sample from $N(\mu_X, \sigma_X^2)$ and $Y_1, \dots, Y_m$ be a random sample from $N(\mu_Y, \sigma_Y^2)$, where the two random samples are independent. Let $S_X^2$ and $S_Y^2$ be the sample variances of the two random samples. Then
$$\frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} \sim F_{n-1,\,m-1}.$$
5.4 Order statistics (CB 5.4)
One can define a variety of statistics using the order statistics. The sample median is
$$M = \begin{cases} X_{((n+1)/2)} & \text{if } n \text{ is odd} \\ (X_{(n/2)} + X_{(n/2+1)})/2 & \text{if } n \text{ is even.} \end{cases}$$
Proof. Observe that the j-th smallest value is $\le x$ if and only if the total number of observations $\le x$ is at least j. The latter probability is clearly
$$\sum_{k=j}^n \binom{n}{k} F^k(x)(1 - F(x))^{n-k}. \qquad \Box$$
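A minimal simulation sketch of this formula: for a uniform(0,1) population the right-hand side is a binomial tail probability, which scipy can evaluate (n, j, and x below are arbitrary choices):

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(3)
n, j, x = 7, 3, 0.4  # uniform(0,1) population, so F(x) = x

# Empirical P(X_(j) <= x) over many samples.
samples = np.sort(rng.uniform(size=(200_000, n)), axis=1)
empirical = (samples[:, j - 1] <= x).mean()

# Formula: sum_{k=j}^{n} C(n,k) F(x)^k (1-F(x))^{n-k} = P(Bin(n, F(x)) >= j).
formula = binom.sf(j - 1, n, x)
print(empirical, formula)  # the two should agree to roughly three decimals
```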
As an application, taking j = n and j = 1 gives
$$F_{X_{(n)}}(x) = F^n(x)$$
and
$$F_{X_{(1)}}(x) = 1 - (1 - F(x))^n.$$
Another point worth noting: if the population distribution is discrete with $P(X_1 = x) > 0$ for some x, then ties among the observations occur with positive probability. If instead the population distribution is continuous, then
$$P(X_i = X_j) = 0 \quad \text{for } i \ne j,$$
so the order statistics are almost surely distinct.
Proof.
(a) For $x_1 < x_2 < \cdots < x_n$, choose $\epsilon_1, \dots, \epsilon_n > 0$ small enough that the intervals $(x_1 - \epsilon_1, x_1 + \epsilon_1], (x_2 - \epsilon_2, x_2 + \epsilon_2], \dots, (x_n - \epsilon_n, x_n + \epsilon_n]$ are mutually exclusive, and so
$$P(x_1 - \epsilon_1 < X_{(1)} \le x_1 + \epsilon_1, \dots, x_n - \epsilon_n < X_{(n)} \le x_n + \epsilon_n) = \sum_{\substack{\text{all permutations} \\ i_1, \dots, i_n}} P(x_1 - \epsilon_1 < X_{i_1} \le x_1 + \epsilon_1, \dots, x_n - \epsilon_n < X_{i_n} \le x_n + \epsilon_n).$$
It is clear (exercise) that
$$\lim_{\epsilon \to 0} \frac{B(\epsilon)}{\epsilon} = 0.$$
The two combined give
$$\lim_{\epsilon \to 0} \frac{P(x - \epsilon < X_{(j)} \le x + \epsilon)}{\epsilon} = \lim_{\epsilon \to 0}\Big(\frac{A(\epsilon)}{\epsilon} + \frac{B(\epsilon)}{\epsilon}\Big) = \frac{n!}{(j-1)!(n-j)!} f(x) F^{j-1}(x)(1 - F(x))^{n-j}.$$
$$f_{X_{(1)}, X_{(n)}}(x_1, x_2) = n(n-1) f(x_1) f(x_2) (F(x_2) - F(x_1))^{n-2}, \qquad x_1 < x_2.$$
In special cases this integral has closed form. For example, CB derives this for the uniform distribution. $\Box$
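A minimal simulation sketch of the marginal pdf just derived: for a uniform(0,1) sample it reduces to the Beta(j, n−j+1) density, since f = 1 and F(x) = x (the parameter choices below are arbitrary):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(4)
n, j = 9, 4

# Simulate X_(j) many times and compare its histogram to the Beta(j, n-j+1) pdf.
xj = np.sort(rng.uniform(size=(200_000, n)), axis=1)[:, j - 1]
hist, edges = np.histogram(xj, bins=20, range=(0, 1), density=True)
centers = (edges[:-1] + edges[1:]) / 2

print(np.round(hist, 2))                             # empirical density per bin
print(np.round(beta.pdf(centers, j, n - j + 1), 2))  # theoretical density at bin centers
```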
Note that the random variables $X_1, X_2, \dots$ are not assumed independent, and in fact, in order to have convergence they have to be dependent. The target random variable is sometimes nonrandom, i.e. $P(X = c) = 1$ for some constant c, in which case we say that $X_n$ converges in probability to c ($X_n \stackrel{p}{\to} c$).
Proof.
$$E|\bar X_n - \mu|^2 = \mathrm{Var}(\bar X_n) = \frac{\sigma^2}{n} \to 0 \quad \text{as } n \to \infty. \qquad \Box$$
Let $X_n$ be a statistic and assume that the population distribution has a parameter $\theta$. $X_n$ is said to be a (weakly) consistent estimator of $\theta$ if $X_n \stackrel{p}{\to} \theta$. Thus, the sample mean is a consistent estimator of the population mean.
there exists $\delta$ such that for any $\mathbf{x}, \mathbf{y}$ with $\max_{1 \le j \le k} |x_j| \le B$ and $\max_{1 \le j \le k} |x_j - y_j| \le \delta$, we have
$$|g(\mathbf{x}) - g(\mathbf{y})| \le \epsilon.$$
Now write
$$P(|g(\mathbf{X}_n) - g(\mathbf{X})| > \epsilon) = P\big(|g(\mathbf{X}_n) - g(\mathbf{X})| > \epsilon,\ \max_{1 \le j \le k} |X_j| \le B\big) + P\big(|g(\mathbf{X}_n) - g(\mathbf{X})| > \epsilon,\ \max_{1 \le j \le k} |X_j| > B\big)$$
$$\le P\big(\max_{1 \le j \le k} |X_{n,j} - X_j| > \delta,\ \max_{1 \le j \le k} |X_j| \le B\big) + P\big(|g(\mathbf{X}_n) - g(\mathbf{X})| > \epsilon,\ \max_{1 \le j \le k} |X_j| > B\big)$$
$$\le P\big(\max_{1 \le j \le k} |X_{n,j} - X_j| > \delta\big) + P\big(\max_{1 \le j \le k} |X_j| > B\big)$$
$$\le \sum_{j=1}^k P(|X_{n,j} - X_j| > \delta) + \sum_{j=1}^k P(|X_j| > B).$$
As $n \to \infty$ the first sum tends to 0. Since the lhs does not depend on B, we can then take B on the rhs as big as we please, making the second sum arbitrarily small; hence the lhs tends to 0. $\Box$
Proof. Write
$$S_n^2 = \frac{1}{n-1}\sum_{i=1}^n X_i^2 - \frac{n}{n-1}\bar X_n^2.$$
By WLLN and Theorem 5.5.2,
$$\frac{1}{n-1}\sum_{i=1}^n X_i^2 = \frac{n}{n-1}\cdot\frac{1}{n}\sum_{i=1}^n X_i^2 \stackrel{p}{\to} E(X^2).$$
Also, since $\bar X_n \stackrel{p}{\to} E(X)$ we have
$$\frac{n}{n-1}\bar X_n^2 \stackrel{p}{\to} E^2(X).$$
Thus
$$S_n^2 \stackrel{p}{\to} E(X^2) - E^2(X). \qquad \Box$$
Example. In the previous example, show that $\min_{1 \le i \le n} X_i \stackrel{p}{\to} 0$.
Theorem 5.5.3. If $X_n \stackrel{p}{\to} X$ then $X_n \stackrel{d}{\to} X$. The converse is true if $P(X = c) = 1$ for some c.
Proof. Assume that $X_n \stackrel{p}{\to} X$ and let x be a point of continuity of $P(X \le x)$. Then
$$|P(X_n \le x) - P(X \le x)| = |P(X_n \le x, X \le x) + P(X_n \le x, X > x) - P(X_n \le x, X \le x) - P(X_n > x, X \le x)| \le P(X_n \le x, X > x) + P(X_n > x, X \le x).$$
For any $\varepsilon > 0$,
$$0 \le P(X_n \le x, X > x) = P(X_n \le x, X \in (x, x + \varepsilon]) + P(X_n \le x, X > x + \varepsilon) \le P(X \in (x, x + \varepsilon]) + P(|X_n - X| > \varepsilon) \to P(X \in (x, x + \varepsilon]) \quad \text{as } n \to \infty$$
by convergence in probability. Since $\varepsilon > 0$ is arbitrary, we have
$$\lim_{n \to \infty} P(X_n \le x, X > x) \le \lim_{\varepsilon \downarrow 0} P(X \in (x, x + \varepsilon]) = 0.$$
The term $P(X_n > x, X \le x)$ is handled in the same way, using that x is a continuity point of $P(X \le x)$. $\Box$
Proof. The result is true as stated; for the proof, however, let us assume that the mgf $M(t)$ of $X_i$ exists. Without loss of generality assume that $\mu = 0$ and $\sigma^2 = 1$. By the Taylor expansion,
$$M(t/\sqrt n) = M(0) + M'(0)\frac{t}{\sqrt n} + \frac{1}{2} M''(\Delta_n)\Big(\frac{t}{\sqrt n}\Big)^2$$
where $\Delta_n$ is between 0 and $t/\sqrt n$. Clearly
$$M(0) = 1, \qquad M'(0) = 0,$$
so that
$$M(t/\sqrt n) = 1 + \frac{1}{n}\frac{t^2}{2} M''(\Delta_n).$$
Since
$$Z_n = \frac{1}{\sqrt n}\sum_{i=1}^n X_i,$$
we have
$$M_{Z_n}(t) = \big[M(t/\sqrt n)\big]^n = \Big[1 + \frac{1}{n}\frac{t^2}{2} M''(\Delta_n)\Big]^n.$$
Since $\Delta_n \to 0$ we have
$$M''(\Delta_n) \to M''(0) = 1.$$
Hence
$$M_{Z_n}(t) \to e^{t^2/2} = \text{mgf of } N(0, 1). \qquad \Box$$
where the $X_j$ are iid B(1, p). Since $EX_i = p =: \mu$ and $\mathrm{Var}(X_i) = p(1-p) =: \sigma^2$,
$$\frac{Y_n - np}{\sqrt{np(1-p)}} = \frac{\sum_{j=1}^n (X_j - \mu)}{\sqrt{n\sigma^2}} = \frac{\bar X_n - \mu}{\sigma/\sqrt n}.$$
Hence the result follows readily from the CLT. $\Box$
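A minimal numerical sketch of the resulting normal approximation to the binomial; n and p below are arbitrary, and the half-unit continuity correction is a standard refinement not discussed above:

```python
from scipy.stats import binom, norm

n, p = 100, 0.3

# P(Y_n <= 35) exactly, and via the CLT approximation with a continuity correction.
exact = binom.cdf(35, n, p)
approx = norm.cdf((35 + 0.5 - n * p) / (n * p * (1 - p)) ** 0.5)
print(exact, approx)  # both around 0.88
```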
Proof. We know that $Y_n$ has the same distribution as $\sum_{i=1}^n X_i$ where $X_1, X_2, \dots, X_n$ are iid Poisson(1). Hence
$$\frac{Y_n - n}{\sqrt n} = \frac{\sum_{i=1}^n (X_i - 1)}{\sqrt n} \stackrel{d}{\to} Z$$
by the CLT. $\Box$
Theorem 5.5.5 (Slutsky's Theorem). If $X_n \stackrel{d}{\to} X$ and $Y_n \stackrel{p}{\to} a$ for some constant a, then
(a) $X_n Y_n \stackrel{d}{\to} aX$;
(b) $X_n + Y_n \stackrel{d}{\to} X + a$. $\Box$
Proof.
$$\sqrt n(\bar X_n - \mu)/S = \frac{\bar X_n - \mu}{\sigma/\sqrt n} \cdot \frac{\sigma}{S},$$
where the first factor $\stackrel{d}{\to} Z$ by the CLT and the second factor $\stackrel{p}{\to} 1$ by the WLLN. The desired result follows from Slutsky's Theorem. $\Box$
Theorem 5.5.6 (Continuous mapping theorem). If $X_n \stackrel{d}{\to} X$ and g is a continuous function, then $g(X_n) \stackrel{d}{\to} g(X)$. $\Box$
Proof.
$$S_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X_n)^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \mu)^2 - \frac{n}{n-1}(\bar X_n - \mu)^2.$$
Hence
$$S_n^2 - \sigma^2 = \frac{1}{n-1}\sum_{i=1}^n [(X_i - \mu)^2 - \sigma^2] + \frac{1}{n-1}\sigma^2 - \frac{n}{n-1}(\bar X_n - \mu)^2.$$
Thus,
$$\frac{n-1}{\sqrt n}(S_n^2 - \sigma^2) = \frac{1}{\sqrt n}\sum_{i=1}^n [(X_i - \mu)^2 - \sigma^2] + \frac{1}{\sqrt n}\sigma^2 - \frac{1}{\sqrt n}[\sqrt n(\bar X_n - \mu)]^2.$$
By the CLT,
$$\sqrt n(\bar X_n - \mu) \stackrel{d}{\to} N(0, \sigma^2).$$
Hence,
$$\frac{1}{\sqrt n}[\sqrt n(\bar X_n - \mu)]^2 \stackrel{p}{\to} 0.$$
Also by the CLT,
$$\frac{1}{\sqrt n}\sum_{i=1}^n [(X_i - \mu)^2 - \sigma^2] \stackrel{d}{\to} N(0, \mu_4).$$
Theorem 5.5.7 (Delta method). Let $Y_n$ be a sequence of rv's and $\theta$ a constant such that
$$\sqrt n(Y_n - \theta) \stackrel{d}{\to} N(0, \sigma^2).$$
Let g be a function with a non-zero first derivative at $\theta$. Then
$$\sqrt n[g(Y_n) - g(\theta)] \stackrel{d}{\to} N(0, \sigma^2 [g'(\theta)]^2).$$
When $g'(\theta) = 0$, a second-order Taylor expansion $g(Y_n) = g(\theta) + \frac{1}{2}g''(\theta)(Y_n - \theta)^2 + R_n$ gives
$$n[g(Y_n) - g(\theta)] = g''(\theta)\frac{[\sqrt n(Y_n - \theta)]^2}{2} + nR_n \stackrel{d}{\to} g''(\theta)\sigma^2\frac{Z^2}{2}. \qquad \Box$$
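A minimal simulation sketch of the first-order delta method with g(x) = x², so g′(θ) = 2θ; the Exp(1) population (θ = 1, σ² = 1) and the sizes below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 400, 100_000
theta, sigma2 = 1.0, 1.0  # mean and variance of Exp(1)

# Y_n = sample mean: sqrt(n) (Y_n - theta) -> N(0, sigma^2).
y = rng.exponential(theta, size=(reps, n)).mean(axis=1)

# Delta method: sqrt(n) (g(Y_n) - g(theta)) -> N(0, sigma^2 [g'(theta)]^2) with g(x) = x^2.
z = np.sqrt(n) * (y**2 - theta**2)
print(z.mean(), z.var())  # ~0 and ~sigma^2 (2 theta)^2 = 4
```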
Example. Suppose that $X_1, \dots, X_n$ are iid with a continuous distribution. Derive the asymptotic distribution of the empirical distribution function
$$F_n(x) = \frac{1}{n}\sum_{i=1}^n I(X_i \le x).$$
Solution. The mean and variance of the random variable $I(X_i \le x)$ are $F(x)$ and $F(x)(1 - F(x))$, respectively. By the CLT,
$$\sqrt n(F_n(x) - F(x)) \stackrel{d}{\to} N(0, F(x)(1 - F(x))). \qquad \Box$$
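A minimal simulation sketch of this limit for a standard normal population at x = 0, where F(0) = 1/2 (all sizes below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps, x = 200, 100_000, 0.0

samples = rng.normal(size=(reps, n))
fn = (samples <= x).mean(axis=1)  # empirical cdf F_n(x), one value per replication
z = np.sqrt(n) * (fn - 0.5)

print(z.mean(), z.var())  # ~0 and ~F(x)(1 - F(x)) = 0.25
```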
Example. Derive the asymptotic distribution of the sample percentile $X_{([np])}$, $0 < p < 1$.
Solution. Assume first that the $X_i$ are uniform on (0, 1). First it is clear that $X_{([np])} \stackrel{p}{\to}$ the p-th population percentile, which here is p. Hence one should center $X_{([np])}$ by p. The question is what $a_n$ should be so that $a_n(X_{([np])} - p)$ converges in distribution. Observe that
$$\{X_{([np])} \le t\} = \Big\{\sum_{i=1}^n I(X_i \le t) \ge [np]\Big\}.$$
Let
$$Y_n(x) = \frac{1}{\sqrt n}\sum_{i=1}^n [I(X_i \le p + x/a_n) - (p + x/a_n)],$$
and write
Now,
$$Y_n(x) - Y_n(0) = \frac{1}{\sqrt n}\sum_{i=1}^n [I(p < X_i \le p + x/a_n) - x/a_n],$$
and hence
$$\mathrm{Var}(Y_n(x) - Y_n(0)) = \frac{1}{n}\cdot n\,\mathrm{Var}(I(p < X_1 \le p + x/a_n)) = (x/a_n)(1 - x/a_n) \to 0 \quad \text{as } n \to \infty.$$
As a result,
$$Y_n(x) \stackrel{d}{\to} N(0, p(1-p)) \quad \text{for each fixed } x.$$
Now
$$P\Big(\sum_{i=1}^n I(X_i \le p + x/a_n) \ge [np]\Big) = P\big(\sqrt n\,Y_n(x) + n(p + x/a_n) \ge [np]\big) = P\Big(Y_n(x) \ge \frac{[np] - n(p + x/a_n)}{\sqrt n}\Big).$$
Thus,
$$\sqrt n(X_{([np])} - p) \stackrel{d}{\to} N(0, p(1-p)).$$
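A minimal simulation sketch of this limit (p, n, and the replication count below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps, p = 500, 100_000, 0.3

samples = np.sort(rng.uniform(size=(reps, n)), axis=1)
xnp = samples[:, int(n * p) - 1]   # X_([np]), the [np]-th order statistic
z = np.sqrt(n) * (xnp - p)

print(z.mean(), z.var())  # ~0 and ~p(1 - p) = 0.21
```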
Next assume that the $X_i$ are absolutely continuous with a pdf f. For convenience assume that F is one-to-one with inverse function $F^{-1}$. We wish to find constants $a_n$ such that $a_n(X_{([np])} - F^{-1}(p))$ converges in distribution. Note that our sample $X_1, \dots, X_n$ has the same joint distribution as $F^{-1}(U_1), \dots, F^{-1}(U_n)$, where the $U_i$ are iid uniform on (0, 1). Hence, by the uniform case and the delta method applied to $F^{-1}$,
$$\sqrt n(X_{([np])} - F^{-1}(p)) \stackrel{d}{\to} N(0, c^2 p(1-p)),$$
where
$$c = (F^{-1})'(p) = \frac{1}{f(F^{-1}(p))}.$$
This concludes the derivation. $\Box$