Suppose for some reason these data are not well documented or easily accessible, and the distribution or features that we are interested in cannot be readily computed. A simple example of this is household incomes. If we wish to know the true average U.S. household income for this month, then we have a very big task at hand because we have to gather the information from hundreds of millions of families.
Any number which can be computed from the population is called a parameter. Common parameters of interest are the mean, variance, percentiles, and the mode (most probable value).
A statistic is any number calculated from the sample data. Suppose the sample data are $X_1, \dots, X_n$. Examples of statistics are the sample mean $\bar X = n^{-1} \sum_{i=1}^n X_i$, the sample variance $S^2 = (n-1)^{-1} \sum_{i=1}^n (X_i - \bar X)^2$, sample percentiles, and the sample range (= sample maximum − sample minimum).
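As a quick numerical sketch of these definitions (a minimal example; the data below are an arbitrary illustrative sample and numpy is assumed available):

```python
import numpy as np

x = np.array([3.1, 4.7, 2.9, 5.5, 4.0, 3.8])  # arbitrary illustrative sample

sample_mean = x.mean()                 # n^{-1} * sum of X_i
sample_var = x.var(ddof=1)             # (n-1)^{-1} * sum of (X_i - Xbar)^2
sample_median = np.percentile(x, 50)   # a sample percentile (the median)
sample_range = x.max() - x.min()       # sample maximum - sample minimum

print(sample_mean, sample_var, sample_median, sample_range)
```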
Before the sample is drawn, we may regard a statistic computed from these values as a random variable and speak about its distribution, called the sampling distribution. After the sample is drawn, we see the value of the statistic and there is no distribution to speak of.
Typically we will assume that the population size is much bigger than the
sample size and that the sample observations are drawn from the population
independently of one another under very similar sampling conditions. As
such the random variables in the sample will be approximately independent
and have very similar distributions.
In this section we review some basic tools that are useful for studying the
distributional properties of statistics.
$$\mathrm{Var}\Big(\sum_{i=1}^n X_i\Big) = E\Big(\sum_{i=1}^n (X_i - \mu_i)\Big)^2 = \sum_{i=1}^n E(X_i - \mu_i)^2 + \sum_{1 \le i \ne j \le n} E(X_i - \mu_i)(X_j - \mu_j) = \sum_{i=1}^n \mathrm{Var}(X_i) + \sum_{1 \le i \ne j \le n} \mathrm{Cov}(X_i, X_j). \qquad \Box$$
Proof.
$$ES^2 = E\Big[\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2\Big] = \frac{1}{n-1} E\Big[\sum_{i=1}^n X_i^2 - n\bar X^2\Big] = \frac{n}{n-1}\Big[\sigma^2 + \mu^2 - \frac{\sigma^2}{n} - \mu^2\Big] = \sigma^2. \qquad \Box$$
As in the one-variable case, mgf's are unique. Let (X, Y) be bivariate normal with pdf
$$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\Big\{-\frac{1}{2(1-\rho^2)}\Big[\Big(\frac{x-\mu_x}{\sigma_x}\Big)^2 + \Big(\frac{y-\mu_y}{\sigma_y}\Big)^2 - 2\rho\Big(\frac{x-\mu_x}{\sigma_x}\Big)\Big(\frac{y-\mu_y}{\sigma_y}\Big)\Big]\Big\}.$$
Then
$$M(t_1, t_2) = e^{\mu_1 t_1 + \mu_2 t_2 + \frac{1}{2}(\sigma_1^2 t_1^2 + 2\rho\sigma_1\sigma_2 t_1 t_2 + \sigma_2^2 t_2^2)}.$$
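A minimal simulation sketch checking the mgf formula at a single point (t1, t2); all parameter values below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
mu1, mu2, s1, s2, rho = 0.5, -1.0, 1.2, 0.8, 0.6
t1, t2 = 0.3, -0.2

cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
xy = rng.multivariate_normal([mu1, mu2], cov, size=1_000_000)

mc = np.mean(np.exp(t1 * xy[:, 0] + t2 * xy[:, 1]))  # Monte Carlo E[e^{t1 X + t2 Y}]
formula = np.exp(mu1 * t1 + mu2 * t2
                 + 0.5 * (s1**2 * t1**2 + 2 * rho * s1 * s2 * t1 * t2 + s2**2 * t2**2))
print(mc, formula)  # the two numbers should agree to roughly three decimals
```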
Proof. (b) is obvious so we focus on (a) and (c). We assume without loss of generality that $\mu = 0$, $\sigma^2 = 1$. We first prove (a). Write
$$S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2 = \frac{1}{n-1}\Big[(X_1 - \bar X)^2 + \sum_{i=2}^n (X_i - \bar X)^2\Big] = \frac{1}{n-1}\Big[\Big(\sum_{i=2}^n (X_i - \bar X)\Big)^2 + \sum_{i=2}^n (X_i - \bar X)^2\Big],$$
since $\sum_{i=1}^n (X_i - \bar X) = 0$. Let
$$U_1 = \bar X, \qquad U_j = X_j - \bar X, \quad 2 \le j \le n.$$
The transformation is one-to-one from $\mathbb{R}^n$ to $\mathbb{R}^n$. The inverse transformation is
$$X_1 = U_1 - \sum_{i=2}^n U_i, \qquad X_j = U_j + U_1, \quad 2 \le j \le n.$$
Write
$$(n-1)S_n^2 = \sum_{i=1}^{n-1} (X_i - \bar X_{n-1} + \bar X_{n-1} - \bar X_n)^2 + (X_n - \bar X_n)^2 = \sum_{i=1}^{n-1} (X_i - \bar X_{n-1})^2 + 2\sum_{i=1}^{n-1} (X_i - \bar X_{n-1})(\bar X_{n-1} - \bar X_n) + (n-1)(\bar X_{n-1} - \bar X_n)^2 + (X_n - \bar X_n)^2.$$
The middle sum vanishes because $\sum_{i=1}^{n-1} (X_i - \bar X_{n-1}) = 0$.
Note that
$$(n-1)(\bar X_{n-1} - \bar X_n)^2 = \frac{n-1}{n^2}\Big(\frac{n}{n-1}\sum_{i=1}^{n-1} X_i - \sum_{i=1}^n X_i\Big)^2 = \frac{n-1}{n^2}\Big(\Big(\frac{n}{n-1} - 1\Big)\sum_{i=1}^{n-1} X_i - X_n\Big)^2 = \frac{n-1}{n^2}(\bar X_{n-1} - X_n)^2$$
and
$$(X_n - \bar X_n)^2 = \Big(X_n - \frac{1}{n}\sum_{i=1}^n X_i\Big)^2 = \Big(\frac{n-1}{n} X_n - \frac{n-1}{n}\cdot\frac{1}{n-1}\sum_{i=1}^{n-1} X_i\Big)^2 = \Big(\frac{n-1}{n}\Big)^2 (X_n - \bar X_{n-1})^2.$$
Thus,
$$(n-1)S_n^2 = (n-2)S_{n-1}^2 + \frac{n-1}{n^2}(X_n - \bar X_{n-1})^2 + \Big(\frac{n-1}{n}\Big)^2 (X_n - \bar X_{n-1})^2 = (n-2)S_{n-1}^2 + \frac{n-1}{n}(X_n - \bar X_{n-1})^2.$$
If n = 2 then
$$(2-1)S_2^2 = \frac{1}{2}(X_2 - X_1)^2 \sim \chi^2_1$$
since
$$\frac{1}{\sqrt 2}(X_2 - X_1) \sim N(0, 1).$$
Now suppose that $(k-1)S_k^2 \sim \chi^2_{k-1}$ (induction assumption); we show that $kS_{k+1}^2 \sim \chi^2_k$. By the identity above,
$$kS_{k+1}^2 = (k-1)S_k^2 + \frac{k}{k+1}(X_{k+1} - \bar X_k)^2.$$
Since $S_k^2$ is independent of $\bar X_k$ and $X_{k+1}$, the two summands are independent. The first term is $\sim \chi^2_{k-1}$ by assumption and the second term is $\sim \chi^2_1$ since $\sqrt{k/(k+1)}\,(X_{k+1} - \bar X_k) \sim N(0, 1)$. Hence $kS_{k+1}^2 \sim \chi^2_k$ and (c) is proved by induction. $\Box$
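A minimal simulation sketch of both the updating identity and the chi-square claim (sample size and seed are arbitrary; the population is N(0,1), matching the WLOG normalization):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
x = rng.normal(size=n)

# Updating identity: (n-1) S_n^2 = (n-2) S_{n-1}^2 + ((n-1)/n) (X_n - Xbar_{n-1})^2.
lhs = (n - 1) * x.var(ddof=1)
rhs = (n - 2) * x[:-1].var(ddof=1) + (n - 1) / n * (x[-1] - x[:-1].mean())**2
print(np.isclose(lhs, rhs))  # True

# (n-1) S_n^2 ~ chi^2_{n-1}: its mean and variance should be n-1 and 2(n-1).
samples = rng.normal(size=(200_000, n))
q = (n - 1) * samples.var(axis=1, ddof=1)
print(q.mean(), q.var())  # ~7 and ~14 for n = 8
```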
Definition. The Student's t distribution with p degrees of freedom, where p is any positive integer, has the pdf
$$f(x) = \frac{\Gamma\big(\frac{p+1}{2}\big)}{\Gamma\big(\frac{p}{2}\big)(p\pi)^{1/2}} \cdot \frac{1}{(1 + x^2/p)^{(p+1)/2}}.$$
The F distribution with p, q degrees of freedom, where p, q are any positive integers, has the pdf
$$f(x) = \frac{\Gamma\big(\frac{p+q}{2}\big)}{\Gamma\big(\frac{p}{2}\big)\Gamma\big(\frac{q}{2}\big)} \Big(\frac{p}{q}\Big)^{p/2} \frac{x^{p/2-1}}{[1 + (p/q)x]^{(p+q)/2}}.$$
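A minimal sketch checking these pdf formulas against scipy.stats; the evaluation points and degrees of freedom below are arbitrary:

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import t, f

def t_pdf(x, p):
    # Student's t pdf, written exactly as in the definition above.
    logc = gammaln((p + 1) / 2) - gammaln(p / 2) - 0.5 * np.log(p * np.pi)
    return np.exp(logc) / (1 + x**2 / p) ** ((p + 1) / 2)

def f_pdf(x, p, q):
    # F pdf, written exactly as in the definition above.
    logc = gammaln((p + q) / 2) - gammaln(p / 2) - gammaln(q / 2)
    return (np.exp(logc) * (p / q) ** (p / 2) * x ** (p / 2 - 1)
            / (1 + (p / q) * x) ** ((p + q) / 2))

xs = np.linspace(0.1, 4.0, 5)
print(np.allclose(t_pdf(xs, 5), t.pdf(xs, 5)))        # True
print(np.allclose(f_pdf(xs, 3, 7), f.pdf(xs, 3, 7)))  # True
```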
Proof. Let
$$U = \frac{X}{\sqrt{Y/p}}, \qquad V = Y.$$
Corollary 5.3.3. Let $X_1, \dots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Then
$$\frac{\bar X - \mu}{S/\sqrt n} \sim t_{n-1}.$$
Proof. Let $U = \frac{X/p}{Y/q}$, $V = X$. Find the joint pdf of U, V and integrate v out. $\Box$
Corollary 5.3.5. Let $X_1, \dots, X_n$ be a random sample from $N(\mu_X, \sigma_X^2)$ and $Y_1, \dots, Y_m$ be a random sample from $N(\mu_Y, \sigma_Y^2)$, where the two random samples are independent. Let $S_X^2$ and $S_Y^2$ be the sample variances of the two random samples. Then
$$\frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} \sim F_{n-1,\,m-1}.$$
5.4 Order statistics (CB 5.4)
One can define a variety of statistics using the order statistics. The sample median is
$$M = \begin{cases} X_{((n+1)/2)} & \text{if } n \text{ is odd} \\ (X_{(n/2)} + X_{(n/2+1)})/2 & \text{if } n \text{ is even.} \end{cases}$$
Proof. Observe that the j-th smallest value is $\le x$ if and only if the total number of observations $\le x$ is at least j. The latter probability is clearly
$$\sum_{k=j}^n \binom{n}{k} F^k(x)(1 - F(x))^{n-k}. \qquad \Box$$
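A minimal simulation sketch of this formula: for a uniform(0,1) population the right-hand side is a binomial tail probability, which scipy can evaluate (n, j, and x below are arbitrary choices):

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(3)
n, j, x = 7, 3, 0.4  # uniform(0,1) population, so F(x) = x

# Empirical P(X_(j) <= x) over many samples.
samples = np.sort(rng.uniform(size=(200_000, n)), axis=1)
empirical = (samples[:, j - 1] <= x).mean()

# Formula: sum_{k=j}^{n} C(n,k) F(x)^k (1-F(x))^{n-k} = P(Bin(n, F(x)) >= j).
formula = binom.sf(j - 1, n, x)
print(empirical, formula)  # the two should agree to roughly three decimals
```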
As an application, taking j = n and j = 1 gives
$$F_{X_{(n)}}(x) = F^n(x)$$
and
$$F_{X_{(1)}}(x) = 1 - (1 - F(x))^n.$$
Another point worth noting: if the population distribution is discrete with $P(X_1 = x) > 0$ for some x, then ties among the observations occur with positive probability. If instead the population distribution is continuous, then
$$P(X_i = X_j) = 0 \quad \text{for } i \ne j,$$
so the order statistics are almost surely distinct.
Proof.
(a) For $x_1 < x_2 < \cdots < x_n$, choose $\epsilon_1, \dots, \epsilon_n > 0$ small enough that the intervals $(x_1 - \epsilon_1, x_1 + \epsilon_1], (x_2 - \epsilon_2, x_2 + \epsilon_2], \dots, (x_n - \epsilon_n, x_n + \epsilon_n]$ are mutually exclusive, and so
$$P(x_1 - \epsilon_1 < X_{(1)} \le x_1 + \epsilon_1, \dots, x_n - \epsilon_n < X_{(n)} \le x_n + \epsilon_n) = \sum_{\substack{\text{all permutations} \\ i_1, \dots, i_n}} P(x_1 - \epsilon_1 < X_{i_1} \le x_1 + \epsilon_1, \dots, x_n - \epsilon_n < X_{i_n} \le x_n + \epsilon_n).$$
It is clear (exercise) that
$$\lim_{\epsilon \to 0} \frac{B(\epsilon)}{\epsilon} = 0.$$
The two combined give
$$\lim_{\epsilon \to 0} \frac{P(x - \epsilon < X_{(j)} \le x + \epsilon)}{\epsilon} = \lim_{\epsilon \to 0}\Big(\frac{A(\epsilon)}{\epsilon} + \frac{B(\epsilon)}{\epsilon}\Big) = \frac{n!}{(j-1)!(n-j)!} f(x) F^{j-1}(x)(1 - F(x))^{n-j}.$$
$$f_{X_{(1)}, X_{(n)}}(x_1, x_2) = n(n-1) f(x_1) f(x_2) (F(x_2) - F(x_1))^{n-2}, \qquad x_1 < x_2.$$
In special cases this integral has closed form. For example, CB derives this for the uniform distribution. $\Box$
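A minimal simulation sketch of the marginal pdf just derived: for a uniform(0,1) sample it reduces to the Beta(j, n−j+1) density, since f = 1 and F(x) = x (the parameter choices below are arbitrary):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(4)
n, j = 9, 4

# Simulate X_(j) many times and compare its histogram to the Beta(j, n-j+1) pdf.
xj = np.sort(rng.uniform(size=(200_000, n)), axis=1)[:, j - 1]
hist, edges = np.histogram(xj, bins=20, range=(0, 1), density=True)
centers = (edges[:-1] + edges[1:]) / 2

print(np.round(hist, 2))                             # empirical density per bin
print(np.round(beta.pdf(centers, j, n - j + 1), 2))  # theoretical density at bin centers
```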
Note that the random variables $X_1, X_2, \dots$ are not assumed independent, and in fact, in order to have convergence they have to be dependent. The target random variable is sometimes nonrandom, i.e. $P(X = c) = 1$ for some constant c, in which case we say that $X_n$ converges in probability to c ($X_n \stackrel{p}{\to} c$).
Proof.
$$E|\bar X_n - \mu|^2 = \mathrm{Var}(\bar X_n) = \frac{\sigma^2}{n} \to 0 \quad \text{as } n \to \infty. \qquad \Box$$
Let $X_n$ be a statistic and assume that the population distribution has a parameter $\theta$. $X_n$ is said to be a (weakly) consistent estimator of $\theta$ if $X_n \stackrel{p}{\to} \theta$. Thus, the sample mean is a consistent estimator of the population mean.
there exists $\delta$ such that for any $\mathbf{x}, \mathbf{y}$ with $\max_{1 \le j \le k} |x_j| \le B$ and $\max_{1 \le j \le k} |x_j - y_j| \le \delta$, we have
$$|g(\mathbf{x}) - g(\mathbf{y})| \le \epsilon.$$
Now write
$$P(|g(\mathbf{X}_n) - g(\mathbf{X})| > \epsilon) = P\big(|g(\mathbf{X}_n) - g(\mathbf{X})| > \epsilon,\ \max_{1 \le j \le k} |X_j| \le B\big) + P\big(|g(\mathbf{X}_n) - g(\mathbf{X})| > \epsilon,\ \max_{1 \le j \le k} |X_j| > B\big)$$
$$\le P\big(\max_{1 \le j \le k} |X_{n,j} - X_j| > \delta,\ \max_{1 \le j \le k} |X_j| \le B\big) + P\big(|g(\mathbf{X}_n) - g(\mathbf{X})| > \epsilon,\ \max_{1 \le j \le k} |X_j| > B\big)$$
$$\le P\big(\max_{1 \le j \le k} |X_{n,j} - X_j| > \delta\big) + P\big(\max_{1 \le j \le k} |X_j| > B\big)$$
$$\le \sum_{j=1}^k P(|X_{n,j} - X_j| > \delta) + \sum_{j=1}^k P(|X_j| > B).$$
As $n \to \infty$ the first sum tends to 0. Since the lhs does not depend on B, we can then take B on the rhs as big as we please, making the second sum arbitrarily small; hence the lhs tends to 0. $\Box$
Proof. Write
$$S_n^2 = \frac{1}{n-1}\sum_{i=1}^n X_i^2 - \frac{n}{n-1}\bar X_n^2.$$
By WLLN and Theorem 5.5.2,
$$\frac{1}{n-1}\sum_{i=1}^n X_i^2 = \frac{n}{n-1}\cdot\frac{1}{n}\sum_{i=1}^n X_i^2 \stackrel{p}{\to} E(X^2).$$
Also, since $\bar X_n \stackrel{p}{\to} E(X)$ we have
$$\frac{n}{n-1}\bar X_n^2 \stackrel{p}{\to} E^2(X).$$
Thus
$$S_n^2 \stackrel{p}{\to} E(X^2) - E^2(X). \qquad \Box$$
Example. In the previous example, show that $\min_{1 \le i \le n} X_i \stackrel{p}{\to} 0$.
Theorem 5.5.3. If $X_n \stackrel{p}{\to} X$ then $X_n \stackrel{d}{\to} X$. The converse is true if $P(X = c) = 1$ for some c.
Proof. Assume that $X_n \stackrel{p}{\to} X$ and let x be a point of continuity of $P(X \le x)$. Then
$$|P(X_n \le x) - P(X \le x)| = |P(X_n \le x, X \le x) + P(X_n \le x, X > x) - P(X_n \le x, X \le x) - P(X_n > x, X \le x)| \le P(X_n \le x, X > x) + P(X_n > x, X \le x).$$
For any $\varepsilon > 0$,
$$0 \le P(X_n \le x, X > x) = P(X_n \le x, X \in (x, x + \varepsilon]) + P(X_n \le x, X > x + \varepsilon) \le P(X \in (x, x + \varepsilon]) + P(|X_n - X| > \varepsilon) \to P(X \in (x, x + \varepsilon]) \quad \text{as } n \to \infty$$
by convergence in probability. Since $\varepsilon > 0$ is arbitrary, we have
$$\lim_{n \to \infty} P(X_n \le x, X > x) \le \lim_{\varepsilon \downarrow 0} P(X \in (x, x + \varepsilon]) = 0.$$
The term $P(X_n > x, X \le x)$ is handled in the same way, using that x is a continuity point of $P(X \le x)$. $\Box$
Proof. The result is true as stated; for the proof, however, let us assume that the mgf $M(t)$ of $X_i$ exists. Without loss of generality assume that $\mu = 0$ and $\sigma^2 = 1$. By the Taylor expansion,
$$M(t/\sqrt n) = M(0) + M'(0)\frac{t}{\sqrt n} + \frac{1}{2} M''(\Delta_n)\Big(\frac{t}{\sqrt n}\Big)^2$$
where $\Delta_n$ is between 0 and $t/\sqrt n$. Clearly
$$M(0) = 1, \qquad M'(0) = 0,$$
so that
$$M(t/\sqrt n) = 1 + \frac{1}{n}\frac{t^2}{2} M''(\Delta_n).$$
Since
$$Z_n = \frac{1}{\sqrt n}\sum_{i=1}^n X_i,$$
we have
$$M_{Z_n}(t) = \big[M(t/\sqrt n)\big]^n = \Big[1 + \frac{1}{n}\frac{t^2}{2} M''(\Delta_n)\Big]^n.$$
Since $\Delta_n \to 0$ we have
$$M''(\Delta_n) \to M''(0) = 1.$$
Hence
$$M_{Z_n}(t) \to e^{t^2/2} = \text{mgf of } N(0, 1). \qquad \Box$$
where the $X_j$ are iid B(1, p). Since $EX_i = p =: \mu$ and $\mathrm{Var}(X_i) = p(1-p) =: \sigma^2$,
$$\frac{Y_n - np}{\sqrt{np(1-p)}} = \frac{\sum_{j=1}^n (X_j - \mu)}{\sqrt{n\sigma^2}} = \frac{\bar X_n - \mu}{\sigma/\sqrt n}.$$
Hence the result follows readily from the CLT. $\Box$
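A minimal numerical sketch of the resulting normal approximation to the binomial; n and p below are arbitrary, and the half-unit continuity correction is a standard refinement not discussed above:

```python
from scipy.stats import binom, norm

n, p = 100, 0.3

# P(Y_n <= 35) exactly, and via the CLT approximation with a continuity correction.
exact = binom.cdf(35, n, p)
approx = norm.cdf((35 + 0.5 - n * p) / (n * p * (1 - p)) ** 0.5)
print(exact, approx)  # both around 0.88
```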
Proof. We know that $Y_n$ has the same distribution as $\sum_{i=1}^n X_i$ where $X_1, X_2, \dots, X_n$ are iid Poisson(1). Hence
$$\frac{Y_n - n}{\sqrt n} = \frac{\sum_{i=1}^n (X_i - 1)}{\sqrt n} \stackrel{d}{\to} Z$$
by the CLT. $\Box$
Theorem 5.5.5 (Slutsky's Theorem). If $X_n \stackrel{d}{\to} X$ and $Y_n \stackrel{p}{\to} a$ for some constant a, then
(a) $X_n Y_n \stackrel{d}{\to} aX$;
(b) $X_n + Y_n \stackrel{d}{\to} X + a$. $\Box$
Proof.
$$\sqrt n(\bar X_n - \mu)/S = \frac{\bar X_n - \mu}{\sigma/\sqrt n} \cdot \frac{\sigma}{S},$$
where the first factor $\stackrel{d}{\to} Z$ by the CLT and the second factor $\stackrel{p}{\to} 1$ by the WLLN. The desired result follows from Slutsky's Theorem. $\Box$
Theorem 5.5.6 (Continuous mapping theorem). If $X_n \stackrel{d}{\to} X$ and g is a continuous function, then $g(X_n) \stackrel{d}{\to} g(X)$. $\Box$
Proof.
$$S_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X_n)^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \mu)^2 - \frac{n}{n-1}(\bar X_n - \mu)^2.$$
Hence
$$S_n^2 - \sigma^2 = \frac{1}{n-1}\sum_{i=1}^n [(X_i - \mu)^2 - \sigma^2] + \frac{1}{n-1}\sigma^2 - \frac{n}{n-1}(\bar X_n - \mu)^2.$$
Thus,
$$\frac{n-1}{\sqrt n}(S_n^2 - \sigma^2) = \frac{1}{\sqrt n}\sum_{i=1}^n [(X_i - \mu)^2 - \sigma^2] + \frac{1}{\sqrt n}\sigma^2 - \frac{1}{\sqrt n}[\sqrt n(\bar X_n - \mu)]^2.$$
By the CLT,
$$\sqrt n(\bar X_n - \mu) \stackrel{d}{\to} N(0, \sigma^2).$$
Hence,
$$\frac{1}{\sqrt n}[\sqrt n(\bar X_n - \mu)]^2 \stackrel{p}{\to} 0.$$
Also by the CLT,
$$\frac{1}{\sqrt n}\sum_{i=1}^n [(X_i - \mu)^2 - \sigma^2] \stackrel{d}{\to} N(0, \mu_4).$$
Theorem 5.5.7 (Delta method). Let $Y_n$ be a sequence of rv's and $\theta$ a constant such that
$$\sqrt n(Y_n - \theta) \stackrel{d}{\to} N(0, \sigma^2).$$
Let g be a function with a non-zero first derivative at $\theta$. Then
$$\sqrt n[g(Y_n) - g(\theta)] \stackrel{d}{\to} N(0, \sigma^2 [g'(\theta)]^2).$$
When $g'(\theta) = 0$, a second-order Taylor expansion $g(Y_n) = g(\theta) + \frac{1}{2}g''(\theta)(Y_n - \theta)^2 + R_n$ gives
$$n[g(Y_n) - g(\theta)] = g''(\theta)\frac{[\sqrt n(Y_n - \theta)]^2}{2} + nR_n \stackrel{d}{\to} g''(\theta)\sigma^2\frac{Z^2}{2}. \qquad \Box$$
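A minimal simulation sketch of the first-order delta method with g(x) = x², so g′(θ) = 2θ; the Exp(1) population (θ = 1, σ² = 1) and the sizes below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 400, 100_000
theta, sigma2 = 1.0, 1.0  # mean and variance of Exp(1)

# Y_n = sample mean: sqrt(n) (Y_n - theta) -> N(0, sigma^2).
y = rng.exponential(theta, size=(reps, n)).mean(axis=1)

# Delta method: sqrt(n) (g(Y_n) - g(theta)) -> N(0, sigma^2 [g'(theta)]^2) with g(x) = x^2.
z = np.sqrt(n) * (y**2 - theta**2)
print(z.mean(), z.var())  # ~0 and ~sigma^2 (2 theta)^2 = 4
```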
Example. Suppose that $X_1, \dots, X_n$ are iid with a continuous distribution. Derive the asymptotic distribution of the empirical distribution function
$$F_n(x) = \frac{1}{n}\sum_{i=1}^n I(X_i \le x).$$
Solution. The mean and variance of the random variable $I(X_i \le x)$ are $F(x)$ and $F(x)(1 - F(x))$, respectively. By the CLT,
$$\sqrt n(F_n(x) - F(x)) \stackrel{d}{\to} N(0, F(x)(1 - F(x))). \qquad \Box$$
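A minimal simulation sketch of this limit for a standard normal population at x = 0, where F(0) = 1/2 (all sizes below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps, x = 200, 100_000, 0.0

samples = rng.normal(size=(reps, n))
fn = (samples <= x).mean(axis=1)  # empirical cdf F_n(x), one value per replication
z = np.sqrt(n) * (fn - 0.5)

print(z.mean(), z.var())  # ~0 and ~F(x)(1 - F(x)) = 0.25
```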
Example. Derive the asymptotic distribution of the sample percentile $X_{([np])}$, $0 < p < 1$.
Solution. Assume first that the $X_i$ are uniform on (0, 1). First it is clear that $X_{([np])} \stackrel{p}{\to}$ the p-th population percentile, which here is p. Hence one should center $X_{([np])}$ by p. The question is what $a_n$ should be so that $a_n(X_{([np])} - p)$ converges in distribution. Observe that
$$\{X_{([np])} \le t\} = \Big\{\sum_{i=1}^n I(X_i \le t) \ge [np]\Big\}.$$
Let
$$Y_n(x) = \frac{1}{\sqrt n}\sum_{i=1}^n [I(X_i \le p + x/a_n) - (p + x/a_n)],$$
and write
Now,
$$Y_n(x) - Y_n(0) = \frac{1}{\sqrt n}\sum_{i=1}^n [I(p < X_i \le p + x/a_n) - x/a_n],$$
and hence
$$\mathrm{Var}(Y_n(x) - Y_n(0)) = \frac{1}{n}\cdot n\,\mathrm{Var}(I(p < X_1 \le p + x/a_n)) = (x/a_n)(1 - x/a_n) \to 0 \quad \text{as } n \to \infty.$$
As a result,
$$Y_n(x) \stackrel{d}{\to} N(0, p(1-p)) \quad \text{for each fixed } x.$$
Now
$$P\Big(\sum_{i=1}^n I(X_i \le p + x/a_n) \ge [np]\Big) = P\big(\sqrt n\,Y_n(x) + n(p + x/a_n) \ge [np]\big) = P\Big(Y_n(x) \ge \frac{[np] - n(p + x/a_n)}{\sqrt n}\Big).$$
Thus,
$$\sqrt n(X_{([np])} - p) \stackrel{d}{\to} N(0, p(1-p)).$$
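A minimal simulation sketch of this limit (p, n, and the replication count below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps, p = 500, 100_000, 0.3

samples = np.sort(rng.uniform(size=(reps, n)), axis=1)
xnp = samples[:, int(n * p) - 1]   # X_([np]), the [np]-th order statistic
z = np.sqrt(n) * (xnp - p)

print(z.mean(), z.var())  # ~0 and ~p(1 - p) = 0.21
```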
Next assume that the $X_i$ are absolutely continuous with a pdf f. For convenience assume that F is one-to-one with inverse function $F^{-1}$. We wish to find constants $a_n$ such that $a_n(X_{([np])} - F^{-1}(p))$ converges in distribution. Note that our sample $X_1, \dots, X_n$ has the same joint distribution as $F^{-1}(U_1), \dots, F^{-1}(U_n)$, where the $U_i$ are iid uniform on (0, 1). Hence, by the uniform case and the delta method applied to $F^{-1}$,
$$\sqrt n(X_{([np])} - F^{-1}(p)) \stackrel{d}{\to} N(0, c^2 p(1-p)),$$
where
$$c = (F^{-1})'(p) = \frac{1}{f(F^{-1}(p))}.$$
This concludes the derivation. $\Box$