
Lecture 8

Limit Theorems & Law of Large Numbers


The Law of Large Numbers has to do with the partial sums
$$S_n = \sum_{j=1}^{n} X_j$$
of a sequence of r.v.'s. In the classical formulation, the weak law or the strong law of large numbers is said to hold depending on whether
$$\frac{S_n - E[S_n]}{n} \to 0$$
in probability or almost surely. This supposes $E[S_n] < \infty$. A generalization is
$$\frac{S_n - a_n}{b_n} \to 0,$$
where $\{a_n\}$ is a sequence of real (centering) numbers and $\{b_n\}$ is a sequence of positive numbers diverging to $\infty$.

Recall we showed:
Theorem: If $X_n$ converges to 0 in $L^p$, then it converges to 0 in probability. Conversely, if $\{X_n\}$ is dominated by some $Y \in L^p$ and $X_n$ converges to 0 in probability, then $X_n \to 0$ in $L^p$.

From this we have: if $\{Z_n\}$ is any sequence of r.v.'s such that $E[Z_n^2] \to 0$, then $Z_n \to 0$ in probability.

Also recall:
Theorem: If $X_n \to X$ in probability, then there exists a subsequence $\{n_k\}$ such that $X_{n_k} \to X$ almost surely.

Thus $Z_n \to 0$ in probability $\Rightarrow$ $Z_{n_k} \to 0$ almost surely along some subsequence $\{n_k\}$.


Applied to $S_n/n$, this gives: if $E[S_n^2] = o(n^2)$, then
$$E\left[\left(\frac{S_n}{n}\right)^2\right] = \frac{E[S_n^2]}{n^2} \to 0,$$
and hence $S_n/n \to 0$ in probability. If instead we only had $E[S_n^2] = O(n^2)$, then $E[S_n^2]/n^2$ would merely stay bounded, which is not enough.

Note:
$$E[S_n^2] = E\left[\left(\sum_{j=1}^{n} X_j\right)^2\right] = \sum_{j=1}^{n} E[X_j^2] + 2 \sum_{1 \le j < k \le n} E[X_j X_k].$$

The sum $\sum_{j=1}^{n} E[X_j^2]$ has $n$ terms, and $\sum_{j<k} E[X_j X_k]$ has $n(n-1)/2$ terms; counting each mixed term twice because of the factor 2, $E[S_n^2]$ expands to $n + n(n-1) = n^2$ terms.

If each term is bounded by a constant, then
$$E[S_n^2] = O(n^2), \text{ not } o(n^2).$$

Thus this bound alone is not adequate to develop even a weak law. The problem is with the mixed terms. If we could show either that they vanish (i.e. $= 0$) or that they decay quickly in some sense, then we would be fine. The sketch below illustrates the two extremes.
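To see the two regimes concretely, here is a small numerical sketch (assuming Python with NumPy is available; the setup is ours, not from the lecture). For mean-zero summands, an uncorrelated sequence has vanishing mixed terms, so $E[S_n^2]$ grows like $n$; a perfectly correlated sequence ($X_j = X_1$ for all $j$) has all $n^2$ terms contributing, so $E[S_n^2]$ grows like $n^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 1000, 10_000

# Uncorrelated case: independent +/-1 signs, so E[X_j X_k] = 0 for j != k.
X = rng.choice([-1.0, 1.0], size=(trials, n))
Sn_uncorrelated = X.sum(axis=1)

# Perfectly correlated case: X_j = X_1 for all j, so S_n = n * X_1.
X1 = rng.choice([-1.0, 1.0], size=trials)
Sn_correlated = n * X1

print("uncorrelated: E[S_n^2] ~", (Sn_uncorrelated**2).mean(), " (compare n =", n, ")")
print("correlated:   E[S_n^2] ~", (Sn_correlated**2).mean(), " (compare n^2 =", n**2, ")")
```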

Definition: Two r.v.'s $X$ and $Y$ are uncorrelated if and only if both have finite second moments and
$$E[XY] = E[X]E[Y].$$
$X$ and $Y$ are orthogonal iff $E[XY] = 0$.

In general, the r.v.'s in any family are uncorrelated [orthogonal] iff every pair of them is.

Consider
$$E\big[(X - E[X])(Y - E[Y])\big] = E[XY] - E[X]E[Y] - E[Y]E[X] + E[X]E[Y] = E[XY] - E[X]E[Y].$$

In the case of uncorrelated r.v.'s this equals
$$E[X]E[Y] - E[X]E[Y] = 0.$$
At first it appears the finite second moment assumption is unnecessary. However, it is needed to show that $E[XY]$ is finite, by the Cauchy–Schwarz inequality.

Also note that pairwise independence implies uncorrelatedness, but not vice versa (find a counterexample! one standard choice is sketched below).
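A numerical sketch of one standard counterexample (assuming Python with NumPy; skip it if you want to find your own): take $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. Then $E[XY] = E[X^3] = 0 = E[X]E[Y]$, so the pair is uncorrelated, yet $Y$ is a function of $X$, so the two are certainly not independent.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.choice([-1.0, 0.0, 1.0], size=200_000)
Y = X**2  # a deterministic function of X, hence dependent on X

# Uncorrelated: E[XY] matches E[X]E[Y] (both are ~ 0 here).
print("E[XY]    =", (X * Y).mean())
print("E[X]E[Y] =", X.mean() * Y.mean())

# Not independent: P(X=0, Y=1) = 0, while P(X=0) P(Y=1) = (1/3)(2/3).
print("P(X=0, Y=1)  =", np.mean((X == 0) & (Y == 1)))
print("P(X=0)P(Y=1) =", np.mean(X == 0) * np.mean(Y == 1))
```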

If $\{X_n\}$ is a sequence of uncorrelated r.v.'s, then the sequence $\{X_n - E[X_n]\}$ is orthogonal.

In this case
$$\sigma^2(S_n) = E\big[(S_n - E[S_n])^2\big] = \sum_{j=1}^{n} E\big[(X_j - E[X_j])^2\big] + 2 \sum_{j<k} E\big[(X_j - E[X_j])(X_k - E[X_k])\big]$$
$$= \sum_{j=1}^{n} E\big[(X_j - E[X_j])^2\big] = \sum_{j=1}^{n} \sigma^2(X_j).$$

Because there are only $n$ terms on the r.h.s., if each is bounded by a fixed constant, then $\sigma^2(S_n) = O(n) = o(n^2)$.

Thus we have:

Theorem: If $\{X_n\}$ is a sequence of uncorrelated random variables with $E[X_j^2] \le c$ for some common upper bound $c$, then
$$\frac{S_n - E[S_n]}{n} \to 0 \text{ in probability.}$$

This is the elementary form of the Weak Law of Large Numbers (WLLN).
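A quick Monte Carlo check of this theorem (a sketch, assuming Python with NumPy; the choices of $\epsilon$ and the sample sizes are arbitrary): for bounded independent summands, the probability that $|S_n - E[S_n]|/n$ exceeds $\epsilon$ shrinks as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(2)
eps, trials = 0.05, 2_000

for n in (100, 1_000, 10_000):
    # Uniform(0,1) summands: independent (hence uncorrelated), E[X_j^2] <= 1.
    X = rng.random((trials, n))
    dev = np.abs(X.sum(axis=1) - n * 0.5) / n   # |S_n - E[S_n]| / n
    print(f"n={n:6d}  P(|S_n - E[S_n]|/n > {eps}) ~ {np.mean(dev > eps):.4f}")
```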

Theorem (Strong Law of Large Numbers): If $\{X_j\}$ is a sequence of uncorrelated r.v.'s with $E[X_j^2] \le c$ for some common upper bound $c$, then
$$\frac{S_n - E[S_n]}{n} \to 0 \text{ a.s.}$$

Proof: Without loss of generality we may assume $E[X_j] = 0$ (if not, replace $X_j$ by $X_j - E[X_j]$).

By assumption,
$$E[S_n^2] \le cn.$$

By Chebyshev's inequality,
$$P[|S_n| > n\epsilon] \le \frac{cn}{n^2\epsilon^2} = \frac{c}{n\epsilon^2}.$$

If we sum over $n$, $\sum_{n=1}^{\infty} \frac{c}{n\epsilon^2}$ diverges. However, if we sum along the subsequence $n^2$,
$$\sum_{n=1}^{\infty} P[|S_{n^2}| > n^2\epsilon] \le \sum_{n=1}^{\infty} \frac{c}{n^2\epsilon^2} < \infty.$$

$\Rightarrow$ by Borel–Cantelli,
$$P[|S_{n^2}| > n^2\epsilon \text{ i.o.}] = 0.$$

But $X_n \to 0$ a.s. iff $\forall \epsilon > 0$, $P[|X_n| > \epsilon \text{ i.o.}] = 0$.

Thus $\dfrac{S_{n^2}}{n^2} \to 0$ a.s.

Now for each $n \ge 1$, let
$$D_n = \max_{n^2 \le k < (n+1)^2} |S_k - S_{n^2}|.$$

There are at most $2n$ indices $k$ with $n^2 < k < (n+1)^2$, and for each of them $E[(S_k - S_{n^2})^2] = \sum_{j=n^2+1}^{k} \sigma^2(X_j) \le 2nc$, so
$$E[D_n^2] \le \sum_{n^2 < k < (n+1)^2} E\big[(S_k - S_{n^2})^2\big] \le 2n \cdot 2nc = 4n^2 c.$$

By Chebyshev's inequality,
$$P[D_n > n^2\epsilon] \le \frac{4n^2 c}{\epsilon^2 n^4} = \frac{4c}{\epsilon^2 n^2},$$
which is summable,
$$\Rightarrow P[D_n > n^2\epsilon \text{ i.o.}] = 0 \Rightarrow \frac{D_n}{n^2} \to 0 \text{ a.s.}$$

For $n^2 \le k < (n+1)^2$,
$$|S_k| = |S_k - S_{n^2} + S_{n^2}| \le |S_k - S_{n^2}| + |S_{n^2}| \le |S_{n^2}| + D_n,$$
so
$$\frac{|S_k|}{k} \le \frac{|S_{n^2}| + D_n}{k} \le \frac{|S_{n^2}| + D_n}{n^2} \to 0 \text{ a.s.}$$
$$\Rightarrow \frac{|S_k|}{k} \to 0 \text{ a.s.}$$
This is the strong law of large numbers. Note that the hypotheses of both the WLLN and the SLLN are satisfied if we have a sequence of independent r.v.'s that are uniformly bounded, or that are independent and identically distributed (iid) with finite second moment. The present theorems actually require weaker conditions. A pathwise illustration follows.
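To contrast almost-sure convergence with the in-probability statement checked earlier, here is a pathwise sketch (same Python/NumPy assumption; one realization only): along a single sample path, the whole trajectory $n \mapsto S_n/n$ settles down near 0.

```python
import numpy as np

rng = np.random.default_rng(3)
n_max = 100_000

# One sample path of centered Uniform(0,1) variables (bounded, iid).
X = rng.random(n_max) - 0.5
path = np.cumsum(X) / np.arange(1, n_max + 1)   # S_n / n along one omega

for n in (100, 1_000, 10_000, 100_000):
    print(f"n={n:6d}  S_n/n = {path[n - 1]:+.5f}")
print("sup of |S_n/n| over n >= 10000:", np.abs(path[9_999:]).max())
```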

Borel’s SLLN.

Let $\omega \in [0, 1]$ and expand $\omega$ in decimal notation:
$$\omega = .x_1 x_2 x_3 \cdots x_n \cdots$$

Note that examples such as
$$.314159999\cdots \text{ and } .314160000\cdots$$
are the same number.

Except for such cases the decimal expansion is unique, and there are only a countable number of such cases.
Fix $0 \le k \le 9$ and let $\nu_k^{(n)}(\omega)$ = the number of digits among the first $n$ digits of $\omega$ equal to $k$. Then
$$\frac{\nu_k^{(n)}(\omega)}{n}$$
is the relative frequency of $k$ in the first $n$ digits. If
$$\frac{\nu_k^{(n)}(\omega)}{n} \xrightarrow{n\to\infty} \varphi_k(\omega) \text{ exists,}$$
then $\varphi_k(\omega)$ is the frequency of $k$ in $\omega$.

If $\varphi_k(\omega) = \frac{1}{10}$ for every $k = 0, 1, \cdots, 9$, then $\omega$ is simply normal.

Borel SLLN: Except for a Borel set of measure 0, every number in $[0,1]$ is simply normal.

Proof: Exercise. Hint: see pages 60 and 110 of Kai Lai Chung's book, A Course in Probability Theory.
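A numerical illustration of Borel's theorem (a sketch assuming Python with NumPy; we draw the digits directly, which is equivalent to drawing $\omega$ uniformly from $[0,1]$ and avoids floating-point precision limits): for a random $\omega$, every digit's relative frequency approaches $1/10$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000

# iid uniform digits = the decimal expansion of a uniform omega in [0,1].
digits = rng.integers(0, 10, size=n)
freq = np.bincount(digits, minlength=10) / n

for k in range(10):
    print(f"digit {k}: relative frequency {freq[k]:.4f}  (target 0.1)")
```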

WLLN under weaker hypotheses.

Definition: Two sequences of r.v.'s $\{X_n\}$ and $\{Y_n\}$ are said to be equivalent iff
$$\sum_{n=1}^{\infty} P[X_n \ne Y_n] < \infty.$$

Theorem: If $\{X_n\}$ and $\{Y_n\}$ are equivalent, then
$$\sum_{n=1}^{\infty} (X_n - Y_n) \text{ converges a.s.}$$

Furthermore, if $a_n \uparrow \infty$, then
$$\frac{1}{a_n} \sum_{j=1}^{n} (X_j - Y_j) \to 0 \text{ a.s.}$$
Proof: By Borel–Cantelli,
$$\sum_{n=1}^{\infty} P[X_n \ne Y_n] < \infty \text{ implies } P[X_n \ne Y_n \text{ i.o.}] = 0.$$

$\Rightarrow$ there exists a null set $N$ such that if $\omega \in \Omega \setminus N$, then there exists $n_0(\omega)$ such that $n \ge n_0(\omega) \Rightarrow X_n(\omega) = Y_n(\omega)$
$\Rightarrow$ $\{X_n(\omega)\}$ and $\{Y_n(\omega)\}$ differ in only a finite number of terms.

This implies the series
$$\sum_{n=1}^{\infty} (X_n(\omega) - Y_n(\omega))$$
consists of zeros from a certain point on
$$\Rightarrow \sum_{n=1}^{\infty} (X_n(\omega) - Y_n(\omega)) \text{ is finite } \forall \omega \in \Omega \setminus N.$$

Indeed, the partial sums $\sum_{j=1}^{n} (X_j(\omega) - Y_j(\omega))$ are eventually constant, hence bounded in absolute value by some finite $Z(\omega)$, for every $\omega \in \Omega \setminus N$
$$\Rightarrow \left|\frac{1}{a_n} \sum_{j=1}^{n} (X_j(\omega) - Y_j(\omega))\right| \le \frac{Z(\omega)}{a_n} \quad \forall n \ge 1, \ \omega \in \Omega \setminus N$$
$$\Rightarrow \frac{1}{a_n} \sum_{j=1}^{n} (X_j(\omega) - Y_j(\omega)) \to 0 \quad \forall \omega \in \Omega \setminus N.$$

Corollary: With probability 1,
$$\sum_{n=1}^{\infty} X_n \quad \text{or} \quad \frac{1}{a_n} \sum_{j=1}^{n} X_j$$
converges, diverges to $+\infty$ or $-\infty$, or fluctuates in the same way as
$$\sum_{n=1}^{\infty} Y_n \quad \text{or} \quad \frac{1}{a_n} \sum_{j=1}^{n} Y_j,$$
respectively. In particular,
$$\frac{1}{a_n} \sum_{j=1}^{n} X_j \text{ converges in probability iff } \frac{1}{a_n} \sum_{j=1}^{n} Y_j \text{ converges in probability.}$$

Proof: If $\frac{1}{a_n} \sum_{j=1}^{n} X_j \to X$ in probability, then
$$\frac{1}{a_n} \sum_{j=1}^{n} Y_j = \frac{1}{a_n} \sum_{j=1}^{n} (Y_j - X_j) + \frac{1}{a_n} \sum_{j=1}^{n} X_j \to 0 + X = X \text{ in probability.}$$

Khintchine's WLLN: Let $\{X_n\}$ be pairwise independent and identically distributed with finite mean $\mu$. Then
$$\frac{S_n}{n} \to \mu \text{ in probability.}$$

Proof: Let $F$ be the common distribution function, so
$$\mu = E[X_n] = \int_{-\infty}^{\infty} x \, dF(x), \qquad E[|X_n|] = \int_{-\infty}^{\infty} |x| \, dF(x).$$

But
$$E|X_1| < \infty \text{ iff } \sum_{n=1}^{\infty} P(|X_1| > n) < \infty,$$
and because all $X_n$ have the same distribution,
$$\sum_{n=1}^{\infty} P[|X_n| > n] < \infty.$$

Define $\{Y_n\}$ by truncating at $n$, i.e.
$$Y_n(\omega) = \begin{cases} X_n(\omega) & \text{if } |X_n(\omega)| \le n \\ 0 & \text{if } |X_n(\omega)| > n. \end{cases}$$

Then
$$\sum_{n=1}^{\infty} P(X_n \ne Y_n) = \sum_{n=1}^{\infty} P[|X_n| > n] < \infty$$
$\Rightarrow$ $\{X_n\}$ and $\{Y_n\}$ are equivalent.


Let
$$T_n = \sum_{j=1}^{n} Y_j.$$

We show $\dfrac{T_n}{n} \to \mu$ in probability.

The $Y_n$ are pairwise independent, hence uncorrelated. Since $|Y_n| \le n$, each $Y_n$ has a finite second moment.
$$\sigma^2(T_n) = \sum_{j=1}^{n} \sigma^2(Y_j) \le \sum_{j=1}^{n} E[Y_j^2] = \sum_{j=1}^{n} \int_{|x| \le j} x^2 \, dF(x).$$

Choose $0 < a_n < n$ such that $a_n \to \infty$ but $a_n = o(n)$.


Splitting the sum at $a_n$, and using $x^2 \le a_n |x|$ on $\{|x| \le a_n\}$ and $x^2 \le n|x|$ on $\{a_n < |x| \le n\}$:
$$\sum_{j=1}^{n} \int_{|x| \le j} x^2 \, dF = \sum_{j \le a_n} \int_{|x| \le j} x^2 \, dF + \sum_{a_n < j \le n} \int_{|x| \le j} x^2 \, dF$$
$$\le \sum_{j \le a_n} a_n \int_{|x| \le a_n} |x| \, dF + \sum_{a_n < j \le n} \left( a_n \int_{|x| \le a_n} |x| \, dF + n \int_{a_n < |x| \le n} |x| \, dF \right)$$
$$\le n a_n \int_{-\infty}^{\infty} |x| \, dF + n^2 \int_{|x| > a_n} |x| \, dF.$$

The first term is $O(n a_n)$: it equals $c \cdot n \cdot a_n$ with $c = E|X_1| < \infty$, and
$$\frac{c \cdot n \cdot a_n}{n^2} = c \, \frac{a_n}{n} \to 0 \text{ since } a_n = o(n),$$
i.e. $O(n a_n) = o(n^2)$.

For the second term, since $E|X_1| < \infty$ and $a_n \to \infty$,
$$\int_{|x| > a_n} |x| \, dF \to 0 \text{ as } n \to \infty,$$
i.e. $\int_{|x| > a_n} |x| \, dF = o(1)$, and $n^2 \cdot o(1) = o(n^2)$.

$$\Rightarrow \sigma^2(T_n) = o(n^2).$$

$$\Rightarrow \frac{T_n - E[T_n]}{n} = \frac{1}{n} \sum_{j=1}^{n} \{Y_j - E[Y_j]\} \to 0 \text{ in probability.}$$

But $E[Y_j] \to E[X_1] = \mu$, so
$$\frac{1}{n} \sum_{j=1}^{n} E[Y_j] \to \mu$$
$$\Rightarrow \frac{T_n}{n} = \frac{1}{n} \sum_{j=1}^{n} Y_j \to \mu \text{ in probability.}$$

But $\{Y_n\}$ and $\{X_n\}$ are equivalent
$$\Rightarrow \frac{S_n}{n} = \frac{1}{n} \sum_{j=1}^{n} X_j \to \mu \text{ in probability.}$$
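The point of Khintchine's theorem is that only the mean needs to be finite. A Monte Carlo sketch (assuming Python with NumPy; the Pareto tail index 1.5 is our choice, not from the lecture): for a Pareto distribution with $P[X > x] = x^{-1.5}$ for $x \ge 1$, the mean is $\mu = 3$ but the variance is infinite, so the elementary WLLN above does not apply; nevertheless $S_n/n$ concentrates at $\mu$.

```python
import numpy as np

rng = np.random.default_rng(5)
alpha = 1.5                    # tail index; E[X] = alpha/(alpha-1) = 3, Var = infinity
mu, eps, trials = 3.0, 0.1, 300

for n in (1_000, 10_000, 100_000):
    devs = np.empty(trials)
    for t in range(trials):
        X = rng.random(n) ** (-1.0 / alpha)   # inverse-CDF sampling of Pareto(1.5)
        devs[t] = abs(X.mean() - mu)          # |S_n/n - mu|
    print(f"n={n:6d}  P(|S_n/n - mu| > {eps}) ~ {np.mean(devs > eps):.3f}")
```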

Theorem: Let $\{X_n\}$ be a sequence of independent r.v.'s with distribution functions $\{F_n\}$. Let
$$S_n = \sum_{j=1}^{n} X_j.$$

Let $\{b_n\}$ be a sequence of real numbers such that $b_n \uparrow \infty$ and

i) $\displaystyle\sum_{j=1}^{n} \int_{|x| > b_n} dF_j(x) = o(1)$

ii) $\displaystyle\frac{1}{b_n^2} \sum_{j=1}^{n} \int_{|x| \le b_n} x^2 \, dF_j(x) = o(1)$

Then, with
$$a_n = \sum_{j=1}^{n} \int_{|x| \le b_n} x \, dF_j(x),$$
we have $\dfrac{S_n - a_n}{b_n} \to 0$ in probability.

Moreover, suppose $\exists \lambda > 0$ such that $F_n(0) \ge \lambda$ for all $n$ and
$$1 - F_n(0-) \ge \lambda \ \forall n \quad (\text{i.e. } F_n(0-) \le 1 - \lambda \ \forall n).$$
Then if $\dfrac{S_n - a_n}{b_n} \to 0$ in probability, i) and ii) must hold. (The proof of this converse is left as an exercise.)

Note: $F_n(0) \ge \lambda$ iff $P[X_n \le 0] \ge \lambda$, and $F_n(0-) \le 1 - \lambda$ iff $P[X_n \ge 0] \ge \lambda$. These conditions ensure that none of the distributions is too far off center; if $\lambda = \frac{1}{2}$, then 0 is a median of each $F_j$.

Proof: For each $n \ge 1$ and $1 \le j \le n$, define
$$Y_{n,j} = \begin{cases} X_j & |X_j| \le b_n \\ 0 & |X_j| > b_n. \end{cases}$$
Let
$$T_n = \sum_{j=1}^{n} Y_{n,j}.$$

Condition i) becomes
$$\sum_{j=1}^{n} P[Y_{n,j} \ne X_j] = o(1),$$
so
$$P[T_n \ne S_n] \le P\Big(\bigcup_{j=1}^{n} \{Y_{n,j} \ne X_j\}\Big) \le \sum_{j=1}^{n} P[Y_{n,j} \ne X_j] = o(1).$$

Condition ii) may be rewritten
$$\sum_{j=1}^{n} E\left[\left(\frac{Y_{n,j}}{b_n}\right)^2\right] = o(1).$$

Because $\{Y_{n,j}, 1 \le j \le n\}$ are independent r.v.'s,
$$\sigma^2\left(\frac{T_n}{b_n}\right) = \sum_{j=1}^{n} \sigma^2\left(\frac{Y_{n,j}}{b_n}\right) \le \sum_{j=1}^{n} E\left[\left(\frac{Y_{n,j}}{b_n}\right)^2\right] = o(1)$$

$$\Rightarrow \frac{T_n - E[T_n]}{b_n} \to 0 \text{ in probability.}$$

By construction $P[T_n \ne S_n] = o(1)$, so $S_n$ may replace $T_n$ in this limit: for any $\epsilon > 0$,
$$P\left[\left|\frac{S_n - E[T_n]}{b_n}\right| > \epsilon\right] \le P[T_n \ne S_n] + P\left[\left|\frac{T_n - E[T_n]}{b_n}\right| > \epsilon\right] \to 0$$
$$\Rightarrow \frac{S_n - E[T_n]}{b_n} \to 0 \text{ in probability.}$$

Finally,
$$E[T_n] = \sum_{j=1}^{n} E[Y_{n,j}] = \sum_{j=1}^{n} \int_{|x| \le b_n} x \, dF_j(x) = a_n.$$
Example: Let $\{X_n\}$ be independent random variables with common distribution function $F$ such that
$$P[X_1 = n] = P[X_1 = -n] = \frac{c}{n^2 \log n}, \quad n = 3, 4, \cdots$$
with
$$c = \frac{1}{2 \sum_{n=3}^{\infty} \frac{1}{n^2 \log n}}.$$

Check that this is a probability distribution:
$$\sum_{|n| \ge 3} P[X_1 = n] = 2 \sum_{n=3}^{\infty} \frac{c}{n^2 \log n} = 2 \left( \sum_{n=3}^{\infty} \frac{1}{n^2 \log n} \right) \frac{1}{2 \sum_{n=3}^{\infty} \frac{1}{n^2 \log n}} = 1.$$

For condition i) with $b_n = n$ (the factor 2 accounts for the two signs $\pm k$):
$$n \int_{|x| > n} dF = n \sum_{k > n} \frac{2c}{k^2 \log k}.$$

But
$$\sum_{k > n} \frac{1}{k^2 \log k} \approx \int_{n}^{\infty} \frac{dx}{x^2 \log x}.$$

Integrate by parts with
$$f(x) = \frac{1}{\log x}, \quad g'(x) = \frac{1}{x^2}, \quad g(x) = -\frac{1}{x}, \quad f'(x) = \frac{-1}{x \log^2 x},$$
using $\int f g' \, dx = fg - \int f' g \, dx$:
$$\int_{n}^{\infty} \frac{1}{\log x} \cdot \frac{1}{x^2} \, dx = \left[ \frac{-1}{x \log x} \right]_{n}^{\infty} - \int_{n}^{\infty} \frac{dx}{x^2 \log^2 x} = \frac{1}{n \log n} - O\left(\frac{1}{n \log^2 n}\right) \sim \frac{1}{n \log n}.$$

$$\Rightarrow n \int_{|x| > n} dF = n \sum_{k > n} \frac{2c}{k^2 \log k} \sim n \cdot \frac{2c}{n \log n} = \frac{2c}{\log n} = o(1).$$

In a similar way, for condition ii),
$$\frac{1}{n^2} \cdot n \int_{|x| \le n} x^2 \, dF(x) = \frac{1}{n} \sum_{k=3}^{n} \frac{2c \, k^2}{k^2 \log k} = \frac{2c}{n} \sum_{k=3}^{n} \frac{1}{\log k} \sim \frac{2c}{\log n} = o(1).$$
Conditions i) and ii) are satisfied with $b_n = n$. Because of the symmetry of $F$, $a_n = 0$. Hence
$$\frac{S_n}{n} \to 0 \text{ in probability,}$$
in spite of the fact that
$$E|X_1| = \sum_{n=3}^{\infty} \frac{2cn}{n^2 \log n} = 2c \sum_{n=3}^{\infty} \frac{1}{n \log n} = \infty.$$

On the other hand,
$$P[|X_1| > n] \sim \frac{2c}{n \log n} \Rightarrow P[|X_n| > n] \sim \frac{2c}{n \log n} \Rightarrow \sum_{n=1}^{\infty} P[|X_n| > n] = \infty.$$

Since the $X_n$ are independent, by the second Borel–Cantelli lemma
$$P[|X_n| > n \text{ i.o.}] = 1.$$

But $|S_n - S_{n-1}| = |X_n|$, so if $|X_n| > n$ then
$$|S_n| > \frac{n}{2} \quad \text{or} \quad |S_{n-1}| > \frac{n}{2} > \frac{n-1}{2}$$
$$\Rightarrow P\left[ |S_n| > \frac{n}{2} \text{ i.o.} \right] = 1$$
$$\Rightarrow \frac{S_n}{n} \not\to 0 \text{ a.s.}$$

Thus we have convergence in probability but not convergence a.s.: the WLLN holds, but the SLLN fails.
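A simulation of this example (a sketch assuming Python with NumPy; we truncate the support at a large $N$ to make sampling practical, which perturbs $c$ slightly): at a fixed large $n$ the value $S_n/n$ is modest, consistent with the logarithmically slow WLLN, yet times with $|X_n| > n$ keep occurring along the path, which is what destroys almost-sure convergence.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 1_000_000                        # truncation of the support (an approximation)
k = np.arange(3, N)
p = 1.0 / (k**2 * np.log(k))
p = p / (2 * p.sum())                # the constant c, adjusted for the truncation
vals = np.concatenate([k, -k]).astype(float)
probs = np.concatenate([p, p])

n = 200_000
X = rng.choice(vals, size=n, p=probs)            # one sample path X_1, ..., X_n
Sn_over_n = np.cumsum(X) / np.arange(1, n + 1)

# WLLN: S_n/n is smallish at a fixed large n (convergence is only logarithmic).
print("S_n/n at n = 200000:", Sn_over_n[-1])

# SLLN failure: times n with |X_n| > n keep occurring (a handful at this horizon).
jumps = np.nonzero(np.abs(X) > np.arange(1, n + 1))[0] + 1
print("times n <= 200000 with |X_n| > n:", jumps)
```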
