
Lecture 8

Limit Theorems & Law of Large Numbers


The Law of Large Numbers has to do with the partial sums
$$S_n = \sum_{j=1}^{n} X_j$$
of a sequence of r.v.'s. In the classical formulation, the weak law or the strong law of large numbers is said to hold depending on whether
$$\frac{S_n - E[S_n]}{n} \to 0$$
in probability or almost surely. This supposes $E[S_n] < \infty$. A generalization is
$$\frac{S_n - a_n}{b_n} \to 0,$$
where $\{a_n\}$ is a sequence of real (centering) numbers and $\{b_n\}$ is a sequence of positive numbers diverging to $\infty$.

Recall we showed:
Theorem: If $X_n$ converges to 0 in $L^p$, then it converges to 0 in probability. Conversely, if $\{X_n\}$ is dominated by some $Y \in L^p$ and $X_n$ converges to 0 in probability, then $X_n \to 0$ in $L^p$.

From this we have: if $\{Z_n\}$ is any sequence of r.v.'s such that $E[Z_n^2] \to 0$, then $Z_n \to 0$ in probability.

Also recall:
Theorem: If $X_n \to X$ in probability, then there exists a subsequence $\{n_k\}$ such that $X_{n_k} \to X$ almost surely.

Thus $Z_n \to 0$ in probability $\Rightarrow$ $Z_{n_k} \to 0$ almost surely along some subsequence $\{n_k\}$.


Applied to $S_n/n$, this gives: if $E[S_n^2] = o(n^2)$, then
$$E\left[\left(\frac{S_n}{n}\right)^2\right] = \frac{E[S_n^2]}{n^2} \to 0,$$
and hence $S_n/n \to 0$ in probability. If instead we only had $E[S_n^2] = O(n^2)$, then $E[S_n^2]/n^2$ would merely stay bounded, which is not enough.

Note:
$$E[S_n^2] = E\left[\left(\sum_{j=1}^{n} X_j\right)^2\right] = \sum_{j=1}^{n} E[X_j^2] + 2 \sum_{1 \le j < k \le n} E[X_j X_k].$$

The sum $\sum_{j=1}^{n} E[X_j^2]$ has $n$ terms, and $\sum_{j<k} E[X_j X_k]$ has $n(n-1)/2$ terms; counting each mixed term twice because of the factor 2, $E[S_n^2]$ expands to $n + n(n-1) = n^2$ terms.

If each term is bounded by a constant, then
$$E[S_n^2] = O(n^2), \text{ not } o(n^2).$$

Thus this bound alone is not adequate to develop even a weak law. The problem is with the mixed terms. If we could show either that they vanish (i.e. $= 0$) or that they decay quickly in some sense, then we would be fine. The sketch below illustrates the two extremes.
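To see the two regimes concretely, here is a small numerical sketch (assuming Python with NumPy is available; the setup is ours, not from the lecture). For mean-zero summands, an uncorrelated sequence has vanishing mixed terms, so $E[S_n^2]$ grows like $n$; a perfectly correlated sequence ($X_j = X_1$ for all $j$) has all $n^2$ terms contributing, so $E[S_n^2]$ grows like $n^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 1000, 10_000

# Uncorrelated case: independent +/-1 signs, so E[X_j X_k] = 0 for j != k.
X = rng.choice([-1.0, 1.0], size=(trials, n))
Sn_uncorrelated = X.sum(axis=1)

# Perfectly correlated case: X_j = X_1 for all j, so S_n = n * X_1.
X1 = rng.choice([-1.0, 1.0], size=trials)
Sn_correlated = n * X1

print("uncorrelated: E[S_n^2] ~", (Sn_uncorrelated**2).mean(), " (compare n =", n, ")")
print("correlated:   E[S_n^2] ~", (Sn_correlated**2).mean(), " (compare n^2 =", n**2, ")")
```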

Definition: Two r.v.'s $X$ and $Y$ are uncorrelated if and only if both have finite second moments and
$$E[XY] = E[X]E[Y].$$
$X$ and $Y$ are orthogonal iff $E[XY] = 0$.

In general, the r.v.'s in any family are uncorrelated [orthogonal] iff every pair of them is.

Consider
$$E\big[(X - E[X])(Y - E[Y])\big] = E[XY] - E[X]E[Y] - E[Y]E[X] + E[X]E[Y] = E[XY] - E[X]E[Y].$$

In the case of uncorrelated r.v.'s this equals
$$E[X]E[Y] - E[X]E[Y] = 0.$$
At first it appears the finite second moment assumption is unnecessary. However, it is needed to show that $E[XY]$ is finite, by the Cauchy–Schwarz inequality.

Also note that pairwise independence implies uncorrelatedness, but not vice versa (find a counterexample! one standard choice is sketched below).
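A numerical sketch of one standard counterexample (assuming Python with NumPy; skip it if you want to find your own): take $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. Then $E[XY] = E[X^3] = 0 = E[X]E[Y]$, so the pair is uncorrelated, yet $Y$ is a function of $X$, so the two are certainly not independent.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.choice([-1.0, 0.0, 1.0], size=200_000)
Y = X**2  # a deterministic function of X, hence dependent on X

# Uncorrelated: E[XY] matches E[X]E[Y] (both are ~ 0 here).
print("E[XY]    =", (X * Y).mean())
print("E[X]E[Y] =", X.mean() * Y.mean())

# Not independent: P(X=0, Y=1) = 0, while P(X=0) P(Y=1) = (1/3)(2/3).
print("P(X=0, Y=1)  =", np.mean((X == 0) & (Y == 1)))
print("P(X=0)P(Y=1) =", np.mean(X == 0) * np.mean(Y == 1))
```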

If $\{X_n\}$ is a sequence of uncorrelated r.v.'s, then the sequence $\{X_n - E[X_n]\}$ is orthogonal.

In this case
$$\sigma^2(S_n) = E\big[(S_n - E[S_n])^2\big] = \sum_{j=1}^{n} E\big[(X_j - E[X_j])^2\big] + 2 \sum_{j<k} E\big[(X_j - E[X_j])(X_k - E[X_k])\big]$$
$$= \sum_{j=1}^{n} E\big[(X_j - E[X_j])^2\big] = \sum_{j=1}^{n} \sigma^2(X_j).$$

Because there are only $n$ terms on the r.h.s., if each is bounded by a fixed constant, then $\sigma^2(S_n) = O(n) = o(n^2)$.

Thus we have:

Theorem: If $\{X_n\}$ is a sequence of uncorrelated random variables with $E[X_j^2] \le c$ for some common upper bound $c$, then
$$\frac{S_n - E[S_n]}{n} \to 0 \text{ in probability.}$$

This is the elementary form of the Weak Law of Large Numbers (WLLN).
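A quick Monte Carlo check of this theorem (a sketch, assuming Python with NumPy; the choices of $\epsilon$ and the sample sizes are arbitrary): for bounded independent summands, the probability that $|S_n - E[S_n]|/n$ exceeds $\epsilon$ shrinks as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(2)
eps, trials = 0.05, 2_000

for n in (100, 1_000, 10_000):
    # Uniform(0,1) summands: independent (hence uncorrelated), E[X_j^2] <= 1.
    X = rng.random((trials, n))
    dev = np.abs(X.sum(axis=1) - n * 0.5) / n   # |S_n - E[S_n]| / n
    print(f"n={n:6d}  P(|S_n - E[S_n]|/n > {eps}) ~ {np.mean(dev > eps):.4f}")
```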

Theorem (Strong Law of Large Numbers): If $\{X_j\}$ is a sequence of uncorrelated r.v.'s with $E[X_j^2] \le c$ for some common upper bound $c$, then
$$\frac{S_n - E[S_n]}{n} \to 0 \text{ a.s.}$$

Proof: Without loss of generality we may assume $E[X_j] = 0$ (if not, replace $X_j$ by $X_j - E[X_j]$).

By assumption,
$$E[S_n^2] \le cn.$$

By Chebyshev's inequality,
$$P[|S_n| > n\epsilon] \le \frac{cn}{n^2\epsilon^2} = \frac{c}{n\epsilon^2}.$$

If we sum over $n$, $\sum_{n=1}^{\infty} \frac{c}{n\epsilon^2}$ diverges. However, if we sum along the subsequence $n^2$,
$$\sum_{n=1}^{\infty} P[|S_{n^2}| > n^2\epsilon] \le \sum_{n=1}^{\infty} \frac{c}{n^2\epsilon^2} < \infty.$$

$\Rightarrow$ by Borel–Cantelli,
$$P[|S_{n^2}| > n^2\epsilon \text{ i.o.}] = 0.$$

But $X_n \to 0$ a.s. iff $\forall \epsilon > 0$, $P[|X_n| > \epsilon \text{ i.o.}] = 0$.

Thus $\dfrac{S_{n^2}}{n^2} \to 0$ a.s.

Now for each $n \ge 1$, let
$$D_n = \max_{n^2 \le k < (n+1)^2} |S_k - S_{n^2}|.$$

There are at most $2n$ indices $k$ with $n^2 < k < (n+1)^2$, and for each of them $E[(S_k - S_{n^2})^2] = \sum_{j=n^2+1}^{k} \sigma^2(X_j) \le 2nc$, so
$$E[D_n^2] \le \sum_{n^2 < k < (n+1)^2} E\big[(S_k - S_{n^2})^2\big] \le 2n \cdot 2nc = 4n^2 c.$$

By Chebyshev's inequality,
$$P[D_n > n^2\epsilon] \le \frac{4n^2 c}{\epsilon^2 n^4} = \frac{4c}{\epsilon^2 n^2},$$
which is summable,
$$\Rightarrow P[D_n > n^2\epsilon \text{ i.o.}] = 0 \Rightarrow \frac{D_n}{n^2} \to 0 \text{ a.s.}$$

For $n^2 \le k < (n+1)^2$,
$$|S_k| = |S_k - S_{n^2} + S_{n^2}| \le |S_k - S_{n^2}| + |S_{n^2}| \le |S_{n^2}| + D_n,$$
so
$$\frac{|S_k|}{k} \le \frac{|S_{n^2}| + D_n}{k} \le \frac{|S_{n^2}| + D_n}{n^2} \to 0 \text{ a.s.}$$
$$\Rightarrow \frac{|S_k|}{k} \to 0 \text{ a.s.}$$
This is the strong law of large numbers. Note that the hypotheses of both the WLLN and the SLLN are satisfied if we have a sequence of independent r.v.'s that are uniformly bounded, or that are independent and identically distributed (iid) with finite second moment. The present theorems actually require weaker conditions. A pathwise illustration follows.
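To contrast almost-sure convergence with the in-probability statement checked earlier, here is a pathwise sketch (same Python/NumPy assumption; one realization only): along a single sample path, the whole trajectory $n \mapsto S_n/n$ settles down near 0.

```python
import numpy as np

rng = np.random.default_rng(3)
n_max = 100_000

# One sample path of centered Uniform(0,1) variables (bounded, iid).
X = rng.random(n_max) - 0.5
path = np.cumsum(X) / np.arange(1, n_max + 1)   # S_n / n along one omega

for n in (100, 1_000, 10_000, 100_000):
    print(f"n={n:6d}  S_n/n = {path[n - 1]:+.5f}")
print("sup of |S_n/n| over n >= 10000:", np.abs(path[9_999:]).max())
```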

Borel’s SLLN.

Let $\omega \in [0, 1]$ and expand $\omega$ in decimal notation:
$$\omega = .x_1 x_2 x_3 \cdots x_n \cdots$$

Note that examples such as
$$.314159999\cdots \text{ and } .314160000\cdots$$
are the same number.

Except for such cases the decimal expansion is unique, and there are only a countable number of such cases.
Fix $0 \le k \le 9$ and let $\nu_k^{(n)}(\omega)$ = the number of digits among the first $n$ digits of $\omega$ equal to $k$. Then
$$\frac{\nu_k^{(n)}(\omega)}{n}$$
is the relative frequency of $k$ in the first $n$ digits. If
$$\frac{\nu_k^{(n)}(\omega)}{n} \xrightarrow{n\to\infty} \varphi_k(\omega) \text{ exists,}$$
then $\varphi_k(\omega)$ is the frequency of $k$ in $\omega$.

If $\varphi_k(\omega) = \frac{1}{10}$ for every $k = 0, 1, \cdots, 9$, then $\omega$ is simply normal.

Borel SLLN: Except for a Borel set of measure 0, every number in $[0,1]$ is simply normal.

Proof: Exercise. Hint: see pages 60 and 110 of Kai Lai Chung's book, A Course in Probability Theory.
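A numerical illustration of Borel's theorem (a sketch assuming Python with NumPy; we draw the digits directly, which is equivalent to drawing $\omega$ uniformly from $[0,1]$ and avoids floating-point precision limits): for a random $\omega$, every digit's relative frequency approaches $1/10$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000

# iid uniform digits = the decimal expansion of a uniform omega in [0,1].
digits = rng.integers(0, 10, size=n)
freq = np.bincount(digits, minlength=10) / n

for k in range(10):
    print(f"digit {k}: relative frequency {freq[k]:.4f}  (target 0.1)")
```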

WLLN under weaker hypotheses.

Definition: Two sequences of r.v.'s $\{X_n\}$ and $\{Y_n\}$ are said to be equivalent iff
$$\sum_{n=1}^{\infty} P[X_n \ne Y_n] < \infty.$$

Theorem: If $\{X_n\}$ and $\{Y_n\}$ are equivalent, then
$$\sum_{n=1}^{\infty} (X_n - Y_n) \text{ converges a.s.}$$

Furthermore, if $a_n \uparrow \infty$, then
$$\frac{1}{a_n} \sum_{j=1}^{n} (X_j - Y_j) \to 0 \text{ a.s.}$$
Proof: By Borel–Cantelli,
$$\sum_{n=1}^{\infty} P[X_n \ne Y_n] < \infty \text{ implies } P[X_n \ne Y_n \text{ i.o.}] = 0.$$

$\Rightarrow$ there exists a null set $N$ such that if $\omega \in \Omega \setminus N$, then there exists $n_0(\omega)$ such that $n \ge n_0(\omega) \Rightarrow X_n(\omega) = Y_n(\omega)$
$\Rightarrow$ $\{X_n(\omega)\}$ and $\{Y_n(\omega)\}$ differ in only a finite number of terms.

This implies the series
$$\sum_{n=1}^{\infty} (X_n(\omega) - Y_n(\omega))$$
consists of zeros from a certain point on
$$\Rightarrow \sum_{n=1}^{\infty} (X_n(\omega) - Y_n(\omega)) \text{ is finite } \forall \omega \in \Omega \setminus N.$$

Indeed, the partial sums $\sum_{j=1}^{n} (X_j(\omega) - Y_j(\omega))$ are eventually constant, hence bounded in absolute value by some finite $Z(\omega)$, for every $\omega \in \Omega \setminus N$
$$\Rightarrow \left|\frac{1}{a_n} \sum_{j=1}^{n} (X_j(\omega) - Y_j(\omega))\right| \le \frac{Z(\omega)}{a_n} \quad \forall n \ge 1, \ \omega \in \Omega \setminus N$$
$$\Rightarrow \frac{1}{a_n} \sum_{j=1}^{n} (X_j(\omega) - Y_j(\omega)) \to 0 \quad \forall \omega \in \Omega \setminus N.$$

Corollary: With probability 1,
$$\sum_{n=1}^{\infty} X_n \quad \text{or} \quad \frac{1}{a_n} \sum_{j=1}^{n} X_j$$
converges, diverges to $+\infty$ or $-\infty$, or fluctuates in the same way as
$$\sum_{n=1}^{\infty} Y_n \quad \text{or} \quad \frac{1}{a_n} \sum_{j=1}^{n} Y_j,$$
respectively. In particular,
$$\frac{1}{a_n} \sum_{j=1}^{n} X_j \text{ converges in probability iff } \frac{1}{a_n} \sum_{j=1}^{n} Y_j \text{ converges in probability.}$$

Proof: If $\frac{1}{a_n} \sum_{j=1}^{n} X_j \to X$ in probability, then
$$\frac{1}{a_n} \sum_{j=1}^{n} Y_j = \frac{1}{a_n} \sum_{j=1}^{n} (Y_j - X_j) + \frac{1}{a_n} \sum_{j=1}^{n} X_j \to 0 + X = X \text{ in probability.}$$

Khintchine's WLLN: Let $\{X_n\}$ be pairwise independent and identically distributed with finite mean $\mu$. Then
$$\frac{S_n}{n} \to \mu \text{ in probability.}$$

Proof: Let $F$ be the common distribution function, so
$$\mu = E[X_n] = \int_{-\infty}^{\infty} x \, dF(x), \qquad E[|X_n|] = \int_{-\infty}^{\infty} |x| \, dF(x).$$

But
$$E|X_1| < \infty \text{ iff } \sum_{n=1}^{\infty} P(|X_1| > n) < \infty,$$
and because all $X_n$ have the same distribution,
$$\sum_{n=1}^{\infty} P[|X_n| > n] < \infty.$$

Define $\{Y_n\}$ by truncating at $n$, i.e.
$$Y_n(\omega) = \begin{cases} X_n(\omega) & \text{if } |X_n(\omega)| \le n \\ 0 & \text{if } |X_n(\omega)| > n. \end{cases}$$

Then
$$\sum_{n=1}^{\infty} P(X_n \ne Y_n) = \sum_{n=1}^{\infty} P[|X_n| > n] < \infty$$
$\Rightarrow$ $\{X_n\}$ and $\{Y_n\}$ are equivalent.


Let
$$T_n = \sum_{j=1}^{n} Y_j.$$

We show $\dfrac{T_n}{n} \to \mu$ in probability.

The $Y_n$ are pairwise independent, hence uncorrelated. Since $|Y_n| \le n$, each $Y_n$ has a finite second moment.
$$\sigma^2(T_n) = \sum_{j=1}^{n} \sigma^2(Y_j) \le \sum_{j=1}^{n} E[Y_j^2] = \sum_{j=1}^{n} \int_{|x| \le j} x^2 \, dF(x).$$

Choose $0 < a_n < n$ such that $a_n \to \infty$ but $a_n = o(n)$.


Splitting the sum at $a_n$, and using $x^2 \le a_n |x|$ on $\{|x| \le a_n\}$ and $x^2 \le n|x|$ on $\{a_n < |x| \le n\}$:
$$\sum_{j=1}^{n} \int_{|x| \le j} x^2 \, dF = \sum_{j \le a_n} \int_{|x| \le j} x^2 \, dF + \sum_{a_n < j \le n} \int_{|x| \le j} x^2 \, dF$$
$$\le \sum_{j \le a_n} a_n \int_{|x| \le a_n} |x| \, dF + \sum_{a_n < j \le n} \left( a_n \int_{|x| \le a_n} |x| \, dF + n \int_{a_n < |x| \le n} |x| \, dF \right)$$
$$\le n a_n \int_{-\infty}^{\infty} |x| \, dF + n^2 \int_{|x| > a_n} |x| \, dF.$$

The first term is $O(n a_n)$: it equals $c \cdot n \cdot a_n$ with $c = E|X_1| < \infty$, and
$$\frac{c \cdot n \cdot a_n}{n^2} = c \, \frac{a_n}{n} \to 0 \text{ since } a_n = o(n),$$
i.e. $O(n a_n) = o(n^2)$.

For the second term, since $E|X_1| < \infty$ and $a_n \to \infty$,
$$\int_{|x| > a_n} |x| \, dF \to 0 \text{ as } n \to \infty,$$
i.e. $\int_{|x| > a_n} |x| \, dF = o(1)$, and $n^2 \cdot o(1) = o(n^2)$.

$$\Rightarrow \sigma^2(T_n) = o(n^2).$$

$$\Rightarrow \frac{T_n - E[T_n]}{n} = \frac{1}{n} \sum_{j=1}^{n} \{Y_j - E[Y_j]\} \to 0 \text{ in probability.}$$

But $E[Y_j] \to E[X_1] = \mu$, so
$$\frac{1}{n} \sum_{j=1}^{n} E[Y_j] \to \mu$$
$$\Rightarrow \frac{T_n}{n} = \frac{1}{n} \sum_{j=1}^{n} Y_j \to \mu \text{ in probability.}$$

But $\{Y_n\}$ and $\{X_n\}$ are equivalent
$$\Rightarrow \frac{S_n}{n} = \frac{1}{n} \sum_{j=1}^{n} X_j \to \mu \text{ in probability.}$$
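The point of Khintchine's theorem is that only the mean needs to be finite. A Monte Carlo sketch (assuming Python with NumPy; the Pareto tail index 1.5 is our choice, not from the lecture): for a Pareto distribution with $P[X > x] = x^{-1.5}$ for $x \ge 1$, the mean is $\mu = 3$ but the variance is infinite, so the elementary WLLN above does not apply; nevertheless $S_n/n$ concentrates at $\mu$.

```python
import numpy as np

rng = np.random.default_rng(5)
alpha = 1.5                    # tail index; E[X] = alpha/(alpha-1) = 3, Var = infinity
mu, eps, trials = 3.0, 0.1, 300

for n in (1_000, 10_000, 100_000):
    devs = np.empty(trials)
    for t in range(trials):
        X = rng.random(n) ** (-1.0 / alpha)   # inverse-CDF sampling of Pareto(1.5)
        devs[t] = abs(X.mean() - mu)          # |S_n/n - mu|
    print(f"n={n:6d}  P(|S_n/n - mu| > {eps}) ~ {np.mean(devs > eps):.3f}")
```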

Theorem: Let $\{X_n\}$ be a sequence of independent r.v.'s with distribution functions $\{F_n\}$. Let
$$S_n = \sum_{j=1}^{n} X_j.$$

Let $\{b_n\}$ be a sequence of real numbers such that $b_n \uparrow \infty$ and

i) $\displaystyle\sum_{j=1}^{n} \int_{|x| > b_n} dF_j(x) = o(1)$

ii) $\displaystyle\frac{1}{b_n^2} \sum_{j=1}^{n} \int_{|x| \le b_n} x^2 \, dF_j(x) = o(1)$

Then, with
$$a_n = \sum_{j=1}^{n} \int_{|x| \le b_n} x \, dF_j(x),$$
we have $\dfrac{S_n - a_n}{b_n} \to 0$ in probability.

Moreover, suppose $\exists \lambda > 0$ such that $F_n(0) \ge \lambda$ for all $n$ and
$$1 - F_n(0-) \ge \lambda \ \forall n \quad (\text{i.e. } F_n(0-) \le 1 - \lambda \ \forall n).$$
Then if $\dfrac{S_n - a_n}{b_n} \to 0$ in probability, i) and ii) must hold. (The proof of this converse is left as an exercise.)

Note: $F_n(0) \ge \lambda$ iff $P[X_n \le 0] \ge \lambda$, and $F_n(0-) \le 1 - \lambda$ iff $P[X_n \ge 0] \ge \lambda$. These conditions ensure that none of the distributions is too far off center; if $\lambda = \frac{1}{2}$, then 0 is a median of each $F_j$.

Proof: For each $n \ge 1$ and $1 \le j \le n$, define
$$Y_{n,j} = \begin{cases} X_j & |X_j| \le b_n \\ 0 & |X_j| > b_n. \end{cases}$$
Let
$$T_n = \sum_{j=1}^{n} Y_{n,j}.$$

Condition i) becomes
$$\sum_{j=1}^{n} P[Y_{n,j} \ne X_j] = o(1),$$
so
$$P[T_n \ne S_n] \le P\Big(\bigcup_{j=1}^{n} \{Y_{n,j} \ne X_j\}\Big) \le \sum_{j=1}^{n} P[Y_{n,j} \ne X_j] = o(1).$$

Condition ii) may be rewritten
$$\sum_{j=1}^{n} E\left[\left(\frac{Y_{n,j}}{b_n}\right)^2\right] = o(1).$$

Because $\{Y_{n,j}, 1 \le j \le n\}$ are independent r.v.'s,
$$\sigma^2\left(\frac{T_n}{b_n}\right) = \sum_{j=1}^{n} \sigma^2\left(\frac{Y_{n,j}}{b_n}\right) \le \sum_{j=1}^{n} E\left[\left(\frac{Y_{n,j}}{b_n}\right)^2\right] = o(1)$$

$$\Rightarrow \frac{T_n - E[T_n]}{b_n} \to 0 \text{ in probability.}$$

By construction $P[T_n \ne S_n] = o(1)$, so $S_n$ may replace $T_n$ in this limit: for any $\epsilon > 0$,
$$P\left[\left|\frac{S_n - E[T_n]}{b_n}\right| > \epsilon\right] \le P[T_n \ne S_n] + P\left[\left|\frac{T_n - E[T_n]}{b_n}\right| > \epsilon\right] \to 0$$
$$\Rightarrow \frac{S_n - E[T_n]}{b_n} \to 0 \text{ in probability.}$$

Finally,
$$E[T_n] = \sum_{j=1}^{n} E[Y_{n,j}] = \sum_{j=1}^{n} \int_{|x| \le b_n} x \, dF_j(x) = a_n.$$
Example: Let $\{X_n\}$ be independent random variables with common distribution function $F$ such that
$$P[X_1 = n] = P[X_1 = -n] = \frac{c}{n^2 \log n}, \quad n = 3, 4, \cdots$$
with
$$c = \frac{1}{2 \sum_{n=3}^{\infty} \frac{1}{n^2 \log n}}.$$

Check that this is a probability distribution:
$$\sum_{|n| \ge 3} P[X_1 = n] = 2 \sum_{n=3}^{\infty} \frac{c}{n^2 \log n} = 2 \left( \sum_{n=3}^{\infty} \frac{1}{n^2 \log n} \right) \frac{1}{2 \sum_{n=3}^{\infty} \frac{1}{n^2 \log n}} = 1.$$

For condition i) with $b_n = n$ (the factor 2 accounts for the two signs $\pm k$):
$$n \int_{|x| > n} dF = n \sum_{k > n} \frac{2c}{k^2 \log k}.$$

But
$$\sum_{k > n} \frac{1}{k^2 \log k} \approx \int_{n}^{\infty} \frac{dx}{x^2 \log x}.$$

Integrate by parts with
$$f(x) = \frac{1}{\log x}, \quad g'(x) = \frac{1}{x^2}, \quad g(x) = -\frac{1}{x}, \quad f'(x) = \frac{-1}{x \log^2 x},$$
using $\int f g' \, dx = fg - \int f' g \, dx$:
$$\int_{n}^{\infty} \frac{1}{\log x} \cdot \frac{1}{x^2} \, dx = \left[ \frac{-1}{x \log x} \right]_{n}^{\infty} - \int_{n}^{\infty} \frac{dx}{x^2 \log^2 x} = \frac{1}{n \log n} - O\left(\frac{1}{n \log^2 n}\right) \sim \frac{1}{n \log n}.$$

$$\Rightarrow n \int_{|x| > n} dF = n \sum_{k > n} \frac{2c}{k^2 \log k} \sim n \cdot \frac{2c}{n \log n} = \frac{2c}{\log n} = o(1).$$

In a similar way, for condition ii),
$$\frac{1}{n^2} \cdot n \int_{|x| \le n} x^2 \, dF(x) = \frac{1}{n} \sum_{k=3}^{n} \frac{2c \, k^2}{k^2 \log k} = \frac{2c}{n} \sum_{k=3}^{n} \frac{1}{\log k} \sim \frac{2c}{\log n} = o(1).$$
Conditions i) and ii) are satisfied with $b_n = n$. Because of the symmetry of $F$, $a_n = 0$. Hence
$$\frac{S_n}{n} \to 0 \text{ in probability,}$$
in spite of the fact that
$$E|X_1| = \sum_{n=3}^{\infty} \frac{2cn}{n^2 \log n} = 2c \sum_{n=3}^{\infty} \frac{1}{n \log n} = \infty.$$

On the other hand,
$$P[|X_1| > n] \sim \frac{2c}{n \log n} \Rightarrow P[|X_n| > n] \sim \frac{2c}{n \log n} \Rightarrow \sum_{n=1}^{\infty} P[|X_n| > n] = \infty.$$

Since the $X_n$ are independent, by the second Borel–Cantelli lemma
$$P[|X_n| > n \text{ i.o.}] = 1.$$

But $|S_n - S_{n-1}| = |X_n|$, so if $|X_n| > n$ then
$$|S_n| > \frac{n}{2} \quad \text{or} \quad |S_{n-1}| > \frac{n}{2} > \frac{n-1}{2}$$
$$\Rightarrow P\left[ |S_n| > \frac{n}{2} \text{ i.o.} \right] = 1$$
$$\Rightarrow \frac{S_n}{n} \not\to 0 \text{ a.s.}$$

Thus we have convergence in probability but not convergence a.s.: the WLLN holds, but the SLLN fails.
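A simulation of this example (a sketch assuming Python with NumPy; we truncate the support at a large $N$ to make sampling practical, which perturbs $c$ slightly): at a fixed large $n$ the value $S_n/n$ is modest, consistent with the logarithmically slow WLLN, yet times with $|X_n| > n$ keep occurring along the path, which is what destroys almost-sure convergence.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 1_000_000                        # truncation of the support (an approximation)
k = np.arange(3, N)
p = 1.0 / (k**2 * np.log(k))
p = p / (2 * p.sum())                # the constant c, adjusted for the truncation
vals = np.concatenate([k, -k]).astype(float)
probs = np.concatenate([p, p])

n = 200_000
X = rng.choice(vals, size=n, p=probs)            # one sample path X_1, ..., X_n
Sn_over_n = np.cumsum(X) / np.arange(1, n + 1)

# WLLN: S_n/n is smallish at a fixed large n (convergence is only logarithmic).
print("S_n/n at n = 200000:", Sn_over_n[-1])

# SLLN failure: times n with |X_n| > n keep occurring (a handful at this horizon).
jumps = np.nonzero(np.abs(X) > np.arange(1, n + 1))[0] + 1
print("times n <= 200000 with |X_n| > n:", jumps)
```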
