of a sequence of r.v.’s. In the classical formulation, the weak law or the strong law of large
numbers is said to hold depending on whether
$$\frac{S_n - E[S_n]}{n} \to 0$$
in probability or almost surely. This supposes $E[S_n] < \infty$. A generalization is
$$\frac{S_n - a_n}{b_n} \to 0,$$
where $\{a_n\}$ is a sequence of real numbers diverging to $\infty$ and $\{b_n\}$ is a sequence of positive numbers diverging to $\infty$.
Recall we showed:
Theorem: If $X_n$ converges to 0 in $L^p$, then it converges to 0 in probability. Conversely, if $\{X_n\}$ is dominated by $Y$ and $Y \in L^p$, then $X_n \to 0$ in probability implies $X_n \to 0$ in $L^p$.
Also recall:
Theorem: If $X_n \to X$ in probability, then there exists a subsequence $\{n_k\}$ such that $X_{n_k} \to X$ almost surely.
Note:
$$E[S_n^2] = E\Big[\Big(\sum_{j=1}^n X_j\Big)^2\Big] = \sum_{j=1}^n E[X_j^2] + 2\sum_{1 \le j < k \le n} E[X_j X_k].$$
Note that $\sum_{j=1}^n E[X_j^2]$ has $n$ terms, while $\sum_{j<k} E[X_j X_k]$ has $n(n-1)/2$ terms.
Thus this is not adequate to develop even a weak law. The problem is with the mixed terms. If we could show either that they vanish (i.e., equal 0) or that they decay quickly in some sense, then we would be fine.
Definition: Two r.v.'s $X$ and $Y$ are uncorrelated if and only if both have finite second moments and
$$E[XY] = E[X]E[Y].$$
$X$ and $Y$ are orthogonal iff $E[XY] = 0$.
In general r.v.’s in any family are uncorrelated [orthogonal] iff every pair of them are.
Consider
$$E\big[(X - E[X])(Y - E[Y])\big] = E[XY] - E[X]E[Y] - E[X]E[Y] + E[X]E[Y] = E[XY] - E[X]E[Y].$$
Also note that pairwise independence implies uncorrelatedness, but not vice versa (find
a counterexample!)
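One classical counterexample (a spoiler for the exercise, so skip this if you want to find your own): take $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. Then $E[XY] = E[X^3] = 0 = E[X]E[Y]$, so the pair is uncorrelated, yet $Y$ is a function of $X$. The expectations can be checked exactly:

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}; Y = X^2 (an illustrative pair, not from the notes).
support = [-1, 0, 1]
p = Fraction(1, 3)

EX  = sum(p * x         for x in support)   # E[X] = 0
EY  = sum(p * x * x     for x in support)   # E[Y] = 2/3
EXY = sum(p * x * x * x for x in support)   # E[XY] = E[X^3] = 0

# Uncorrelated: E[XY] = E[X]E[Y]
assert EXY == EX * EY
# Not independent: P[X = 1, Y = 0] = 0, but P[X = 1] * P[Y = 0] = 1/9
assert 0 != p * p
print("uncorrelated but dependent:", EX, EY, EXY)
```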
If $\{X_n\}$ is a sequence of uncorrelated r.v.'s, then the sequence $\{X_n - E[X_n]\}$ is orthogonal. In this case
$$\sigma^2(S_n) = E\big[(S_n - E[S_n])^2\big] = \sum_{j=1}^n E\big[(X_j - E[X_j])^2\big] + 2\sum_{j<k} E\big[(X_j - E[X_j])(X_k - E[X_k])\big]$$
$$= \sum_{j=1}^n E\big[(X_j - E[X_j])^2\big] = \sum_{j=1}^n \sigma^2(X_j),$$
since the cross terms vanish by orthogonality.
Because there are only n terms on the r.h.s., if each is bounded by a fixed constant, then
σ 2 (Sn ) = O(n) = o(n2 ).
Thus we have:
Theorem: If $\{X_n\}$ is a sequence of uncorrelated random variables and $E[X_j^2] < c$ for some common upper bound $c$, then
$$\frac{S_n - E[S_n]}{n} \to 0 \text{ in probability.}$$
This is the elementary form of the Weak Law of Large Numbers. (WLLN)
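A Monte Carlo sketch of the statement (with arbitrary illustrative parameters): independent uniforms on $[-1, 1]$ are uncorrelated with $E[X_j^2] \le 1$, and the empirical frequency of $\{|S_n| > n\varepsilon\}$ should sit below the Chebyshev bound $c/(n\varepsilon^2)$ used in the proof below.

```python
import random

random.seed(0)
n, eps, trials = 1000, 0.1, 2000
c = 1.0  # E[X_j^2] = 1/3 <= 1 for X_j uniform on [-1, 1]

# Estimate P[|S_n| > n*eps] over many independent trajectories
exceed = sum(
    abs(sum(random.uniform(-1, 1) for _ in range(n))) > n * eps
    for _ in range(trials)
)
freq = exceed / trials
bound = c / (n * eps ** 2)  # Chebyshev: P[|S_n| > n*eps] <= c/(n*eps^2) = 0.1
print(freq, "<=", bound)
assert freq <= bound
```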
Proof: Without loss of generality we may assume $E[X_j] = 0$ (if not, replace $X_j$ by $X_j - E[X_j]$). By assumption,
$$E[S_n^2] \le cn.$$
By Chebyshev's Inequality,
$$P[|S_n| > n\varepsilon] \le \frac{cn}{n^2\varepsilon^2} = \frac{c}{n\varepsilon^2} \to 0. \qquad \square$$
If we sum over $n$,
$$\sum_{n=1}^\infty \frac{c}{n\varepsilon^2} \quad \text{diverges},$$
so Borel-Cantelli cannot be applied directly to upgrade this to almost sure convergence. However, along the subsequence $n^2$,
$$P[|S_{n^2}| > n^2\varepsilon] \le \frac{cn^2}{n^4\varepsilon^2} = \frac{c}{n^2\varepsilon^2},$$
which is summable,
$\Rightarrow$ by Borel-Cantelli
$$P[|S_{n^2}| > n^2\varepsilon \text{ i.o.}] = 0.$$
To handle the indices between successive squares, let $D_n = \max_{n^2 \le k < (n+1)^2} |S_k - S_{n^2}|$. Then
$$E\big[D_n^2\big] \le \sum_{k=n^2}^{(n+1)^2-1} E\big[(S_k - S_{n^2})^2\big] \le 2n \sum_{j=n^2+1}^{n^2+2n} \sigma^2(X_j) \le 4n^2 c.$$
By Chebyshev's Inequality,
$$P\big[D_n > n^2\varepsilon\big] \le \frac{4n^2 c}{\varepsilon^2 n^4} = \frac{4c}{\varepsilon^2 n^2},$$
which is summable,
$$\Rightarrow P[D_n > n^2\varepsilon \text{ i.o.}] = 0 \quad \Rightarrow \quad \frac{D_n}{n^2} \to 0 \text{ a.s.}$$
Combining: for $n^2 \le k < (n+1)^2$ we have $|S_k| \le |S_{n^2}| + D_n$ and $k \ge n^2$, so $|S_k|/k \le (|S_{n^2}| + D_n)/n^2 \to 0$, i.e. $S_n/n \to 0$ a.s.
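The a.s. statement can be eyeballed on a single trajectory (an illustrative sketch with fair $\pm 1$ coin flips, which are independent with $\sigma^2(X_j) = 1$):

```python
import random

random.seed(1)
N = 100_000
s = 0
max_tail = 0.0  # sup of |S_n|/n over the tail n >= 50_000 of this trajectory
for n in range(1, N + 1):
    s += random.choice((-1, 1))
    if n >= 50_000:
        max_tail = max(max_tail, abs(s) / n)

print("sup over tail of |S_n|/n =", max_tail)
assert max_tail < 0.05  # the whole tail is uniformly small, as a.s. convergence predicts
```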
Borel's SLLN.
For $\omega \in [0, 1]$, write its decimal expansion
$$\omega = .x_1 x_2 x_3 \cdots x_n \cdots$$
Note that examples such as
$$.314159999\cdots \quad \text{and} \quad .314160000\cdots$$
are the same number. Except for such cases, the decimal expansion is unique, and there are only a countable number of such cases.
Fix $0 \le k \le 9$ and let $\nu_k^{(n)}(\omega)$ = the number of digits among the first $n$ digits of $\omega$ equal to $k$. Then
$$\frac{\nu_k^{(n)}(\omega)}{n}$$
is the relative frequency of $k$. If
$$\lim_{n \to \infty} \frac{\nu_k^{(n)}(\omega)}{n} = \varphi_k(\omega) \quad \text{exists},$$
then $\varphi_k(\omega)$ is the frequency of $k$ in $\omega$. If $\varphi_k(\omega) = \frac{1}{10}$ for every $k = 0, 1, \cdots, 9$, then $\omega$ is simply normal.
Borel SLLN: Except for a Borel set of measure 0, every number in $[0,1]$ is simply normal.
Proof: Exercise. Hint: see pages 60 and 110 of Kai Lai Chung's book, A Course in Probability Theory.
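A simulation sketch (not the requested proof): choosing $\omega$ uniformly on $[0,1]$ amounts to drawing its digits i.i.d. uniform on $\{0, \ldots, 9\}$, and the relative frequencies $\nu_k^{(n)}/n$ should all approach $1/10$.

```python
import random
from collections import Counter

random.seed(2)
n = 200_000
# First n decimal digits of a "uniformly random" omega in [0, 1]
counts = Counter(random.randrange(10) for _ in range(n))

freqs = [counts[k] / n for k in range(10)]
print([round(f, 4) for f in freqs])
assert all(abs(f - 0.1) < 0.01 for f in freqs)  # each digit frequency is near 1/10
```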
Definition: Two sequences of r.v.'s $\{X_n\}$ and $\{Y_n\}$ are said to be equivalent iff
$$\sum_{n=1}^\infty P[X_n \ne Y_n] < \infty.$$
Theorem: If $\{X_n\}$ and $\{Y_n\}$ are equivalent, then $\sum_{n=1}^\infty (X_n - Y_n)$ converges a.s. Furthermore, if $a_n \uparrow \infty$, then
$$\frac{1}{a_n} \sum_{j=1}^n (X_j - Y_j) \to 0 \text{ a.s.}$$
Proof: By Borel-Cantelli,
$$\sum_{n=1}^\infty P[X_n \ne Y_n] < \infty \quad \text{implies} \quad P[X_n \ne Y_n \text{ i.o.}] = 0.$$
Thus for a.e. $\omega$ only finitely many of the terms $X_n(\omega) - Y_n(\omega)$ are nonzero, so the series
$$\sum_{n=1}^\infty (X_n(\omega) - Y_n(\omega)) \quad \text{converges}.$$
Indeed, let
$$\sum_{n=1}^\infty (X_n(\omega) - Y_n(\omega)) = Z(\omega) \quad \forall \omega \in \Omega \setminus N, \qquad P(N) = 0.$$
Since the partial sums converge, they are bounded; in particular, for $n \ge n_0(\omega)$, $\omega \in \Omega \setminus N$,
$$\Big|\frac{1}{a_n} \sum_{j=1}^n (X_j(\omega) - Y_j(\omega))\Big| \le \frac{|Z(\omega)| + 1}{a_n}$$
$$\Rightarrow \frac{1}{a_n} \sum_{j=1}^n (X_j(\omega) - Y_j(\omega)) \to 0 \quad \forall \omega \in \Omega \setminus N.$$
In particular,
$$\frac{1}{a_n} \sum_{j=1}^n X_j \text{ converges in probability} \quad \text{iff} \quad \frac{1}{a_n} \sum_{j=1}^n Y_j \text{ converges in probability.}$$
Proof: If $\frac{1}{a_n} \sum_{j=1}^n X_j \to X$ in probability, then
$$\frac{1}{a_n} \sum_{j=1}^n Y_j = \frac{1}{a_n} \sum_{j=1}^n (Y_j - X_j) + \frac{1}{a_n} \sum_{j=1}^n X_j \to 0 + X = X \text{ in probability,}$$
since by the previous theorem $\frac{1}{a_n} \sum_{j=1}^n (Y_j - X_j) \to 0$ a.s., hence in probability.
Khintchine's WLLN: Let $\{X_n\}$ be pairwise independent and identically distributed (iid) with finite mean $\mu$. Then
$$\frac{S_n}{n} \to \mu \text{ in probability.}$$
Proof: Truncate: let $Y_j = X_j 1_{[|X_j| \le j]}$. Then $\{X_n\}$ and $\{Y_n\}$ are equivalent:
$$\sum_{n=1}^\infty P(X_n \ne Y_n) = \sum_{n=1}^\infty P[|X_n| > n] = \sum_{n=1}^\infty P[|X_1| > n] \le E|X_1| < \infty,$$
using that the $X_n$ are identically distributed with finite mean.
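The final inequality uses the standard bound $\sum_{n \ge 1} P[|X| > n] \le E|X|$; a quick numerical sanity check for an Exponential(1) variable (an illustrative choice, not from the notes):

```python
import math

# X ~ Exponential(1): P[X > n] = e^{-n}, E[X] = 1.
tail_sum = sum(math.exp(-n) for n in range(1, 200))  # geometric series, sums to ~1/(e-1)
EX = 1.0
print(tail_sum, "<=", EX)
assert tail_sum <= EX  # sum_{n>=1} P[|X| > n] <= E|X|
```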
Let $T_n = \sum_{j=1}^n Y_j$. By the equivalence just established, it suffices to show $T_n/n \to \mu$ in probability. Since the $Y_j$ are pairwise independent, hence uncorrelated,
$$\sigma^2(T_n) = \sum_{j=1}^n \sigma^2(Y_j) \le \sum_{j=1}^n E[Y_j^2] = \sum_{j=1}^n \int_{|x| \le j} x^2\, dF.$$
Choose $a_n \to \infty$ with $a_n = o(n)$ and split the sum at $a_n$, using $x^2 \le a_n |x|$ on $\{|x| \le a_n\}$ and $x^2 \le n |x|$ on $\{|x| \le n\}$:
$$\sum_{j=1}^n \int_{|x| \le j} x^2\, dF \le a_n \sum_{j \le a_n} \int_{|x| \le a_n} |x|\, dF + a_n \sum_{a_n < j \le n} \int_{|x| \le a_n} |x|\, dF + n \sum_{a_n < j \le n} \int_{a_n < |x| \le n} |x|\, dF$$
$$\le n a_n \int_{-\infty}^\infty |x|\, dF + n^2 \int_{|x| > a_n} |x|\, dF.$$
The first term is $O(n a_n) = o(n^2)$. For the second, since $E|X_1| < \infty$,
$$\int_{|x| > a_n} |x|\, dF \to 0 \text{ as } n \to \infty, \quad \text{i.e.} \quad \int_{|x| > a_n} |x|\, dF = o(1),$$
so $n^2 \cdot o(1) = o(n^2)$. Hence
$$\sigma^2(T_n) = o(n^2).$$
By Chebyshev's Inequality,
$$\frac{T_n - E[T_n]}{n} = \frac{1}{n} \sum_{j=1}^n \{Y_j - E[Y_j]\} \to 0 \text{ in probability.}$$
But $E[Y_j] \to E[X_1] = \mu$, so by Cesaro averaging
$$\frac{1}{n} \sum_{j=1}^n E[Y_j] \to \mu \quad \Rightarrow \quad \frac{T_n}{n} = \frac{1}{n} \sum_{j=1}^n Y_j \to \mu \text{ in probability.}$$
By equivalence, $S_n/n \to \mu$ in probability as well. $\square$
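A simulation sketch of the theorem in the regime it is designed for, finite mean but infinite variance (a Pareto distribution with tail index 1.5, an illustrative choice; the $L^2$ weak law above does not apply here):

```python
import random

random.seed(3)
alpha, n = 1.5, 200_000
mu = alpha / (alpha - 1)  # Pareto(alpha) mean = 3; the variance is infinite

# Pareto(alpha) on [1, inf) via inverse CDF: X = U^{-1/alpha}, U uniform on (0, 1]
s = sum((1.0 - random.random()) ** (-1.0 / alpha) for _ in range(n))
print("S_n/n =", s / n, "  mu =", mu)
assert abs(s / n - mu) < 0.5  # S_n/n is near mu despite sigma^2(X_1) = infinity
```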
Theorem: Let $\{X_n\}$ be a sequence of independent r.v.'s with distribution functions $\{F_n\}$, and let
$$S_n = \sum_{j=1}^n X_j.$$
Let $\{b_n\}$ be positive constants increasing to $\infty$, and suppose
i) $\displaystyle \sum_{j=1}^n \int_{|x| > b_n} dF_j(x) = o(1)$;
ii) $\displaystyle \frac{1}{b_n^2} \sum_{j=1}^n \int_{|x| \le b_n} x^2\, dF_j(x) = o(1)$.
Then, with
$$a_n = \sum_{j=1}^n \int_{|x| \le b_n} x\, dF_j(x),$$
we have
$$\frac{S_n - a_n}{b_n} \to 0 \text{ in probability.}$$
Conversely (under a suitable centering assumption on the $F_j$), if $(S_n - a_n)/b_n \to 0$ in probability, then i) and ii) hold.
These conditions assume that none of the distributions is too far off center; if $\lambda = \frac{1}{2}$ in the centering condition, then 0 is the median of each $F_j$.
Proof of sufficiency: Truncate at $b_n$: let $Y_{n,j} = X_j 1_{[|X_j| \le b_n]}$ for $1 \le j \le n$, and $T_n = \sum_{j=1}^n Y_{n,j}$. Condition i) becomes
$$\sum_{j=1}^n P[Y_{n,j} \ne X_j] = o(1).$$
Hence
$$P[T_n \ne S_n] \le P\Big[\bigcup_{j=1}^n \{Y_{n,j} \ne X_j\}\Big] \le \sum_{j=1}^n P[Y_{n,j} \ne X_j] = o(1).$$
By condition ii), $\sigma^2(T_n) \le \sum_{j=1}^n \int_{|x| \le b_n} x^2\, dF_j(x) = o(b_n^2)$, so by Chebyshev's inequality
$$\frac{T_n - E(T_n)}{b_n} \to 0 \text{ in probability.}$$
By construction,
$$\sum_{j=1}^n P[X_j \ne Y_{n,j}] = o(1) \quad \Rightarrow \quad P[T_n \ne S_n] \to 0 \text{ as } n \to \infty,$$
so $(S_n - E[T_n])/b_n \to 0$ in probability as well.
Finally,
$$E[T_n] = \sum_{j=1}^n E[Y_{n,j}] = \sum_{j=1}^n \int_{|x| \le b_n} x\, dF_j(x) = a_n. \qquad \square$$
Example: Let $\{X_n\}$ be independent random variables with common distribution function $F$:
$$P[X_1 = n] = P[X_1 = -n] = \frac{c}{n^2 \log n}, \quad n = 3, 4, \cdots$$
with
$$c = \frac{1}{2 \sum_{n=3}^\infty \frac{1}{n^2 \log n}}.$$
Check the normalization:
$$\sum_{\substack{n = -\infty \\ |n| \ge 3}}^{\infty} P[X_1 = n] = 2 \sum_{n=3}^\infty \frac{c}{n^2 \log n} = 2c \sum_{n=3}^\infty \frac{1}{n^2 \log n} = 1.$$
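The constant $c$ can be evaluated numerically (a sketch; the truncation point of the series is arbitrary and the neglected tail is negligible):

```python
import math

# c = 1 / (2 * sum_{n>=3} 1/(n^2 log n)), truncating the series at 2 * 10^6
series = sum(1.0 / (n * n * math.log(n)) for n in range(3, 2_000_000))
c = 1.0 / (2.0 * series)
total_mass = 2.0 * c * series  # total probability, equal to 1 by construction
print("c ~", c, " total mass =", total_mass)
assert abs(total_mass - 1.0) < 1e-12
assert 1.5 < c < 2.5  # the series sums to roughly 0.25, so c is near 2
```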
Condition i) with $b_n = n$ requires $n\, P[|X_1| > n] = o(1)$. Now
$$n \int_{|x| > n} dF = n \sum_{k > n} \frac{2c}{k^2 \log k}.$$
But
$$\sum_{k > n} \frac{1}{k^2 \log k} \sim \int_n^\infty \frac{dx}{x^2 \log x}.$$
Integrate by parts with
$$f(x) = \frac{1}{\log x}, \quad g'(x) = \frac{1}{x^2}, \quad g(x) = -\frac{1}{x}, \quad f'(x) = \frac{-1}{x \log^2 x},$$
$$\int f(x) g'(x)\, dx = f(x) g(x) - \int f'(x) g(x)\, dx:$$
$$\int_n^\infty \frac{1}{\log x} \cdot \frac{1}{x^2}\, dx = \Big[-\frac{1}{x \log x}\Big]_n^\infty - \int_n^\infty \frac{dx}{x^2 \log^2 x} = \frac{1}{n \log n} - \int_n^\infty \frac{dx}{x^2 \log^2 x}.$$
The last integral is $O\big(\frac{1}{n \log^2 n}\big)$, so
$$\sum_{k > n} \frac{1}{k^2 \log k} = \frac{1}{n \log n} + o\Big(\frac{1}{n \log n}\Big) \sim \frac{1}{n \log n}.$$
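The asymptotic $\sum_{k>n} 1/(k^2 \log k) \sim 1/(n \log n)$ can be checked numerically; the ratio approaches 1 slowly (the correction is of relative size $1/\log n$), so it is still visibly below 1 at these illustrative values of $n$:

```python
import math

def tail(n, upper=2_000_000):
    # sum_{k > n} 1/(k^2 log k), truncated; the neglected tail is O(1/upper)
    return sum(1.0 / (k * k * math.log(k)) for k in range(n + 1, upper))

ratios = {}
for n in (1_000, 10_000):
    ratios[n] = tail(n) * (n * math.log(n))  # should tend to 1 as n grows
    print(n, ratios[n])

assert all(0.8 < r < 1.0 for r in ratios.values())
```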
$$\Rightarrow \quad n \int_{|x| > n} dF = n \sum_{k > n} \frac{2c}{k^2 \log k} \sim n \cdot \frac{2c}{n \log n} = \frac{2c}{\log n} = o(1).$$
In a similar way, for condition ii) with $b_n = n$,
$$\frac{1}{n^2} \cdot n \int_{|x| \le n} x^2\, dF(x) = \frac{1}{n} \sum_{k=3}^n \frac{2c\, k^2}{k^2 \log k} = \frac{2c}{n} \sum_{k=3}^n \frac{1}{\log k} \sim \frac{2c}{\log n} = o(1).$$
Conditions i) and ii) are thus satisfied with $b_n = n$, and because of the symmetry of $F$, $a_n = 0$. Hence
$$\frac{S_n}{n} \to 0 \text{ in probability,}$$
in spite of the fact that
$$E|X_1| = \sum_{n=3}^\infty \frac{2cn}{n^2 \log n} = 2c \sum_{n=3}^\infty \frac{1}{n \log n} = \infty.$$
On the other hand,
$$P[|X_1| > n] \sim \frac{2c}{n \log n} \quad \Rightarrow \quad P[|X_n| > n] \sim \frac{2c}{n \log n}$$
(the $X_n$ are identically distributed), so
$$\sum_{n=1}^\infty P[|X_n| > n] = \infty.$$
Since the $X_n$ are independent, the second Borel-Cantelli lemma gives
$$P[|X_n| > n \text{ i.o.}] = 1.$$
But if $|X_n| > n$, then $|S_n - S_{n-1}| = |X_n| > n$, so
$$|S_n| > \frac{n}{2} \quad \text{or} \quad |S_{n-1}| > \frac{n}{2}$$
$$\Rightarrow \quad P\Big[|S_n| > \frac{n}{2} \text{ i.o.}\Big] = 1 \quad \Rightarrow \quad \frac{S_n}{n} \not\to 0 \text{ a.s.}$$
Thus we have convergence in probability but not convergence a.s.
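Both halves of the example can be probed numerically (a sketch; the series are truncated at an arbitrary large cutoff): $n\, P[|X_1| > n]$ decays like $2c/\log n$, which is why the WLLN applies with $b_n = n$, while the decay is slow enough that $\sum_n P[|X_n| > n]$ diverges.

```python
import math

CUT = 2_000_000  # arbitrary truncation point for the series

series = sum(1.0 / (k * k * math.log(k)) for k in range(3, CUT))
c = 1.0 / (2.0 * series)

def n_times_tail_prob(n):
    # n * P[|X_1| > n] = 2cn * sum_{k>n} 1/(k^2 log k), which is ~ 2c/log n
    return n * 2.0 * c * sum(1.0 / (k * k * math.log(k)) for k in range(n + 1, CUT))

a, b = n_times_tail_prob(100), n_times_tail_prob(10_000)
print(a, b)          # slowly decaying toward 0: condition i) holds, but barely
assert b < a < 2.0   # decreasing, of order 2c/log n
```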