
Central Limit Theorem and convergence to stable laws in Mallows distance

Oliver Johnson and Richard Samworth Statistical Laboratory, University of Cambridge, Wilberforce Road, Cambridge, CB3 0WB, UK.

October 3, 2005

Running Title: CLT and stable convergence in Mallows distance
Keywords: Central Limit Theorem, Mallows distance, probability metric, stable law, Wasserstein distance

Abstract

We give a new proof of the classical Central Limit Theorem, in the Mallows ($L^r$-Wasserstein) distance. Our proof is elementary in the sense that it does not require complex analysis, but rather makes use of a simple subadditive inequality related to this metric. The key is to analyse the case where equality holds. We provide some results concerning rates of convergence. We also consider convergence to stable distributions, and obtain a bound on the rate of such convergence.

1 Introduction and main results

The spirit of the Central Limit Theorem, that normalised sums of independent random variables converge to a normal distribution, can be understood in different senses, according to the distance used. For example, in addition to the standard Central Limit Theorem in the sense of weak convergence, we mention the proofs in Prohorov (1952) of $L^1$ convergence of densities, in Gnedenko and Kolmogorov (1954) of $L^\infty$ convergence of densities, in Barron (1986) of convergence in relative entropy, and in Shimizu (1975) and Johnson and Barron (2004) of convergence in Fisher information.

In this paper we consider the Central Limit Theorem with respect to the Mallows distance and prove convergence to stable laws in the infinite variance setting. We study the rates of convergence in both cases.

Definition 1.1 For any $r > 0$, we define the Mallows $r$-distance between probability distribution functions $F_X$ and $F_Y$ as
$$d_r(F_X, F_Y) = \inf_{(X,Y)} \left( E|X - Y|^r \right)^{1/r},$$
where the infimum is taken over pairs $(X, Y)$ whose marginal distribution functions are $F_X$ and $F_Y$ respectively, and may be infinite. Where it causes no confusion, we write $d_r(X, Y)$ for $d_r(F_X, F_Y)$.

Define $\mathcal{F}_r$ to be the set of distribution functions $F$ such that $\int |x|^r \, dF(x) < \infty$. Bickel and Freedman (1981) show that for $r \geq 1$, $d_r$ is a metric on $\mathcal{F}_r$. If $r < 1$, then $d_r^r$ is a metric on $\mathcal{F}_r$. In considering stable convergence, we shall also be concerned with the case where the absolute $r$th moments are not finite.
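As an informal illustration (not part of the original paper), the distance in Definition 1.1 between two samples of equal size can be estimated by coupling their order statistics, since on the real line the infimum is attained by the quantile coupling $(F_X^{-1}(U), F_Y^{-1}(U))$ for $r \geq 1$ (see Lemma 2.3 below). The distributions and sample size in this sketch are arbitrary choices.

```python
import numpy as np

def mallows_distance(x, y, r=2.0):
    """Estimate d_r between two equal-size samples via the quantile coupling,
    i.e. by matching order statistics (optimal for r >= 1 by Lemma 2.3)."""
    x, y = np.sort(x), np.sort(y)
    return np.mean(np.abs(x - y) ** r) ** (1.0 / r)

rng = np.random.default_rng(0)
a = rng.exponential(size=100_000) - 1.0      # a mean-zero, skewed sample
b = rng.normal(size=100_000)                 # a N(0,1) sample
print(mallows_distance(a, b, r=2.0))         # approximately d_2 of the two laws
```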

Throughout the paper, we write $Z_{\mu,\sigma^2}$ for a $N(\mu, \sigma^2)$ random variable, $Z_{\sigma^2}$ for a $N(0, \sigma^2)$ random variable, and $\Phi_{\mu,\sigma^2}$ and $\Phi_{\sigma^2}$ for their respective distribution functions. We establish the following main theorems:

Theorem 1.2 Let $X_1, X_2, \ldots$ be independent and identically distributed random variables with mean zero and finite variance $\sigma^2 > 0$, and let $S_n = (X_1 + \cdots + X_n)/\sqrt{n}$. Then
$$\lim_{n \to \infty} d_2(S_n, Z_{\sigma^2}) = 0.$$

Moreover, Theorem 3.2 shows that for any $r \geq 2$, if $d_r(X_i, Z_{\sigma^2}) < \infty$, then $\lim_{n \to \infty} d_r(S_n, Z_{\sigma^2}) = 0$. Theorem 1.2 implies the standard Central Limit Theorem in the sense of weak convergence (Bickel and Freedman 1981, Lemma 8.3).
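The following Monte Carlo sketch (not from the paper; the uniform summands, sample sizes and replication count are arbitrary choices) illustrates Theorem 1.2 by estimating $d_2(S_n, Z_{\sigma^2})$ from the empirical quantiles of simulated values of $S_n$, coupled with the corresponding normal quantiles.

```python
import numpy as np
from scipy import stats

def d2_to_normal(sample, sigma):
    """Estimate d_2(law of sample, N(0, sigma^2)) by coupling empirical quantiles
    with the corresponding normal quantiles (the optimal coupling)."""
    m = len(sample)
    u = (np.arange(1, m + 1) - 0.5) / m           # plotting positions
    z = sigma * stats.norm.ppf(u)                  # N(0, sigma^2) quantiles
    return np.sqrt(np.mean((np.sort(sample) - z) ** 2))

rng = np.random.default_rng(1)
sigma = np.sqrt(1 / 12)                            # sd of a U(0,1) summand
for n in (1, 4, 16, 64):
    X = rng.uniform(size=(20_000, n)) - 0.5        # mean-zero i.i.d. summands
    S_n = X.sum(axis=1) / np.sqrt(n)
    print(n, d2_to_normal(S_n, sigma))             # decreases towards 0, up to Monte Carlo error
```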

Theorem 1.3 Fix $\alpha \in (0, 2)$, and let $X_1, X_2, \ldots$ be independent random variables (where $EX_i = 0$ if $\alpha > 1$), and $S_n = (X_1 + \cdots + X_n)/n^{1/\alpha}$. If there exists an $\alpha$-stable random variable $Y$ such that $\sup_i d_\beta(X_i, Y) < \infty$ for some $\beta \in (\alpha, 2]$, then $\lim_{n \to \infty} d_\beta(S_n, Y) = 0$. In fact
$$d_\beta(S_n, Y) \leq 2^{1/\beta}\, n^{1/\beta - 1/\alpha} \left( \frac{1}{n} \sum_{i=1}^n d_\beta^\beta(X_i, Y) \right)^{1/\beta},$$
so in the identically distributed case the rate of convergence is $O(n^{1/\beta - 1/\alpha})$.

See also Rachev and Rüschendorf (1992, 1994), who obtain similar results using different techniques in the case of identically distributed $X_i$ and strictly symmetric $Y$. In Lemma 5.3 we exhibit a large class $C_K$ of distribution functions $F_X$ for which $d_\beta(X, Y) \leq K$, so the theorem can be applied.
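The rate $O(n^{1/\beta - 1/\alpha})$ can be made concrete with a hypothetical construction that is not taken from the paper: suppose $X_i = Y_i + \varepsilon_i$, where the $Y_i$ are independent copies of the $\alpha$-stable variable $Y$ and $\varepsilon_i \sim N(0, \delta^2)$ independently. Coupling $X_i$ with $Y_i$ shows $d_\beta(X_i, Y) \leq (E|\varepsilon_i|^\beta)^{1/\beta} < \infty$, so the theorem applies; the sketch below simply evaluates the resulting bound, with $\alpha$, $\beta$ and $\delta$ chosen arbitrarily.

```python
import numpy as np
from scipy.special import gamma

alpha, beta, delta = 1.5, 1.8, 1.0   # assumed: alpha < beta <= 2

def abs_normal_moment(p, s):
    """E|N(0, s^2)|^p in closed form."""
    return s ** p * 2 ** (p / 2) * gamma((p + 1) / 2) / np.sqrt(np.pi)

# Coupling X_i = Y_i + eps_i with Y_i gives d_beta(X_i, Y) <= (E|eps_i|^beta)^(1/beta).
d_beta_const = abs_normal_moment(beta, delta) ** (1 / beta)

for n in (10, 100, 1_000, 10_000):
    # Bound of Theorem 1.3 in the identically distributed case:
    bound = 2 ** (1 / beta) * n ** (1 / beta - 1 / alpha) * d_beta_const
    print(n, bound)                   # decays like n^{1/beta - 1/alpha}
```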

Theorem 1.2 follows by understanding the subadditivity of $d_2^2(S_n, Z_{\sigma^2})$ (see Equation (4)). We consider the powers-of-two subsequence $T_k = S_{2^k}$, and use Rényi's method, introduced in Rényi (1961) to provide a proof of convergence to equilibrium of Markov chains; see also Kendall (1963). This technique was also used in Csiszár (1965) to show convergence to Haar measure for convolutions of measures on compact groups, and in Shimizu (1975) to show convergence of Fisher information in the Central Limit Theorem. The method has four stages:

1. Consider independent and identically distributed random variables $X_1$ and $X_2$ with mean $\mu$ and variance $\sigma^2 > 0$, and write $D(X)$ for $d_2^2(X, Z_{\mu,\sigma^2})$. In Proposition 2.4, we observe that
$$D\left(\frac{X_1 + X_2}{\sqrt{2}}\right) \leq D(X_1), \qquad (1)$$
with equality if and only if $X_1, X_2 \sim Z_{\mu,\sigma^2}$. Hence $D(T_k)$ is decreasing and bounded below, so converges to some limit $D_\infty$, say.

2. In Proposition 2.5, we use a compactness argument to show that there exists a strictly increasing sequence $(k_r)$ and a random variable $T$ such that
$$\lim_{r \to \infty} D(T_{k_r}) = D(T).$$
Further,
$$\lim_{r \to \infty} D(T_{k_r + 1}) = \lim_{r \to \infty} D\left(\frac{T_{k_r} + T_{k_r}'}{\sqrt{2}}\right) = D\left(\frac{T + T'}{\sqrt{2}}\right),$$
where the $T_{k_r}'$ and $T'$ are independent copies of $T_{k_r}$ and $T$ respectively.

3. We combine these two results: since $D(T_{k_r})$ and $D(T_{k_r + 1})$ are both subsequences of the convergent sequence $D(T_k)$, they must have a common limit. That is,
$$D_\infty = D(T) = D\left(\frac{T + T'}{\sqrt{2}}\right),$$
so by the condition for equality in Proposition 2.4, we deduce that $T \sim N(0, \sigma^2)$ and $D_\infty = 0$.

4. Proposition 2.4 implies the standard subadditive relation
$$(m + n) D(S_{m+n}) \leq m D(S_m) + n D(S_n).$$
Now Theorem 6.6.1 of Hille (1948) implies that $D(S_n)$ converges to $\inf_n D(S_n) = 0$.
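To spell out how Proposition 2.4 yields the relation in stage 4: writing $S_{m+n} = \sqrt{\tfrac{m}{m+n}}\, S_m + \sqrt{\tfrac{n}{m+n}}\, S_n'$, where $S_n'$ is an independent copy of $S_n$ built from $X_{m+1}, \ldots, X_{m+n}$, Proposition 2.4 with $t = m/(m+n)$ gives
$$D(S_{m+n}) \leq \frac{m}{m+n} D(S_m) + \frac{n}{m+n} D(S_n),$$
and multiplying through by $m + n$ gives the displayed inequality.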

The proof of Theorem 1.3 is given in Section 5.

2 Subadditivity of Mallows distance

The Mallows distance and related metrics originated with a transportation problem posed by Monge in 1781 (Rachev 1984; Dudley 1989, pp. 329–330). Kantorovich generalised this problem, and considered the distance obtained by minimising $Ec(X, Y)$, for a general metric $c$ (known as the cost function), over all joint distributions of pairs $(X, Y)$ with fixed marginals. This distance is also known as the Wasserstein metric. Rachev (1984) reviews applications to differential geometry, infinite-dimensional linear programming and information theory, among many others. Mallows (1972) focused on the metric which we have called $d_2$, while $d_1$ is sometimes called the Gini index.

In Lemma 2.3 below, we review the existence and uniqueness of the construction which attains the infimum in Definition 1.1, using the concept of a quasi-monotone function.

Definition 2.1 A function $k: \mathbb{R}^2 \to \mathbb{R}$ induces a signed measure $\mu_k$ on $\mathbb{R}^2$ given by
$$\mu_k\{(x, x'] \times (y, y']\} = k(x', y') + k(x, y) - k(x, y') - k(x', y).$$
We say that $k$ is quasi-monotone if $\mu_k$ is a non-negative measure.


The function $k(x, y) = -|x - y|^r$ is quasi-monotone for $r \geq 1$, and if $r > 1$ then the measure $\mu_k$ is absolutely continuous, with a density which is positive Lebesgue almost everywhere. Tchen (1980, Corollary 2.1) gives the following result, a two-dimensional version of integration by parts.

Lemma 2.2 Let $k(x, y)$ be a quasi-monotone function and let $H_1(x, y)$ and $H_2(x, y)$ be distribution functions with the same marginals, where $H_1(x, y) \leq H_2(x, y)$ for all $x, y$. Suppose there exists an $H_1$- and $H_2$-integrable function $g(x, y)$, bounded on compact sets, such that $|k(x_B, y_B)| \leq g(x, y)$, where $x_B = (-B) \vee (x \wedge B)$. Then
$$\int k(x, y)\, dH_2(x, y) - \int k(x, y)\, dH_1(x, y) = \int \{\bar{H}_2(x, y) - \bar{H}_1(x, y)\}\, d\mu_k(x, y).$$
Here $\bar{H}_i(x, y) = P(X < x, Y < y)$, where $(X, Y)$ have joint distribution function $H_i$.

Lemma 2.3 For $r \geq 1$, consider the joint distribution of pairs $(X, Y)$ where $X$ and $Y$ have fixed marginals $F_X$ and $F_Y$, both in $\mathcal{F}_r$. Then
$$E|X - Y|^r \geq E|X^* - Y^*|^r, \qquad (2)$$
where $X^* = F_X^{-1}(U)$, $Y^* = F_Y^{-1}(U)$ and $U \sim U(0, 1)$. For $r > 1$, equality is attained only if $(X, Y) \sim (X^*, Y^*)$.

Proof Observe, as in Fréchet (1951), that if the random variables $X, Y$ have fixed marginals $F_X$ and $F_Y$, then
$$P(X \leq x, Y \leq y) \leq H^+(x, y), \qquad (3)$$
where $H^+(x, y) = \min(F_X(x), F_Y(y))$. This bound is achieved by taking $U \sim U(0, 1)$ and setting $X^* = F_X^{-1}(U)$, $Y^* = F_Y^{-1}(U)$.

Thus, by Lemma 2.2, with $k(x, y) = -|x - y|^r$, for $r \geq 1$, and taking $H_1(x, y) = P(X \leq x, Y \leq y)$ and $H_2 = H^+$, we deduce that
$$E|X - Y|^r - E|X^* - Y^*|^r = \int \{H^+(x, y) - H_1(x, y)\}\, d\mu_k(x, y) \geq 0,$$
so $(X^*, Y^*)$ achieves the infimum in the definition of the Wasserstein distance.

Finally, since taking $r > 1$ implies that the measure $\mu_k$ has a strictly positive density with respect to Lebesgue measure, we can only have equality in (2) if $P(X \leq x, Y \leq y) = \min\{F_X(x), F_Y(y)\}$ Lebesgue almost everywhere. But the joint distribution function is right-continuous, so this condition determines the value of $P(X \leq x, Y \leq y)$ everywhere.


Using the construction in Lemma 2.3, Bickel and Freedman (1981) establish that if $X_1$ and $X_2$ are independent and $Y_1$ and $Y_2$ are independent, then
$$d_2^2(X_1 + X_2, Y_1 + Y_2) \leq d_2^2(X_1, Y_1) + d_2^2(X_2, Y_2). \qquad (4)$$
Similar subadditive expressions arise in the proof of convergence of Fisher information in Johnson and Barron (2004). By focusing on the case $r = 2$ in Definition 1.1, and by using the theory of $L^2$ spaces and projections, we establish parallels with the Fisher information argument.

We prove Equation (4) below, and further consider the case of equality in this relation. Major (1978, p. 504) gives an equivalent construction to that given in Lemma 2.3. If $F_Y$ is a continuous distribution function, then $F_Y(Y) \sim U(0, 1)$, so we generate $Y \sim F_Y$ and take $X^* = F_X^{-1}(F_Y(Y))$.

Recall that if $EX = \mu$ and $\mathrm{Var}\, X = \sigma^2$, we write $D(X)$ for $d_2^2(X, Z_{\mu,\sigma^2})$.
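As a quick numerical sanity check of Equation (4) (not part of the paper; the exponential and normal choices below are arbitrary), one can estimate both sides from large samples using the quantile coupling:

```python
import numpy as np

def d2_sq(x, y):
    """Estimate d_2^2 between two equal-size samples via the quantile coupling."""
    return np.mean((np.sort(x) - np.sort(y)) ** 2)

rng = np.random.default_rng(2)
m = 200_000
X1, X2 = rng.exponential(size=m) - 1, rng.exponential(size=m) - 1   # independent, mean zero
Y1, Y2 = rng.normal(size=m), rng.normal(size=m)                     # independent normals

lhs = d2_sq(X1 + X2, Y1 + Y2)           # d_2^2(X_1 + X_2, Y_1 + Y_2)
rhs = d2_sq(X1, Y1) + d2_sq(X2, Y2)     # d_2^2(X_1, Y_1) + d_2^2(X_2, Y_2)
print(lhs, "<=", rhs)                    # Equation (4), up to Monte Carlo error
```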

Proposition 2.4 If $X_1, X_2$ are independent, with finite variances $\sigma_1^2, \sigma_2^2 > 0$, then for any $t \in (0, 1)$,
$$D\left(\sqrt{t}\, X_1 + \sqrt{1 - t}\, X_2\right) \leq t D(X_1) + (1 - t) D(X_2),$$
with equality if and only if $X_1$ and $X_2$ are normal.

Proof We consider bounding $D(X_1 + X_2)$ for independent $X_1$ and $X_2$ with mean zero, since the general result follows on translation and rescaling. We generate independent $Y_i \sim N(0, \sigma_i^2)$, and take $X_i^* = F_{X_i}^{-1}(\Phi_{\sigma_i^2}(Y_i)) = h_i(Y_i)$, say, for $i = 1, 2$. Further, writing $\sigma^2 = \sigma_1^2 + \sigma_2^2$, we define $Y = Y_1 + Y_2$ and set $X^{**} = F_{X_1 + X_2}^{-1}(\Phi_{\sigma^2}(Y_1 + Y_2)) = h(Y_1 + Y_2)$, say. Then
$$d_2^2(X_1 + X_2, Y_1 + Y_2) = E(X^{**} - Y)^2 \leq E(X_1^* + X_2^* - Y_1 - Y_2)^2 = E(X_1^* - Y_1)^2 + E(X_2^* - Y_2)^2 = d_2^2(X_1, Y_1) + d_2^2(X_2, Y_2).$$

Equality holds if and only if $(X_1^* + X_2^*, Y_1 + Y_2)$ has the same distribution as $(X^{**}, Y)$. By our construction of $Y = Y_1 + Y_2$, this means that $(X_1^* + X_2^*, Y_1 + Y_2)$ has the same distribution as $(X^{**}, Y_1 + Y_2)$, so $P\{X_1^* + X_2^* = h(Y_1 + Y_2)\} = P\{X^{**} = h(Y_1 + Y_2)\} = 1$. Thus, if equality holds, then
$$h_1(Y_1) + h_2(Y_2) = h(Y_1 + Y_2) \quad \text{almost surely}. \qquad (5)$$

Brown (1982) and Johnson and Barron (2004) showed that equality holds in Equation (5) if and only if $h, h_1, h_2$ are linear. In particular, Proposition 2.1 of Johnson and Barron (2004) implies that there exist constants $a_i$ and $b_i$ such that
$$E\{h(Y_1 + Y_2) - h_1(Y_1) - h_2(Y_2)\}^2 \geq \frac{2\sigma_1^2 \sigma_2^2}{(\sigma_1^2 + \sigma_2^2)^2}\left[ E\{h_1(Y_1) - a_1 Y_1 - b_1\}^2 + E\{h_2(Y_2) - a_2 Y_2 - b_2\}^2 \right]. \qquad (6)$$
Hence, if Equation (5) holds, then $h_i(u) = a_i u + b_i$ almost everywhere. Since $Y_i$ and $X_i^*$ have the same mean and variance, it follows that $a_i = 1$, $b_i = 0$. Hence $h_1(u) = h_2(u) = u$ and $X_i^* = Y_i$.

Recall that $T_k = S_{2^k}$, where $S_n = (X_1 + \cdots + X_n)/\sqrt{n}$ is a normalised sum of independent and identically distributed random variables of mean zero and finite variance $\sigma^2$.

Proposition 2.5 There exists a strictly increasing sequence $(k_r) \subseteq \mathbb{N}$ and a random variable $T$ such that
$$\lim_{r \to \infty} D(T_{k_r}) = D(T).$$
If $T_{k_r}'$ and $T'$ are independent copies of $T_{k_r}$ and $T$ respectively, then
$$\lim_{r \to \infty} D(T_{k_r + 1}) = \lim_{r \to \infty} D\left(\frac{T_{k_r} + T_{k_r}'}{\sqrt{2}}\right) = D\left(\frac{T + T'}{\sqrt{2}}\right).$$

Proof Since $\mathrm{Var}(T_k) = \sigma^2$ for all $k$, the sequence $(T_k)$ is tight. Therefore, by Prohorov's theorem, there exists a strictly increasing sequence $(k_r)$ and a random variable $T$ such that

$$T_{k_r} \stackrel{d}{\to} T \qquad (7)$$
as $r \to \infty$. Moreover, the proof of Lemma 5.2 of Brown (1982) shows that the sequence $(T_{k_r}^2)$ is uniformly integrable. But this, combined with Equation (7), implies that $\lim_{r \to \infty} d_2(T_{k_r}, T) = 0$ (Bickel and Freedman 1981, Lemma 8.3(b)). Hence
$$D(T_{k_r}) = d_2^2(T_{k_r}, Z_{\sigma^2}) \leq \{d_2(T_{k_r}, T) + d_2(T, Z_{\sigma^2})\}^2 \to d_2^2(T, Z_{\sigma^2}) = D(T)$$
as $r \to \infty$. Similarly, $d_2^2(T, Z_{\sigma^2}) \leq \{d_2(T, T_{k_r}) + d_2(T_{k_r}, Z_{\sigma^2})\}^2$, yielding the opposite inequality. This proves the first part of the proposition.

For the second part, it suffices to observe that $T_{k_r} + T_{k_r}' \stackrel{d}{\to} T + T'$ as $r \to \infty$, and $E(T_{k_r} + T_{k_r}')^2 \to E(T + T')^2$, and then use the same argument as in the first part of the proposition.

Combining Propositions 2.4 and 2.5, as described in Section 1, the proof of Theorem 1.2 is now complete.

3 Convergence of $d_r$ for general $r$

The subadditive inequality (4) arises in part from a moment inequality; that is, if $X_1$ and $X_2$ are independent with mean zero, then $E|X_1 + X_2|^r \leq E|X_1|^r + E|X_2|^r$ for $r = 2$. Similar results imply that for $r \geq 2$, we have $\lim_{n \to \infty} d_r(S_n, Z_{\sigma^2}) = 0$. First, we prove the following lemma:

Lemma 3.1 Consider independent random variables $V_1, V_2, \ldots$ and $W_1, W_2, \ldots$, where for some $r \geq 2$ and for all $i$, $E|V_i|^r < \infty$ and $E|W_i|^r < \infty$. Then for any $m$, there exists a constant $c(r)$ such that
$$d_r^r(V_1 + \cdots + V_m, W_1 + \cdots + W_m) \leq c(r)\left[ \sum_{i=1}^m d_r^r(V_i, W_i) + \left( \sum_{i=1}^m d_2^2(V_i, W_i) \right)^{r/2} \right].$$

Proof We consider independent $U_i \sim U(0, 1)$, and set $V_i^* = F_{V_i}^{-1}(U_i)$ and $W_i^* = F_{W_i}^{-1}(U_i)$. Then
$$d_r^r(V_1 + \cdots + V_m, W_1 + \cdots + W_m) \leq E\left| \sum_{i=1}^m (V_i^* - W_i^*) \right|^r \leq c(r)\left[ \sum_{i=1}^m E|V_i^* - W_i^*|^r + \left( \sum_{i=1}^m E|V_i^* - W_i^*|^2 \right)^{r/2} \right],$$
as required. The final line is an application of Rosenthal's inequality (Petrov 1995, Theorem 2.9) to the sequence $(V_i^* - W_i^*)$.

Using Lemma 3.1, we establish the following theorem.

Theorem 3.2 Let $X_1, X_2, \ldots$ be independent and identically distributed random variables with mean zero, variance $\sigma^2 > 0$ and $E|X_1|^r < \infty$ for some $r \geq 2$. If $S_n = (X_1 + \cdots + X_n)/\sqrt{n}$, then
$$\lim_{n \to \infty} d_r(S_n, Z_{\sigma^2}) = 0.$$

Proof Theorem 1.2 covers the case $r = 2$, so we need only consider $r > 2$. We use a scaled version of Lemma 3.1 twice. First, we use $V_i = X_i$, $W_i \sim N(0, \sigma^2)$ and $m = n$, in order to deduce that, by monotonicity of the $r$-norms,
$$d_r^r(S_n, Z_{\sigma^2}) \leq c(r)\left[ n^{1 - r/2} d_r^r(X_1, Z_{\sigma^2}) + d_2^2(X_1, Z_{\sigma^2})^{r/2} \right] \leq c(r)\left( n^{1 - r/2} + 1 \right) d_r^r(X_1, Z_{\sigma^2}),$$
so that $d_r^r(S_n, Z_{\sigma^2})$ is uniformly bounded in $n$, by $K$, say. Then, for general $n$, define $N = \lfloor \sqrt{n} \rfloor$, take $m = \lceil n/N \rceil$, and $u = n - (m - 1)N \leq N$. In Lemma 3.1, take
$$V_i = X_{(i-1)N + 1} + \cdots + X_{iN}, \quad \text{for } i = 1, \ldots, m - 1,$$
$$V_m = X_{(m-1)N + 1} + \cdots + X_n,$$
and $W_i \sim N(0, N\sigma^2)$ for $i = 1, \ldots, m - 1$, $W_m \sim N(0, u\sigma^2)$, independently. Now the uniform bound above gives, on rescaling,
$$d_r^r(V_i, W_i) = N^{r/2} d_r^r(S_N, Z_{\sigma^2}) \leq N^{r/2} K \quad \text{for } i = 1, \ldots, m - 1,$$
and $d_r^r(V_m, W_m) = u^{r/2} d_r^r(S_u, Z_{\sigma^2}) \leq N^{r/2} K$. Further, $d_2^2(V_i, W_i) = N d_2^2(S_N, Z_{\sigma^2})$ for $i = 1, \ldots, m - 1$ and $d_2^2(V_m, W_m) = u\, d_2^2(S_u, Z_{\sigma^2}) \leq N d_2^2(S_1, Z_{\sigma^2})$. Hence, using Lemma 3.1 again, we obtain
$$d_r^r(S_n, Z_{\sigma^2}) = \frac{1}{n^{r/2}}\, d_r^r(V_1 + \cdots + V_m, W_1 + \cdots + W_m) \leq \frac{c(r)}{n^{r/2}}\left[ \sum_{i=1}^m d_r^r(V_i, W_i) + \left( \sum_{i=1}^m d_2^2(V_i, W_i) \right)^{r/2} \right]$$
$$\leq c(r)\left[ \frac{mK N^{r/2}}{n^{r/2}} + \left( \frac{N(m-1)}{n}\, d_2^2(S_N, Z_{\sigma^2}) + \frac{N}{n}\, d_2^2(S_1, Z_{\sigma^2}) \right)^{r/2} \right]$$
$$\leq c(r)\left[ \frac{mK}{(m-1)^{r/2}} + \left( d_2^2(S_N, Z_{\sigma^2}) + \frac{1}{m-1}\, d_2^2(S_1, Z_{\sigma^2}) \right)^{r/2} \right].$$
This converges to zero since $\lim_{n \to \infty} d_2(S_N, Z_{\sigma^2}) = 0$.

4 Strengthening subadditivity

Under certain conditions, we obtain a rate for the convergence in Theorem 1.2. Equation (1) shows that $D(T_k)$ is decreasing. Since $D(T_k)$ is bounded below, the difference sequence $D(T_k) - D(T_{k+1})$ converges to zero. As in Johnson and Barron (2004), we examine this difference sequence, to show that its convergence implies convergence of $D(T_k)$ to zero.

Further, in the spirit of Johnson and Barron (2004), we hope that if the difference sequence is small, then equality 'nearly' holds in Equation (5), and so the functions $h, h_1, h_2$ are 'nearly' linear. This implies that if $\mathrm{Cov}(X, Y)$ is close to its maximum, then $X$ is close to $h(Y)$ in the $L^2$ sense.

Following del Barrio et al. (1999), we define a new distance quantity $D^*(X) = \inf_{m, s^2} d_2^2(X, Z_{m, s^2})$. Notice that $D(X) = 2\sigma^2 - 2\sigma k \leq 2\sigma^2$, where $k = \int_0^1 F_X^{-1}(x) \Phi_1^{-1}(x)\,dx$. This follows since $F_X^{-1}$ and $\Phi_1^{-1}$ are increasing functions, so $k \geq 0$ by Chebyshev's rearrangement lemma. Using results of del Barrio et al. (1999), it follows that
$$D^*(X) = \sigma^2 - k^2 = D(X) - \frac{D(X)^2}{4\sigma^2},$$
and convergence of $D(S_n)$ to zero is equivalent to convergence of $D^*(S_n)$ to zero.
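The identity above can be checked numerically; the following sketch (an illustration, not from the paper) does so for a mean-zero $\mathrm{Exp}(1)$ variable, approximating $D(X)$ and $k$ by their quantile integrals over a fine grid.

```python
import numpy as np
from scipy import stats

p = np.linspace(1e-6, 1 - 1e-6, 1_000_001)   # grid on (0,1)
q = -np.log1p(-p) - 1.0                      # F_X^{-1}(p) for X = Exp(1) - 1
z = stats.norm.ppf(p)                        # Phi_1^{-1}(p)
sigma = 1.0                                  # standard deviation of Exp(1)

D = np.mean((q - sigma * z) ** 2)            # D(X) = d_2^2(X, Z_{sigma^2})
k = np.mean(q * z)                           # k = int_0^1 F_X^{-1} Phi_1^{-1}
print(sigma ** 2 - k ** 2, D - D ** 2 / (4 * sigma ** 2))   # agree up to discretisation error
```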

Proposition 4.1 Let $X_1$ and $X_2$ be independent and identically distributed random variables with mean $\mu$, variance $\sigma^2 > 0$ and densities (with respect to Lebesgue measure). Defining $g(u) = \Phi_{\mu,\sigma^2}^{-1}\bigl(F_{(X_1 + X_2)/\sqrt{2}}(u)\bigr)$, if the derivative $g'(u) \geq c$ for all $u$ then
$$D\left(\frac{X_1 + X_2}{\sqrt{2}}\right) \leq \left(1 - \frac{c}{2}\right)D(X_1) + \frac{c\,D(X_1)^2}{8\sigma^2} \leq \left(1 - \frac{c}{4}\right)D(X_1).$$

Proof As before, translation invariance allows us to take $EX_i = 0$. For random variables $X, Y$, we consider the difference between the two sides of Equation (3), and write $g(u) = F_Y^{-1}(F_X(u))$ and $h(u) = g^{-1}(u)$. The function $k(x, y) = -\{x - h(y)\}^2$ is quasi-monotone and induces the measure $d\mu_k(x, y) = 2h'(y)\,dx\,dy$. Taking $H_1(x, y) = P(X \leq x, Y \leq y)$ and $H_2(x, y) = \min\{F_X(x), F_Y(y)\}$ in Lemma 2.2 implies that
$$E\{X - h(Y)\}^2 = \int\!\!\int 2h'(y)\{H_2(x, y) - H_1(x, y)\}\,dx\,dy,$$
since $E\{X^* - h(Y)\}^2 = 0$. By assumption $h'(y) \leq 1/c$, so
$$E\{X - h(Y)\}^2 \leq \frac{2}{c}\{\mathrm{Cov}(X^*, Y) - \mathrm{Cov}(X, Y)\}.$$

Again take $Y_1, Y_2$ independent $N(0, \sigma^2)$ and set $X_i^* = F_{X_i}^{-1}(F_{Y_i}(Y_i)) = h_i(Y_i)$. Then define $Y = Y_1 + Y_2$ and take $X^{**} = F_{X_1 + X_2}^{-1}(F_{Y_1 + Y_2}(Y))$. Then there exist $a$ and $b$ such that
$$d_2^2(X_1, Y_1) + d_2^2(X_2, Y_2) - d_2^2(X_1 + X_2, Y_1 + Y_2) = E(X_1^* + X_2^* - Y_1 - Y_2)^2 - E(X^{**} - Y)^2$$
$$= 2\,\mathrm{Cov}(X^{**}, Y) - 2\,\mathrm{Cov}(X_1^* + X_2^*, Y_1 + Y_2) \geq c\,E\{X_1^* + X_2^* - h(Y_1 + Y_2)\}^2$$
$$= c\,E\{h_1(Y_1) + h_2(Y_2) - h(Y_1 + Y_2)\}^2 \geq c\,E\{h_1(Y_1) - aY_1 - b\}^2 \geq c\,D^*(X_1),$$
where the penultimate inequality follows by Equation (6). Recall that $D(X) \leq 2\sigma^2$, so that $D^*(X) = D(X) - D(X)^2/(4\sigma^2) \geq D(X)/2$. The result follows on rescaling.


We briefly discuss the strength of the condition imposed. If $X$ has mean zero, distribution function $F_X$ and continuous density $f_X$, define the scale-invariant quantity
$$C(X) = \inf_u (\Phi_{\sigma^2}^{-1} \circ F_X)'(u) = \inf_{p \in (0,1)} \frac{f_X(F_X^{-1}(p))}{\phi_{\sigma^2}(\Phi_{\sigma^2}^{-1}(p))} = \inf_{p \in (0,1)} \frac{\sigma f_X(F_X^{-1}(p))}{\phi_1(\Phi_1^{-1}(p))}.$$
We want to understand when $C(X) > 0$.

Example 4.2 If $X \sim U(0, 1)$, then $C(X) = 1/\{\sqrt{12}\, \sup_x \phi(x)\} = \sqrt{\pi/6}$.
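The constant in Example 4.2 is easy to confirm numerically (an illustrative check, not from the paper):

```python
import numpy as np
from scipy import stats

# For X ~ U(0,1): sigma = 1/sqrt(12) and f_X = 1 on (0,1), so
# C(X) = inf_p sigma / phi(Phi^{-1}(p)), attained at p = 1/2.
sigma = np.sqrt(1 / 12)
p = np.linspace(1e-6, 1 - 1e-6, 100_001)
ratio = sigma / stats.norm.pdf(stats.norm.ppf(p))
print(ratio.min(), np.sqrt(np.pi / 6))       # both approximately 0.7236
```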

Lemma 4.3 If $X$ has mean zero and variance $\sigma^2$ then $C(X)^2 \leq \sigma^2/\{\sigma^2 + \mathrm{median}(X)^2\}$.

Proof By the Mean Value Inequality, for all $p$,
$$|\Phi_{\sigma^2}^{-1}(p)| = |\Phi_{\sigma^2}^{-1}(p) - \Phi_{\sigma^2}^{-1}(1/2)| \geq C(X)\,|F_X^{-1}(p) - F_X^{-1}(1/2)|,$$
so that
$$\sigma^2 + F_X^{-1}(1/2)^2 = \int_0^1 F_X^{-1}(p)^2\,dp + F_X^{-1}(1/2)^2 = \int_0^1 \{F_X^{-1}(p) - F_X^{-1}(1/2)\}^2\,dp \leq C(X)^{-2}\int_0^1 \Phi_{\sigma^2}^{-1}(p)^2\,dp = \sigma^2 C(X)^{-2}.$$

In general we are concerned with the rate at which $f_X(x) \to 0$ at the edges of the support.

Lemma 4.4 If for some $\varepsilon > 0$,
$$f_X(F_X^{-1}(p)) \geq c(1 - p)^{1 - \varepsilon} \quad \text{as } p \to 1, \qquad (8)$$
then $\lim_{p \to 1} f_X(F_X^{-1}(p))/\phi_1(\Phi_1^{-1}(p)) = \infty$. Correspondingly, if
$$f_X(F_X^{-1}(p)) \geq c\,p^{1 - \varepsilon} \quad \text{as } p \to 0, \qquad (9)$$
then $\lim_{p \to 0} f_X(F_X^{-1}(p))/\phi_1(\Phi_1^{-1}(p)) = \infty$.

Proof Simply note that by the Mills ratio (Shorack and Wellner 1986, p. 850), as $x \to \infty$, $1 - \Phi(x) \sim \phi(x)/x$, so that as $p \to 1$, $\phi_1(\Phi_1^{-1}(p)) \sim (1 - p)\,\Phi_1^{-1}(p) \sim (1 - p)\sqrt{-2\log(1 - p)}$.

Example 4.5

1. The density of the $n$-fold convolution of $U(0, 1)$ random variables is given by $f_X(x) = x^{n-1}/(n-1)!$ for $0 < x < 1$, hence $F_X^{-1}(p) = (n!\,p)^{1/n}$ and $f_X(F_X^{-1}(p)) = n/(n!)^{1/n}\, p^{(n-1)/n}$, so that Equation (9) holds.

2. For an $\mathrm{Exp}(1)$ random variable, $f_X(F_X^{-1}(p)) = 1 - p$, so that Equation (8) fails and $C(X) = 0$.

To obtain bounds on $D(S_n)$ as $n \to \infty$, we need to control the sequence $C(S_n)$. Motivated by properties of the (seemingly related) Poincaré constant, we conjecture that $C((X_1 + X_2)/\sqrt{2}) \geq C(X_1)$ for independent and identically distributed $X_i$. If this is true and $C(X_1) = c$ then $C(S_n) \geq c$ for all $n$.
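For the uniform case of Example 4.2 the conjecture can at least be checked numerically (an illustrative computation, not a proof and not from the paper): with $X_i \sim U(0, 1)$, the sum $X_1 + X_2$ is triangular, so its quantile function and density are available in closed form.

```python
import numpy as np
from scipy import stats

sigma = np.sqrt(1 / 12)                      # sd of U(0,1), and of (X_1 + X_2)/sqrt(2)
p = np.linspace(1e-4, 1 - 1e-4, 200_001)

# Quantile and density of T = X_1 + X_2 (triangular on (0,2)), then rescale by sqrt(2).
qT = np.where(p <= 0.5, np.sqrt(2 * p), 2 - np.sqrt(2 * (1 - p)))
fT = np.minimum(qT, 2 - qT)
fS = np.sqrt(2) * fT                         # density of (X_1 + X_2)/sqrt(2) at its p-quantile

ratio = sigma * fS / stats.norm.pdf(stats.norm.ppf(p))
print(ratio.min(), np.sqrt(np.pi / 6))       # approx 0.91 >= 0.72, consistent with the conjecture
```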

Assuming that $C(S_n) \geq c$ for all $n$, note that $D(T_k) \leq (1 - c/4)^k D(X_1) \leq (1 - c/4)^k (2\sigma^2)$. Now
$$D(T_{k+1}) \leq D(T_k)\left(1 - \frac{c}{2}\right)\left(1 + \frac{c\,D(T_k)}{8\sigma^2(1 - c/2)}\right),$$
so