Bootstrap Methodology
Gutti Jogesh Babu* and C. Radhakrishna Rao
1. Introduction
Let X = (X_1, ..., X_n) be an iid sample from a distribution F which is unknown. Further, let T_n(X) be an estimator of a parameter T(F), defined on the space of distribution functions. The sampling distribution (say G) of T_n(X) − T(F) is needed for drawing inference on T(F). This is usually complicated. Efron (1979) suggested a general method, known as the bootstrap, of estimating G using the Monte Carlo simulation concept. The general idea is as follows. First, F is estimated by F̂_n using the observed X. This may be the empirical distribution function based on X or a smoothed version of it. Or, if F = F(θ) depends on a parameter θ, then F̂_n = F(θ_n), where θ_n is an estimate of θ based on X. Let X* = (X*_1, ..., X*_n) be an iid sample from F̂_n and T_n(X*) be the value of T_n at X*. It is shown that under certain conditions, the difference

P(T_n(X) − T(F) ≤ x) − P*(T_n(X*) − T(F̂_n) ≤ x)        (1.1)

is small uniformly in x in large samples. Note that the first term in (1.1) is the distribution function (say G) of T_n(X) − T(F) under sampling from the unknown F, and the second term is the distribution function (say G_b) of T_n(X*) − T(F̂_n) under sampling from F̂_n, which is completely known once X is observed. G_b is called the bootstrap estimate of G. It is suggested that G_b can be used in place of G for drawing inference on the parameter T(F). It may not be easy to obtain a closed form expression for G_b, the distribution of T_n(X*) − T(F̂_n) under sampling from F̂_n. Efron suggested a simulation technique for this purpose. Denote by X*(1), X*(2), ..., X*(N), N independent iid samples of size n drawn from F̂_n. Then the distribution function G_b can
Research sponsored by the Air Force Office of Scientific Research under Grant AFOSR-91-0242 and the U.S. Army Research Office under Grant DAAL03-89-K-0139 in the SDIO/IST program. * Research supported in part by NSF grant DMS-9007717.
be estimated by the histogram formed from the estimates T_n(X*(1)) − T(F̂_n), ..., T_n(X*(N)) − T(F̂_n). In the above discussion a random variable of the form T_n(X) − T(F), which is a function of the sample X and the underlying distribution function F, was considered. A random variable T_n(X; F), more general than T_n(X) − T(F), depending on X and F may have to be considered in statistical inference. The problem in such a case is to find the distribution function of T_n(X; F) under sampling from F. This is estimated by the distribution function of T_n(X*; F̂_n) under sampling from the known F̂_n. Throughout this paper, the notation T_n(X; F) is used for general discussion to represent an estimator which may depend only on X or a more general random variable depending on X and F. The bootstrap technique can also be used in situations where the observed data are not iid from a common distribution function F, as in the regression problem. In such a case, it is necessary to estimate the entire probability mechanism which gave rise to the observations. Resampling is done from the estimated model for the observations (see Efron and Tibshirani, 1986).
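As a concrete sketch of this simulation recipe (with hypothetical data and sample sizes, and T taken to be the mean for simplicity), the histogram estimate of G_b can be computed as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n iid draws from an (in practice unknown) distribution F.
x = rng.exponential(scale=2.0, size=100)

def T(sample):
    # The functional T evaluated at an empirical distribution: here the mean.
    return sample.mean()

N = 2000                      # number of bootstrap resamples
n = len(x)
t_hat = T(x)                  # T(F_n), the plug-in estimate

# Draw N iid samples of size n from F_n (resample x with replacement)
# and record T_n(X*) - T(F_n) for each; their histogram estimates G_b.
boot_diffs = np.array([T(rng.choice(x, size=n, replace=True)) - t_hat
                       for _ in range(N)])

# One standard use: a basic (percentile-reversal) confidence interval for T(F).
ci = (t_hat - np.quantile(boot_diffs, 0.975),
      t_hat - np.quantile(boot_diffs, 0.025))
```

The resample count N and the exponential data above are purely illustrative.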
2. Consistency
Bickel and Freedman (1981) and Singh (1981) have shown that, in the iid case for sample means and sample quantiles, the bootstrap distribution consistently estimates the sampling distribution. That is, the difference between the sampling distribution and the bootstrap distribution goes to zero uniformly, for almost all sample sequences, as the sample size increases. These results are extended to a wide class of statistics in Babu and Singh (1983, 1984b). The details are given in the next section. Consistency of the bootstrap distribution for L-statistics is shown by Babu and Singh (1984a). If the statistics are not uniform in some sense, the bootstrap might fail to be consistent. Examples of U-statistics and extreme value statistics, where the bootstrap method fails to approximate the sampling distribution, are given in Bickel and Freedman (1981). Blind application of the bootstrap method could lead to disastrous results. This can be avoided in some cases by modifying the bootstrap statistic. Suppose T_n(Z; F) = n(H(Z̄) − H(μ)), where Z̄ denotes the mean of n iid random vectors Z = (Z_1, ..., Z_n) with mean μ, and the function H is smooth with the vector l(μ) of first order partial derivatives at μ vanishing. If the matrix of second order derivatives of H at μ is nonnull, then T_n(Z; F) is asymptotically like a linear combination of chi-squares. Babu (1984) has shown that the bootstrap distribution, in this case, does not converge to any
distribution. However, a slight modification of T_n(Z*; F̂_n) would help. If, for example,

t*_n = T_n(Z*; F_n) − n(l(Z̄))'(Z̄* − Z̄),

then

P(T_n(Z; F) ≤ x) − P*(t*_n ≤ x) = O(n^{-1/2}(log log n)^{1/2})   a.s.;

here F_n denotes the empirical distribution and P* denotes the probability induced by the bootstrap mechanism. Even though l(Z̄) is very small, it has a large influence on the distribution. It seems to matter whether we consider T_n(Z; F) = n(H(Z̄) − H(μ)) or T_{n1}(Z; F) = nH_1(Z̄ − μ), where H_1 is a smooth function whose first order derivatives vanish at the null vector. The anomaly exhibited above seems to vanish if T_{n1}(Z; F) and T_{n1}(Z*; F_n) = nH_1(Z̄* − Z̄) are considered instead of T_n(Z; F) and T_n(Z*; F_n). What is the measure P*? Clearly there are n^n ways of choosing n numbers (i_1, ..., i_n) with replacement from (1, ..., n). If the given sample is X_1, ..., X_n and the resampled sequence is (i_1, ..., i_n), then (X*_1, ..., X*_n) = (X_{i_1}, ..., X_{i_n}). So the measure P* is given by

P*((X*_1, ..., X*_n) = (y_1, ..., y_n)) = n^{-n} N',
where N' is the number of n-tuples (i_1, ..., i_n) such that (X_{i_1}, ..., X_{i_n}) = (y_1, ..., y_n). For example, if n = 2 and X_1 ≠ X_2, then the possible values for (X*_1, X*_2) are (X_1, X_1), (X_1, X_2), (X_2, X_1), (X_2, X_2), each occurring with equal probability. Consequently, P*(X̄* = X_1) = P*(X̄* = X_2) = 1/4 and P*(X̄* = X̄) = 1/2. This gives the bootstrap distribution of the sample mean when the sample size is 2. If E(X²_1) = ∞, then Babu (1984) has given an example where the bootstrap distribution of the sample mean does not converge to any probability distribution. Athreya (1987) has investigated this phenomenon further for distributions belonging to the domain of attraction of a stable law. He has shown that the bootstrap distribution of the sample mean, when properly standardized, given the sample, converges to a random distribution. It means the limiting probability distribution μ_ω depends on the original sample sequence ω = (X_1(ω), X_2(ω), ...), and P(ω: μ_ω = μ_0) < 1 for any probability measure μ_0. Liu, Singh and Lo (1989) used a representation technique to show that compact differentiability of the statistic at the population distribution function suffices for the consistency of the bootstrap estimate of the sampling distribution. (The sample median is compact differentiable, but not Fréchet (see Reeds, 1976) differentiable.)
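The n = 2 computation above can be checked by brute force: enumerating all n^n equally likely resamples gives the exact bootstrap distribution of the sample mean (the sample values below are hypothetical, with X_1 ≠ X_2):

```python
from itertools import product
from fractions import Fraction
from collections import Counter

# Exact bootstrap distribution of the sample mean for n = 2.
x = [1.0, 3.0]                      # a hypothetical sample with X1 != X2
n = len(x)

dist = Counter()
for idx in product(range(n), repeat=n):    # all n**n equally likely resamples
    resample = [x[i] for i in idx]
    mean = sum(resample) / n
    dist[mean] += Fraction(1, n ** n)

# As in the text: P*(mean = X1) = P*(mean = X2) = 1/4, P*(mean = Xbar) = 1/2.
```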
3. Accuracy of the bootstrap methods

Singh (1981) investigated the convergence rates of the bootstrap distribution for the sample mean. Let {X_1, ..., X_n} be iid random variables from a distribution F. Let {X*_1, ..., X*_n} be a simple random sample with replacement from X_1, ..., X_n.

THEOREM 3.1. Suppose F is a univariate nonlattice distribution with mean μ, standard deviation σ, and has a finite third moment. Then for almost all sample sequences X_1, ..., X_n, as n → ∞,

√n sup_x |P*(√n(X̄* − X̄) ≤ x) − P(√n(X̄ − μ) ≤ x)| → 0.

The classical Berry–Esseen theorem gives only

sup_x |P(√n(X̄ − μ) ≤ xσ) − Φ(x)| = O(n^{-1/2}),

which cannot be improved, where Φ denotes the standard normal distribution function. On the other hand, by Theorem 3.1 the bootstrap provides a better approximation than the classical Gaussian approximation. Theorem 3.1 was generalized by Babu and Singh (1983, 1984b) to a wide class of statistics which can be written as smooth functions of the multivariate mean. That is,
T_n(Z; F) = √n(H(Z̄) − H(μ))   or   √n H_1(Z̄ − μ),

where H and H_1 are smooth functions and Z = (Z_1, ..., Z_n) are iid random vectors from F with mean μ and dispersion Σ. For example, Student's t can be written as

t_n = √n(H(Z̄) − H(E Z_1)),

where Z_i = (X_i − ν, (X_i − ν)²), ν = E X_1, and H(u, v) = u(v − u²)^{-1/2}. Many scale free statistics like standardized or studentized statistics can be written as

T_n(Z; F) = √n(H(Z̄) − H(μ)),

with (l(μ))' Σ l(μ) = 1, where l denotes the vector of first order partial derivatives of H at μ. Let F̂_n be an estimate of F based on Z_1, ..., Z_n and let Z*_1, ..., Z*_n be iid random vectors from F̂_n. The following theorem is established in Babu and Singh (1984b).
THEOREM 3.2. Suppose F is strongly nonlattice, that is, |∫ e^{it'x} dF(x)| ≠ 1 for t ≠ 0. Let l(μ) ≠ 0, let F̂_n converge weakly to F, and let ∫ ||x||³ dF̂_n(x) → ∫ ||x||³ dF(x) < ∞ for almost all samples. Then

sup_x |P{T_n(Z; F) ≤ x} − P*{T_n(Z*; F̂_n) ≤ x √(l(μ_n)' Σ_n l(μ_n))}| → 0

for almost all samples, where μ_n and Σ_n denote the mean and dispersion of F̂_n.
If F̂_n is the empirical distribution and ∫ ||x||³ dF(x) < ∞, then the convergence assumptions of the theorem are automatically satisfied. Strong nonlatticeness in the plane means that the distribution is not concentrated on countably many parallel lines. This condition holds in the case of the Student-t statistic if X_1 has a continuous component. The estimating distribution can be the empirical distribution, or its symmetrized version, or a smoothed version, or a parametric estimate of F. As far as second order accuracy is concerned, there is no difference, whatever be the estimator F̂_n, as long as it is close to F. Smoothing has no advantage.

REMARK 3.1. The class of statistics for which the above result is applicable includes the sample mean, sample variance, central and noncentral t, sample coefficient of variation, regression and correlation coefficients, ratio estimators and smooth transforms of these. If the scale parameter is known, then the distribution of standardized statistics can be approximated by their bootstrap versions. It is important to note that the bootstrapped scale should be used in the bootstrap version. For example, in the case of the sample mean X̄ of an iid sample X_1, ..., X_n with variance σ², the standardized random variable is √n(X̄ − μ)/σ. The sampling distribution of √n(X̄ − μ)/σ is closer to the distribution of √n(X̄* − X̄)/s_n than to that of √n(X̄* − X̄)/σ, where X̄* is the mean of an iid sample from the empirical distribution of X_1, ..., X_n and s_n² = (1/n) Σ_{i=1}^n (X_i − X̄)². In the absence of knowledge of the scale parameter, studentization helps. Beran (1982) remarks that the normal approximation of the sampling distribution is biased and that the bootstrap distribution is, in general, the normal approximation plus a natural estimate of the bias.
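The point of Remark 3.1 can be sketched as follows (hypothetical skewed data; σ is treated as known): the bootstrap version of the standardized mean should divide by the bootstrapped scale s_n, not by σ.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: skewed population with known scale sigma.
sigma = 2.0
x = rng.exponential(scale=sigma, size=50)   # observed sample
n = len(x)
s_n = x.std()            # sqrt of (1/n) * sum (X_i - Xbar)^2, the bootstrapped scale

B = 4000
xbar = x.mean()
means = np.array([rng.choice(x, size=n, replace=True).mean() for _ in range(B)])

# Bootstrap version with the bootstrapped scale s_n (the recommended one) ...
t_boot_sn = np.sqrt(n) * (means - xbar) / s_n
# ... versus the version that keeps the true scale sigma.
t_boot_sigma = np.sqrt(n) * (means - xbar) / sigma
```

Per Remark 3.1, the distribution of t_boot_sn is the better approximation of that of √n(X̄ − μ)/σ.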
The result of Babu and Singh (1984b) demonstrates that for studentized statistics, the bootstrap automatically corrects for this bias (also called the skewness of the sampling distribution). Helmers (1991) obtained similar results for studentized U-statistics. The basic technique for proving such results is to use the Edgeworth expansions for the distributions of T_n(Z; F) and T_n(Z*; F̂_n) and to match the leading terms. See Babu (1989). If E|X_1|³ = ∞, then Hall (1988a) has shown that the approximation of the sampling distribution of the sample mean by the bootstrap distribution is equivalent to, or even worse than, the normal approximation. Under some
conditions on the tail of the population distribution, Hall (1988a) has shown that

lim_{n→∞} |P(T_n ≤ x) − P*(T*_n ≤ x)| / |P(T_n ≤ x) − Φ(x)| = 1   a.s.

for all but a finite number of values of x, where

T_n = √n(X̄ − E(X_1))/σ,   T*_n = √n(X̄* − X̄)/s_n,   s_n² = (1/n) Σ_{i=1}^n (X_i − X̄)²,

and the bootstrap samples are drawn from the empirical distribution F_n. He has also given examples where the above limit is +∞. For nonstandardized statistics, a total 1/√n-term correction cannot be expected, as shown by Singh (1981). Liu and Singh (1987) have shown that, in terms of the asymptotic mean square error, the bootstrap estimator of H_n(x) = P(√n(T_n(X; F) − T(F)) ≤ x) is superior to the normal estimator Φ(x/S_{F_n}), where S²_{F_n} is an estimate of the variance S²_F of T_n(X; F). This is called a partial 1/√n-term correction by the bootstrap. More specifically, suppose the expansions
H_n(x) = Φ(x/S_F) + n^{-1/2} p(x, F) φ(x/S_F) + o(n^{-1/2}),
H_{n,B}(x) = Φ(x/S_{F_n}) + n^{-1/2} p(x, F_n) φ(x/S_{F_n}) + o(n^{-1/2})

hold a.e., where H_{n,B} denotes the bootstrap distribution of T_n(X; F) − T(F) and φ denotes the standard normal density. If √n(S²_{F_n} − S²_F) has an asymptotically normal distribution with variance a² and p(x, F_n) → p(x, F) in probability, then the bootstrap distribution is asymptotically a better estimate of the sampling
distribution at x than the two-term empirical Edgeworth expansion at x. In particular, under Cramér's condition on F and the existence of moments of large order, they have shown that the superiority holds for p > 1, where t* is the corresponding bootstrap statistic. Bose and Babu (1991) considered estimates of the deviation of the bootstrap distribution from the sampling distribution. This leads to a better understanding of the asymptotics of bootstrap coverage probabilities and bootstrap confidence sets. Babu and Singh (1989) and Babu (1991a) considered Edgeworth expansions when some of the coordinates of the multivariate random vector Z_1 are lattice variables. These results are used to obtain bootstrap approximations of robust statistics.
4. Bootstrap for nonsmooth functions

The choice of an estimator F̂_n of F as a resampling distribution has very little effect on the consistency and second order accuracy of the bootstrap approximation for smooth functions of means. However, for nonsmooth functions, resampling from an F̂_n other than the empirical distribution may lead to better results. Quantiles and the mode are such functions. Singh (1981) and Falk (1986) discussed the accuracy of the bootstrap approximation for the distribution of quantiles. Babu and Singh (1984a) obtained strong representations of bootstrap quantiles and L-statistics. Using these one can establish the following theorem. For any distribution G and 0 < t < 1, let G^{-1}(t) = inf{x: G(x) ≥ t} denote the t-th quantile. Let X_1, ..., X_n be a sample from a distribution F and X*_1, ..., X*_n be a sample from the empirical distribution F_n. Let F*_n denote the empirical distribution of X*_1, ..., X*_n.
THEOREM 4.1. Let F have a bounded second derivative in a neighborhood of F^{-1}(t) and let the density of F at F^{-1}(t) be positive. Then

sup_x |P(√n(F_n^{-1}(t) − F^{-1}(t)) ≤ x) − P*(√n(F*_n^{-1}(t) − F_n^{-1}(t)) ≤ x)| = O(n^{-1/4}(log n)^{1/2}).

Falk and Reiss (1989) compared the weak convergence rates for the smoothed and nonsmoothed bootstrap quantile estimates. If the bootstrap samples Z*_1, ..., Z*_n are drawn from a kernel estimate F̃_n(x) = ∫ K((x − y)w_n^{-1}) dF_n(y), then the procedure is called the smoothed bootstrap, where K is a smooth kernel and w_n > 0 is the window width. Let
Z_n(t, q) = P*{√n(F*_n^{-1}(q) − F_n^{-1}(q)) ≤ t} − P{√n(F_n^{-1}(q) − F^{-1}(q)) ≤ t},

Z̃_n(t, q) = P̃{√n(F̃*_n^{-1}(q) − F̃_n^{-1}(q)) ≤ t} − P{√n(F_n^{-1}(q) − F^{-1}(q)) ≤ t},

where P̃ denotes the distribution induced by the smoothed bootstrap resampling scheme and F_n denotes the empirical distribution of Z_1, ..., Z_n. Falk and Reiss (1989) have shown that

sup_t |Z_n(t, q)| = O_P(n^{-1/4}),

while the corresponding supremum of Z̃_n(t, q) converges to zero at a faster rate,
under some regularity conditions. This shows that smoothing helps in this case. Babu (1986c) analyzed the moments of the bootstrap estimate of the variance. See also Ghosh et al. (1984). For 0 < p < 1, the bootstrap estimate of the variance σ²_{p,n} = var(F_n^{-1}(p)) is given by
σ̂²_{p,n} = var*(F*_n^{-1}(p)) = var*(X*_(r)),        (4.1)

where np ≤ r < np + 1 is an integer, so that F*_n^{-1}(p) is the r-th order statistic of the bootstrap sample. Let B_{n,p} = √n(F*_n^{-1}(p) − F_n^{-1}(p)). Babu (1986c) has shown that for almost all samples E*|B_{n,p}|^k = O(1) for all k > 0, whenever E(log(1 + |X_1|)) < ∞. The result holds without any assumption on the moments of X_1. This is the best condition on the tails of the distribution, since for almost all samples

lim sup_{n→∞} E*|B_{n,p}|^k = ∞
for all k > 0, if E(log(1 + |X_1|)/log log(3 + |X_1|)) = ∞. As a consequence, the bootstrap estimator of the variance of a sample quantile is consistent provided E(log(1 + |X_1|)) < ∞.
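A Monte Carlo sketch of this bootstrap variance estimate for a sample quantile (the median of hypothetical normal data; the values of n and B are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Bootstrap estimate of the variance of a sample quantile (p = 0.5),
# approximating var*(F_n*^{-1}(p)) by Monte Carlo.
x = rng.normal(size=200)            # hypothetical sample
n, p, B = len(x), 0.5, 2000

boot_quantiles = np.array([
    np.quantile(rng.choice(x, size=n, replace=True), p) for _ in range(B)
])
var_boot = boot_quantiles.var()

# Asymptotic benchmark: n * var ~ p(1-p)/f(F^{-1}(p))^2; for N(0,1) at the
# median, f(0) = 1/sqrt(2*pi), so n * var_boot should be of order p(1-p)*2*pi.
```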
Hall and Martin (1988) showed that

σ̂²_{p,n}/σ²_{p,n} = 1 + O_p(n^{-1/4}),

which cannot be improved. They also suggest using density estimation techniques and the estimate
n σ²_{p,n} = p(1 − p)(f(F^{-1}(p)))^{-2} + O(n^{-1}),

which holds if the density f of F at F^{-1}(p) has a bounded derivative. Babu (1986a,b) obtained the estimate (f_n(p))^{-1} of the density quantile function (f(F^{-1}(p)))^{-1}. If σ̃²_{p,n} = n^{-1} p(1 − p)(f_n(p))^{-2}, then Theorem 1 of Babu (1986b) leads to

n|σ̃²_{p,n} − σ²_{p,n}| = p(1 − p)|(f_n(p))^{-2} − (f(F^{-1}(p)))^{-2}| = O(n^{-(k−1)/(2k−1)})
by an explicit choice of the kernel depending only on the number k of derivatives of F at F^{-1}(p). This in a sense is using a kernel estimate in the defining equation (4.1). The result, in a slightly different form, is given in Hall, DiCiccio and Romano (1989). The density quantile estimation mentioned above also holds in the case of dependent random variables (like mixing variables), as shown in Babu (1986b). If the parameter of interest is a functional of the density function, the nonsmoothed bootstrap may not be consistent. Romano (1988b) discussed this for the mode. Let X_1, ..., X_n be iid random variables from a distribution F with density f. Let

f_{n,w}(t) = (nw_n)^{-1} Σ_{i=1}^n K((t − X_i)/w_n)

be a kernel density estimate and θ_{n,w} = M(f_{n,w}) the sample mode. Let F̂_n be the estimate of F induced by f_{n,w}. If w_n is chosen by the cross-validation method, then under some regularity conditions the bootstrap estimator of the distribution of the sample mode is inconsistent. But if the resampling is done from a smoothed F_n (for example, if nw_n²/log n → ∞), then the bootstrap estimator is consistent. The bootstrap estimator is also inconsistent if the bootstrap samples are drawn from the empirical distribution. Silverman and Young (1987) considered the effect of smoothing on the mean square error of the bootstrap estimators. The smoothing parameters in Léger and Romano (1990) are chosen to minimize an empirical bootstrap estimate of risk.
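The smoothed bootstrap admits a simple sampling sketch: drawing from the kernel estimate F̃_n amounts to resampling the data and adding kernel noise scaled by the window width. The Gaussian kernel and the value of w_n below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Smoothed bootstrap: a draw from F_n~(t) = integral K((t - y)/w_n) dF_n(y)
# with Gaussian kernel K is a data point plus w_n times Gaussian noise.
x = rng.normal(size=100)        # hypothetical sample
n = len(x)
w_n = 0.3                       # window width (assumed; in practice data-driven)

def smoothed_resample(size):
    base = rng.choice(x, size=size, replace=True)   # ordinary resample from F_n
    return base + w_n * rng.normal(size=size)       # kernel smoothing noise

z_star = smoothed_resample(n)   # one smoothed bootstrap sample
```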
5. Computation of the bootstrap distribution

Computation of the theoretical distribution of T_n(X*; F̂_n) is very difficult in some cases. Some approximation methods are as follows. One method is to use
a theoretical approximation and estimate the skewness by computing the first three cumulants. This is equivalent to using an empirical one-term Edgeworth expansion. Davison and Hinkley (1988) suggest using the saddlepoint approximation, which is equivalent to the so-called exponential tilting, or using conjugate distributions and replacing the population cumulant generating function by the empirical one. This method requires the existence of the moment generating function, which is too stringent an assumption. The method also requires inverting the empirical cumulant generating function. The most often used method is the Monte Carlo approximation. That is, resampling B times from F̂_n and using the histogram of the B quantities computed from the B resamples to approximate the bootstrap distribution. Babu and Singh (1983) proved that if B ≥ n log n, then the second order accuracy is still maintained. Efron (1987) and Hall (1986b) gave some suggestions for the choice of B. For nonsmooth functionals, taking the bootstrap sample size m ≠ n may have some benefits. Swanepoel (1986) proved that for an iid sample X_1, ..., X_n from the uniform distribution on (0, θ), if the resample size m satisfies An^ε < m < Cn^{1/2}(log n)^{(ε+1)/2}, where A, C > 0 and 0 < ε < 1, then the bootstrap is consistent for T_n = n(θ − max_{1≤i≤n} X_i)/θ. If m = n, then the bootstrap for T_n is not consistent. Balanced sampling (see Davison, Hinkley and Schechtman, 1987) and importance sampling (see Gleason, 1988, and Johns, 1988) techniques can be used to approximate the bootstrap distribution. Bickel and Yahav (1988) use the Richardson extrapolation technique of numerical analysis to reduce the computation cost of bootstrap Monte Carlo simulation. For results on balanced importance resampling, see Booth, Hall and Wood (1991).
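Swanepoel's example can be sketched as follows. The choice m = √n below is one illustrative value inside the consistent range; the large atom of the bootstrap distribution at 0 when m = n exhibits the inconsistency (for m = n the resample repeats the sample maximum with probability tending to 1 − 1/e):

```python
import numpy as np

rng = np.random.default_rng(4)

# m-out-of-n bootstrap for T_n = n(theta - max X_i)/theta, the classic case
# where the ordinary (m = n) bootstrap fails (Swanepoel, 1986).
theta, n = 1.0, 1000            # theta and n are illustrative
x = rng.uniform(0, theta, size=n)
x_max = x.max()

def boot_stat(m, B=2000):
    # Bootstrap analogue m(max X - max X*)/max X with resample size m.
    stars = np.array([rng.choice(x, size=m, replace=True).max() for _ in range(B)])
    return m * (x_max - stars) / x_max

t_m = boot_stat(m=int(np.sqrt(n)))   # m of smaller order than n
t_n = boot_stat(m=n)                 # m = n: inconsistent

frac_zero_n = (t_n == 0).mean()      # large atom at 0 when m = n
```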
6. Bootstrap tests

Efron (1982) considered the former. Schuster and Bucker (1987) gave an example of the latter. Let the hypothesis to be tested be H_0: F is symmetric about some center θ, against H_1: F is not symmetric, based on iid observations X_1, X_2, ..., X_n from F, where θ and F are unknown. Suppose that θ_n is an estimate of θ and T_n is a test statistic. For most of the commonly used statistics the sampling distribution involves the unknown parameters θ and F. In such cases one needs to use the bootstrap method to find the critical point of the test. Since F is symmetric under the null hypothesis, resampling from the empirical distribution F_n is not suitable. Schuster (1987) found a symmetric distribution which is the closest to F_n in sup norm: G_n(x) = {F_n(x) + F*_n(x)}/2, where F*_n is the empirical distribution of 2θ_n − X_1, ..., 2θ_n − X_n, and θ_n = (m(L) + M(L))/2 is the Schuster–Narvarte (1973) estimate of θ. Here L = min{k: m(k) ≤ M(k)}, where
m(k) = min{(X_(i) + X_(j))/2 : 1 ≤ i ≤ [(n + 1)/2], j = n − k + 1 − i},
M(k) = max{(X_(i) + X_(j))/2 : 1 ≤ i ≤ [(n + 1)/2], j = n − k + 1 − i},

and X_(1) ≤ ··· ≤ X_(n) denote the order statistics.
It can be shown that L n^{-1} = h_n(θ_n) = sup_x |F_n(x) − 1 + F_n((2θ_n − x)−)|. Select T_n to be h_n(θ_n) and reject H_0 if L ≥ l. Let α(l, F) = P_F(L ≥ l | H_0). Then l can be obtained by the following method. For b = 1, ..., N, let L_b = L(G_n, X_{b1}, ..., X_{bn}), where X_{b1}, ..., X_{bn} are iid from G_n, and let α_B(l) = #{L_b ≥ l}/N be an estimate of α(l, F). Then α_B^{-1}(α) is such that P(L ≥ α_B^{-1}(α) | H_0) ≈ α, which gives a critical value with asymptotic level α. If a critical point l̃ is already obtained by other methods, then the bootstrap can be used to estimate the p-value of the test by α_B(l̃). The simulation results show that this test performs well. Theoretically, the bootstrap estimate of the p-value is not that reliable. In general the p-values tend to zero exponentially as the alternative parameters move away from the null parameters, whereas the bootstrap approximation involves errors of the order o(n^{-1/2}). The statistic T_n should also be chosen carefully. For example, suppose that X_1, X_2, ..., X_n are iid from a population F with mean μ and known variance σ². To test the null hypothesis H_0: μ = μ_0 against H_1: μ ≠ μ_0, the test statistic can be chosen as T_n = √n(X̄ − μ_0), with the rule of rejecting H_0 when |T_n| ≥ t_0. Bunke and Riemer (1983) suggested the use of the distribution of T*_n = √n(X̄* − X̄), given X_1, ..., X_n, to approximate the distribution of T_n, where X*_1, ..., X*_n are iid from the empirical distribution function F_n of X_1, ..., X_n. The critical value t_0 can be obtained this way. However, simulation studies show that the bootstrap procedure is never reliable for n ≤ 20. Because Bunke
and Riemer (1983) assumed that σ = 1, the distribution of T̃*_n = √n(X̄* − X̄)/s_n given X_1, ..., X_n should be used to approximate that of T_n, where s_n² = (1/n) Σ_{i=1}^n (X_i − X̄)². See Remark 3.1. Ducharme and Jhun (1986) and Quenneville (1986) argued that if a test based on a t-type statistic is used, then it will be better than that based on T_n. This confirms the findings of Babu and Singh (1984b). The theoretical analysis and simulation results indicated that this leads to an interval much better than the former. Beran (1988b) considered the use of the double bootstrap method to reduce the error in rejection probability of the bootstrap test. Let X be a sample of size n whose distribution function is an unknown member P_θ of a parametric family of distributions, where θ may be finite or infinite dimensional and restricted to Θ. The null hypothesis is θ ∈ Θ_0 ⊂ Θ. Suppose that a test statistic T_n can be found whose distribution function under the null hypothesis does not depend on θ. Let C_n(α) be the largest (1 − α)-th quantile of the null distribution function of T_n. Then

φ_n = 1 if T_n > C_n(α), and φ_n = 0 otherwise,

is a test with level α. However, the distribution of T_n in general depends on θ. In this case, the asymptotic method or the bootstrap method can be used. For the asymptotic method, let H_n(·, θ) be the distribution function of T_n. Suppose H_n(·, θ) → H(·, θ) and θ_n is a consistent estimate of θ; then the asymptotic test is

φ_A = 1 if T_n > H^{-1}(1 − α, θ_n), and φ_A = 0 otherwise.

It can be proved under mild conditions that

sup_{θ∈Θ_0} P_θ(T_n > H^{-1}(1 − α, θ_n)) → α.
The bootstrap test is to estimate the null distribution function H_n(·, θ) of T_n by H_n(·, θ_n). This leads to

φ_B = 1 if T_n > H_n^{-1}(1 − α, θ_n), and φ_B = 0 otherwise.

It can also be proved under some regularity conditions that

sup_{θ∈Θ_0} P_θ(T_n > H_n^{-1}(1 − α, θ_n)) → α.
But the convergence rate may be the same as that of the asymptotic test. The double bootstrap. Let T_{n1} = H_n(T_n, θ_n) and let H_{n1}(·, θ) be the distribution
function of T_{n1}. Define

φ_{B1} = 1 if T_{n1} > H_{n1}^{-1}(1 − α, θ_n), and φ_{B1} = 0 otherwise.
The new test will have a smaller error in rejection probability than φ_B. In fact, if θ_n is assumed to be √n-consistent for θ under the null hypothesis, and if

H_n(x, θ) = H(x, θ) + n^{-k/2} h(x, θ) + O(n^{-(k+1)/2})

holds uniformly in x and locally uniformly for values of θ in Θ_0 for some k ≥ 1, then:

(a) If H(·, θ) is independent of θ, then ERP(φ_A) = O(n^{-k/2}), ERP(φ_B) = O(n^{-(k+1)/2}), and ERP(φ_{B1}) = O(n^{-(k+2)/2}).

(b) If H(·, θ) depends on θ, then for some j ≤ k, both ERP(φ_A) and ERP(φ_B) have the same order O(n^{-j/2}), and ERP(φ_{B1}) = O(n^{-(j+1)/2}),

where ERP(φ) denotes the error in rejection probability of the test φ. Beran (1986) considered tests based on a sample Z = (Z_1, ..., Z_n) for testing the hypothesis H_{00}: Z ~ P_{η_0,θ} against H_η: Z ~ P_{η,θ}, where θ is viewed as a fixed nuisance parameter. Romano (1988a, 1989) considered the use of the bootstrap for some nonparametric distance tests and some nonparametric hypotheses. Recently, Boos, Janssen and Veraverbeke (1989) and Boos and Brownie (1989) considered bootstrap tests for two-sample and k-sample problems. In two recent papers, Liu (1991) and Liu and Rao (1993) considered the problem of estimating by the bootstrap the sampling distribution of test statistics based on Rao's quadratic entropy (see Rao, 1982) in the analysis of one-way classified data.
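As an illustration of resampling under the null, draws from Schuster's symmetrized distribution G_n can be generated directly: pick an observation at random and reflect it about θ_n with probability 1/2. The center estimate (a median) and the asymmetry statistic below are simplified stand-ins, not the Schuster–Narvarte construction:

```python
import numpy as np

rng = np.random.default_rng(5)

# Bootstrap test of symmetry: resample under the null from the symmetrized
# distribution G_n(x) = {F_n(x) + F_n*(x)}/2.
x = rng.exponential(size=60)        # hypothetical, clearly asymmetric data
n = len(x)
theta_n = np.median(x)              # simplified center estimate (an assumption)

def resample_from_Gn(size):
    base = rng.choice(x, size=size, replace=True)
    flip = rng.random(size) < 0.5
    return np.where(flip, 2 * theta_n - base, base)   # reflect about theta_n

def asym_stat(sample):
    # simplified asymmetry measure: standardized mean-median gap
    return np.sqrt(len(sample)) * abs(sample.mean() - np.median(sample)) / sample.std()

obs = asym_stat(x)
null = np.array([asym_stat(resample_from_Gn(n)) for _ in range(999)])
p_value = (1 + (null >= obs).sum()) / (1 + len(null))
```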
7. Confidence intervals

A bewildering array of bootstrap methods is available to construct confidence intervals. The commonly used methods include the percentile method, the percentile-t method, the bias-corrected method (see Efron, 1982, Chapter 10) and the accelerated bias-corrected method (see Efron, 1987). The performance of the percentile and bias-corrected methods is not satisfactory. An article by Hall (1988b) clears the confusion surrounding these methods by studying their theoretical properties. Second order correctness of studentized critical points (percentile-t) is established in Babu and Singh (1983). Basically all these methods can be classified into two groups: (a) those requiring algebraic transformations prior to bootstrapping (Abramovitch and Singh, 1985), and (b) those which bootstrap first and then adjust the critical values so obtained. The bias-corrected and accelerated bias-corrected methods of Efron belong to the latter group. Both the percentile-t and accelerated bias-corrected methods are second order correct in a variety of cases (see Hall, 1988b). Let θ = T(F) be a univariate parameter of interest and θ̂ = T_n(X; F) an estimator of θ based on an iid sample X = (X_1, ..., X_n) of size n from a distribution F. Suppose σ² is the asymptotic variance of √n(θ̂ − θ) and σ̂² is an estimator of σ². If σ is known, √n(θ̂ − θ)/σ is close to being pivotal. Otherwise one considers the studentized version √n(θ̂ − θ)/σ̂ as pivotal. A note of caution: in the sense the term is used in inference, neither of these quantities is strictly pivotal in general. Let θ̂* = T_n(X*; F_n), where F_n is the empirical distribution of X and X* = (X*_1, ..., X*_n) is an iid sample of size n from F_n.
7.1. Bias-corrected percentile method

If P*(θ̂* ≤ θ̂) ≠ 0.5, then the bootstrap distribution is not median unbiased and a bias correction is required. Let H_n denote the distribution of θ̂* given the sample, and let z_0 = Φ^{-1}(H_n(θ̂)), where Φ denotes the standard normal distribution function. Then the bias-corrected two-sided 100(1 − 2α)% confidence interval is given by

[H_n^{-1}(Φ(2z_0 − z_α)), H_n^{-1}(Φ(2z_0 + z_α))],

where z_α satisfies Φ(−z_α) = α. If H_n(θ̂) = 0.5, then z_0 = 0 and the interval above is the 1 − 2α percentile confidence interval. The arguments supporting this are given in Efron (1982, Chapter 10).
7.2. Accelerated bias-correction method

Schenker (1985) examined the sample variance estimate and showed the poor performance of the bias-correction method. This motivated Efron (1987) to propose the accelerated bias-correction method. It not only takes the bias correction into account, but also stabilizes the variance. The method assumes the existence of a transformation which leads to a standard normal distribution. Knowledge of the transformation is not needed, unlike Fisher's Z-transformation for the correlation coefficient. The confidence interval is given by

[H_n^{-1}(Φ(z[α])), H_n^{-1}(Φ(z[1 − α]))],

where

z[α] = z_0 + (z_0 + z_α)/(1 − a(z_0 + z_α)),

and a is called the acceleration constant. The constant a is typically approximated by the skewness of the score function evaluated at the estimator θ̂. If a = 0, then this reduces to the bias-corrected interval. Using Edgeworth expansions, Hall (1988b, Section 3) illustrates the estimation of a.
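The bias-corrected and accelerated intervals of Sections 7.1 and 7.2 can be sketched as follows (with z^(α) = Φ^{-1}(α)). The jackknife formula for the acceleration constant a is a common choice but an assumption here, since the text only says that a approximates the skewness of the score function:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(6)

# BCa percentile interval for a univariate parameter; theta_hat is the mean
# of hypothetical data, and a is the jackknife acceleration estimate.
x = rng.exponential(size=80)
n = len(x)
Phi, Phi_inv = NormalDist().cdf, NormalDist().inv_cdf

theta_hat = x.mean()
B = 2000
boot = np.array([rng.choice(x, size=n, replace=True).mean() for _ in range(B)])

# bias-correction constant z0 = Phi^{-1}(H_n(theta_hat))
z0 = Phi_inv(float(np.clip((boot < theta_hat).mean(), 1e-6, 1 - 1e-6)))

# jackknife estimate of the acceleration constant a (an assumption)
jack = np.array([np.delete(x, i).mean() for i in range(n)])
d = jack.mean() - jack
a = (d ** 3).sum() / (6 * ((d ** 2).sum()) ** 1.5)

def bca_interval(alpha):
    z = Phi_inv(1 - alpha)
    lo_level = Phi(z0 + (z0 - z) / (1 - a * (z0 - z)))   # adjusted lower level
    hi_level = Phi(z0 + (z0 + z) / (1 - a * (z0 + z)))   # adjusted upper level
    return np.quantile(boot, lo_level), np.quantile(boot, hi_level)

lo, hi = bca_interval(0.025)   # two-sided 95% BCa interval
```

Setting a = 0 reproduces the bias-corrected interval of Section 7.1; setting a = 0 and z0 = 0 gives the plain percentile interval.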
≤ ε_n, then

sup_{0≤α≤1} |H_n(H*_n^{-1}(α)) − α| ≤ 2ε_n + j(H_n),
the accelerated bias-corrected methods. The construction leads to a new interval, which is asymptotically equivalent to the accelerated bias-corrected interval and can be computed without bootstrap resampling. See Theorems 1 and 3 of Konishi (1991). DiCiccio and Romano (1988) obtained a bootstrap confidence interval which approaches the nonparametric tilting method interval, but avoids the computation of the acceleration constant. They also mention results for multiparameter families. Tibshirani (1988) suggested a 'variance stabilized bootstrap-t' method to improve the percentile-t methods. This procedure is less computationally intensive than the bootstrap-t, is invariant under transformations, is second order correct and can be implemented automatically. The idea is to first obtain a variance stabilizing transformation h(θ), then bootstrap the t interval for h(θ), and finally transform back to give the critical point for θ. The variance stabilizing transformation can also be obtained by the bootstrap. Let X = {X_1, ..., X_n} be the original sample. Resample B_1 times from X to get X*_j = {X*_{j1}, ..., X*_{jn}}, j = 1, ..., B_1. Then use X*_j to compute θ̂*_j, the bootstrap version of the estimator θ̂ of θ. For j = 1, ..., B_1, do the double bootstrap: generate B_2 bootstrap samples from X*_j, then compute the estimate of the variance of θ̂*_j from the B_2 bootstrap samples. By smoothing this against θ̂*_j, one gets a smooth variance function v(θ). The variance stabilizing transformation is h(θ) = ∫_c^θ {v(s)}^{-1/2} ds. The total number of bootstrap samples required by this method is B_1 B_2 + B_3, where B_3 is the additional number of resamples drawn from X to compute the confidence interval. If B_1 = 100, B_2 = 25, B_3 = 1000, then the total number of resamples is 3500, whereas the percentile-t (double bootstrap) with the variance computed by bootstrap requires 25 000 resamples. For results on simultaneous confidence sets in the multiparameter case, see Beran (1988a, 1990).
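Tibshirani's scheme can be sketched with the resample counts quoted above (B_1 = 100, B_2 = 25). The statistic (a mean) and the crude running-mean smoother below are simplified placeholders for the smoothing step, not the method as published:

```python
import numpy as np

rng = np.random.default_rng(8)

# Nested bootstrap estimate of the variance function v(theta), followed by a
# numerical approximation of h(theta) = integral v(s)^{-1/2} ds.
x = rng.exponential(size=40)
n = len(x)
B1, B2 = 100, 25

theta_j = np.empty(B1)
var_j = np.empty(B1)
for j in range(B1):
    xs = rng.choice(x, size=n, replace=True)       # outer resample
    theta_j[j] = xs.mean()
    # inner (double) bootstrap estimate of the variance of theta_j
    var_j[j] = np.array([rng.choice(xs, size=n, replace=True).mean()
                         for _ in range(B2)]).var()

# crude smoothing of the variances against theta: sort and take a running mean
order = np.argsort(theta_j)
v_smooth = np.convolve(var_j[order], np.ones(9) / 9, mode="same")

# h evaluated on the sorted theta grid, via a cumulative trapezoid-like sum
h = np.concatenate(([0.0], np.cumsum(
    np.diff(theta_j[order]) / np.sqrt(v_smooth[:-1]))))
```

Since v_smooth is positive, h is nondecreasing, as a variance stabilizing transformation should be.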
mean zero and variance γ² = p(1 − p)/f²(ξ_p), where f denotes the density of F at ξ_p = F^{-1}(p). If f(ξ_p) is small, then a large value of B may be required to ensure a good approximation. Importance sampling estimates ξ_p as follows. Let Y_1, ..., Y_B be a sample from G with density g. Suppose the sample is ordered as Y_(1) < ··· < Y_(B) and
S_r = B^{-1} Σ_{i=1}^r f(Y_(i))/g(Y_(i)),   1 ≤ r ≤ B.
Note that S_r is close to F(G^{-1}(r/B)) for r/B bounded away from 0 and 1. So if r is chosen such that S_r is approximately p, then Y_(r) is close to G^{-1}(r/B), which is close to F^{-1}(p) = ξ_p. Hence ξ_p can be estimated by ξ̂_p = Y_(R), where R is determined by S_R ≤ p ≤ S_{R+1}. The distribution of √B(ξ̂_p − ξ_p) is approximately normal with mean zero and a variance, β² say, depending on g.
In principle β² can be made arbitrarily small by choosing g(y) close to f(y)I(y ≤ ξ_p)/p. However, in practice f and ξ_p are not known and complete control of g may not be possible. In some particular cases, a feasible g may be constructed as described in Johns (1988) to facilitate a reduction in the bootstrap replication size B. Though importance resampling achieves a reduction in the number of bootstrap replications without significant loss of performance, it may increase the variability of the length of the intervals. Hall (1991) studies the asymptotic relative efficiency of uniform resampling and importance resampling. Bickel (1990) gives theoretical comparisons of bootstrap-t confidence bounds. He investigates second order properties of parametric and nonparametric bootstrap methods and the effect of varying the scale estimators.
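The quantile construction above can be sketched directly; in the usage below, estimating an N(0, 1) quantile from draws of an N(1, 1) importance density is purely an illustrative choice:

```python
import numpy as np

def importance_quantile(y_from_g, log_f, log_g, p):
    """Estimate the p-th quantile of the density f using a sample from g:
    sort the draws, accumulate S_r = B^-1 sum_{i<=r} f(Y_(i))/g(Y_(i)),
    and return the order statistic Y_(R) at which S_r first reaches p."""
    y = np.sort(y_from_g)
    w = np.exp(log_f(y) - log_g(y))    # importance ratios f/g
    S = np.cumsum(w) / len(y)          # the running sums S_r
    R = np.searchsorted(S, p)          # S_R <= p <= S_{R+1}
    return y[min(R, len(y) - 1)]
```

Centering g near the target quantile puts most of the draws where the estimate is formed, which is the source of the variance reduction discussed above.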
8. Randomly censored models

The random censorship model from the right (for short, the random censorship model) has been the subject of intensive study over the last two decades because of its applicability to survival analysis. To describe the model, let Z_1, ..., Z_n be iid random variables with a continuous distribution function F. These variables generally represent the lifetimes of the items (or individuals) under observation. Associated with each Z_i is an independent censoring variable C_i. Further, C_1, ..., C_n are assumed to be iid from a (censoring) distribution G. Due to random censoring, only

Y_i = min(Z_i, C_i)   and   δ_i = I(Z_i ≤ C_i),   i = 1, ..., n,

are observed.
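One standard bootstrap scheme for this model (Efron, 1981) resamples the observed pairs (Y_i, δ_i) with replacement and recomputes the product-limit (Kaplan–Meier) estimator each time. A minimal sketch, in which the helper functions and the tie-handling convention are illustrative:

```python
import numpy as np

def kaplan_meier(y, d, t):
    """Product-limit estimate of the survival probability P(Z > t) from
    observed times y = min(Z, C) and indicators d = I(Z <= C)."""
    order = np.argsort(y, kind="stable")
    y, d = y[order], d[order]
    n = len(y)
    s = 1.0
    for i in range(n):
        if y[i] > t:
            break
        if d[i] == 1:
            s *= 1.0 - 1.0 / (n - i)   # n - i items still at risk
    return s

def bootstrap_km(y, d, t, B=200, rng=None):
    """Resample the pairs (Y_i, delta_i) with replacement and return the
    B replicated product-limit values at time t."""
    rng = rng or np.random.default_rng(0)
    n = len(y)
    reps = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, n)    # indices of the resampled pairs
        reps[b] = kaplan_meier(y[idx], d[idx], t)
    return reps
```

The spread of the replicated values estimates the sampling variability of the product-limit estimator at t.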
bands and tests. Babu (1991b) obtained representations for S_n and S*_n as a U-statistic plus an error of the order o(1/n). Using these, the bootstrap distribution of the studentized version of the risk rate (the ratio of the survival distribution at x to the mean lifetime of an individual up to the time point x), (1 − S_n(x))/∫_0^x (1 − S_n(y)) dy, is shown to approximate the corresponding sampling distribution better than the classical normal approximation does. That is, the bootstrap is second order correct in this case. Similar results on second order correctness of the bootstrap for statistics such as the specific exposure rate under competing risks (see Babu, Rao and Rao, 1992, for details and definitions) are also obtained in Babu (1991b). Lo and Wang (1989) and Dabrowska (1989) consider the bootstrap approximation of the bivariate product-limit estimates. Chung (1989) defined the bootstrapped product-limit p-percentile residual lifetime process and used it to construct the confidence band for the p-percentile residual lifetime.
9. Regression models
9.1. Linear regression

Consider the linear model Y_i = X_i'β + e_i, and let β(n) = (β_1, ..., β_p)', Z(n) = (X_1, ..., X_n), e(n) = (e_1, ..., e_n)' and Y(n) = (Y_1, ..., Y_n)'. Consider first the case of a nonrandom design matrix Z(n). The least squares estimate of β is given by

β̂(n) = (Z(n)'Z(n))⁻¹ Z(n)'Y(n).
To approximate the sampling distribution of β̂(n), Efron (1979) suggested a bootstrap method based on resampling the estimated residuals. Freedman (1981) established its consistency when {e_i} are iid random variables with finite variance. Stine (1985) used this method to construct a prediction interval for Y at x_f. Let the estimated residuals ê(n) = Y(n) − Z(n)'β̂(n) be centered to give

ẽ_i(n) = ê_i(n) − (1/n) Σ_{j=1}^n ê_j(n),   i = 1, ..., n,

and let F̂_n denote the empirical distribution of ẽ_1(n), ..., ẽ_n(n). Let e*_1, ..., e*_n, e*_f be iid from F̂_n, e*(n) = (e*_1, ..., e*_n)', Y*(n) = Z(n)'β̂(n) + e*(n), and β*(n) = (Z(n)'Z(n))⁻¹Z(n)'Y*(n). Let the 'future value' at x_f be Y*_f = x_f'β̂(n) + e*_f and the forecast error be

D*_f = Y*_f − x_f'β*(n) = x_f'(β̂(n) − β*(n)) + e*_f.
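The residual bootstrap just described can be sketched as follows; here the design matrix is stored the NumPy way, with rows as observations, and all names are illustrative:

```python
import numpy as np

def residual_bootstrap(Z, Y, B=300, rng=None):
    """Residual bootstrap for the linear model Y = Z beta + e.

    Z is the n x p design matrix (rows are observations).  Returns the
    least squares fit beta_hat and the B bootstrap replicates beta*(n),
    obtained by refitting to Y* = Z beta_hat + e* with e* resampled from
    the centered residuals."""
    rng = rng or np.random.default_rng(0)
    n, p = Z.shape
    beta_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    fitted = Z @ beta_hat
    resid = Y - fitted
    resid = resid - resid.mean()          # center the residuals
    reps = np.empty((B, p))
    for b in range(B):
        e_star = rng.choice(resid, size=n, replace=True)
        b_star, *_ = np.linalg.lstsq(Z, fitted + e_star, rcond=None)
        reps[b] = b_star
    return beta_hat, reps
```

Quantiles of reps − beta_hat then approximate the distribution of β̂(n) − β, and adding a resampled e*_f to each replicated forecast gives Stine's prediction interval.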
Consistency of the bootstrap can be established if resampling is done from the empirical distribution of the residuals (ê_1, ..., ê_n); asymptotically there is no difference whether resampling is from (ê_1, ..., ê_n) or from the centered (ẽ_1, ..., ẽ_n). Liu and Singh (1992) compare efficiency and robustness properties of various bootstrap and jackknife estimators of the variance of β̂(n). See Babu (1992) for similar results for half sample methods. See also Singh and Liu (1990).
Consider next the nonparametric regression model Y_i = m(X_i) + e_i, where X_i = (i − 0.5)/n, the errors e_1, ..., e_n are iid with mean zero and variance σ², and m is a twice continuously differentiable function. The estimate of m at x is given by the kernel method,
m̂(x) = (nh)⁻¹ Σ_{i=1}^n K((x − X_i)/h) Y_i,
where K is a symmetric probability density with bounded support that is Lipschitz continuous and has been parametrized so that ∫ u²K(u) du = 1, and h is the bandwidth. Let ê_i = Y_i − m̂(X_i) denote the estimated residuals. To exclude boundary effects, consider only the residuals with ηn + 1 ≤ i ≤ (1 − η)n − 1, for some 0 < η < 1. To bootstrap, consider for such i,

ẽ_i = ê_i − [n(1 − 2η)]⁻¹ Σ' ê_j,

where Σ' denotes the sum over j in J = [ηn + 1, (1 − η)n − 1]. By resampling e*_1, ..., e*_n from {ẽ_j, j ∈ J} one can construct Y*_i = m̂(X_i) + e*_i and
m*(x) = (nh)⁻¹ Σ_{i=1}^n K((x − X_i)/h) Y*_i.
Note that m̂(x) is a biased estimate of m(x). Härdle and Bowman (1988) have shown that the distribution of the suitably normalized difference m*(x) − m̂(x) approximates that of m̂(x) − m(x), under conditions requiring that the auxiliary bandwidth l > 0 satisfies nl⁵ → ∞ while h and g tend to zero at the rate n^{−1/5}. The bootstrap confidence interval can also be given by this method. Gu (1987) assumed that m is estimated by the smoothing spline method, and then applied the bootstrap method.
Consider now the errors-in-variables model

X_i = u_i + δ_i,   Y_i = α + βu_i + e_i,   i = 1, ..., n,
where (δ_i, e_i) are iid mean zero random vectors and the u_i are unknown nuisance parameters. Let σ_δ and σ_e respectively denote the standard deviations of δ_1 and e_1. The errors-in-variables models have been studied extensively in the literature. See Deeming (1968), Fuller (1987), Gleser (1985) and Jones (1979), among others. It is well known that when the ratio λ = σ_e²/σ_δ² is known, the least squares estimators of β and α are given by
β̂_1 = h + sign(S_XY)(λ + h²)^{1/2}   and   α̂_1 = Ȳ − β̂_1 X̄,   (9.1)
where

h = (S_YY − λS_XX)/(2S_XY),   S_XX = Σ_{i=1}^n (X_i − X̄)²,

with S_YY and S_XY defined analogously.
The least squares method gives the same estimates as in (9.1) when both σ_δ and σ_e are known. Instead, if only one of the two is known, then under some conditions the least squares estimators of β and α are given by
β̂_2 = S_XY/(S_XX − nσ_δ²),   α̂_2 = Ȳ − β̂_2 X̄,

when σ_δ is known, and

β̂_3 = (S_YY − nσ_e²)/S_XY,   α̂_3 = Ȳ − β̂_3 X̄,
when σ_e is known. A good summary of the estimators in the identifiable cases can be found in Jones (1979). It is not difficult to see that β̂_r − β, r = 1, 2, 3, can be written as smooth functions of the average of vectors ξ_j depending on (δ_j, e_j) and u_j.
Edgeworth expansions for the averages of the ξ_j lead to those of β̂_r. Standard results on Edgeworth expansions are not applicable for two reasons. First, the ξ_j are not identically distributed and, secondly, the components of ξ_j are linearly dependent. But on the average the ξ_j behave very well under some conditions on {u_j}. Babu and Bai (1992) have shown that under some moment conditions, if e_1 and δ_1 are independent continuous centered random variables, then the studentized √n(β̂_r − β)/σ̂_r, r = 1, 2, 3, and their bootstrapped versions have valid two-term Edgeworth expansions. In fact, the independence of e_1 and δ_1 is not required; very weak continuity assumptions on the conditional distributions of e_1 and δ_1 are enough. The expressions for the estimators σ̂_r of the standard deviations of √n β̂_r are obtained by using jackknife-type arguments.
For instance, σ̂₂² is built from a normalized sum of the squared terms ((X_i − X̄)(Y_i − Ȳ − β̂₂(X_i − X̄)))². It then follows that

√n sup_x |P*(√n(β̂*_r − β̂_r)/σ̂*_r ≤ x) − P(√n(β̂_r − β)/σ̂_r ≤ x)| → 0,
for almost all sample sequences, where P* denotes the probability distribution induced by bootstrap sampling, and β̂*_r and σ̂*_r denote the bootstrap estimates of the slope and the standard deviation. That is, β̂*_r and σ̂*_r are obtained by replacing (X_i, Y_i) by the bootstrap samples (X*_i, Y*_i) in the expressions for β̂_r and σ̂_r. This shows that the bootstrap automatically corrects for skewness. Linder and Babu (1990) considered a different scheme of bootstrapping. Geometric arguments led them to estimate the residuals, construct the appropriate model and resample from the new residuals rather than from the pairs (X_i, Y_i). They also studied the asymptotic properties of the bootstrap distributions.
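Resampling the pairs (X_i, Y_i), as used above, can be sketched for the slope estimator (9.1) with known ratio λ; the simulation constants in the test are illustrative:

```python
import numpy as np

def beta1(X, Y, lam=1.0):
    """Slope estimate (9.1) for the errors-in-variables model with known
    variance ratio lam = sigma_e^2 / sigma_delta^2."""
    Sxx = np.sum((X - X.mean())**2)
    Syy = np.sum((Y - Y.mean())**2)
    Sxy = np.sum((X - X.mean()) * (Y - Y.mean()))
    h = (Syy - lam * Sxx) / (2 * Sxy)
    return h + np.sign(Sxy) * np.sqrt(lam + h**2)

def pairs_bootstrap(X, Y, B=300, lam=1.0, rng=None):
    """Resample the pairs (X_i, Y_i) with replacement and return the B
    replicated slope estimates."""
    rng = rng or np.random.default_rng(0)
    n = len(X)
    reps = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, n)
        reps[b] = beta1(X[idx], Y[idx], lam)
    return reps
```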
Consider next the dynamic simultaneous equations model

Y_t = Y_t A + Y_{t−1} B + Z_t C + e_t,   t = 1, ..., n,

where A, B and C are coefficient matrices of unknown parameters, Y_t is the vector of endogenous variables at time t, Z_t is the vector of exogenous variables at time t, and e_t is the vector of disturbances (noise) at time t. The two-stage weighted least squares method can be used to estimate A, B and C. Once these estimates Â, B̂ and Ĉ are obtained, to bootstrap, take a resample e*_1, ..., e*_n from the estimated residuals ê_t = Y_t − Y_t Â − Y_{t−1} B̂ − Z_t Ĉ, after centering them. Let Y*_t = (Y*_{t−1} B̂ + Z_t Ĉ + e*_t)(I − Â)⁻¹, keeping Z_t fixed. From this, get new estimates A*, B* and C*. When Â, B̂ and Ĉ are estimated by the two-stage method, Freedman (1984) has shown that the distributions of Â − A, B̂ − B and Ĉ − C can be approximated by those of A* − Â, B* − B̂ and C* − Ĉ, both when {Z_t} are random and nonrandom. This method is used to estimate the dispersion of Â, B̂ and Ĉ for some econometric models in Freedman and Peters (1984a,b). De Wet and van Wyk (1986) proposed a bootstrap method to set up confidence limits for the regression coefficients in the model X_i = α + β(t_i − t̄) + e_i,
where e_1, ..., e_n satisfy an autoregressive model AR(1), e_i = φe_{i−1} + η_i, or the moving average model MA(1), e_i = ωη_{i−1} + η_i. Here the η_i are iid with mean zero and finite variance. The parameters α and β are estimated by the least squares method, that is,
α̂ = X̄,   β̂ = (Σ_{i=1}^n (t_i − t̄)X_i)(Σ_{i=1}^n (t_i − t̄)²)⁻¹.
The residuals of the model are ê_i = X_i − α̂ − β̂(t_i − t̄). To use the bootstrap, for the AR(1) model let η̂_i = ê_i − φ̂ê_{i−1}, and for the MA(1) model let η̂_i = ê_i − ω̂η̂_{i−1}, where φ̂ and ω̂ are the usual moment estimates of φ and ω based on the ê_i (for example, φ̂ = Σ ê_i ê_{i−1}/Σ ê_i², and ω̂ obtained from the lag-one sample autocorrelation via the MA(1) relation ρ_1 = ω/(1 + ω²)).
Let {η*_i} be iid from the empirical distribution of {η̂_i}, and let e*_i = φ̂e*_{i−1} + η*_i or e*_i = ω̂η*_{i−1} + η*_i according as it is an AR(1) or MA(1) model. By defining e*_1 = η*_1 and X*_i = α̂ + β̂(t_i − t̄) + e*_i, one can obtain the bootstrap estimates α* and β*. By considering Student-type statistics (α̂ − α)/S(α̂), where S²(α̂) is an estimate of the asymptotic variance of α̂ (see Eriksson, 1983), and taking the quantiles of the corresponding bootstrap version as the true quantiles, one can obtain confidence intervals for α. Similar results hold for β also. Chatterjee (1986) considered the bootstrap approximations for the general ARMA model
~b(L){(1  L ) ' t Z t } : O ( L )at , where L is the usual lag operator, d is an integer required to make {Zt} a stationary process, and ~b and 0 are polynomials. Bootstrap methods for autoregressive spectral density function estimator were considered by Swanepoel and van Wyk (1986). Kiinsch (1989) used the blockwise bootstrap method to estimate the sampling distribution of a statistic based on dependent observations.
Consider the stationary autoregressive process of order p,

Y_t = Σ_{i=1}^p θ_i Y_{t−i} + e_t,   t = 0, ±1, ±2, ..., n,
where all the roots of x^p − Σ_{j=1}^p θ_j x^{p−j} = 0 lie within the unit circle. Suppose the residuals {e_t} are iid with mean zero and finite variance, and E e_t^{2(s+1)} < ∞ for some s ≥ 3. The least-squares estimators θ̂_n = (θ̂_{1n}, ..., θ̂_{pn}) of θ = (θ_1, ..., θ_p) are given by
S_n(θ̂_{1n}, ..., θ̂_{pn})' = (Σ_{t=1}^n Y_t Y_{t−1}, ..., Σ_{t=1}^n Y_t Y_{t−p})',
where S_n is the p × p matrix whose (i, j)-th element is Σ_{t=1}^n Y_{t−i}Y_{t−j}. To use the bootstrap, let the e*_t be iid from the estimated residuals ê_t = Y_t − Σ_{i=1}^p θ̂_{in}Y_{t−i}, t = 1, ..., n, after centering them. To get the simulated model, define
Y*_t = Σ_{i=1}^p θ̂_{in} Y*_{t−i} + e*_t,   t = 1, ..., n,
and obtain the bootstrap estimates θ̂*_n = (θ̂*_{1n}, ..., θ̂*_{pn}). Under Cramér's condition on (e_1, e_1²), Bose (1988) has shown that the bootstrap approximation is second order correct; that is, for almost all samples,
sup_x |P*(√n Σ*^{1/2}(θ̂*_n − θ̂_n) ≤ x) − P(√n Σ^{1/2}(θ̂_n − θ) ≤ x)| = o(n^{−1/2}),
where Σ* and Σ denote the variance-covariance matrices of {Y*_t} and {Y_t}, respectively. For the simple autoregressive model AR(1), Y_t = βY_{t−1} + e_t, Y_0 = 0, with e_t iid, the least squares estimate β̂ is consistent if e_1 has mean zero and finite variance. If |β| < 1, then the process {Y_t} is (asymptotically) stationary, and |β| ≥ 1 corresponds to the nonstationary case. The nonstationary case is further divided into two cases: (a) the explosive case, |β| > 1, and (b) the unstable case, |β| = 1. For the limit theorems for the least squares estimator β̂ of β, see Anderson (1959). The limit distribution of (β̂ − β), after proper normalization, is nonnormal if |β| ≥ 1. Basawa et al. (1989) have shown that the bootstrap method leads to an approximation of the sampling distribution of β̂ in the explosive case. In the unstable case, Basawa et al. (1991) have shown that the bootstrap fails and that the bootstrap distribution converges to a random measure. The situation is similar to that dealt with by Athreya (1987).
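The autoregressive residual bootstrap of this section can be sketched for the stationary AR(1) case (all names and constants illustrative):

```python
import numpy as np

def fit_ar1(y):
    """Least squares estimate of theta in Y_t = theta Y_{t-1} + e_t."""
    return np.dot(y[1:], y[:-1]) / np.dot(y[:-1], y[:-1])

def ar1_bootstrap(y, B=300, rng=None):
    """Residual bootstrap for a stationary AR(1): resample the centered
    residuals, rebuild the series recursively with theta_hat, refit."""
    rng = rng or np.random.default_rng(0)
    theta_hat = fit_ar1(y)
    resid = y[1:] - theta_hat * y[:-1]
    resid = resid - resid.mean()           # center the residuals
    n = len(y)
    reps = np.empty(B)
    for b in range(B):
        e = rng.choice(resid, size=n, replace=True)
        ystar = np.empty(n)
        ystar[0] = y[0]
        for t in range(1, n):
            ystar[t] = theta_hat * ystar[t - 1] + e[t]
        reps[b] = fit_ar1(ystar)
    return theta_hat, reps
```

In the explosive or unstable cases discussed above this recursion would still run, but, per Basawa et al. (1991), its replicates no longer approximate the sampling distribution when |β| = 1.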
12. Sample survey models
Suppose {x_1, ..., x_n} is a simple random sample from a finite population {X_1, ..., X_N}. The sample mean x̄ = (1/n)Σ_{i=1}^n x_i is an estimate of the
population mean μ_N = (1/N)Σ_{i=1}^N X_i. The naive bootstrap will not help in estimating the variance of x̄, since the variance estimate based on an iid sample from {x_1, ..., x_n} will not be consistent. Gross (1980) suggested the following replication method. First suppose N = nk with k an integer. By replicating {x_1, ..., x_n} k times, get a new 'bootstrap population',
Ω = (x_1, ..., x_n, x_1, ..., x_n, ..., x_1, ..., x_n).
Now take a sample x*_1, ..., x*_n from Ω without replacement. The conditional variance var*(x̄*) of x̄* = (1/n)Σ_{i=1}^n x*_i, given x_1, ..., x_n, can be used to estimate the variance of x̄. In this case
var*(x̄*) = k(n − 1)(N − n)/(nN(N − 1)) · s²,   s² = (n − 1)⁻¹ Σ_{i=1}^n (x_i − x̄)².
In general, if N = kn + r, 1 ≤ r ≤ n − 1, replicate the sample set {x_1, ..., x_n} k times and (k + 1) times respectively, and then resample from a mixture of the two replicated populations with mixing weights β and 1 − β, where β = (n − r)(n − 1 − r)/(n(n − 1)). Let the bootstrap sample so obtained be x*_1, ..., x*_n. Chao and Lo (1985) established the consistency of the bootstrap approximation.

THEOREM 12.1. Suppose n → ∞, N − n → ∞, and D_N = (1/N) Σ_{i=1}^N |X_i − μ_N|³ remains bounded. Then

P*(x̄* − x̄ ≤ t √var*(x̄*)) → Φ(t)

for all t, in probability.
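Gross's replication method can be sketched as follows for the exact-multiple case N = nk; the Monte Carlo size B and the helper name are illustrative, and the Monte Carlo value approximates the closed-form var*(x̄*) given above:

```python
import numpy as np

def gross_bootstrap_var(x, N, B=3000, rng=None):
    """Replication bootstrap for the variance of the mean of a simple
    random sample drawn without replacement (assumes N = n * k)."""
    rng = rng or np.random.default_rng(0)
    n = len(x)
    k = N // n                       # assumes N is a multiple of n
    pop = np.tile(x, k)              # the 'bootstrap population' Omega
    means = np.empty(B)
    for b in range(B):
        # resample n units from Omega without replacement
        means[b] = rng.choice(pop, size=n, replace=False).mean()
    return means.var()
```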
This result can also be found in Bickel and Freedman (1984), where bootstrapping in the stratified sampling case is also considered. McCarthy and Snowden (1985) suggested an alternative to this artificial resampling procedure: the naive bootstrap may be used, with the resampling size determined by matching the bootstrap variance estimate with the moment variance estimate. That is, the resampling size n* is chosen so that
var*(x̄*) = (1/n*) n⁻¹ Σ_{i=1}^n (x_i − x̄)² = ((N − n)/N)(n(n − 1))⁻¹ Σ_{i=1}^n (x_i − x̄)².
Hence n* is approximately (n − 1)/(1 − n/N). This method is difficult to generalize to complex sample surveys. Rao and Wu (1988) made an adjustment to McCarthy and Snowden's procedure so that it can be applied to stratified sampling models. Babu and Singh (1985) obtained Edgeworth expansions for distributions of the sample mean and functions of the multivariate sample mean, when sampling is
done without replacement from finite populations. This leads to results on the asymptotic accuracy of the bootstrap approximation when resampling is done from the replicated sample sequence. Kovar, Rao and Wu (1988) also consider the problem of constructing confidence intervals for a nonlinear parameter θ = g(Ȳ) and for the population quantiles, where Ȳ denotes the population mean. Simulation results are given to compare this method with other methods, for example the jackknife and balanced repeated replication methods. Rao and Wu (1988) consider a resampling procedure for a complex sampling scheme, where sampling is done with unequal probabilities and without replacement. Suppose the population of N units is partitioned at random into n groups G_1, G_2, ..., G_n of sizes N_1, ..., N_n, respectively. Let p_t = x_t/X and P_k = Σ_{t∈G_k} p_t, where x_t denotes a measure of size of the t-th unit, approximately proportional to y_t, and X = Σ x_t. Note that Σ P_k = 1 and Σ N_k = N. A single unit is drawn from each of the n groups, with probabilities p_t/P_k for the k-th group. The unbiased estimate of Ȳ is given by Ŷ = Σ_{k=1}^n z_k P_k, where z_k = y_k/(Np_k), with (y_k, p_k) denoting the values for the unit selected from the k-th group. To describe the bootstrap procedure to estimate the variance of Ŷ, let λ² = (Σ N_k² − N)/(N² − Σ N_k²). First attach the probability P_k to the sample unit from G_k, and then select a sample {y*_i, p*_i}, i = 1, ..., m, of size m with replacement, with probabilities P_k, from {y_k, p_k}, k = 1, ..., n. Calculate z*_i = y*_i/(Np*_i), z̃_i = Ŷ + λm^{1/2}(z*_i − Ŷ), Ỹ = (1/m) Σ_{i=1}^m z̃_i and θ̂* = g(Ỹ), and repeat the procedure B times to obtain the bootstrap variance estimate of θ̂ and its Monte Carlo approximation. In the linear case it can be proved that the bootstrap variance estimate coincides with the usual variance estimate of Ŷ. Another example is two-stage cluster sampling with equal probabilities and without replacement. Suppose that the population comprises N clusters with M_i elements in the i-th cluster, i = 1, 2, ..., N.
The total population size is M_0 = Σ M_i. In many applications M_0 is not known. The procedure consists of first selecting a simple random sample of n clusters without replacement. If the i-th sample cluster is chosen, then m_i elements are chosen, again by simple random sampling without replacement, from the M_i elements in the i-th cluster. The population total Y is estimated by Ŷ_T = (N/n) Σ_{i=1}^n M_i ȳ_i, where ȳ_i is the sample mean for the i-th sample cluster. Consider the variance estimate of Ŷ = Ŷ_T/M_0. For the bootstrap, again a two-stage resampling method is employed. First select a simple random sample of n clusters with replacement from the n sample clusters. If the i-th sample cluster is chosen, then draw a simple random sample of size m_i, with replacement, from the m_i elements in the i-th sample cluster. If a sample cluster is chosen more than once, the subsamples should be chosen independently, as many times. Let y*_ij denote the y value of the j-th sample element in the i-th resample cluster, let m*_i denote the m_i value of the i-th resample cluster, similarly M*_i, and let ȳ*_i denote the
sample mean for the i-th resample cluster. A rescaled bootstrap replicate Ŷ* of Ŷ is then computed from the resampled quantities {ȳ*_i, m*_i, M*_i}.
Replicating this step B times, the variance estimate of θ̂ = g(Ŷ) can be obtained. In the linear case this estimate reduces to the standard variance estimate. Recently, Kuk (1989) used the double bootstrap method to give a variance estimator for statistics under systematic sampling with probability proportional to size.
13. Bayesian bootstrap

The bootstrap distribution of the sample mean is the conditional distribution of

X̄* = (1/n) Σ_{i=1}^n n_i X_i,
where (n_1, ..., n_n) is a realization from the multinomial distribution M(n; 1/n, ..., 1/n). The distribution of X̄* can also be realized as the conditional distribution of (1/n)Σ_{i=1}^n m_i X_i, given Σ_{i=1}^n m_i = n, where m_1, ..., m_n are iid Poisson variables with mean 1. This random weighting scheme can be generalized to obtain what is known as the Bayesian bootstrap. Rubin (1981) considers the posterior distribution given the sample, when the prior is the exponential distribution with mean 1. More precisely, the Bayesian bootstrap distribution is the distribution of X̄_D = Σ_{i=1}^n v_i X_i given the sample, where v = (v_1, ..., v_n) has the Dirichlet distribution D(n; 1, 1, ..., 1). Tu and Zheng (1987) and Weng (1989) have shown that, if v has the Dirichlet distribution D(n; 4, ..., 4) instead of D(n; 1, ..., 1) (that is, the prior is the gamma distribution with shape parameter 4 rather than the exponential), then the Bayesian bootstrap is second order correct, as in the case of the nonparametric bootstrap.
THEOREM 13.1. Let μ and σ² denote the mean and variance of X_1 and let (v_1, ..., v_n) have the Dirichlet distribution D(n; 4, ..., 4). If X_1 has a nonlattice distribution, then
√n sup_x |P*(D_n ≤ x √var*(D_n)) − P(√n(X̄_n − μ) ≤ xσ)| → 0

for almost all sample sequences {X_i}, where D_n = Σ_{i=1}^n (X_i − X̄_n)v_i, and P* and var* denote the probability measure and the variance given the sample {X_1, ..., X_n}.
Lo (1991) considers general priors instead of the exponential priors considered above and calls them Bayesian bootstrap clones. He obtains asymptotic normality. Babu and Bai (1993) unify the theory of Edgeworth expansions, combining the bootstrap, the Bayesian bootstrap (and its clones), and sampling from finite population schemes. These results lead to second order correctness of the Bayesian bootstrap under a wide class of priors and for a wide class of statistics which can be approximated by functions of multivariate means. It is worth pursuing the study of the relative merits of these types of bootstrap by considering weighted mean squares of the differences of the distributions. The Bayesian bootstrap can be seen as a kind of smoothing for Efron's bootstrap. As the weights are unrelated to the original sample, this method may be applied to non-iid models. The Bayesian bootstrap or random weighting method can also be applied to simple linear regression models.
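Rubin's Bayesian bootstrap for the mean is straightforward to sketch; alpha = 1 gives the D(n; 1, ..., 1) weights, alpha = 4 the second order correct variant of Tu and Zheng, and Weng, and all names are illustrative:

```python
import numpy as np

def bayesian_bootstrap_mean(x, B=2000, alpha=1.0, rng=None):
    """Draw B Dirichlet weight vectors v = (v_1, ..., v_n) with common
    parameter alpha and return the weighted means sum_i v_i X_i."""
    rng = rng or np.random.default_rng(0)
    n = len(x)
    V = rng.dirichlet(np.full(n, alpha), size=B)   # B x n weight vectors
    return V @ x                                   # B replicated means
```

Unlike Efron's bootstrap, every observation receives a strictly positive weight in every replicate, which is the smoothing effect mentioned above.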
References
Abramovitch, L. and K. Singh (1985). Edgeworth corrected pivotal statistics and the bootstrap. Ann. Statist. 13, 116-132.
Akritas, M. G. (1986). Bootstrapping the Kaplan-Meier estimator. J. Amer. Statist. Assoc. 81, 1032-1038.
Anderson, T. W. (1959). On asymptotic distributions of estimates of parameters of stochastic difference equations. Ann. Math. Statist. 30, 676-687.
Athreya, K. B. (1987). Bootstrap of the mean in the infinite variance case. Ann. Statist. 15, 724-731.
Babu, G. J. (1984). Bootstrapping statistics with linear combinations of chi-squares as weak limit. Sankhyā Ser. A 46, 85-93.
Babu, G. J. (1986a). Efficient estimation of the reciprocal of the density quantile function at a point. Statist. Probab. Lett. 4, 133-139.
Babu, G. J. (1986b). Estimation of density quantile function. Sankhyā Ser. A 48, 142-149.
Babu, G. J. (1986c). A note on bootstrapping the variance of the sample quantile. Ann. Inst. Statist. Math. 38(A), 439-443.
Babu, G. J. (1989). Applications of Edgeworth expansions to bootstrap: A review. In: Y. Dodge, ed., Statistical Data Analysis and Inference, Elsevier, Amsterdam, 223-237.
Babu, G. J. (1991a). Edgeworth expansions for statistics which are functions of lattice and non-lattice variables. Statist. Probab. Lett. 12, 1-7.
Babu, G. J. (1991b). Asymptotic theory for estimators under random censorship. Probab. Theory Related Fields 90, 275-290.
Babu, G. J. (1992). Subsample and half-sample methods. Ann. Inst. Statist. Math., to appear.
Babu, G. J. and Z. D. Bai (1992). Edgeworth expansions for errors-in-variables models. J. Multivariate Anal. 42, 226-244.
Babu, G. J. and Z. D. Bai (1993). Mixtures of global and local Edgeworth expansions and their applications. Preprint.
Babu, G. J. and A. Bose (1988). Bootstrap confidence intervals. Statist. Probab. Lett. 7, 151-160.
Babu, G. J. and C. R. Rao (1992). Expansions for statistics involving the mean absolute deviations. Ann. Inst. Statist. Math. 44, 387-403.
Babu, G. J., C. R. Rao and M. B. Rao (1992). Nonparametric estimation of specific occurrence/exposure rate in risk and survival analysis. J. Amer. Statist. Assoc. 87, 84-89.
Babu, G. J. and K. Singh (1983). Inference on means using the bootstrap. Ann. Statist. 11, 999-1003.
Babu, G. J. and K. Singh (1984a). Asymptotic representations related to jackknifing and bootstrapping L-statistics. Sankhyā Ser. A 46, 195-206.
Babu, G. J. and K. Singh (1984b). On one term Edgeworth correction by Efron's bootstrap. Sankhyā Ser. A 46, 219-232.
Babu, G. J. and K. Singh (1985). Edgeworth expansions for sampling without replacement from finite populations. J. Multivariate Anal. 17, 261-278.
Babu, G. J. and K. Singh (1989). On Edgeworth expansions in the mixture cases. Ann. Statist. 17, 443-447.
Basawa, I. V., A. K. Mallik, W. P. McCormick and R. L. Taylor (1989). Bootstrapping explosive autoregressive processes. Ann. Statist. 17, 1479-1486.
Basawa, I. V., A. K. Mallik, W. P. McCormick, J. H. Reeves and R. L. Taylor (1991). Bootstrapping unstable first-order autoregressive processes. Ann. Statist. 19, 1098-1101.
Beran, R. (1982). Estimated sampling distributions: The bootstrap and competitors. Ann. Statist. 10, 212-225.
Beran, R. (1986). Simulated power functions. Ann. Statist. 14, 151-173.
Beran, R. (1987). Prepivoting to reduce level error of confidence sets. Biometrika 74, 457-468.
Beran, R. (1988a). Balanced simultaneous confidence sets. J. Amer. Statist. Assoc. 83, 679-686.
Beran, R. (1988b). Prepivoting test statistics: A bootstrap view of asymptotic refinements. J. Amer. Statist. Assoc. 83, 687-697.
Beran, R. (1990). Refining bootstrap simultaneous confidence sets. J. Amer. Statist. Assoc. 85, 417-426.
Bhattacharya, R. N. and M. Qumsiyeh (1989). Second order and L^p-comparisons between the bootstrap and empirical Edgeworth expansion methodologies. Ann. Statist. 17, 160-169.
Bickel, P. J. (1990). Theoretical comparison of bootstrap-t confidence bounds. Technical Report 273, University of California, Berkeley, CA.
Bickel, P. J. and D. A. Freedman (1981). Some asymptotics on the bootstrap. Ann. Statist. 9, 1196-1217.
Bickel, P. J. and D. A. Freedman (1984). Asymptotic normality and the bootstrap in stratified sampling. Ann. Statist. 12, 470-482.
Bickel, P. J. and J. A. Yahav (1988). Richardson extrapolation and the bootstrap. J. Amer. Statist. Assoc. 83, 387-393.
Boos, D. D. and C. Brownie (1989). Bootstrap methods for testing homogeneity of variance. Technometrics 31, 69-82.
Boos, D. D., P. Janssen and N. Veraverbeke (1989). Resampling from centered data in the two-sample problem. J. Statist. Plann. Inference 21, 327-346.
Booth, J. G., P. Hall and A. T. A. Wood (1991). Balanced importance resampling for the bootstrap. Technical Report CMA-SR16-91, Australian National University, Canberra.
Bose, A. (1988). Edgeworth correction by bootstrap in autoregressions. Ann. Statist. 16, 1709-1722.
Bose, A. and G. J. Babu (1991). Accuracy of the bootstrap approximation. Probab. Theory Related Fields 90, 301-316.
Bunke, O. and S. Riemer (1983). A note on bootstrap and other empirical procedures for testing linear hypotheses without normality. Statistics 14, 517-526.
Chao, M. T. and S.-H. Lo (1985). A bootstrap method for finite population. Sankhyā Ser. A 47, 399-405.
Chatterjee, S. (1986). Bootstrapping ARMA models: Some simulations. IEEE Trans. Systems Man Cybernet. 16, 294-299.
Chung, C.-J. F. (1989). Confidence bands for percentile residual lifetime under random censorship model. J. Multivariate Anal. 29, 94-126.
Dabrowska, D. M. (1989). Kaplan-Meier estimate on the plane: Weak convergence, LIL, and the bootstrap. J. Multivariate Anal. 29, 308-325.
Davison, A. C. and D. V. Hinkley (1988). Saddlepoint approximations in resampling methods. Biometrika 75, 417-431.
Davison, A. C., D. V. Hinkley and E. Schectman (1987). Efficient bootstrap simulation. Biometrika 74, 555-566.
Deeming, T. J. (1968). The analysis of linear correlation in astronomy. Vistas Astronom. 10, 125-142.
De Wet, T. and J. W. J. van Wyk (1986). Bootstrap confidence intervals for regression coefficients when the residuals are dependent. J. Statist. Comput. Simulation 23, 317-327.
DiCiccio, T. J. and J. P. Romano (1988). Discussion of 'Theoretical comparison of bootstrap confidence intervals' by P. Hall. Ann. Statist. 16, 965-969.
DiCiccio, T. J. and R. Tibshirani (1987). Bootstrap confidence intervals and bootstrap approximations. J. Amer. Statist. Assoc. 82, 163-170.
Ducharme, G. R. and M. Jhun (1986). A note on bootstrap procedure for testing linear hypotheses. Statistics 17, 527-531.
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Ann. Statist. 7, 1-26.
Efron, B. (1981). Censored data and the bootstrap. J. Amer. Statist. Assoc. 76, 312-319.
Efron, B. (1982). The Jackknife, the Bootstrap and Other Resampling Plans. SIAM, Philadelphia, PA.
Efron, B. (1987). Better bootstrap confidence intervals (with discussion). J. Amer. Statist. Assoc. 82, 171-200.
Efron, B. and R. Tibshirani (1986). Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy (with discussion). Statist. Sci. 1, 54-77.
Eriksson, B. O. (1983). On the construction of confidence limits for the regression coefficients when the residuals are dependent. J. Statist. Comput. Simulation 17, 297-309.
Falk, M. (1986). On the accuracy of the bootstrap approximation of the joint distribution of sample quantiles. Comm. Statist. Theory Methods 15, 2867-2876.
Falk, M. and R.-D. Reiss (1989). Weak convergence of smoothed and nonsmoothed bootstrap quantile estimates. Ann. Probab. 17, 362-371.
Freedman, D. A. (1981). Bootstrapping regression models. Ann. Statist. 9, 1218-1228.
Freedman, D. A. (1984). On bootstrapping two-stage least squares estimates in stationary linear models. Ann. Statist. 12, 827-842.
Freedman, D. A. and S. C. Peters (1984a). Bootstrapping a regression equation: Some empirical results. J. Amer. Statist. Assoc. 79, 97-106.
Freedman, D. A. and S. C. Peters (1984b). Bootstrapping an econometric model: Some empirical results. J. Bus. Econom. Statist. 2, 150-158.
Fuller, W. A. (1987). Measurement Error Models. Wiley, New York.
Ghosh, M., W. Parr, K. Singh and G. J. Babu (1984). A note on bootstrapping the sample median. Ann. Statist. 12, 1130-1135.
Gleason, J. R. (1988). Algorithms for balanced bootstrap simulations. Amer. Statist. 42, 263-266.
Gleser, L. J. (1985). A note on G. R. Dolby's unreplicated ultrastructural model. Biometrika 72, 117-124.
Gross, S. (1980). Median estimation in sample surveys. Paper presented at the 1980 Amer. Statist. Assoc. meeting on survey sampling.
Gu, C. (1987). What happens when bootstrapping the smoothing spline? Comm. Statist. Theory Methods 16, 3275-3284.
Hall, P. (1986a). On the bootstrap and confidence intervals. Ann. Statist. 14, 1431-1452.
Hall, P. (1986b). On the number of bootstrap simulations required to construct a confidence interval. Ann. Statist. 14, 1453-1462.
Hall, P. (1988a). Rate of convergence in bootstrap approximations. Ann. Probab. 16, 1665-1684.
Hall, P. (1988b). Theoretical comparison of bootstrap confidence intervals (with discussion). Ann. Statist. 16, 927-985.
Hall, P. (1991). Bahadur representations for uniform resampling and importance resampling, with applications to asymptotic relative efficiency. Ann. Statist. 19, 1062-1072.
Hall, P., T. J. DiCiccio and J. P. Romano (1989). On smoothing and the bootstrap. Ann. Statist. 17, 692-704.
Hall, P. and M. A. Martin (1988). Exact convergence rate of bootstrap quantile variance estimator. Probab. Theory Related Fields 80, 261-268.
Härdle, W. and A. W. Bowman (1988). Bootstrapping in nonparametric regression: Local adaptive smoothing and confidence bands. J. Amer. Statist. Assoc. 83, 102-110.
Helmers, R. (1991). On the Edgeworth expansion and the bootstrap approximation for a Studentized U-statistic. Ann. Statist. 19, 470-484.
Horváth, L. and B. S. Yandell (1987). Convergence rates for the bootstrapped product-limit process. Ann. Statist. 15, 1155-1173.
Johns, M. V. (1988). Importance resampling for bootstrap confidence intervals. J. Amer. Statist. Assoc. 83, 709-714.
Jones, T. A. (1979). Fitting straight lines when both variables are subject to error. Math. Geol. 11, 1-25.
Konishi, S. (1991). Normalizing transformations and bootstrap confidence intervals. Ann. Statist. 19, 2209-2225.
Kovar, J., J. N. K. Rao and C. F. J. Wu (1988). Bootstrap and other methods to measure errors in survey estimates. Canad. J. Statist. 16 (supplement), 25-45.
Kuk, A. Y. C. (1989). Double bootstrap estimation of variance under systematic sampling with probability proportional to size. J. Statist. Comput. Simulation 31, 73-82.
Künsch, H. R. (1989). The jackknife and the bootstrap for general stationary observations. Ann. Statist. 17, 1217-1241.
Léger, C. and J. P. Romano (1990). Bootstrap choice of tuning parameters. Ann. Inst. Statist. Math. 42, 708-735.
Linder, E. and G. J. Babu (1990). Bootstrapping the linear functional relationship with known error variance ratio. Submitted for publication.
Liu, R. Y. and K. Singh (1987). On partial correction by the bootstrap. Ann. Statist. 15, 1713-1718.
Liu, R. Y. and K. Singh (1992). Efficiency and robustness in resampling. Ann. Statist. 20, 370-384.
Liu, R. Y., K. Singh and S.-H. Lo (1989). On a representation related to the bootstrap. Sankhyā 51, 168-177.
Liu, Z. J. (1991). Bootstrapping one way analysis of Rao's quadratic entropy. Comm. Statist. 20, 1683-1702.
Liu, Z. J. and C. R. Rao (1993). Asymptotic distribution of statistics based on quadratic entropy and bootstrapping. J. Statist. Plann. Inference, in press.
Lo, S.-H. and K. Singh (1986). The product-limit estimator and the bootstrap: Some asymptotic representations. Probab. Theory Related Fields 71, 455-465.
Lo, S.-H. and J. L. Wang (1989). I.i.d. representations for the bivariate product limit estimators and the bootstrap versions. J. Multivariate Anal. 28, 211-226.
Lo, A. Y. (1991). Bayesian bootstrap clones and a biometry function. Sankhyā Ser. A 53, 320-333.
Loh, W.-Y. (1987). Calibrating confidence coefficients. J. Amer. Statist. Assoc. 82, 155-162.
Loh, W.-Y. (1988). Discussion of 'Theoretical comparison of bootstrap confidence intervals' by P. Hall. Ann. Statist. 16, 972-976.
McCarthy, P. J. and C. B. Snowden (1985). The bootstrap and finite population sampling. In: Vital Health Statist. Ser. 2, Vol. 95, Public Health Service Publication 85-1369, U.S. Government Printing Office, Washington, DC.
Quenneville, B. (1986). Bootstrap procedure for testing linear hypotheses without normality. Statistics 17, 533-538.
Rao, C. R. (1982). Diversity, its measurement, decomposition, apportionment and analysis. Sankhyā Ser. A 44, 1-21.
Rao, J. N. K. and C. F. J. Wu (1988). Resampling inference with complex survey data. J. Amer. Statist. Assoc. 83, 231-241.
Reeds III, J. A. (1976). On the definition of von Mises functionals. Ph.D. Thesis, Department of Statistics, Harvard University.
Reid, N. (1981). Estimating the median survival time. Biometrika 68, 601-608.
Romano, J. P. (1988a). A bootstrap revival of some nonparametric distance tests. J. Amer. Statist. Assoc. 83, 698-708.
Romano, J. P. (1988b). Bootstrapping the mode. Ann. Inst. Statist. Math. 40, 565-586.
Romano, J. P. (1989). Bootstrap and randomization tests of some nonparametric hypotheses. Ann. Statist. 17, 141-159.
Rubin, D. B. (1981). The Bayesian bootstrap. Ann. Statist. 9, 130-134.
Schenker, N. (1985). Qualms about bootstrap confidence intervals. J. Amer. Statist. Assoc. 80, 360-361.
Schuster, E. F. (1987). Identifying the closest symmetric distribution or density function. Ann. Statist. 15, 865-874.
Schuster, E. F. and R. Bucker (1987). Using the bootstrap in testing symmetry vs. asymmetry. Comm. Statist. B. Simulation 16, 69-84.
Schuster, E. F. and J. A. Narvarte (1973). A new nonparametric estimator of the center of a symmetric distribution. Ann. Statist. 1, 1096-1104.
Silverman, B. W. and G. A. Young (1987). The bootstrap: To smooth or not to smooth? Biometrika 74, 469-479.
Singh, K. (1981). On the asymptotic accuracy of Efron's bootstrap. Ann. Statist. 9, 1187-1195.
Singh, K. and G. J. Babu (1990). On asymptotic optimality of the bootstrap. Scand. J. Statist. 17, 1-9.
Singh, K. and R. Y. Liu (1990). On the validity of the jackknife procedure. Scand. J. Statist. 17, 11-21.
Stine, R. A. (1985). Bootstrap prediction intervals for regression. J. Amer. Statist. Assoc. 80, 1026-1031.
Swanepoel, J. W. H. (1986). A note on proving that the (modified) bootstrap works. Comm. Statist. Theory Methods 15, 3193-3203.
Swanepoel, J. W. H. and J. W. J. van Wyk (1986). The bootstrap applied to power spectral density function estimation. Biometrika 73, 135-141.
Tibshirani, R. (1988). Variance stabilization and the bootstrap. Biometrika 75, 433-444.
Tu, D. S. and Z. G. Zheng (1987). On the Edgeworth expansions of random weighing methods. Chinese J. Appl. Probab. Statist. 3, 340-347.
Weber, N. C. (1984). On resampling techniques for regression models. Statist. Probab. Lett. 2, 275-278.
Weng, Ch.-S. (1989). On second-order asymptotic property of the Bayesian bootstrap mean. Ann. Statist. 17, 705-710.
Wu, C. F. J. (1986). Jackknife, bootstrap and other resampling methods in regression analysis (with discussion). Ann. Statist. 14, 1261-1350.