
Computational Statistics & Data Analysis 45 (2004) 215–233

www.elsevier.com/locate/csda

Asymptotic inference under heteroskedasticity of unknown form
Francisco Cribari-Neto
Departamento de Estatística, Universidade Federal de Pernambuco, Cidade Universitária, Recife/PE, 50740-540, Brazil

Received 1 January 2002; received in revised form 1 November 2002

Abstract

We focus on the finite-sample behavior of heteroskedasticity-consistent covariance matrix estimators and associated quasi-t tests. The estimator most commonly used is that proposed by Halbert White. Its finite-sample behavior under both homoskedasticity and heteroskedasticity is analyzed using Monte Carlo methods. We also consider two other consistent estimators, namely: the HC3 estimator, which is an approximation to the jackknife estimator, and the weighted bootstrap estimator. Additionally, we evaluate the finite-sample behavior of two bootstrap quasi-t tests: the test based on a single bootstrapping scheme and the test based on a double, nested bootstrapping scheme. The latter is very computer-intensive, but proves to work well in small samples. Finally, we propose a new estimator, which we call HC4; it is tailored to take into account the effect of leverage points in the design matrix on associated quasi-t tests.
© 2003 Elsevier B.V. All rights reserved.

Keywords: Bootstrap; Heteroskedasticity; High leverage points; Quasi-t tests; Regression

1. Introduction

Linear regression models for cross-sectional data usually display some form of heteroskedasticity, i.e., the error variances are not the same for all observations. The
ordinary least-squares estimator (OLSE) of the linear parameters remains unbiased and
consistent under neglected heteroskedasticity, and is commonly used even when one
suspects that the conditional variances are not constant. The usual covariance matrix
estimator of the OLSE, however, becomes biased and is not consistent under unequal
error variances. Several authors have proposed covariance matrix estimators that are

E-mail address: cribari@de.ufpe.br (F. Cribari-Neto).

0167-9473/$ - see front matter © 2003 Elsevier B.V. All rights reserved.
doi:10.1016/S0167-9473(02)00366-3

consistent under both homoskedasticity and heteroskedasticity of unknown form. For


instance, in his recent econometrics textbook, Jeffrey Wooldridge writes (Wooldridge, 2000, p. 249):

In the last two decades, econometricians have learned to adjust standard errors, t,
F and LM statistics so that they are valid in the presence of heteroskedasticity
of unknown form. This is very convenient because it means we can report new
statistics that work, regardless of the kind of heteroskedasticity present in the
population.

The most commonly used heteroskedasticity-consistent covariance matrix estimator


(HCCME) is that proposed by Halbert White in an influential Econometrica paper (White, 1980), who built upon Eicker (1963). His estimator, which is also known as HC0, is implemented in a number of statistical software packages (such as, e.g., LIMDEP and SHAZAM) and is commonly used by practitioners. The paper where it was originally proposed currently has over 2600 (two thousand and six hundred) citations according to the Institute for Scientific Information. This is clearly an indication that White's paper has had a profound impact on the literature. Monte Carlo evidence, however, indicates that the HC0 estimator can be considerably biased in finite samples, and that it tends to underestimate the true variances, thus leading to liberal associated quasi-t tests; see, e.g., Cribari-Neto et al. (2000), Cribari-Neto and Zarkos (1999, 2001) and MacKinnon and White (1985). 1
MacKinnon and White (1985) considered alternative HCCMEs and found that the jackknife covariance matrix estimator typically outperforms other estimators in finite samples, including the estimator proposed by White. Davidson and MacKinnon (1993) argue that the jackknife estimator is closely approximated by a variant of the White estimator known as HC3. The simulation results in Long and Ervin (2000) favor the HC3 estimator over other alternative estimators. This estimator is also implemented in some statistical software packages. 2 Wu (1986) proposed a weighted bootstrap estimator that is consistent under heteroskedasticity of unknown form. Cribari-Neto et al. (2000) developed a bias-correction scheme to be applied to the HC0 estimator which delivers a sequence of bias-adjusted estimators.
The results in Cribari-Neto and Zarkos (2001) show that the presence of high leverage points in the design matrix is more decisive for the finite-sample behavior of the different HCCMEs than the degree of heteroskedasticity itself. The resulting quasi-t tests tend to be quite liberal when the design matrix includes high leverage observations, thus leading to imprecise inference. In this paper, we propose an HCCME that takes into account the impact of high leverage points on the finite-sample behavior of the covariance matrix estimator. We call this estimator HC4 and show that it performs well in finite samples when used to construct quasi-t tests, regardless of whether the data contain high leverage observations.

1 By liberal we mean tests that overreject the null hypothesis when this hypothesis is true.
2 Long and Ervin (2000) survey 12 statistical packages, and show that the most commonly implemented heteroskedasticity-consistent covariance matrix estimator is the White (HC0) estimator, the HC3 estimator being available only in STATA and TSP.

We also investigate the finite-sample behavior of inference based on the bootstrap. We consider three different approaches. The first uses the weighted bootstrap estimator proposed by Wu (1986) to construct quasi-t statistics, the second bootstraps the quasi-t statistic constructed using the HC0 estimator, and the third and final one bootstraps the same test statistic but uses a nested, double bootstrapping scheme. The latter is quite intensive computationally since it involves two levels of bootstrapping.
Our results show, as expected, that asymptotic inference in linear regression models with heteroskedasticity of unknown form is considerably affected by the presence of high leverage observations in the design matrix. Inference based on a double bootstrap test proves to be somewhat reliable even when such points do exist. The results also show that quasi-t tests that use the HC4 estimator we propose are also reliable. Indeed, they are even more reliable than double bootstrap tests, and much simpler computationally. The numerical results also suggest that inference under highly asymmetric errors when heteroskedasticity is strong can be imprecise. But HC4-based inference seems to be the least imprecise of all inference strategies considered.
The paper unfolds as follows. Section 2 describes the model of interest and several
heteroskedasticity-consistent covariance matrix estimators. In Section 3 we propose
a new estimator for the covariance matrix of the ordinary least-squares regression
parameters estimator; we call the proposed estimator HC4. Section 4 presents bootstrap
tests; both single and double bootstrap tests that are robust to heteroskedasticity are
considered. Numerical results from stochastic simulation are presented and discussed
in Section 5. An empirical application is presented in Section 6. Finally, Section 7
concludes the paper.

2. The model and estimators

The model is the linear regression model where the variable of interest ($y$) is modeled as a linear systematic component plus error:
$$y = X\beta + u,$$
where $y$ is an $n \times 1$ vector of observations on the dependent variable, $X$ is a fixed matrix of dimension $n \times p$ with full column rank ($\mathrm{rank}(X) = p < n$) containing explanatory variables, $\beta = (\beta_1, \ldots, \beta_p)'$ is a $p$-vector of unknown linear parameters, and $u$ is an $n$-vector of errors, each having mean zero and variance $\sigma_i^2$. We denote the covariance matrix of $u$ as $\Omega = \mathrm{diag}\{\sigma_1^2, \ldots, \sigma_n^2\}$. When the errors are homoskedastic, we have $\sigma_i^2 = \sigma^2 > 0$, i.e., $\Omega = \sigma^2 I_n$, where $I_n$ is the identity matrix of order $n$. The ordinary least-squares estimator of $\beta$ is given by $\hat\beta = (X'X)^{-1}X'y$, which has mean $\beta$ (i.e., it is unbiased) and variance structure $\mathrm{var}(\hat\beta) = \Psi$, with
$$\Psi = (X'X)^{-1} X' \Omega X (X'X)^{-1}.$$

Under homoskedasticity, i.e., $\Omega = \sigma^2 I_n$, this expression simplifies to $\sigma^2 (X'X)^{-1}$, which can be easily estimated as $\hat\sigma^2 (X'X)^{-1}$, where $\hat\sigma^2 = \hat u'\hat u/(n-p)$. Here, $\hat u = (I_n - X(X'X)^{-1}X')y$ is the $n$-vector of least-squares residuals.
The most commonly used consistent estimator for $\Psi$ is the HC0 estimator (White, 1980), namely
$$\hat\Psi = (X'X)^{-1} X' \hat\Omega X (X'X)^{-1},$$
where $\hat\Omega = \mathrm{diag}\{\hat u_1^2, \ldots, \hat u_n^2\}$. That is, $\hat\Omega$ is a diagonal matrix formed out of the vector of squared least-squares residuals. This estimator is consistent under both homoskedasticity and heteroskedasticity of unknown form; see White (1980). However, it can be considerably biased in finite samples; see, e.g., Cribari-Neto and Zarkos (1999, 2001) and MacKinnon and White (1985).
An alternative estimator with superior finite-sample behavior can be devised by modifying the HC0 estimator. The idea is to use
$$\hat\Omega = \mathrm{diag}\{\hat u_1^2/(1-h_1)^2, \ldots, \hat u_n^2/(1-h_n)^2\},$$
where $h_i$ is the $i$th diagonal element of the hat matrix $H = X(X'X)^{-1}X'$, $i = 1, \ldots, n$. The resulting estimator is widely known as the HC3 estimator and provides a close approximation to the jackknife estimator considered by MacKinnon and White (1985); see Davidson and MacKinnon (1993, Section 16.3).
A computer-intensive alternative is to use the bootstrap method, which was originally proposed by Bradley Efron in a 1979 Annals of Statistics paper (Efron, 1979). The bootstrap is a computer-based method that allows one to obtain measures of accuracy for statistical estimates. 3 The simplest formulation of the bootstrap algorithm samples with replacement from the residuals to create additional pseudo-samples. However, this bootstrapping scheme does not take into account the fact that the error variances are not the same when there is heteroskedasticity in the model. Indeed, the bootstrap estimator described above is neither consistent nor asymptotically unbiased when the underlying data generating process is heteroskedastic; see, e.g., Wu (1986). A bootstrap estimator which is consistent under both homoskedasticity and heteroskedasticity of unknown form was proposed by Wu (1986), and can be described as follows:

1. For each $i$, $i = 1, \ldots, n$, draw a random number $t_i$ from a population that has mean zero and variance one.
2. Construct a bootstrap sample $(y^*, X)$, where $y_i^* = X_i\hat\beta + t_i\hat u_i/(1-h_i)$, $X_i$ denoting the $i$th row of $X$.
3. Compute the OLSE of $\beta$: $\hat\beta^* = (X'X)^{-1}X'y^*$.
4. Repeat steps 1 to 3 a large number (say, $B$) of times.
5. Compute the variance of the $B+1$ vectors of estimates (the initial vector of estimates and the $B$ bootstrap estimates).

The resulting estimator is known as the weighted bootstrap estimator. 4 Note that, in the bootstrapping scheme, the variance of $t_i\hat u_i$ is not constant. Note also that we have modified step 2. Wu's proposal was to divide each residual by $\sqrt{1-h_i}$ and not by $1-h_i$. We found that the latter form usually yields superior small-sample behavior.

3 For details, see, e.g., Davison and Hinkley (1997) and Efron and Tibshirani (1993).
4 It is also known as the wild bootstrap or external bootstrap estimator.
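To make the scheme above concrete, the following is a minimal sketch in Python with NumPy. It follows steps 1–5 with the modified step 2 (residuals divided by $1-h_i$); for simplicity the $t_i$ are drawn from a standard normal rather than from the normalized residuals used in the experiments of Section 5, and all function and variable names are illustrative, not from the paper.

```python
import numpy as np

def weighted_bootstrap_cov(X, y, B=999, seed=None):
    """Wu-type weighted (wild) bootstrap covariance estimate for the OLSE."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ (X.T @ y)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)        # hat values h_i
    u_hat = y - X @ beta_hat
    betas = np.empty((B + 1, p))
    betas[0] = beta_hat                                # the initial vector of estimates
    for b in range(1, B + 1):
        t = rng.standard_normal(n)                     # external variates: mean 0, variance 1
        y_star = X @ beta_hat + t * u_hat / (1.0 - h)  # modified step 2
        betas[b] = XtX_inv @ (X.T @ y_star)
    return np.cov(betas, rowvar=False)                 # step 5: variance of the B + 1 estimates
```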

3. A new estimator

Monte Carlo evidence has shown that the HCCMEs described in the previous section tend to display poor finite-sample behavior when the design matrix $X$ contains points of high leverage, leading to associated quasi-t tests that are liberal, i.e., that overreject the null hypothesis when it is true; see, e.g., Cribari-Neto and Zarkos (2001). Hoaglin and Welsch (1978) proposed to use the diagonal elements $h_1, \ldots, h_n$ of the hat matrix $H = X(X'X)^{-1}X'$ as measures of leverage of the $n$ observations, since $h_i = \partial\hat y_i/\partial y_i$, where $\hat y_i$ is the $i$th fitted value. As noted by Davidson and MacKinnon (1993, Section 1.6), it is possible to write
$$\hat\beta^{(i)} = \hat\beta - \left(\frac{1}{1-h_i}\right)(X'X)^{-1}X_i'\hat u_i, \quad i = 1, \ldots, n,$$

where $\hat\beta^{(i)}$ is the vector of OLS estimates obtained when we omit observation $i$ from the sample. It then follows that when $\hat u_i$ is large and/or $1-h_i$ is small (i.e., $h_i$ is large), the effect of the $i$th observation on at least some of the elements of $\hat\beta$ is likely to be sizeable. We can also write
$$X_i\hat\beta^{(i)} = X_i\hat\beta - \left(\frac{h_i}{1-h_i}\right)\hat u_i, \quad i = 1, \ldots, n,$$
thus implying that the change in the $i$th fitted value caused by the omission of observation $i$ equals $\hat u_i h_i/(1-h_i)$. As a direct consequence, the change in the $i$th residual is $\{h_i/(1-h_i)\}\hat u_i$. We can then use $h_i$ as a measure for the leverage of the $i$th observation. A general rule-of-thumb is that values of $h_i$ in excess of two or three times the average (i.e., $2p/n$ and $3p/n$) are regarded as influential and worthy of further investigation (Judge et al., 1988, p. 893; see also Davidson and MacKinnon, 1993, p. 36).
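As a small illustration of this rule-of-thumb, the hat values and the $2p/n$ or $3p/n$ thresholds can be computed as in the sketch below (Python/NumPy; the function name is ours):

```python
import numpy as np

def high_leverage_points(X, factor=3.0):
    """Indices of observations whose hat value h_i exceeds factor * p / n."""
    n, p = X.shape
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)  # h_i = X_i (X'X)^{-1} X_i'
    return np.flatnonzero(h > factor * p / n)
```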
As noted by Chesher and Jewitt (1987, p. 1219), the possibility of severe downward bias in the HC0 estimator arises when there are large $h_i$, because the associated least-squares residuals have small magnitude on average and the HC0 estimator takes small residuals as evidence of small error variances. The HC3 estimator includes a correction term which takes into account the effect of the degree of leverage of each observation since it uses, as we have seen,
$$\hat\Omega = \mathrm{diag}\{\hat u_1^2/(1-h_1)^2, \ldots, \hat u_n^2/(1-h_n)^2\}.$$
Another commonly used estimator is the HC2 estimator (MacKinnon and White, 1985), which uses
$$\hat\Omega = \mathrm{diag}\{\hat u_1^2/(1-h_1), \ldots, \hat u_n^2/(1-h_n)\}.$$



The HC3 estimator discounts the effect of the $h_i$'s more heavily than the HC2 estimator, and typically has better finite-sample behavior.
The estimator we propose, which we call HC4, uses
$$\hat\Omega = \mathrm{diag}\{\hat u_1^2/(1-h_1)^{\delta_1}, \ldots, \hat u_n^2/(1-h_n)^{\delta_n}\},$$
where
$$\delta_i = \min\left\{4, \frac{h_i}{\bar h}\right\} = \min\left\{4, \frac{n h_i}{\sum_{j=1}^{n} h_j}\right\},$$
with $\bar h = n^{-1}\sum_{i=1}^{n} h_i$, i.e., $\bar h$ is the average of the $h_i$'s. That is,
$$\delta_i = \min\left\{4, \frac{n h_i}{p}\right\}.$$
Here we use the fact that the sum of all $h_i$'s equals $p$, since
$$\sum_{i=1}^{n} h_i = \mathrm{tr}(H) = \mathrm{tr}(X(X'X)^{-1}X') = \mathrm{tr}(X'X(X'X)^{-1}) = \mathrm{tr}(I_p) = p.$$
The exponent $\delta_i$ controls the level of discounting for observation $i$ and is given by the ratio between $h_i$ and the average of the $h_i$'s, $\bar h$. Since $0 < 1-h_i < 1$ and $\delta_i > 0$, it follows that $0 < (1-h_i)^{\delta_i} < 1$. Hence, the $i$th squared residual is more strongly inflated when $h_i$ is large relative to $\bar h$. This linear discounting is truncated at 4, which amounts to twice the degree of discounting used by the HC3 estimator, so that $\delta_i = 4$ when $h_i > 4\bar h = 4p/n$.
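The whole HC family discussed so far differs only in the diagonal matrix $\hat\Omega$ placed inside the sandwich. The following compact sketch (Python/NumPy; function and argument names are ours, not from the paper) shows the four variants side by side:

```python
import numpy as np

def hccme(X, y, kind="HC4"):
    """Sandwich estimate (X'X)^{-1} X' Omega_hat X (X'X)^{-1} of var(beta_hat)."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    u = y - X @ (XtX_inv @ (X.T @ y))             # least-squares residuals
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # hat values; their sum equals p
    if kind == "HC0":
        omega = u**2
    elif kind == "HC2":
        omega = u**2 / (1.0 - h)
    elif kind == "HC3":
        omega = u**2 / (1.0 - h) ** 2
    elif kind == "HC4":
        delta = np.minimum(4.0, n * h / p)        # delta_i = min{4, h_i / h_bar}
        omega = u**2 / (1.0 - h) ** delta
    else:
        raise ValueError(f"unknown kind: {kind}")
    return XtX_inv @ (X.T * omega) @ X @ XtX_inv  # (X.T * omega) equals X' Omega_hat
```

The standard errors used in quasi-t statistics are the square roots of the diagonal entries of the returned matrix.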

4. Bootstrap tests

An alternative approach is to use the HC0 estimator to construct the quasi-t statistic, and then bootstrap this quantity, which is known to be asymptotically pivotal, i.e., its null asymptotic distribution is free of unknown parameters. The bootstrapping scheme is performed imposing the restriction under test, the test statistic being computed in each bootstrap replication. At the end of the bootstrap resampling scheme, we obtain either a critical value for the test (to be used as a replacement for the asymptotic critical value obtained from a standard normal distribution) or a bootstrap p-value. The bootstrap test can be carried out as follows. At the outset, compute the quasi-t statistic, say $\tau$. Then:

1. For each $i$, $i = 1, \ldots, n$, draw a random number $t_i$ from a population that has mean zero and variance one.
2. Construct a bootstrap sample $(y^*, X)$, where $y_i^* = X_i\tilde\beta + t_i\tilde u_i/(1-h_i)$. Here, $\tilde\beta$ and $\tilde u$ are the restricted parameter estimates and the associated restricted least-squares residuals from the regression of $y$ on $X$.

3. Compute the OLSE of $\beta$, $\hat\beta^* = (X'X)^{-1}X'y^*$, and compute the associated quasi-t test statistic, $\tau^*$.
4. Repeat steps 1 to 3 a large number (say, $B$) of times.
5. Compute the quantile of interest of the empirical distribution of the $B+1$ realizations of the test statistic.
6. Perform the test using the quasi-t statistic computed initially ($\tau$) together with the bootstrap critical value obtained in step 5 above.

Note that in the bootstrap test we do not rely on critical values from the asymptotic null distribution of the test statistic, i.e., we do not rely on normal critical values. We use instead critical values obtained from the bootstrapping scheme.
The decision rule can be more conveniently expressed using the p-value of the test and its bootstrap estimate. The approximate p-value obtained from the bootstrapping scheme, for a two-sided test, is given by
$$\hat p = \frac{1 + \#\{|\tau_b^*| \geq |\tau|\}}{B+1},$$
where $\tau_b^*$, $b = 1, \ldots, B$, are the bootstrap realizations of the test statistic. We reject the null hypothesis when the bootstrap p-value is smaller than the selected nominal size of the test.
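A sketch of this single bootstrap test for a null hypothesis that fixes one coefficient, $H_0: \beta_j = \beta_j^{(0)}$, follows (Python/NumPy; as before, the $t_i$ are drawn standard normal for simplicity rather than from the normalized residuals, and all names are ours):

```python
import numpy as np

def single_boot_pvalue(X, y, j, beta0=0.0, B=999, seed=None):
    """Bootstrap p-value for H0: beta_j = beta0 using the HC0-based quasi-t statistic."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    A = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, A, X)              # hat values, as in step 2

    def quasi_t(yd):
        bh = A @ (X.T @ yd)
        u = yd - X @ bh
        se = np.sqrt((A @ (X.T * u**2) @ X @ A)[j, j])  # HC0 standard error
        return (bh[j] - beta0) / se

    tau = quasi_t(y)
    # restricted fit: impose beta_j = beta0 and regress on the remaining columns
    Xr = np.delete(X, j, axis=1)
    br = np.linalg.solve(Xr.T @ Xr, Xr.T @ (y - beta0 * X[:, j]))
    fit_r = Xr @ br + beta0 * X[:, j]
    u_r = y - fit_r                                    # restricted residuals
    taus = np.empty(B)
    for b in range(B):
        y_star = fit_r + rng.standard_normal(n) * u_r / (1.0 - h)  # step 2
        taus[b] = quasi_t(y_star)
    return (1 + np.sum(np.abs(taus) >= np.abs(tau))) / (B + 1)
```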
It is possible to obtain a more accurate bootstrap p-value using the double bootstrap, which is, however, more computer-intensive. The basic idea is to perform a second level of bootstrap resampling for each original bootstrap replication. 5 Let $\tau_1^*, \ldots, \tau_B^*$ denote the $B$ bootstrap realizations of the test statistic. We can devise the following double bootstrapping scheme, where $C$ denotes the number of bootstrap replications in the second level of bootstrapping, and $b = 1, \ldots, B$ indexes the first level of bootstrapping:

1. For each $i$, $i = 1, \ldots, n$, draw a random number $t_i$ from a population that has mean zero and variance one.
2. Construct a bootstrap sample $(y^*, X)$, where $y_i^* = X_i\tilde\beta + t_i\tilde u_i/(1-h_i)$. Here, $\tilde\beta$ and $\tilde u$ are the restricted parameter estimates and the associated restricted least-squares residuals from the regression of $y$ on $X$.
3. Compute the OLSE of $\beta$, $\hat\beta^* = (X'X)^{-1}X'y^*$, and the associated quasi-t statistic, $\tau^*$.
4. Compute $\hat p_b^*$ using (1); see below.
5. Use the realizations from the two levels of bootstrapping to obtain an adjusted p-value for the test (see below).

Steps 1–4 described above must be performed for each outer bootstrap replication ($b = 1, \ldots, B$). The bootstrap adjusted p-value is then given by
$$\hat p^*_{\mathrm{adj}} = \frac{1 + \#\{\hat p_b^* \leq \hat p\}}{B+1},$$
where, for each $b$,
$$\hat p_b^* = \frac{1 + \#\{|\tau_{bc}^{**}| \geq |\tau_b^*|\}}{C+1}, \qquad (1)$$
$c = 1, \ldots, C$. We reject the null hypothesis, tested against a two-sided alternative hypothesis, if $\hat p^*_{\mathrm{adj}} \leq \alpha$, where $\alpha$ is the nominal level of the test. Note that the total number of bootstrap replications is now $B \times C$, thus implying a heavier computational burden.

5 See, e.g., Davison and Hinkley (1997, Section 4.5).
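The nested scheme, with the adjusted p-value in (1), can be sketched as follows (again Python/NumPy with illustrative names; B and C are reduced here merely to keep the example cheap, whereas the paper uses B = 999 and C = 249):

```python
import numpy as np

def double_boot_pvalue(X, y, j, beta0=0.0, B=199, C=49, seed=None):
    """Adjusted p-value from the double (nested) bootstrap for H0: beta_j = beta0."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    A = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, A, X)
    Xr = np.delete(X, j, axis=1)

    def quasi_t(yd):                          # HC0-based quasi-t statistic
        bh = A @ (X.T @ yd)
        u = yd - X @ bh
        se = np.sqrt((A @ (X.T * u**2) @ X @ A)[j, j])
        return (bh[j] - beta0) / se

    def restricted(yd):                       # fitted values and residuals under H0
        br = np.linalg.solve(Xr.T @ Xr, Xr.T @ (yd - beta0 * X[:, j]))
        fit = Xr @ br + beta0 * X[:, j]
        return fit, yd - fit

    def resample(fit, res):                   # step 2, with t_i standard normal
        return fit + rng.standard_normal(n) * res / (1.0 - h)

    tau = quasi_t(y)
    fit0, res0 = restricted(y)
    taus, p_inner = np.empty(B), np.empty(B)
    for b in range(B):
        y_star = resample(fit0, res0)         # first (outer) level
        taus[b] = quasi_t(y_star)
        fit1, res1 = restricted(y_star)       # second (inner) level resamples y_star
        inner = np.array([quasi_t(resample(fit1, res1)) for _ in range(C)])
        p_inner[b] = (1 + np.sum(np.abs(inner) >= np.abs(taus[b]))) / (C + 1)  # eq. (1)
    p_hat = (1 + np.sum(np.abs(taus) >= np.abs(tau))) / (B + 1)
    return (1 + np.sum(p_inner <= p_hat)) / (B + 1)   # adjusted p-value
```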

5. Numerical evaluation

The numerical results reported in this section are obtained using the model
$$y_i = \beta_1 + \beta_2 x_i + \sigma_i u_i, \quad i = 1, \ldots, n.$$
The sample sizes used were $n = 50, 100, 150$. When $n = 50$, the values of $x_i$ were obtained as independent random draws from a lognormal distribution. These observations were each replicated twice and three times when $n = 100$ and $150$, respectively. By constructing larger samples this way we make sure that the degree of heteroskedasticity remains unchanged as the sample size increases. The errors, the $u_i$'s, are independent and identically distributed according to a N(0, 1) distribution. Data generation was performed using $\beta_1 = 1$ and $\beta_2 = 0$. Under homoskedasticity, $\sigma_i = 1$ for all $i$. On the other hand, to obtain data from a heteroskedastic generating mechanism we used
$$\sigma_i^2 = \exp\{a x_i + a x_i^2\}$$
with different values of $a$. The degree of heteroskedasticity can be measured using $\lambda = (\max \sigma_i^2)/(\min \sigma_i^2)$; that is, under homoskedasticity $\lambda = 1$, otherwise $\lambda > 1$. In the bootstrapping schemes, the $t_i$'s and $t_i^*$'s were drawn from the corresponding set of normalized residuals, i.e., from $\hat u$ (weighted bootstrap estimator), $\tilde u$ (single bootstrap test) and $\tilde u^*$ (double bootstrap test), after these residuals were normalized to have mean zero and variance one. The number of Monte Carlo replications was set at 5000, and the numbers of bootstrap replications for the first and second levels of bootstrapping were 999 and 249, respectively. Each experiment thus entails a total of nearly 1.25 billion replications, thus indicating that the simulations are very computer-intensive. All experiments were programmed using the C programming language (Cribari-Neto, 1999) and compiled using the gcc compiler (Stallman, 1999) under the Linux operating system (MacKinnon, 1999).
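A sketch of this data generating mechanism follows (Python/NumPy; the lognormal parameters and the seed are our own illustrative choices, since the paper does not report them):

```python
import numpy as np

rng = np.random.default_rng(123)              # illustrative seed
a = 0.12                                      # skedastic-function parameter
x = np.tile(rng.lognormal(size=50), 3)        # n = 150: the 50 base draws replicated
sigma2 = np.exp(a * x + a * x**2)
lam = sigma2.max() / sigma2.min()             # degree of heteroskedasticity, lambda
y = 1.0 + 0.0 * x + np.sqrt(sigma2) * rng.standard_normal(x.size)  # beta1 = 1, beta2 = 0
```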
Tables 1 and 2 present, respectively, the total relative biases and the square root of the total mean squared error (×√5000) for each estimator. The following estimators were considered: the ordinary least-squares estimator (OLS), the HC0, HC3 and HC4 estimators, and the weighted bootstrap estimator (wboot). We define total relative bias as the sum of the absolute values of the individual relative biases of the estimated variances of $\hat\beta_1$ and $\hat\beta_2$. The individual relative bias is defined as the difference between the mean of all variance estimates and the true variance divided by the true variance.

Table 1
Total relative bias

n    a     λ        OLS     HC0     HC3     HC4     wboot
50   0.00  1.0000   0.0042  0.1927  0.2149  0.6643  0.2132
50   0.04  4.5672   0.6338  0.3059  0.2618  1.0366  0.2597
50   0.08  20.8593  1.0613  0.4481  0.2841  1.4003  0.2805
50   0.12  95.2686  1.3205  0.5493  0.2905  1.6369  0.2909
100  0.00  1.0000   0.0002  0.0969  0.0982  0.2581  0.0954
100  0.04  4.5672   0.6243  0.1808  0.0877  0.3643  0.0871
100  0.08  20.8593  1.0215  0.2358  0.1223  0.5388  0.1141
100  0.12  95.2686  1.2711  0.2848  0.1328  0.6449  0.1364
150  0.00  1.0000   0.0019  0.0627  0.0655  0.1610  0.0624
150  0.04  4.5672   0.6080  0.1012  0.0801  0.2521  0.0762
150  0.08  20.8593  1.0047  0.1453  0.0950  0.3528  0.0929
150  0.12  95.2686  1.2481  0.1831  0.0961  0.4114  0.0923

Table 2
Total RMSE (×√5000)

n    a     λ        OLS      HC0      HC3      HC4       wboot
50   0.00  1.0000   0.6754   0.8885   1.1228   1.6026    1.1354
50   0.04  4.5672   2.0667   2.3119   3.4669   6.0827    3.4750
50   0.08  20.8593  9.2075   8.8181   13.6044  24.7257   13.6726
50   0.12  95.2686  41.3048  38.8268  60.2640  109.5318  60.7768
100  0.00  1.0000   0.2364   0.3298   0.3714   0.4335    0.3793
100  0.04  4.5672   0.9212   0.8536   1.0181   1.3365    1.0314
100  0.08  20.8593  4.3146   3.4525   4.2550   5.7482    4.2391
100  0.12  95.2686  19.6086  15.4746  19.1685  25.8815   19.7064
150  0.00  1.0000   0.1312   0.1832   0.1979   0.2176    0.2047
150  0.04  4.5672   0.5771   0.4820   0.5489   0.6615    0.5560
150  0.08  20.8593  2.7903   1.9507   2.2507   2.7636    2.2804
150  0.12  95.2686  12.7185  8.6590   9.9923   12.2551   10.1757

The total relative bias thus yields an aggregate measure for the biases of the two variance estimates. These results are displayed in Table 1. Table 2 reports the square roots of the sum of the two individual mean squared errors (RMSE). These quantities take into account not only bias, but also the variances of the different estimators.
The figures in Table 1 reveal, as expected, that the least-squares variance estimator is unbiased when all errors share the same variance, but is considerably biased otherwise. The HCCMEs that display the smallest biases overall are the HC3 and weighted bootstrap estimators, the HC4 estimator being the most biased one. For instance, when $n = 100$ and $a = 0.12$ (which results in $\lambda = 95.27$), the total relative biases of the HC0, HC3, HC4 and weighted bootstrap estimators are, respectively, 28.48%, 13.28%, 64.49% and 13.64%. Hence, as far as bias goes, the HC3 and weighted bootstrap estimators are clearly superior to the other HCCMEs.

Table 3
Estimated null rejection rates of quasi-t tests, α = 5%

n    a     λ      OLS    HC0    HC3    HC4   wboot  single boot  double boot
50   0.00  1.00   5.94   9.68   6.86   4.70  6.94   6.48         5.04
50   0.04  4.57   20.40  13.74  8.88   5.88  8.84   8.54         6.26
50   0.08  20.86  41.70  16.58  10.16  6.18  9.94   9.42         7.58
50   0.12  95.27  55.46  18.26  10.70  5.66  10.82  8.48         6.98
100  0.00  1.00   5.40   7.54   5.76   4.82  5.90   5.88         4.56
100  0.04  4.57   19.84  9.46   6.94   5.10  7.04   6.80         5.14
100  0.08  20.86  39.42  10.18  7.36   5.26  7.56   7.26         5.80
100  0.12  95.27  50.00  11.18  7.96   5.56  8.04   7.88         7.14
150  0.00  1.00   5.38   6.82   5.76   4.96  5.78   5.70         4.78
150  0.04  4.57   20.06  7.82   6.46   5.04  6.40   6.06         4.68
150  0.08  20.86  38.72  8.90   7.08   5.64  7.18   7.10         5.74
150  0.12  95.27  50.60  9.28   6.86   5.18  6.96   6.98         6.54

The total root mean squared errors of the different variance estimators are presented in Table 2 (×√5000). We note that the HC0 estimator is the consistent estimator with the smallest total root mean squared error, the HC4 estimator being the one with the poorest performance. The HC3 and weighted bootstrap estimators again have similar finite-sample behavior.
Table 3 contains the estimated null rejection rates (in percentages) of the quasi-t tests that use variance estimates from the estimators considered here, and also the null rejection rates of the two bootstrap tests, i.e., the bootstrap test (single boot) and the double bootstrap test (double boot). The interest lies in testing the null hypothesis $H_0: \beta_j = \beta_j^{(0)}$, $j = 1, \ldots, p$, where $\beta_j^{(0)}$ is a given constant, against a two-sided alternative hypothesis. The test statistic can be written as
$$\tau = \frac{\hat\beta_j - \beta_j^{(0)}}{\sqrt{\widehat{\mathrm{var}}(\hat\beta_j)}},$$
where $\widehat{\mathrm{var}}(\hat\beta_j)$ denotes the estimated variance of $\hat\beta_j$ obtained from an HCCME. Under the null hypothesis, the test statistic has a limiting N(0, 1) distribution, and the test is usually performed by comparing the absolute value of the test statistic with a critical value from this limiting distribution. In our numerical exercise, we test $H_0: \beta_2 = 0$ against $H_1: \beta_2 \neq 0$. That is, we test the exclusion of the covariate $x$. The double bootstrapping scheme was performed using $B = 999$ and $C = 249$. The null rejection rates of the different tests (expressed as percentages) corresponding to the nominal level $\alpha = 5\%$ are presented in Table 3.
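For reference, the HC4-based version of this test, with the limiting N(0, 1) null distribution, amounts to the following sketch (Python; scipy is used only for the normal tail probability, and all names are ours):

```python
import numpy as np
from scipy.stats import norm

def hc4_quasi_t_test(X, y, j, beta0=0.0):
    """Quasi-t statistic and two-sided asymptotic p-value for H0: beta_j = beta0,
    using an HC4 standard error."""
    n, p = X.shape
    A = np.linalg.inv(X.T @ X)
    bh = A @ (X.T @ y)
    u = y - X @ bh
    h = np.einsum("ij,jk,ik->i", X, A, X)
    delta = np.minimum(4.0, n * h / p)
    omega = u**2 / (1.0 - h) ** delta
    se = np.sqrt((A @ (X.T * omega) @ X @ A)[j, j])
    tau = (bh[j] - beta0) / se
    return tau, 2.0 * norm.sf(abs(tau))   # compare tau with N(0,1) critical values
```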

The figures in Table 3 convey important information. First, the test that uses the least-squares variance estimator is largely liberal when we no longer have homoskedastic errors. Second, the test whose test statistic uses the HC0 estimator is liberal; the stronger the degree of heteroskedasticity (measured by $\lambda$), the more liberal the test. For instance, when $n = 100$ and $a = 0.12$, the HC0-based quasi-t test (incorrectly) rejects the null hypothesis over 11% of the time, that is, over twice what would be expected based on the nominal level of the test. Third, the quasi-t tests that employ the HC3 and weighted bootstrap estimators prove to be liberal, thus rejecting the null hypothesis more often than desired. These tests are, however, less liberal than the test based on the HC0 estimator; e.g., when $n = 100$ and $a = 0.12$, their null rejection rates were approximately 8%. Fourth, the bootstrap test based on the one-level bootstrapping scheme is also liberal, slightly outperforming the test that uses the weighted bootstrap estimate in the denominator of the test statistic together with the asymptotic normal critical value. The gain from bootstrapping an asymptotically pivotal quantity is thus negligible. Fifth, the test based on the double bootstrap does achieve an improvement relative to the single bootstrap test: its empirical size is closer to the nominal level of the test than that of the single-level bootstrap test. For example, when $n = 100$ and $a = 0.08$, the sizes of the single and double bootstrap tests are, respectively, 7.26% and 5.80%. The gain from introducing a second level of bootstrapping seems worth the extra computational burden. Finally, the test based on the test statistic constructed using the HC4 estimator is reliable. It is noteworthy that the HC4 estimator had poor finite-sample behavior when the criteria were bias and root mean squared error, and yet it delivers reliable associated inference. Indeed, the finite-sample performance of the HC4-based test is superior to that of the double bootstrap test, without requiring the computational burden associated with nested bootstrap resampling schemes. Overall, the HC4-based test has the best finite-sample performance of all tests considered. In short, the tests that proved to be reliable are the double bootstrap test (described in Section 4) and the test that uses the HC4 estimator (proposed in Section 3), the latter displaying superior behavior.
The design matrix $X$ used in the numerical exercise described above contains points of high leverage. Indeed, three of the fifty base observations have $h_i$'s in excess of $3p/n = 0.12$. As already mentioned, Cribari-Neto and Zarkos (2001) argue that the existence of such points is more decisive for finite-sample inference based on quasi-t tests than the degree of heteroskedasticity itself. These tests tend to be liberal when the design matrix includes observations with high leverage. The results in Table 3 suggest that when such points do exist, inference should be based on the HC4-based test or on a double bootstrap test.
In order to examine the effect of high leverage observations on the different inference strategies considered above, the three observations whose leverage measures exceed $3p/n = 0.12$ were removed from the sample, and replaced by three new observations which were independently drawn from the same distribution as the original ones. We then checked whether the new full sample had any high leverage points. If so, these were replaced in similar fashion. The process only stopped when there was no observation in the design matrix with high leverage. Then, another simulation using this new set of values for the covariate was conducted. The values of $a$ were altered so that the resulting degrees of heteroskedasticity were similar to those in the previous exercise.

Table 4
Total relative bias, no leverage points

n    a     λ        OLS     HC0     HC3     HC4     wboot
50   0.00  1.0000   0.0067  0.1123  0.1249  0.0873  0.1261
50   0.69  4.6028   0.3217  0.1395  0.1348  0.1285  0.1344
50   1.38  21.1860  0.4670  0.1632  0.1551  0.1860  0.1514
50   2.07  97.5155  0.7215  0.2403  0.0973  0.1542  0.0935
100  0.00  1.0000   0.0002  0.0599  0.0558  0.0365  0.0535
100  0.69  4.6028   0.3214  0.0714  0.0638  0.0599  0.0621
100  1.38  21.1860  0.4695  0.0835  0.0730  0.0866  0.0686
100  2.07  97.5155  0.6934  0.1060  0.0656  0.0934  0.0646
150  0.00  1.0000   0.0024  0.0425  0.0342  0.0213  0.0307
150  0.69  4.6028   0.3231  0.0396  0.0505  0.0478  0.0501
150  1.38  21.1860  0.4694  0.0713  0.0314  0.0397  0.0288
150  2.07  97.5155  0.6840  0.0763  0.0371  0.0545  0.0347

Table 5
Total RMSE (×√5000), no leverage points

n    a     λ        OLS       HC0       HC3       HC4       wboot
50   0.00  1.0000   5.8929    8.0601    9.3283    9.2717    9.4734
50   0.69  4.6028   20.9618   25.7558   30.6202   31.6511   30.9322
50   1.38  21.1860  108.6154  112.6944  136.6601  144.3242  137.1067
50   2.07  97.5155  540.9129  502.7288  603.9850  643.8569  604.8852
100  0.00  1.0000   2.0456    2.8193    3.0191    3.0051    3.0870
100  0.69  4.6028   9.2199    9.3145    10.1560   10.3258   10.3168
100  1.38  21.1860  50.6579   40.6335   44.6933   45.9510   45.0005
100  2.07  97.5155  256.4926  191.4280  210.7051  217.5917  214.6442
150  0.00  1.0000   1.1426    1.5893    1.6606    1.6552    1.7180
150  0.69  4.6028   5.7616    5.0737    5.3944    5.4508    5.5021
150  1.38  21.1860  32.9516   22.0907   23.3901   23.8067   23.6070
150  2.07  97.5155  166.6269  102.1647  108.7302  111.0885  109.5909

The results corresponding to total relative bias, total root mean squared error (×√5000), and null rejection rates (in percentages, for $\alpha = 5\%$) are presented in Tables 4–6. These results should be contrasted with the results in Tables 1, 2 and 3, respectively.
The results displayed in Table 4 show that the total relative biases of all estimators are considerably smaller than in the case where the data contained points of high leverage (Table 1). For example, when $n = 50$ and $\lambda = 21.19$, the total relative bias of the HC0 estimator is approximately 16%, whereas this quantity was over 44% in the regression design with leverage points.

Table 6
Estimated null rejection rates of quasi-t tests, α = 5%, no leverage points

n    a     λ      OLS    HC0   HC3   HC4   wboot  single boot  double boot
50   0.00  1.00   5.14   5.96  4.84  4.88  4.82   4.76         4.02
50   0.69  4.60   9.00   7.64  5.70  5.78  5.74   5.76         4.72
50   1.38  21.19  14.70  8.10  6.16  6.00  6.28   6.30         5.38
50   2.07  97.52  22.52  9.32  6.78  6.68  6.90   6.82         6.62
100  0.00  1.00   5.10   5.98  5.26  5.36  5.16   5.40         4.54
100  0.69  4.60   8.56   6.40  5.52  5.58  5.56   5.52         4.58
100  1.38  21.19  14.90  6.98  5.82  5.74  5.96   6.16         5.28
100  2.07  97.52  21.54  8.30  7.00  6.88  7.10   7.52         7.00
150  0.00  1.00   5.38   6.00  5.42  5.50  5.56   5.54         5.02
150  0.69  4.60   9.24   6.06  5.54  5.56  5.60   5.48         4.74
150  1.38  21.19  15.20  6.42  5.86  5.80  5.76   6.02         5.32
150  2.07  97.52  20.30  6.40  5.88  5.78  6.00   6.40         5.84

It is noteworthy that the HC4 estimator is now less biased than the HC3 estimator under homoskedasticity ($\lambda = 1$). The same happens when $\lambda = 4.60$. The total root mean squared errors, however, increased when the high leverage points were replaced by non-influential observations (Table 5 compared to Table 2). This occurs because the observations with high leverage tend to act as attractors, bringing the regression line close to them, and thus inducing low variability. Table 6 presents the estimated sizes of the different tests, now in a setting where there are no high leverage data points. We note that the size distortions of the tests are smaller than those in Table 3. In particular, the test that employs the HC0 estimator in the denominator of the test statistic now proves to be more reliable. It is also important to note that: (1) the test based on the HC3 estimator again has finite-sample behavior similar to that of the test based on the weighted bootstrap estimator; (2) the HC4 estimator once again yields associated tests that are more reliable than those based on the HC3 estimator; (3) overall, the double bootstrap test is the most reliable test; only in the extreme case where the maximum variance is nearly 100 times greater than the smallest one does it become noticeably (yet not considerably) oversized.
Overall, the results from this second numerical experiment, relative to the results from the previous exercise, show that high leverage points in the design matrix tend to introduce size distortions in quasi-t tests, these tests becoming considerably liberal. This effect can lead, for instance, investigators to spuriously conclude that some independent variables are significant at the usual nominal levels. The results from the two experiments together favor inference based on the HC4 estimator and also inference based on the double bootstrap test when it comes to performing quasi-t tests, as is commonly done in regression models with heteroskedasticity of unknown form.
Next, we examine the effect of non-normal errors on the finite-sample null behavior of quasi-t tests based on different variance estimators. We consider two error distributions, namely: $t_3$ and exponential with unit mean; the former has fat tails and the latter

Table 7
Estimated null rejection rates of quasi-t tests, α = 5%, leverage points and non-normal errors

n    a     λ      OLS    HC0    HC3    HC4    wboot  single boot  double boot

t3 distributed errors
50   0.00  1.00   6.00   8.24   4.88   3.18   4.96   5.86         5.28
50   0.12  95.27  52.70  15.88  8.12   4.24   8.28   7.00         6.00
100  0.00  1.00   5.14   5.86   4.36   3.60   4.44   5.44         4.56
100  0.12  95.27  48.62  9.20   6.30   4.24   6.34   6.24         5.58
150  0.00  1.00   6.16   6.18   5.12   4.32   5.20   6.18         5.20
150  0.12  95.27  50.22  8.22   6.02   4.28   6.14   7.10         6.70

Exponentially distributed errors
50   0.00  1.00   4.92   11.32  8.20   6.84   8.38   5.58         3.20
50   0.12  95.27  56.88  23.58  16.72  9.92   16.64  14.98        13.44
100  0.00  1.00   5.52   8.36   7.06   6.22   7.14   6.36         4.46
100  0.12  95.27  52.94  15.18  12.22  10.08  12.32  12.06        11.32
150  0.00  1.00   4.54   7.26   6.48   5.78   6.42   6.30         4.80
150  0.12  95.27  51.24  12.98  11.16  9.72   11.26  11.20        10.90

is highly asymmetric. The null rejection rates for the different quasi-t tests under these error distributions are presented in Table 7. The results are for homoskedasticity and strong heteroskedasticity, and should be contrasted with those displayed in Table 3. Two interesting conclusions emerge from these results. First, the finite-sample behavior of the tests under $t_3$ errors is, overall, slightly better than under normal errors. Second, when the errors are highly asymmetric the tests display much larger size distortions when heteroskedasticity is strong. For example, when $n = 50$ and $\lambda = 95.27$, the null rejection rates for the quasi-t tests based on the HC0, HC3 and HC4 estimators at the 5% nominal level are, respectively, 23.58%, 16.72% and 9.92%. The double bootstrap test yields a null rejection rate equal to 13.44%. That is, the empirical size of the HC0-based test is nearly five times larger than the asymptotic level of the test, the HC3-based test rejects the null hypothesis over three times as often as one would expect, the size distortion of the HC4-based test equals 4.92%, and the size distortion of the double bootstrap test equals 8.44%. The test based on the HC4 estimator is, overall, the one with the smallest size distortions. But it is important to bear in mind that inference under asymmetric errors and strong heteroskedasticity can be imprecise even with moderately large sample sizes.

6. An empirical application

The variable of interest ($y$) is per capita spending on public schools and the independent variables, $x$ and $x^2$, are per capita income by state in 1979 in the United States and its square; income is scaled by $10^{-4}$. We have dropped Wisconsin from the data set since it had missing data, and included Washington, DC. The data are presented in Greene (1997, Table 12.1, p. 541) and their original source is the US Department of

Table 8
Quasi-t inference, p-values

             With Alaska, n = 50    Without Alaska, n = 49
Test         p-value                p-value
OLS          0.0022                 0.6495
HC0          0.0559                 0.6162
HC3          0.4264                 0.7758
HC4          0.7725                 0.8923
wboot        0.4492                 0.7780
single boot  0.6680                 0.8250
double boot  0.7250                 0.8600

Note: The null hypothesis under test is H0: β3 = 0.

Commerce. The regression model is thus
$$y_i = \beta_1 + \beta_2 x_i + \beta_3 x_i^2 + u_i, \quad i = 1, \ldots, 50.$$
We obtain the following ordinary least-squares estimates for the linear parameters: $\hat\beta_1 = 832.914$, $\hat\beta_2 = -1834.203$ and $\hat\beta_3 = 1587.042$. The Breusch–Pagan–Godfrey test for the null hypothesis of homoskedasticity rejects this hypothesis at the nominal level of 1%, thus indicating that there is heteroskedasticity in the data.
The interest lies in testing $H_0: \beta_3 = 0$ vs. $H_1: \beta_3 \neq 0$, i.e., we wish to test a linear specification against a quadratic one. Table 8 presents the different p-values for this test. Again, we used $B = 999$ and $C = 249$ in the bootstrapping schemes.
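As an illustration, the HC4-based entry of Table 8 could be obtained along the following lines, assuming NumPy arrays `income` and `spending` hold the 50 observations from Greene (1997, Table 12.1), which are not reproduced here, and reusing the `hc4_quasi_t_test` sketch from Section 5 (both the array names and the reuse are our assumptions):

```python
import numpy as np

# income and spending: arrays with the 50 observations (not reproduced here)
X = np.column_stack([np.ones_like(income), income, income**2])
tau, p_value = hc4_quasi_t_test(X, spending, j=2)  # j = 2 indexes the x^2 coefficient
```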
We see (Table 8) that the test based on the least-squares variance estimator rejects the null hypothesis at the 1% nominal level. The same is true for the HC0-based quasi-t test, which rejects the null hypothesis at $\alpha = 10\%$. It turns out that a scatterplot shows a satisfactorily linear scatter except for a single high-leverage point, corresponding to Alaska, with $h_i = 0.651$, when $3p/n = 0.180$. The data, along with the estimated specifications, are presented in Fig. 1. If we remove Alaska from the data and perform the inference again, then no test rejects the null hypothesis at the usual significance levels (Table 8). One atypical observation (Alaska) is driving the rejection of the null hypothesis of a linear specification when inference is performed using the HC0 standard error estimate to construct the test statistic. The quasi-t test based on the HC4 estimator has the largest p-value of all tests, regardless of whether Alaska is in the data. It is thus the test with the smallest amount of evidence against $H_0$. The p-values of the HC4-based tests are, in both situations, close to those of the double bootstrap test. Note also that the p-values of the double bootstrap tests are, in both cases, larger than the corresponding p-values of the single bootstrap tests: the second-level bootstrap correction seems to be working in the correct direction, i.e., reducing the amount of evidence against the null hypothesis of a linear relationship between per capita spending on public schools and per capita income.
In order to examine the impact that the observation corresponding to the state of Alaska (observation 2) has on the resulting inference, we have estimated the model 50 times,

[Fig. 1 appears here: a scatterplot of per capita spending on public schools (vertical axis) against per capita income (horizontal axis), with Alaska marked as the outlying high-leverage point.]

Fig. 1. Per capita spending on public schools and per capita income.

each time leaving one observation out. The resulting parameter estimates are presented in Table 9. The large impact that this observation (observation 2) has on the estimates is evident. When it is not in the sample, the estimate of $\beta_3$ becomes negative ($-314.139$). In the other cases, the estimates range from 1526.776 to 2113.170, averaging 1603.681. This reveals that the relationship between $y$ and $x$ is linear, and that the rejection of the null hypothesis that $\beta_3$ equals zero by the test that uses the HC0 estimator is being driven by a single observation. The inferences derived from the other consistent tests, on the other hand, are not dominated by a single observation and point to a linear relationship between the two variables.
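Table 9 can be reproduced without running 50 separate regressions by using the leave-one-out identity from Section 3; a sketch (Python/NumPy, with names of our own choosing):

```python
import numpy as np

def leave_one_out_estimates(X, y):
    """Row i holds beta_hat^(i) = beta_hat - (X'X)^{-1} X_i' u_i / (1 - h_i)."""
    A = np.linalg.inv(X.T @ X)
    beta = A @ (X.T @ y)
    u = y - X @ beta
    h = np.einsum("ij,jk,ik->i", X, A, X)
    return beta - (A @ X.T * (u / (1.0 - h))).T   # n x p matrix of deleted estimates
```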

7. Discussion and concluding remarks

It is common for regression models of cross-sectional data to display some form of heteroskedasticity. It is common practice to still report ordinary least-squares estimates of the linear parameters, but to perform inference using heteroskedasticity-robust standard errors. The usual covariance matrix estimator, $\hat\sigma^2(X'X)^{-1}$, is no longer consistent, and should not be used when heteroskedasticity is suspected. A commonly used heteroskedasticity-consistent covariance matrix estimator was proposed by Halbert White in 1980. Our simulation results, however, show that this estimator can deliver liberal associated quasi-t tests, especially when the design matrix contains observations with high leverage. The HC3 estimator (see Davidson and MacKinnon, 1993; Long and Ervin, 2000) has been mentioned in the literature as a reliable alternative to the White estimator when it comes to performing inference via quasi-t tests. Long and Ervin (2000) even claim that this estimator should be preferred when $n \leq 250$, and encourage developers of statistical software to implement the HC3 estimator in their software.

Table 9
Leave-one-out estimates

Obs.   β̂1        β̂2          β̂3
1      870.356   -1920.294   1635.783
2      -209.034  1000.534    -314.139
3      831.140   -1829.240   1583.799
4      873.876   -1929.117   1641.177
5      808.219   -1782.417   1564.448
6      881.051   -1956.335   1660.508
7      856.050   -1879.980   1604.295
8      829.843   -1827.056   1583.229
9      1094.365  -2580.285   2113.170
10     815.792   -1787.849   1557.467
11     850.940   -1869.162   1603.390
12     828.203   -1822.768   1580.537
13     835.901   -1838.716   1588.358
14     830.361   -1828.527   1584.248
15     823.128   -1808.506   1571.053
16     871.455   -1933.316   1647.514
17     814.980   -1788.700   1559.671
18     859.074   -1888.543   1614.704
19     832.348   -1833.046   1586.464
20     813.662   -1790.499   1562.605
21     847.917   -1870.989   1608.263
22     861.173   -1905.438   1629.582
23     864.225   -1909.478   1629.284
24     902.230   -2012.966   1696.420
25     954.456   -2124.937   1757.911
26     806.920   -1760.717   1538.670
27     844.085   -1878.860   1622.080
28     811.747   -1775.305   1548.696
29     809.698   -1798.300   1584.117
30     813.273   -1777.132   1548.792
31     822.243   -1811.343   1576.591
32     781.409   -1721.918   1526.776
33     863.572   -1909.745   1630.905
34     822.396   -1812.254   1575.810
35     814.681   -1784.563   1555.235
36     802.973   -1756.683   1539.429
37     832.957   -1833.495   1586.222
38     850.278   -1879.024   1614.490
39     862.737   -1912.144   1635.334
40     828.091   -1821.197   1578.770
41     822.521   -1810.289   1573.495
42     832.894   -1834.176   1587.038
43     863.008   -1900.141   1622.620
44     805.052   -1761.049   1541.529
45     784.346   -1733.740   1536.150
46     808.079   -1780.752   1558.741
47     832.352   -1832.710   1586.105
48     832.173   -1832.423   1586.046
49     825.478   -1817.755   1578.082
50     836.025   -1837.164   1584.778

Our results show that the finite-sample behavior of this estimator is similar to that of the weighted bootstrap estimator. They both yield liberal associated tests when some of the covariate points have high leverage. We have proposed a modified form of the HC3 estimator, and have called the resulting estimator HC4. The numerical results show that this new estimator has finite-sample behavior superior to that of the HC3 estimator when it comes to inference, and the quasi-t tests that employ the new estimator slightly outperform double bootstrap tests. The test based on the HC4 estimator, moreover, is much less demanding computationally than the bootstrap test that uses a double, nested bootstrapping scheme.
Long and Ervin (2000, p. 218) write: "researchers and software vendors are unaware of, or unconvinced by, the limited evidence regarding the small sample performance of HC0." They add: "software vendors need to make simple changes in their software that could result in substantial improvements in the application of the linear regression model." We agree with their remarks. They favor the HC3 estimator over alternative forms of the HC0 estimator. Our results show, however, that even the HC3 estimator can be ill-behaved when the data contain possibly influential observations. The HC4 estimator we propose, on the other hand, behaves well under both unbalanced and balanced regression designs. Indeed, the numerical results have shown that inference based on HC4 is even more reliable than that performed via a double bootstrap test, where the second level of bootstrapping is performed to adjust the bootstrap p-value from the outer bootstrap. The HC4 estimator is suited for inference via quasi-t tests, but not for point estimation, since it is typically quite biased in finite samples. When used to construct quasi-t test statistics, however, it delivers reliable inference.
There are several issues to be addressed in future research. For instance, a referee suggested, as an alternative to asymptotic inference and the bootstrap, the use of permutation tests, as in, e.g., O'Gorman (2001). Future research should also focus on performing further Monte Carlo simulations to evaluate the inference approaches considered in the present paper under a wider range of data generating processes.

Acknowledgements

I wish to thank James MacKinnon, Spyros Zarkos and two anonymous referees for comments and suggestions. I also gratefully acknowledge partial financial support from CNPq.

References

Chesher, A., Jewitt, I., 1987. The bias of a heteroskedasticity consistent covariance matrix estimator. Econometrica 55, 1217–1222.
Cribari-Neto, F., 1999. C for econometricians. Comput. Econom. 14, 135–149.
Cribari-Neto, F., Zarkos, S.G., 1999. Bootstrap methods for heteroskedastic regression models: evidence on estimation and testing. Econom. Rev. 18, 211–228.
Cribari-Neto, F., Zarkos, S.G., 2001. Heteroskedasticity-consistent covariance matrix estimation: White's estimator and the bootstrap. J. Statist. Comput. Simulation 68, 391–411.
Cribari-Neto, F., Ferrari, S.L.P., Cordeiro, G.M., 2000. Improved heteroscedasticity-consistent covariance matrix estimators. Biometrika 87, 907–918.
Davidson, R., MacKinnon, J.G., 1993. Estimation and Inference in Econometrics. Oxford University Press, New York.
Davison, A.C., Hinkley, D.V., 1997. Bootstrap Methods and their Application. Cambridge University Press, New York.
Efron, B., 1979. Bootstrap methods: another look at the jackknife. Ann. Statist. 7, 1–26.
Efron, B., Tibshirani, R.J., 1993. An Introduction to the Bootstrap. Chapman & Hall, New York.
Eicker, F., 1963. Asymptotic normality and consistency of the least squares estimators for families of linear regressions. Ann. Math. Statist. 34, 447–456.
Greene, W.H., 1997. Econometric Analysis, 3rd Edition. Prentice-Hall, Upper Saddle River, NJ.
Hoaglin, D.C., Welsch, R.E., 1978. The hat matrix in regression and ANOVA. Amer. Statist. 32, 17–22.
Judge, G.G., Hill, R.C., Griffiths, W.E., Lütkepohl, H., Lee, T.-C., 1988. Introduction to the Theory and Practice of Econometrics, 2nd Edition. Wiley, New York.
Long, J.S., Ervin, L.H., 2000. Using heteroskedasticity-consistent standard errors in the linear regression model. Amer. Statist. 54, 217–224.
MacKinnon, J.G., 1999. The Linux operating system: Debian GNU/Linux. J. Appl. Econom. 14, 443–452.
MacKinnon, J.G., White, H., 1985. Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. J. Econom. 29, 305–325.
O'Gorman, T.W., 2001. An adaptive permutation test procedure for several common tests of significance. Comput. Statist. Data Anal. 35, 335–350.
Stallman, R.M., 1999. Using and Porting the GNU Compiler Collection. The Free Software Foundation, Boston.
White, H., 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817–838.
Wooldridge, J.M., 2000. Introductory Econometrics: A Modern Approach. South-Western College Publishing, Cincinnati.
Wu, C.F.J., 1986. Jackknife, bootstrap and other resampling methods in regression analysis. Ann. Statist. 14, 1261–1295.
