LECTURE6_GLS
HETEROSKEDASTICITY: GLS (WLS) ESTIMATORS, WHITE CORRECTION

Maria Elena Bontempi mariaelena.bontempi@unibo.it


Roberto Golinelli roberto.golinelli@unibo.it
07/11/2012
Preliminary; comments welcome
1. Introduction
The main assumption about the classical regression model errors is that they are identically and
independently distributed with mean equal to zero, in symbols: $\varepsilon \sim iid(0, \sigma^2)$.
The $E(\varepsilon) = 0$ assumption is perfectly represented by the OLS residuals, which always sum to zero by
definition, provided that the model specification includes the intercept.
The assumption of independently distributed errors (errors belonging to different observations are
not related to each other) is not easily checked in cross-sections, given that there is no obvious
way in which cross-section observations should be ordered (listed). In this context, an appropriate
sampling design (random sampling) may prevent the problem from arising. On the other hand,
assessing whether errors are independently distributed is crucial in time series.
The assumption of identically distributed errors is usually not valid in cross-section data, which are
characterised by substantial variability. Error heteroskedasticity is the most common problem: the
error variance often appears not to be constant across observations.
If the iid assumption is valid we have that:

$E(\varepsilon_i) = 0, \quad i = 1,\dots,N$   (on average the regression line is correct)

$E(\varepsilon_i^2 \mid X) = Var(\varepsilon_i \mid X) = \sigma^2, \quad i = 1,\dots,N$   (homoskedasticity: identically distributed errors)

$E(\varepsilon_i \varepsilon_j \mid X) = Cov(\varepsilon_i, \varepsilon_j \mid X) = 0, \quad i \neq j$   (no cross-correlation: independently distributed errors)

In compact form, $\varepsilon \sim iid(0, \sigma^2 I_N)$,
where $\sigma^2 I_N$ is the VCOV matrix of the errors, equal to

$E(\varepsilon\varepsilon') = \sigma^2 I_N = \sigma^2 \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{pmatrix} = \begin{pmatrix} \sigma^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma^2 \end{pmatrix}$
In other terms, the VCOV matrix is a scalar matrix, i.e. a diagonal matrix whose diagonal elements
are all equal.
We can compute the variance of the estimator by (exogeneity assumed):

Var ( | X ) = Var ( X X ) 1 X + | X = Var ( X X ) 1 X | X = ( X X ) 1 X Var ( | X ) X ( X X ) 1 =


N

= 2 ( X X ) 1 = 2 X i X i '
i =1

where Xi is the (K1) vector of explanatory variables for observation i.


In cross-sections (and panel data) the homoskedasticity assumption is rarely satisfied.
For example, in cross-sectional data it is hard to suppose that the variability of consumption around
its mean is constant regardless of the income level. Instead, rich people may have more
variegated interests, tastes, and consumption opportunities: this makes the consumption variance
higher at high income levels.
Non-spherical errors can be characterised by heteroskedasticity, i.e. the error variance is not
constant over different observations:

$Var(\varepsilon_i \mid X) = \sigma_i^2 = \sigma^2 \omega_i^h$.
In matrix notation, we can write:

$Var(\varepsilon \mid X) = E(\varepsilon\varepsilon' \mid X) = Diag(\sigma_i^2) = \begin{pmatrix} \sigma_1^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_N^2 \end{pmatrix} = \sigma^2\,Diag(\omega_i^h) = \sigma^2 \begin{pmatrix} \omega_1^h & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \omega_N^h \end{pmatrix} = \sigma^2\Omega,$

where $\Omega$ is a positive definite matrix, not necessarily scalar. Hence, it may be necessary to estimate
N additional parameters (the parameters along the main diagonal).
In the presence of heteroskedasticity OLS is unbiased (unbiasedness is based on linearity and
exogeneity) but not efficient:

$Var(\hat\beta \mid X) = E\big[(\hat\beta - \beta)(\hat\beta - \beta)'\big] = (X'X)^{-1}X'\,Var(\varepsilon \mid X)\,X(X'X)^{-1} = (X'X)^{-1}X'\sigma^2\Omega X(X'X)^{-1} = (X'X)^{-1}X'\,Diag(\sigma_i^2)\,X(X'X)^{-1} \neq \sigma^2(X'X)^{-1}$

In particular, the variance of $\hat\beta$ differs from $\sigma^2(X'X)^{-1}$ (the homoskedastic case) by the matrix
$\sigma^2(X'X)^{-1}X'(\Omega - I_N)X(X'X)^{-1}$.
Moreover, the MSE, s², is a biased estimator of σ²:

$E(s^2) = E\Big(\frac{\hat\varepsilon'\hat\varepsilon}{N-K}\Big) = E\Big(\frac{\varepsilon' M \varepsilon}{N-K}\Big) = \frac{1}{N-K}\,tr\big[E(M\varepsilon\varepsilon')\big] = \frac{1}{N-K}\,tr\big[M\,E(\varepsilon\varepsilon')\big] = \frac{\sigma^2}{N-K}\,tr(M\Omega) \neq \frac{\sigma^2}{N-K}\,tr(M) = \sigma^2,$

where $M = I - P_X = I - X(X'X)^{-1}X'$ is the matrix projecting Y onto the space orthogonal to the one
spanned by the columns of X:

$\hat\varepsilon = Y - \hat Y = Y - X\hat\beta = Y - X(X'X)^{-1}X'Y = (I - P_X)Y = MY$

The matrix M is symmetric (M′ = M), idempotent (MM = M), with rank(M) = tr(M) = N−K.
Hence, the variance of OLS is biased because the weighting matrix is no longer $(X'X)^{-1}$ and because
s² is a biased estimator of σ².
As a consequence, inference (t and F tests) is not correct: the test statistics do not have the standard
distributions, and standard confidence regions are no longer valid.

Consider the following example.


. use GLS_data, clear
. descr

Contains data
  obs:           100
 vars:             3                          16 Nov 2004 18:25
 size:         1,300 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
obs             byte   %8.0g                  families
cons1000        float  %9.0g                  consumption in 2003 at constant prices
redd1000        float  %9.0g                  income in 2002 at constant prices
-------------------------------------------------------------------------------

The idea of explaining consumption with the previous year's income predetermines the
dynamic relationship in a quite restrictive way, but with the advantage of avoiding consumption-income simultaneity and endogeneity problems.
The scatterplot tells us that the consumption variability grows with the level of income: richer people
behave in more diverse ways. This fact per se implies the likely heteroskedasticity of the linear model
residuals.
. graph7 cons1000 redd1000, ylabel xlabel

[Scatterplot of CONS1000 (vertical axis, 0 to 100) against REDD1000 (horizontal axis, 0 to 150)]
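The graph7 syntax dates from Stata 7; on recent Stata versions an equivalent scatterplot can be drawn as follows (a minimal sketch, same dataset assumed in memory):

. twoway scatter cons1000 redd1000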

Keynes's (linear) consumption function:


. reg cons1000 redd1000

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) = 1036.50
       Model |  46059.3208     1  46059.3208           Prob > F      =  0.0000
    Residual |  4354.87802    98  44.4375308           R-squared     =  0.9136
-------------+------------------------------           Adj R-squared =  0.9127
       Total |  50414.1988    99  509.234332           Root MSE      =  6.6661

------------------------------------------------------------------------------
    cons1000 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    redd1000 |   .7002875   .0217517    32.19   0.000     .6571221     .743453
       _cons |   5.668498   1.331578     4.26   0.000     3.026025    8.310971
------------------------------------------------------------------------------

2. Heteroskedasticity tests
Graphical analysis represents a first step towards discovering whether heteroskedasticity is present.
We are supposing that the error variance is a function of income:
. version 7: rvfplot, oneway twoway box ylabel xlabel yline(0)

[Residual-versus-fitted plot: Residuals (vertical axis, -20 to 20) against Fitted values (horizontal axis, 0 to 100), with one-way and box marginal plots]
Heteroskedasticity tests verify the hypothesis

H0: $Var(\varepsilon_i) = \sigma^2$, i = 1, ..., N.

In general, the tests use auxiliary regressions of the form

$\hat\varepsilon_i^2 = f(Z_i'\gamma) + u_i$,

where $u_i \sim iid(0, \sigma_u^2)$, and the $Z_i$ are V×1 vectors, with V the number of variables in Z (and of
associated parameters γ) used to explain the error variance; for this reason the $Z_i$ are called the
variance-indicator variables.
The null hypothesis to be tested becomes H0: γ = 0.
What about the alternative hypothesis, H1? Non-constant variance implies that specific variance
behaviours must be assumed.
Under the alternative, the form of the detected heteroskedasticity depends on the choice of the
explanatory indicators $Z_i$. The test is conditional on a set of variables which are presumed to
influence the error variance: fitted values, explanatory variables, or any other variable presumed to
influence the error variance (for example, in the financial time-series setting, Engle (1982) proposes
an ARCH test, for autoregressive conditional heteroskedasticity:
$\sigma_t^2 = \gamma_1\varepsilon_{t-1}^2 + \gamma_2\varepsilon_{t-2}^2 + \dots + u_t$).
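As an aside, on tsset time-series data recent Stata versions provide Engle's ARCH LM test directly after regress; a minimal sketch, with a hypothetical time variable t and hypothetical series y and x:

. tsset t
. qui reg y x
. estat archlm, lags(1 2)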

The statistic is computed as either the F (small samples) or the LM (large samples) statistic for the
overall significance of the independent variables in explaining $\hat\varepsilon_i^2$.
The F statistic is

$F = \frac{R_a^2 / V}{(1 - R_a^2)/(N - V - 1)}$, where $R_a^2$ is the R-squared of the auxiliary regression.

The LM statistic is just the sample size times the R-squared of the auxiliary regression; under the
null, it is distributed asymptotically as $\chi^2_V$.
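Both versions can be read off a single auxiliary regression. A sketch, where e2 holds the squared OLS residuals and z1, z2 are hypothetical variance indicators (so V = 2 here):

. qui reg e2 z1 z2
. di "LM = " e(N)*e(r2) "   p-value = " chi2tail(e(df_m), e(N)*e(r2))
. di "F  = " e(F) "   p-value = " Ftail(e(df_m), e(df_r), e(F))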

A first form of the test is the Breusch-Pagan (1979) test (Breusch-Pagan (1979), Godfrey (1978), and
Cook-Weisberg (1983) separately derived the same test statistic).
It is a Lagrange multiplier test for heteroskedasticity in the error distribution.
It is the most general test, even if it is not powerful, and it is sensitive to the assumption of normally
distributed errors (this is the assumption of the original formulation; see below for a change in
this assumption).
The Breusch and Pagan test statistic is distributed as a chi-squared with V degrees of freedom. It is
obtained by the following steps:
1) run the model regression and define the dependent variable of the Breusch-Pagan auxiliary
regression: $g_i = \dfrac{\hat\varepsilon_i^2}{\frac{1}{N}\sum_{i=1}^N \hat\varepsilon_i^2}$;¹
2) run the auxiliary regression $g_i = \gamma_0 + Z_i'\gamma + u_i$ and obtain the BP statistic as half the model
(explained) sum of squares of this auxiliary regression, BP = MSS/2 (with V = 2 indicator variables,
as in the example below, this coincides with MSS divided by the model degrees of freedom).
This test can verify whether heteroskedasticity is conditional on any list of $Z_i$ variables which are
presumed to influence the error variance (i.e. variance indicators); they can be the fitted values, the
explanatory variables of the model, or any variables you think can affect the error variance.
The trade-off in the choice of indicator variables in these tests is that a smaller set of
indicator variables will preserve degrees of freedom, at the cost of being unable to detect
heteroskedasticity in certain directions.

¹ For this, Breusch and Pagan (1979, p. 1293) say: "the quantity $g_i$ is of some importance in tests of heteroskedasticity.
Thus, if one is going to plot any quantity, it would seem more reasonable to plot $g_i$ than $\hat\varepsilon_i^2$". By dividing by the
mean, the residuals are normalised: under the null there are no noise terms that can affect the chi-squared distribution; it is
possible to use any variable you think is useful in explaining heteroskedasticity.

A second form of the heteroskedasticity test is the very often reported White (1980) test for
heteroskedasticity.
It is based on a different auxiliary regression, where the squared residuals are regressed on the
model regressors, all their squares, and all their possible (non-redundant) cross products.
The asymptotic chi-squared White test statistic is obtained as the number of
observations times the R-squared of the auxiliary regression.
The F version for small samples is obtained by testing that all the explanatory variables of the
auxiliary regression are jointly zero (i.e. by looking at the F test for the overall significance of the
auxiliary regression).
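With two regressors, the White auxiliary regression can be built by hand; a sketch with hypothetical variables y, X1 and X2:

. qui reg y X1 X2
. predict e, resid
. g e2 = e^2
. g X1sq = X1^2
. g X2sq = X2^2
. g X1X2 = X1*X2
. reg e2 X1 X2 X1sq X2sq X1X2
. di "White LM = " e(N)*e(r2)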

We have several commands to execute these heteroskedasticity tests.


Suppose the model $y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$.
The different possibilities for the heteroskedasticity tests are summarised in the following table.

Variance indicators      Breusch-Pagan                                White
----------------------   ------------------------------------------   -----------------------------------
Fitted values            hettest
X1 X2                    hettest, rhs
                         bpagan X1 X2
                         ivhettest, all (ivlev) (output Breusch-
                         Pagan/Godfrey/Cook-Weisberg)
X1 X2 X1² X2² X1X2       hettest X1 X2 X1² X2² X1X2                   hettest X1 X2 X1² X2² X1X2, iid
                         bpagan X1 X2 X1² X2² X1X2                    whitetst
                         ivhettest, all ivcp (output Breusch-         ivhettest, ivcp (output White/
                         Pagan/Godfrey/Cook-Weisberg)                 Koenker nR2 test statistic)

NOTE: the command hettest is not appropriate after regress, nocons.

For example, if we suppose that in our simple consumption model the levels of income and their
squares are both valid variance indicators, we can test for heteroskedasticity in the following way:

. g redd2=redd1000^2
. hettest redd1000 redd2

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: redd1000 redd2

         chi2(2)      =    25.00
         Prob > chi2  =   0.0000

The same result can be obtained by applying a procedure, written by C. F. Baum and V. Wiggins,
that specifically runs the Breusch-Pagan (1979) test for heteroskedasticity conditional on a set of
variables.

. bpagan redd1000 redd2

Breusch-Pagan LM statistic: 25.0018   Chi-sq( 2)   P-value = 3.7e-06

In general, the Breusch and Pagan test-statistic is distributed as a chi-squared with V degrees of
freedom (in the latter example V=2). The statistic above may be replicated with the following steps.
1) Compute the dependent variable of the Breusch-Pagan auxiliary regression:
. reg cons1000 redd1000

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) = 1036.50
       Model |  46059.3208     1  46059.3208           Prob > F      =  0.0000
    Residual |  4354.87802    98  44.4375308           R-squared     =  0.9136
-------------+------------------------------           Adj R-squared =  0.9127
       Total |  50414.1988    99  509.234332           Root MSE      =  6.6661

------------------------------------------------------------------------------
    cons1000 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    redd1000 |   .7002875   .0217517    32.19   0.000     .6571221     .743453
       _cons |   5.668498   1.331578     4.26   0.000     3.026025    8.310971
------------------------------------------------------------------------------

. predict res, resid


. g BP_g= res^2/(e(rss)/e(N))

where e(rss)=4354.87802 and e(N)=100 are post-estimation results corresponding, respectively,
to the residual sum of squares and to the total number of observations.
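Since e(rss)/e(N) is simply the mean of the squared residuals, an equivalent construction (a sketch, using the res variable just created) avoids the post-estimation results altogether:

. g res2_tmp = res^2
. qui su res2_tmp
. g BP_g2 = res2_tmp/r(mean)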
2) Run the Breusch-Pagan auxiliary regression and compute the test statistic and/or its P-value:

. reg BP_g redd1000 redd2

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  2,    97) =   12.90
       Model |  50.0036064     2  25.0018032           Prob > F      =  0.0000
    Residual |  188.030708    97  1.93846091           R-squared     =  0.2101
-------------+------------------------------           Adj R-squared =  0.1938
       Total |  238.034314    99  2.40438701           Root MSE      =  1.3923

------------------------------------------------------------------------------
        BP_g |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    redd1000 |  -.0092459   .0167538    -0.55   0.582    -.0424975    .0240058
       redd2 |   .0002848   .0001499     1.90   0.060    -.0000127    .0005823
       _cons |   .4226007   .4038909     1.05   0.298    -.3790109    1.224212
------------------------------------------------------------------------------

. di e(mss)/e(df_m)
25.001803

where e(mss)=50.0036064 and e(df_m)=2 are post-estimation results corresponding, respectively,
to the model sum of squares and to the model degrees of freedom of the auxiliary regression (here
V = e(df_m) = 2, so MSS/df_m coincides with BP = MSS/2).
The P-value of the test is obtained as:

. display chi2tail(2,e(mss)/e(df_m))
3.723e-06

The White test can be performed in several ways; the easiest is to run a procedure, also written
by Baum and Cox, that automatically computes the asymptotic version of the White test.

. qui reg cons1000 redd1000
. whitetst

White's general test statistic :  21.00689   Chi-sq( 2)   P-value =  2.7e-05

This result may be replicated with the following steps.


1) Compute the dependent variable of the White auxiliary regression:
. g res2=res^2

2) Run the White auxiliary regression (remember that we have only one explanatory variable):
. reg res2 redd1000 redd2

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  2,    97) =   12.90
       Model |  94831.6529     2  47415.8264           Prob > F      =  0.0000
    Residual |  356599.534    97  3676.28386           R-squared     =  0.2101
-------------+------------------------------           Adj R-squared =  0.1938
       Total |  451431.187    99  4559.91098           Root MSE      =  60.632

------------------------------------------------------------------------------
        res2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    redd1000 |  -.4026459   .7296083    -0.55   0.582    -1.850716    1.045424
       redd2 |   .0124035   .0065273     1.90   0.060    -.0005513    .0253583
       _cons |   18.40375   17.58896     1.05   0.298    -16.50546    53.31296
------------------------------------------------------------------------------

The White test statistic and its P-value in the asymptotic version of the test:
the LM test statistic for heteroskedasticity is just the sample size N times the R-squared of the
auxiliary regression:

. di e(N)*e(r2)
21.00689

where e(N)=100 and e(r2)=0.2101 are post-estimation results corresponding, respectively, to the
total number of observations and to the R-squared of the auxiliary regression. The P-value of the
test is obtained as:

. display chi2tail(2,e(N)*e(r2))
.00002744
The F version of the White test for small samples:²

. testparm redd1000 redd2

 ( 1)  redd1000 = 0.0
 ( 2)  redd2 = 0.0

       F(  2,    97) =   12.90
            Prob > F =    0.0000

² This command can be used also in the Breusch-Pagan auxiliary regression; of course the results of the two tests
coincide.

Note that:

. qui reg cons1000 redd1000
. hettest redd1000 redd2, iid

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: redd1000 redd2

         chi2(2)      =    21.01
         Prob > chi2  =   0.0000

The Breusch-Pagan (1979) test from the hettest command is numerically equal to the White
(1980) test for heteroskedasticity if the same White auxiliary regression is specified and the
option iid is used. Differently from the default of hettest and from bpagan, which compute the
original Breusch-Pagan test assuming that the regression disturbances are normally distributed, the
option iid causes hettest to compute the NR² version of the score test, which drops the
normality assumption.³
A useful command that, despite its name, also works after OLS and performs both previous tests is:

. ivhettest, all ivcp

OLS heteroskedasticity test(s) using levels and cross products of all IVs
Ho: Disturbance is homoskedastic
  White/Koenker nR2 test statistic    : 21.007  Chi-sq(2)  P-value = 0.0000
  Breusch-Pagan/Godfrey/Cook-Weisberg : 25.002  Chi-sq(2)  P-value = 0.0000

Note that if you write hettest only, the residual variance is assumed to depend on the fitted values
(i.e. $Z_i \equiv \hat y_i$, and V=1); if you use the option rhs, the residual variance is assumed to depend on
the explanatory variables of the model (in our case of one explanatory variable these two tests
coincide).

. hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: redd1000

         chi2(1)      =    21.50
         Prob > chi2  =   0.0000

. bpagan redd1000

Breusch-Pagan LM statistic: 21.50193   Chi-sq( 1)   P-value = 3.5e-06

. ivhettest, all

OLS heteroskedasticity test(s) using levels of IVs only
Ho: Disturbance is homoskedastic
  White/Koenker nR2 test statistic⁴   : 18.066  Chi-sq(1)  P-value = 0.0000
  Breusch-Pagan/Godfrey/Cook-Weisberg : 21.502  Chi-sq(1)  P-value = 0.0000

³ Koenker (1981) showed that when the assumption of normality is removed, a version of the test is available that can
be calculated as the sample size N times the centered R-squared from an artificial regression of the squared residuals
from the original regression on the indicator variables.
⁴ This test is the Breusch-Pagan without the normality assumption.

3. How to account for heteroskedasticity?

3.1. Heteroskedasticity-consistent estimates of the standard errors
A first way to account for heteroskedasticity is to estimate the model parameters by OLS (if
the Keynesian model is correctly specified, the OLS estimator is unbiased and consistent, even if not
efficient due to heteroskedasticity) and to correct the (biased) OLS estimates of the standard errors.
To do so, consistent standard errors are needed.
The robust option of the regress Stata command specifies that the Eicker (1967)/Huber
(1973)/White (1980) sandwich estimator of the variance is used instead of the traditional OLS error
variance estimator; inference is then heteroskedasticity-robust.
In particular, White (1980) argues that it is not necessary to estimate all the $\sigma_i^2$'s; we simply
need a consistent estimator of the (K×K) matrix

$X'E(\varepsilon\varepsilon')X = X'\sigma^2\Omega X = X'\,Diag(\sigma_i^2)\,X = \sum_{i=1}^{N}\sigma_i^2 X_i X_i'$.

If we define $X_i$ as the (K×1) vector of explanatory variables for observation i, a consistent estimator
can be obtained as

$\hat S = \frac{1}{N}\sum_{i=1}^{N}\hat\varepsilon_i^2 X_i X_i'$,

where $\hat\varepsilon_i$ is the OLS residual and $plim\,\hat S = plim\,\dfrac{X'\sigma^2\Omega X}{N}$.

Thus, the sandwich

$\widehat{Var}(\hat\beta) = (X'X)^{-1}\Big(\sum_{i=1}^{N}\hat\varepsilon_i^2 X_i X_i'\Big)(X'X)^{-1} = \Big(\sum_{i=1}^{N}X_i X_i'\Big)^{-1}\Big(\sum_{i=1}^{N}\hat\varepsilon_i^2 X_i X_i'\Big)\Big(\sum_{i=1}^{N}X_i X_i'\Big)^{-1}$

can be used as an estimate of the true variance of the OLS estimator.


In our case above, after detecting residual heteroskedasticity, and under the assumption that the
other assumptions about our Keynesian model hold, we can obtain consistent standard errors using a
very simple option:

. reg cons1000 redd1000, robust

Regression with robust standard errors                 Number of obs =     100
                                                       F(  1,    98) =  799.78
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9136
                                                       Root MSE      =  6.6661

------------------------------------------------------------------------------
             |               Robust
    cons1000 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    redd1000 |   .7002875   .0247622    28.28   0.000     .6511477    .7494274
       _cons |   5.668498   1.076363     5.27   0.000     3.532492    7.804505
------------------------------------------------------------------------------

NOTE: the parameter estimates (with and without the standard errors correction) are identical: the
White correction does not modify the parameter estimates.
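The sandwich can also be computed by hand with Stata's matrix commands. A minimal sketch: it assumes that regress, robust applies the finite-sample factor N/(N−K) to the center of the sandwich, so the result should match the output above up to rounding:

. qui reg cons1000 redd1000
. scalar nobs = e(N)                            // N = 100
. scalar K = e(df_m) + 1                        // K = 2 (slope plus constant)
. predict double eh, resid
. g double eh2 = eh^2
. matrix accum XX = redd1000                    // X'X (constant added automatically)
. matrix accum XSX = redd1000 [iweight=eh2]     // sum of eh_i^2 * X_i X_i'
. matrix V = (nobs/(nobs-K))*inv(XX)*XSX*inv(XX)
. di sqrt(V[1,1])                               // robust s.e. of redd1000, close to .0247622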

3.2. Feasible generalised least squares (FGLS)

If we have some idea about the heteroskedasticity determinants, we can introduce a different
estimator: FGLS (feasible generalised least squares), the efficient estimator in the context of
heteroskedastic errors (remember: OLS is only consistent but inefficient, because it does not account
for the heteroskedastic behaviour of the errors).
If $Var(\varepsilon_i \mid X) = \sigma_i^2 = \sigma^2\omega_i^h$, with $\omega_i$ an observed variable and h a known constant,
the inverse of $\Omega$ is diagonal with generic element $\omega_i^{-h}$.
Let's define the L matrix, diagonal with generic element $\omega_i^{-h/2}$.
The general principle at the basis of FGLS is the following.
Suppose we know $\Omega$ or we have a consistent estimate $\hat\Omega$ of it.
$\Omega$ is not singular, and it is possible to find an (N×N) matrix L such that

$L\Omega L' = I_N$ and $L'L = \Omega^{-1}$.

The specific form of the L matrix depends on the problem one has to tackle.
But the general principle is that of minimising an appropriately weighted average of squared errors,
with lower weights given to the observations characterised by the higher residual variance.
Pre-multiply the heteroskedastic model $y = X\beta + \varepsilon$ by L and obtain

$y^* = X^*\beta + \varepsilon^*$

where $y^* = Ly$, $X^* = LX$ and $\varepsilon^* = L\varepsilon$.
Now it is true that:

$E(\varepsilon^*) = E(L\varepsilon) = LE(\varepsilon) = 0$
$E(\varepsilon^*\varepsilon^{*\prime}) = E(L\varepsilon\varepsilon'L') = LE(\varepsilon\varepsilon')L' = \sigma^2 L\Omega L' = \sigma^2 I_N$.

Hence, the OLS estimator of the transformed model is best (minimum variance) and corresponds to
the FGLS estimator:

$\hat\beta_{FGLS} = (X^{*\prime}X^*)^{-1}X^{*\prime}y^* = (X'L'LX)^{-1}X'L'Ly = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$.

The FGLS estimator is BLUE despite the presence of heteroskedasticity (and/or autocorrelation); in other
terms, the Aitken Theorem applied to the transformed data substitutes for the Gauss-Markov Theorem,
and, in particular, the Gauss-Markov theorem is a special case of the Aitken theorem for $\Omega = I_N$.
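In Stata, when $\sigma_i^2 = \sigma^2 w_i$ with $w_i$ observable, the FGLS/WLS fit can be sketched in two equivalent ways (a minimal illustration with hypothetical variables y, x and w):

. reg y x [aweight=1/w]        // analytic weights inversely proportional to the variance

or, transforming all the columns of X (including the constant) by $1/\sqrt{w_i}$:

. g sw = 1/sqrt(w)
. g ys = y*sw
. g xs = x*sw
. reg ys xs sw, noconst        // sw plays the role of the transformed constant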


When $\Omega$ is known both in its structure and in its parameters, we are directly in the case of GLS.
For example, in the cases of:
- groupwise heteroskedasticity;
- autocorrelation in the MA(1) form, when we estimate a dynamic panel after taking first
differences to remove the individual effects (note that, given the presence of the lagged dependent
variable among the regressors, we need to use IV+GLS = GIVE, the generalised instrumental variable
estimator).
Weighted least squares (WLS) is a specific case of FGLS, used, for example, in the presence of
groupwise heteroskedasticity, i.e. when we know that heteroskedasticity derives from how the data are
collected: we only have averaged or aggregated data (by clusters, which may be industries,
typologies of companies and so on). In this case $\Omega$ is known in its structure and parameters.
Some examples are in the Appendix.

Usually $\Omega$ is stochastic, known in its structure, but unknown in its parameters.
Thus, we talk of UGLS, Unfeasible GLS. Estimation is possible only after we have a consistent
estimate $\hat\Omega$ of the errors' VCOV matrix; in this case UGLS becomes feasible (FGLS).
The FGLS estimator is consistent and asymptotically efficient (the small sample properties are
unknown).
Some examples:
- constant autocorrelation inside the individual in panel data with random effects;
- cross-correlation in seemingly unrelated regressions (SUR);
- comfac models, i.e. static models with AR(1) errors (this case is very specific and not very
realistic).
Note that in an autoregressive model with autocorrelated errors OLS is biased and not consistent,
and FGLS is not applicable, unless we estimate with instrumental variables (IV) in order to obtain
$\hat\Omega$. This is the generalised IV (GIVE) or heteroskedastic 2SLS (two-stage least squares) estimator:

$\hat\beta_{GIVE} = (Z'L'LX)^{-1}Z'L'Ly = (Z'\Omega^{-1}X)^{-1}Z'\Omega^{-1}y$.

See more in lecture_IV.
An alternative is to augment the dynamics, i.e. to re-specify the model.


Behavioural assumption in the consumption-income relationship: the error variance is a linear
function of income (redd1000), because wealthy people have a larger set of consumption options.
If this is true, then it is reasonable to use such information in the estimation phase, down-weighting
the observations corresponding to higher incomes, because they are less informative about the
regression line. In fact, they are assumed to be more dispersed (higher variance) than those of
poorer people.
Start from the model $C_i = \alpha + \beta R_i + \varepsilon_i$, where $Var(\varepsilon_i) = \sigma_i^2 = \sigma^2 R_i$.

Hence in this case $\sigma^2\Omega = Diag(\sigma_i^2) = \begin{pmatrix} \sigma_1^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_N^2 \end{pmatrix} = \sigma^2 \begin{pmatrix} R_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & R_N \end{pmatrix}$.

If we scale all the variables by the square root of income, we obtain the transformed model:

$\frac{C_i}{\sqrt{R_i}} = \frac{\alpha}{\sqrt{R_i}} + \beta\frac{R_i}{\sqrt{R_i}} + \frac{\varepsilon_i}{\sqrt{R_i}} = \alpha\frac{1}{\sqrt{R_i}} + \beta\sqrt{R_i} + u_i$,

where $Var(u_i) = Var\Big(\frac{\varepsilon_i}{\sqrt{R_i}}\Big) = \frac{1}{R_i}\sigma_i^2 = \frac{1}{R_i}\sigma^2 R_i = \sigma^2$, i.e. the errors $u_i$ are homoskedastic.

Hence, in this case: $L = \begin{pmatrix} 1/\sqrt{\omega_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1/\sqrt{\omega_N} \end{pmatrix} = \begin{pmatrix} 1/\sqrt{R_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1/\sqrt{R_N} \end{pmatrix}$

WLS is efficient precisely because the higher-variance observations (i.e. those corresponding to richer
people) have less weight.⁵

⁵ If the model we suppose able to explain the heteroskedasticity is right, FGLS is more efficient than robust OLS.

. reg cons1000 redd1000 [aweight=1/redd1000]
(sum of wgt is   1.0121e+02)

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) = 3053.19
       Model |  2623.25305     1  2623.25305           Prob > F      =  0.0000
    Residual |  84.2000896    98  .859184587           R-squared     =  0.9689
-------------+------------------------------           Adj R-squared =  0.9686
       Total |  2707.45314    99  27.3480115           Root MSE      =  .92692

------------------------------------------------------------------------------
    cons1000 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    redd1000 |   .7145188   .0129311    55.26   0.000     .6888574    .7401803
       _cons |   4.914329   .0935686    52.52   0.000     4.728645    5.100013
------------------------------------------------------------------------------

aweight stands for analytical weights, which are inversely proportional to the variance of an
observation. These are automatically employed in models that use averages, e.g. in Between-Effects
panel regressions.
FGLS (WLS) can be reproduced by the following steps:
. g peso=1/redd1000^0.5
. g consp=cons1000*peso
. g reddp=redd1000*peso
. reg consp reddp peso, noconst

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  2,    98) = 3364.82
       Model |  5852.17626     2  2926.08813           Prob > F      =  0.0000
    Residual |  85.2218967    98  .869611191           R-squared     =  0.9856
-------------+------------------------------           Adj R-squared =  0.9854
       Total |  5937.39815   100  59.3739815           Root MSE      =  .93253

------------------------------------------------------------------------------
       consp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       reddp |   .7145188   .0129311    55.26   0.000     .6888574    .7401803
        peso |   4.914329   .0935686    52.52   0.000     4.728645    5.100013
------------------------------------------------------------------------------

. whitetst

White's general test statistic :  5.086438   Chi-sq( 5)   P-value =  .4054

. g reddp2=reddp^2
. bpagan reddp reddp2

Breusch-Pagan LM statistic: 1.265137   Chi-sq( 2)   P-value =  .5312

Unlike the previous two heteroskedasticity tests, note that after a regression without the
constant term we cannot run the hettest command:

. hettest
not appropriate after regress, nocons
r(301);


The previous heteroskedasticity tests are run for didactical reasons only, just to see that
heteroskedasticity is no longer present in the weighted regression; of course, heteroskedasticity tests
are not performable after FGLS.
All the issues raised above can be summarised in a single table in order to fix ideas. In doing so,
we use the previous consumption function (which we checked is heteroskedastic), quietly running
three regressions of interest, namely: (1) heteroskedastic OLS without White's standard
errors correction; (2) heteroskedastic OLS with White's standard errors correction; (3) WLS
assuming that the error variance is a linear function of income:
. qui reg cons1000 redd1000
. est store OLS
. qui reg cons1000 redd1000, robust
. est store white
. qui reg cons1000 redd1000 [aweight=1/redd1000]
. est store WLS

. est table OLS white WLS , b(%6.3f) se(%6.3f) t(%6.2f) /*
  */  stats(N df_r df_m r2 r2_a rmse F)

--------------------------------------------
    Variable |    OLS       white       WLS
-------------+------------------------------
    redd1000 |    0.700      0.700      0.715
             |    0.022      0.025      0.013
             |    32.19      28.28      55.26
       _cons |    5.668      5.668      4.914
             |    1.332      1.076      0.094
             |     4.26       5.27      52.52
-------------+------------------------------
           N |      100        100        100
        df_r |   98.000     98.000     98.000
        df_m |    1.000      1.000      1.000
          r2 |    0.914      0.914      0.969
        r2_a |    0.913      0.913      0.969
        rmse |    6.666      6.666      0.927
           F | 1036.496    799.785   3053.189
--------------------------------------------
                               legend: b/se/t

Discussion. In the context of a model with heteroskedastic errors, both the OLS and WLS estimators
are unbiased and consistent; therefore all the estimates are fairly close to each other. The parameters'
standard errors estimated by OLS in the first column are biased (because of heteroskedastic errors),
while those in the second column are robust to heteroskedasticity (hence, reliable). However, WLS
being also efficient, the standard errors reported in the third column are remarkably lower than
those in the second column.


Appendix
A1. Averaged data
$\bar y_c = \bar X_c'\beta + \bar\varepsilon_c$

where c = 1,2,...,C indexes the groups (or clusters). Each group is composed of i = 1,2,...,$N_c$
individuals, which are averaged.
Single individuals have homoskedastic errors, $Var(\varepsilon_i) = \sigma^2$, i = 1,2,...,N, and are not cross-sectionally
correlated, $Cov(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$.
However, the available observations are

$\bar\varepsilon_c = \frac{1}{N_c}(\varepsilon_1 + \dots + \varepsilon_{N_c}) = \frac{1}{N_c}\sum_{i=1}^{N_c}\varepsilon_i$

Hence the variance is:

$Var(\bar\varepsilon_c) = E\Big[\Big(\frac{1}{N_c}\sum_{i=1}^{N_c}\varepsilon_i\Big)^2\Big] = \frac{1}{N_c^2}\,N_c\,\sigma^2 = \frac{\sigma^2}{N_c} = \sigma_c^2$

i.e. the error variance decreases as the number of individuals within a cluster, $N_c$, increases.⁶
$Var(\bar\varepsilon \mid X) = E(\bar\varepsilon\bar\varepsilon' \mid X) = \sigma^2\Omega = Diag(\sigma_c^2) = \sigma^2\,Diag(1/N_c) = \sigma^2 \begin{pmatrix} 1/N_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1/N_C \end{pmatrix}$

FGLS/WLS weight each observation by $\sqrt{N_c}$, giving more weight to the observations with lower
variance $\sigma^2/N_c$ (i.e. to the larger clusters).

In particular, the L matrix is $L = Diag(\sqrt{N_c}) = \begin{pmatrix} \sqrt{N_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sqrt{N_C} \end{pmatrix}$.

If we multiply all the variables by the square root of each group dimension, we obtain the
transformed model $Ly = LX\beta + L\varepsilon$ which, looking at the c-th observation, corresponds to

$\sqrt{N_c}\,\bar y_c = \sqrt{N_c}\,\bar X_c'\beta + \sqrt{N_c}\,\bar\varepsilon_c$,

where $Var(\sqrt{N_c}\,\bar\varepsilon_c) = N_c\,\frac{\sigma^2}{N_c} = \sigma^2$, i.e. the transformed errors are homoskedastic.

⁶ Note that with this kind of data we lose the within-group variation, and hence the estimates of the parameters are less
precise. However, the fit, R², improves because the variations of the errors are averaged out.

The OLS estimator of the transformed model is best (minimum variance) and corresponds to the
FGLS/WLS estimator:

$\hat\beta_{FGLS/WLS} = (X'L'LX)^{-1}X'L'Ly = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$.
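A sketch of the corresponding Stata estimation, assuming a hypothetical dataset of cluster means ybar and xbar with cluster sizes n_c: since $Var(\bar\varepsilon_c) = \sigma^2/N_c$, analytic weights proportional to $N_c$ reproduce the WLS estimator:

. reg ybar xbar [aweight=n_c]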

A2. Aggregated data


$y_c = X_c'\beta + \varepsilon_c$

where c = 1,2,...,C indexes the groups (or clusters), and each group is the sum of i = 1,2,...,$N_c$
individuals.
Our observations are

$\varepsilon_c = \sum_{i=1}^{N_c}\varepsilon_i$.

Hence the variance is:

$Var(\varepsilon_c) = E\Big[\Big(\sum_{i=1}^{N_c}\varepsilon_i\Big)^2\Big] = N_c\,\sigma^2$

i.e. the error variance increases as the number of individuals within a cluster, $N_c$, increases; this is
true even if the covariance among individuals within the cluster is negative.
$Var(\varepsilon \mid X) = E(\varepsilon\varepsilon' \mid X) = \sigma^2\Omega = Diag(\sigma_c^2) = \sigma^2\,Diag(N_c) = \sigma^2 \begin{pmatrix} N_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & N_C \end{pmatrix}$

FGLS/WLS weight each observation by $1/\sqrt{N_c}$, giving less weight to the observations with higher
variance $\sigma_c^2 = N_c\sigma^2$ (i.e. to the larger clusters).

In particular, the L matrix is $L = Diag(1/\sqrt{N_c}) = \begin{pmatrix} 1/\sqrt{N_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1/\sqrt{N_C} \end{pmatrix}$.

If we scale all the variables by the square root of each group dimension, we obtain the transformed
model $Ly = LX\beta + L\varepsilon$ which, looking at the c-th observation, corresponds to

$\frac{1}{\sqrt{N_c}}\,y_c = \frac{1}{\sqrt{N_c}}\,X_c'\beta + \frac{1}{\sqrt{N_c}}\,\varepsilon_c$,

where $Var\Big(\frac{1}{\sqrt{N_c}}\,\varepsilon_c\Big) = \frac{1}{N_c}\,N_c\,\sigma^2 = \sigma^2$, i.e. the transformed errors are homoskedastic.

A3. Some hints on panel data


To conclude this lecture, and to add useful information especially relevant in the panel data context, we
compare three OLS estimators with different corrections of the standard errors, all available in
the regress command.

(1) No correction of the standard errors, the homoskedastic estimator (regress):

$\widehat{Var}(\hat\beta_{OLS}) = (X'X)^{-1}X'\widehat{Var}(\varepsilon)X(X'X)^{-1} = (X'X)^{-1}X's^2 I\,X(X'X)^{-1} = s^2(X'X)^{-1}$,

where $s^2 = \frac{1}{N-K}\sum_{i=1}^{N}\hat\varepsilon_i^2$.

(2) The heteroskedasticity-consistent estimator (regress, robust):

$\widehat{Var}(\hat\beta_{robust}) = (X'X)^{-1}\Big(\sum_{i=1}^{N}(\hat\varepsilon_i X_i)(\hat\varepsilon_i X_i)'\Big)(X'X)^{-1} = (X'X)^{-1}\Big(\sum_{i=1}^{N}\hat\varepsilon_i^2 X_i X_i'\Big)(X'X)^{-1}$,

where the center of the sandwich is sometimes multiplied by N/(N−K) as a degrees-of-freedom
adjustment for finite samples.

(3) The estimator that accounts for clustering into groups, with observations correlated within groups
but independent between groups [regress, cluster(name_groups)]:

$\widehat{Var}(\hat\beta_{cluster}) = (X'X)^{-1}\Big(\sum_{c=1}^{N_C} u_c u_c'\Big)(X'X)^{-1}$,

where we have c = 1, 2, ..., $N_C$ clusters and $u_c = \sum_{i \in c}\hat\varepsilon_i X_i$ is the sum over the observations within each
cluster c; the center of the sandwich is sometimes multiplied by $\frac{N-1}{N-K}\cdot\frac{N_C}{N_C-1}$ as a finite-sample adjustment.
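A sketch of the three options, assuming a hypothetical panel-like dataset with outcome y, regressor x, and group identifier id (on recent Stata versions the clustered variant is written vce(cluster id)):

. reg y x                  // (1) classical standard errors
. reg y x, robust          // (2) heteroskedasticity-robust standard errors
. reg y x, cluster(id)     // (3) cluster-robust standard errors (implies robust)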
Note that cluster implies the robust option. The formula for the clustered estimator is simply that
of the robust (unclustered) estimator with the individual $\hat\varepsilon_i X_i$ replaced by their sums over each
cluster. In other terms, the standard errors are computed based on the aggregates for the $N_C$
independent groups.
If the variance of the clustered estimator (3) is smaller than that of the robust (unclustered)
estimator (2), it means that the cluster sums of $\hat\varepsilon_i X_i$ have less variability than the individual $\hat\varepsilon_i X_i$.
That is, when we sum the $\hat\varepsilon_i X_i$ within a cluster, some of the variation gets cancelled out, and the
total variation is smaller.
This means that a big positive term is summed with a big negative term to produce something small; in other
words, there is negative correlation within the cluster.
If the number of clusters is very small compared to the overall sample size, the clustered standard
errors (3) could be quite a bit larger than the homoskedastic ones (1), because they are computed on
aggregate data for few groups.
Interpreting a difference between (1) the OLS estimator and (2) or (3) is trickier.
In (1) the squared residuals are summed, but in (2) and (3) the residuals are first multiplied by the X's
(and, for (3), summed within each cluster) and then squared and summed.
Hence, any difference between them has to do with very complicated relationships between the
residuals and the X's.


If big (in absolute value) $\hat\varepsilon_i$ are paired with big $X_i$, then the robust variance estimate will be bigger
than the OLS estimate.
On the other hand, if the robust variance estimate is smaller than the OLS estimate, it is not clear at
all what is happening (in any case, it has to do with some odd correlations between the residuals
and the X's).
Note that if the OLS model is true, the residuals should, of course, be uncorrelated with the X's.
Indeed, if all the assumptions of the OLS model are true, then the expected values of (1) the OLS
estimator and (2) the robust (unclustered) estimator are approximately the same. So, if the robust
(unclustered) estimates are just a little smaller than the OLS estimates, it may be that the OLS
assumptions are true and we are seeing a bit of random variation. If the robust (unclustered)
estimates are much smaller than the OLS estimates, then either we are seeing a lot of random
variation (which is possible, but unlikely), or else there is something odd going on between the
residuals and the X's.

