LECTURE 6 — GLS
HETEROSKEDASTICITY, GLS (WLS) ESTIMATORS, WHITE CORRECTION
The classical hypotheses on the errors are:

E(ε_i) = 0,  i = 1, ..., N
E(ε_i² | X) = Var(ε_i | X) = σ²,  i = 1, ..., N
E(ε_i ε_j | X) = Cov(ε_i, ε_j | X) = 0,  i ≠ j

i.e. the errors are independently and identically distributed, ε ~ iid(0, σ² I_N), with

E(εε') = σ² I_N = diag(σ², ..., σ²).
In other terms, the VCOV matrix is a scalar matrix, i.e. a diagonal matrix whose diagonal elements
are all equal.
We can compute the variance of the estimator by (exogeneity assumed):

Var(β̂ | X) = σ² (X'X)⁻¹ = σ² (Σ_{i=1}^{N} X_i X_i')⁻¹

Under heteroskedasticity, instead, E(εε' | X) = σ²Ω, where Ω is a positive definite matrix, not necessarily scalar. Hence, it may be necessary to estimate N additional parameters (the σ_i² parameters along the main diagonal).
In presence of heteroskedasticity OLS is unbiased (unbiasedness rests only on linearity and exogeneity) but not efficient:

Var(β̂ | X) = E((β̂ − β)(β̂ − β)') = (X'X)⁻¹ X' Var(ε | X) X (X'X)⁻¹ =
= (X'X)⁻¹ X' σ²Ω X (X'X)⁻¹ = (X'X)⁻¹ X' Diag(σ_i²) X (X'X)⁻¹ ≠ σ² (X'X)⁻¹
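The sandwich form above can be checked numerically. The NumPy sketch below uses simulated data (the data-generating process, variable names, and seed are illustrative assumptions, not the lecture's dataset): it computes both s²(X'X)⁻¹ and the sandwich (X'X)⁻¹ X' Diag(ε̂_i²) X (X'X)⁻¹, and with error variance growing in the regressor the naive formula understates the slope variance.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 500, 2
x = rng.uniform(1.0, 10.0, N)
X = np.column_stack([np.ones(N), x])      # design matrix with intercept

# heteroskedastic errors: sd(eps_i) = x_i, so the variance grows with x
eps = rng.normal(0.0, x)
y = 5.0 + 0.7 * x + eps

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y                  # OLS point estimates
resid = y - X @ beta
s2 = resid @ resid / (N - K)              # usual s^2

var_naive = s2 * XtX_inv                                  # s^2 (X'X)^-1
meat = X.T @ (resid[:, None] ** 2 * X)                    # X' Diag(e_i^2) X
var_sandwich = XtX_inv @ meat @ XtX_inv                   # sandwich form

# with this DGP the naive formula understates the slope variance
print(var_naive[1, 1], var_sandwich[1, 1])
```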
Moreover, the usual estimator s² = ε̂'ε̂/(N − K) is biased for σ². Since ε̂ = Mε:

E(s²) = E(ε'Mε/(N − K)) = E(tr(Mεε'))/(N − K) = tr(M E(εε'))/(N − K) = σ² tr(MΩ)/(N − K) ≠ σ²

(under homoskedasticity Ω = I_N, so tr(MΩ) = tr(M) = N − K and E(s²) = σ²),
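The trace result E(s²) = σ² tr(MΩ)/(N − K) can be verified by simulation. The sketch below (design, variances, and seed are hypothetical assumptions) compares the trace formula with the Monte Carlo mean of s² under heteroskedastic errors.

```python
import numpy as np

rng = np.random.default_rng(6)
N, K = 40, 2
x = np.linspace(1.0, 10.0, N)
X = np.column_stack([np.ones(N), x])
M = np.eye(N) - X @ np.linalg.inv(X.T @ X) @ X.T   # residual-maker matrix

Sigma = np.diag(x ** 2)                            # heteroskedastic variances
expected_s2 = np.trace(M @ Sigma) / (N - K)        # E(s^2) = tr(M Sigma)/(N-K)

# Monte Carlo check: s^2 = eps'M eps/(N-K) averaged over many error draws
reps = 20000
eps = rng.normal(0.0, x, size=(reps, N))
s2_draws = np.einsum('ri,ri->r', eps @ M, eps) / (N - K)
print(expected_s2, s2_draws.mean())
```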
where M = I − P_X = I − X(X'X)⁻¹X' is the matrix projecting Y upon the space orthogonal to the one spanned by the columns of X:

ε̂ = Y − Ŷ = Y − Xβ̂ = Y − X(X'X)⁻¹X'Y = (I − P_X)Y = M_X Y

The matrix M is symmetric (M' = M), idempotent (MM = M), with rank(M) = tr(M) = N − K.
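The three properties of M can be verified directly; a minimal NumPy sketch on an arbitrary simulated design (names and seed are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 30, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])

P = X @ np.linalg.inv(X.T @ X) @ X.T     # projection onto the column space of X
M = np.eye(N) - P                        # projects onto the orthogonal space

y = rng.normal(size=N)
beta = np.linalg.lstsq(X, y, rcond=None)[0]

print(np.allclose(M, M.T),               # symmetric
      np.allclose(M @ M, M),             # idempotent
      np.isclose(np.trace(M), N - K),    # tr(M) = N - K
      np.allclose(M @ y, y - X @ beta))  # M y gives the OLS residuals
```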
Hence, the usual OLS variance estimator is biased because the weighting matrix is no longer σ²(X'X)⁻¹ and because s² is a biased estimator of σ².
As a consequence, inference (t and F tests) is not correct: the test statistics do not have their standard distributions, and the usual confidence regions are no longer valid.
The idea of explaining consumption with the previous year's income predetermines the dynamic relationship in a quite restrictive way, but with the advantage of avoiding consumption-income simultaneity and endogeneity problems.
The scatterplot tells us that consumption variability grows with the level of income: richer people behave in more diverse ways. This fact per se suggests likely heteroskedasticity of the linear model residuals.
. graph7 cons1000 redd1000, ylabel xlabel
[Scatterplot: CONS1000 (0–100) against REDD1000 (0–150)]
      Number of obs =     100
      F(  1,    98) = 1036.50
      Prob > F      =  0.0000
      R-squared     =  0.9136
      Adj R-squared =  0.9127
      Root MSE      =  6.6661

------------------------------------------------------------------------------
    cons1000 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    redd1000 |   .7002875   .0217517    32.19   0.000     .6571221     .743453
       _cons |   5.668498   1.331578     4.26   0.000     3.026025    8.310971
------------------------------------------------------------------------------
2. Heteroskedasticity tests
Graphical analysis represents a first step towards discovering whether heteroskedasticity is present.
We are supposing that the error variance is a function of income:
version 7: rvfplot, oneway twoway box ylabel xlabel yline(0)
[Residual-versus-fitted plot: Residuals (−20 to 20) against Fitted values (0–100), with oneway/twoway box plots and a horizontal line at zero]
σ_i² = f(γ'Z_i) + u_i,

where u_i ~ iid(0, σ_u²), and the Z_i are V×1 vectors, with V the number of variables in Z (and of associated parameters γ) used to explain the error variance; for this reason the Z_i are called the variance indicator variables.
The null hypothesis to be tested becomes H0: γ = 0.
What about the alternative hypothesis, H1? Non-constant variance implies that specific variance
behaviours must be assumed.
Under the alternative, the form of the detected heteroskedasticity depends on the choice of the explanatory indicators Z_i. The test is conditional on a set of variables which are presumed to influence the error variance: fitted values, explanatory variables, or any other variable presumed to influence the error variance (for example, in the financial time-series setting, Engle (1982) proposes an ARCH test, for autoregressive conditional heteroskedasticity: σ_t² = γ_1 ε_{t−1}² + γ_2 ε_{t−2}² + ... + u_t).
The statistic is computed as either the F (small samples) or LM (large samples) statistic for the overall significance of the independent variables in explaining ε̂_i².
The F statistic is

F = (R_a² / V) / ((1 − R_a²) / (N − V − 1)).
The LM statistic is just the sample size times the R-squared of the auxiliary regression; under the null, it is distributed asymptotically as χ² with V degrees of freedom.
A first form of the test is the Breusch-Pagan (1979) test (Breusch-Pagan (1979), Godfrey (1978), and Cook-Weisberg (1983) separately derived the same test statistic).
It is a Lagrange multiplier test for heteroskedasticity in the error distribution.
It is the most general test, even though it is not very powerful and it is sensitive to the assumption of normally distributed errors (this is the assumption of the original formulation; see below for a relaxation of this assumption).
The Breusch and Pagan test-statistic is distributed as a chi-squared with V degrees of freedom. It is
obtained by the following steps:
1) run the model regression and define the dependent variable of the Breusch-Pagan auxiliary regression:

g_i = ε̂_i² / ((1/N) Σ_{i=1}^{N} ε̂_i²);¹

2) run the auxiliary regression g_i = γ_0 + γ'Z_i + u_i and obtain the BP statistic as half the model (explained) sum of squares of this regression, BP = MSS/2 (in the Stata replication below this is computed as e(mss)/e(df_m), which coincides here since df_m = V = 2).
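The two steps can also be sketched outside Stata; the NumPy translation below (simulated data, all names and the seed are assumptions) builds g_i, runs the auxiliary regression on the variance indicators, and forms the LM statistic as half the explained sum of squares.

```python
import numpy as np

def ols_resid(X, y):
    """Return the residuals of an OLS regression of y on X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return y - X @ b

rng = np.random.default_rng(2)
N = 200
x = rng.uniform(1.0, 10.0, N)
y = 5.0 + 0.7 * x + rng.normal(0.0, x)    # error variance grows with x

# step 1: model regression, then the normalised squared residuals g_i
e = ols_resid(np.column_stack([np.ones(N), x]), y)
g = e ** 2 / (e @ e / N)                  # g_i = e_i^2 / ((1/N) sum e_i^2)

# step 2: auxiliary regression of g on the variance indicators (x and x^2)
u = ols_resid(np.column_stack([np.ones(N), x, x ** 2]), g)
ess = np.sum((g - g.mean()) ** 2) - u @ u # explained sum of squares
bp = ess / 2.0                            # BP LM statistic, chi2 with V = 2 df
print(bp)
```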
This test can verify whether heteroskedasticity is conditional on any list of Z_i variables which are presumed to influence the error variance (i.e. variance indicators); they can be the fitted values, the explanatory variables of the model, or any variables you think can affect the error variance. The trade-off in the choice of indicator variables in these tests is that a smaller set of indicator variables preserves degrees of freedom, at the cost of being unable to detect heteroskedasticity in certain directions.

¹ By scaling by the sample mean of the squared residuals, the residuals are normalised: under the null there are no nuisance terms that can affect the chi-squared distribution; it is possible to use every variable you think is useful in explaining heteroskedasticity.
A second form of the heteroskedasticity test is the very often reported White (1980) test for
heteroskedasticity.
It is based on a different auxiliary regression where the squared residuals are regressed on the
model regressors, all their squares, and all their possible (not redundant) cross products.
The asymptotic chi-squared White test statistic is obtained as the number of observations times the R-squared of the auxiliary regression.
The F-version for small samples is obtained by testing that all the slope coefficients of the auxiliary regression are zero (i.e. by looking at the F-test for the overall significance of the auxiliary regression).
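The asymptotic version can be sketched as follows in NumPy (simulated single-regressor data, all names and the seed are assumptions); with one regressor the auxiliary regression contains only the level and the square, since there are no cross products.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 200
x = rng.uniform(1.0, 10.0, N)
y = 5.0 + 0.7 * x + rng.normal(0.0, x)    # heteroskedastic errors

# squared residuals from the model regression
X = np.column_stack([np.ones(N), x])
e2 = (y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2

# White auxiliary regression: e^2 on the regressor and its square
Z = np.column_stack([np.ones(N), x, x ** 2])
resid_aux = e2 - Z @ np.linalg.lstsq(Z, e2, rcond=None)[0]
r2 = 1.0 - resid_aux @ resid_aux / np.sum((e2 - e2.mean()) ** 2)

white = N * r2                            # asymptotic chi2(2) White statistic
print(white)
```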
Variance indicators             Commands
------------------------------  ------------------------------------------------
Fitted value                    hettest
X1, X2                          hettest X1 X2 (or hettest, rhs); bpagan X1 X2;
                                ivhettest, all (levels of IVs; output
                                Breusch-Pagan/Godfrey/Cook-Weisberg)
White: X1, X2, X1^2, X2^2,      bpagan X1 X2 X1^2 X2^2 X1*X2; ivhettest, all ivcp
X1*X2                           (output Breusch-Pagan/Godfrey/Cook-Weisberg)

NOTE: the command hettest is not appropriate after regress, nocons.
For example, if we suppose that in our simple consumption model the levels of income and their
squares are both valid variance indicators, we can test for heteroskedasticity in the following way:
. g redd2=redd1000^2
. hettest redd1000 redd2
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: redd1000 redd2
         chi2(2)      =    25.00
         Prob > chi2  =   0.0000
The same result can be obtained by applying a procedure, written by C. F. Baum and V. Wiggins, that specifically runs the Breusch-Pagan (1979) test for heteroskedasticity conditional on a set of variables.
. bpagan redd1000 redd2
Breusch-Pagan LM statistic: 25.0018  Chi-sq( 2)  P-value = 3.7e-06
In general, the Breusch and Pagan test-statistic is distributed as a chi-squared with V degrees of
freedom (in the latter example V=2). The statistic above may be replicated with the following steps.
1) Compute the dependent variable of the Breusch-Pagan auxiliary regression:
. reg cons1000 redd1000
      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) = 1036.50
       Model |  46059.3208     1  46059.3208           Prob > F      =  0.0000
    Residual |  4354.87802    98  44.4375308           R-squared     =  0.9136
-------------+------------------------------           Adj R-squared =  0.9127
       Total |  50414.1988    99  509.234332           Root MSE      =  6.6661

------------------------------------------------------------------------------
    cons1000 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    redd1000 |   .7002875   .0217517    32.19   0.000     .6571221     .743453
       _cons |   5.668498   1.331578     4.26   0.000     3.026025    8.310971
------------------------------------------------------------------------------
where e(rss)=4354.87802 and e(N)=100 are post estimation results corresponding, respectively,
to the residual sum of squares and to the total number of observations.
2) Run the Breusch-Pagan auxiliary regression and compute the test statistic and/or its P-value:
. reg BP_g redd1000 redd2
      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  2,    97) =   12.90
       Model |  50.0036064     2  25.0018032           Prob > F      =  0.0000
    Residual |  188.030708    97  1.93846091           R-squared     =  0.2101
-------------+------------------------------           Adj R-squared =  0.1938
       Total |  238.034314    99  2.40438701           Root MSE      =  1.3923

------------------------------------------------------------------------------
        BP_g |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    redd1000 |  -.0092459   .0167538    -0.55   0.582    -.0424975    .0240058
       redd2 |   .0002848   .0001499     1.90   0.060    -.0000127    .0005823
       _cons |   .4226007   .4038909     1.05   0.298    -.3790109    1.224212
------------------------------------------------------------------------------
. di e(mss)/e(df_m)
25.001803
where e(mss)=50.0036064
. display chi2tail(2,e(mss)/e(df_m))
3.723e-06
The White test can be performed in several ways; the easiest is to run a procedure, also written by Baum and Cox, that automatically computes the asymptotic version of the White test.
. qui reg cons1000 redd1000
. whitetst
White's general test statistic :  21.00689  Chi-sq( 2)  P-value =  2.7e-05
2) Run the White auxiliary regression (remember that we have only one explanatory variable):
. reg res2 redd1000 redd2
      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  2,    97) =   12.90
       Model |  94831.6529     2  47415.8264           Prob > F      =  0.0000
    Residual |  356599.534    97  3676.28386           R-squared     =  0.2101
-------------+------------------------------           Adj R-squared =  0.1938
       Total |  451431.187    99  4559.91098           Root MSE      =  60.632

------------------------------------------------------------------------------
        res2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    redd1000 |  -.4026459   .7296083    -0.55   0.582    -1.850716    1.045424
       redd2 |   .0124035   .0065273     1.90   0.060    -.0005513    .0253583
       _cons |   18.40375   17.58896     1.05   0.298    -16.50546    53.31296
------------------------------------------------------------------------------
The White test statistic and its P-value in the asymptotic version of the test are obtained as follows. The LM test statistic for heteroskedasticity is just the sample size N times the R-squared of the auxiliary regression:
. di e(N)*e(r2)
21.00689
where e(N)=100 and e(r2)=0.2101
. display chi2tail(2,e(N)*e(r2))
.00002744
 ( 1)  redd1000 = 0.0
 ( 2)  redd2 = 0.0

       F(  2,    97) =   12.90
            Prob > F =   0.0000

² This command can be used also in the Breusch-Pagan auxiliary regression; of course the results of the two tests coincide.
Note that:
. qui reg cons1000 redd1000
. hettest redd1000 redd2, iid

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: redd1000 redd2

         chi2(2)      =    21.01
         Prob > chi2  =   0.0000
The Breusch-Pagan (1979) test from the hettest command is numerically equal to the White (1980) test for heteroskedasticity if the same White auxiliary regression is specified and the option iid is used. Differently from the default of hettest and from bpagan, which compute the original Breusch-Pagan test assuming that the regression disturbances are normally distributed, the option iid causes hettest to compute the N·R² version of the score test, which drops the normality assumption.³
A useful command that, despite its name, also works after OLS and performs both previous tests is:
. ivhettest, all ivcp
OLS heteroskedasticity test(s) using levels and cross products of all IVs
Ho: Disturbance is homoskedastic
White/Koenker nR2 test statistic    : 21.007  Chi-sq(2)  P-value = 0.0000
Breusch-Pagan/Godfrey/Cook-Weisberg : 25.002  Chi-sq(2)  P-value = 0.0000
Note that if you write hettest only, the residual variance is assumed to depend on the fitted values (i.e. Z_i ≡ ŷ_i, and V = 1); if you use the option rhs, the residual variance is assumed to depend on the explanatory variables of the model (in our case of one explanatory variable these two tests coincide).
. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: redd1000
         chi2(1)      =    21.50
         Prob > chi2  =   0.0000
. bpagan redd1000
Breusch-Pagan LM statistic: 21.50193  Chi-sq( 1)  P-value = 3.5e-06
. ivhettest, all
OLS heteroskedasticity test(s) using levels of IVs only
Ho: Disturbance is homoskedastic
White/Koenker nR2 test statistic    : 18.066  Chi-sq(1)  P-value = 0.0000⁴
Breusch-Pagan/Godfrey/Cook-Weisberg : 21.502  Chi-sq(1)  P-value = 0.0000
³ Koenker (1981) showed that, when the assumption of normality is removed, a version of the test is available that can be calculated as the sample size N times the centered R-squared from an artificial regression of the squared residuals from the original regression on the indicator variables.
⁴ This test is the Breusch-Pagan without the normality assumption.
X' E(εε') X = X' σ²Ω X = X' Diag(σ_i²) X = Σ_{i=1}^{N} σ_i² X_i X_i'.

If we define as X_i the (K×1) vector of explanatory variables for observation i, a consistent estimator can be obtained as

(1/N) Σ_{i=1}^{N} ε̂_i² X_i X_i'  →  (1/N) X' σ²Ω X,

so that the White heteroskedasticity-consistent variance estimator is

Vâr(β̂) = (X'X)⁻¹ (Σ_{i=1}^{N} ε̂_i² X_i X_i') (X'X)⁻¹.
      Number of obs =     100
      F(  1,    98) =  799.78
      Prob > F      =  0.0000
      R-squared     =  0.9136
      Root MSE      =  6.6661

------------------------------------------------------------------------------
             |               Robust
    cons1000 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    redd1000 |   .7002875   .0247622    28.28   0.000     .6511477    .7494274
       _cons |   5.668498   1.076363     5.27   0.000     3.532492    7.804505
------------------------------------------------------------------------------
NOTE: the parameter estimates (with and without the standard errors correction) are identical: the White correction does not modify the parameter estimates.
When Ω is known in its structure and in its parameters we are directly in the case of FGLS. For example, in the cases of:
group-wise heteroskedasticity;
autocorrelation in MA(1) form when we estimate a dynamic panel after taking first differences to remove the individual effects (note that, given the presence of the lagged dependent variable among the regressors, we need to use IV+GLS = GIVE, the generalized instrumental variables estimator).
Weighted least squares (WLS) is a specific case of FGLS, used, for example, in presence of group-wise heteroskedasticity, i.e. when we know that the heteroskedasticity derives from how the data are collected: we only dispose of averaged or aggregated data (by clusters, which may be industries, typologies of companies and so on). In this case Ω is known in its structure and parameters.
Some examples are in the Appendix.
β̂_GIVE = (Z'L'LX)⁻¹ Z'L'Ly = (Z'Ω⁻¹X)⁻¹ Z'Ω⁻¹y.
See more in lecture_IV.
An alternative is to augment the dynamics i.e. to re-specify the model.
Hence in this case E(εε') = Diag(σ_i²) = diag(σ_1², ..., σ_N²) = σ² diag(R_1, ..., R_N).
If we scale all the variables by the square root of income, we obtain the transformed model:

C_i/√R_i = α (1/√R_i) + β √R_i + ε_i/√R_i = α (1/√R_i) + β √R_i + u_i,

Var(ε_i/√R_i) = (1/R_i) Var(ε_i) = (1/R_i) σ² R_i = σ²,  i.e. the errors u_i are homoskedastic.
In this case the L matrix is L = diag(1/√R_1, ..., 1/√R_N).
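The equivalence between OLS on the transformed variables and WLS with weights 1/R_i can be sketched as follows (simulated consumption-income data; all names and the seed are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200
R = rng.uniform(1.0, 10.0, N)                       # income
C = 5.0 + 0.7 * R + rng.normal(0.0, np.sqrt(R))     # Var(eps_i) = sigma^2 R_i

# OLS on the transformed variables: scale everything by 1/sqrt(R_i)
w = 1.0 / np.sqrt(R)
Xt = np.column_stack([w, R * w])                    # columns: 1/sqrt(R), sqrt(R)
beta_t = np.linalg.lstsq(Xt, C * w, rcond=None)[0]

# direct WLS with weight matrix Diag(1/R_i) gives the same estimates
X = np.column_stack([np.ones(N), R])
W = np.diag(1.0 / R)
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ C)
print(beta_t, beta_wls)
```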
WLS is efficient precisely because the higher-variance observations (i.e. those corresponding to richer people) receive less weight.⁵

⁵ If the model we suppose able to explain the heteroskedasticity is right, FGLS is more efficient than robust OLS.
      Number of obs =     100
      F(  1,    98) = 3053.19
      Prob > F      =  0.0000
      R-squared     =  0.9689
      Adj R-squared =  0.9686
      Root MSE      =  .92692

------------------------------------------------------------------------------
    cons1000 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    redd1000 |   .7145188   .0129311    55.26   0.000     .6888574    .7401803
       _cons |   4.914329   .0935686    52.52   0.000     4.728645    5.100013
------------------------------------------------------------------------------
aweight stands for analytical weights, which are inversely proportional to the variance of an observation. They are automatically employed in models that use averages, e.g. in Between Effects panel regressions.
FGLS (WLS) can be reproduced by the following steps:
. g peso=1/redd1000^0.5
. g consp=cons1000*peso
. g reddp=redd1000*peso
. reg consp reddp peso, noconst
      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  2,    98) = 3364.82
       Model |  5852.17626     2  2926.08813           Prob > F      =  0.0000
    Residual |  85.2218967    98  .869611191           R-squared     =  0.9856
-------------+------------------------------           Adj R-squared =  0.9854
       Total |  5937.39815   100  59.3739815           Root MSE      =  .93253

------------------------------------------------------------------------------
       consp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       reddp |   .7145188   .0129311    55.26   0.000     .6888574    .7401803
        peso |   4.914329   .0935686    52.52   0.000     4.728645    5.100013
------------------------------------------------------------------------------
. whitetst

White's general test statistic :  5.086438  Chi-sq( 5)  P-value =  .4054

. g reddp2=reddp^2
. bpagan reddp reddp2

Breusch-Pagan LM statistic: 1.265137  Chi-sq( 2)  P-value =  .5312
In contrast to the two previous heteroskedasticity tests, note that after a regression without the constant term we cannot run the hettest command:
. hettest
not appropriate after regress, nocons
r(301);
The previous heteroskedasticity tests are run for didactical reasons only, just to see that heteroskedasticity is no longer present in the weighted regression; of course heteroskedasticity tests are not performable after FGLS.
All the issues raised above can be summarised in a single table in order to fix ideas. In doing so, we use the previous consumption function (which we checked is heteroskedastic). To do so, we can quietly run three regressions of interest, namely: (1) heteroskedastic OLS without White's standard errors correction; (2) heteroskedastic OLS with White's standard errors correction; (3) WLS assuming that the error variance is a linear function of income:
. qui reg cons1000 redd1000
. est store OLS
. qui reg cons1000 redd1000, robust
. est store white
. qui reg cons1000 redd1000 [aweight=1/redd1000]
. est store WLS
Discussion. In the context of a model with heteroskedastic errors, both the OLS and WLS estimators are unbiased and consistent, therefore all the estimates are fairly close to each other. The parameter standard errors estimated by OLS in the first column are biased (because of heteroskedastic errors), while those in the second column are robust to heteroskedasticity (hence, reliable). However, since WLS is also efficient, the standard errors reported in the third column are markedly lower than those in the second column.
Appendix
A1. Averaged data
yc = X c + c
where c = 1, 2, .., C indexes the groups (or clusters). Each group is composed of i = 1, 2, .., N_c individuals which are averaged.
Single individuals have homoskedastic errors, Var(ε_i) = σ², i = 1, 2, .., N, and are not cross-sectionally correlated, Cov(ε_i, ε_j) = 0 for i ≠ j.
However, the available observations are the group means:

ε̄_c = (1/N_c)(ε_1 + ... + ε_{N_c}) = (1/N_c) Σ_{i=1}^{N_c} ε_i,

Var(ε̄_c) = (1/N_c²) Σ_{i=1}^{N_c} σ² = (1/N_c²) N_c σ² = σ²/N_c ≡ σ_c²,

i.e. the error variance decreases as the number of individuals within a cluster, N_c, increases.⁶
E(ε̄ε̄') = Ω = diag(σ_1², ..., σ_C²) = σ² diag(1/N_1, ..., 1/N_C).

FGLS/WLS weights each observation by √N_c, giving more weight to the observations with lower variance σ_c².

In particular, the L matrix is L = diag(√N_1, ..., √N_c, ..., √N_C).
If we multiply all the variables by the square root of each group dimension, we obtain the transformed model Ly = LXβ + Lε̄ which, looking at the c-th observation, corresponds to

√N_c ȳ_c = √N_c X̄_c β + √N_c ε̄_c,  where Var(√N_c ε̄_c) = N_c σ²/N_c = σ².

⁶ Note that with this kind of data we lose the within-group variation and hence the estimates of the parameters are less precise. However, the fit, R², improves because the errors are averaged.
The OLS estimator of the transformed model is best (minimum variance) and corresponds to the FGLS/WLS estimator:

β̂_FGLS/WLS = (X'L'LX)⁻¹ X'L'Ly = (X'Ω⁻¹X)⁻¹ X'Ω⁻¹y.
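The averaged-data case can be sketched as follows: multiplying every variable (constant included) by √N_c and running OLS gives exactly the WLS estimator with weights N_c (simulated group means; all names and the seed are assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
C = 50
Nc = rng.integers(2, 30, C)                    # group sizes N_c
x_bar = rng.uniform(0.0, 10.0, C)
eps_bar = rng.normal(0.0, 2.0 / np.sqrt(Nc))   # Var(eps_bar_c) = sigma^2 / N_c
y_bar = 1.0 + 0.5 * x_bar + eps_bar

# OLS after multiplying every variable (constant included) by sqrt(N_c)
s = np.sqrt(Nc)
Xt = np.column_stack([s, x_bar * s])
beta_t = np.linalg.lstsq(Xt, y_bar * s, rcond=None)[0]

# equivalently, WLS on the group means with weights N_c
X = np.column_stack([np.ones(C), x_bar])
W = np.diag(Nc.astype(float))
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y_bar)
print(beta_t, beta_wls)
```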
A2. Aggregated data
Now each group observation is the sum (not the average) of its i = 1, 2, .., N_c individuals, with c = 1, 2, .., C groups (or clusters).
Our observations are the group sums:

ε_c = Σ_{i=1}^{N_c} ε_i,   Var(ε_c) = E(Σ_{i=1}^{N_c} ε_i)² = N_c σ²,

i.e. the error variance increases as the number of individuals within a cluster, N_c, increases; this is true even if the covariance among individuals within the cluster is negative.
E(εε') = Ω = diag(σ_1², ..., σ_C²) = σ² diag(N_1, ..., N_C).

FGLS/WLS weights each observation by 1/√N_c, giving less weight to the observations with higher variance σ_c².
In particular, the L matrix is L = diag(1/√N_1, ..., 1/√N_c, ..., 1/√N_C).
If we scale all the variables by the square root of each group dimension, we obtain the transformed model Ly = LXβ + Lε which, looking at the c-th observation, corresponds to

(1/√N_c) y_c = (1/√N_c) X_c β + (1/√N_c) ε_c,

where Var((1/√N_c) ε_c) = (1/N_c) N_c σ² = σ², i.e. the transformed errors are homoskedastic.
(2) Heteroskedastic-consistent estimator (regress, robust)

Vâr(β̂_robust) = (X'X)⁻¹ [Σ_{i=1}^{N} (ε̂_i X_i)(ε̂_i X_i)'] (X'X)⁻¹ = (X'X)⁻¹ [Σ_{i=1}^{N} ε̂_i² X_i X_i'] (X'X)⁻¹.
i =1
i =1
where we have c = 1, 2, ..., NC clusters and u c = iX i is the sum of observations within each
ic
cluster c; the center of the sandwich is sometimes multiplied by (N-1)/(N-K) Nc /( Nc -1) as finitesample adjustment.
Note that cluster implies robust option. The formula for the clustered estimator is simply that
of the robust (unclustered) estimator with the individual i X i replaced by their sums over each
cluster. In other terms, the standard errors are computed based on aggregate y for the Nc
independent groups.
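The replacement of the individual ε̂_i X_i by their within-cluster sums can be sketched as follows (simulated data; names and seed are assumptions). When every observation is its own cluster, the formula collapses to the robust (unclustered) estimator.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 120
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
y = 1.0 + 2.0 * x + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
e = y - X @ (XtX_inv @ X.T @ y)              # OLS residuals

def cluster_vcov(ids):
    """Sandwich with the e_i X_i summed within each cluster."""
    meat = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(ids):
        u_c = (e[ids == c, None] * X[ids == c]).sum(axis=0)  # sum of e_i X_i
        meat += np.outer(u_c, u_c)
    return XtX_inv @ meat @ XtX_inv

robust = cluster_vcov(np.arange(N))                     # 1 obs per cluster -> HC0
clustered = cluster_vcov(np.repeat(np.arange(10), 12))  # 10 clusters of 12
print(np.diag(robust), np.diag(clustered))
```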
If the variance of the clustered estimator (3) is smaller than that of the robust (unclustered) estimator (2), it means that the cluster sums of ε̂_i X_i have less variability than the individual ε̂_i X_i. That is, when we sum the ε̂_i X_i within a cluster, some of the variation cancels out, and the total variation is smaller.
This means that a big positive is summed with a big negative to produce something small; in other words, there is negative correlation within the cluster.
If the number of clusters is very small compared to the overall sample size, it could be that the
clustered standard errors (3) are quite larger than the homoskedastic ones (1), because they are
computed on aggregate data for few groups.
Interpreting a difference between (1) the OLS estimator and (2) or (3) is trickier.
In (1) the squared residuals are summed, but in (2) and (3) the residuals are multiplied by the X's
(then for (3) summed within cluster) and then squared and summed.
Hence, any difference between them has to do with very complicated relationships between the
residuals and the X's.
If big (in absolute value) ε̂_i are paired with big X_i, then the robust variance estimate will be bigger than the OLS estimate.
On the other hand, if the robust variance estimate is smaller than the OLS estimate, it is not clear at all what is happening (in any case, it has to do with some odd correlations between the residuals and the X's).
Note that if the OLS model is true, the residuals should, of course, be uncorrelated with the X's.
Indeed, if all the assumptions of the OLS model are true, then the expected values of (1) the OLS
estimator and (2) the robust (unclustered) estimator are approximately the same. So, if the robust
(unclustered) estimates are just a little smaller than the OLS estimates, it may be that the OLS
assumptions are true and we are seeing a bit of random variation. If the robust (unclustered)
estimates are much smaller than the OLS estimates, then either we are seeing a lot of random
variation (which is possible, but unlikely), or else there is something odd going on between the
residuals and the X's.