Heteroskedasticity
A. The Concept of Variation in Error Variances. To review,
the sphericality assumption implies homoskedasticity of errors:
their variance is constant across cases. This assumption is
violated, giving us another fun word to say (heteroskedasticity), when
our predictive model performs particularly poorly in some set of
cases.
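A small simulation can make the idea concrete. The sketch below is only an illustration on made-up data: the variable names (noisy, x, y, res) and all the numbers are assumptions, not part of the analysis in these notes. It builds one group of cases whose errors have three times the standard deviation of the rest, fits a single regression, and compares the spread of the residuals across the two groups.

```stata
* Simulated data; every name and number here is hypothetical
clear
set seed 12345
set obs 200

* the last 50 cases play the role of the poorly predicted group
generate byte noisy = _n > 150
generate double x = rnormal()

* error standard deviation: 1 outside the group, 3 inside it
generate double e = rnormal() * cond(noisy, 3, 1)
generate double y = 1 + 2*x + e

regress y x
predict double res, residuals

* compare the spread of the residuals across the two groups
tabstat res, by(noisy) statistics(sd n)
```

The residual standard deviation reported for the noisy group should come out clearly larger than for the other cases, which is the pattern the sphericality assumption rules out.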
When might this happen? One possibility is that we have
measurement error in some subset of our observations. Suppose you
are explaining GDP growth across countries, but you have less faith in
the GDP estimates from post-Soviet and African nations. In this case,
there should be more random variation, and thus larger error
variances, in these countries. Or perhaps your model of state
expenditures does fairly well in explaining variation in most regions of
the country, but just does terribly in the South. Here, errors in
southern cases will have a larger variance than errors in cases from
the rest of the country.
In either case, $E(\varepsilon\varepsilon') = \sigma^2\Omega$ will not equal $\sigma^2 I$. Even if we do not have any
covariance between our errors, the matrix $\Omega$ will be nonspherical and
look something like:
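\[
\Omega =
\begin{bmatrix}
3.2 & 0 & 0 & 0 \\
0 & 3.1 & 0 & 0 \\
0 & 0 & 8.5 & 0 \\
0 & 0 & 0 & 8.4
\end{bmatrix},
\qquad
\Omega^{-1} =
\begin{bmatrix}
\frac{1}{3.2} & 0 & 0 & 0 \\
0 & \frac{1}{3.1} & 0 & 0 \\
0 & 0 & \frac{1}{8.5} & 0 \\
0 & 0 & 0 & \frac{1}{8.4}
\end{bmatrix}
\]

In general:

\[
\sigma^2\Omega =
\begin{bmatrix}
\sigma_1^2 & 0 & 0 & 0 \\
0 & \sigma_2^2 & 0 & 0 \\
0 & 0 & \sigma_3^2 & 0 \\
0 & 0 & 0 & \sigma_4^2
\end{bmatrix}
= \sigma^2
\begin{bmatrix}
\omega_1 & 0 & 0 & 0 \\
0 & \omega_2 & 0 & 0 \\
0 & 0 & \omega_3 & 0 \\
0 & 0 & 0 & \omega_4
\end{bmatrix},
\qquad
\Omega^{-1} =
\begin{bmatrix}
\frac{1}{\omega_1} & 0 & 0 & 0 \\
0 & \frac{1}{\omega_2} & 0 & 0 \\
0 & 0 & \frac{1}{\omega_3} & 0 \\
0 & 0 & 0 & \frac{1}{\omega_4}
\end{bmatrix}
\]

\[
\hat{\beta}_{GLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y
\]

\[
Var(\hat{\beta}_{GLS}) = \sigma^2 (X'\Omega^{-1}X)^{-1}
\]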
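To see what the estimator does when $\Omega$ is diagonal, it can help to write the GLS formula out case by case. The sketch below assumes notation not used elsewhere in these notes: $x_i'$ is the $i$-th row of $X$, $y_i$ the $i$-th element of $Y$, and $\omega_i$ the $i$-th diagonal element of $\Omega$.

```latex
\[
\hat{\beta}_{GLS}
  = \left( \sum_{i=1}^{n} \frac{x_i x_i'}{\omega_i} \right)^{-1}
    \sum_{i=1}^{n} \frac{x_i y_i}{\omega_i}
  = \arg\min_{\beta} \sum_{i=1}^{n} \frac{(y_i - x_i'\beta)^2}{\omega_i}
\]
```

Each squared error is weighted by $1/\omega_i$, so cases with noisier errors count for less. With the illustrative values above, a case with variance 8.5 gets a weight of about 0.12, against about 0.31 for a case with variance 3.2. The same estimates come from running OLS after dividing $y_i$ and every element of $x_i$ (including the constant) by $\sqrt{\omega_i}$.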
C. How to Build Your Estimated Omega Matrix. You are
going to have to use some theory to get a general idea of the patterns,
and then use your data to fill in the $\sigma_i$s individually or as groups.
i. If you think different regions have different error
variances, then run regressions on subsets of your data and use root
mean squared errors as estimates of the $\sigma_i$s.
ii. If you think your variance increases as some variable
increases, then use the inverse of that variable as your weight; if the
variance decreases as the variable increases, use the variable itself.
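Approach (i) can be sketched in a few lines of Stata. This is a minimal illustration on simulated data, not the analysis from these notes: the variable names (south, x, y, sigma_hat), the scalar names, and the numbers are all assumptions. Because a root MSE estimates a standard deviation, the weight used is the inverse of its square.

```stata
* Simulated data; every name and number here is hypothetical
clear
set seed 2468
set obs 200
generate byte south = _n > 150
generate double x = rnormal()
generate double y = 1 + 2*x + rnormal() * cond(south, 3, 1)

* step 1: separate regressions by region, keeping each root MSE
quietly regress y x if south == 1
scalar rmse_south = e(rmse)
quietly regress y x if south == 0
scalar rmse_other = e(rmse)

* step 2: give every case its own region's estimate of sigma
generate double sigma_hat = cond(south == 1, scalar(rmse_south), scalar(rmse_other))

* step 3: weight each case by the inverse of its estimated variance
regress y x [aweight = 1/sigma_hat^2]
```

With more than two regions the same steps apply, one subset regression per region.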
Here is a model that uses various rules of committee
procedures, elements of legislative professionalism, and the number
of bills introduced in state legislatures to explain their batting
averages of bill passage in 1997-1998.
. reg batting senhear uplimit ksalary ksession kstaff introreg

      Source |       SS       df       MS              Number of obs =      49
-------------+------------------------------           F(  6,    42) =    8.34
       Model |  .730694678     6  .121782446           Prob > F      =  0.0000
    Residual |  .612959937    42  .014594284           R-squared     =  0.5438
-------------+------------------------------           Adj R-squared =  0.4786
       Total |  1.34365461    48  .027992804           Root MSE      =  .12081

------------------------------------------------------------------------------
     batting |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     senhear |   .1022993   .0469589     2.18   0.035     .0075324    .1970662
     uplimit |    .100388   .0492296     2.04   0.048     .0010387    .1997373
     ksalary |  -.0147299   .0167533    -0.88   0.384    -.0485394    .0190795
    ksession |  -.0251643   .0118604    -2.12   0.040    -.0490997    -.001229
      kstaff |   .0071363   .0111804     0.64   0.527    -.0154267    .0296992
    introreg |  -8.27e-06   4.22e-06    -1.96   0.056    -.0000168    2.36e-07
       _cons |   .4732023   .0598806     7.90   0.000     .3523583    .5940463
------------------------------------------------------------------------------

. predict battingrs, r
(951 missing values generated)
Rather than relying solely on my theoretical hunch, I can look at
patterns in error variance by plotting the residuals that I saved after running
my initial model against variables in the model:
plot battingrs introreg
[Plot: residuals on the vertical axis (from -.261077 to .317322) against "bills introduced in regular legislative" on the horizontal axis (from 745 to 32263).]
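The same check can be run with current graphics commands and backed up with a formal test. The sketch below assumes the legislative data set used above is in memory; res_ols is a hypothetical variable name chosen so that it does not collide with battingrs. scatter is the newer replacement for the text-mode plot command, and estat hettest runs a Breusch-Pagan / Cook-Weisberg test of constant variance against the named variable.

```stata
* assumes the data set used for the regression above is in memory
quietly regress batting senhear uplimit ksalary ksession kstaff introreg

* keep residuals only for the estimation sample (res_ols is a hypothetical name)
predict double res_ols if e(sample), residuals

* residuals against the suspected source of unequal variance
scatter res_ols introreg, yline(0)

* Breusch-Pagan / Cook-Weisberg test of constant variance against introreg
estat hettest introreg
```

A fan or funnel shape in the scatter, or a small p-value from the test, would suggest that the error variance changes with introreg.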
. reg batting senhear uplimit ksalary ksession kstaff introreg [aweight=1/introreg]
(sum of wgt is   2.2627e-02)

      Source |       SS       df       MS              Number of obs =      49
-------------+------------------------------           F(  6,    42) =    8.84
       Model |  .738968006     6  .123161334           Prob > F      =  0.0000
    Residual |  .585426577    42  .013938728           R-squared     =  0.5580
-------------+------------------------------           Adj R-squared =  0.4948
       Total |  1.32439458    48  .027591554           Root MSE      =  .11806

------------------------------------------------------------------------------
     batting |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     senhear |    .081729   .0462613     1.77   0.085      -.01163     .175088
     uplimit |   .1031672   .0435644     2.37   0.023     .0152506    .1910838
     ksalary |   -.012647   .0165156    -0.77   0.448    -.0459768    .0206828
    ksession |  -.0206915   .0110394    -1.87   0.068      -.04297     .001587
      kstaff |   .0038876   .0108663     0.36   0.722    -.0180413    .0258166
    introreg |  -.0000229   9.73e-06    -2.35   0.023    -.0000425   -3.26e-06
       _cons |   .5147589   .0577829     8.91   0.000     .3981483    .6313696
------------------------------------------------------------------------------

. reg batting senhear uplimit ksalary ksession kstaff introreg [aweight=introreg]
(sum of wgt is   1.9167e+05)

      Source |       SS       df       MS              Number of obs =      49
-------------+------------------------------           F(  6,    42) =   10.76
       Model |  .675155622     6  .112525937           Prob > F      =  0.0000
    Residual |  .439093138    42  .010454599           R-squared     =  0.6059
-------------+------------------------------           Adj R-squared =  0.5496
       Total |  1.11424876    48  .023213516           Root MSE      =  .10225

------------------------------------------------------------------------------
     batting |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     senhear |   .0870888   .0409285     2.13   0.039     .0044917     .169686
     uplimit |   .1326276   .0524956     2.53   0.015     .0266871     .238568
     ksalary |  -.0171153   .0160612    -1.07   0.293     -.049528    .0152974
    ksession |  -.0312976   .0118149    -2.65   0.011     -.055141   -.0074541
      kstaff |   .0159856   .0108125     1.48   0.147     -.005835    .0378061
    introreg |  -3.36e-06   1.90e-06    -1.77   0.084    -7.21e-06    4.78e-07
       _cons |   .4247547   .0602255     7.05   0.000     .3032146    .5462948
------------------------------------------------------------------------------
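One way to read the unweighted and the two weighted regressions side by side is to store each fit and tabulate them together. The sketch below assumes the same data set is in memory; the stored names (ols, w_inverse, w_direct) and the local macro rhs are hypothetical labels introduced here for illustration.

```stata
* assumes the data set used above is in memory
local rhs senhear uplimit ksalary ksession kstaff introreg

quietly regress batting `rhs'
estimates store ols
quietly regress batting `rhs' [aweight = 1/introreg]
estimates store w_inverse
quietly regress batting `rhs' [aweight = introreg]
estimates store w_direct

* coefficients and standard errors from the three fits, side by side
estimates table ols w_inverse w_direct, b(%10.0g) se(%10.0g) stats(N)
```

Which weight is appropriate depends on whether the spread of the residuals grows or shrinks as introreg increases. Fit statistics such as R-squared and Root MSE are computed on differently weighted data in each run, so they are probably not a safe basis for choosing between the weights.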
D. What if you have no clue about Omega? You can use White's
estimator, an option on just about all Stata commands (see Greene, pp.
219-220):
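\[
\text{Estimated Asymptotic Variance}(\hat{\beta})
  = \frac{1}{n}
    \left( \frac{X'X}{n} \right)^{-1}
    \left( \frac{1}{n} \sum_{i=1}^{n} e_i^2 x_i x_i' \right)
    \left( \frac{X'X}{n} \right)^{-1}
\]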
                                                       Number of obs =      49
                                                       F(  6,    42) =   17.25
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5438
                                                       Root MSE      =  .12081

------------------------------------------------------------------------------
             |               Robust
     batting |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     senhear |   .1022993   .0424499     2.41   0.020      .016632    .1879666
     uplimit |    .100388   .0556844     1.80   0.079    -.0119877    .2127636
     ksalary |  -.0147299   .0113512    -1.30   0.201    -.0376376    .0081777
    ksession |  -.0251643   .0125633    -2.00   0.052    -.0505182    .0001895
      kstaff |   .0071363   .0110034     0.65   0.520    -.0150696    .0293421
    introreg |  -8.27e-06   4.79e-06    -1.73   0.091    -.0000179    1.39e-06
       _cons |   .4732023   .0620882     7.62   0.000     .3479032    .5985014
------------------------------------------------------------------------------
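The command behind this output does not survive in these notes. A likely form, offered here as an assumption, is the original regression with the robust variance option added. In current Stata that option is written vce(robust); older versions accept the shorter robust.

```stata
* assumes the data set used above is in memory
* same coefficients as the unweighted regression; only the standard errors change
regress batting senhear uplimit ksalary ksession kstaff introreg, vce(robust)

* a more conservative small-sample variant of the same idea
regress batting senhear uplimit ksalary ksession kstaff introreg, vce(hc3)
```

The coefficients in the robust output match the first regression exactly, because the estimator replaces only the variance formula. Stata's robust option also applies a degrees-of-freedom adjustment of n/(n-k), so its standard errors should be slightly larger than a direct evaluation of the formula above would give.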