
Heteroskedasticity

Assume that you have the regression Y_t = β₀ + β₁X_t + β₂Z_t + ε_t. If you run this
regression you will get the OLS residuals ε̂_t. We want to use the data and our residuals to
test whether there is a heteroskedasticity problem.




Question #1: What is heteroskedasticity?

Heteroskedasticity in the regression Y_t = β₀ + β₁X_t + β₂Z_t + ε_t means that the variance of
ε_t is not constant, but changes whenever some other variable changes. The other variable
may be an explanatory variable, X_t or Z_t. Or, the other variable may be time, t. There are
several ways to write this, but one way, using X_t and Z_t along with their squares, is

    Var(ε_t) = (α₀ + α₁·X_t + α₂·X_t² + α₃·Z_t + α₄·Z_t²)²


There are other functional relations possible. The point is that the variance of ε_t depends
on some other variable which is not constant.

Thus, heteroskedasticity means var(ε_t) ≠ σ², a constant.
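
To make the definition concrete, here is a minimal simulation sketch in Python. The numbers are illustrative and not from any dataset in this handout; the error standard deviation grows with X, so var(ε_t) is not constant:

import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(1.0, 10.0, size=10_000)
# Error standard deviation proportional to x, so var(eps) = (0.5*x)^2
eps = rng.normal(scale=0.5 * x)

# The sample variance of the errors differs sharply across ranges of x
print(eps[x < 3.0].var())   # roughly (0.5*2)^2, i.e. about 1
print(eps[x > 8.0].var())   # roughly (0.5*9)^2, i.e. about 20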


Question #2: How can we test if a regression has heteroskedasticity?

Gretl uses White's test, which does the following:

(1) Run the regression Y_t = β₀ + β₁X_t + β₂Z_t + ε_t.
(2) Obtain the residuals ε̂_t for t = 1,...,T.
(3) Run the auxiliary regression ε̂_t² = α₀ + α₁X_t + α₂X_t² + α₃Z_t + α₄Z_t² + v_t.
(4) Get the TR² statistic, which is distributed as χ² with 4 degrees of freedom.
(5) Test the hypothesis H₀: α₁ = α₂ = α₃ = α₄ = 0 using the χ² statistic with 4 df.
(6) If you reject H₀, then the regression in (1) above appears to have heteroskedasticity.
You will reject H₀ if TR² has a p-value smaller than 0.05. (A code sketch of these steps
is given below.)
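
The following is a minimal sketch of steps (1)-(6) in Python, using only numpy and scipy. The data are simulated and the variables y, x, and z are illustrative, not from this handout:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
T = 200
x = rng.normal(size=T)
z = rng.normal(size=T)
# Heteroskedastic errors: the variance grows with x**2
eps = rng.normal(size=T) * np.sqrt(1.0 + 2.0 * x**2)
y = 1.0 + 0.5 * x - 0.3 * z + eps

# Steps (1)-(2): run the original regression by OLS and keep the residuals
X = np.column_stack([np.ones(T), x, z])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Step (3): auxiliary regression of squared residuals on x, x^2, z, z^2
A = np.column_stack([np.ones(T), x, x**2, z, z**2])
alpha, *_ = np.linalg.lstsq(A, resid**2, rcond=None)
fitted = A @ alpha
ss_res = np.sum((resid**2 - fitted) ** 2)
ss_tot = np.sum((resid**2 - np.mean(resid**2)) ** 2)
r2 = 1.0 - ss_res / ss_tot

# Steps (4)-(6): TR^2 is asymptotically chi-square with 4 df under H0
tr2 = T * r2
p_value = stats.chi2.sf(tr2, df=4)
print(f"TR^2 = {tr2:.3f}, p-value = {p_value:.4f}")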


Question #3: If we use OLS to estimate the coefficients, will the estimation be good?

In general, heteroskedasticity is not a serious problem if we only wish to estimate and not
make inferences: the OLS coefficient estimates remain unbiased. Under heteroskedasticity,
however, the usual standard errors are biased, so the t-statistics are unreliable and it is
difficult to determine whether or not our coefficient estimates are statistically
significant. Forecast intervals from the regression will be similarly unreliable.

Question #4: How can we correct for heteroskedasticity?

Since each case of heteroskedasticity is somewhat different, there is no general rule or
method for correcting it. However, if X_t is related to the variance, then generally we can
simply transform the regression. For example, if the variance is inversely related to X_t,
then we can multiply both sides of the regression by X_t or its square root (depending on
the exact form of the relation). If the variance is positively related to X_t, then we can
consider dividing both sides of the regression by X_t or its square root. If the variance is
related to time, then we can do the same using time, t.
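
To see why such a transformation works, here is a short derivation for one case, assuming var(ε_t) = σ²·X_t (the variance is positively related to X_t, so we divide by √X_t):

$$
\operatorname{Var}\!\left(\frac{\varepsilon_t}{\sqrt{X_t}}\right)
  = \frac{1}{X_t}\operatorname{Var}(\varepsilon_t)
  = \frac{\sigma^2 X_t}{X_t}
  = \sigma^2 ,
$$

which is constant, so the transformed regression is homoskedastic.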


Example:

Suppose that we look at the relationship between the growth in real wages (gw) in
commerce and the overall unemployment rate (u) in Taiwan.

[Figure: scatter plot of gw against u with the fitted regression line]
This graph shows that there is an inverse relation between gw and u. This is as we would
expect: when unemployment is high, wages should grow slowly; when unemployment is low,
wages should grow quickly.

The graph also shows that there is considerable variation about the regression line when u
is small and much less variation when u is large. This is heteroskedasticity. In this case
it is not hard to see; it is an obvious problem. Sometimes heteroskedasticity is not so
obvious and can be well hidden in the residuals.




Regression Results:

OLS estimates using the 97 observations 1979:1-2003:1
Dependent variable: gw

VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)

0) const 0.108494 0.0154461 7.024 < 0.00001 ***
43) u -2.49586 0.606892 -4.113 0.000083 ***

Mean of dependent variable = 0.0507331
Standard deviation of dep. var. = 0.0683549
Sum of squared residuals = 0.380762
Standard error of residuals = 0.0633089
Unadjusted R-squared = 0.151126
Adjusted R-squared = 0.14219
Degrees of freedom = 95
Durbin-Watson statistic = 1.19691
First-order autocorrelation coeff. = 0.36463


We can write our estimated regression as:

    gw_t = 0.108494 − 2.49586·u_t
          (7.024)    (−4.113)

where the numbers in parentheses are t-statistics.

Note that we have almost 100 observations, and therefore our t-statistics will be
significant if |t| > 2. This is true for both the constant and the unemployment rate. The
Durbin-Watson statistic is quite low, which indicates that there may be autocorrelation in
the residuals. However, we will not worry about autocorrelation now. The R² statistic is
not very high, being only 0.151126. This means that the unemployment rate can only
explain about 15% of the variation in the growth of real wages. We should not be too
bothered by this result. There are many other factors which affect real wages, including
such things as the growth in productivity, changes in the composition of the labor force,
changes in the natural rate of unemployment, and changes in the regulation of industry.


We now turn to the potential problem of heteroskedasticity. We first look at a graph of
the residuals against the unemployment rate. This is shown below.

[Figure: OLS residuals plotted against the unemployment rate u]
Note how the residuals are scattered widely when u is small and are clustered together
when u is high. This is a clear case of heteroskedasticity. It means that the t-tests for
our regression above are probably biased and not good indicators of statistical
significance.









Gretl allows us to automatically test for heteroskedasticity. The test for heteroskedasticity
is given below.

White's test for heteroskedasticity
OLS estimates using the 97 observations 1979:1-2003:1
Dependent variable: uhat^2

VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)

0) const 0.0121861 0.00393831 3.094 0.002599 ***
43) u -0.535324 0.295556 -1.811 0.073297 *
45) sq_u 6.37272 4.77252 1.335 0.185005

Unadjusted R-squared = 0.0719755

Test statistic: TR^2 = 6.981625,
with p-value = P(Chi-square(2) > 6.981625) = 0.030476
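
For readers working outside Gretl, the same diagnostic is available in Python through statsmodels' het_white. The sketch below uses simulated stand-ins for gw and u; all numbers are illustrative, not the Taiwan data:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(1)
u = rng.uniform(0.01, 0.03, size=97)        # stand-in unemployment series
# Error variance inversely related to u, as hypothesized below
gw = 0.108 - 2.5 * u + rng.normal(scale=0.01 / np.sqrt(u))

X = sm.add_constant(u)                      # regressors: constant and u
res = sm.OLS(gw, X).fit()

# het_white regresses uhat^2 on the regressors, their squares, and cross
# products, and returns the TR^2 statistic with its p-value
lm_stat, lm_pval, f_stat, f_pval = het_white(res.resid, X)
print(f"TR^2 = {lm_stat:.4f}, p-value = {lm_pval:.4f}")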


Gretl automatically runs an auxiliary regression of ε̂_t² on a constant, u_t, and u_t². It
then computes the statistic TR², which is distributed as χ² with 2 degrees of freedom
(two, since the null hypothesis is that the coefficients on u_t and u_t² are both zero). The
p-value is seen to be 0.030476, which is less than 0.05. We therefore reject the hypothesis
that there is no heteroskedasticity. We conclude that there appears to be a
heteroskedasticity problem with our regression.

The next step is to try to determine a way to reduce this heteroskedasticity problem.

We note that in the auxiliary regression u significantly affects ε̂_t². Therefore, we
hypothesize that

    var(ε_t) = σ²/u_t.

It follows that

    var(η_t) = var(√u_t·ε_t) = u_t·var(ε_t) = σ².

Thus, if we multiply both sides of the original regression equation by √u_t, we should be
able to get a more stable variance in the new regression.








The new regression equation becomes

    gw_t·√u_t = β₀·√u_t + β₁·u_t^(3/2) + η_t


When we run this new regression using Gretl we get the following results.



OLS estimates using the 97 observations 1979:1-2003:1
Dependent variable: gw1

VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)

45) sqrtu 0.0945155 0.0140978 6.704 < 0.00001 ***
46) sqrt32u -1.99646 0.459607 -4.344 0.000035 ***

The model has no constant term.

Mean of dependent variable = 0.00664381
Standard deviation of dep. var. = 0.00887929
Sum of squared residuals = 0.00709129
Standard error of residuals = 0.00863974
Unadjusted R-squared = 0.0648689
Adjusted R-squared = 0.0550255
F-statistic (2, 95) = 31.8783 (p-value < 0.00001)
Durbin-Watson statistic = 1.24988
First-order autocorrelation coeff. = 0.348225
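
Continuing the illustrative Python sketch from above, the transformed regression can be run the same way. The names gw1, W, and the stand-in data are hypothetical; an equivalent alternative, weighted least squares with weights proportional to u, is shown at the end:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
u = rng.uniform(0.01, 0.03, size=97)          # stand-in unemployment series
gw = 0.108 - 2.5 * u + rng.normal(scale=0.01 / np.sqrt(u))

# Multiply both sides of gw = b0 + b1*u + eps by sqrt(u)
gw1 = gw * np.sqrt(u)                         # new dependent variable
W = np.column_stack([np.sqrt(u), u ** 1.5])   # sqrt(u) and u^(3/2); no constant
res_w = sm.OLS(gw1, W).fit()
print(res_w.params)                           # estimates of b0 and b1

# Equivalently, WLS on the original equation with weights proportional to u
# (statsmodels' weights are inverse variances, and var(eps) = sigma^2/u)
res_wls = sm.WLS(gw, sm.add_constant(u), weights=u).fit()
print(res_wls.params)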



Note that the estimated coefficients are not much different from those of the regression before.


The residuals from the new estimation are plotted below.

[Figure: residuals from the transformed regression]

The heteroskedasticity test used by Gretl is again run on our residuals and the result is

White's test for heteroskedasticity
OLS estimates using the 97 observations 1979:1-2003:1
Dependent variable: uhat^2

VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)

45) sqrtu 0.0117885 0.00693129 1.701 0.092329 *
46) sqrt32u 0.694944 0.444486 1.563 0.121337
48) sq_sqrtu -0.173376 0.109028 -1.590 0.115185
49) sq_sqrt3 -13.4967 8.52143 -1.584 0.116623

The model has no constant term.
F is calculated as in Sect. 4.4 of Ramanathan's Introductory Econometrics.
R-squared is the square of the correlation between the observed and fitted
values of the dependent variable.

Unadjusted R-squared = 0.0340014

Test statistic: TR^2 = 3.298139,
with p-value = P(Chi-square(3) > 3.298139) = 0.347902


This indicates that no heteroskedasticity of the type tested remains in the transformed regression.
