
Multicollinearity, Model Specification: Precision and Bias

Walter Sosa-Escudero

Econ 507. Econometric Analysis. Spring 2009
February 9, 2009

Outline:
  Introduction
  Multicollinearity and Micronumerosity
  Model Specification


The Classical Linear Model:

1. Linearity: Y = Xβ + u.
2. Strict exogeneity: E(u|X) = 0.
3. No multicollinearity: ρ(X) = K, w.p.1.
4. No heteroskedasticity / serial correlation: V(u|X) = σ²Iₙ.

Gauss/Markov: β̂ = (X'X)⁻¹X'Y is best linear unbiased.

This does not mean that β̂ is good. It is interesting to explore what makes it worse: less precise (higher variance) and more biased.


Multicollinearity, Micronumerosity and Imprecision

A crucial assumption is the no-multicollinearity assumption, ρ(X) = K, which guarantees that (X'X) is invertible, so the OLS problem has a unique solution.

Any violation of this assumption, that is, ρ(X) < K, will be referred to as exact multicollinearity; it eliminates the possibility of finding unique OLS estimates.

High multicollinearity is a rather contradictory notion where ρ(X) = K, but the correlation among variables, while not exact, is high. In such a case no classical assumption is violated, so the Gauss/Markov result holds.


The following result suggests why practitioners worry about high multicollinearity.

Result:

  V(β̂ⱼ) = σ² / [(1 − R²ⱼ) Sⱼⱼ]

where R²ⱼ is the R² coefficient of regressing Xⱼ on all other explanatory variables, and Sⱼⱼ = Σᵢ₌₁ⁿ (Xⱼᵢ − X̄ⱼ)².
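The result can be verified numerically. Below is a small Python/NumPy sketch (the simulated data and variable names are my own illustration, not from the slides) that computes the variance of one slope both from σ²(X'X)⁻¹ directly and from the formula above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=n)  # x2 correlated with x1
X = np.column_stack([np.ones(n), x1, x2])      # design with intercept
sigma2 = 1.0                                   # assumed error variance

# Direct route: V(beta_hat | X) = sigma^2 (X'X)^{-1}; take the x1 entry
V = sigma2 * np.linalg.inv(X.T @ X)
var_direct = V[1, 1]

# Formula route: sigma^2 / ((1 - Rj^2) * Sjj)
Z = np.column_stack([np.ones(n), x2])          # all other regressors
fitted = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
Sjj = np.sum((x1 - x1.mean()) ** 2)
Rj2 = 1 - np.sum((x1 - fitted) ** 2) / Sjj
var_formula = sigma2 / ((1 - Rj2) * Sjj)

print(var_direct, var_formula)                 # the two routes agree
```

The identity is exact (not asymptotic), so the two numbers match up to floating-point error.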


Proof: By the FWL theorem,

  β̂ⱼ = (Σᵢ₌₁ⁿ X̃ⱼᵢ Yᵢ) / (Σᵢ₌₁ⁿ X̃²ⱼᵢ)

and

  V(β̂ⱼ) = σ² / (Σᵢ₌₁ⁿ X̃²ⱼᵢ) = σ² / [ (Σᵢ₌₁ⁿ X̃²ⱼᵢ / Sⱼⱼ) · Sⱼⱼ ]

where X̃ⱼ ≡ MⱼXⱼ and Mⱼ is the matrix that takes residuals of the regression of Xⱼ on all other explanatory variables in the model.

The result follows by noting

  R²ⱼ = 1 − (Σᵢ₌₁ⁿ X̃²ⱼᵢ) / Sⱼⱼ = 1 − (Σᵢ₌₁ⁿ X̃²ⱼᵢ) / Σᵢ₌₁ⁿ (Xⱼᵢ − X̄ⱼ)²
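The FWL step used above can be checked numerically. This Python/NumPy sketch (made-up data; names are illustrative) confirms that the slope from the full regression equals the ratio built from the residualized regressor:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Full regression of y on a constant, x1 and x2
X = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Residualize x1 on the remaining regressors (constant and x2)
Z = np.column_stack([np.ones(n), x2])
x1_tilde = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

# FWL: the slope on x1 equals sum(x1_tilde * y) / sum(x1_tilde^2)
beta_fwl = np.sum(x1_tilde * y) / np.sum(x1_tilde ** 2)
print(beta_full[1], beta_fwl)   # identical up to rounding
```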


Factors affecting V(β̂ⱼ)

Go back to our result:

  V(β̂ⱼ) = σ² / [(1 − R²ⱼ) Sⱼⱼ] = (σ²/n) · 1 / [(1 − R²ⱼ)(Sⱼⱼ/n)]

Later on we will see that Sⱼⱼ/n should be a rather stable magnitude. So there are three main factors that contribute to the variance:

1. σ², the error variance.
2. n, the sample size.
3. R²ⱼ, the correlation between Xⱼ and all other variables.

It is important to note that high multicollinearity affects the variance in the same manner as a small number of observations (micronumerosity).

It is interesting to remark that under high multicollinearity there might be situations with really low t significance statistics together with a high R² and a highly significant global F statistic.

- We have already explored that high multicollinearity induces high variance, and hence is compatible with low t's.
- R² is related to the distance between Y and the span of X, which does not depend on the degree of correlation among its components.
- Check carefully what significance of the t's means and what global significance of the F means.
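This pattern is easy to reproduce by simulation. The following Python/NumPy sketch (my own data-generating process, chosen to be nearly collinear) produces slopes with huge standard errors while R² and F remain large:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)       # nearly collinear with x1
y = x1 + x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
k = X.shape[1]
s2 = resid @ resid / (n - k)                   # estimate of sigma^2
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

tss = np.sum((y - y.mean()) ** 2)
R2 = 1 - resid @ resid / tss
F = (R2 / (k - 1)) / ((1 - R2) / (n - k))      # global significance

# Slope standard errors are enormous, yet R2 and F are very large
print(se[1:], R2, F)
```

Individually each slope is poorly determined (low t), but jointly the regressors explain Y extremely well, which is exactly the coexistence described above.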


Model a) High multicollinearity

cor(x,y) = 0.998983

            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.04171    0.04426   0.943    0.348
y            0.57840    0.83608   0.692    0.491
x            1.33508    0.83893   1.591    0.115

Residual standard error: 0.4415 on 97 degrees of freedom
Multiple R-squared: 0.9635, Adjusted R-squared: 0.9628
F-statistic: 1282 on 2 and 97 DF, p-value: < 2.2e-16

Model b) Low multicollinearity

cor(x,y1) = 0.4047114

            Estimate   Std. Error t value Pr(>|t|)
(Intercept) -0.0009127  0.0465794   -0.02    0.984
y1           0.9773821  0.0220314   44.36   <2e-16 ***
x            1.0398014  0.0436223   23.84   <2e-16 ***

Residual standard error: 0.4655 on 97 degrees of freedom
Multiple R-squared: 0.9766, Adjusted R-squared: 0.9762
F-statistic: 2028 on 2 and 97 DF, p-value: < 2.2e-16

Specification errors, bias and imprecision

So far we have assumed that our linear model Y = Xβ + u is correct.

Consider the following case:

  Y = X₁β₁ + X₂β₂ + u

where all classical assumptions hold and K₁ and K₂ are the numbers of columns of X₁ and X₂. Trivially, our original model corresponds to X = [X₁ X₂], with K = K₁ + K₂.


Consider the following scenarios regarding β₂ and the corresponding estimation strategies:

- Omission of relevant variables: β₂ ≠ 0, but we wrongly proceed as if β₂ = 0, that is, we regress Y on X₁ only.
- Inclusion of irrelevant variables: β₂ = 0, but we wrongly proceed as if β₂ might be ≠ 0, that is, we regress Y on X₁ and X₂ when we could have ignored X₂.


Biases

Let us compare results for the estimation of β₁ in the two scenarios.

I) Omission of relevant variables

First note that in this case

  Y = X₁β₁ + u*

with u* = X₂β₂ + u. Let β̃₁ = (X₁'X₁)⁻¹X₁'Y.

It is easy to see that β̃₁ will be biased unless E(X₂|X₁) = 0. This is a really important result: not all omissions lead to biases.
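A small Monte Carlo illustrates both halves of this claim. In this Python/NumPy sketch (coefficients and the dependence of X₂ on X₁ are invented for illustration), the short regression is badly biased when E(X₂|X₁) ≠ 0 and essentially unbiased when X₂ is independent of X₁:

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 200, 500
beta1, beta2 = 1.0, 2.0            # illustrative true coefficients

def mean_short_estimate(delta):
    """Average short-regression slope over many samples.
    delta controls how strongly X2 depends on X1."""
    draws = []
    for _ in range(reps):
        x1 = rng.normal(size=n)
        x2 = delta * x1 + rng.normal(size=n)
        y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
        draws.append(np.sum(x1 * y) / np.sum(x1 ** 2))  # regress Y on X1 only
    return np.mean(draws)

biased = mean_short_estimate(delta=0.8)   # E(X2|X1) = 0.8*X1, not zero
clean = mean_short_estimate(delta=0.0)    # E(X2|X1) = 0

# Roughly beta1 + 0.8*beta2 = 2.6 in the first case, beta1 = 1.0 in the second
print(biased, clean)
```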


II) Inclusion of Irrelevant Variables

In this case we would estimate β₁ jointly with β₂ by regressing Y on X₁ and X₂, that is, β̂₁ is a subvector of

  β̂ = (β̂₁' β̂₂')' = (X'X)⁻¹X'Y

It is important to see that the classical assumptions hold under this strategy, and hence β̂₁ will be unbiased. Why?


Variances

Let us compute the bias of β̃₁ explicitly:

  β̃₁ = (X₁'X₁)⁻¹X₁'Y
     = (X₁'X₁)⁻¹X₁'(X₁β₁ + X₂β₂ + u)

  E(β̃₁|X₁) = β₁ + (X₁'X₁)⁻¹X₁'E(X₂|X₁)β₂

where the second term is the bias. From here, it is easy to check that

  V(β̃₁|X) = σ²(X₁'X₁)⁻¹

Using the FWL theorem,

  V(β̂₁|X) = σ²(X₁'M₂X₁)⁻¹

with M₂ = I − X₂(X₂'X₂)⁻¹X₂'.



Now: V(β̂₁|X) − V(β̃₁|X) = σ²[(X₁'M₂X₁)⁻¹ − (X₁'X₁)⁻¹]

Aside: if A − B is psd, then B⁻¹ − A⁻¹ is psd (Greene (2000, p. 49)).

Note: X₁'X₁ − X₁'M₂X₁ = X₁'(I − M₂)X₁ = X₁'P₂X₁.

Since P₂ is symmetric and idempotent, for every c ∈ ℝ^{K₁}:

  c'X₁'P₂X₁c = c'X₁'P₂P₂'X₁c = z'z ≥ 0

for z ≡ P₂'X₁c, so by the previous result V(β̂₁|X) − V(β̃₁|X) is positive semidefinite.

In words: the estimator that omits X₂ has smaller variance.
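The psd comparison can be confirmed on simulated data. This Python/NumPy sketch (arbitrary design of my own) builds both variance matrices and checks that all eigenvalues of their difference are nonnegative:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=n)
X1 = np.column_stack([np.ones(n), x])                 # retained regressors
X2 = (0.7 * x + rng.normal(size=n)).reshape(-1, 1)    # the extra regressor
sigma2 = 1.0                                          # assumed error variance

# M2 residualizes with respect to X2
M2 = np.eye(n) - X2 @ np.linalg.inv(X2.T @ X2) @ X2.T

V_short = sigma2 * np.linalg.inv(X1.T @ X1)       # estimator omitting X2
V_long = sigma2 * np.linalg.inv(X1.T @ M2 @ X1)   # estimator including X2

# The difference V_long - V_short should be positive semidefinite
eigs = np.linalg.eigvalsh(V_long - V_short)
print(eigs)
```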


Bias-variance trade-off

To summarize:

- In practice we do not know which model holds (the large one or the small one).
- The trade-off: estimating a small model (omitting variables) implies a gain in precision and a likely bias. A large model is less likely to be biased but will be less efficient.
- Variable omission does not necessarily lead to bias.


Omitted Variable Bias: an example

Computer-generated data, based on Appleton, French and Vanderpump ("Ignoring a Covariate: an Example of Simpson's Paradox", The American Statistician, 50, 4, 1996).

Y = risk of death.
SMOKE = consumption of cigarettes.
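The mechanism behind this example can be reproduced with a Python/NumPy sketch. The data-generating process below is my own invention in the spirit of the example (not the authors' data): smoking is higher among the young, and both smoking and age raise the risk, so omitting age flips the sign of the smoking coefficient:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 100
age = rng.uniform(20, 80, size=n)
smoke = 60 - 0.5 * age + rng.normal(scale=3, size=n)   # younger people smoke more
y = 10 + 1.0 * smoke + 1.0 * age + rng.normal(size=n)  # both raise the risk

def ols(y, *cols):
    """OLS with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_short = ols(y, smoke)        # omit age: smoking looks protective
b_long = ols(y, smoke, age)    # include age: the true positive effect appears

print(b_short[1], b_long[1])   # the smoking slope changes sign
```

Because E(SMOKE · age-effect) is strongly negative here, the short regression loads age's contribution onto the smoking coefficient with the wrong sign, exactly the Simpson's-paradox pattern in the Stata output that follows.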


. reg y smoke

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) =  194.34
       Model |  7613.25147     1  7613.25147           Prob > F      =  0.0000
    Residual |  3839.18734    98  39.1753811           R-squared     =  0.6648
-------------+------------------------------           Adj R-squared =  0.6614
       Total |  11452.4388    99    115.6812           Root MSE      =   6.259

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       smoke |  -1.819348   .1305081   -13.94   0.000    -2.078337   -1.560359
       _cons |   158.5975   4.774249    33.22   0.000     149.1231    168.0718
------------------------------------------------------------------------------


. reg y smoke age

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  2,    97) = 5424.58
       Model |  11350.9524     2  5675.47622           Prob > F      =  0.0000
    Residual |  101.486373    97  1.04625126           R-squared     =  0.9911
-------------+------------------------------           Adj R-squared =  0.9910
       Total |  11452.4388    99    115.6812           Root MSE      =  1.0229

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       smoke |   .9431267    .050902    18.53   0.000     .8421004    1.044153
         age |   .9804631   .0164039    59.77   0.000     .9479059     1.01302
       _cons |   12.84084   2.560392     5.02   0.000     7.759169    17.92251
------------------------------------------------------------------------------

. cor y smoke age
(obs=100)

             |        y    smoke      age
-------------+---------------------------
           y |   1.0000
       smoke |  -0.8153   1.0000
         age |   0.9797  -0.9080   1.0000
