
Multicollinearity, Model Specification: Precision and Bias

Walter Sosa-Escudero

Econ 507. Econometric Analysis. Spring 2009
February 9, 2009

Outline:
  Introduction
  Multicollinearity and Micronumerosity
  Model Specification


The Classical Linear Model:

1. Linearity: Y = Xβ + u.
2. Strict exogeneity: E(u|X) = 0.
3. No multicollinearity: ρ(X) = K, w.p.1.
4. No heteroskedasticity / serial correlation: V(u|X) = σ²Iₙ.

Gauss/Markov: β̂ = (X'X)⁻¹X'Y is best linear unbiased.

This does not mean that β̂ is good. It is interesting to explore what makes it worse: less precise (higher variance) and more biased.


Multicollinearity, Micronumerosity and Imprecision

A crucial assumption is the no-multicollinearity assumption, ρ(X) = K, which guarantees that (X'X) is invertible, so the OLS problem has a unique solution.

Any violation of this assumption, that is, ρ(X) < K, will be referred to as exact multicollinearity; it eliminates the possibility of finding unique OLS estimates.

High multicollinearity is a rather contradictory notion where ρ(X) = K, but the correlation among variables, while not exact, is high. In such a case no classical assumption is violated, so the Gauss/Markov result holds.


The following result suggests why practitioners worry about high multicollinearity.

Result:

  V(β̂ⱼ) = σ² / [(1 − R²ⱼ) Sⱼⱼ]

where R²ⱼ is the R² coefficient of regressing Xⱼ on all other explanatory variables, and Sⱼⱼ = Σᵢ₌₁ⁿ (Xⱼᵢ − X̄ⱼ)².
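The result can be verified numerically. Below is a small Python/NumPy sketch (the simulated data and variable names are my own illustration, not from the slides) that computes the variance of one slope both from σ²(X'X)⁻¹ directly and from the formula above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=n)  # x2 correlated with x1
X = np.column_stack([np.ones(n), x1, x2])      # design with intercept
sigma2 = 1.0                                   # assumed error variance

# Direct route: V(beta_hat | X) = sigma^2 (X'X)^{-1}; take the x1 entry
V = sigma2 * np.linalg.inv(X.T @ X)
var_direct = V[1, 1]

# Formula route: sigma^2 / ((1 - Rj^2) * Sjj)
Z = np.column_stack([np.ones(n), x2])          # all other regressors
fitted = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
Sjj = np.sum((x1 - x1.mean()) ** 2)
Rj2 = 1 - np.sum((x1 - fitted) ** 2) / Sjj
var_formula = sigma2 / ((1 - Rj2) * Sjj)

print(var_direct, var_formula)                 # the two routes agree
```

The identity is exact (not asymptotic), so the two numbers match up to floating-point error.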


Proof: By the FWL theorem,

  β̂ⱼ = (Σᵢ₌₁ⁿ X̃ⱼᵢ Yᵢ) / (Σᵢ₌₁ⁿ X̃²ⱼᵢ)

and

  V(β̂ⱼ) = σ² / (Σᵢ₌₁ⁿ X̃²ⱼᵢ) = σ² / [ (Σᵢ₌₁ⁿ X̃²ⱼᵢ / Sⱼⱼ) · Sⱼⱼ ]

where X̃ⱼ ≡ MⱼXⱼ and Mⱼ is the matrix that takes residuals of the regression of Xⱼ on all other explanatory variables in the model.

The result follows by noting

  R²ⱼ = 1 − (Σᵢ₌₁ⁿ X̃²ⱼᵢ) / Sⱼⱼ = 1 − (Σᵢ₌₁ⁿ X̃²ⱼᵢ) / Σᵢ₌₁ⁿ (Xⱼᵢ − X̄ⱼ)²
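The FWL step used above can be checked numerically. This Python/NumPy sketch (made-up data; names are illustrative) confirms that the slope from the full regression equals the ratio built from the residualized regressor:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Full regression of y on a constant, x1 and x2
X = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Residualize x1 on the remaining regressors (constant and x2)
Z = np.column_stack([np.ones(n), x2])
x1_tilde = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

# FWL: the slope on x1 equals sum(x1_tilde * y) / sum(x1_tilde^2)
beta_fwl = np.sum(x1_tilde * y) / np.sum(x1_tilde ** 2)
print(beta_full[1], beta_fwl)   # identical up to rounding
```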


Factors affecting V(β̂ⱼ)

Go back to our result:

  V(β̂ⱼ) = σ² / [(1 − R²ⱼ) Sⱼⱼ] = (σ²/n) · 1 / [(1 − R²ⱼ)(Sⱼⱼ/n)]

Later on we will see that Sⱼⱼ/n should be a rather stable magnitude. So there are three main factors that contribute to the variance:

1. σ², the error variance.
2. n, the sample size.
3. R²ⱼ, the correlation between Xⱼ and all other variables.

It is important to note that high multicollinearity affects the variance in the same manner as a small number of observations (micronumerosity).

It is interesting to remark that under high multicollinearity there might be situations with really low t significance statistics together with a high R² and a highly significant global F statistic.

- We have already explored that high multicollinearity induces high variance, and hence is compatible with low t's.
- R² is related to the distance between Y and the span of X, which does not depend on the degree of correlation among its components.
- Check carefully what significance of the t's means and what global significance of the F means.
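This pattern is easy to reproduce by simulation. The following Python/NumPy sketch (my own data-generating process, chosen to be nearly collinear) produces slopes with huge standard errors while R² and F remain large:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)       # nearly collinear with x1
y = x1 + x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
k = X.shape[1]
s2 = resid @ resid / (n - k)                   # estimate of sigma^2
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

tss = np.sum((y - y.mean()) ** 2)
R2 = 1 - resid @ resid / tss
F = (R2 / (k - 1)) / ((1 - R2) / (n - k))      # global significance

# Slope standard errors are enormous, yet R2 and F are very large
print(se[1:], R2, F)
```

Individually each slope is poorly determined (low t), but jointly the regressors explain Y extremely well, which is exactly the coexistence described above.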


Model a) High multicollinearity

cor(x,y) = 0.998983

            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.04171    0.04426   0.943    0.348
y            0.57840    0.83608   0.692    0.491
x            1.33508    0.83893   1.591    0.115

Residual standard error: 0.4415 on 97 degrees of freedom
Multiple R-squared: 0.9635, Adjusted R-squared: 0.9628
F-statistic: 1282 on 2 and 97 DF, p-value: < 2.2e-16

Model b) Low multicollinearity

cor(x,y1) = 0.4047114

            Estimate   Std. Error t value Pr(>|t|)
(Intercept) -0.0009127  0.0465794   -0.02    0.984
y1           0.9773821  0.0220314   44.36   <2e-16 ***
x            1.0398014  0.0436223   23.84   <2e-16 ***

Residual standard error: 0.4655 on 97 degrees of freedom
Multiple R-squared: 0.9766, Adjusted R-squared: 0.9762
F-statistic: 2028 on 2 and 97 DF, p-value: < 2.2e-16

Specification errors, bias and imprecision

So far we have assumed that our linear model Y = Xβ + u is correct.

Consider the following case:

  Y = X₁β₁ + X₂β₂ + u

where all classical assumptions hold and K₁ and K₂ are the numbers of columns of X₁ and X₂. Trivially, our original model corresponds to X = [X₁ X₂], with K = K₁ + K₂.


Consider the following scenarios regarding β₂ and the corresponding estimation strategies:

- Omission of relevant variables: β₂ ≠ 0, but we wrongly proceed as if β₂ = 0, that is, we regress Y on X₁ only.
- Inclusion of irrelevant variables: β₂ = 0, but we wrongly proceed as if β₂ might be ≠ 0, that is, we regress Y on X₁ and X₂ when we could have ignored X₂.


Biases

Let us compare results for the estimation of β₁ in the two scenarios.

I) Omission of relevant variables

First note that in this case

  Y = X₁β₁ + u*

with u* = X₂β₂ + u. Let β̃₁ = (X₁'X₁)⁻¹X₁'Y.

It is easy to see that β̃₁ will be biased unless E(X₂|X₁) = 0. This is a really important result: not all omissions lead to biases.
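A small Monte Carlo illustrates both halves of this claim. In this Python/NumPy sketch (coefficients and the dependence of X₂ on X₁ are invented for illustration), the short regression is badly biased when E(X₂|X₁) ≠ 0 and essentially unbiased when X₂ is independent of X₁:

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 200, 500
beta1, beta2 = 1.0, 2.0            # illustrative true coefficients

def mean_short_estimate(delta):
    """Average short-regression slope over many samples.
    delta controls how strongly X2 depends on X1."""
    draws = []
    for _ in range(reps):
        x1 = rng.normal(size=n)
        x2 = delta * x1 + rng.normal(size=n)
        y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
        draws.append(np.sum(x1 * y) / np.sum(x1 ** 2))  # regress Y on X1 only
    return np.mean(draws)

biased = mean_short_estimate(delta=0.8)   # E(X2|X1) = 0.8*X1, not zero
clean = mean_short_estimate(delta=0.0)    # E(X2|X1) = 0

# Roughly beta1 + 0.8*beta2 = 2.6 in the first case, beta1 = 1.0 in the second
print(biased, clean)
```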


II) Inclusion of Irrelevant Variables

In this case we would estimate β₁ jointly with β₂ by regressing Y on X₁ and X₂, that is, β̂₁ is a subvector of

  β̂ = (β̂₁' β̂₂')' = (X'X)⁻¹X'Y

It is important to see that the classical assumptions hold under this strategy, and hence β̂₁ will be unbiased. Why?


Variances

Let us compute the bias of β̃₁ explicitly:

  β̃₁ = (X₁'X₁)⁻¹X₁'Y
     = (X₁'X₁)⁻¹X₁'(X₁β₁ + X₂β₂ + u)

  E(β̃₁|X₁) = β₁ + (X₁'X₁)⁻¹X₁'E(X₂|X₁)β₂

where the second term is the bias. From here, it is easy to check that

  V(β̃₁|X) = σ²(X₁'X₁)⁻¹

Using the FWL theorem,

  V(β̂₁|X) = σ²(X₁'M₂X₁)⁻¹

with M₂ = I − X₂(X₂'X₂)⁻¹X₂'.



Now: V(β̂₁|X) − V(β̃₁|X) = σ²[(X₁'M₂X₁)⁻¹ − (X₁'X₁)⁻¹]

Aside: if A − B is psd, then B⁻¹ − A⁻¹ is psd (Greene (2000, p. 49)).

Note: X₁'X₁ − X₁'M₂X₁ = X₁'(I − M₂)X₁ = X₁'P₂X₁.

Since P₂ is symmetric and idempotent, for every c ∈ ℝ^{K₁}:

  c'X₁'P₂X₁c = c'X₁'P₂P₂'X₁c = z'z ≥ 0

for z ≡ P₂'X₁c, so by the previous result V(β̂₁|X) − V(β̃₁|X) is positive semidefinite.

In words: the estimator that omits X₂ has smaller variance.
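The psd comparison can be confirmed on simulated data. This Python/NumPy sketch (arbitrary design of my own) builds both variance matrices and checks that all eigenvalues of their difference are nonnegative:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=n)
X1 = np.column_stack([np.ones(n), x])                 # retained regressors
X2 = (0.7 * x + rng.normal(size=n)).reshape(-1, 1)    # the extra regressor
sigma2 = 1.0                                          # assumed error variance

# M2 residualizes with respect to X2
M2 = np.eye(n) - X2 @ np.linalg.inv(X2.T @ X2) @ X2.T

V_short = sigma2 * np.linalg.inv(X1.T @ X1)       # estimator omitting X2
V_long = sigma2 * np.linalg.inv(X1.T @ M2 @ X1)   # estimator including X2

# The difference V_long - V_short should be positive semidefinite
eigs = np.linalg.eigvalsh(V_long - V_short)
print(eigs)
```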


Bias-variance trade-off

To summarize:

- In practice we do not know which model holds (the large one or the small one).
- The trade-off: estimating a small model (omitting variables) implies a gain in precision and a likely bias. A large model is less likely to be biased but will be less efficient.
- Variable omission does not necessarily lead to bias.


Omitted Variable Bias: an example

Computer-generated data, based on Appleton, French and Vanderpump ("Ignoring a Covariate: an Example of Simpson's Paradox", The American Statistician, 50, 4, 1996).

Y = risk of death.
SMOKE = consumption of cigarettes.
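The mechanism behind this example can be reproduced with a Python/NumPy sketch. The data-generating process below is my own invention in the spirit of the example (not the authors' data): smoking is higher among the young, and both smoking and age raise the risk, so omitting age flips the sign of the smoking coefficient:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 100
age = rng.uniform(20, 80, size=n)
smoke = 60 - 0.5 * age + rng.normal(scale=3, size=n)   # younger people smoke more
y = 10 + 1.0 * smoke + 1.0 * age + rng.normal(size=n)  # both raise the risk

def ols(y, *cols):
    """OLS with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_short = ols(y, smoke)        # omit age: smoking looks protective
b_long = ols(y, smoke, age)    # include age: the true positive effect appears

print(b_short[1], b_long[1])   # the smoking slope changes sign
```

Because E(SMOKE · age-effect) is strongly negative here, the short regression loads age's contribution onto the smoking coefficient with the wrong sign, exactly the Simpson's-paradox pattern in the Stata output that follows.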


. reg y smoke

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) =  194.34
       Model |  7613.25147     1  7613.25147           Prob > F      =  0.0000
    Residual |  3839.18734    98  39.1753811           R-squared     =  0.6648
-------------+------------------------------           Adj R-squared =  0.6614
       Total |  11452.4388    99    115.6812           Root MSE      =   6.259

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       smoke |  -1.819348   .1305081   -13.94   0.000    -2.078337   -1.560359
       _cons |   158.5975   4.774249    33.22   0.000     149.1231    168.0718
------------------------------------------------------------------------------


. reg y smoke age

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  2,    97) = 5424.58
       Model |  11350.9524     2  5675.47622           Prob > F      =  0.0000
    Residual |  101.486373    97  1.04625126           R-squared     =  0.9911
-------------+------------------------------           Adj R-squared =  0.9910
       Total |  11452.4388    99    115.6812           Root MSE      =  1.0229

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       smoke |   .9431267    .050902    18.53   0.000     .8421004    1.044153
         age |   .9804631   .0164039    59.77   0.000     .9479059     1.01302
       _cons |   12.84084   2.560392     5.02   0.000     7.759169    17.92251
------------------------------------------------------------------------------

. cor y smoke age
(obs=100)

             |        y    smoke      age
-------------+---------------------------
           y |   1.0000
       smoke |  -0.8153   1.0000
         age |   0.9797  -0.9080   1.0000
