
Hausman test

1. First read the material on bias and efficiency on the web page.

2. There may be good reasons to suspect a failure of the zero-conditional-mean assumption. Turning to
IV or Generalized Method of Moments (GMM) for the sake of consistency must be balanced against
the inevitable loss of efficiency. The asymptotic variance of the IV estimator is always larger, and
sometimes much larger, than that of the OLS estimator.
3. GMM can have very poor small sample properties and if we are working with a small sample, it is
probably better to work with OLS instead.
4. Where OLS is inconsistent, the loss of efficiency is worth it and can be made up for, in part, by getting a larger sample.
5. The Hausman test is essentially a test of whether the loss in efficiency is worth removing the bias and
inconsistency of the OLS estimators.

6. The question is whether we can just treat the endogenous variables as if they were exogenous and not
worry about the correlation with the error.
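
The trade-off described in the points above can be illustrated with a small simulation. This is only a sketch under assumed conditions: the data-generating process, the coefficient values, and the function name simulate are hypothetical and chosen for illustration, not taken from the notes. It draws repeated samples, computes the one-regressor OLS and IV estimates, and reports their means and standard deviations with and without correlation between x and the error.

```python
import random
import statistics


def simulate(n, rho, beta=1.0, reps=500, seed=0):
    """Return lists of OLS and IV slope estimates over `reps` samples.

    Hypothetical data-generating process (an assumption for illustration):
        z, u, e are independent standard normal draws
        x = 0.5*z + rho*u + e   (x is endogenous whenever rho != 0)
        y = beta*x + u
    All variables have mean zero, so no intercept is fitted.
    """
    rng = random.Random(seed)
    ols, iv = [], []
    for _ in range(reps):
        sxx = sxy = szx = szy = 0.0
        for _ in range(n):
            z = rng.gauss(0.0, 1.0)
            u = rng.gauss(0.0, 1.0)
            x = 0.5 * z + rho * u + rng.gauss(0.0, 1.0)
            y = beta * x + u
            sxx += x * x
            sxy += x * y
            szx += z * x
            szy += z * y
        ols.append(sxy / sxx)  # OLS: sum(xy) / sum(x^2)
        iv.append(szy / szx)   # IV:  sum(zy) / sum(zx)
    return ols, iv


for rho in (0.0, 0.8):
    ols, iv = simulate(n=200, rho=rho)
    print(f"rho={rho}: "
          f"OLS mean={statistics.mean(ols):.3f} sd={statistics.stdev(ols):.3f}, "
          f"IV mean={statistics.mean(iv):.3f} sd={statistics.stdev(iv):.3f}")
```

With rho = 0 both estimators should centre on the true slope, with IV the more dispersed of the two. With rho = 0.8 the OLS estimates should settle away from the true slope while the IV estimates stay centred on it, at the cost of a wider spread.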

Test statistic
1. The Hausman test involves fitting the model by both IV and OLS and computing a weighted square of
the difference between the two β estimators.
2. The weight is simply the inverse of the difference in the variances between the consistent (IV)
estimator and the efficient (OLS) estimator.
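3. The test is derived as follows. For the one-regressor model $y = \beta x + u$, with variables measured as deviations from their means, the OLS estimator is

$$\hat\beta_{OLS} = \frac{\sum xy}{\sum x^2}$$

$$\hat\beta_{OLS} = \beta + \frac{\sum xu}{\sum x^2}$$

$$\operatorname{var}(\hat\beta_{OLS}) = \sigma^2 \frac{\sum x^2}{\left(\sum x^2\right)^2} = \frac{\sigma^2}{\sum x^2}$$

where normally $\sum xu$ is zero in expectation. But if $\hat\beta_{OLS}$ is biased, then it is not. For IV with instrument $z$,

$$\hat\beta_{IV} = \frac{\sum zy}{\sum zx}$$

$$\hat\beta_{IV} = \beta + \frac{\sum zu}{\sum zx}$$

$$\operatorname{var}(\hat\beta_{IV}) = \sigma^2 \frac{\sum z^2}{\left(\sum xz\right)^2}$$

Next define

$$\hat q = \hat\beta_{IV} - \hat\beta_{OLS}$$

The difference in the variances, with $\sigma^2$ estimated by $s^2$, is

$$\operatorname{var}(\hat\beta_{IV}) - \operatorname{var}(\hat\beta_{OLS}) = \sigma^2 \frac{\sum z^2}{\left(\sum xz\right)^2} - \frac{\sigma^2}{\sum x^2} = \frac{s^2}{\sum x^2}\left[\frac{\sum z^2 \sum x^2}{\left(\sum xz\right)^2} - 1\right]$$

Now the squared correlation coefficient between $x$ and $z$ is

$$r_{xz}^2 = \frac{\left(\sum xz\right)^2}{\sum z^2 \sum x^2}$$

so that

$$\operatorname{var}(\hat\beta_{IV}) - \operatorname{var}(\hat\beta_{OLS}) = \frac{s^2}{\sum x^2}\left[\frac{1}{r_{xz}^2} - 1\right]$$

which is nonnegative because $r_{xz}^2 \le 1$. The test statistic is

$$\frac{\hat q^{\,2}}{\operatorname{var}(\hat\beta_{IV}) - \operatorname{var}(\hat\beta_{OLS})} = \frac{\left(\hat\beta_{IV} - \hat\beta_{OLS}\right)^2}{\operatorname{var}(\hat\beta_{IV}) - \operatorname{var}(\hat\beta_{OLS})}$$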
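4. The matrix algebra view of this is

$$\chi^2 = (\hat\beta_c - \hat\beta_e)' \left[\operatorname{var}(\hat\beta_c) - \operatorname{var}(\hat\beta_e)\right]^{-1} (\hat\beta_c - \hat\beta_e)$$

where $\hat\beta_c$ is the (consistent) IV estimate and $\hat\beta_e$ is the (efficient) OLS estimate.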

5. The test statistic is distributed as $\chi^2$ with $k_{endog}$ degrees of freedom, the number of endogenous regressors.
6. The test is perhaps best interpreted not as a test of endogeneity (which is how it is typically presented) but rather
as a test of whether two estimators differ.
7. Under the null, OLS is the appropriate estimation technique.

8. There are three ways to conduct the test in Stata.

(a) Fit the less efficient but consistent (IV) model. Then use the command estimates store iv. Next fit
the OLS model using regress, followed by hausman iv, constant sigmamore.
(b) Fit the OLS model using ivreg2 and specify the regressors to be tested in the orthog() option.
(c) Fit IV model using ivreg and use ivendog to conduct an endogeneity test. The ivendog
command takes as arguments the variables you want to test for endogeneity. If the variable list
is empty, the full set of endogenous regressors is tested.
9. Methods (b) and (c) are easier to use since they involve one command, but you have to have ivreg2
installed on your computer (use findit ivreg2).

10. Baum (2006) points out that the hausman command often fails, producing a negative $\chi^2$ statistic,
and that these tests are not fully consistent with each other and can give conflicting results.
11. See help hausman in Stata. From the Stata help file:
The assumption that one of the estimators is efficient (i.e., has minimal asymptotic variance)
is a demanding one. It is violated, for instance, if your observations are clustered or
pre-weighted, or if your model is somehow misspecified. Moreover, even if the assumption
is satisfied, there may be a "small sample" problem with the Hausman test [due to the
lack of positive definiteness "in finite samples",] i.e., in your application. If this is the case,
the Hausman test is undefined. Unfortunately, this is not a rare event. Stata supports a
generalized Hausman test, suest, that overcomes both of these problems.

12. Note that Baum reports that this command (suest) does not support the ivreg estimator.
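
The scalar test statistic from item 3 of this list can be sketched in a few lines. This is an illustration under stated assumptions, not a substitute for the Stata commands: the function names hausman_scalar and make_data and the simulated data are hypothetical. Both variances use a single error-variance estimate taken from the OLS residuals, in the spirit of the sigmamore option, which keeps the denominator nonnegative. The value 3.84 is the 5% critical value of a $\chi^2$ distribution with one degree of freedom.

```python
import random


def demean(v):
    """Return v measured as deviations from its sample mean."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def hausman_scalar(x, z, y):
    """Hausman statistic for y = beta*x + u with a single instrument z."""
    x, z, y = demean(x), demean(z), demean(y)
    n = len(y)
    sxx = sum(a * a for a in x)
    szz = sum(a * a for a in z)
    sxz = sum(a * b for a, b in zip(x, z))
    sxy = sum(a * b for a, b in zip(x, y))
    szy = sum(a * b for a, b in zip(z, y))
    b_ols = sxy / sxx
    b_iv = szy / sxz
    # One error-variance estimate (from OLS residuals) for both variances.
    # n - 2 allows for the estimated mean and the slope.
    s2 = sum((yi - b_ols * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    var_ols = s2 / sxx
    var_iv = s2 * szz / sxz ** 2
    stat = (b_iv - b_ols) ** 2 / (var_iv - var_ols)
    return b_ols, b_iv, stat


def make_data(n, rho, seed, beta=1.0):
    """Hypothetical data: x is correlated with the error u when rho != 0."""
    rng = random.Random(seed)
    x, z, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)
        ui = rng.gauss(0.0, 1.0)
        xi = 0.5 * zi + rho * ui + rng.gauss(0.0, 1.0)
        x.append(xi)
        z.append(zi)
        y.append(beta * xi + ui)
    return x, z, y


for rho in (0.0, 0.8):
    b_ols, b_iv, stat = hausman_scalar(*make_data(n=500, rho=rho, seed=1))
    verdict = "reject OLS" if stat > 3.84 else "do not reject OLS"
    print(f"rho={rho}: OLS={b_ols:.3f} IV={b_iv:.3f} chi2={stat:.2f} ({verdict})")
```

If the two variances were instead computed from separate error-variance estimates, the denominator could turn negative in a finite sample, which is the failure Baum describes.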

Field Technique
1. Similar to a test for measurement error

2. If there is no simultaneity, then OLS should generate efficient and consistent parameter estimates.
3. IV is consistent but inefficient.
4. Consider the supply and demand (SD) example again
q = αp + u
where q is quantity supplied. Income and costs are exogenous and therefore good instruments.
5. The null of no simultaneity says that p and u are uncorrelated.

6. To perform a Hausman test, compute the reduced-form parameters

p̂ = π1 y + π2 C
p = p̂ + v̂

7. These πs can be estimated using OLS without violating assumptions of standard linear regression
8. The πs are the population parameters and are efficiently and consistently estimated.
9. Substitute p = p̂ + v̂ into the supply equation above

q = αp̂ + αv̂ + u

10. Now estimate

q = αp̂ + δv̂ + u (1)

11. Under the null of no simultaneity, the correlation between the residuals, v̂, and the error term will go to
zero as the sample size grows large.
12. Since we already know that the correlation between p̂ and the error is zero, this will give a
consistent estimate of α.
13. However, δ will be inefficiently estimated because a degree of freedom is used up.
14. This suggests a simple test. Rewrite eqn 1 as

q = αp + (δ − α)v̂ + u (2)

where we have substituted p̂ = p − v̂.
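
The steps above can be sketched as follows. The notes stop at eqn 2, so the final step here is an assumption: under the null of no simultaneity δ = α, so the coefficient on v̂ in eqn 2 should be zero, and a t-test on that coefficient is the natural reading of the "simple test". The simulated market data, the parameter values, and the function names ols2, make_market, and hausman_regression_test are hypothetical and only for illustration.

```python
import math
import random


def demean(v):
    """Return v measured as deviations from its sample mean."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def ols2(x1, x2, y):
    """OLS of y on two regressors (all demeaned here, so no intercept).

    Returns (b1, b2, se1, se2, residuals), solving the 2x2 normal equations.
    """
    x1, x2, y = demean(x1), demean(x2), demean(y)
    n = len(y)
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    resid = [c - b1 * a - b2 * b for a, b, c in zip(x1, x2, y)]
    s2 = sum(e * e for e in resid) / (n - 3)  # mean plus two slopes
    return b1, b2, math.sqrt(s2 * s22 / det), math.sqrt(s2 * s11 / det), resid


def make_market(n, rho, alpha, seed):
    """Hypothetical data: price p depends on income, cost and (if rho != 0) u."""
    rng = random.Random(seed)
    q, p, inc, cost = [], [], [], []
    for _ in range(n):
        yi, ci, ui = (rng.gauss(0.0, 1.0) for _ in range(3))
        pi = 0.7 * yi + 0.7 * ci + rho * ui + rng.gauss(0.0, 1.0)
        inc.append(yi)
        cost.append(ci)
        p.append(pi)
        q.append(alpha * pi + ui)
    return q, p, inc, cost


def hausman_regression_test(q, p, inc, cost):
    """Regress p on the instruments, then q on p and the residuals v_hat."""
    _, _, _, _, v_hat = ols2(inc, cost, p)         # reduced form for p
    a, theta, _, se_theta, _ = ols2(p, v_hat, q)   # eqn 2: theta = delta - alpha
    return a, theta, theta / se_theta


for rho in (0.0, 0.8):
    a, theta, t = hausman_regression_test(*make_market(500, rho, 1.0, seed=2))
    print(f"rho={rho}: alpha_hat={a:.3f} coef on v_hat={theta:.3f} t={t:.2f}")
```

A large t-ratio on v̂ is evidence against the null that p and u are uncorrelated. In this sketch the coefficient on p in the augmented regression is the estimate of α that remains consistent under simultaneity.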