
ApEc 8212 Econometric Analysis -- Lecture #8



Simultaneous Equation Models (Wooldridge, Ch. 9)

I. Introduction

This is the last lecture on systems of equations (not
counting future lectures on panel data). Simultaneous
equation models are systems of equations in which
one or more of the dependent variables (the y's) is an
explanatory variable in the equation for another
dependent variable. An example of this was given at
the beginning of Lecture 7, where wages affected
hours of work and hours of work affected wages. The
model discussed today is a special case of Lecture 7:

a) all explanatory variables correlated with error
terms are also dependent variables; and

b) all IVs are explanatory variables in other equations.

In general, each equation in a system of simultaneous
equations represents a causal relationship. Another
way to think about this is that the equation should be
autonomous, that is, an economic or social relationship that makes sense without reference to the other equations. That is, it is not just some combination of other equations that has no meaning by itself.

II. Identification in a Linear System

Let's begin with a system of linear simultaneous equations:

y_1 = y_(1)γ_(1) + z_(1)δ_(1) + u_1
  ⋮
y_G = y_(G)γ_(G) + z_(G)δ_(G) + u_G


where y_(h) is all the variables in y except y_h, for h = 1, 2, …, G. Vectors can vary in length across equations, so (for any h) y_(h) and γ_(h) can have G_h elements and z_(h) and δ_(h) can have M_h elements. We assume that all these equations are structural equations. Almost always, each z_(h) contains a constant term.

Assume that all z variables are exogenous. That is:

E[z'u_g] = 0,   g = 1, 2, …, G

where z includes all the z variables in the G equations. We will always assume that E[z'z] is nonsingular.

We could be slightly more restrictive and assume that:

E[u_g | z] = 0,   g = 1, 2, …, G

But this is not needed to prove consistency.

In general, there are many reasons why the y variables are correlated with the u's, so we cannot estimate this system by OLS (or GLS). For future reference, define Σ ≡ Var(u).

A Simple Example

Consider a labor supply equation and a labor demand
equation for some population:

h^s(w) = γ_1 log(w) + z_(1)δ_(1) + u_1   (supply)

h^d(w) = γ_2 log(w) + z_(2)δ_(2) + u_2   (demand)

Of course, as long as markets clear we always have supply equal to demand: h^s(w) = h^d(w) = h.

Define y_1 = h and y_2 = log(w). The model is:

y_1 = γ_1 y_2 + z_(1)δ_(1) + u_1   (supply)

y_1 = γ_2 y_2 + z_(2)δ_(2) + u_2   (demand)

It may seem odd to have y_1 as the dependent variable in both equations, but there is nothing wrong with this.

Subtract the demand equation from the supply equation:

y_2 = z_(1)π_21 + z_(2)π_22 + v_2

where π_21 = δ_(1)/(γ_2 - γ_1), π_22 = -δ_(2)/(γ_2 - γ_1), and v_2 = (u_1 - u_2)/(γ_2 - γ_1).

This equation is the reduced form for y_2 because y_2 is a function of only exogenous variables (and an error term that is uncorrelated with those variables).

To estimate the labor supply equation by instrumental variables, we need the reduced form expression for y_2 to contain at least one z variable that is not in the labor supply equation. That is, there must be at least one variable in z_(2) that is not in z_(1). One possibility is the prices of the products sold by the employers.

Similarly, to estimate the labor demand equation there must be at least one variable in z_(1) that is not in z_(2). Household characteristics unrelated to worker productivity, such as the income of other household members or the number of small children, may work.
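To make this concrete, here is a minimal simulation sketch of the supply-and-demand system above, estimated by just-identified IV/2SLS with numpy. The numerical values (γ_1 = 0.5, γ_2 = -1, the single supply shifter z1 and demand shifter z2) are illustrative assumptions, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Illustrative structural parameters (assumed, not from the lecture)
g1, g2 = 0.5, -1.0       # supply and demand slopes on log(wage)
d1, d2 = 1.0, 2.0        # coefficients on z1 (supply shifter) and z2 (demand shifter)

z1 = rng.normal(size=n)  # e.g., other household income (excluded from demand)
z2 = rng.normal(size=n)  # e.g., output price (excluded from supply)
u1 = rng.normal(size=n)  # supply shock
u2 = rng.normal(size=n)  # demand shock

# Reduced form implied by equating supply and demand:
# y2 = log(wage), y1 = hours
y2 = (d1 * z1 - d2 * z2 + u1 - u2) / (g2 - g1)
y1 = g1 * y2 + d1 * z1 + u1

# 2SLS for the supply equation: instrument y2 with z2, the excluded demand shifter
X = np.column_stack([y2, z1, np.ones(n)])        # regressors in the supply equation
Z = np.column_stack([z2, z1, np.ones(n)])        # instruments
beta_2sls = np.linalg.solve(Z.T @ X, Z.T @ y1)   # just-identified IV estimator
beta_ols = np.linalg.lstsq(X, y1, rcond=None)[0]

print("2SLS:", beta_2sls)  # close to (0.5, 1.0, 0.0) in large samples
print("OLS :", beta_ols)   # biased, because y2 is correlated with u1
```

The IV estimator recovers γ_1 because z2 shifts the demand curve but is excluded from the supply equation; OLS does not, because y_2 is correlated with u_1.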

Identification in a General Model

Return to the general model. Consider the first equation:

y_1 = y_(1)γ_(1) + z_(1)δ_(1) + u_1 = x_(1)β_(1) + u_1

where x_(1) simply combines y_(1) and z_(1), and β_(1) combines γ_(1) and δ_(1).

Assume that a set of reduced form equations exists for all the other y variables, which can be written as:

y_(1) = zΠ_(1) + v_(1),   where E[z'v_(1)] = 0.

Define S_(1) as an M×M_1 selection matrix of zeros and ones that, when premultiplied by z, gives z_(1): z_(1) = zS_(1). (M = total number of z variables.) The rank condition for single equation IV estimation is:

rank(E[z'x_(1)]) = K_1   (see Lecture 4)

where K_1 = G_1 + M_1. This rank condition can be restated in terms of the Π_(1) and S_(1) matrices, because E[z'x_(1)] = E[z'(y_(1) | z_(1))] = E[z'(zΠ_(1) | zS_(1))] = E[z'z][Π_(1) | S_(1)]. Assuming that E[z'z] has full rank, the rank condition can be stated as:

rank([Π_(1) | S_(1)]) = G_1 + M_1 = K_1

Since [Π_(1) | S_(1)] is an M×(G_1 + M_1) matrix, a necessary condition for this rank condition to hold is that M ≥ G_1 + M_1. This can be restated as:

Order condition:   M - M_1 ≥ G_1

This simply states that the number of z variables not in
equation 1 (which you can use as IVs) must equal or
exceed the number of variables that need instrumenting.

Identification Using the Structural Parameters

The rank condition given above was stated in terms of the reduced form parameters Π_(1) for equation 1. It is useful to state the conditions for identification in terms of the structural parameters, that is, in terms of the γ's and δ's. This is also useful if you want to investigate what restrictions on the structural parameters help achieve identification. To begin, let's write the G equations in a different way:

y'γ_1 + z'δ_1 + u_1 = 0
  ⋮
y'γ_G + z'δ_G + u_G = 0

All we did here is subtract each y that was on the left-hand side from both sides. So now y includes all of the y variables, hence no need for a (1) subscript. The z's have lost their subscripts because they now include all the z variables, but some of the parameters in the γ's (and in the δ's) will be set to zero to show which variables are in each equation. Note that y, and all the γ's, are G×1 column vectors, and z and all the δ's are M×1 column vectors. (In my notation, unlike Wooldridge, y and z are column vectors.)

These G equations can be put into matrix notation as:

y'Γ + z'Δ + u' = 0

where Γ is a G×G matrix whose gth column is γ_g, Δ is an M×G matrix whose gth column is δ_g, and u is a G×1 column vector of error terms.

For a reduced form to exist, we must assume that Γ is nonsingular (this is usually quite plausible). We can get the reduced form for all G equations simply by post-multiplying by Γ⁻¹ and rearranging terms:

y' = -z'[ΔΓ⁻¹] - u'Γ⁻¹ ≡ z'Π + v'

The variance-covariance matrix for the error terms in v can be defined as Λ ≡ E[vv'] = (Γ⁻¹)'ΣΓ⁻¹, where Σ = Var(u).

We can always use OLS to estimate y' = z'Π + v' because v is just a linear combination of the elements in u, and E[zu_g] = 0 for all g. The question is: under what conditions can we estimate Γ, Δ and Σ? Without some restrictions on the elements of Γ, Δ or Σ, we cannot estimate any of the elements in them. To see this, post-multiply y'Γ + z'Δ + u' = 0 by some arbitrary G×G nonsingular matrix F:

y'ΓF + z'ΔF + u'F = 0   or   y'Γ* + z'Δ* + u*' = 0
where Γ* ≡ ΓF, Δ* ≡ ΔF, u*' ≡ u'F, and Var[u*] = F'ΣF. You can show using linear algebra that y'Γ* + z'Δ* + u*' = 0 and y'Γ + z'Δ + u' = 0 have the same reduced form (just post-multiply both sides of y'Γ* + z'Δ* + u*' = 0 by Γ*⁻¹). So the reduced form alone does not allow us to obtain estimates of Γ, Δ and Σ.
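A quick numerical illustration of this observational-equivalence problem, as a sketch with made-up parameter values: any nonsingular F applied to (Γ, Δ) leaves the reduced-form matrix Π = -ΔΓ⁻¹ unchanged, so the reduced form cannot distinguish the two structural systems.

```python
import numpy as np

rng = np.random.default_rng(1)

G, M = 3, 4
Gamma = rng.normal(size=(G, G)) + np.eye(G)   # illustrative nonsingular structural matrices
Delta = rng.normal(size=(M, G))
F = rng.normal(size=(G, G)) + np.eye(G)       # an arbitrary nonsingular transformation

Pi_original = -Delta @ np.linalg.inv(Gamma)
Pi_transformed = -(Delta @ F) @ np.linalg.inv(Gamma @ F)

# Both structural systems imply the same reduced form Pi = -Delta Gamma^{-1}
print(np.allclose(Pi_original, Pi_transformed))   # True
```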

So what can we do? First, introduce a last new piece of notation. Define B ≡ (Γ' Δ')', the (G+M)×G matrix that stacks Γ on top of Δ. Then define the set of admissible linear transformations as all G×G nonsingular matrices, denoted again by F, for which:

1. BF satisfies all restrictions on B.

2. F'ΣF satisfies all restrictions on Σ.

In fact, the only way that we can identify Γ, Δ and Σ is to set enough restrictions so that the only admissible F is the identity matrix I_G.

What do we mean by restrictions? Usually, we are most interested in identifying B (i.e. Γ and Δ), so we set restrictions on it. We usually let Σ be unrestricted (though we will see one exception below).

Two very common types of restrictions are:
1. Normalization.

For each equation, normalize one γ coefficient to be equal to -1. For example, in the first equation set γ_11 to -1. This means we can write y'γ_1 + z'δ_1 + u_1 = 0 as y_1 = y_(1)γ_(1) + z_(1)δ_(1) + u_1. In the second equation, normalize γ_22 to -1, and so on. This yields G restrictions just by using unrestrictive normalizations.

2. Zero restrictions & linear restrictions within equations.

Set some elements of B equal to zero, or impose some restrictions on linear combinations of elements of B. To see how this works, consider the first equation. Define β_1 as the (G+M)×1 column vector that stacks γ_1 on top of δ_1. Both types of restrictions, applied to equation (1), can be expressed as requiring

R_1β_1 = 0

where R_1 is a J_1×(G+M) matrix of known constants, and J_1 is the number of restrictions.

An example will make this clearer. Consider the
first equation within a 3 equation system that has 4 z
variables (so G = 3 and M = 4):

y_1 = γ_12 y_2 + γ_13 y_3 + δ_11 z_1 + δ_12 z_2 + δ_13 z_3 + δ_14 z_4 + u_1

Note that γ_1 = (-1, γ_12, γ_13)' and δ_1 = (δ_11, δ_12, δ_13, δ_14)'. (Also, setting z_1 = 1 gives a constant term.)

Suppose that we want 2 restrictions: δ_12 = 0 and δ_13 + δ_14 = 3. Then J_1 = 2 and R_1 is:

R_1 = | 0  0  0  0  1  0  0 |
      | 3  0  0  0  0  1  1 |

Requiring R_1β_1 = 0 gives these 2 restrictions. (The second row imposes 3γ_11 + δ_13 + δ_14 = 0, which is equivalent to δ_13 + δ_14 = 3 because γ_11 is normalized to -1.)
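As a sanity check on this notation, here is a minimal numpy sketch; the nonzero coefficient values in β_1 are hypothetical, chosen only so that both restrictions hold:

```python
import numpy as np

# beta_1 = (gamma_11, gamma_12, gamma_13, delta_11, delta_12, delta_13, delta_14)'
# with gamma_11 normalized to -1, delta_12 = 0, and delta_13 + delta_14 = 3
beta_1 = np.array([-1.0, 0.7, -0.2, 1.5, 0.0, 2.0, 1.0])  # hypothetical values

R_1 = np.array([
    [0, 0, 0, 0, 1, 0, 0],   # delta_12 = 0
    [3, 0, 0, 0, 0, 1, 1],   # 3*gamma_11 + delta_13 + delta_14 = 0, i.e. delta_13 + delta_14 = 3
])

print(R_1 @ beta_1)   # [0. 0.] -- both restrictions are satisfied
```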

Returning to the general case, how many restrictions are enough to identify the parameters in equation 1 (identify β_1)? Let F be some G×G nonsingular matrix, and write it in terms of its column vectors so that F = [f_1 | f_2 | … | f_G]. Finally, use F to construct B* = BF; that is, B* is a linear transformation of B, and the first column of B* is β_1* ≡ Bf_1.

We need to find restrictions or conditions so that β_1* cannot imitate, or be an "imposter" for, β_1.
Impostering occurs if β_1* satisfies the restrictions R_1:

R_1β_1* = R_1(Bf_1) = (R_1B)f_1 = 0

This equality will hold if f_1 = e_1 ≡ (1, 0, 0, …, 0)', since in this case β_1* = Bf_1 = β_1. Ignoring normalization, it also holds for any f_1 = c_1e_1, where c_1 is any constant.
In fact, identification holds if these are the only forms of the vector f_1 that satisfy R_1β_1* = 0. If there is some other vector f_1 that satisfies R_1β_1* = 0, then there is some β_1* that is an imposter of β_1, and β_1 is not identified.

In other words, if R_1β_1* = 0 is true only for f_1 vectors of the form c_1e_1, then the null space of R_1B (a J_1×G matrix) has dimension unity. This implies that the rank condition for identifying β_1 in equation 1 is:

rank(R_1B) = G - 1

Theorem 9.2 (Rank Condition for Identification):
Let β_1 be the (G+M)×1 column vector of structural parameters for equation 1, with one parameter of γ_1 set to -1. Let the restrictions on β_1 be R_1β_1 = 0. Then β_1 is identified if and only if rank(R_1B) = G - 1.

There are 2 practical problems when you apply this:

1. Even if you are interested only in equation 1, you need to specify the entire B, that is, you need to specify all G equations.

2. You don't yet know the parameters of B, since you haven't estimated it yet. But in fact we can still use it, as will be seen in an example below.

Let's write the J_1×G matrix R_1B as [R_1β_1 | R_1β_2 | … | R_1β_G]. Since the restrictions assumed are that R_1β_1 = 0, the first column of R_1B is a vector of zeros. Thus R_1B has a rank of at most G - 1. So the real question behind this rank condition is whether the remaining columns of R_1B are linearly independent of each other; if they are, the rank condition is satisfied.

Since Γ is nonsingular, it has a rank of G, and so B must also have a rank of G (no matter what is in Δ). You can use linear algebra to show that this implies that the rank condition holds only if rank(R_1) ≥ G - 1. This is necessary but not sufficient. But as long as the restrictions in R_1 are not redundant, rank(R_1) must be J_1. This gives the following order condition:

Theorem 9.3 (Order Condition for Identification):
A necessary condition for identifying equation 1 is:

J_1 ≥ G - 1

That is, the number of restrictions has to be at least as large as one less than the number of endogenous variables in the system. This is easy to check. Note that we have G - 1 and not G because the normalization of γ_11 to -1 is the Gth restriction.

This order condition is quite intuitive in terms of IV estimation. Focusing on the most common restriction, that some parameter equals zero, each restriction either reduces the number of endogenous variables that we need to instrument (by setting one element of γ_1 equal to zero) or provides us with an instrument (by setting one element of δ_1 equal to zero).

To summarize, here is what you do to figure out if equation 1 is identified:

1. Set one element of γ_1 (usually γ_11) to -1.

2. Based on economic theory (or any other arguments you can come up with), set up R_1, the J_1×(G+M) matrix of restrictions that apply to β_1.

3. If J_1 < G - 1, equation 1 is not identified (oops!).

4. If J_1 ≥ G - 1, equation 1 may be identified. You will need to check the rank condition that rank(R_1B) = G - 1, so you will have to specify restrictions for the other equations (e.g. which parameters in other equations can be set to zero).

There is one piece of good news. In almost all cases, if the order condition holds then so does the rank condition, so in practice most people don't check the rank condition (unless they get very strange results).

But as a warning, here is an example of a model with 3 equations and 4 exogenous variables in which the order condition holds but the rank condition fails:

y_1 = γ_12 y_2 + γ_13 y_3 + δ_11 z_1 + δ_13 z_3 + u_1

y_2 = γ_21 y_1 + δ_21 z_1 + u_2

y_3 = δ_31 z_1 + δ_32 z_2 + δ_33 z_3 + δ_34 z_4 + u_3

Assume that z_1 is a constant. Consider equation 1. It has a normalization and 2 restrictions: δ_12 = 0, δ_14 = 0. The order condition is satisfied since J_1 = 2 and G = 3.

But look at the rank condition. The restriction matrix R_1 is:

R_1 = | 0  0  0  0  1  0  0 |
      | 0  0  0  0  0  0  1 |

B is a 7×3 matrix of parameters (with 3 normalizations):

B = | -1     γ_21   γ_31 |
    | γ_12   -1     γ_32 |
    | γ_13   γ_23   -1   |
    | δ_11   δ_21   δ_31 |
    | δ_12   δ_22   δ_32 |
    | δ_13   δ_23   δ_33 |
    | δ_14   δ_24   δ_34 |

We get

R_1B = | δ_12  δ_22  δ_32 |
       | δ_14  δ_24  δ_34 |

This looks OK, but when we impose all of the system restrictions we get:

R_1B = | 0  0  δ_32 |
       | 0  0  δ_34 |

Rank[R_1B] = 1, so the rank condition fails.

The intuition here is simple. For equation 1 we need 2 instruments, for y_2 and y_3. In theory, we have two: z_2 and z_4. But in equation 2 neither z_2 nor z_4 has explanatory power for y_2. So we don't have good IVs for y_2.
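Here is a minimal numpy sketch of that rank check for this example. The zeros and -1's in B encode the normalizations and the exclusion restrictions of the three equations; the other entries are made-up illustrative values:

```python
import numpy as np

# Columns of B are (gamma_g', delta_g')' for equations g = 1, 2, 3
B = np.array([
    [-1.0,  0.4,  0.0],   # coefficients on y_1
    [ 0.8, -1.0,  0.0],   # coefficients on y_2
    [ 0.3,  0.0, -1.0],   # coefficients on y_3
    [ 1.0,  0.5,  0.2],   # coefficients on z_1 (constant)
    [ 0.0,  0.0,  0.7],   # coefficients on z_2
    [ 0.6,  0.0,  1.1],   # coefficients on z_3
    [ 0.0,  0.0,  0.9],   # coefficients on z_4
])

R_1 = np.array([
    [0, 0, 0, 0, 1, 0, 0],   # delta_12 = 0
    [0, 0, 0, 0, 0, 0, 1],   # delta_14 = 0
])

G = 3
print(np.linalg.matrix_rank(R_1 @ B))   # 1 < G - 1 = 2, so equation 1 is not identified
```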

Note: The order condition in Theorem 9.3 is the same as the order condition M - M_1 ≥ G_1 if the restrictions imposed are just normalizations of one y variable in each equation and exclusion restrictions (setting some parameters equal to zero). See Wooldridge, pp. 250-51.

Finally, some rather obvious terminology.

1. If the rank condition fails, we say the equation is underidentified, which means it cannot be estimated (unless additional restrictions are added).

2. If the rank condition holds and J_1 = G - 1, the equation is just identified.

3. If the rank condition holds and J_1 > G - 1, the equation is overidentified.

Once we figure out which equations are identified, we can estimate them using the methods in Lecture 7. This is discussed on pp. 252-256 of Wooldridge. You can also use cross-equation or covariance restrictions to identify equations. See Wooldridge, pp. 256-261.


III. Exogeneity and Causality (Davidson and
MacKinnon, Chapter 18, Section 2)

Economists and econometricians often call variables
exogenous or endogenous without being precise
about the meaning. In fact, there is more than one
kind of exogeneity. For time series models (models
where the observations have a specific order over
time) the concepts are rather subtle. For non-time-series
models, exogeneity is much simpler and quite
intuitive. Finally, there is the concept of causality,
which is related to exogeneity but is a distinct concept.

Exogeneity

Wooldridge gives three definitions of exogeneity for
analysis of panel data. The first two are:

Contemporaneous Exogeneity:  E[u_t | x_t] = 0,  t = 1, 2, …, T

Strict Exogeneity:  E[u_t | x_1, x_2, …, x_T] = 0,  t = 1, 2, …, T

Strict exogeneity implies contemp. exogeneity, but
contemp. exogeneity does not imply strict exogeneity.

For pure cross-sectional data (only 1 time period) these
two definitions are identical; they are slightly stronger
versions of Assumption OLS.1 (Lecture 3, E[x'u] = 0).

The difference between these two definitions is important when some of the x variables are lagged values of y. In this case contemporaneous exogeneity still holds, but allowing lagged values of y implies that x_t could be correlated with past values of u. Yet it is unlikely that x_t will be correlated with future values of u, so Wooldridge gives a third definition of exogeneity (for use in dynamic panel data models):

Sequential Exogeneity:  E[u_t | x_1, x_2, …, x_t] = 0,  t = 1, 2, …, T

In contrast, many economic models use the term
exogenous in a different way. They specify that
exogenous variables come from outside the model or
are not controlled by the economic decisionmaker.
This definition is not the same as the econometric
definitions given above. First, it is possible that an
exogenous variable (i.e. one determined outside the
system) is correlated with the error in a regression.
An example is estimation of a farm production
function. Suppose that two variables that determine
productivity are rainfall and sunshine. These are
clearly exogenous in the economic sense. They are
also correlated (it only rains on cloudy days).
Suppose we have data on rainfall but not on sunshine,
so sunshine is an omitted variable that goes into the
error term. In this case rainfall is correlated with the
error term, since rainfall and sunshine are correlated.

Second, there can be a model with variables that are
endogenous in the economic sense on the right-hand
side of an econometric model, yet the error term is not
correlated with them. Consider another farm production
function. Suppose output is determined by fertilizer,
labor and rainfall. We do not have data on rainfall, so it
ends up in the error term. Fertilizer and labor inputs are
clearly endogenous in the economic sense; the farmer
chooses them. Yet the error term (rainfall) is pretty
random and thus may be uncorrelated with fertilizer and
labor (assuming they are chosen before rainfall occurs).

Thus the economic definition of exogeneity is not the
same concept as a definition of exogeneity based on
lack of correlation between a right hand side
variable and the error term. We now examine
econometric/statistical definitions of exogeneity that
are deeper than lack of correlation between the
error and a right-hand side variable (but are still not
the same as the economic definition of exogeneity).

Precise Statistical Definitions of Exogeneity

With this introduction, we now turn to the general
econometric/statistical approach to exogeneity. Start
with a system of M structural equations (new notation):

y_t'Γ + x_t'B = ε_t'

where Γ is an M×M matrix (M is the number of y variables) and B is a K×M matrix (K is the number of x variables). What do we mean when we say that x_t' is exogenous? The first concept is strict exogeneity: x_t' is strictly exogenous if all elements of x_t' are independent of all the error terms in the vector ε_s', for all t from 1 to T and all s from 1 to T. This is denoted:

x_t' ⊥ ε_s'   for all t, s = 1, 2, …, T

where ⊥ denotes statistical independence. Note that this implies Wooldridge's definition of strict exogeneity, since the conditional distribution of any variable (and thus any conditional moment of that variable), conditioning on some independent variable, equals the unconditional distribution of that variable.

For time series and panel data models this may be too strong. In particular, we may want to use past values of y as exogenous variables. In this notation such variables are part of the x variables, not the y variables. A weaker condition that lets us use past values of y as x variables is predeterminedness: x_t' is independent only of current and future values of ε_s':

x_t' ⊥ ε_{t+s}'   for all t = 1, 2, …, T, for all s ≥ 0

This allows one to use past values of y as x variables. Note that this implies (but is not implied by) the sequential exogeneity definition of Wooldridge.

It turns out that predeterminedness is not very useful without specifying more about the equation that the x variables (including lagged values of y) are in. For example, consider a very simple model:

y_t = βx_t + ε_1t

x_t = α_1 x_{t-1} + α_2 y_{t-1} + ε_2t

where E[ε_t ε_t'] = | σ_11  σ_12 |
                    | σ_12  σ_22 |

and E[ε_t ε_s'] = 0 (a 2×2 matrix of zeros) if s ≠ t.

In this model x_t is not predetermined in the first equation if σ_12 ≠ 0 (why?), so OLS estimation of the first equation will be biased (and inconsistent).
Next, consider the expectation of y_t conditional on x_t and past values of x_t and y_t:

E[y_t | x_t, y_{t-1}, x_{t-1}, …] = βx_t + E[ε_1t | x_t, y_{t-1}, x_{t-1}, …]

  = βx_t + E[ε_1t | ε_2t]   (ε's independent over time)

  = βx_t + (σ_12/σ_22)ε_2t   (check any statistics textbook)

  = βx_t + (σ_12/σ_22)(x_t - α_1 x_{t-1} - α_2 y_{t-1}).

This implies that we can write:

y_t = (β + σ_12/σ_22)x_t - α_1(σ_12/σ_22)x_{t-1} - α_2(σ_12/σ_22)y_{t-1} + v_t

where v_t is uncorrelated with x_t.

So rewriting the equation for y_t so that it is a function not only of x_t but also of x_{t-1} and y_{t-1} gives an equation where x_t is predetermined, because it is no longer correlated with the error term. Note that in that equation the parameter on x_t is not β but (β + σ_12/σ_22), and the error is also different. The intuition here is that correlation between the error term and some variable x depends on the other variables in the model.
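A small simulation sketch of this point (the values β = 1, α_1 = 0.5, α_2 = 0.3 and σ_12 = 0.8 are made-up): with σ_12 ≠ 0, OLS of y_t on x_t alone does not converge to β, while the regression that also includes x_{t-1} and y_{t-1} gives a coefficient on x_t near β + σ_12/σ_22, exactly as the algebra above suggests.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 100_000
beta, a1, a2 = 1.0, 0.5, 0.3                  # illustrative structural values
cov = np.array([[1.0, 0.8], [0.8, 1.0]])      # sigma_11 = sigma_22 = 1, sigma_12 = 0.8
eps = rng.multivariate_normal([0.0, 0.0], cov, size=T)

x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = a1 * x[t-1] + a2 * y[t-1] + eps[t, 1]
    y[t] = beta * x[t] + eps[t, 0]

def ols(X, target):
    return np.linalg.lstsq(X, target, rcond=None)[0]

# Static regression of y_t on x_t: inconsistent for beta, because x_t is
# correlated with eps_1t (through eps_2t)
b_static = ols(x[1:, None], y[1:])
# Adding x_{t-1} and y_{t-1}: the coefficient on x_t now estimates
# beta + sigma_12/sigma_22 = 1.8, not beta
b_dynamic = ols(np.column_stack([x[1:], x[:-1], y[:-1]]), y[1:])

print(b_static[0], b_dynamic[0])
```

The second regression has a predetermined regressor and a well-behaved error, but the coefficient it recovers is no longer the structural β, which is the warning in the paragraph above.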

For time series analysis, economists want something like predeterminedness that clarifies exactly what models or equations are under consideration. This leads to weak exogeneity, a more abstract concept.

To define weak exogeneity, first note that the joint distribution of the variables in y_t' and x_t', conditional on past values of x' and y', can be expressed as the product of the distribution of y_t' conditional on x_t' and past values of x' and y', and the distribution of x_t' conditional on past values of x' and y'. That is, regardless of the exogeneity of x_t', we can write:

f_t(x_t', y_t' | Ω_{t-1}; θ) = f_t^y(y_t' | x_t', Ω_{t-1}; θ) f_t^x(x_t' | Ω_{t-1}; θ)

where Ω_{t-1} represents past values of x and y from time 1 to t-1, and θ is the vector of parameters that determine the joint distribution of x and y conditional on Ω_{t-1}. We are now ready to define weak exogeneity. Suppose that we can divide θ into two components such that θ = (θ_1, θ_2). Then the variables x_t' are weakly exogenous for estimation of θ_1 if we can rewrite the above conditional density function as:

f_t(x_t', y_t' | Ω_{t-1}; θ) = f_t^y(y_t' | x_t', Ω_{t-1}; θ_1) f_t^x(x_t' | Ω_{t-1}; θ_2)

This is pretty abstract, but the intuition is that y_t' has no feedback into the current (marginal) distribution of x_t' over and above what past values of y' and x' have.

The key things to remember are:

1. Weak exogeneity is less restrictive than strict
exogeneity; you can use past values of y as regressors.

2. Weak exogeneity is only relevant for models that
have some kind of time series element to them.

3. Predeterminedness is not weak exogeneity.
Predeterminedness refers to a specific equation
in a model, while weak exogeneity refers to
the joint distribution of all variables in all
equations of the model and all the parameters
and equations underlying that joint distribution.

Granger (Non)Causality

This concept is less abstract. In fact, the real thing we are interested in is called Granger noncausality. Recall that x_t is weakly exogenous if y_t has no explanatory power for x_t, but it is still possible for past values of y (i.e. y_{t-1}, y_{t-2}, etc.) to play a role in determining x_t even after conditioning on past values of x. Granger noncausality rules out any role for past values of y after accounting for the predictive power of past values of the x variables.

The definition is: past values of y (denoted by y_{t-1}) do not Granger cause x_t' (current values of x) if and only if:

f_t^x(x_t' | Ω_{t-1}) = f_t^x(x_t' | x_{t-1})

where x_{t-1} represents all past values of the x variables. Thus past values of y have no predictive power for current values of x once past values of x have been used.

Granger noncausality is concerned with the ability to forecast future values of the x and y variables. If it holds, past values of y do not help forecast current and future x.
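In practice, Granger noncausality is usually checked with a regression-based test: regress x_t on its own lags, add lags of y, and test whether the added lags are jointly significant. Here is a minimal one-lag sketch using an ordinary F statistic; the single lag and the helper function name are illustrative choices, not a standard routine:

```python
import numpy as np
from scipy import stats

def granger_noncausality_F(x, y):
    """F-test of H0: y_{t-1} has no predictive power for x_t once x_{t-1} is included."""
    x_t, x_lag, y_lag = x[1:], x[:-1], y[:-1]
    ones = np.ones_like(x_t)

    def ssr(X):  # sum of squared OLS residuals
        b = np.linalg.lstsq(X, x_t, rcond=None)[0]
        e = x_t - X @ b
        return e @ e

    ssr_r = ssr(np.column_stack([ones, x_lag]))          # restricted: x_t on x_{t-1}
    ssr_u = ssr(np.column_stack([ones, x_lag, y_lag]))   # unrestricted: add y_{t-1}
    df2 = len(x_t) - 3
    F = (ssr_r - ssr_u) / (ssr_u / df2)                  # one restriction, so df1 = 1
    return F, 1 - stats.f.cdf(F, 1, df2)

# In the two-equation model above, y Granger causes x whenever alpha_2 != 0,
# so on data simulated from that model this test should reject H0 in large samples.
```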

Three final comments:

1. Granger noncausality and weak exogeneity are not the same thing. You can have one without the other, and neither implies the other. Going back to the two-equation model above, y does not Granger cause x if and only if α_2 = 0. In contrast, x is weakly exogenous with respect to estimating β if and only if σ_12 = 0 (see Davidson and MacKinnon, pp. 628-629).

2. Granger noncausality is used for forecasting, and
weak exogeneity is used to estimate structural models.

3. For specific parameters, if x_t' is weakly exogenous and past values of y do not Granger cause x_t', then we say that x_t' is strongly exogenous.
