
Web Extension 5

Sections 14.10 and 14.11; Appendix 14.A

More Estimators for Systems of Equations

14.10 WHAT IS THE DGP?

Seemingly Unrelated Regressions


In learning about simultaneous equations models, our central focus has been on the endogenous explanators that make ordinary least squares (OLS) inconsistent and that threaten the identification of the equations. What about systems of equations in which there are only untroublesome explanators? OLS is consistent when applied to such equations because the explanators are contemporaneously uncorrelated with the disturbances. Is there anything more to be said about such seemingly unrelated regressions?¹ Econometrician Arnold Zellner of the University of Chicago, who first examined such multiple equations, did find more to say. Zellner found that systems of equations sometimes permit more efficient estimation than if the individual equations were estimated one at a time, separately from one another. Zellner named systems of equations with no troublesome explanators, but with contemporaneous correlations across the disturbances of the equations, seemingly unrelated regressions (SUR). Zellner showed how we can estimate such equations more efficiently by estimating them all together rather than by using OLS to estimate each equation separately. This section introduces SUR as a stepping stone to the next section, in which we examine simultaneous equations models with disturbances that are correlated across equations.

The DGP
To illustrate a system of seemingly unrelated equations, consider a sample of firms, all of which produce two products, quilts and mattresses. The per unit cost of producing quilts, $C^q_i$, depends on the price of linen, $P^l_i$, and the price of dyes, $P^d_i$. The per unit cost of producing mattresses, $C^m_i$, depends on the price of linen and the price of foam, $P^f_i$. Thus we have a DGP with two equations:

$$C^q_i = \alpha_0 + \alpha_1 P^l_i + \alpha_2 P^d_i + \varepsilon^q_i \tag{14.14}$$

and

$$C^m_i = \beta_0 + \beta_1 P^l_i + \beta_2 P^f_i + \varepsilon^m_i, \tag{14.15}$$

in which $\varepsilon^q_i$ and $\varepsilon^m_i$ both satisfy the Gauss-Markov Assumptions individually and have the further shared characteristics that $\mathrm{cov}(\varepsilon^q_i, \varepsilon^m_j) = 0$ for $i \neq j$, and $\mathrm{cov}(\varepsilon^q_i, \varepsilon^m_i) = \sigma_{qm}$ for all $i$. We further assume that the price variables are independent of the disturbances, on the assumption that the small producers in our sample have no effect on the prices they face for their inputs.

It is the correlation of disturbances that earmarks seemingly unrelated regressions. In this example, the intuition for a correlation is the notion that firms able to produce quilts more cheaply than average may also tend to produce mattresses more cheaply than average; producers efficient in one operation tend to be efficient in other operations as well. (According to this intuition, $\sigma_{qm} > 0$, but the covariance need only be nonzero for seemingly unrelated regression estimation to be appropriate.)
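To make this DGP concrete, here is a minimal simulation sketch in Python. The coefficient values, the unit disturbance variances with covariance 0.5, and the lognormal price draws are all made up for illustration; only the structure, two cost equations sharing contemporaneously correlated disturbances, comes from Equations 14.14 and 14.15.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical input prices, drawn independently of the disturbances.
p_linen, p_dye, p_foam = rng.lognormal(size=(3, n))

# Contemporaneously correlated disturbances: cov(eps_q_i, eps_m_i) = 0.5,
# while disturbances for different firms are independent.
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
eps = rng.multivariate_normal([0.0, 0.0], Sigma, size=n)

# Equations 14.14 and 14.15, with made-up parameter values.
cost_quilt    = 1.0 + 0.8 * p_linen + 0.3 * p_dye  + eps[:, 0]
cost_mattress = 2.0 + 0.5 * p_linen + 0.6 * p_foam + eps[:, 1]
```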

The SUR Estimation Procedure


Individually, Equations 14.14 and 14.15 satisfy the Gauss-Markov Assumptions. OLS would seem the appropriate estimation procedure for either equation. What, then, is the intuition for why OLS might not be the efficient estimation procedure for the seemingly unrelated Equations 14.14 and 14.15? Recall our reason for weighting some observations more heavily than others. In the model with no intercept, we saw that disturbances of a given size misled us less about the slope when they were attached to observations with larger Xs. In the model with an intercept, this intuition applied to observations that lay further from the mean X in the sample. The Gauss-Markov Assumptions of homoskedasticity and no serial correlation assured us that any one observation was as likely as any other to have a disturbance of a given size. Hence, we found that the OLS weights were optimal.


Suppose, however, we knew the correlation between the disturbances and the variances of the disturbances, and we knew the actual value of the first disturbance in the quilt equation, but not the value of the first disturbance in the mattress equation. Let us further assume the first observation has explanators that take on the sample average values. Would we use OLS to estimate the mattress equation? Not if we think this observation lies closer to the true line than do other observations. The assumed information about the quilt disturbance, coupled with the information about the covariance between the disturbances, undermines the presumption that all observations are equally likely to have disturbances of a given size, and therefore undermines the rationale for the optimality of OLS. Suppose, for example, that the disturbance on the first observation for the cost of quilts is very small in magnitude, suggesting that the underlying disturbance in the mattress equation for the first observation is also very small. Because this observation probably lies closer to the true line than we would otherwise think, we want to use that information in estimating the line.

How does this intuition apply in the actual circumstances in which we do not know the size of a given residual? The choice of weighting the first observation in the mattress regression will depend in part on how well we think we can mimic the first observation's disturbance in the quilt regression. If we can mimic the disturbances particularly well, we may decide to consider that information when we weight the first observation in the mattress equation. Because unbiased estimation imposes constraints on the weights we use, weighting one observation more heavily than in OLS requires weighting another observation less heavily, if we are to maintain unbiasedness.

Residuals mimic disturbances, but they are not equal to them. How reliably residuals mimic the disturbances for a given observation depends, in part, on how well we estimate the true parameters (the variances of the parameter estimators), and, in part, on how far the explanators for that observation are from their average values. (We are less confident in our estimates of the expected value of Y for Xs far from the mean Xs.) Therefore, how the quilt information influences our mattress equation weight for the first observation will depend on the values of the explanators in the quilt equation. Similarly, how much the quilt equation's weight for the first observation is influenced depends on the values of the explanators in the mattress equation.
There are special cases in which OLS efficiently estimates seemingly unrelated regressions. For one, if both equations suggest the same weights for an explanator, there is no reason to choose one over the other. One special case is that in which the explanators in one equation are identical to those in the other equations; OLS is the efficient estimator for seemingly unrelated regressions in this case. A further special case applies when an equation contains only a subset of the explanators that appear in other equations. If that subset takes on the same values in every equation, then OLS efficiently estimates the equation containing the subset of explanators.
The standard procedure for estimating seemingly unrelated regressions is a
variant of the feasible generalized least squares (FGLS) strategy developed in
Chapters 10 and 11:
1. Estimate the equations using ordinary least squares.
2. Use the least squares residuals from (1) to estimate the variances and contemporaneous covariance of the disturbances:

$$s^2_q = \frac{1}{n - k_q - 1}\sum e^2_{qi}, \qquad s^2_m = \frac{1}{n - k_m - 1}\sum e^2_{mi},$$

and

$$s_{qm} = \frac{1}{n - \max(k_q, k_m) - 1}\sum e_{qi}e_{mi},$$

where the subscripts q and m refer to the equations being estimated, for example the quilt and mattress cost equations, Equations 14.14 and 14.15.
3. Combine the regressions into one large equation, but allow the coefficient on each variable to differ across the several equations. For example, in the quilt and mattress example, we replace Equations 14.14 and 14.15 with the equivalent model

$$C_j = \alpha_0 D^q_j + \alpha_1 P^{*l}_j + \alpha_2 P^{*d}_j + \beta_0 D^m_j + \beta_1 P^{**l}_j + \beta_2 P^{*f}_j + e_j, \tag{14.16}$$

for j = 1, ..., 2n, in which

$C_j = C^q_j$ for j = 1, ..., n, and $C_j = C^m_{j-n}$ for j = n + 1, ..., 2n;

$D^q_j = 1$ for j = 1, ..., n, and = 0 otherwise;

$D^m_j = 1$ for j = n + 1, ..., 2n, and = 0 otherwise;

$P^{*l}_j = P^l_j$ for j = 1, ..., n, and = 0 otherwise;

$P^{**l}_j = P^l_{j-n}$ for j = n + 1, ..., 2n, and = 0 otherwise;

$P^{*d}_j = P^d_j$ for j = 1, ..., n, and = 0 otherwise;

$P^{*f}_j = P^f_{j-n}$ for j = n + 1, ..., 2n, and = 0 otherwise;

and $e_j = \varepsilon^q_j$ for j = 1, ..., n, and $e_j = \varepsilon^m_{j-n}$ for j = n + 1, ..., 2n.

This is called a stacked regression because several equations are stacked together in a single regression equation. OLS applied to Equation 14.16 obtains the same results as OLS applied to Equations 14.14 and 14.15 separately.
4. Perform GLS on the stacked regression, using the estimated variances and covariances in place of the actual.
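As a concrete rendering of these four steps, here is a minimal numpy sketch of SUR estimation. It is a sketch under simplifying assumptions, not a production implementation: the covariance estimates divide by n rather than applying the degrees-of-freedom corrections shown in step 2, and the weighting matrix is built explicitly, which is wasteful for large n. All names are ours.

```python
import numpy as np
from scipy.linalg import block_diag

def sur_fgls(X_list, y_list):
    """FGLS for seemingly unrelated regressions.

    X_list and y_list hold each equation's regressor matrix (with an
    intercept column) and dependent variable; every equation is assumed
    to have the same n observations.
    """
    n = y_list[0].shape[0]

    # Step 1: ordinary least squares, equation by equation.
    betas = [np.linalg.lstsq(X, y, rcond=None)[0]
             for X, y in zip(X_list, y_list)]

    # Step 2: estimate the variances and contemporaneous covariances of
    # the disturbances from the OLS residuals (dividing by n here).
    resid = np.column_stack([y - X @ b
                             for X, y, b in zip(X_list, y_list, betas)])
    Sigma = resid.T @ resid / n

    # Step 3: stack the equations into one large regression with a
    # block-diagonal regressor matrix, the matrix analog of the
    # dummy-variable construction in Equation 14.16.
    X_big = block_diag(*X_list)
    y_big = np.concatenate(y_list)

    # Step 4: GLS on the stacked system; with equation-major stacking,
    # the disturbance covariance matrix is Sigma kron I_n.
    Omega_inv = np.kron(np.linalg.inv(Sigma), np.eye(n))
    A = X_big.T @ Omega_inv @ X_big
    return np.linalg.solve(A, X_big.T @ Omega_inv @ y_big)
```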
Although FGLS can improve the efficiency with which each equation is estimated, there is a risk incurred by treating equations as a seemingly unrelated system. Specification errors, such as omitted variables, in one equation can bias the
coefficient estimates in all the equations. OLS applied one equation at a time will
be biased when applied to the misspecified equations, but the other equations will
be unbiasedly estimated. Many researchers shy away from SUR estimation because they do not want to risk tainting all their estimates with the misspecification of a single equation.

14.11 HOW DO WE MAKE AN ESTIMATOR?

Full Information Estimation Methods


In simultaneous equations models, our central concern has been the simultaneity
bias of OLS. In seemingly unrelated regression models, our concern has been with
the efficiency of estimators when the disturbances are contemporaneously correlated across equations. In this section, we discuss the joining of these two concerns. Just as FGLS methods can sometimes improve on the efficiency of OLS,
sometimes they can also improve on the efficiency of IV estimators.
Estimators of simultaneous systems that jointly estimate all the structural
equations of a system are called full information estimators. Three-stage least
squares and full information maximum likelihood are two full information estimators described in this section. Procedures that estimate individual structural
equations of a system separately from one another, such as OLS and 2SLS, are
called limited information estimators. Another limited information estimator, introduced in this section, is the limited information maximum likelihood estimator.

Three-Stage Least Squares


Three-stage least squares is the most commonly used full information estimator.
Three-stage least squares (3SLS) combines two-stage least squares and the FGLS
estimator for seemingly unrelated regressions. The three steps of 3SLS are as
follows:


1. Estimate the reduced form equations by OLS and form fitted values for each troublesome explanator based on the variable's reduced form equation.

2. Replace each troublesome explanator in each equation by its fitted value from (1) and perform OLS for each structural equation. As in the second step of the seemingly unrelated regressions procedure, estimate the variances and contemporaneous covariances of the equations' disturbances, this time using the 2SLS residuals instead of OLS residuals.

3. Perform FGLS as for seemingly unrelated regressions, using the estimated variances and contemporaneous correlations among the disturbances of the several structural equations, but replace any endogenous explanators with their fitted values from (1) before performing the SUR estimation.
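In the same spirit as the SUR sketch above, a compact numpy sketch of the three steps might look as follows, assuming every equation in the system is identified and that the same predetermined variables serve as instruments for all equations; names and layout are ours, not from the text.

```python
import numpy as np
from scipy.linalg import block_diag

def three_sls(W_list, y_list, Z):
    """Three-stage least squares for a system of structural equations.

    W_list holds each equation's structural regressors (endogenous
    explanators included), y_list the dependent variables, and Z the
    matrix of all predetermined variables in the system.
    """
    n = Z.shape[0]

    # Step 1: reduced-form fitted values -- project every regressor on
    # the predetermined variables (exogenous columns pass through
    # unchanged, endogenous columns become their fitted values).
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    W_hat = [P @ W for W in W_list]

    # Step 2: 2SLS equation by equation, then estimate the disturbance
    # covariances from 2SLS residuals (using the original regressors).
    b2sls = [np.linalg.lstsq(Wh, y, rcond=None)[0]
             for Wh, y in zip(W_hat, y_list)]
    resid = np.column_stack([y - W @ b
                             for W, y, b in zip(W_list, y_list, b2sls)])
    Sigma = resid.T @ resid / n

    # Step 3: SUR-style FGLS on the stacked system, with endogenous
    # explanators replaced by their reduced-form fitted values.
    X_big = block_diag(*W_hat)
    y_big = np.concatenate(y_list)
    Omega_inv = np.kron(np.linalg.inv(Sigma), np.eye(n))
    A = X_big.T @ Omega_inv @ X_big
    return np.linalg.solve(A, X_big.T @ Omega_inv @ y_big)
```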
Three-stage least squares consistently estimates systems of identified equations but does not consistently estimate systems of equations that include one or
more underidentified equations. Consequently, before beginning the three steps of
3SLS, we must determine which equations are underidentified, and not include
them in the 3SLS procedure.
Notice that in estimating the variances and cross-equation correlation of the disturbances, 3SLS relies on fitted values of the reduced forms in place of the endogenous variables themselves. But these OLS reduced form estimates do not incorporate the information embodied in the exclusion restrictions of the structural equations. This observation points the way to yet another estimator for simultaneous equations, full information maximum likelihood (FIML). We'll further discuss the FIML estimator after we explore an example of 3SLS.

3SLS and the Fulton Fish Market


Most econometric software packages perform 3SLS on request. To better understand 3SLS, let's return to Graddy's Fulton Fish Market data, which are in the file whiting.*** on this book's Web site (www.aw-bc.com/murray). Table 14.5 contains the three-stage least squares estimates of the supply and demand for whiting fish in the Fulton Fish Market. We still find a significant effect of price on demand, still do not precisely measure the effect of quantity on the supply price, and still do not reject the null hypothesis of perfectly elastic supply.

The parameter estimates do not change much between 2SLS (Table 14.3) and 3SLS (Table 14.5) in this particular case. Nor do the estimated standard errors. Neither changes much because the disturbances are only weakly correlated contemporaneously across equations. The reported estimated covariance of the residuals is 0.05. The estimated standard deviations of the disturbances in the demand and supply equations are 0.69 and 0.35, respectively (the standard errors of the regressions in Table 14.5). Thus the correlation coefficient of the contemporaneous disturbances is $0.05/(0.69 \cdot 0.35) \approx 0.21$.

Table 14.5  3SLS Estimation of the Supply and Demand for Whiting

System: SANDD
Estimation Method: Three-Stage Least Squares
Date: 11/09/02  Time: 16:30
Sample: 1 111
Included observations: 111
Total system (balanced) observations: 222
Linear estimation after one-step weighting matrix

            Coefficient    Std. Error    t-Statistic    Prob.
DEMAND
constant    8.527301       0.150803      56.54589       0.0000
PRICE       0.942795       0.335759      2.807956       0.0055
Day1        0.023597       0.200875      0.117470       0.9066
Day2        0.508797       0.195435      2.603406       0.0099
Day3        0.552579       0.200147      2.760865       0.0063
Day4        0.091847       0.195392      0.470066       0.6388
SUPPLY
constant    2.454753       5.599931      0.438354       0.6616
QTY         0.046738       0.120080      0.389219       0.6975
Windspd     1.011353       3.879499      0.260692       0.7946
Rainy       0.008251       0.088971      0.092734       0.9262
Cold        0.044219       0.074296      0.595178       0.5524
Stormy      0.377246       0.136818      2.757289       0.0063
Mixed       0.202540       0.091789      2.206580       0.0284
Windspd2    0.148608       0.672736      0.220901       0.8254

Determinant residual covariance    0.050408

Equation: QTY = C(1) + C(2)*PRICE + C(3)*DAY1 + C(4)*DAY2 + C(5)*DAY3 + C(6)*DAY4
Instruments: C DAY1 DAY2 DAY3 DAY4 WINDSPD RAINY COLD STORMY MIXED WINDSPD2
Observations: 111
R-squared             0.182118    Mean dependent var    8.523430
Adjusted R-squared    0.143172    S.D. dependent var    0.741672
S.E. of regression    0.686529    Sum squared resid     49.48880
Durbin-Watson stat    1.341118

Equation: PRICE = C(7) + C(8)*QTY + C(9)*WINDSPD + C(10)*RAINY + C(11)*COLD
            + C(12)*STORMY + C(13)*MIXED + C(14)*WINDSPD2
Instruments: C DAY1 DAY2 DAY3 DAY4 WINDSPD RAINY COLD STORMY MIXED WINDSPD2
Observations: 111
R-squared             0.197158    Mean dependent var    0.193681
Adjusted R-squared    0.142596    S.D. dependent var    0.381935
S.E. of regression    0.353657    Sum squared resid     12.88252
Durbin-Watson stat    0.727180


Full Information Maximum Likelihood


Three-stage least squares does not incorporate all the identifying information found in an overidentified model. The reduced form estimates used in the first stage are not forced to match the reduced form implied by the parameter estimates from the third stage. Full information maximum likelihood uses all of the identifying information found in an overidentified model. Full information maximum likelihood (FIML) jointly estimates all the structural parameters of a model by maximum likelihood, subject to all the identifying restrictions contained in the model. Maximum likelihood estimation is a strategy that estimates parameters by selecting the parameter values that make the observed sample data least surprising; that is, the observed data would be less likely to arise for any alternative parameter values than they are for the maximum likelihood estimates of the parameters. (Supplement 4 on this book's companion Web site, www.aw-bc.com/murray, contains an extensive discussion of maximum likelihood estimation.) Full information maximum likelihood estimates the covariance structure of the disturbances across equations jointly with the parameters of the equations themselves.
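Using the matrix notation of Appendix 14.A, where the structural system is $Y\Gamma = XB + E$ and each row $\varepsilon_t$ of E is normal with covariance matrix $\Sigma$, the log-likelihood that FIML maximizes over $\Gamma$, B, and $\Sigma$ can be sketched as

$$\ln L(\Gamma, B, \Sigma) = -\frac{TG}{2}\ln(2\pi) + T\ln\lvert\det\Gamma\rvert - \frac{T}{2}\ln\lvert\det\Sigma\rvert - \frac{1}{2}\sum_{t=1}^{T}(Y_t\Gamma - X_tB)\,\Sigma^{-1}\,(Y_t\Gamma - X_tB)',$$

where $Y_t$ and $X_t$ are the t-th rows of Y and X. (This is the standard normal-disturbance likelihood for a simultaneous system, not an expression reproduced from this text.) The Jacobian term $T\ln\lvert\det\Gamma\rvert$ is what distinguishes this likelihood from a seemingly unrelated regressions likelihood, and the identifying restrictions enter as zero constraints on elements of $\Gamma$ and B.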
Table 14.6 contains the FIML estimates of the whiting fish demand and supply
equations. There is little difference between the 3SLS and FIML estimates in this
case.
Three-stage least squares yields different estimates than does full information maximum likelihood, but when the disturbances are normally distributed, both 3SLS and FIML are asymptotically efficient estimators. Full information maximum likelihood is computationally burdensome to perform, so three-stage least squares is the most commonly used full information estimator.

Full information estimators jointly estimate all the parameters of a system in one procedure. A peril of this approach is that misspecification of one equation generally undermines the consistency of the estimates for all the equations. Econometricians have, over time, grown increasingly wary of this pitfall of full information methods, and they consequently rely on full information methods less and less. However, when the specification of the entire system of equations is particularly trustworthy, the efficiency gains of full information methods sometimes warrant their use.

Limited Information Maximum Likelihood


FIML has a limited information estimator cousin, called limited information maximum likelihood. To avoid contagion from misspecified equations, LIML estimates each structural equation separately. Limited information maximum likelihood (LIML) couples each structural equation with the reduced form equations for that equation's endogenous explanators to create a mini-system of equations, and it estimates the parameters of that system by maximum likelihood.
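In notation echoing Appendix 14.A (a sketch, not the text's own display), the mini-system for the first structural equation couples that equation with the reduced forms of its endogenous explanators:

$$Y_1 = Y_{in1}\,\delta + X_{in1}\,\beta + \varepsilon_1, \qquad Y_{in1} = X\,\Pi_{in1} + V_{in1},$$

where $\delta$ and $\beta$ collect the first equation's structural coefficients. LIML maximizes the likelihood of this small system alone, treating $(\varepsilon_1, V_{in1})$ as jointly normal and leaving the other structural equations unspecified.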

Table 14.6  FIML Estimates of the Supply and Demand for Whiting

System: SANDD
Estimation Method: Full Information Maximum Likelihood (Marquardt)
Date: 11/09/02  Time: 16:37
Sample: 1 111
Included observations: 111
Total system (balanced) observations: 222
Convergence achieved after 31 iterations

            Coefficient    Std. Error    z-Statistic    Prob.
DEMAND
constant    8.535494       0.187009      45.64224       0.0000
PRICE       0.936917       0.370707      2.527375       0.0115
Day1        0.003248       0.223671      0.014522       0.9884
Day2        0.521458       0.225277      2.314741       0.0206
Day3        0.561473       0.236276      2.376338       0.0175
Day4        0.097179       0.253373      0.383541       0.7013
SUPPLY
constant    3.087104       5.762400      0.535732       0.5921
QTY         0.100217       0.147332      0.680212       0.4964
Windspd     1.112579       3.936755      0.282613       0.7775
Rainy       0.010789       0.093329      0.115603       0.9080
Cold        0.048856       0.089252      0.547389       0.5841
Stormy      0.392793       0.157426      2.495097       0.0126
Mixed       0.214600       0.106364      2.017603       0.0436
Windspd2    0.163582       0.687638      0.237889       0.8120

Log Likelihood                     144.3302
Determinant residual covariance    0.055259

Equation: QTY = C(1) + C(2)*PRICE + C(3)*DAY1 + C(4)*DAY2 + C(5)*DAY3 + C(6)*DAY4
Observations: 111
R-squared             0.183640    Mean dependent var    8.523430
Adjusted R-squared    0.144766    S.D. dependent var    0.741672
S.E. of regression    0.685890    Sum squared resid     49.39672
Durbin-Watson stat    1.346313

Equation: PRICE = C(7) + C(8)*QTY + C(9)*WINDSPD + C(10)*RAINY + C(11)*COLD
            + C(12)*STORMY + C(13)*MIXED + C(14)*WINDSPD2
Observations: 111
R-squared             0.137879    Mean dependent var    0.193681
Adjusted R-squared    0.079289    S.D. dependent var    0.381935
S.E. of regression    0.366480    Sum squared resid     13.83371
Durbin-Watson stat    0.795456


Estimating the structural equation with its associated reduced form equations is less apt to contaminate a well-specified structural equation with the misspecification of another equation, because the reduced form equations are much less likely to be misspecified than the various structural equations. There is no risk of mistakenly omitting a relevant exogenous variable in a reduced form equation because reduced form equations always include all of the exogenous variables. Structural equations, in contrast, are often misspecified by unwarranted exclusions.

LIML performs FIML on a subset of equations. An intuitive attraction of LIML is that it can exploit any correlations between the one structural equation's disturbances and the disturbances of the associated reduced form equations. Despite this seeming advantage in theory, LIML and 2SLS have the same asymptotic properties. In practice, Monte Carlo studies in the literature suggest that LIML approaches its asymptotic normal distribution more quickly than 2SLS and that LIML often outperforms 2SLS in small samples. Nonetheless, 2SLS remains the limited information estimator used more often, perhaps because its theoretical small-sample statistical properties are somewhat more attractive than those of LIML, or perhaps because it is easier to implement in some econometric software packages.

An Organizational Structure for the Study of Econometrics

1. What Is the DGP?
2. What Makes a Good Estimator?
3. How Do We Create an Estimator?
   Limited information methods:
      Indirect least squares (ILS)
      Two-stage least squares (2SLS)
      Limited information maximum likelihood (LIML)
      Seemingly unrelated regression estimation (SUR)
   Full information methods:
      Three-stage least squares (3SLS)
      Full information maximum likelihood (FIML)
4. What Are an Estimator's Properties?
   The perils of full information methods: inherited misspecification bias
5. How Do We Test Hypotheses?


Summary
This extension of Chapter 14 first introduces an estimation procedure for jointly estimating several nonsimultaneous regression equations when their disturbances are correlated. It then introduces two more estimators, the three-stage least squares (3SLS) and full information maximum likelihood (FIML) estimators, which estimate all the parameters of a system of simultaneous equations jointly. Although 3SLS and FIML are more efficient than 2SLS, they also risk transmitting specification biases across equations. A correctly specified equation is consistently estimated by 2SLS, even if all the other equations in the model are misspecified. But 3SLS and FIML estimates of properly specified equations may be biased by a single misspecified equation in the system.

Concepts for Review

Full information estimators
Full information maximum likelihood (FIML)
Limited information estimators
Limited information maximum likelihood (LIML)
Seemingly unrelated regressions (SUR)
Three-stage least squares (3SLS)

(*** indicates a file on this book's companion Web site, www.aw-bc.com/murray.)

Questions for Discussion


1. Are any variables truly exogenous? Discuss.
2. When might FIML be inferior to 3SLS? Discuss.

Problems for Analysis


1. Lawrence Klein built one of the earliest models of the macro economy and estimated it for the United States. Klein's Model I contains (i) a consumption function in which consumption (C) depends on wages and salaries in the public and private sectors (Wg and Wp) and on property income (P):

$$C_t = \alpha_0 + \alpha_1(W^g_t + W^p_t) + \alpha_2 P_t + \varepsilon^C_t;$$

(ii) an investment equation in which investment (I) depends on current and lagged property income and on the initial stock of capital (K):

$$I_t = \beta_0 + \beta_1 P_t + \beta_2 P_{t-1} + \beta_3 K_t + \varepsilon^I_t;$$

and (iii) a wage equation in which wages depend on current and lagged national product (Y) and a time trend (T):

$$W_t = \gamma_0 + \gamma_1 Y_t + \gamma_2 Y_{t-1} + \gamma_3 T_t + \varepsilon^W_t.$$

The three identities in Model I are: (i) national private product (Y) equals consumption plus investment plus government spending (G) minus government wages (Wg):

$$Y_t = C_t + I_t + G_t - W^g_t;$$

(ii) national income (N), which equals national product minus net exports and taxes (X), equals wages plus property income:

$$N_t = Y_t - X_t = W^p_t + P_t;$$

and (iii) the change in the capital stock equals investment:

$$K_t = K_{t-1} + I_{t-1}.$$

The exogenous variables in Klein's system were government spending (G), government wages (Wg), indirect business taxes plus net exports (X), and a time trend (T). Additional predetermined variables were the capital stock, which was measured at the beginning of the year (K), lagged property income ($P_{t-1}$), and lagged national product ($Y_{t-1}$). The file Klein1.*** contains the data with which to estimate Klein's model.

a. Estimate each structural equation in Klein's model by OLS, 2SLS, 3SLS, and FIML. Briefly compare the results from the four procedures.

b. Using the 2SLS residuals, test any overidentifying restrictions in Klein's three structural equations.

c. Estimate the reduced form equations for the Klein model, one for each endogenous variable. Use the estimated reduced form to assess the effect of increased government spending on the level of national income. How do the structural equations add to your understanding of the effect of government spending on national income beyond what you learned from the reduced form?

Endnotes
1. Arnold Zellner, "An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias," Journal of the American Statistical Association 57 (1962): 500-509.


Appendix 14.A
A Matrix Algebra Representation of Systems of Equations

Just as matrix algebra provides compact representation and manipulation of the data for a single equation, it also provides a compact representation and manipulation of the data for a system of equations. This appendix uses matrix algebra to examine the identification of equations within systems of equations. The matrix algebra for representing two-stage least squares (2SLS), the most common procedure for estimating identified equations, is in Appendix 13.B.

14.A.1 A System of Equations

This appendix examines a system of G equations, each accounting for one endogenous variable. The system also contains (K + 1) predetermined variables. Predetermined variables include both exogenous variables, those determined outside of the system, and lagged dependent variables. This appendix limits its attention to DGPs in which all predetermined variables are nontroublesome. This excludes DGPs with troublesome lagged dependent variable explanators.

We would ordinarily write the structural equation that determines $Y_1$, the first endogenous variable, as

$$Y_{1t} = \sum_{j=2}^{G} Y_{jt}\alpha_j + \sum_{j=0}^{K} X_{jt}\beta_j + \varepsilon_t \qquad (t = 1, \ldots, T), \tag{14.A.1}$$

if the first endogenous variable depended on both the other endogenous variables and the predetermined variables. Some coefficients in Equation 14.A.1 might be equal to zero. Equation 14.A.1 is the starting point for describing a system of equations. This section adapts Equation 14.A.1 to describe such a system.

Accounting for Multiple Equations

Because there are G structural equations in our system, we need some further subscripts to denote that the coefficients and disturbances in Equation 14.A.1 are those from the first structural equation. Therefore, rewrite that equation as

$$Y_{1t} = \sum_{j=2}^{G} Y_{jt}\alpha_{j1} + \sum_{j=0}^{K} X_{jt}\beta_{j1} + \varepsilon_{1t} \qquad (t = 1, \ldots, T), \tag{14.A.2}$$


and rewrite it again for symmetry as

$$Y_{1t} - \sum_{j=2}^{G} Y_{jt}\alpha_{j1} = \sum_{j=0}^{K} X_{jt}\beta_{j1} + \varepsilon_{1t} \qquad (t = 1, \ldots, T),$$

or, even more symmetrically, as

$$-\sum_{j=1}^{G} Y_{jt}\alpha_{j1} = \sum_{j=0}^{K} X_{jt}\beta_{j1} + \varepsilon_{1t} \qquad (t = 1, \ldots, T), \tag{14.A.3}$$

in which $Y_{1t}$ is swallowed into the summation expression. Equation 14.A.3 is a form that could apply equally well to any structural equation, if we replace the 1s in Equation 14.A.3 with a number appropriate to the structural equation in question. We could, for example, write the i-th structural equation as

$$-\sum_{j=1}^{G} Y_{jt}\alpha_{ji} = \sum_{j=0}^{K} X_{jt}\beta_{ji} + \varepsilon_{it} \qquad (t = 1, \ldots, T). \tag{14.A.4}$$

For convenience, we define $\gamma_{ji} = -\alpha_{ji}$, so that Equation 14.A.4 becomes

$$\sum_{j=1}^{G} Y_{jt}\gamma_{ji} = \sum_{j=0}^{K} X_{jt}\beta_{ji} + \varepsilon_{it} \qquad (t = 1, \ldots, T). \tag{14.A.5}$$

To express Equation 14.A.5 in matrix form, define

$$Y = [Y_1\,Y_2\,\cdots\,Y_G], \qquad X = [X_0\,X_1\,\cdots\,X_K], \qquad E = [E_1\,E_2\,\cdots\,E_G],$$

in which the $Y_i$, the $X_i$, and the $E_i$ are $(T \times 1)$ column vectors containing the observations for the corresponding variables, and

$$\Gamma_i = [\gamma_{1i}\,\gamma_{2i}\,\cdots\,\gamma_{Gi}]', \qquad B_i = [\beta_{0i}\,\beta_{1i}\,\cdots\,\beta_{Ki}]'.$$

Notice that in $\Gamma_i$, $\gamma_{ii} = 1$. We can rewrite Equation 14.A.5 as

$$Y\Gamma_i = XB_i + E_i.$$

An even more compact notation combines all G structural equations into one matrix formulation. Define

$$\Gamma = [\Gamma_1\,\Gamma_2\,\cdots\,\Gamma_G]$$

and

$$B = [B_1\,B_2\,\cdots\,B_G]$$

and write

$$Y\Gamma = XB + E, \tag{14.A.6a}$$

or

$$Y\Gamma - XB - E = 0. \tag{14.A.6b}$$

Equation 14.A.6a is a matrix representation of G structural equations with K + 1 predetermined variables in the system. The matrix $(Y\Gamma - XB - E)$ in Equation 14.A.6b is a $(T \times G)$ matrix. Its columns contain the G structural relationships, with each row corresponding to one observation on all G structural relationships.

The Reduced Form

If the endogenous variables are genuinely determined in the system of equations defined by Equation 14.A.6, then we can solve that equation to express each endogenous variable as a function of the predetermined variables alone. When this is true, we say that the model in 14.A.6 is complete. We assume our system is complete, so we can solve Equation 14.A.6 for Y:

$$Y\Gamma\Gamma^{-1} = XB\Gamma^{-1} + E\Gamma^{-1}$$

or

$$Y = XB\Gamma^{-1} + E\Gamma^{-1} = X\Pi + N, \tag{14.A.7}$$

where $\Pi = B\Gamma^{-1}$ and $N = E\Gamma^{-1}$. Equation 14.A.7 is the reduced form for the model in Equation 14.A.6. Equation 14.A.7 implies that

$$B = \Pi\Gamma. \tag{14.A.8}$$

If the predetermined variables are not perfectly collinear, the reduced form equations are identified. Because the reduced form equations' explanators are all predetermined variables, we can consistently estimate the coefficients of those equations, the elements of $\Pi$, by OLS. In contrast, we may be unable to consistently estimate the coefficients of a structural equation like that in Equation 14.A.2,

$$Y_{1t} = \sum_{j=2}^{G} Y_{jt}\alpha_{j1} + \sum_{j=0}^{K} X_{jt}\beta_{j1} + \varepsilon_{1t} \qquad (t = 1, \ldots, T),$$


because the endogenous explanators may be correlated with the disturbances. If none of the predetermined variables are excluded from this equation, for example, there are no available instrumental variables for the endogenous variables with which to consistently estimate the equation's coefficients.

It is only the co-movement of the endogenous variables with the predetermined variables that can tell us about an individual structural equation's coefficients. Any co-movements among endogenous variables may be affected by feedbacks among the endogenous variables across equations. Consequently, all the information we have with which to identify the structural equations is contained in the reduced form relationships of Equation 14.A.7.

Structural Equations and Reduced Form Equations Revisited

The structural relationships of Equation 14.A.6 (a and b) contain G + K + 1 coefficients in each of the G structural relationships, or $G^2 + GK + G$ coefficients in all. The reduced form equations, on the other hand, contain K + 1 coefficients in each of G equations, or GK + G coefficients in all. Because the number of reduced form parameters is smaller than the number of structural coefficients, there are infinitely many different matrices that could serve as $\Gamma$ and B and satisfy $\Pi = B\Gamma^{-1}$.

In general, the reduced form does not uniquely determine the structural model. This is the essence of the identification problem in simultaneous equations models. Because it is the reduced form parameters that we know we can consistently estimate, we can only consistently estimate the structural parameters when they are retrievable from the reduced form. But when can we retrieve the coefficients for a particular structural equation from the reduced form equations?
The key to identifying the i-th structural equation lies in the i-th reduced form equation. Just as the i-th columns of the right- and left-hand sides of 14.A.6b yield the i-th structural equation, with dependent variable $Y_i$:

$$Y\Gamma_i - XB_i - E_i = 0, \tag{14.A.9}$$

the i-th column of $\Pi$, $\Pi_i$, yields the reduced form equation for the i-th endogenous variable, $Y_i$. We know that the i-th element of $\Gamma_i$ is 1. Identification requires further restrictions on $\Gamma_i$ and $B_i$. Consider, for example, the first structural equation.

Not all endogenous variables need appear in the first structural equation, nor do all predetermined variables need appear in the first equation. Let's divide the endogenous variables into three groups: $Y_1$ itself, $Y_{in1}$, and $Y_{out1}$. $Y_{in1}$ contains all the endogenous variables that appear as explanators with nonzero coefficients in the first structural equation. $Y_{out1}$ contains all the endogenous variables with zero coefficients in the first structural equation. $Y_{in1}$ is a $T \times G_{in1}$ matrix and $Y_{out1}$ is a $T \times G_{out1}$ matrix; $G_{in1} + G_{out1} + 1 = G$. For convenience, suppose that Y is arranged such that

$$Y = [Y_1\,Y_{in1}\,Y_{out1}]. \tag{14.A.10}$$

Similarly, suppose

$$X = [X_{in1}\,X_{out1}], \tag{14.A.11}$$

in which $X_{in1}$ contains all the predetermined variables that appear with nonzero coefficients in the first structural equation and $X_{out1}$ contains all the predetermined variables with zero coefficients in the first structural equation. $X_{in1}$ is a $T \times K_{in1}$ matrix and $X_{out1}$ is a $T \times K_{out1}$ matrix. $K_{in1} + K_{out1} = K + 1$.

It proves informative to rewrite 14.A.9 with the explicit division of Y and X into the groups given by Equations 14.A.10 and 14.A.11:

$$[Y_1\,Y_{in1}\,Y_{out1}]\begin{bmatrix} 1 \\ \Gamma_{in1} \\ \Gamma_{out1} \end{bmatrix} - [X_{in1}\,X_{out1}]\begin{bmatrix} B_{in1} \\ B_{out1} \end{bmatrix} - E_1 = 0, \tag{14.A.12}$$

in which $\Gamma_{in1}$ is a $G_{in1} \times 1$ vector containing the first structural equation's coefficients for the endogenous variables in $Y_{in1}$, $\Gamma_{out1}$ is a $G_{out1} \times 1$ vector containing the first structural equation's coefficients for the endogenous variables in $Y_{out1}$, and $B_{in1}$ and $B_{out1}$ are similarly defined. Because excluded variables have zero coefficients, we can rewrite Equation 14.A.12 as

$$[Y_1\,Y_{in1}\,Y_{out1}]\begin{bmatrix} 1 \\ \Gamma_{in1} \\ 0 \end{bmatrix} - [X_{in1}\,X_{out1}]\begin{bmatrix} B_{in1} \\ 0 \end{bmatrix} - E_1 = 0.$$

We can similarly rewrite the reduced form equation

$$Y = X\Pi + N$$

as

$$[Y_1\,Y_{in1}\,Y_{out1}] = [X_{in1}\,X_{out1}]\begin{bmatrix} \Pi_{in1,1} & \Pi_{in1,in} & \Pi_{in1,out} \\ \Pi_{out1,1} & \Pi_{out1,in} & \Pi_{out1,out} \end{bmatrix} + N,$$

in which

$$\Pi = \begin{bmatrix} \Pi_{in1,1} & \Pi_{in1,in} & \Pi_{in1,out} \\ \Pi_{out1,1} & \Pi_{out1,in} & \Pi_{out1,out} \end{bmatrix}$$

and

$\Pi_{in1,1}$ contains the reduced form coefficients for the predetermined variables that appear in the first structural equation, from the first endogenous variable's reduced form equation; it is $K_{in1} \times 1$;

$\Pi_{out1,1}$ contains the reduced form coefficients for the predetermined variables that do not appear in the first structural equation, from the first endogenous variable's reduced form equation; it is $K_{out1} \times 1$;

$\Pi_{in1,in}$ contains the reduced form coefficients for the predetermined variables that appear in the first structural equation, from the reduced form equations for the endogenous explanators included in the first structural equation; it is $K_{in1} \times G_{in1}$;

$\Pi_{in1,out}$ contains the reduced form coefficients for the predetermined variables that appear in the first structural equation, from the reduced form equations for the endogenous variables excluded from the first structural equation; it is $K_{in1} \times G_{out1}$;

$\Pi_{out1,in}$ contains the reduced form coefficients for the predetermined variables that do not appear in the first structural equation, from the reduced form equations for the endogenous explanators included in the first structural equation; it is $K_{out1} \times G_{in1}$; and

$\Pi_{out1,out}$ contains the reduced form coefficients for the predetermined variables excluded from the first structural equation, from the reduced form equations for the endogenous variables excluded from the first structural equation; it is $K_{out1} \times G_{out1}$.

With this more elaborate rendering of structural and reduced form equations, we can determine when Equation 14.A.12 is identified.

Identifying a Structural Equation

Recall that the first column of Equation 14.A.8 is

$$B_1 = \Pi\Gamma_1,$$

or

$$\begin{bmatrix} B_{in1} \\ 0 \end{bmatrix} = \begin{bmatrix} \Pi_{in1,1} & \Pi_{in1,in} & \Pi_{in1,out} \\ \Pi_{out1,1} & \Pi_{out1,in} & \Pi_{out1,out} \end{bmatrix}\begin{bmatrix} 1 \\ \Gamma_{in1} \\ 0 \end{bmatrix}. \tag{14.A.13}$$

We can rewrite the first row of Equation 14.A.13 as

$$B_{in1} = \Pi_{in1,1} + \Pi_{in1,in}\Gamma_{in1}, \tag{14.A.14}$$

which is a $K_{in1} \times 1$ vector. We can rewrite the second row as

$$0 = \Pi_{out1,1} + \Pi_{out1,in}\Gamma_{in1},$$

or

$$\Pi_{out1,1} = -\Pi_{out1,in}\Gamma_{in1}, \tag{14.A.15}$$

which is a $K_{out1} \times 1$ vector.

Notice that Equation 14.A.15 contains only the structural coefficients for the endogenous explanators in the first structural equation, $\Gamma_{in1}$, and reduced form parameters. If we can solve Equation 14.A.15 for $\Gamma_{in1}$ as a function of $\Pi_{out1,in}$ and $\Pi_{out1,1}$, the parameters in $\Gamma_{in1}$ are identified. Moreover, we could then substitute for $\Gamma_{in1}$ in Equation 14.A.14 and obtain $B_{in1}$ as a function of reduced form parameters alone. All the coefficients of the first structural equation, $B_{in1}$ and $\Gamma_{in1}$, would therefore be identified. When, then, can we solve Equation 14.A.15 for $\Gamma_{in1}$ and thereby identify the first structural equation?

We certainly cannot solve Equation 14.A.15 for $\Gamma_{in1}$ if $\Gamma_{in1}$ contains more unknown parameters than there are linear relationships in Equation 14.A.15. We need at least as many equations as unknowns in Equation 14.A.15. That is, $G_{in1}$ must be less than or equal to $K_{out1}$. For an equation to be identified, it must exclude at least as many predetermined variables as it includes endogenous explanators. This is the order condition for identification, necessary for identification, but not sufficient.

Nor can we generally solve Equation 14.A.15 by inverting $\Pi_{out1,in}$. If $K_{out1} > G_{in1}$, $\Pi_{out1,in}$ is not a square matrix, and so it has no inverse. However, in the special case in which $K_{out1} = G_{in1}$, $\Pi_{out1,in}$ is a square matrix; as long as its inverse exists, which it will if the columns of $\Pi_{out1,in}$ are not linearly dependent, we could form $\Pi_{out1,in}^{-1}$ and write

$$\Pi_{out1,in}^{-1}\Pi_{out1,1} = -\Pi_{out1,in}^{-1}\Pi_{out1,in}\Gamma_{in1} = -\Gamma_{in1}.$$

In this case, the first structural equation is exactly identified. When $K_{out1} > G_{in1}$, a similar condition determines whether the first structural equation is identified. Consider a case in which $K_{out1} > G_{in1}$. Suppose that we can select a subset of the rows of $\Pi_{out1,in}$ such that the resulting square matrix is invertible. Call this matrix $\Pi^*_{out1,in}$ and call its inverse $\Pi^{*-1}_{out1,in}$. Remove the same rows from $\Pi_{out1,1}$ and call the resulting matrix $\Pi^*_{out1,1}$.

We call the number of linearly independent columns in a matrix the column rank of the matrix. We call the number of linearly independent rows the row rank. Equation 14.A.15 tells us the relationship among these matrices:

$$\Pi^*_{out1,1} = -\Pi^*_{out1,in}\Gamma_{in1},$$

from which we can obtain

$$\Pi^{*-1}_{out1,in}\Pi^*_{out1,1} = -\Pi^{*-1}_{out1,in}\Pi^*_{out1,in}\Gamma_{in1} = -\Gamma_{in1}.$$

Which rows we delete from $\Pi_{out1,in}$ and $\Pi_{out1,1}$ does not matter, as long as the resulting matrix, $\Pi^*_{out1,in}$, is invertible. For example, in Section 14.5, in the discussion of overidentification, we learned that a particular reduced form led to two relationships between a structural parameter, $\beta_1$, and the reduced form parameters:

$$\beta_1 = \frac{\pi_{q0}}{\pi_{p0}}$$

and

$$\beta_1 = \frac{\pi_{q1}}{\pi_{p1}}.$$

The reduced form parameters lead to $\beta_1$ using either formula. Consequently, $\beta_1$ can be consistently estimated from consistent reduced form estimates; $\beta_1$ is identified. The surfeit of riches rules out using indirect least squares as our estimation procedure (in finite samples, there is no unique solution for the structural equations' coefficients from the reduced form equations), but $\beta_1$ is identified, so it can be consistently estimated by some available means.

When, for $M > R$ and an $M \times R$ matrix A, we can remove rows from the matrix and form an invertible $R \times R$ matrix, we say that the matrix A is of rank R. A sufficient condition for the i-th equation in a system to be identified is that the rank of $\Pi_{out_i,in_i}$ is $G_{in_i}$. This is the rank condition for identification. When the rank condition fails for an equation, there are multiple sets of coefficient values for that equation that are consistent with the system's reduced form; the equation is underidentified. When the rank condition is satisfied, there is only one set of coefficients for the equation that is consistent with the system's reduced form; the equation is identified.
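The order and rank conditions are mechanical enough to check with a few lines of linear algebra. Here is a minimal sketch, assuming the reduced form coefficient matrix $\Pi$ is laid out as in Equation 14.A.7, with one row per predetermined variable and one column per endogenous variable; the function name and arguments are ours, introduced for illustration.

```python
import numpy as np

def check_identification(Pi, in_cols, out_rows):
    """Order and rank conditions for one structural equation.

    Pi       : reduced-form coefficient matrix, (K+1) x G
    in_cols  : columns of Pi for the endogenous explanators included
               in the equation
    out_rows : rows of Pi for the predetermined variables excluded
               from the equation
    """
    # Pi_out,in: reduced-form coefficients linking the excluded
    # predetermined variables to the included endogenous explanators.
    Pi_out_in = Pi[np.ix_(out_rows, in_cols)]
    G_in = len(in_cols)
    K_out = len(out_rows)

    order_ok = K_out >= G_in                            # order condition
    rank_ok = np.linalg.matrix_rank(Pi_out_in) == G_in  # rank condition
    return order_ok, rank_ok
```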

Identification and Instrumental Variables (IV) Estimation

The identification of an equation logically precedes its estimation. If an equation is underidentified, its parameters cannot be estimated consistently. It follows that underidentification has implications for estimation procedures. In particular, underidentification makes instrumental variables estimates inconsistent. When the order condition fails (when there are fewer excluded predetermined variables than included endogenous variables), we have too few potential instruments for estimating the equation by instrumental variables. Failure of the order condition makes computing the IV estimator impossible; the estimator does not exist. We can hardly miss failures of the order condition when we perform IV estimation; the computer will tell us that we have tried to divide by zero or that some matrix is singular (that is, does not have an inverse).

When the order condition is satisfied, but the rank condition fails, we have the needed number of potential instruments, but we cannot form enough linearly independent combinations of them to make $(Z'X)$ invertible in large samples. A failure of the rank condition when the order condition is satisfied does not stop us from computing the IV estimator, even in very large samples; the sampling errors in estimating the reduced form coefficients will probably lead us to construct instruments that are not perfectly correlated within our samples. Nevertheless, the IV estimates are inconsistent in this case because the equation is not identified.
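A toy illustration of the order condition failing: with fewer instruments than regressors, $Z'X$ is not square, so the IV estimator $(Z'X)^{-1}Z'y$ cannot even be computed. The data below are made up purely to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
Z = rng.normal(size=(n, 2))                      # only two predetermined variables
X = np.column_stack([np.ones(n),                 # intercept plus
                     rng.normal(size=(n, 2))])   # two more regressors: three in all

ZtX = Z.T @ X            # 2 x 3: fewer instruments than coefficients
try:
    np.linalg.inv(ZtX)   # the IV estimator needs this inverse
except np.linalg.LinAlgError as err:
    print("Order condition fails:", err)
```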

Concepts for Review


Column rank
Row rank
