
Linear Dependencies Represented by Chain Graphs

Author(s): D. R. Cox and Nanny Wermuth


Source: Statistical Science, Vol. 8, No. 3 (Aug., 1993), pp. 204-218
Published by: Institute of Mathematical Statistics
Stable URL: http://www.jstor.org/stable/2245958
Accessed: 22-11-2017 16:44 UTC


Institute of Mathematical Statistics is collaborating with JSTOR to digitize, preserve and extend access to Statistical Science

This content downloaded from 163.1.41.45 on Wed, 22 Nov 2017 16:44:19 UTC
All use subject to http://about.jstor.org/terms
Statistical Science
1993, Vol. 8, No. 3, 204-283

Linear Dependencies Represented by Chain Graphs
D. R. Cox and Nanny Wermuth

Abstract. Various special linear structures connected with covariance matrices are reviewed and graphical methods for their representation introduced, involving in particular two different kinds of edge between the nodes representing the component variables. The distinction between decomposable and nondecomposable structures is emphasized. Empirical examples are described for the main possibilities with four component variables.

Key words and phrases: Chain model, conditional independence, covariance selection, decomposable model, linear structural equation, multivariate analysis, path analysis.

1. INTRODUCTION

This paper has three broad objectives. The first is to illustrate the rich variety of special forms of association and dependence that can arise even with as few as three or four variables. The second is to show the value of graphical representation in clarifying these dependencies; for this we introduce graphs with two different kinds of edge and some further features which are also new. The third objective is to show the importance in interpretation of the distinction between decomposable and nondecomposable models.

A series of examples will be used in illustration, partly to show that many of the special structures do indeed arise in applications and partly to show in outline the implications for interpretation, although reference to the subject matter literature is necessary for a full account. Most of the examples arise from recent investigations at the University of Mainz. For purposes of exposition we have chosen examples with at most four variables; that is, we have simplified by omitting mention of variables which analysis had shown to have no bearing on the points at issue.

We confine the discussion to those problems with essentially linear structure in which the interrelationships are adequately captured by the covariance matrix of the variables. Of course in applications, checks for nonlinearities and outliers are required, and these have been done for all examples whenever we had access to the raw data.

The need to discuss special structures arises partly because the relations of marginal independence and conditional independence expressed thereby are often of substantive interest and partly because in a saturated model with $p$ component variables, that is, one in which the covariance matrix is unrestricted other than being positive definite, there are $\tfrac{1}{2}p(p-1)$ correlations, and reduction of dimensionality may be desirable to avoid a superabundance of parameters. There are strong connections with, in particular, the long history of work in path analysis in genetics, in simultaneous equations in econometrics and linear structural models in psychometrics and with the body of recent work applying graph-theoretic ideas to the study of systems of conditional independencies arising especially in the study of expert systems.

In Section 2 we review some general properties of linear regression systems as related to the covariance matrix of the variables and stress the distinction between multivariate regression and block regression and between decomposable and nondecomposable structures. In Section 3 we introduce the main conventions useful in a graph-theoretic representation of the independency relations that may hold; in Section 4 we discuss relations with previous work, and in Section 5 we give a series of empirical examples for four variables. The paper concludes in Section 6 with some general discussion. The emphasis throughout is on the structure and interpretation of the various models rather than on the procedures for fitting.

D. R. Cox is Warden, Nuffield College, Oxford OX1 1NF, United Kingdom. Nanny Wermuth is Professor, Psychologisches Institut, Johannes Gutenberg-Universität Mainz, Postfach 3980, D-55099 Mainz, Germany.

2. SOME PROPERTIES OF COVARIANCE MATRICES

It is convenient to set out some properties of systems of linear least squares regressions derivable from a covariance matrix. These are full regression equations
in a multivariate normal distribution. There is throughout the usual interplay between relatively weak second-order properties of least squares regression and the strong properties derivable from an assumption of multivariate normality, such as that zero correlation or zero partial correlation implies independence or conditional independence.

We consider first the $p \times 1$ vector $Y = (Y_1, \ldots, Y_p)^T$ with mean $E(Y) = \mu$. We denote the positive definite covariance matrix by $\mathrm{cov}(Y) = \Sigma$, and its inverse, the concentration matrix, therefore by $\Sigma^{-1}$; the diagonal elements of $\Sigma$ are the variances ($\sigma_{ii}$), those of $\Sigma^{-1}$ are the precisions ($\sigma^{ii}$). The off-diagonal elements of $\Sigma$ are the covariances ($\sigma_{ij}$), those of $\Sigma^{-1}$ are the concentrations ($\sigma^{ij}$). A marginal correlation $\rho_{ij}$ is expressible via elements of the covariance matrix, in a way similar to that in which a partial correlation, $\rho_{ij.k}$, given all of the remaining variables $k = \{1, \ldots, p\} \setminus \{i, j\}$, is expressible via elements of the concentration matrix:

$$\rho_{ij} = \sigma_{ij}(\sigma_{ii}\sigma_{jj})^{-1/2}, \qquad \rho_{ij.k} = -\sigma^{ij}(\sigma^{ii}\sigma^{jj})^{-1/2}.$$

This implies in particular that in the usual notation (Dawid, 1979a) for independence,

$$Y_i \perp\!\!\!\perp Y_j \quad \text{if and only if } \sigma_{ij} = 0,$$
$$Y_i \perp\!\!\!\perp Y_j \mid Y_k \quad \text{if and only if } \sigma^{ij} = 0,$$

where as above $k = \{1, \ldots, p\} \setminus \{i, j\}$.

To study regression models, we partition $Y$ into $Y_a$ and $Y_b$, $p_a \times 1$ and $p_b \times 1$, respectively, $p_a + p_b = p$, and call the two parts response and explanatory variables. Let the covariance matrix and the concentration matrix be conformally partitioned:

(1)   $\Sigma = \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ab}^T & \Sigma_{bb} \end{pmatrix}, \qquad \Sigma^{-1} = \begin{pmatrix} \Sigma^{aa} & \Sigma^{ab} \\ (\Sigma^{ab})^T & \Sigma^{bb} \end{pmatrix};$

then the covariance matrix $\Sigma_{bb}$ of the explanatory variables and correspondingly their concentration matrix $\Sigma_{bb}^{-1} = \Sigma^{bb.a} = \Sigma^{bb} - (\Sigma^{ab})^T(\Sigma^{aa})^{-1}\Sigma^{ab}$ do not contain parameters needed to specify a standard regression model of $Y_a$ on $Y_b$. Instead, their observed counterparts are taken as fixed or indeed sometimes are fixed by sampling design.

We now distinguish between a multivariate regression and a block regression. To simplify the notation we shall, without essential loss of generality, often take $E(Y) = 0$. We describe the distinct parameters in the two types of regression models, that is, the two ways of parametrizing the conditional distribution of $Y_a$ given $Y_b$. For a multivariate regression of $Y_a$ on $Y_b$, that is, for $Y_a = \Pi_{a|b} Y_b + \varepsilon_a$ with $E(\varepsilon_a) = 0$, $E(\varepsilon_a Y_b^T) = 0$, the regression equation parameters $\Pi_{a|b}$ and the residual variance $\mathrm{var}(\varepsilon_a)$ can be written in a matrix as $(\Sigma_{aa.b}, \Pi_{a|b})$, where

(2)   $\Pi_{a|b} = \Sigma_{ab}\Sigma_{bb}^{-1}, \qquad \mathrm{var}(\varepsilon_a) = \Sigma_{aa.b} = \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ab}^T.$

In a saturated multivariate regression (2) each component of $Y_a$ is regressed separately on the full set of components $Y_b$.

On the other hand in a saturated block regression each component of $Y_a$ is regressed not only on $Y_b$ but also on all remaining components of $Y_a$. Then the regression equation parameters are instead proportional to the elements of the matrix $(\Sigma^{aa}, \Sigma^{ab})$ (Wermuth, 1992). The reason is that the expected value of a component $Y_i$ of $Y_a$ given all remaining variables of $Y$ can be obtained by taking expectations in

(3)   $\Sigma^{aa} Y_a + \Sigma^{ab} Y_b = \omega_a,$

where $E(\omega_a) = 0$, $\mathrm{var}(\omega_a) = \Sigma^{aa}$, after dividing the $i$th equation by the concentration $\sigma^{ii}$. Equation (3) is derived from a block triangular decomposition of the concentration matrix, $\Sigma^{-1} = A^T T^{-1} A$, where

(4)   $A = \begin{pmatrix} I_{aa} & (\Sigma^{aa})^{-1}\Sigma^{ab} \\ 0 & I_{bb} \end{pmatrix}, \qquad T^{-1} = \begin{pmatrix} \Sigma^{aa} & 0 \\ 0 & \Sigma^{bb.a} \end{pmatrix},$

as the first $p_a$ equations of $(T^{-1}A)(Y - E(Y)) = \omega$. The residuals $\omega$ have zero mean and covariance matrix $T^{-1}$.

For a block regression, the resulting coefficient of variable $Y_j$ in the $i$th equation is minus a partial regression coefficient given all remaining variables of $Y$, that is, given all remaining response and explanatory variables. On the other hand, in a multivariate regression the coefficient of $Y_j$ in the $i$th equation is a partial regression coefficient given all remaining variables of $Y_b$, that is, given all remaining explanatory variables. To express this distinction more formally, we write a partial regression coefficient $\beta_{ij.d}$ for $\{1, \ldots, p\} = a \cup b = \{\{i, j\}, d, g\}$ in terms of elements of the conditional covariance matrix of $(Y_i, Y_j)$ given $Y_d$ and of elements of the concentration matrix of $(Y_i, Y_j)$, having marginalized over $Y_g$, as

$$\beta_{ij.d} = \frac{\sigma_{ij.d}}{\sigma_{jj.d}} = -\frac{\sigma^{ij.g}}{\sigma^{ii.g}}.$$

Note that in the case of a block regression $g$ is empty and $d$ is the set of all remaining variables of $Y$, that is, $d = (a \cup b) \setminus \{i, j\}$, while in the case of a multivariate regression $d = b \setminus \{j\}$, and $g = a \setminus \{i\}$. Note further that

(5)   $Y_i \perp\!\!\!\perp Y_j \mid Y_d \quad \text{if and only if } \beta_{ij.d} = 0.$

To judge the relative strength of the dependence of a response on several explanatory variables, it is sometimes useful to compare the standardized regression coefficients, that is, $\beta^*_{ij.d} = \beta_{ij.d}\,\sigma_{jj}^{1/2}\sigma_{ii}^{-1/2}$.

One of the major distinctions between multivariate regression and block regression lies in the meaning of the relation between two components $Y_i$ and $Y_j$, both within $Y_a$, and in the meaning of the relation of a
component $Y_i$ from $Y_a$ to a component $Y_j$ from $Y_b$. To describe this in detail it is useful to recall how a partial regression coefficient relates to a partial correlation coefficient:

$$\beta_{ij.d} = \rho_{ij.d}\left(\frac{\sigma_{ii.d}}{\sigma_{jj.d}}\right)^{1/2} = \rho_{ij.d}\left(\frac{\sigma^{jj.g}}{\sigma^{ii.g}}\right)^{1/2}.$$

Thus, in a block regression, that is, where $d = (a \cup b) \setminus \{i, j\}$, the relation between $Y_i$ from $Y_a$ and $Y_j$ is measured essentially by the partial correlation given all remaining variables of $Y$, no matter whether $Y_j$ is from $Y_a$ or it is from $Y_b$. By contrast in a multivariate regression, that is, where $d = b \setminus \{j\}$, the measure of the relation of $Y_i$ from $Y_a$ to $Y_j$ from $Y_b$ is proportional to the partial correlation given the variables in $Y_b$ other than $Y_j$; the correlation between $Y_i$ and $Y_j$, both within $Y_a$, is given all variables in $Y_b$. Thus, a larger set of variables is considered simultaneously in block regression if compared with the corresponding multivariate regression. Written in matrix notation their parameters are related by

(6)   $\Pi_{a|b} = -(\Sigma^{aa})^{-1}\Sigma^{ab}, \qquad \Sigma_{aa.b} = (\Sigma^{aa})^{-1},$

(7)   $\Sigma^{ab} = -(\Sigma_{aa.b})^{-1}\Pi_{a|b}, \qquad \Sigma^{aa} = (\Sigma_{aa.b})^{-1}.$

Some of the special models we shall consider correspond to specifying some elements of regression equations to be zero, that is, to structures that appear simplified if compared with the saturated model. The choice between block regression and multivariate regression is then largely determined by the research questions and by a decision as to which of the two parametrizations permits a simpler description of the relations. For instance, in each of Examples 1, 2 and 7 of the empirical examples of Section 5 we can think of two variables as joint responses, $Y_a = (Y, X)^T$, and of two variables as explanatory, $Y_b = (V, W)^T$. A simplifying description is possible with block regression but not with multivariate regression in Example 1, while a simpler structure results with multivariate regression than with block regression in Examples 2 and 7.

If not only the conditional distribution of $Y_a$ given $Y_b$ is of interest, but the marginal relations among component variables within $Y_b$ as well, we are led to a simple type of regression chain model: we specify the joint density via

$$f_{ab} = f_{a|b} f_b,$$

and make a choice for $f_{a|b}$ between a multivariate and a block regression.

A specification of the joint distribution of $Y_a$, $Y_b$ by a saturated multivariate regression chain model has $(\Sigma_{aa.b}, \Pi_{a|b})$ as parameters for the conditional distribution of $Y_a$ given $Y_b$ and $\Sigma_{bb}$ for the marginal distribution of $Y_b$. With a saturated block regression chain model the parameters are the regression coefficients obtained as described above from $(\Sigma^{aa}, \Sigma^{ab})$ and the concentration matrix $\Sigma^{bb.a} = \Sigma_{bb}^{-1}$.

Considering, for instance, a multivariate regression chain model instead of a multivariate regression model can lead to a simpler structure. This is the case in Example 7 but not in Example 2 of Section 5 since the explanatory variables can be taken to be marginally uncorrelated in the former but not in the latter.

In the next more complex regression chain model the joint density of three (vector) variables $Y_a$, $Y_b$ and $Y_c$ is specified via

$$f_{abc} = f_{a|bc} f_{b|c} f_c,$$

that is, via a regression of $Y_a$ on $Y_b$ and $Y_c$, a regression of $Y_b$ on $Y_c$ and the marginal distribution of $Y_c$. This would be an adequate approach if the components of $Y_a$ are the response variables of primary interest having $Y_b$ and $Y_c$ as potential explanatory variables, if $Y_b$ plays the role of an intermediate variable containing potentially explanatory components for $Y_a$ and possible responses to $Y_c$ and, finally, if $Y_c$ consists of explanatory variables whose joint distribution is to be analyzed.

A particularly important family of regression chains are the univariate recursive regressions in which, for a given ordering of the components of $Y = (Y_1, \ldots, Y_p)^T$, we define the model via the regression of $Y_r$ on $Y_{r+1}, \ldots, Y_p$ for $r = 1, \ldots, q$; $q \le p - 1$. An independence hypothesis is said to be decomposable if it specifies one or more of the regression coefficients in such a system to be zero. Early descriptions of univariate recursive regressions have been given by Wright (1921, 1923) with an emphasis on applications in genetics and by Tinbergen (1937) for the study of business cycles.

By contrast a nondecomposable independence hypothesis consists of a set of $k$ independence relations for $k$ distinct variable pairs that cannot, in its entirety, be reexpressed in terms of vanishing coefficients in the above form: that is, no ordering of the variables would produce a decomposable independence hypothesis with the same implications from the same distributional assumption. The following arguments apply provided that there are no so-called forbidden states, that is, states of zero probability (Dawid, 1979a).

For instance, for a trivariate normal distribution of $Y$, $Z$, $X$ the hypothesis $Y \perp\!\!\!\perp X \mid Z$ and $X \perp\!\!\!\perp Z \mid Y$ corresponds to zero concentrations for pairs $(Y, X)$ and $(X, Z)$ and it implies $X \perp\!\!\!\perp (Y, Z)$. This hypothesis can be reexpressed by $Y \perp\!\!\!\perp X \mid Z$ and $X \perp\!\!\!\perp Z$, corresponding to $\beta_{yx.z} = \beta_{xz} = 0$ in a univariate recursive system for $(Y, X, Z)^T$. Thus the hypothesis is decomposable even though initially not expressed in that form. On the other hand, no ordering of the variables would permit us to specify the hypothesis $Y \perp\!\!\!\perp X$ and $Z \perp\!\!\!\perp U$ as zero restrictions in a univariate recursive regression system. Thus the hypothesis is nondecomposable. Further examples for nondecomposable hypotheses are discussed in Section 5.


Nondecomposable hypotheses arise in applications with four or more variables, as we shall see below, but suffer from a number of disadvantages both in terms of the difficulty of fitting and, more importantly, in terms of indirectness of interpretation. The need for such models was noted by Haavelmo (1943) who pointed out substantive research questions about relations which form a system of equations to be fulfilled simultaneously, but which are not a system of univariate recursive regressions. His subject matter example is as follows: consumption in an economy per year depends on total income, investment per year depends on consumption and total income is the sum of consumption and investment. A slightly simplified version of Haavelmo's argument for the simultaneous treatment of equations is given in Section 4. As a consequence of his results, the class of linear structural equations was developed to study simultaneous relations. It is mainly discussed in econometrics (Goldberger, 1964), in psychometrics (Jöreskog, 1973) and in sociology (Duncan, 1969); it includes univariate recursive regression systems and multivariate regressions as a subclass but, in general, a zero coefficient in a structural equation does not correspond to an independence relation. More generally the graphical representations to be introduced in Section 3 are equivalent to those used in path analysis and in discussions of structural equations only in rather special cases. We deal with this important point further in Section 4.

A representation in terms of univariate recursive regressions combines several advantages. First, and most importantly, it describes a stepwise process by which the observations could have been generated and in this sense may prove the basis for developing potential causal explanations. Second, each parameter in the system has a well-understood meaning since it is a regression coefficient: that is, it gives for unstandardized variables the amount by which the response is expected to change if the explanatory variable is increased by one unit and all other variables in the equation are kept constant. As a consequence, it is also known how to interpret each additional zero restriction: in the case of jointly normal variables, each added restriction introduces a further conditional independence, and it is known how parameters are modified if variables are left out of a system (Wermuth, 1989). Third, general results are available for interpreting structures, that is, for reading all implied independencies directly off a corresponding graph (Pearl, 1988; Lauritzen et al., 1990) and for deciding from the graphs of two distinct models whether they are equivalent (Frydenberg, 1990a). Fourth, an algorithm exists (Pearl and Verma, 1991; Verma and Pearl, 1992) which decides for arbitrary probability distributions and an almost arbitrary list of conditional independence statements whether the list defines a univariate recursive system; if it does, a corresponding directed acyclic graph is drawn. Fifth, the analysis of the whole association structure can be achieved with the help of a sequence of separate univariate linear regression analyses (Wold, 1954).

The word causal is used in a number of different senses in the literature; for a review see Cox (1992). Glymour et al. (1987) and Pearl (1988) have developed valuable procedures for finding relatively simple structures of conditional independencies which they define to be causal. We prefer to restrict the word to situations where there is some understanding of an underlying process. From this perspective it is unrealistic to think that causality could be established from a single empirical study or even from a number of studies of similar form. We aim, however, by introducing appropriate subject matter considerations into the empirical analysis, to produce descriptions and summaries of the data which point toward possible explanations and which in some cases of univariate recursive systems could be consistent with a causal explanation.

3. SOME GRAPHICAL REPRESENTATIONS

With only three component variables, the number of possible special independency models is fairly small but with four and more components there is a quite rich and potentially confusing variety of special cases to be considered. Graphical representation helps clarify the various possibilities, and it is convenient to introduce the key ideas and conventions in terms of three variables.

A systematic account of graphical methods by Whittaker (1990) emphasizes undirected graphs, that is, systems in which all variables are treated on an equal footing. Here we use largely directed graphs to emphasize relations of response and dependence; it is fruitful also to allow two different kinds of edge between the nodes of a graph and to introduce some additional special features.

First we introduce, where appropriate, a distinction between the response variables of primary interest, one or more levels of intermediate response variables, and explanatory variables, all in general with several component variables. The distinction between variable types is usually introduced on a priori subject matter considerations, for example via the temporal ordering of the variables. Sometimes, however, there are several such provisional interpretations and some may be suggested by the data under analysis. The distinction between variable types is expressed in the graphs via (c) below.

The following conventions have been used in constructing the graphs in this paper and are illustrated in their simplest form in Figures 1-3 for three variables:

(a) each continuous variable is denoted by a node, a circle;
(b) there is at most one connecting line between each pair of nodes, an edge;

(c) variables are graphed in boxes so that variables in one box are considered conditionally on all boxes to the right (in line with the notation $P(A \mid B)$ for the probability of $A$ given $B$) so that the response variables of primary interest are in the left-hand box and its explanatory variables are in boxes to the right;

(d) if full lines are used as edges, each variable is considered conditionally on other variables in the same box (as well as those to the right), whereas if dashed lines are used variables are considered ignoring other response variables in the same box, that is, marginally with respect to response variables in the same box;

(e) the absence of an edge means that the corresponding variable pair is conditionally independent, the conditioning set being as specified in (d);

(f) variables in the same box are to be regarded in a symmetrical way, for instance as both response variables, and connected by undirected edges (lines without arrowheads, for correlations), whereas relations between variables in different boxes are shown by directed edges (arrows, for regression coefficients) such that an arrow points from the explanatory variable to the response;

(g) graphs drawn with boxes represent substantive research hypotheses (Wermuth and Lauritzen, 1990) in which the presence of an edge means that the corresponding partial correlation is large enough to be of substantive importance. This corresponds to the notion that the model being represented is the simplest appropriate one in the sense that relations considered to be unimportant are not part of the model; graphs obtained by removing the boxes represent statistical models in which a connecting edge places no such constraint on the correlation, that is, it could also be a zero correlation;

(h) a row of unstacked boxes implies an ordered sequence of (joint) responses and (joint) intermediate responses, each together with their explanatory variables. Boxes are stacked if no order is to be implied, in order to indicate independence of several (joint) variables conditionally on all boxes to the right;

(i) if the right-hand box has two lines around it, then the relations among variables in this box are regarded as fixed at their observed levels; this is to indicate a regression model instead of a regression chain model, the latter containing parameters also for those components which are exclusively explanatory.

FIG. 1 (panels a-f). Six distributionally equivalent ways of specifying a saturated model for three variables. (a) Joint distribution of $Y$, $X$, $V$ with three substantial concentrations; (b) joint distribution of $Y$, $X$, $V$ with three substantial covariances; (c) multivariate regression chain model with regressions of $Y$ on $V$ and of $X$ on $V$ and with correlated errors; (d) block regression chain model with regressions of $Y$ on $X$, $V$ and of $X$ on $Y$, $V$; (e) univariate regression of $Y$ on $X$, $V$ and joint distribution of $X$, $V$; (f) univariate recursive regression system with $Y$ as response to $X$, $V$; $X$ as intermediate response to $V$. For instance, graph (e) with double lines round the right-hand box would represent the standard linear model for regression of $Y$ on fixed explanatory variables $X$, $V$.

FIG. 2 (panels a-d). Four distributionally equivalent ways of specifying $Y \perp\!\!\!\perp X \mid V$; (a) covariance selection model for $Y$, $X$, $V$ having parameters $\rho_{yv.x} \ne 0$, $\rho_{xv.y} \ne 0$, and $\rho_{yx.v} = 0$; (b) univariate recursive regression model with $\beta_{yv.x} \ne 0$, $\beta_{yx.v} = 0$, $\beta_{vx} \ne 0$; (c) block regression chain model with $Y$, $V$ as joint responses to $X$ and with independent parameters $\rho_{yv.x} \ne 0$, $\beta_{yx.v} = 0$, $\beta_{vx.y} \ne 0$; (d) two independent regressions of $Y$ on $V$ and of $X$ on $V$ with $\beta_{yv} \ne 0$, $\beta_{xv} \ne 0$, $\rho_{yx.v} = 0$.

FIG. 3 (panels a-d). Four distributionally equivalent ways of specifying $Y \perp\!\!\!\perp X$; (a) linear structure in covariances with $\rho_{yv} \ne 0$, $\rho_{xv} \ne 0$, $\rho_{yx} = 0$; (b) univariate recursive regression model with $\beta_{vx.y} \ne 0$, $\beta_{vy.x} \ne 0$, $\beta_{yx} = 0$; (c) multivariate regression chain model with $\rho_{yv.x} \ne 0$, $\beta_{vx} \ne 0$, $\beta_{yx} = 0$; (d) multiple regression of $V$ on independent regressors $Y$, $X$, with $\beta_{vy.x} \ne 0$, $\beta_{vx.y} \ne 0$, $\rho_{yx} = 0$.

In the present paper we use only graphs with edges of one type, that is, either all full lines or all dashed lines. It would be possible to have a mixture of the two types of edge in the same graph, for example provided that all the edges within one block are of the same type and all the edges directed at a particular block are of the same type.

In a sense the distinction between full and dashed edges serves a double purpose. The distinction between full and dashed arrows from one box to another determines the different conditioning sets used in the various regression equations under consideration. The
distinction between full and dashed lines within a box tions. That is, the coefficients p in these equations
specifies whether it is the concentration or the covari- have an interpretation as regression coefficients. Direct
ance matrix of the residuals that is the focus of interest. calculation shows that
In this sense the nature of the edges corresponds to
the parameters of interest.
cov(E,eyx) = cov(Y - pX,X - pY) = -p(l -p2)
The joint distribution of all variables is in the present which is nonzero unless p = 0. That is, the two regres-
context specified by the vector of means and the covari- sion equations imply correlated residuals except for
ance or the concentration matrix. However any such degenerate cases.
given matrix may correspond to a number of models On the other hand, if we were to adopt
with quite different interpretations in the light of the
distinction between types of variable as response, inter- Y-pX=Ey, X-pY=ex
mediate response or explanatory variable. A complete as structural equations with uncorrelated residuals,
graph, that is, one in which all edges are present, then another direct calculation shows that the regres-
represents a saturated model, that is, in the present sion of Y on X is
context a model without any specified independence
relations. E(Y I X = x)= E(YX) var(=y) + var( x)
To stress the distinction between the multivariate E(YIXx) X)x -a Cy p ar(x
regression and block regression contained in Figure 1,
which is not px, again unless p = 0. That is, the coeffi-
we write the corresponding equations explicitly. The
cients in these structural equations do not have an
multivariate regression equations implied by Figure lc
interpretation as regression coefficients, as was noted by
are
Haavelmo (1943).
E(Y IV = v) - -y = flyv (v -pv), To make the related point that missing edges in the
E(X V = v) -ax. = 8ixv (v -pv), graphical representation of linear structural equations
with (Van de Geer, 1971) do not in general have the indepen-
dency interpretation of chain graphs, consider the fol-
cov(8Y.U9 eX.V) = Pyx.v ( cyy.vcxx.v ) lowing two structural equations

By contrast the block regression equations implied by Y+ yXX+ yYV =y,


Figure ld are yXYY+X+ Yxww= ex,
E(Y X= x,V= v) -y illustrated in Figure 4. For correlated errors (ey, ex),
a count of parameters shows that this represents a
= -yx.v (XPx) + fiyv.x (v-v),
saturated model; that is, it allows an arbitrary covari-
E(XI Y = y, V =v)-llx ance matrix for (Y, X, V, W)T. That is, in particular, the
missing edges between V and X, and between W and
= /Xy.V(y -y) + 8XVY.(V -V)9 Y do not imply independencies, conditional or uncondi-
with tional. For some further discussion of possibilities for
interpreting the parameters in this model see Wermuth
fIyx. v = Pyx. v(ayy. v I axx.v ) ,f xy.v = Pyx.v ( lxx.v I ayy.v )
(1992) and Goldberger (1992). For linear structural
cov (ey.Xv, eX.yv) = -Pyx.v (ayy.xvaxx.yv) 1
equations in general, the interpretation of equation pa-
where the conditional variance of the variable given
all remaining variables is the reciprocal value of a
precision, for example, ayyxv = 1 / aYY. Relations be-
tween the sets of parameters in the two types of regres-
sions are given by Equations (6) and (7). K V Yyv Y

4. RELATIONS WITH PREVIOUS WORK Yxy Yyx

We illustrate the distinction between the graphical


chain models of the present paper and structural equa- \W tYxw X
tion models via two examplos. Suppose first that X
and Y are standardized to mean zero and variance one
and denote their correlation coefficient by p. Then
FIG. 4. Graphical representation of two structural equations in

Y=pX+egy X=PY+8x9 which the missing edges for (V, X) and (W, Y) do not correspond
to independencies and do not restrict the covariance matrix for
where (eys ex) are residuals from linear regression equa- (Y, X, W, V)T.

This content downloaded from 163.1.41.45 on Wed, 22 Nov 2017 16:44:19 UTC
All use subject to http://about.jstor.org/terms
210 D. R. COX AND N. WERMUTH

rameters, be they present or missing, has to be derived from scratch for each model
considered.

However, an interpretation in terms of independencies is available also for
structural equations, whenever such a model is distributionally equivalent to one of
the chain graph models, that is, if the same joint distribution holds for the two
types of models, possibly specified in two distinct ways, and the parameter vectors
of the two models are in one-to-one correspondence.

Three classes or families of models can be identified to have this property. These
are models that have a representation by a chain graph which is:

[1] a covariance graph, that is, a single box graph in which all present edges are
    undirected dashed lines, as in Figures 1b and 3a;
[2] a multivariate regression graph, that is, a two-box graph in which all present
    edges are dashed, being lines within and arrows between boxes, as in Figures 1c
    and 3c and in which the right-hand box has two lines around it, the distribution
    of its components being fixed.
[3] a univariate recursive regression graph, that is, a graph of q + 1 boxes, q of
    them with a single response variable and the right-hand box with p - q additional
    explanatory variables, as in Figures 1f, 2b and 3b. In addition the right-hand
    box has two lines around it to indicate that only the conditional distribution of
    $Y_1, \ldots, Y_q$ given the remaining variables is the model of interest.

The conventions (a) to (i) for constructing chain graphs imply for univariate
recursive regression graphs that arrows have the same interpretation no matter
whether they are all dashed or whether they are all full arrows. That is whenever
there are no proper joint responses in a model then dashed and full edge arrows are
interpreted in the same way.

To distinguish better between dashed and full-edge graphs when their interpretation
differs we suggest speaking further of:

[4] a concentration graph, that is, a single box graph in which all edges are
    undirected full lines, as in Figures 1a and 2a;
[5] a block regression graph, that is, a two-box graph in which all present edges are
    full, being lines within and arrows between boxes, as in Figures 1d and 2c and in
    which the right-hand box has two lines around it.

Then, a multivariate regression chain graph can be viewed as a combination of a
(sequence of) graph(s) [2] with [1] and a block regression chain graph as a
combination of a (sequence of) graph(s) [5] with [4]. More general chain graphs with
both types of edges result as further combinations of these four building blocks.

Univariate recursive regression graphs are essentially identical to the directed
acyclic graphs used in work on expert systems (Pearl, 1988). One of the latter
results from one of the former by replacing the complete undirected graph of the
explanatory variables by an acyclic orientation, that is, by a univariate recursive
regression graph in arbitrary order of the nodes and by discarding all boxes.

To investigate distributional equivalence it is helpful to use the notion of a
skeleton graph introduced by Verma and Pearl (1992). A skeleton graph is obtained
from our Figures by removing boxes and arrows and ignoring the type of edge. For
instance, the skeleton graphs in Figures 2a to 2d are all the same. If the skeletons
differ then the corresponding models cannot be equivalent. But if the skeletons are
the same, then the graphs may still imply different independencies, as in Figures 2
and 3.

Distributional equivalence to a model of univariate recursive regressions is closely
tied to our notion of a nondecomposable independence hypothesis. We speak of a
decomposable model if it is distributionally equivalent to a model of univariate
recursive regressions and of a nondecomposable model otherwise. Thus, all saturated
chain models for linear relations considered in this paper are decomposable, since
they all specify the same joint distribution (Figure 1). A nonsaturated model is
decomposable if and only if it contains not even one nondecomposable independence
hypothesis. In complex cases, such a model may contain large sections that are
decomposable and in analysis and interpretation account can be taken of that.

This notion of a decomposable model coincides with the notion of a decomposable graph
when this graph has undirected full edges, that is, when it is a concentration graph.
For variables with a joint normal distribution a concentration graph specifies a
covariance selection model (Dempster, 1972). Such a model is decomposable if and only
if the concentration graph is triangulated, that is, if it does not contain a
chordless n-cycle for $n \ge 4$ (Wermuth, 1980; Speed and Kiiveri, 1986). A sequence
of nodes $(a_1, \ldots, a_n)$ is said to form a chordless n-cycle in a chain graph if
only consecutive nodes and the endpoints of the sequence are connected by edges and a
chordless cycle in a sequence of four or more variables characterizes a
nondecomposable independence hypothesis in concentrations. An example is Form (i) for
(Y, X, V, W) discussed in Section 5. A special well-studied example of a decomposable
covariance selection model is represented by a chordless n-chain in concentrations,
that is, sequence of nodes $(a_1, \ldots, a_n)$ for which only consecutive nodes of
the sequence are connected by edges. This is a Markov


chain model. An example is Form (vi) for (Y, X, V, W) discussed in Section 5.

Figures 1-3 show that not only full-edge but also dashed-edge chain graph models can
be decomposable, that is, distributionally equivalent to a model of univariate
recursive regressions. We characterize situations in which this is not possible for
four variables in the next section.

5. SOME EMPIRICAL EXAMPLES

We now introduce eight special kinds of independence hypothesis for four variables,
together with their associated graphs, and illustrate most of them via empirical
examples. All involve two or more independency conditions. The special structures we
shall consider are as follows, the first three and the last two being
nondecomposable:

(i) $Y \perp\!\!\!\perp W \mid (X, V)$ and $X \perp\!\!\!\perp V \mid (Y, W)$,

(see Figures 5a and 5b) called the chordless four-cycle in concentrations and which
correspond to the vanishing of two elements in the concentration matrix, and hence to
a special case of the covariance selection models (Dempster, 1972). It can also be
viewed as a chordless four-cycle in a block regression chain model with joint
responses Y, X and joint explanatory variables V, W. Next we consider

(ii) $Y \perp\!\!\!\perp W \mid V$ and $X \perp\!\!\!\perp V \mid W$,

called a chordless four-cycle in a multivariate regression chain model (see Figure
6a) and which contains regressions of Y and X on V and W, being a special case of the
seemingly unrelated regressions of Zellner (1962);

(iii) $Y \perp\!\!\!\perp W$ and $X \perp\!\!\!\perp V$,

called the chordless four-cycle in correlations (see Figure 6b), a special case of
covariance matrices with linear structure (Anderson, 1973).

These may be contrasted with a decomposable model based on a recursive sequence of
univariate regressions with Y as response to X, V, W, with X as response to V, W and
with V as response to W and having restrictions on the same two variable pairs (see
Figure 7a)

(iv) $Y \perp\!\!\!\perp W \mid (X, V)$ and $X \perp\!\!\!\perp V \mid W$.

Four further cases, the first two decomposable, the last two not, are

(v) $Y \perp\!\!\!\perp X \mid (V, W)$ and $V \perp\!\!\!\perp W$,

two independent regressions of Y and X on two independent regressors V and W (see
Figure 7b);

(vi) $Y \perp\!\!\!\perp (V, W) \mid X$ and $X \perp\!\!\!\perp W \mid V$,

called a chordless four-chain in concentrations or a Markov chain (see Figures 8a and
8b), that is, a chordless four-chain in a system of univariate recursive regressions
again with Y as response to X, V, W, with X as response to V, W and with V as
response to W and having response Y and explanatory variable W as chain endpoints;

(vii) $Y \perp\!\!\!\perp W$ and $X \perp\!\!\!\perp V$ and $V \perp\!\!\!\perp W$,

called a chordless four-chain in covariances (see Figures 9a and 9b) or a chordless
four-chain in a multivariate regression chain model with Y, X as joint responses and
having explanatory variables V, W as chain endpoints;

FIG. 5. Block regression chain model (a) and covariance selection model (b) both
specifying the nondecomposable hypothesis (i): $Y \perp\!\!\!\perp W \mid (X, V)$ and
$X \perp\!\!\!\perp V \mid (Y, W)$.

FIG. 6. Multivariate regression chain model (a) specifying the nondecomposable
hypothesis (ii): $Y \perp\!\!\!\perp W \mid V$ and $X \perp\!\!\!\perp V \mid W$ and a
linear in covariances structure (b) specifying the nondecomposable hypothesis (iii):
$Y \perp\!\!\!\perp W$ and $X \perp\!\!\!\perp V$.

FIG. 7. Univariate recursive regressions (a) specifying (iv):
$Y \perp\!\!\!\perp W \mid (X, V)$ and $X \perp\!\!\!\perp V \mid W$ and independent
multiple regressions with independent explanatory variables (b) specifying (v):
$Y \perp\!\!\!\perp X \mid (V, W)$ and $V \perp\!\!\!\perp W$.

FIG. 8. Univariate recursive regressions (a) and covariance selection model both
specifying the decomposable hypothesis (vi): $Y \perp\!\!\!\perp (V, W) \mid X$ and
$X \perp\!\!\!\perp W \mid V$.


(viii) $Y \perp\!\!\!\perp W \mid (X, V)$ and $X \perp\!\!\!\perp V \mid (Y, W)$ and
$V \perp\!\!\!\perp W$;

called a chordless four-chain in a block regression chain model with Y, X as joint
responses and having explanatory variables V, W as chain endpoints. The corresponding
chain graph has the same shape as the graph in Figure 9a, but dashed lines and arrows
are replaced by full lines and arrows.

FIG. 9. Multivariate regression chain model (a) and a linear in covariances structure
(b) both specifying the nondecomposable hypothesis (vii): $Y \perp\!\!\!\perp W$ and
$X \perp\!\!\!\perp V$ and $V \perp\!\!\!\perp W$.

For our present purpose we give for each empirical example correlations and
standardized concentrations showing these as the lower and upper triangle,
respectively, such as in Table 1. This allows direct detection of linear marginal
independencies between pairs of variables, as shown by very small marginal
correlations, that is, standardized covariances, and linear conditional
independencies between pairs of variables given all remaining variables, as shown by
very small partial correlations, that is, standardized concentrations.

For a formal analysis, consistency of data with a particular structure would be
examined via a likelihood ratio test or its equivalent, typically comparing a maximum
likelihood fit of the constrained model with that of a saturated model. For the
present purposes, however, it is enough to rely on informal comparisons of marginal
correlations, partial correlations or standardized regression coefficients, although
such dimensionless measures are not in general appropriate for comparing different
studies.

Example 1 [Table 1, Figure 5, Form (i)]. Emotions as dispositions or traits of a
person and emotions as states, that is, as evoked by particular situations, are
notions central to research on stress and on strategies to cope with stressful
events. Questionnaires with which the state-trait versions of the emotions anxiety
and anger are measured have been developed by Spielberger et al. (1970, 1983). We
obtained data for 684 female college students from C. Spielberger on the variables Y,
state anxiety; X, state anger; V, trait anxiety and W, trait anger; summaries are
displayed in Table 1.

The upper corner of Table 1 shows close agreement with the Form (i):
$Y \perp\!\!\!\perp W \mid (X, V)$ and $X \perp\!\!\!\perp V \mid (Y, W)$, see also
Figures 5a and 5b. This nondecomposable model has the simple interpretation that
prediction of either state variable is not further improved by adding the other trait
variable to the remaining two explanatory variables but it does not directly suggest
a stepwise process by which the data might have been generated.

Example 2 [Table 2, Figure 6a, Form (ii)]. From a study of the status and reactions
of patients awaiting a particular kind of operation (Slangen, Kleeman and Krohne,
1992) we obtained as basic information for 44 female patients: Y, the ratio of
systolic to diastolic blood pressure; X, the diastolic blood pressure; both measured
in logarithmic scale; V, body mass, that is, weight relative to height, and W, age.
Table 2 shows substantial correlations except for a small marginal correlation of
pair (Y, W) and a small partial correlation of pair (X, V). These are not to be
directly interpreted if, as appears reasonable, each of the blood pressure variables
is regarded as a potential response to body mass and age. Instead, the standardized
regression coefficients in a saturated multivariate regression of Y, X on V, W
display possible independencies of interest. They show close agreement with Form
(ii): $Y \perp\!\!\!\perp W \mid V$ and $X \perp\!\!\!\perp V \mid W$, see also Figure
6a, with standardized regression coefficients

$$\begin{pmatrix} \beta^{*}_{xw.v} & \beta^{*}_{xv.w} \\ \beta^{*}_{yw.v} & \beta^{*}_{yv.w} \end{pmatrix}
= \begin{pmatrix} 0.486 & 0.040 \\ 0.037 & -0.2\ldots \end{pmatrix}$$

and from Table 2 correlated errors since

This nondecomposable model gives as interpretation that diastolic blood pressure
increases just with age

TABLE 1
Observed marginal correlations (lower half), observed partial correlations given two
remaining variables (upper half), means and standard deviations for n = 684 students

Variable                Y           X           V           W
                        State anx   State ang   Trait anx   Trait ang
Y := State anxiety       1           0.45        0.47       -0.04
X := State anger         0.61        1           0.03        0.32
V := Trait anxiety       0.62        0.47        1           0.32
W := Trait anger         0.39        0.50        0.49        1
Mean                    18.87       15.23       21.20       23.42
Standard deviation       6.10        6.70        5.68        6.57

Data for Example 1 to Form (i): $Y \perp\!\!\!\perp W$
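The relation between the two halves of Table 1 can be illustrated numerically. The following is a minimal sketch, not the authors' computation: it assumes that the upper-half entries are the off-diagonal elements of the inverse correlation matrix, sign-reversed and scaled to unit diagonal, and it uses only the two-decimal correlations printed in the lower half. The helper names `invert` and `partial_correlations` are hypothetical.

```python
# Marginal correlations from the lower half of Table 1, in the order (Y, X, V, W).
R = [
    [1.00, 0.61, 0.62, 0.39],
    [0.61, 1.00, 0.47, 0.50],
    [0.62, 0.47, 1.00, 0.49],
    [0.39, 0.50, 0.49, 1.00],
]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity; the right half becomes the inverse.
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(corr):
    """Partial correlation of each pair given all remaining variables."""
    conc = invert(corr)  # concentration matrix of the standardized variables
    n = len(conc)
    return [[1.0 if i == j else -conc[i][j] / (conc[i][i] * conc[j][j]) ** 0.5
             for j in range(n)] for i in range(n)]


names = ["Y", "X", "V", "W"]
P = partial_correlations(R)
for i in range(4):
    for j in range(i + 1, 4):
        print(f"{names[i]}{names[j]}: marginal {R[i][j]:+.2f}, partial {P[i][j]:+.2f}")
```

With the rounded input, the pairs (Y, W) and (X, V) are the only ones whose partial correlations come out near zero, which is the pattern required by Form (i); small discrepancies from the printed upper half are to be expected because the input is rounded to two decimals.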


TABLE 2
Observed marginal correlations (lower half), observed partial correlations given all
remaining variables (upper half), means and standard deviations for n = 44 patients

Variable                      Y           X            V           W
                              Lratio bp   Ldiast. bp   Body mass   Age
Y := Log (syst/diast) bp       1          -0.566       -0.241       0.300
X := Log diastolic bp         -0.544       1           -0.107       0.491
V := Body mass                -0.253       0.336        1           0.572
W := Age                      -0.131       0.510        0.608       1
Mean                           0.453       4.29         0.379      29.52
Standard deviation             0.091       0.13         0.060      10.59

Data for Example 2 to Form (ii): $Y \perp\!\!\!\perp W \mid V$ and
$X \perp\!\!\!\perp V \mid W$ and to Figure 6a.
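The standardized regression coefficients quoted in Example 2 can be approximated from the marginal correlations of Table 2. The sketch below assumes the usual least-squares formula for two standardized regressors, applied to the rounded correlations printed in the table; the function name `std_coefficients` is hypothetical and the result is an illustration rather than a reproduction of the authors' fit.

```python
# Marginal correlations from the lower half of Table 2.
r = {
    ("Y", "X"): -0.544, ("Y", "V"): -0.253, ("Y", "W"): -0.131,
    ("X", "V"): 0.336, ("X", "W"): 0.510, ("V", "W"): 0.608,
}


def std_coefficients(response, r):
    """Standardized least-squares coefficients of `response` regressed on V and W."""
    r_v, r_w, r_vw = r[(response, "V")], r[(response, "W")], r[("V", "W")]
    denom = 1.0 - r_vw ** 2
    coef_v = (r_v - r_w * r_vw) / denom  # coefficient of V, holding W fixed
    coef_w = (r_w - r_v * r_vw) / denom  # coefficient of W, holding V fixed
    return coef_v, coef_w


for response in ("Y", "X"):
    coef_v, coef_w = std_coefficients(response, r)
    print(f"{response} on V given W: {coef_v:+.3f}; {response} on W given V: {coef_w:+.3f}")
```

The coefficient of W in the regression of Y and the coefficient of V in the regression of X both come out close to zero, in line with Form (ii), while the other two coefficients remain substantial.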

after controlling for an increase in body mass and that the ratio of systolic to
diastolic blood pressure is higher the lower the body mass for persons of the same
age. But again, the model does not directly suggest a stepwise process by which the
data could have been generated.

Example 3 [Table 3, Figure 6b, Form (iii)]. In a study of strategies to cope with
stressful events Kohlmann (1990) collected data for 72 students replying to a German
and an American questionnaire. They are both intended to capture two similar
strategies: Y, cognitive avoidance and V, blunting are thought of as strategies to
reduce emotional arousal and X, vigilance and W, monitoring as strategies to reduce
insecurity. The data in Table 3 agree well with Form (iii): $Y \perp\!\!\!\perp W$ and
$X \perp\!\!\!\perp V$, see also Figure 6b, but not with (i) because in this case the
marginal correlations but not the partial correlations are small.

It is plausible to see strong positive correlations between both pairs of similar
strategies, a moderate negative correlation between each set of competing strategies
measured one way and no correlation between a strategy measured with one
questionnaire and the competing strategy measured with the other questionnaire.
However, this structure again cannot be reexpressed with zero regression coefficients
in any system of recursive univariate regressions; that is, it does not have a direct
explanation as a process by which the data could have been generated.

Pairs of forms from the above special cases (i) to (iv) are mutually exclusive
whenever the correlations of all variable pairs other than the two constrained pairs
(Y, W) and (X, V) are substantial although with limited data it is of course possible
that several different simplified structures are consistent with the data. An
exception where two different sets of the above conditions may hold simultaneously is
provided by (i) and (iii); that is, a chordless four-cycle in concentrations and in
correlations can occur together if a very special structure is present, that is, if
the marginal correlations in the population satisfy orthogonalities such as

$$\rho_{yw} = 0, \qquad \rho_{xv} = 0, \qquad
\rho_{yv}\rho_{vw} + \rho_{yx}\rho_{xw} = 0, \qquad
\rho_{yv}\rho_{yx} + \rho_{vw}\rho_{xw} = 0. \tag{8}$$

The next set of data is an example of this special case.

Example 4 [Table 4, Figures 5b and 6b, Forms (i) and (iii)]. In a study of effects of
working conditions on the manifestation of hypertension, Weyer and Hodapp (1979)
report the correlations among the four potential influencing variables displayed in
Table 4 for 106 healthy employees. The variables, which are measured with
questionnaires, are Y, nervousness; X, stress at work; V, satisfaction with work and
W, hierarchical status at work. The observations agree well with both (i):
$Y \perp\!\!\!\perp W \mid (X, V)$ and $X \perp\!\!\!\perp V \mid (Y, W)$ (see also
Figure 5b) and with (iii): $Y \perp\!\!\!\perp W$ and $X \perp\!\!\!\perp V$ (see also
Figure 6b). There is no immediate interpretation; however, one

TABLE 3
Observed marginal correlations (lower half) and observed partial correlations given
two remaining variables (upper half), means and standard deviations for n = 72
students

Variable                    Y              X           V          W
                            Cogn. avoid.   Vigilance   Blunting   Monitoring
Y := Cognitive Avoidance     1             -0.30        0.49       0.21
X := Vigilance              -0.20           1           0.21       0.51
V := Blunting                0.46           0.00        1         -0.25
W := Monitoring              0.01           0.47       -0.15       1
Mean                        17.49          12.57        3.71      10.40
Standard deviation           6.77           6.39        2.12       3.07

Data for Example 3 to Form (iii): $Y \perp\!\!\!\perp W$ and
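The contrast in Table 3 between small marginal and non-small partial correlations can be sketched as two undirected graphs on (Y, X, V, W). In the following illustration an edge is drawn whenever the absolute value exceeds a cut-off of 0.1; this cut-off is an assumption made purely for display and is not a formal test, and the function names are hypothetical. The search is written for exactly four nodes.

```python
from itertools import permutations

names = ["Y", "X", "V", "W"]
# Table 3: marginal correlations (lower half) and partial correlations (upper half).
marginal = {("Y", "X"): -0.20, ("Y", "V"): 0.46, ("Y", "W"): 0.01,
            ("X", "V"): 0.00, ("X", "W"): 0.47, ("V", "W"): -0.15}
partial = {("Y", "X"): -0.30, ("Y", "V"): 0.49, ("Y", "W"): 0.21,
           ("X", "V"): 0.21, ("X", "W"): 0.51, ("V", "W"): -0.25}

THRESHOLD = 0.1  # illustrative cut-off (an assumption), not a significance test


def edges(values, threshold=THRESHOLD):
    """Undirected edge set: pairs whose absolute value exceeds the threshold."""
    return {frozenset(pair) for pair, v in values.items() if abs(v) > threshold}


def chordless_four_cycles(edge_set, nodes=names):
    """Orderings (a, b, c, d) of four nodes forming a cycle a-b-c-d-a without a chord."""
    found = set()
    first = nodes[0]
    for b, c, d in permutations(nodes[1:], 3):
        cycle = [(first, b), (b, c), (c, d), (d, first)]
        chords = [(first, c), (b, d)]
        has_cycle = all(frozenset(e) in edge_set for e in cycle)
        has_chord = any(frozenset(e) in edge_set for e in chords)
        if has_cycle and not has_chord:
            # A cycle and its reversal are the same cycle; keep one representative.
            found.add(min((first, b, c, d), (first, d, c, b)))
    return sorted(found)


cov_graph = edges(marginal)   # dashed-edge (covariance) graph
conc_graph = edges(partial)   # full-edge (concentration) graph
print("chordless four-cycles, covariance graph:   ", chordless_four_cycles(cov_graph))
print("chordless four-cycles, concentration graph:", chordless_four_cycles(conc_graph))
```

Under this cut-off the covariance graph is the four-cycle through Y, V, W, X with the pairs (Y, W) and (X, V) unconnected, while the concentration graph is complete, matching the statement that the data agree with Form (iii) but not with Form (i).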


TABLE 4
Observed marginal correlations (lower half) and observed partial correlations given
two remaining variables (upper half) for n = 106 healthy employees

Variable                        Y         X        V         W
                                Nervous   Stress   Satisf.   Hier. Stat.
Y := Nervousness                 1         0.33     0.26      0.00
X := Stress at work              0.34      1        0.06      0.30
V := Satisfaction with work      0.27      0.04     1        -0.35
W := Hierarchical status         0.01      0.29    -0.34      1

Data for Example 4 to Forms (i) and (iii) and to Figures 5b and 6b simultaneously.
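The orthogonality conditions (8) can be checked informally against Table 4 by substituting the observed marginal correlations for the population values. This is only a sketch of such a check under that substitution; no allowance is made for sampling variability.

```python
# Marginal correlations from the lower half of Table 4.
r_yx, r_yv, r_yw = 0.34, 0.27, 0.01
r_xv, r_xw, r_vw = 0.04, 0.29, -0.34

# Sample analogues of the four expressions that (8) sets to zero.
conditions = {
    "r_yw": r_yw,
    "r_xv": r_xv,
    "r_yv*r_vw + r_yx*r_xw": r_yv * r_vw + r_yx * r_xw,
    "r_yv*r_yx + r_vw*r_xw": r_yv * r_yx + r_vw * r_xw,
}
for label, value in conditions.items():
    print(f"{label:>24s} = {value:+.4f}")
```

All four quantities are small in absolute value for these data, which is consistent with Forms (i) and (iii) holding together.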

explanation for this special structure is that a different combination of the
questionnaire items of X, V would lead to variables X*, V* such that the much simpler
structure $(X^*, Y) \perp\!\!\!\perp (V^*, W)$ holds (Cox and Wermuth, 1992a). For the
special structure (8) both the canonical correlations and the transformation matrix
to obtain X*, V* can be expressed in closed form.

Example 5 [Table 5, Figure 7b, Form (v)]. For an analysis of aggregate economic data
von der Lippe (1977) computed growth rates for 24 postwar years in Germany for Y,
employment; X, capital gains; V, private consumption and W, exports. The correlation
structure suggests that knowing the change in capital gain does not help in
predicting the change in employment for given change levels of the demand side, that
is, consumption and export (Wermuth, 1979); in addition, changes in consumption were
not correlated with changes in exports. This implies two independent responses to two
independent explanatory variables or close agreement to Form (v):
$Y \perp\!\!\!\perp X \mid (V, W)$ and $V \perp\!\!\!\perp W$; see also Figure 7b.

Example 6 [Table 6, Figure 8, Form (vi)]. In a conditioning experiment with 48
subjects (Zeiner and Schell, 1971), one purpose was to examine discrimination between
a noxious and an innocuous stimulus in two periods of a conditioning experiment with
Y, a long-interval discriminatory response (6-10 seconds); X, a short-interval
discriminatory response (1-5 seconds) in the light of earlier responses: V, the
strongest response in the first interval and W, the response to an innocuous stimulus
before the experiment itself; all responses are measured as skin resistance. The
correlations displayed in Table 6 suggest (Hodapp and Wermuth, 1983, p. 384) a Markov
structure (vi) in which $Y \perp\!\!\!\perp (V, W) \mid X$ and
$X \perp\!\!\!\perp W \mid V$, see also Figures 8a and 8b, and thus in which the
long-interval discriminatory response depends directly only on the short-interval
discriminatory response; this short-interval response is directly dependent on the
strongest response in the short interval and the latter is well predicted by just the
response to an innocuous stimulus before the experiment.

Example 7 [Table 7, Figure 9, Form (vii)]. From an

TABLE 5
Observed marginal correlations (lower half) and observed partial correlations given
two remaining variables (upper half) of growth rates for n = 24 postwar years in
Germany

Variable               Y            X              V             W
                       Employment   Capital gain   Consumption   Export
Y := Employment         1           -0.11           0.68          0.55
X := Capital gain       0.47         1              0.50          0.43
V := Consumption        0.67         0.55           1            -0.51
W := Export             0.44         0.39           0.04          1

Data for Example 5 to Form (v): $Y \perp\!\!\!\perp X \mid (V, W)$ and
$V \perp\!\!\!\perp W$ and to Figure 7b.

TABLE 6
Observed marginal correlations (lower half) and observed partial correlations given
two remaining variables (upper half) for n = 48 subjects

Variable                                    Y      X       V        W
                                            Long   Short   Strong   Innoc
Y := Long int. discriminatory response       1      0.70   -0.04    -0.12
X := Short int. discriminatory response      0.72   1       0.29     0.14
V := Strongest short interval response       0.30   0.54    1        0.62
W := Response to innocuous stimulus          0.19   0.43    0.71     1

Data for Example 6 to Form (vi): $Y \perp\!\!\!\perp (V, W) \mid X$ and
$X \perp\!\!\!\perp W \mid V$ and to Figures 8a and 8b.
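One informal way to read Table 6 against the Markov structure (vi) uses the fact that, for linear relations, a vanishing partial correlation of two variables given a third means that their marginal correlation equals the product of their correlations with that third variable. Along the chain Y, X, V, W the correlations of non-adjacent pairs are then products of the correlations of adjacent pairs. The sketch below compares these implied values with the observed ones; it is an illustration only and takes no account of sampling variability with n = 48.

```python
# Marginal correlations of adjacent pairs along the chain Y - X - V - W (Table 6).
r_yx, r_xv, r_vw = 0.72, 0.54, 0.71
# Observed marginal correlations of the non-adjacent pairs (Table 6).
observed = {"YV": 0.30, "XW": 0.43, "YW": 0.19}

# Values implied by Form (vi): products of correlations along the connecting path.
implied = {
    "YV": r_yx * r_xv,
    "XW": r_xv * r_vw,
    "YW": r_yx * r_xv * r_vw,
}
for pair in observed:
    print(f"{pair}: observed {observed[pair]:.2f}, implied by chain {implied[pair]:.2f}")
```

The implied and observed values are of similar size and the same sign, so this comparison does not contradict the Markov chain reading suggested by the partial correlations in the upper half of the table.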


TABLE 7
Observed marginal correlations (lower half) and observed partial correlations given
two remaining variables (upper half), means and standard deviations for n = 39
diabetic patients

Variable                      Y        X           V          W
                              GHb      Knowledge   Duration   Fatalism
Y := Glucose control, GHb      1       -0.431      -0.407     -0.262
X := Knowledge, illness       -0.344    1          -0.111     -0.517
V := Duration, illness        -0.404    0.042       1         -0.028
W := Fatalism, illness        -0.071   -0.460       0.060      1
Mean                          10.02    33.18      147.05      20.13
Standard deviation             2.07     7.86       92.00       5.75

Data for Example 7 to Form (vii): $Y \perp\!\!\!\perp W$, $X \perp\!\!\!\perp V$, and
$V \perp\!\!\!\perp W$ and to Figures 9a and 9b.

investigation of determinants of blood glucose control (Kohlmann et al., 1991), we
have data for 39 diabetic patients, who had at most 10 years of formal schooling. The
variables considered are Y, a particular metabolic parameter, the glycosylated
hemoglobin GHb; X, a score for particular knowledge about the illness, V, the
duration of illness in months, and W, a questionnaire score measuring the patient's
external attribution to "chance" of the occurrence of events related to the illness;
an attitude called external fatalism. The correlations in Table 7 suggest a structure
of the Form (vii), that is, with $Y \perp\!\!\!\perp W$, $X \perp\!\!\!\perp V$, and
$V \perp\!\!\!\perp W$, see also Figures 9a and 9b. One interpretation is that
duration of illness and external fatalism are independent explanatory variables in
two seemingly independent regressions, where metabolic adjustment is better (low
values of GHb) the longer the duration of the illness, knowledge about the illness is
lower the higher the external fatalism of a person, and after conditioning on
duration and fatalism the metabolic adjustment is still better the higher the
knowledge ($r_{yx.vw} = -0.431$).

6. DISCUSSION

There are a number of general issues arising from the special cases discussed in the
previous section, especially the extension to more than four component variables and
to models with other than only linear dependencies; for the latter see Cox and
Wermuth (1993).

Graphs with, in our notation, full edges have an elegant connection with the theory
of Markov random fields which allows general properties to be deduced. See Lauritzen
(1989) for a survey of these topics and Isham (1981) for a review of Markov random
fields in a broader context. Graphs with dashed edges, or possibly graphs with
mixtures of dashed and full edges, do not have the same general features, and it is
an open question as to what exactly can be said about them in generality.

There are four types of nondecomposable independence hypotheses illustrated in
Section 5 for four variables, namely:

(a) Nondecomposable hypotheses in block regression chain models [Form (i), Example 1,
Table 1, Figure 5a and Form (viii)]. In a block regression chain model the
components, even in the simplest case, are divided into responses $Y_a = (Y, X)$ and
explanatory variables $Y_b = (V, W)$ with a full directed arrow unless the
corresponding regression coefficient in (3) is zero and a full undirected line for
the explanatory variables unless they are marginally uncorrelated. For four variables
a nondecomposable independence hypothesis in a block regression chain model is
characterized by a chordless four-chain in the full edge chain graph, with the two
ends of the sequence being explanatory variables, that is, for (V, Y, X, W) in our
examples. Figure 5a with Form (i) gives an example of the four-cycle which contains
the described four-chain, while Form (viii) leads to an example of the chordless
four-chain;

(b) Nondecomposable hypotheses in concentrations [Form (i), Example 1, Table 1,
Figure 5b]. Models of zero concentrations, that is, the covariance selection models
of Dempster (1972), differ from the block regression models of (a) in treating all
variables on an equal footing, that is, having them in the same box where all edges
are full undirected lines unless the corresponding variables are partially
uncorrelated given the remaining component variables. For four variables a
nondecomposable hypothesis in concentrations is characterized by a chordless
four-cycle in the associated undirected graph of full edges, that is, in the
concentration graph. Figure 5b with Form (i) gives an example of a chordless
four-cycle in concentrations for (V, Y, X, W).

(c) Nondecomposable hypotheses in multivariate regression chain models [Form (ii),
Example 2, Table 2, Figure 6a and Form (vii), Example 7, Table 7, Figure 9a]. In
multivariate regression chain models the components are, as for (a), even in the
simplest case divided into responses $Y_a = (Y, X)$ and explanatory variables
$Y_b = (V, W)$ with a dashed directed arrow unless the corresponding regression
coefficient in (2) is zero, a dashed undirected line for the responses unless they
are partially uncorrelated given the explanatory variables, and a dashed undirected
line for the explanatory vari-


ables unless they are marginally uncorrelated. For four variables a nondecomposable
independence hypothesis in a multivariate regression chain model is characterized by
a chordless four-chain in the dashed edge chain graph with the two ends of the
sequence being explanatory variables, that is, for (V, Y, X, W) in our examples.
Figure 6a with Form (ii) gives an example of the four-cycle which contains the
described four-chain, while Figure 9a with Form (vii) gives an example of the
four-chain. Both are seemingly unrelated regressions (Zellner, 1962) together with a
specification for the distribution of the explanatory variables.

(d) Nondecomposable hypotheses in covariances [Form (iii), Example 3, Table 3, Figure
6b and Form (vii), Example 7, Table 7, Figure 9b]. Models of zero covariances, that
is, models for hypotheses linear in covariances (Anderson, 1973), have, as in (b), a
single block of variables. All edges are dashed undirected lines unless the
corresponding variables are marginally uncorrelated. For four variables a
nondecomposable independence hypothesis in covariances is characterized by a
chordless four-chain in the associated undirected graph of dashed edges, that is, in
the covariance graph. Figure 6b with Form (iii) gives an example of the four-cycle
which contains a chordless four-chain, while Figure 9b with Form (vii) gives an
example of the four-chain.

Models which contain even a single nondecomposable independence hypothesis cannot be
distributionally equivalent to a model of univariate recursive regressions. Our
examples illustrate that such nondecomposable structures arise in a number of
different contexts. There is need to identify them and to find explanations of how
they could have been generated. Criteria for establishing nondecomposability for more
than four variables are not yet published for general dashed-edge chain graphs, while
for full-edge chain graphs such criteria were given by Lauritzen and Wermuth (1989)
and for undirected dashed line graphs by Pearl and Wermuth (1993).

We have in this paper concentrated on the kinds of special structure that can arise,
especially on their specification and interpretation, rather than on the details of
fitting and assessing model adequacy. Under normal-theory assumptions
maximum-likelihood fitting and testing for nondecomposable models will call for
iterative procedures. A rather general asymptotically efficient noniterative
procedure based on embedding the model to be fitted in a saturated model is available
(Cox and Wermuth, 1990) either for direct use or as a starting point for iteration
(Jensen, Johansen and Lauritzen, 1991). Several issues are important for iterative
algorithms. Is there a global maximum or are there several local maxima? Which
conditions guarantee the existence of maximum-likelihood estimates? What are the
convergence properties of an algorithm? Again, more is known for models represented
by full-edged graphs (Speed and Kiiveri, 1986; Frydenberg and Edwards, 1989;
Frydenberg and Lauritzen, 1989; Edwards, 1992) than for models with dashed edge
graphs. Some of the latter may be fitted with algorithms suitable for linear
structural equations; for a discussion of different alternatives see Lee, Poon and
Bentler (1992).

For mixtures of discrete and continuous variables, models corresponding to chain
graphs with full edges have been intensively studied (Lauritzen and Wermuth, 1989;
Lauritzen, 1989; Frydenberg, 1990b; Wermuth and Lauritzen, 1990; Cox and Wermuth,
1992b; Wermuth, 1993), but for models corresponding to chain graphs with dashed edges
or possibly mixtures of dashed and full edges the extensions to discrete and mixtures
of discrete and continuous variables remain to be developed.

The issue of model choice in the analysis of data has too many ramifications to be
discussed satisfactorily in the present paper; some different suitable strategies for
analyses with a moderate number of variables are discussed in Wermuth and Cox (1992).
In general, if there is sufficient substantive knowledge to give a firm indication
both of the nature of the variables and of the independencies expected, then model
choice consists largely of testing the adequacy of the proposed model, in particular
in examining the supposedly zero correlations, concentrations and regression
coefficients. The less the guidance from subject matter considerations, the more
tentative will be the conclusions about model structure, but the broad principles of
variable selection in empirical regression discussed, for example, by Cox (1968) and
Cox and Snell (1974), will apply. In particular, where a number of different models
of roughly equal complexity give satisfactory fits to the data, all should be
incorporated in the conclusions, unless a choice can be made on subject matter
grounds.

There are many aspects of the study of multiple dependencies and associations not
addressed in the present paper. In particular the role of latent or hidden variables
in clarifying the interpretation of relatively complex structures has not been dealt
with, nor has the related matter of the effect of errors of observations in possibly
distorting dependencies. Finally, we reemphasize the point made in Section 3 that a
key argument for aiming for univariate recursive regressions consistent with subject
matter knowledge is that they suggest a stepwise process by which the data might have
been generated.

ACKNOWLEDGMENTS

We are grateful to Morten Frydenberg for his helpful comments, to the referees for
very constructive sugges-


tions and to the British-German Academic Research Collaboration Programme for
supporting our joint work.

REFERENCES

ANDERSON, T. W. (1973). Asymptotically efficient estimation of covariance matrices
    with linear structure. Ann. Statist. 1 135-141.
COX, D. R. (1968). Regression methods; notes on some aspects of regression analysis
    (with discussion). J. Roy. Statist. Soc. Ser. A 131 265-279.
COX, D. R. (1992). Causality; some statistical aspects. J. Roy. Statist. Soc. Ser. A
    155 291-301.
COX, D. R. and SNELL, E. J. (1974). The choice of variables in observational studies.
    J. Roy. Statist. Soc. Ser. C 23 51-59.
COX, D. R. and WERMUTH, N. (1990). An approximation to maximum-likelihood estimates
    in reduced models. Biometrika 77 747-761.
COX, D. R. and WERMUTH, N. (1992a). On the calculation of derived variables in the
    analyses of multivariate responses. J. Multivariate Anal. 42 167-172.
COX, D. R. and WERMUTH, N. (1992b). Response models for mixed binary and quantitative
    variables. Biometrika 79 441-461.
COX, D. R. and WERMUTH, N. (1993). Some recent work on methods for the analysis of
    multivariate observational data in the social sciences. In Conference Proceedings
    of the 7th International Conference on Multivariate Analysis, Pennsylvania State
    Univ., May 1992. North-Holland, Amsterdam. To appear.

    convergent algorithms for maximizing a likelihood function. Biometrika 78
    867-878.
JÖRESKOG, K. G. (1973). A general method for estimating a linear structural equation
    system. In Structural Equation Models in the Social Sciences (A. S. Goldberger
    and O. D. Duncan, eds.) 85-112. Seminar Press, New York.
KOHLMANN, C.-W. (1990). Streßbewältigung und Persönlichkeit. Huber, Bern.
KOHLMANN, C. W., KROHNE, H. W., KOSTNER E., SCHREZENMEIR, J., WALTHER, U. and
    BEYER, J. (1991). Der IPC-Diabetes-Fragebogen: ein Instrument zur Erfassung
    krankheitsspezifischer Kontrollüberzeugungen bei Typ-I-Diabetikern. Diagnostica
    37 252-270.
LAURITZEN, S. L. (1989). Mixed graphical association models (with discussion). Scand.
    J. Statist. 16 273-306.
LAURITZEN, S. L., DAWID, A. P., LARSEN, B. and LEIMER, H. G. (1990). Independence
    properties of directed Markov fields. Networks 20 491-505.
LAURITZEN, S. L. and WERMUTH, N. (1989). Graphical models for association between
    variables, some of which are qualitative and some quantitative. Ann. Statist. 17
    31-57.
LEE, S.-Y., POON, W.-Y. and BENTLER, P. M. (1992). Structural equation models with
    continuous and polytomous variables. Psychometrika 57 89-105.
PEARL, J. (1988). Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann,
    San Mateo, CA.
PEARL, J. and VERMA, T. S. (1991). A theory of inferred causation. In Principles of
    Knowledge Representation and Reasoning (J. A. Allen, R. Fikes and E. Sandewall,
    eds.). Morgan Kaufmann, San Mateo, CA.
PEARL, J. and WERMUTH, N. (1993). When can an association
DAWID, A. P. (1979a). Conditional independence in statistical graph admit a causal interpretation? In Conference Proceed-
theory (with discussion). J. Roy. Statist. Soc. Ser. B 41 ings of the 4th International Workshop on Artificial Intelli-
1-31. gence and Statistics, Fort Lauderdale, Florida. To appear.
DEMPSTER, A. P. (1972). Covariance selection. Biometrics 28 157- SLANGEN, K., KLEEMANN, P. P. and KROHNE, H. W. (1992). Coping
175. with surgical stress. In Attention and Avoidance; Strategies
DUNCAN, 0. D. (1969). Some linear models for two-wave, two- in Coping with Aversiveness (H. W. Krohne, ed.) 321-348.
variable panel analysis. Psychological Bulletin 72 177-182. Springer, New York.
EDWARDS, D. (1992). Graphical Modelling with MIM. Manual, SPEED, T. P. and KIIVERI, H. T. (1986). Gaussian Markov distri-
Univ. Copenhagen. butions over finite graphs. Ann. Statist. 14 138-150.
FRYDENBERG, M. (1990a). The chain graph Markov property. SPIELBERGER, C. D., GORSUCH, R. L. and LUSCHENE, R. E. (1970).
Scand. J. Statist. 17 333-353. Manual for the State-Trait Anxiety Inventory. Consulting
FRYDENBERG, M. (1990b). Marginalization and collapsibility in Psychologists Press, Palo Alto, CA.
graphical interaction models. Ann. Statist. 18 790-805. SPIELBERGER, C. D., RUSSELL, S. and CRANE, R. (1983). Assess-
FRYDENBERG, M. and EDWARDS, D. (1989). A modified iterative ment of anger. In Advances in Personality Assessment (J. N.
proportional scaling algorithm for estimation in regular expo- Butcher and C. D. Spielberger, eds.) 2 159-187. Erlbaum,
nential families. Comput. Statist. Data Anal. 8 143-153. Hillsdale, NJ.
FRYDENBERG, M. and LAURITZEN, S. L. (1989). Decomposition of TINBERGEN, J. (1937). An Econometric Approach to Business
maximum-likelihood in mixed interaction models. Bio- Cycle Problems. Hermann, Paris.
metrika 76 539-555. VAN DE GEER, J. P. (1971). Introduction to Multivariate Analysis
GLYMOUR, C., SCHEINES, R., SPIRTES, P. and KELLY, K. (1987). for the Social Sciences. Freeman, San Francisco.
Discovering Causal Structure. Academic, New York. VERMA, T. S. and PEARL, J. (1992). An algorithm for deciding if
GOLDIERGER, A. S. (1964). Econometric Theory. Wiley, New a set of observed independencies has a causal explanation.
York. In Uncertainty in Artificial Intelligence (D. Dubois, M. P.
GOLDBERGER, A. S. (1992). Models of substance; comment on "On Wellman, B. D'Ambrosio and P. Smets, eds.) 8 323-330.
block recursive linear regression equations," by N. Wermuth. Morgan Kaufmann, San Mateo, CA.
Revista Brasileira de Probabilidade e Estatistica 6 46-48. VON DER LIPPE, P. (1977). Beschaftigungswirkung durch Umver-
HAAVELMO, T. (1943). The statistical implications of a system of teilung? WSI-Mitteilungen 8 505-512.
simultaneous equations. Econometrica 11 1-12. WERMUTH, N. (1979). Datenanalyse und multiplikative Modelle.
HODAPP, V. and WERMUTH, N. (1983). Decomposable models: A Allgemeines Statistisches Archiv 63 323-339.
new look at interdependence and dependence structures in WERMUTH, N. (1980). Linear recursive equations, covariance se-
psychological research. Multivariate Behavioral Research 18 lection, and path analysis. J. Amer. Statist. Assoc. 75 963-
361-390. 972.
ISHAM, V. (1981). An introduction to spatial point processes WERMUTH,
and N. (1989). Moderating effects in multivariate normal
Markov random fields. Internat. Statist. Rev. 49 21-43. distributions. Methodika 3 74-93.
JENSEN, S. T., JOHANSEN, S. and LAURITZEN, S. L. (1991). Globally WERMUTH, N. (1992). On block-recursive regression equations

This content downloaded from 163.1.41.45 on Wed, 22 Nov 2017 16:44:19 UTC
All use subject to http://about.jstor.org/terms
218 D. R. COX AND N. WERMUTH

(with discussion). Revista Brasileira de Probabilidade e Es- tension. In Stress and Anxiety (I. G. Sarason and C. D.
tatistica 6 1-56. Spielberger, eds.) 6 337-349. Hemisphere, Washington, D.C.
WERMUTH, N. (1993). Association structures with few variables: WHITTAKER, J. (1990). Graphical Models in Applied Multivariate
characteristics and examples. In Theory and Methods for Statistics. Wiley, Chichester.
Population Health Research (K. Dean, ed.) 181-202. Sage, WOLD, H. 0. (1954). Causality and econometrics. Econometrica
London. 22 162-177.
WERMUTH, N. and Cox, D. R. (1992). Graphical models for de- WRIGHT, S. (1921). Correlation and causation. Journal of Agricul-
pendencies and associations. In Computational Statistics, tural Research 20 557-585.
Proceedings of the 10th Symposium on Computational Statis- WRIGHT, S. (1923). The theory of path coefficients: A reply to
tics, Neuchatel. (Y. Dodge and J. Whittaker, eds.) 1 235- Niles' criticism. Genetics 8 239-255.
249. Physica, Heidelberg. ZEINER, A. R. and SCHELL, A. M. (1971). Individual differences in
WERMUTH, N. and LAURITZEN, S. L. (1990). On substantive re- orienting, conditionality, and skin resistance responsitivity.
search hypotheses, conditional independence graphs and Psychophysiology 8 612-622.
graphical chain models (with discussion). J. Roy. Statist. ZELLNER, A. (1962). An efficient method of estimating seemingly
Soc. Ser. B 52 21-72. unrelated regressions and tests for aggregation bias. J.
WEYER, G. and HODAPP, V. (1979). Job-stress and essential hyper- Amer. Statist. Assoc. 57 348-368.

This content downloaded from 163.1.41.45 on Wed, 22 Nov 2017 16:44:19 UTC
All use subject to http://about.jstor.org/terms
