Irini Moustaki1
Statistics Department
Athens University of Economics and Business
Karl G Joreskog
Department of Information Science
Uppsala University
Dimitris Mavridis
Statistics Department
Athens University of Economics and Business
We consider a general type of model for analyzing ordinal variables with covariate
effects and two approaches for analyzing data for such models, the item response theory
(IRT) approach and the PRELIS-LISREL (PLA) approach. We compare these two approaches
on the basis of two examples, one involving only covariate effects directly on the
ordinal variables, and one involving covariate effects on the latent variables in addition.
INTRODUCTION
Latent variable models are used in social sciences for analyzing interrelationships
among observed variables. Latent variables are usually constructs that are not directly
measured such as intelligence, emotion, political belief, wealth, stress. Those unobserved
constructs are assumed to be measured through a set of observed variables also called
indicators. Methodology exists for analyzing discrete, continuous and categorical indicators.
The well known factor analysis model is a special case of a latent variable model with
indicators measured on a continuous scale.
The main idea behind latent variable models is that the latent variables account for
the dependencies among the indicators. The number of latent variables required is much
smaller than the number of indicators. In applications, it might be the case that additional covariates
or explanatory variables are required together with the latent variables to account for the
associations among the indicators. One might also be interested in measuring the effect of
explanatory variables (e.g. demographic variables) on the latent variables identified from
the model.
This paper looks at latent variable models for ordinal indicators that allow for
covariate effects both on the indicators and on the latent variables. We concentrate here on
ordinal indicators since they are frequently met in social applications. The paper does not
aim to develop new methodology for handling ordinal data but rather to compare existing
methodologies in the literature in terms of ease of fitting the models, model parameters
and goodness-of-fit. The two approaches are the structural equation modelling (SEM)
approach and the item response theory (IRT) approach. Regarding the SEM approach we will
concentrate here on the PRELIS-LISREL approach (PLA).
1 Request for reprints should be sent to Irini Moustaki, Athens University of Economics and Business,
Department of Statistics, 76 Patission Street, Athens 104 34, Greece. Email: moustaki@aueb.gr
A latent variable model consists of two parts. The first part accommodates the
effect of the latent variables and a set of observed covariates on the indicators; it is
called here the measurement model with direct effects, to distinguish it from the
measurement model that only allows for latent variables. The second part links a set of
observed covariates with the latent variables and is called the structural part
of the model. Therefore, covariates are allowed to affect the indicators indirectly through
the latent variables or directly. However, there might be situations where one would like to
model the effect of a set of covariates on the latent variables and the effect of a different
set of covariates directly on the indicators. In the applications section, we discuss an
example in which we are interested in measuring overall satisfaction (latent variable) with the
National Health Service in the respondent's area from five ordinal indicators, controlling for the
respondent's political affiliation (observed covariate). In addition we allow the covariates
age and gender to affect the latent construct satisfaction.
In the literature there are two main approaches for conducting latent variable analysis.
One is the underlying variable approach (UVA) developed within the structural equation
modelling framework (SEM) which provides a general model that allows for covariate effects.
The UVA is supported by commercial software such as LISREL (Joreskog & Sorbom, 1999),
EQS (Bentler, 1992) and Mplus (Muthen & Muthen, 2000). In this paper, we will use
LISREL for computations. The other approach is the item response theory approach (IRT).
Within the IRT approach, latent variable models have recently been extended to allow for
covariate effects. Verhelst, Glas, and Verstralen (1994), Zwinderman (1997) and Glas (2001)
discussed the Rasch or one-parameter logistic model with covariate effects, Sammel,
Ryan, and Legler (1997) discussed a unidimensional latent trait model for binary and
normal outcomes that allows for covariate effects and Moustaki (2003) discussed a
multidimensional model for ordinal indicators with covariate effects. The models presented in
Moustaki (2003) will be compared here with the SEM approach. A comparison between
LISREL type models and IRT models for ordinal indicators without covariate effects can be
found in Moustaki (2000) and Joreskog and Moustaki (2001). Here, we extend that work to
more complex models that allow for covariate effects.
In this paper, we present two examples that differ in terms of the covariate effects
included. The first example includes only direct effects of covariates on the ordinal indicators
and the second example includes both direct and indirect effects of covariates. The LISREL
software will be used for the UVA. The LISREL software has two main components: PRELIS
(a preprocessor of LISREL) and LISREL. In PRELIS, the covariance or correlation matrix
is estimated and the measurement and structural model are fitted in LISREL. For the item
response theory approach we have developed our own software.
Latent variable models with covariate effects contain a large number of parameters
and they are very complex. An alternative would be to estimate the effect of covariates
on latent variables in two stages. The measurement model is fitted first and factor scores
(latent scores) (Moustaki & Knott, 2000) are computed and used as dependent variables for
further analysis. Croon and Bolck (1997) found that the use of factor scores as observed
variables regressed on a set of covariates leads to biased estimates and therefore certain
corrections need to be applied.
NOTATION
Let $y' = (y_1, y_2, \dots, y_p)$ be a vector of $p$ ordinal indicators. Small letters are used
to denote both the variables and the values that these variables take. Let $c_i$ denote the
number of categories for the $i$th variable. The latent variables are denoted by a $q \times 1$ vector
$z' = (z_1, z_2, \dots, z_q)$ where $q < p$. We will distinguish between two different sets of covariates.
Those covariates that affect the latent variables are denoted by $w' = (w_1, w_2, \dots, w_k)$ and
those that affect the indicators directly by $x' = (x_1, x_2, \dots, x_r)$. Covariates can be of any
type such as metric or categorical (dummy variables).
[Figure: path diagram relating the covariates w1, w2 and x1, the latent variable z1 and the indicators y1, y2 and y3.]
Measurement model with direct effects
First, we model the associations among the y variables as explained by the latent
variables z and the covariates x. The covariates x and the latent variables z affect directly
the ordinal indicators. In addition, we allow the vector of covariates w to affect the vector of
latent variables z. The model and its estimation are discussed in detail in Moustaki (2003).
The probability of responding in a particular category $s$ is defined as the difference
between two cumulative probabilities:
$$\pi_{is}(z, x) = \gamma_{is}(z, x) - \gamma_{i,s-1}(z, x), \quad i = 1, \dots, p; \; s = 1, \dots, c_i \tag{1}$$
where $\gamma_{is}(z, x)$ is the cumulative probability of a response in category $s$ or lower of item $y_i$,
written as:
$$\gamma_{is}(z, x) = \pi_{i1}(z, x) + \pi_{i2}(z, x) + \dots + \pi_{is}(z, x)$$
The cumulative probability $\gamma_{is}(z, x)$ is modelled as a function of the latent variables
$z$ and the observed covariates $x$:
$$\mathrm{link}[\gamma_{is}(z, x)] = \mathrm{link}\, P(y_i \le s \mid z, x) = \alpha_s^{(i)} - \sum_{j=1}^{q} \alpha_{ij} z_j + \sum_{l=1}^{r} \beta_{il} x_l, \quad i = 1, \dots, p; \; s = 1, \dots, c_i \tag{2}$$
where $\alpha_s^{(i)}$, $\alpha_{ij}$, and $\beta_{il}$ are parameters to be estimated. To simplify notation we just write
$\gamma_{is}$.
The link function can be any monotonically increasing function that maps $(0, 1)$ onto
$(-\infty, \infty)$ such as the logit, the complementary log-log function, the inverse normal function,
the inverse Cauchy, or the log-log function. The parameters $\alpha_s^{(i)}$ are referred to as cut-points
on the logistic, probit or other scale, where $\alpha_1^{(i)} < \alpha_2^{(i)} < \dots < \alpha_{c_i}^{(i)}$, $\alpha_0^{(i)} = -\infty$ and
$\alpha_{c_i}^{(i)} = +\infty$. The $\alpha_{ij}$ parameters can be considered as discrimination parameters or factor
loadings since they measure the effect of the latent variables $z$ on some function of the
cumulative probability of responding up to a category of the $i$th item controlling for the
effect of the covariates $x$. In the one latent variable case the negative sign in front of the
slope parameter is used to indicate that as $z$ increases the response on the observed item $y_i$
is more likely to fall at the high end of the scale. The $\beta_{il}$ are regression coefficients.
Plots of the response probabilities and the cumulative probabilities for different
parameter values can be found in Moustaki (2003).
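As a rough illustration of equations (1) and (2), the following Python sketch computes the category probabilities of a single item under a logit link. The function name and all parameter values (cut-points, loading, regression coefficient) are hypothetical choices made for illustration; they are not estimates from this paper.

```python
import numpy as np

def category_probabilities(cutpoints, loadings, betas, z, x):
    """Category probabilities of one ordinal item under a logit link.

    cutpoints: the c_i - 1 finite cut-points, in increasing order.
    loadings, betas: coefficients of the latent variables z and covariates x.
    """
    cutpoints = np.asarray(cutpoints, dtype=float)
    # Linear predictor of equation (2): cut-point - sum(loading * z) + sum(beta * x)
    eta = cutpoints - np.dot(loadings, z) + np.dot(betas, x)
    gamma = 1.0 / (1.0 + np.exp(-eta))              # cumulative P(y_i <= s | z, x)
    gamma = np.concatenate(([0.0], gamma, [1.0]))   # boundary values 0 and 1
    return np.diff(gamma)                           # equation (1)

# Hypothetical item with four categories, one latent variable, one covariate.
probs_low = category_probabilities([-1.0, 0.5, 2.0], [1.2], [0.3], z=[-1.0], x=[0.0])
probs_high = category_probabilities([-1.0, 0.5, 2.0], [1.2], [0.3], z=[1.0], x=[0.0])
```

With these values the probability of the highest category is larger for z = 1 than for z = -1, which is the behaviour the negative sign in front of the slope parameter is meant to produce.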
Let $y = (y_1, y_2, \dots, y_p)$ represent the whole response pattern for a randomly selected
individual. The density function $f(y \mid x)$ of the manifest variables $y$ is:
$$f(y \mid x) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} g(y \mid z, x)\, h(z \mid w)\, dz \tag{3}$$
where $g(y \mid z, x)$ is the conditional density function of $y$ given $z$ and $x$ and $h(z \mid w)$ is the
density function of $z$ conditional on $w$. The latent variables are assumed to be independent
with normal distributions. The covariates $x$ are assumed to be fixed. The integrals in (3)
are approximated using Gauss-Hermite quadrature points. Other methods such as adaptive
quadrature, Laplace approximation or Monte Carlo methods can be used.
Under the assumption of conditional independence of $y$ given $z$ and $x$, the vector of
latent variables $z$ and the vector of observed covariates $x$ account for the interdependencies
among the observed ordinal variables, so that when the latent variables are held fixed the
responses to the $p$ observed variables are independent:
$$g(y \mid z, x) = \prod_{i=1}^{p} g(y_i \mid z, x). \tag{4}$$
For a manifest item $y_i$ the conditional probability of $(y_i \mid z, x)$ is given by:
$$g(y_i \mid z, x) = \prod_{s=1}^{c_i} \pi_{is}(z, x)^{y_{i,s}} = \prod_{s=1}^{c_i} (\gamma_{i,s} - \gamma_{i,s-1})^{y_{i,s}}, \tag{5}$$
where $y_{i,s} = 1$ if the response $y_i$ falls in category $s$ and $y_{i,s} = 0$ otherwise.
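The way equations (3) to (5) combine can be sketched in Python for the simplest case of one standard normal latent variable, a logit link and no covariates. This is an illustrative sketch rather than the software used in the paper: the function names, cut-points and loadings are hypothetical, and `numpy.polynomial.hermite.hermgauss` supplies the Gauss-Hermite nodes and weights.

```python
import itertools
import numpy as np

def item_probs(cutpoints, loading, z):
    """Category probabilities of one item given z (logit link, no covariates)."""
    eta = np.asarray(cutpoints, dtype=float) - loading * z
    gamma = np.concatenate(([0.0], 1.0 / (1.0 + np.exp(-eta)), [1.0]))
    return np.diff(gamma)

def pattern_probability(pattern, cutpoints, loadings, n_points=21):
    """Approximate the integral in (3) for one response pattern, z ~ N(0, 1)."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    z_values = np.sqrt(2.0) * nodes        # rescale nodes from exp(-t^2) to N(0, 1)
    z_weights = weights / np.sqrt(np.pi)   # rescaled weights sum to one
    total = 0.0
    for z, wt in zip(z_values, z_weights):
        g = 1.0
        for s, cuts, lam in zip(pattern, cutpoints, loadings):
            g *= item_probs(cuts, lam, z)[s - 1]   # product over items, as in (4)
        total += wt * g
    return total

# Hypothetical example: two items with three categories each.
cutpoints = [[-1.0, 0.5], [0.0, 1.5]]
loadings = [1.0, 0.7]
all_patterns = list(itertools.product([1, 2, 3], repeat=2))
probs = {p: pattern_probability(p, cutpoints, loadings) for p in all_patterns}
```

Summing `probs` over all nine patterns gives one up to rounding error, a quick check that the quadrature weights have been rescaled correctly. With more latent variables the same sum runs over a grid of nodes, which is why the computational burden grows quickly with the number of factors.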
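Structural model
Let us assume that the latent variables $z$ are related to a set of observed covariates $w$
in a simple linear form:
$$z = Dw + \epsilon \tag{6}$$
where $D$ is a $q \times k$ matrix of regression coefficients and $\epsilon$ is a vector of error terms.
For a random sample of size $n$ the log-likelihood is
$$L = \sum_{m=1}^{n} \log f(y_m \mid x_m) \tag{7}$$
where $m$ refers to the $m$th observation in the sample. The log-likelihood in (7) is maximized
using an E-M algorithm.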
In order for the model to be identified, a necessary condition is that the covariates x
that have direct effects on the indicators must be different from the covariates w that affect
the latent variables.
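In the underlying variable approach, each ordinal variable $y_i$ is assumed to arise from an
unobserved continuous variable $y_i^\star$ through
$$y_i = s \iff \tau_{s-1}^{(i)} < y_i^\star \le \tau_s^{(i)}, \quad s = 1, \dots, c_i \tag{8}$$
where
$$\tau_0^{(i)} = -\infty, \quad \tau_1^{(i)} < \tau_2^{(i)} < \dots < \tau_{c_i - 1}^{(i)}, \quad \tau_{c_i}^{(i)} = +\infty,$$
are parameters called thresholds. For variable $y_i$ with $c_i$ categories, there are $c_i - 1$ threshold
parameters.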
The measurement model is the classical factor analysis model extended with a new
term that contains the covariates $x$:
$$y_i^\star = \sum_{j=1}^{q} \lambda_{ij} z_j + \sum_{l=1}^{r} b_{il} x_l + u_i, \quad i = 1, 2, \dots, p, \tag{9}$$
or in matrix form
$$y^\star = \Lambda z + Bx + u \tag{10}$$
where $u_i$ is an error term representing a specific factor and measurement error and $y_i^\star$ is
an unobserved continuous variable underlying the ordinal variable $y_i$. In classical factor
analysis $y_i^\star$ is directly observed but here it is unobserved.
The structural part of the model is
$$z = Dw + \epsilon \tag{11}$$
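To make the data-generating process behind equations (9) to (11) concrete, the sketch below simulates ordinal responses in Python. It assumes that category s is observed when the underlying variable falls between the (s-1)th and sth thresholds; every numerical value (loadings, direct effects, error variances, thresholds, structural coefficient, sample size) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed so the sketch is reproducible
n = 1000                                  # hypothetical sample size

lam = np.array([0.8, 0.6, 0.7])           # hypothetical loadings, p = 3 items, q = 1
b = np.array([0.3, 0.0, -0.2])            # hypothetical direct effects of one covariate x
psi = np.array([0.36, 0.64, 0.51])        # hypothetical variances of the errors u_i
thresholds = np.array([-1.0, 0.0, 1.0])   # hypothetical thresholds, shared by all items
d = 0.5                                   # hypothetical structural coefficient

w = rng.normal(size=n)                    # covariate acting on the latent variable
x = rng.normal(size=n)                    # different covariate acting on the indicators
z = d * w + rng.normal(size=n)            # structural part, equation (11)
u = rng.normal(size=(n, 3)) * np.sqrt(psi)
y_star = np.outer(z, lam) + np.outer(x, b) + u   # measurement part, equation (9)

# Category s is observed when threshold_{s-1} < y* <= threshold_s (codes 1 to 4).
y = np.searchsorted(thresholds, y_star, side="left") + 1
```

Note that x and w are generated as distinct covariates, in line with the identification condition stated earlier. Only `y`, `x` and `w` would be available to the analyst; `y_star` and `z` are unobserved.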
PRELIS Step
Let $x^\star = (x, w)$. As before let $y$ $(p \times 1)$ be a vector of ordinal variables with underlying
variables $y^\star$. It is assumed that
$$y^\star \mid x^\star \sim N(\alpha + \Gamma x^\star, \Psi).$$
To fix the scale of $y^\star$ there are two equivalent specifications: the standard parameterization
and the alternative parameterization. In the applications section we use the standard
parameterization. This fixes the origin of $y^\star$ such that $\alpha$ is 0 and the unit of measurement of
$y^\star$ such that $\Psi$ is a correlation matrix, see Joreskog (2002).
The rows of $\alpha$ and $\Gamma$ and the diagonal elements of $\Psi$ are estimated from the univariate
margins and the off-diagonal elements of $\Psi$ are estimated from the bivariate margins, see
Joreskog (2002). Denoting these estimates as $\hat{\alpha}$, $\hat{\Gamma}$ and $\hat{\Psi}$, we have the following:
The estimated conditional covariance matrix of $y^\star$ for given $x^\star$ is $\hat{\Psi}$. In the standard
parameterization this is a correlation matrix.
The estimated unconditional covariance matrix of $y^\star$ is
$$\hat{\Gamma} S_{xx} \hat{\Gamma}' + \hat{\Psi},$$
where $S_{xx}$ is the sample covariance matrix of $x^\star$.
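The step from the conditional to the unconditional covariance matrix is a single matrix expression. The Python sketch below evaluates it for two underlying variables and one standardized covariate; the names `gamma_hat`, `psi_hat` and `s_xx` and all numbers are hypothetical stand-ins for PRELIS output, not values from the examples.

```python
import numpy as np

# Hypothetical estimates for two underlying variables and one covariate.
gamma_hat = np.array([[0.30],
                      [0.10]])            # regression of y* on the covariate
s_xx = np.array([[1.0]])                  # sample variance of the standardized covariate
psi_hat = np.array([[1.0, 0.4],
                    [0.4, 1.0]])          # conditional (correlation) matrix of y*

# Unconditional covariance matrix: part explained by the covariate plus conditional part.
sigma_hat = gamma_hat @ s_xx @ gamma_hat.T + psi_hat
```

In this sketch the diagonal of `sigma_hat` exceeds one, so the unconditional matrix is a covariance matrix rather than a correlation matrix even though the conditional matrix has unit diagonal.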
LISREL Step
The LISREL step is straightforward. Equation (10) is interpreted as a measurement
model and equation (11) is interpreted as a structural model in LISREL (see, e.g., Joreskog
et al., 2001, Chapter 1). The covariance structure implied by these equations and their
assumptions may be fitted to the estimated covariance matrix by either MLR or WLS.
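$$\alpha_{ij} = \frac{\lambda_{ij}}{\sqrt{\psi_i}} \tag{14}$$
$$\beta_{il} = -\frac{b_{il}}{\sqrt{\psi_i}} \tag{15}$$
where $\psi_i$ is the variance of $u_i$ in (9). This extends the results of Joreskog and Moustaki
(2001) to the case with covariate effects. To compare the PLA parameters $\lambda_{ij}$ and $b_{il}$ with
the corresponding IRT parameters $\alpha_{ij}$ and $\beta_{il}$, we standardize the latter as follows:
$$\alpha_{ij}^\star = \frac{\alpha_{ij}}{\sqrt{\sum_{j=1}^{q} \alpha_{ij}^2 + \sum_{l=1}^{r} \beta_{il}^2 \mathrm{Var}(x_l) + 1}} \tag{16}$$
$$\beta_{il}^\star = \frac{\beta_{il}}{\sqrt{\sum_{j=1}^{q} \alpha_{ij}^2 + \sum_{l=1}^{r} \beta_{il}^2 \mathrm{Var}(x_l) + 1}} \tag{17}$$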
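The conversions in (14) to (17) can be sketched in Python as follows. The function names and the numerical values are hypothetical. The minus sign on the regression coefficient is this sketch's reading of equations (2) and (9) under a probit link: the probability of a response in category s or lower is the normal distribution function evaluated at the threshold minus the loading and covariate terms, divided by the standard deviation of the error, so the covariate enters with the opposite sign to the one in (2).

```python
import numpy as np

def pla_to_irt(lam, b, psi):
    """Convert PLA loadings and regression coefficients to the IRT (probit) scale."""
    scale = np.sqrt(psi)                       # standard deviation of u_i
    return np.asarray(lam, dtype=float) / scale, -np.asarray(b, dtype=float) / scale

def standardize_irt(alpha, beta, var_x):
    """Standardize IRT parameters for one item, as in equations (16) and (17)."""
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    var_x = np.asarray(var_x, dtype=float)
    denom = np.sqrt(np.sum(alpha ** 2) + np.sum(beta ** 2 * var_x) + 1.0)
    return alpha / denom, beta / denom

# Hypothetical item: one factor, one standardized covariate.
lam, b, psi = [0.8], [0.3], 0.36
alpha, beta = pla_to_irt(lam, b, psi)
alpha_std, beta_std = standardize_irt(alpha, beta, var_x=[1.0])
```

For these values the standardized IRT loading equals the PLA loading divided by the unconditional standard deviation of the underlying variable, sqrt(0.64 + 0.09 + 0.36), which is the sense in which standardization puts the two sets of parameters on a common scale.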
In the case where there are covariates affecting the latent variables, i.e., where w-
variables are included in the model, no such standardization is needed because the LISREL
specification can be done in such a way that the unstandardized parameters of the two
approaches correspond.
APPLICATIONS
In this section we analyze a data set from the 1996 British Social Attitudes Survey
(BSA). The data set has previously been analyzed with the logit IRT model in Moustaki
(2003).
First example
The data set consists of five ordinal indicators. Respondents were asked the question:
"On the whole, do you think it should or should not be the government's responsibility to ..."
The response alternatives given to the respondents were: definitely should be (DSB),
probably should be (PSB), probably should not be (PNB) and definitely should not be (DSNB).
The sample size is 822.
A covariate x (available in the BSA survey) constructed to measure left to right
political identification is used, after it has been standardized, as a continuous explanatory
variable for the ordinal indicators.
There are $4^5 = 1024$ possible response patterns but only 252 appear in the sample.
The fifteen most common response patterns are given in Table 1. We see that the response
alternative "definitely should not be" does not appear in any of them.
Table 2 gives the observed proportions for each category of the five ordinal indicators.
The bulk of the answers are in the first two categories especially for the indicators PriCon
and Housing.
Table 1: Fifteen most common response patterns.
Frequency Response pattern
88 111 11
41 222 22
23 212 12
22 222 12
22 322 22
19 222 32
18 112 11
17 322 32
15 212 22
14 112 22
13 211 11
11 221 11
11 212 11
10 121 11
10 222 21
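The summary statements about Table 1 can be checked with a few lines of Python. The sketch below simply re-enters the fifteen patterns and frequencies listed above (categories coded 1 to 4 in the order DSB, PSB, PNB, DSNB, which is an assumption about the coding) together with the sample size of 822 and the four response categories on five items given in the text.

```python
# (pattern, frequency) pairs copied from Table 1, spaces in the patterns removed.
table1 = [
    ("11111", 88), ("22222", 41), ("21212", 23), ("22212", 22), ("32222", 22),
    ("22232", 19), ("11211", 18), ("32232", 17), ("21222", 15), ("11222", 14),
    ("21111", 13), ("22111", 11), ("21211", 11), ("12111", 10), ("22221", 10),
]

n_possible = 4 ** 5                                   # four categories, five items
n_top = sum(freq for _, freq in table1)               # respondents covered by Table 1
share_top = n_top / 822                               # proportion of the sample
uses_category_4 = any("4" in pattern for pattern, _ in table1)
```

The fifteen patterns cover 334 of the 822 respondents, roughly 41% of the sample, and none of them contains the fourth category.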
The one-factor model is estimated by four approaches: IRT Probit, IRT Logit, PLA MLR
and PLA WLS. We give parameter estimates and standard
errors for all four approaches but for evaluation of fit and for the models with covariates we
concentrate on the comparison of IRT Logit and PLA WLS.
The standardized factor loadings with their estimated standard errors are given in
Table 3. The IRT Logit estimates are generally larger than the IRT Probit estimates.
Similarly, the PLA WLS estimates are generally larger than the PLA MLR estimates. It is
also seen that the LISREL estimates are closer to the IRT Probit estimates than to the IRT
Logit estimates, particularly for MLR. This is to be expected since the Probit link function
corresponds to the assumption of underlying normality used in the PLA approach. The
difference between IRT Probit and PLA MLR is not a difference between models but rather
a difference between estimation methods. The IRT Probit is a full information maximum
likelihood method whereas the PLA MLR is a limited information maximum likelihood
method. Table 3 also shows that the standard errors are very similar across methods.
To compare the fit of the two approaches we first compare the fit to the bivariate
contingency table of the first pair of variables, namely JobEvery and PriCon. The chi-
square residuals for IRT Logit and the PLA WLS are given in Tables 4 and 5 respectively.
For IRT Logit, there are 6 chi-square residuals greater than 4. For PLA WLS there are 3
chi-square residuals greater than 4. The sum of the chi-square residuals is 51.75 for IRT
Logit and 39.84 for PLA WLS. Thus, for this pair of variables, the fit is better for PLA
WLS than for IRT Logit. But, as will be seen, for other pairs of variables, it is the other way
around.
Table 3: Standardized Loadings
IRT PLA
Item Probit Logit MLR WLS
JobEvery .71(.03) .87(.02) .69(.03) .79(.02)
PriCon .54(.03) .76(.03) .53(.04) .62(.03)
LivUnem .78(.02) .91(.01) .79(.03) .81(.02)
IncDiff .76(.02) .90(.02) .75(.03) .77(.02)
Housing .78(.03) .92(.02) .78(.03) .82(.02)
Table 4: Chi-square residuals for IRT Logit (JobEvery by PriCon)
PriCon
JobEvery DSB PSB PNB DSNB
DSB 4.93 10.90 0.34 0.25
PSB 2.65 8.87 1.58 5.07
PNB 5.01 1.65 1.44 0.10
DSNB 0.26 4.19 0.56 4.19
We extend this analysis to the rest of the pairs and we see that there are chi-square
residuals exceeding 4 in all pairs of items and in most cases there are many. This is an
indication that the one-factor model does not fit.
Tables 6 and 7 give the total GF contributions for all pairs of variables under IRT
Logit and PLA WLS, respectively. Although the results are not satisfactory under either
approach, we see that the total GF contribution of IRT Logit is almost half of that of
PLA WLS. Every pair of variables has 16 possible combinations of response categories, and
if the GF contribution for a pair of items is larger than 16 x 4 = 64 then the fit is considered
to be poor. We see from Table 6 that the IRT model shows a satisfactory fit for all pairs
of variables whereas from Table 7 we see that the GF contributions are smaller than 64
for only 5 of the 10 pairs. The striking difference is that some GF contributions are much larger for PLA
WLS than for IRT Logit. For example, for the pair JobEvery and Housing, the total GF
contribution for PLA WLS is more than three times that of IRT Logit.
Table 5: Chi-square residuals for PLA WLS (JobEvery by PriCon)
PriCon
JobEvery DSB PSB PNB DSNB
DSB 3.04 8.48 0.17 1.11
PSB 2.48 6.36 1.55 3.00
PNB 3.28 0.68 0.59 0.01
DSNB 4.51 3.71 0.12 0.75
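The fit summaries quoted in the text for PLA WLS on this pair of variables can be reproduced from the sixteen residuals printed immediately above. The Python sketch below sums them, counts the residuals greater than 4, and applies the rule of thumb that a pair fits poorly when its total exceeds four times the number of cells; the variable names are illustrative.

```python
# Chi-square residuals for the pair JobEvery (rows) by PriCon (columns), PLA WLS.
residuals = [
    [3.04, 8.48, 0.17, 1.11],
    [2.48, 6.36, 1.55, 3.00],
    [3.28, 0.68, 0.59, 0.01],
    [4.51, 3.71, 0.12, 0.75],
]

cells = [value for row in residuals for value in row]
total_gf = sum(cells)                        # total GF contribution of the pair
n_large = sum(value > 4 for value in cells)  # residuals exceeding 4
poor_fit = total_gf > 4 * len(cells)         # rule of thumb: 16 cells x 4 = 64
```

The total is 39.84 with three residuals above 4, matching the figures reported for PLA WLS, and the pair stays below the threshold of 64.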
Table 6: Total GF Fits for IRT Logit
We will next examine the goodness-of-fit of the IRT Logit and PLA WLS. We have
looked at how well the models fit the two-way margins. The covariate is continuous and
therefore takes many different values. To check the fit of the model we take three values
(with many occurrences) and we check how well the model predicts the observed frequencies
of the bivariate margins of the indicators for these values of the covariate. We select the
values such that the first one comes from the left tail of the distribution, the second from
the middle and the third from the right tail. We select the values -1.239, -0.126 and 0.987
with frequencies 44, 103 and 53 respectively.
Tables 9, 10 and 11 give the sum of the chi-square residuals for the three values of
the covariate we have chosen a priori for the IRT Logit model with one factor and direct
covariate effect. The fit has improved much compared to the fit we get when the one-factor
model is fitted without the covariate. The model shows bad fit only for a few pairs of
categories and the total GF has decreased in comparison with the one-factor model without
the covariate in all three cases. We should note that although we give the total GF for
three values of the covariate we have checked the fit for many other values of the covariate
and they all give similar results. Most of the problematic chi-square residuals involve the
response categories DSB and DSNB.
Table 9: Total GF Fits for IRT Logit for covariate value= -1.239
Table 10: Total GF Fits for IRT Logit for covariate value= -0.126
Table 11: Total GF Fits for IRT Logit for covariate value= 0.987
Tables 12, 13 and 14 give the chi-squared residuals obtained when the factor model
is fitted on the unconditional covariance matrix with WLS for the values -1.239, -0.126 and
0.987 of the covariate, respectively. In the LISREL model, for values of the covariate near
the mean value 0 the fit has improved, whereas for values at the tails of its distribution the
fit has deteriorated. The fit has deteriorated considerably for the value -1.239.
Table 12: Total GF Fits for PLA WLS for covariate value= -1.239
Table 13: Total GF Fits for PLA WLS for covariate value= -0.126
Table 14: Total GF Fits for PLA WLS for covariate value= 0.987
Second Example
The second application is also from the 1996 British Social Attitudes (BSA) Survey.
Five ordinal manifest variables were selected for the analysis. The items measure satisfaction
with the National Health Service in the respondent's area. Respondents were asked whether each
aspect of the National Health Service in their area is, on the whole, satisfactory or in need of
improvement. The items asked (variable names in brackets) include:
Quality of medical treatment by GPs [Quality]
Waiting areas at GPs' surgeries [WaitingArea]
The response alternatives given to the respondents are: in need of a lot of improvement
(LI), in need of some improvement (SI), satisfactory (S), and very good (VG). These are
coded 1, 2, 3, 4, respectively. Item non-response varies between 1.5% and 2.5%. After we
excluded the missing values we were left with 841 respondents. Here, we are interested in
building a model where the relationships among the five indicators are explained by a latent
variable and an observed covariate political identification and the latent variable is affected
by gender and age.
There are only 205 different response patterns. The most common response patterns
are given in Table 15.
The percentages for each category for each item are shown in Table 16. We see that
the majority of the responses fall in the two middle categories.
Table 16: Example 2: Frequency distribution for the observed ordinal items
Appointment AmountTime ChooseGP Quality WaitingArea
LI 11.4 6.5 6.7 3.8 3.6
SI 29.4 22.8 20.9 19.0 16.1
S 47.2 57.9 58.3 53.9 63.3
VG 12.0 12.7 14.1 23.3 17.1
The one-factor IRT logit model was fitted first to the five ordinal indicators. Judged by
the chi-square residuals, the fit on the two-way margins was satisfactory. There
were pairs of categories that gave values greater than four but the total GF across categories
was not greater than 64 for any pair of items. The LISREL model gave more or less the
same good fit except for two GF contributions. The overall GF statistic of the LISREL model
is almost double that of the IRT model.
In our example the latent variable can be taken to measure overall satisfaction with
GPs. We would like to use the covariate political identification as an extra variable that
accounts for the relationships among the indicators. Political identification is measured as
an observed covariate with four categories: conservative, labour, liberal democrat (called
liberal for short) and other. We also want to measure the effect of gender and age on the
latent variable satisfaction. Age is given in four categories: 18-25, 26-44, 45-64, 65+. In
theory the covariate political identification should be a continuous variable but since it is
categorical in the data, it will be used as a set of three dichotomous dummy variables,
one for labour, one for liberal, and one for other. The category conservative is omitted and
serves as the reference category. Similarly, since age is coded as an ordered categorical
variable, it will be used as a set of three dichotomous dummy variables, one for age 26-44,
one for age 45-64, and one for age 65+. The age group 18-25 is the reference category. Thus,
in the model we estimate one latent variable Satisfaction, three x-variables Labour, Liberal,
and Other, and four w-variables Female, SecondAgeGroup, ThirdAgeGroup, FourthAgeGroup.
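The dummy coding described above can be sketched in Python as follows. The helper `dummy_code`, the dictionary keys and the example respondent are hypothetical; only the category labels and the choice of reference categories come from the text.

```python
def dummy_code(value, categories, reference):
    """Return 0/1 indicators for every category except the reference one."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [int(value == c) for c in categories if c != reference]

PARTY = ["conservative", "labour", "liberal", "other"]
AGE = ["18-25", "26-44", "45-64", "65+"]

# Hypothetical respondent: labour identifier, female, aged 45-64.
respondent = {"party": "labour", "female": True, "age": "45-64"}

# x-variables (Labour, Liberal, Other) act directly on the indicators.
x = dummy_code(respondent["party"], PARTY, reference="conservative")
# w-variables (Female, SecondAgeGroup, ThirdAgeGroup, FourthAgeGroup) act on Satisfaction.
w = [int(respondent["female"])] + dummy_code(respondent["age"], AGE, reference="18-25")
```

For this respondent `x` is `[1, 0, 0]` and `w` is `[1, 0, 1, 0]`. A conservative male aged 18-25 would have all seven indicators equal to zero, so every coefficient is interpreted relative to that group.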
The details of the PLA approach are given in Appendix 2.
Table 17 gives the estimated standardized factor loadings and regression coefficients
for the IRT model and for the LISREL model.
Table 18 gives the estimated structural parameters for the IRT model and the LISREL
model.
Table 18: Estimated structural parameters (standard errors in parentheses)
Covariate IRT PLA ($d_i$)
Female -.06 (.04) -.07 (.07)
26-44 .19 (.10) .18 (.12)
45-64 .49 (.11) .49 (.13)
65+ .70 (.11) .70 (.14)
It is very difficult to test the fit of the model when there are many covariates because
we have to test the fit of the model for combinations of the values of the covariates. Also
we need a large sample to do that.
CONCLUSIONS
We have considered a general type of model for analyzing ordinal variables with
covariate effects and two approaches for analyzing data for such models, the item response
theory (IRT) approach and the PRELIS-LISREL (PLA) approach. We have compared these
two approaches on the basis of two examples, one involving only covariate effects directly on
the ordinal variables, and one involving covariate effects on the latent variables in addition.
On the basis of these two examples, we find that parameter estimates are often close
but the IRT models fit the data better, often much better. We also find that the models
with covariates fit better than the models without covariates, although not much better.
Both approaches have their advantages and their disadvantages. It was expected
that the IRT method would give a better fit since it uses the whole response pattern and
no loss of information occurs, whereas LISREL uses only the univariate and the bivariate
margins. LISREL requires a large sample for the estimation of the asymptotic covariance
matrix and also we do not know the effects of the violation of the bivariate normality on
the estimated parameters. On the other hand, IRT models have been developed recently
and there is no flexible software available for fitting those models. If one wants to fit a
model with many factors, one will probably have to use LISREL, Mplus or EQS. LISREL
is very easy to use and gives the user considerable flexibility. LISREL also allows the user
to make the latent variables dependent, fix the dependence among them or fix any other
parameter in the model. The computational burden in the IRT models increases rapidly
as the number of factors increases, whereas this is not the case with LISREL. One way to
reduce the computational burden in IRT models is to decrease the number of quadrature
points but in that case the estimates may not be precise. LISREL provides many goodness-
of-fit measures or model selection criteria, but there are not many available for IRT models
particularly not for models with covariate effects.
Appendix 1
This Appendix gives the PRELIS and LISREL syntax files used in the analysis of
Example 1.
The data file is GVRESP.DAT. This is a text (ASCII) file where the six data values
are given on one line per person and with spaces between the numbers. There are no missing
values. The covariate PolIden is the last variable. This will not be used in the first part of
the analysis.
The following PRELIS syntax file is used to compute the polychoric correlations of the
five ordinal variables and their asymptotic covariance matrix. These are saved in the files
GVRESP.PM and GVRESP.ACP, respectively.
This gives the results reported in Table 1. The polychoric correlations are given in
Table 19.
To estimate the one-factor model with WLS we use the following SIMPLIS syntax file
Table 19: Matrix of Polychoric Correlations
This run gives the results reported in the last two columns of Table 3.
To use MLR instead of WLS just omit the line
The line
Options: LX=LOADINGS
is used to save the factor loadings with six decimals in the file LOADINGS. These factor
loadings are needed to compute the bivariate GF fits reported in Tables 5 and 7.
For the analysis of the five ordinal items with PolIden as covariate, we first compute
the covariance matrix of the underlying variables and the covariate. This is done with the
following PRELIS syntax file.
This will save the unconditional covariance matrix in the file GVRESP.CM and the
corresponding asymptotic covariance matrix in the file GVRESP.ACC.
As a byproduct this run will give the conditional correlation matrix given in Table
20. The unconditional covariance matrix is given in Table 21.
Although we have taken the covariate into account, we see that all correlations remain
highly significant. As we see from the conditional correlation matrix (see Table 20), the
covariate alone does not account for the correlations of the variables underlying the ordinal
indicators. Probably the introduction of a latent variable along with the covariate will
account for the correlations among the ordinal indicators better.
Table 20: Conditional Covariance Matrix
JobEvery PriCon LivUnem IncDiff Housing
JobEvery 1.000
PriCon 0.434 1.000
(0.037)
LivUnem 0.375 0.169 1.000
(0.037) (0.043)
IncDiff 0.376 0.278 0.428 1.000
(0.038) (0.041) (0.036)
Housing 0.314 0.184 0.657 0.441 1.000
(0.040) (0.044) (0.028) (0.037)
Note. Standard errors are given in brackets
The joint unconditional covariance matrix that is going to be used for estimating
factor loadings and regression coefficients under the second method is given in Table 21.
We see from Table 21 that the covariate is more related to items JobEvery and IncDiff
than the other items.
To estimate the LISREL model with the covariate we use the following LISREL syntax
file.
This gives the WLS estimates and standard errors given in Table 8. In the output,
factor loadings are given in the first column and the regression coefficients are given in the
second column of the Gamma matrix. To obtain MLR estimates replace wls by ml on the
ou line.
Appendix 2
This Appendix gives the PRELIS and LISREL syntax files used in the analysis of
Example 2.
The data file is GPDATA.RAW. This is a text (ASCII) file where the 12 data values
are given on one line per person and with spaces between the numbers. There are no missing
values.
The following PRELIS syntax file is used to compute the covariance matrix for all 12
variables and the corresponding asymptotic covariance matrix. These are saved in the files
GP.CM and GP.ACC, respectively. These computations are done as explained briefly in
the underlying variable approach section. The PRELIS syntax file is (long variable names
can be used but PRELIS retains only the first eight characters):
The LISREL model is fitted to the covariance matrix using the following SIMPLIS
command file:
The three latent variables LABOUR, LIBERAL, and OTHER are defined to be
equal to the corresponding observed variables Labour, Liberal, and Other, respectively.
This is achieved by the four lines
LABOUR = 1*Labour
LIBERAL = 1*Liberal
OTHER = 1*Other
Set the Error Variance of LABOUR - OTHER to 0
This trick is necessary because in a LISREL model, there cannot be a path from an x-variable
directly to a y-variable if there are Eta-variables in the model.
The line
Set the Error Variance of Satisfaction to 1
corresponds to the assumption that the variance of $\epsilon$ in (11) is 1. This fixes the scale
for Satisfaction in such a way that the LISREL solution is comparable to the IRT solution.
The AD=OFF on the LISREL OUTPUT line is needed because the three latent variables
LABOUR, LIBERAL, and OTHER have zero error variances. Hence, the matrix $\Psi$ in
LISREL is singular.
The SO on the LISREL OUTPUT line is needed because the scales for the latent
variables are not specified by fixed values in each column of $\Lambda_y$. SO tells LISREL to skip
this check.
The SIMPLIS input gives maximum likelihood estimates with standard errors
estimated under non-normality (RML). To obtain weighted least squares (WLS) estimates, just
put WLS on the LISREL OUTPUT line. The ML estimates are given in Tables 17 and 18.
References
Bentler, P. M. (1992). EQS: Structural Equation Program Manual. BMDP Statistical Software.
Croon, M., & Bolck, A. (1997). On the use of factor scores in structural equations models (Tech.
Rep. No. 97.10.102/7). Work and Organization Research Centre, Tilburg University. (WORC
Paper)
Joreskog, K. G. (2002). Structural equation modeling with ordinal variables using LISREL (Available
at http://www.ssicentral.com/lisrel/ordinal.htm).
Joreskog, K. G., & Goldberger, A. S. (1975). Estimation of a model with multiple indicators and
multiple causes of a single latent variable. Journal of the American Statistical Association, 70,
631-639.
Joreskog, K. G., & Moustaki, I. (2001). Factor analysis of ordinal variables: a comparison of three
approaches. Multivariate Behavioral Research, 36, 347-387.
Joreskog, K. G., & Sorbom, D. (1999). LISREL 8 user's reference guide. Chicago: Scientific Software
International.
Joreskog, K. G., Sorbom, D., Du Toit, S., & Du Toit, M. (2001). LISREL 8: New statistical features.
Chicago: Scientific Software International.
Moustaki, I. (2000). A review of exploratory factor analysis for ordinal categorical data. In R. Cudeck,
S. Du Toit, & D. Sorbom (Eds.), Structural equation modeling: present and future. Scientific
Software International.
Moustaki, I. (2003). A general class of latent variable models for ordinal manifest variables with
covariate effects on the manifest and latent variables. British Journal of Mathematical and
Statistical Psychology, 56, 337-357.
Moustaki, I., & Knott, M. (2000). Generalized latent trait models. Psychometrika, 65, 391-411.
Muthen, B. O., & Muthen, L. (2000). Mplus: The comprehensive modeling program to applied
researchers. 11965 Venice Boulevard, Suite 407, Los Angeles, CA 90066.
Sammel, R. D., Ryan, L. M., & Legler, J. M. (1997). Latent variable models for mixed discrete and
continuous outcomes. Journal of the Royal Statistical Society, B, 59, 667-678.
Verhelst, N., Glas, C., & Verstralen, H. (1994). Oplm: Computer program and manual. Arnhem:
CITO.
Zwinderman, A. (1997). Response models with manifest predictors. In W. van der Linden &
R. Hambleton (Eds.), Handbook of item response theory. Springer.