Copyright © 2009 by Mike Cheung, SEM 1

Workshop in Structural Equation Modeling (SEM)

Mike W.-L. Cheung, PhD


Assistant Professor
Department of Psychology
National University of Singapore
http://courses.nus.edu.sg/course/psycwlm/internet/

Last update: 9 April 2009



1.1 Why do we need to use SEM?


1.Measurement error:
1.We cannot measure psychological constructs without the
contamination of measurement error.
2.Solution: Use latent variables with several indicators
2.Construct validity of measurement:
1.We want to validate the constructs of psychological measurement.
2.Solution: Confirmatory factor analysis
3.Complicated model or theory:
1.We are interested in testing complicated “causal models.”
2.Solution: Structural equation model
4.Handling missing data:
1.We want to handle missing data efficiently.
2.Solution: SEM with full information maximum likelihood (FIML)

1.2 Growth of SEM in academic publication


1.Before 1994 (Tremblay & Gardner, 1996)1

1 Tremblay, P. F., & Gardner, R. C. (1996). On the growth of structural equation modeling in psychological journals.
Structural Equation Modeling, 3, 93-104.

2.1994-2001 (Hershberger, 2003)2

2 Hershberger, S. L. (2003). The growth of structural equation modeling: 1994-2001. Structural Equation Modeling, 10,
35-46.

1.3 Outline of this workshop


2.Multiple regression
3.Path analysis
4.Confirmatory factor analysis
5.SEM
6.Handling missing data

2.1 Basics of Mplus


1.There are many SEM packages, e.g., LISREL, EQS, Amos, Mx and
Mplus.
2.We use Mplus in this workshop as it is one of the most powerful SEM
packages.
3.Data in Mplus are in plain text (ASCII format).
4.Quite often, our data are stored in SPSS.
5.We can export SPSS data into plain text easily by:
1.“Save as”
2.“Tab delimited (*.dat)”
3.Unselect “write variable names to spreadsheet”

2.2 Covariance and correlation matrices


1.Covariance (and correlation) matrices play an important role in SEM.
2.When data are multivariate normal, the means and covariance matrix
are the sufficient statistics. That is, we do not need the raw data in the
analysis.
3.Sample data:

Motivation (X) 1 2 3 4 5 7 8 12 13 15

Performance (Y) 9 13 10 12 16 12 20 13 17 22

4.Covariance between X and Y: cov(X,Y) = Σ(X − M_X)(Y − M_Y) / (n − 1) = 15.1
1.Interpretations:
1.Positive covariance: positive linear association
2.Negative covariance: negative linear association
3.Zero covariance: no linear association
2.“Problems”:
1.Range of covariance: −∞ < covariance < +∞
2.Is covariance=15.1 large or small? It depends on the variances of
both X and Y.
5.To solve the “problems” in covariance, we may standardize the
covariance by the standard deviations of X and Y.
6.Correlation between X and Y: r = cov(X,Y) / (SD_X × SD_Y) = 15.1 / (4.9 × 4.2) = .73
1.Range of correlation coefficient: −1≤r≤1
2.Effect size and interpretations (Cohen, 1988):3
1.Small effect: r = .1 or r² = .01
2.Medium effect: r = .3 or r² = .09
3.Large effect: r = .5 or r² = .25
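These numbers can be checked with a few lines of plain Python. The script below recomputes the covariance, SDs, and correlation for the sample data above; it is an illustration, not part of the workshop files:

```python
# Covariance and correlation for the motivation/performance data,
# computed from scratch (no external libraries needed).
x = [1, 2, 3, 4, 5, 7, 8, 12, 13, 15]        # Motivation (X)
y = [9, 13, 10, 12, 16, 12, 20, 13, 17, 22]  # Performance (Y)

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# cov(X,Y) = sum((X - Mx)(Y - My)) / (n - 1)
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)

# Standardizing by the SDs turns the covariance into the correlation
sdx = (sum((xi - mx) ** 2 for xi in x) / (n - 1)) ** 0.5
sdy = (sum((yi - my) ** 2 for yi in y) / (n - 1)) ** 0.5
r = cov / (sdx * sdy)

print(round(cov, 1), round(sdx, 1), round(sdy, 1), round(r, 2))  # 15.1 4.9 4.2 0.73
```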

3 Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, N.J.: L. Erlbaum Associates.

7.For historical reasons, the covariance matrix is the default input in SEM.
8.It may not be appropriate to analyze a correlation matrix.
9.The summary statistics may look like:
1.y: performance
2.x1: motivation; x2: ability

Covariance matrix: Correlation matrix:


y x1 x2 y x1 x2
y 10.606 5.536 2.744 y 1.0000 0.6095 0.3199
x1 5.536 7.781 2.995 x1 0.6095 1.0000 0.4076
x2 2.744 2.995 6.936 x2 0.3199 0.4076 1.0000
Mean: Standard deviations:
y x1 x2 y x1 x2
6.024 5.974 6.021 3.257 2.789 2.634

2.3 Multiple regression


1.A simple regression:
1.A simple regression specifies the direction of influence.
2.Model: y=b0b1 x 1e y or y = b0 b1 x 1
3.Since y is a dependent variable (DV), var(y) depends on other
parameters.
4.This means that the double-headed arrows on x1 and y have different meanings, depending on whether the variable is an independent variable (IV) or a DV.
2.In a graphical model, the error term or the error variance may be
specified as:

[Path diagram: x1 → y with regression path b1; var(x1) attached to x1 and var(ey) attached to y]

3.What is SEM? Sample vs. model-implied covariance matrices:

[ var(y)      cov(y,x1) ]   =   [ b1²·var(x1) + var(ey)   b1·var(x1) ]
[ cov(y,x1)   var(x1)   ]       [ b1·var(x1)              var(x1)    ]

Sample covariance matrix:
[ 10.606   5.536 ]
[  5.536   7.781 ]

4.Mathematically speaking, SEM is a statistical technique to fit a proposed model to the data.
5.Let's do our first SEM!
1.Data: ex1.txt (plain text, ASCII format)
2.Input: ex1a.inp
3.Output: ex1a.out
TITLE: A simple regression model

DATA: FILE = ex1.txt;

VARIABLE: NAMES ARE y x1-x2;


USEVAR ARE y x1; ! We use y and x1 in this analysis

MODEL:
y ON x1; ! y is regressed ON x1
! Note. Variances of the independent variable (x1)
! and the error (y) will be estimated by default

OUTPUT: SAMPSTAT; ! Request the sample statistics for checking

Selected output:
MODEL RESULTS

Two-Tailed
Estimate S.E. Est./S.E. P-Value

Y ON
X1 0.712 0.093 7.687 0.000

Intercepts
Y 1.773 0.610 2.908 0.004

Residual Variances
Y 6.600 0.933 7.071 0.000

Interpretations:
1.The regression equation is ŷ = 1.77 + 0.71·x1
2.The regression coefficient of x1 is 0.71, which is statistically significant
at .05.
3.The estimates divided by their corresponding standard errors (SEs) approximately follow a standard normal distribution. If the ratio is larger than 1.96 in absolute value, the estimate is statistically significant at .05.
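The z-test just described can be sketched in a few lines; `two_tailed_p` is a hypothetical helper for illustration, not an Mplus function:

```python
import math

def two_tailed_p(est, se):
    """Two-tailed p-value for est/se against a standard normal.

    Uses p = erfc(|z| / sqrt(2)), which equals 2*(1 - Phi(|z|)).
    """
    z = est / se
    return math.erfc(abs(z) / math.sqrt(2))

# Intercept row from the Mplus output above: estimate 1.773, SE 0.610
print(round(two_tailed_p(1.773, 0.610), 3))  # ~0.004, matching the reported P-Value
```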

6.A multiple regression:


1.Model: y=b0b1 x 1b 2 x 2e y or y = b0 b1 x 1 b2 x 2
7.A graphical model:
8.Mplus analysis:
[Path diagram: x1 → y (path b1) and x2 → y (path b2); x1 and x2 correlated, cov(x1,x2); var(x1), var(x2) and var(ey) attached to the respective variables]

1.Input: ex1b.inp
2.Output: ex1b.out
TITLE: A multiple regression model
DATA: FILE = ex1.txt;
VARIABLE: NAMES ARE y x1-x2;
USEVAR ARE ALL; ! Use all variables in the analysis
MODEL:
y ON x1 x2; ! y is regressed ON x1 and x2
! Note. Covariances among the independent variables
! are estimated by default
x1 WITH x2; ! Optional: Request the variance/covariance of x1 and x2
OUTPUT: SAMPSTAT; ! Request the sample statistics for checking

Selected output:
MODEL RESULTS Two-Tailed
Estimate S.E. Est./S.E. P-Value
Y ON
X1 0.671 0.101 6.650 0.000
X2 0.106 0.107 0.992 0.321

X1 WITH
X2 2.965 0.785 3.775 0.000

Means
X1 5.975 0.278 21.526 0.000
X2 6.021 0.262 22.976 0.000

Intercepts
Y 1.379 0.725 1.901 0.057

Variances
X1 7.703 1.089 7.071 0.000
X2 6.867 0.971 7.071 0.000

Residual Variances
Y 6.535 0.924 7.071 0.000

Interpretations:
1.The estimated regression coefficients for x1 and x2 are 0.671 and 0.106,
respectively.
2.The coefficient of x1 is statistically significant while the coefficient of
x2 is not.
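The estimates above can also be recovered directly from the summary statistics in section 2.2, via b = S_xx⁻¹·s_xy. This numpy sketch is an illustration, not one of the workshop files:

```python
# Multiple regression estimates recovered from the covariance matrix and
# means listed in section 2.2.
import numpy as np

Sxx = np.array([[7.781, 2.995],   # var(x1), cov(x1,x2)
                [2.995, 6.936]])  # cov(x2,x1), var(x2)
sxy = np.array([5.536, 2.744])    # cov(y,x1), cov(y,x2)
means = {"y": 6.024, "x1": 5.974, "x2": 6.021}

b = np.linalg.solve(Sxx, sxy)     # slopes b1, b2
b0 = means["y"] - b[0] * means["x1"] - b[1] * means["x2"]  # intercept

print(np.round(b, 3), round(b0, 3))  # ~[0.671 0.106] ~1.379, matching Mplus
```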

2.4 Path analysis


1.Path analysis tests models with hypothesized relationships among the observed variables.
2.It improves on multiple regression by hypothesizing the psychological process among the observed variables. For example,

[Two example path diagrams relating X1, X2, Y1 and Y2, each with disturbance terms D1 and D2 (variances fixed at 1.00)]

3.Model identification:
1.Model identification concerns whether there is a unique solution for
the model being tested.
2.If the model is not identified, there will be no solution for the
proposed model. Thus, we cannot test the proposed model.
4.Degrees of freedom (df) of a model:
1.Let p* = p(p+1)/2 be the no. of pieces of information available in the covariance matrix, where p is the no. of observed variables
2.Let q be the no. of free parameters in the model
3.Then df = p* − q
5.A necessary but not sufficient condition for the identification of any SEM model is df ≥ 0.⁴
1.Underidentification:
1.df < 0
2.no solution
3.e.g., 3=x+y
2.Just identification:
1.df=0
2.perfect fit
3.e.g., 3=x+y and 1=x-y then x=2 and y=1
3.Overidentification:
1.df > 0
2.no perfect solution

4 Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.

3.e.g., 3=x+y, 2=x-y and 5=2x+y
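The df bookkeeping above can be sketched as a small helper (illustrative only; the parameter counts come from the examples in the text):

```python
# df = p* - q, where p* = p(p+1)/2 is the number of distinct elements
# in the covariance matrix of p observed variables.
def model_df(p, q):
    p_star = p * (p + 1) // 2
    return p_star - q

# Simple regression (section 2.3): p = 2 observed variables,
# q = 3 free parameters (b1, var(x1), var(ey)) -> just identified
print(model_df(2, 3))   # 0

# Two-factor CFA with 4 indicators and no constraints (section 3.2):
# q = 3 factor variances/covariance + 4 loadings + 4 error variances = 11
print(model_df(4, 11))  # -1, underidentified
```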


6.All regression models are just identified models.
1.We cannot empirically test whether or not the proposed regression
models are appropriate.
2.If this is the case, how can we test whether or not a regression model
is “good?”
7.Overidentified models are usually preferred in SEM:
1.By using the falsification approach, we want to see whether our
proposed model can be rejected by the data.

2.5 Mediation model


1.One important application of path analysis is to test indirect effects.
2.A mediation model explains the psychological process between an independent variable and a dependent variable.

[Path diagram: X → M (path a), M → Y (path b), and X → Y (direct path c)]
3.Direct effect: c; Indirect effect: a*b; Total effect: a*b+c



4.To test the mediation effect, we have to test the significance of the
product term a*b.
5.The problem is that the sampling distribution of the product term is
usually non-normal.

6.The bootstrap confidence interval (CI) is one of the best approaches to testing the indirect effect (MacKinnon et al., 2002).5
7.Many SEM packages, e.g., Mplus, have functions to report bootstrap
CIs (e.g., Cheung, 2007).6

5 MacKinnon, D. P., Lockwood, C. M., Hoffman, J. M., West, S. G., & Sheets, V. (2002). A comparison of methods to
test the significance of the mediated effect. Psychological Methods, 7, 83-104.
6 Cheung, M.W.L. (2007). Comparison of approaches to constructing confidence intervals for mediating effects using
structural equation models. Structural Equation Modeling, 14, 227-246.
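A minimal sketch of the percentile bootstrap that Mplus performs is given below, on a hypothetical simulated data set. The data, sample size, and path values are assumptions for illustration only, and the simple-regression slopes ignore the direct path for brevity:

```python
# Percentile bootstrap CI for an indirect effect a*b (illustrative sketch).
import random

random.seed(0)
n = 100
# Hypothetical X -> M -> Y data with assumed a = 0.5 and b = 0.4
X = [random.gauss(0, 1) for _ in range(n)]
M = [0.5 * x + random.gauss(0, 1) for x in X]
Y = [0.4 * m + random.gauss(0, 1) for m in M]

def slope(x, y):
    """OLS slope of y on x (simple regression)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def indirect(x, m, y):
    # a * b; for brevity the direct path is not partialled out here
    return slope(x, m) * slope(m, y)

boot = []
for _ in range(2000):  # 2000 bootstrap replications, as in ex2b.inp
    idx = [random.randrange(n) for _ in range(n)]
    boot.append(indirect([X[i] for i in idx],
                         [M[i] for i in idx],
                         [Y[i] for i in idx]))

boot.sort()
lo, hi = boot[int(0.025 * 2000)], boot[int(0.975 * 2000) - 1]
print(round(indirect(X, M, Y), 3), (round(lo, 3), round(hi, 3)))
```

If the resulting 95% CI excludes 0, the indirect effect is declared significant at .05, exactly as in the Mplus output discussed next.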

1.Data: ex2.dat
2.Input: ex2a.inp
3.Output: ex2a.out
TITLE: Simple mediating effect
DATA: FILE IS ex2.dat; ! Raw data are required for bootstrap
VARIABLE: NAMES X M Y; ! X: Independent variable, M: Mediator
! Y: Dependent variable
USEVARIABLES ARE ALL;

MODEL: ! Model being analyzed: X -> M -> Y


Y ON M; ! Path M -> Y
Y ON X; ! Direct effect in the model

M ON X; ! Path X -> M

OUTPUT: SAMPSTAT;

Selected output:
TESTS OF MODEL FIT
Chi-Square Test of Model Fit

Value 0.000
Degrees of Freedom 0
P-Value 0.0000

MODEL RESULTS
Two-Tailed
Estimate S.E. Est./S.E. P-Value

Y ON
M 0.396 0.094 4.228 0.000
X -0.033 0.105 -0.310 0.756

M ON
X 0.517 0.100 5.190 0.000

Intercepts
M -0.644 0.337 -1.913 0.056
Y 0.231 0.321 0.721 0.471
Residual Variances
M 11.182 1.581 7.071 0.000
Y 9.788 1.384 7.071 0.000

Interpretations:
1.The model is just identified. That is, the df=0. In other words, we cannot
tell whether or not the proposed model fits the data.
2.The direct effect is -0.03, p=.76.
3.The indirect effect is 0.396*0.517=0.20.
4.However, we don't know if the indirect effect is significant.
5.We may request a bootstrap CI on the indirect effect in Mplus:
1.Input: ex2b.inp
2.Output: ex2b.out
TITLE: Simple mediating effect with bootstrap CI
DATA: FILE IS ex2.dat; ! Raw data are required for bootstrap
VARIABLE: NAMES X M Y; ! X: Independent variable, M: Mediator
! Y: Dependent variable
USEVARIABLES ARE ALL;

ANALYSIS: BOOTSTRAP = 2000; ! If "bootstrap" is listed, percentile or BC


! bootstrap CIs will be produced; otherwise,
! Wald CI will be generated.
! "2000" is the no. of bootstrap replications.

MODEL: ! Model being analyzed: X -> M -> Y


Y ON M (p1); ! Path M -> Y; p1: first constraint
Y ON X; ! Direct effect in the model

M ON X (p2); ! Path X -> M; p2: second constraint

MODEL CONSTRAINT:
NEW(ind_effect); ! Create a new variable for indirect effect
ind_effect = p1*p2;

OUTPUT: CINTERVAL(bootstrap); ! Percentile bootstrap CI



Selected output:
CONFIDENCE INTERVALS OF MODEL RESULTS

Lower .5% Lower 2.5% Estimate Upper 2.5% Upper .5%

Y ON
M 0.177 0.222 0.396 0.584 0.642
X -0.307 -0.243 -0.033 0.165 0.225

M ON
X 0.248 0.316 0.517 0.694 0.752

Intercepts
M -1.482 -1.316 -0.644 0.031 0.227
Y -0.592 -0.368 0.231 0.858 1.066

Residual Variances
M 6.727 7.600 11.182 15.489 16.978
Y 6.104 6.953 9.788 12.283 13.055

New/Additional Parameters
IND_EFFE 0.076 0.100 0.204 0.334 0.386

Interpretations:
1.The estimated indirect effect is 0.20 with a 95% CI (0.10, 0.33).
2.Since the 95% CI does not include 0, the estimated indirect effect is
statistically significant at .05.

3.1 Confirmatory factor analysis (CFA)


1.CFA is a statistical procedure designed to test a hypothesis about the
relationship of certain hypothetical latent factors, whose number and
interpretation are given in advance, to the observed variables.
2.Differences between EFA (in SPSS) and CFA (in SEM):
EFA                                               CFA
- Exploratory nature, for theory development      - Confirmatory nature, for theory testing
- No. of factors usually determined from data     - No. of factors specified a priori
- Factor loading pattern not fixed                - Factor loading pattern fixed
- Factor rotation to achieve a simple             - Factor variances/loadings fixed for
  structure for interpretation                      identification
- Factors usually orthogonal                      - Factors usually correlated
- Measurement errors uncorrelated                 - Measurement errors can be correlated
- Not many formal statistical tests for           - Many formal statistical tests and
  model testing                                     goodness-of-fit indices for model testing
- Correlation matrix as input                     - Covariance matrix as input

3.Basic steps in CFA:


1.Model specification: What is the proposed model?
2.Model identification: Will there be a solution for the model?
3.Parameter estimation: What are the results for the model?
4.Goodness of fit assessment: Does the model fit the data?
5.Model modification and comparison: Which model is better? What
are we going to do if the initial model does not fit the data?

3.2 How to conduct a CFA?


1.Model specification:
1.Three types of parameters in a CFA:
2.Factor loading matrix: Λ
3.Factor correlation matrix: Φ
4.Error variance matrix: Θ
5.A preferred approach is to draw a path diagram. Then we may
translate the model into Mplus language.
2.Example: A two-factor model

[Path diagram: F1 and F2 correlated; F1 measured by X1 and X2, F2 measured by X3 and X4]

3.Model identification:
1.If there is no constraint,
1.p*=(4*5)/2=10,
2.q=(3 factor variances/covariance, 4 factor loadings, 4 error
variance)=11,
3.df = p* − q = −1!
4.Metric of a latent variable:
1.What is the mean of a latent variable?
2.What is the variance of a latent variable?
5.Solutions:
1.To overcome the identification problem in our example, we have to:
2.Approach 1:
1.Fix the factor variances at some specific positive values, usually 1.0
2.This applies to independent (exogenous) latent variables only

3.Λ = [ a11   0  ]   and   Φ = [ 1.0              ]   , with df = 1
      [ a21   0  ]             [ cor(f2,f1)   1.0 ]
      [  0   a32 ]
      [  0   a42 ]
4.We set the scale of the latent factors having a mean of 0 and a
variance of 1.0.
3.OR
4.Approach 2:
1.Fix a loading from the latent variable to one observed variable at a
specific non-zero value, usually 1.0
2.This applies to independent and dependent (endogenous) latent
variables

3.We have to fix one loading per latent factor
4.Λ = [ 1.0   0  ]   and   Φ = [ var(f1)               ]   , with df = 1
      [ a21   0  ]             [ cov(f2,f1)   var(f2)  ]
      [  0   1.0 ]
      [  0   a42 ]

[Two equivalent path diagrams for the two-factor model: on the left, the variances of F1 and F2 are fixed at 1.00; on the right, one loading per factor (the loadings of X1 and X3) is fixed at 1.00]

6.They are usually equivalent without further constraints:


7.That is, the model fit indices are the same for both models.
8.However, we usually prefer:

1.Single group analysis:


1.Fix the factor variances at 1.0 if possible.
2.We are seldom interested in the significance of the factor variance.
2.Multiple-group analysis: Fix the loadings at 1.0
9.Mplus syntax:
1.Data: ex3.dat
2.Inputs: ex3a.inp and ex3b.inp
3.Outputs: ex3a.out and ex3b.out
4.Language summary:
1.y ON x1 x2; y is regressed ON x1 and x2
2.x1 WITH x2; x1 (or its error) is correlated WITH x2 (or its
error)

3.f BY x1 x2 x3; latent factor f is measured BY x1, x2 and x3.


TITLE: Fixing the factor variances for identification
DATA: FILE IS ex3.dat;
VARIABLE: NAMES ARE y1 y2 y3 y4;
MODEL:
! Free the loading of y1
f1 BY y1* y2;
! Free the loading of y3
f2 BY y3* y4;
! Fix the factor variances
f1@1.0;
f2@1.0;
OUTPUT: SAMP;

TITLE: Fixing the path loadings for identification
DATA: FILE IS ex3.dat;
VARIABLE: NAMES ARE y1 y2 y3 y4;
MODEL:
! Fix the loading of y1 at 1.0
f1 BY y1@1.0 y2;
! Provide starting value if necessary
f2 BY y3@1.0 y4*0.5;
OUTPUT: SAMP;

Selected output (identical for both models):
Chi-Square Test of Model Fit
          Value                 0.001
          Degrees of Freedom        1
          P-Value              0.9770

10.Parameter estimation:
1.We try to find the parameter estimates such that the model implied
covariance matrix (theory) is as close to the sample covariance matrix
(data) as possible.
2.We usually use maximum likelihood (ML) estimation method. When
the sample size is reasonably large and the data are multivariate
normal, ML is a good method.

3.3 Chi-square test statistics


1.We have to highlight one major difference between SEM and other
statistical techniques (Steiger & Fouladi, 1997):7
1.Reject-support: Rejecting the null hypothesis supports the researcher's
belief (what we have learned in ANOVA, regression analysis,
MANOVA).
2.Accept-support: Accepting the null hypothesis supports the
researcher's belief (SEM).
3.Based on this rationale, SEM users usually do not want to reject the
null hypothesis (the proposed model).
2.Chi-square test (or likelihood ratio) statistic:
1.If the proposed model is correct, the test statistic has a chi-square distribution with (p* − q) degrees of freedom (df)
7 Steiger, J. H., & Fouladi, R. T. (1997). Noncentrality interval estimation and the evaluation of statistical models. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 221-257). Mahwah, NJ: Erlbaum. Available at http://www.statpower.net/Steiger%20Biblio/Steiger&Fouladi97.PDF.


2.Actually, this is a “badness-of-fit” index: large chi-square statistic
indicates a poorly fitted model
3.The proposed model is rejected at .05 if the test statistic is larger than
the critical value.
3.However, there are several problems associated with chi-square test
statistic:
1.Model misspecification:
1.Are there any “true” models in the world?
2.Models are usually considered as approximations of the reality.
2.Violation of the underlying assumptions:
1.The data (more correctly, the residuals) are normally distributed.

2.Large samples are used.


3.When the data are non-normally distributed, especially in clinical
studies, or in small sample sizes (e.g., N=100 or 200), chi-square
test statistics may not follow chi-square distributions.
3.Sensitive to the sample size used:
1.All proposed models will be rejected when the sample sizes are
large enough.
2.Large samples work against researchers!
4.In ex3a.out (and ex3b.out), χ²(df=1) = 0.001, p = .977. Thus, we do not reject the proposed 2-factor model.
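The accept-support decision rule can be sketched as follows; for df = 1 the chi-square upper-tail probability has a closed form, so no statistics library is needed (illustrative only):

```python
# Accept-support decision for the chi-square model test.
import math

chisq, df = 0.001, 1  # values from ex3a.out

# For df = 1, P(chi2 > x) = erfc(sqrt(x/2)); valid for this df only
p = math.erfc(math.sqrt(chisq / 2))

# In SEM we usually do NOT want to reject the proposed model:
# p here is ~.975 (Mplus reports .977 from the unrounded chi-square)
print(round(p, 3), p > .05)
```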

3.4 Goodness-of-fit indices


1.Many SEM users and researchers are aware of the problems of chi-
square statistics.
2.There are many goodness-of-fit indices developed as alternative
measures.
3.When asked why GFI (one of the fit indices) had been added to LISREL, Karl Jöreskog, the developer of LISREL, answered: "Well, users threaten us saying they would stop using LISREL if it always produces such large chi-squares. So we had to invent something to make people happy. GFI serves that purpose."
4.Incremental fit indices:
1.They measure the relative improvement in fit by comparing a target model with a baseline model.


2.The target model is usually the proposed model.
3.The baseline model is usually the model stating that all variables are
uncorrelated. It is known as the independence model.
4.For example,
1.Normed fit index (NFI; Bentler & Bonett, 1980):
1.NFI = (χ²_B − χ²_T) / χ²_B, where χ²_T and χ²_B are the chi-square statistics of the target and the baseline (or null) models
2.It measures the proportionate reduction in the chi-square value when moving from the baseline to the hypothesized model
2.Non-normed fit index (NNFI; Bentler & Bonett, 1980), also known as the Tucker-Lewis index (TLI) in Mplus:
1.NNFI = (χ²_B/df_B − χ²_T/df_T) / (χ²_B/df_B − 1), where df_T and df_B are the dfs of the target and the baseline models
2.It adjusts for the complexity of the model
3.Comparative fit index (CFI; Bentler, 1990):
1.CFI = 1 − max[(χ²_T − df_T), 0] / max[(χ²_T − df_T), (χ²_B − df_B), 0]
2.0 ≤ CFI ≤ 1
5.What is a good fitted model?
1.Rule of thumb (without empirical support): at least > 0.90 (see Lance,
Butts, & Michels, 2006 for the historical reasons)8

8 Lance, C. E., Butts, M. M., & Michels, L. C. (2006). The sources of four commonly reported cutoff criteria: What did
they really say? Organizational Research Methods, 9, 202-220.
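The three incremental fit indices above can be written directly as code. The chi-square values plugged in below are taken from ex3d.out, shown later in this handout (illustrative sketch, not a workshop file):

```python
# NFI, NNFI (TLI) and CFI from target and baseline chi-squares.
def nfi(chi_t, chi_b):
    return (chi_b - chi_t) / chi_b

def nnfi(chi_t, df_t, chi_b, df_b):
    return (chi_b / df_b - chi_t / df_t) / (chi_b / df_b - 1)

def cfi(chi_t, df_t, chi_b, df_b):
    return 1 - max(chi_t - df_t, 0) / max(chi_t - df_t, chi_b - df_b, 0)

# Target model chi2 = 16.314 (df 2); baseline chi2 = 71.367 (df 6), from ex3d.out
chi_t, df_t, chi_b, df_b = 16.314, 2, 71.367, 6
print(round(cfi(chi_t, df_t, chi_b, df_b), 3),   # 0.781, as reported by Mplus
      round(nnfi(chi_t, df_t, chi_b, df_b), 3))  # 0.343 (TLI)
```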

2.Goodness-of-fit indices are usually very good in path analysis. Why?


6.Residual based indices:
1.When the model fits well, the residuals (difference between the model
implied covariance matrix and the sample covariance matrix) should
be small.
2.Standardized root mean square residual (SRMR)
1.It measures the average value across the standardized residuals.
2.It ranges from zero (perfect fit) to one (very poor fit).
3.Rule of thumb: A well-fitting model < .05.
3.Root mean square error of approximation (RMSEA; Steiger & Lind,
1980)
1.Similar to SRMR.

2.Rules of thumb (Browne & Cudeck, 1993):
1.Close fit: < 0.05
2.Reasonable fit: 0.05 – 0.08
3.Inadequate fit: > 0.10
3.Advantage: Confidence intervals on RMSEA are available in most SEM packages.
7.What do we need to report?
1.We usually report the chi-square test statistic and it associated p value,
some incremental fit indices and some residual based indices.
2.For instance, the goodness-of-fit indices for the two-factor model in ex3a.out are χ²(df=1) = 0.001, p = .98; TLI = 1.09, CFI = 1.00 and RMSEA = 0.00. The fit is excellent!
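As a check on such reported values, RMSEA can be recomputed from the chi-square. The formula below is the one Mplus uses; the chi-square comes from ex3d.out later in this handout, and N = 300 for the ex3 data is an assumption inferred from the reported output rather than stated in the text:

```python
# RMSEA = sqrt(max(chi2 - df, 0) / (df * n)), as computed by Mplus.
import math

def rmsea(chisq, df, n):
    return math.sqrt(max(chisq - df, 0) / (df * n))

# chi2 = 16.314, df = 2 from ex3d.out; n = 300 is an inferred assumption
print(round(rmsea(16.314, 2, 300), 3))  # ~0.154, matching the reported estimate
```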

3.Combinational rules suggested by Hu and Bentler (1999):9


1.NNFI (TLI), RNI (not discussed in this class) or CFI > 0.95 and
SRMR < .09 or
2.RMSEA < .05 and SRMR < .06
3.Although their suggestions are widely accepted and cited, their
recommendations are not without challenges (e.g., Marsh, Hau, &
Wen, 2004).10

9 Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indices in covariance structure analysis: Conventional criteria
versus new alternatives. Structural Equation Modeling, 6(1), 1-55.
10 Marsh, H. W., Hau, K. T., & Wen, Z. (2004). In search of golden rules: Comment on hypothesis-testing approaches to setting cutoff values for fit indices and dangers in overgeneralizing Hu and Bentler's (1999) findings. Structural Equation Modeling, 11, 320-342.

3.5 Model modification and comparison


1.A preferred approach in doing SEM:
1.We may have several theoretically competing models.
2.We are interested in comparing which one is better:
1.If the models are nested, we may use a chi-square difference test to
compare them.
2.If the models are non-nested, we may use Akaike information
criterion (AIC) or Bayesian information criterion (BIC) to compare
them.
2.Comparing nested models
1.Two models are nested when Model A becomes Model B by fixing or
constraining some parameters.

2.There are three types of parameters in SEM:


3.Fixed parameters: (@0.5 in Mplus)
1.we assign some values to them;
2.no standard error is available
4.Free parameters: (*0.5 in Mplus)
1.the program estimates the best values for them
2.standard errors are available to test their significance
5.Constrained parameters: (see the bootstrap and the coming examples)
1.they are constrained to equal some other parameters
2.standard errors are available to test their significance
3.Our CFA example:

Model u: factor loadings are free. Model r: factor loadings are constrained to be equal:

[Path diagrams: in both models the variances of F1 and F2 are fixed at 1.00; in Model r the two loadings of F1 are both labeled a and the two loadings of F2 are both labeled b]

4.The unrestricted model (Mu) becomes the restricted model (Mr) by constraining the factor loadings to be equal.
5.Psychological implications of these two models:
1.Model u: Items with different reliabilities
2.Model r: x1 and x2 are equally good in measuring F1 while x3 and x4
are equally good in measuring F2.
6.Results on our CFA models:
1.Mu (ex3a.out): chi-square of the unrestricted model: χ²(df_u = 1) = 0.001, p = .977.
2.Mr (ex3c.out): chi-square of the restricted model: χ²(df_r = 3) = 4.777, p = .189.
7.If the models are nested, we may apply a chi-square difference test or likelihood ratio (LR) test to compare them.
1.The difference between the chi-square statistics (χ²_r − χ²_u) is also distributed as a chi-square with (df_r − df_u) df when the more restrictive model is correct.
2.In our example, the chi-square difference test is Δχ² = 4.776, Δdf = 2, p = .09.
3.If it is not significant, we choose the simpler model (Mr).
4.If it is significant, we choose the more complex model (Mu).
2.Interpretations:
1.The factor loadings are the same.
2.In other words, the reliabilities of the items are the same.
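The chi-square difference test above can be verified in a few lines; for 2 df the chi-square upper-tail probability has the closed form exp(−x/2), so no statistics library is needed (illustrative sketch):

```python
# Chi-square difference (LR) test for the nested CFA models.
import math

chi_u, df_u = 0.001, 1  # Mu, ex3a.out
chi_r, df_r = 4.777, 3  # Mr, ex3c.out

diff = chi_r - chi_u    # 4.776
df_diff = df_r - df_u   # 2
p = math.exp(-diff / 2) # P(chi2 > x) for df = 2 only

print(round(diff, 3), df_diff, round(p, 2))  # not significant -> keep the simpler Mr
```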

8.How to run Model r?


1.Input: ex3c.inp
2.Output: ex3c.out
TITLE: Equal factor loadings
DATA: FILE IS ex3.dat;

VARIABLE: NAMES ARE y1 y2 y3 y4;

MODEL:
! Constrain both factor loadings
f1 BY y1* (1)
y2 (1);
! Constrain both factor loadings
f2 BY y3* (2)
y4 (2);

! Fix the factor variances


f1@1.0;
f2@1.0;

OUTPUT: SAMP;

Selected output:
TESTS OF MODEL FIT

Chi-Square Test of Model Fit

Value 4.777
Degrees of Freedom 3
P-Value 0.1889

MODEL RESULTS
Two-Tailed
Estimate S.E. Est./S.E. P-Value

F1 BY
Y1 0.458 0.054 8.462 0.000
Y2 0.458 0.054 8.462 0.000

F2 BY
Y3 0.544 0.048 11.403 0.000
Y4 0.544 0.048 11.403 0.000
F2 WITH
F1 0.360 0.125 2.880 0.004
Intercepts
Y1 2.002 0.052 38.151 0.000
Y2 2.454 0.053 46.400 0.000
Y3 2.007 0.052 38.801 0.000
Y4 2.511 0.055 46.032 0.000

Variances
F1 1.000 0.000 999.000 999.000
F2 1.000 0.000 999.000 999.000

Residual Variances
Y1 0.617 0.068 9.032 0.000
Y2 0.630 0.069 9.117 0.000
Y3 0.507 0.061 8.264 0.000
Y4 0.597 0.067 8.974 0.000

3.6 Comparing non-nested models


1.Sometimes, the models being compared are non-nested. That is, we cannot convert one model into the other by constraining some parameters.
2.The chi-square difference test is then not appropriate.
3.Two popular statistics are Akaike's information criterion (AIC) and the Bayesian information criterion (BIC):
1.AIC = χ²_M − 2·df_M
2.BIC = χ²_M − df_M·ln(N)
3.They measure parsimonious fit, which takes both the model fit and the no. of parameters estimated into account.
4.A smaller value indicates that the model strikes a better compromise between model fit and model complexity.

5.They can be used to compare both nested and non-nested models.
6.Choose the model with the smallest AIC or BIC (better parsimonious fit).
7.Unlike the chi-square difference test, there is no significance test for AIC and BIC.
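Mplus itself computes AIC and BIC from the loglikelihood (AIC = −2LL + 2q, BIC = −2LL + q·ln(N)); this differs from the chi-square-based formulas above only by a constant for a given data set, so both versions rank models identically. The loglikelihood and q below come from ex3d.out shown later; N = 300 is an assumption inferred from the reported BIC rather than stated in the text:

```python
# AIC and BIC as Mplus reports them, from the H0 loglikelihood.
import math

def aic(loglik, q):
    return -2 * loglik + 2 * q

def bic(loglik, q, n):
    return -2 * loglik + q * math.log(n)

# H0 loglikelihood -1570.267 and q = 12 free parameters, from ex3d.out
print(round(aic(-1570.267, 12), 3),       # 3164.534, matching the output
      round(bic(-1570.267, 12, 300), 3))  # ~3208.980 with the assumed N = 300
```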

Model A (ex3a.inp): Two-factor model. Model B (ex3d.inp): One-factor model, while X4 is measuring something else.

[Path diagrams: Model A is the two-factor model from before (factor variances fixed at 1.00). In Model B, F1 (variance 1.00) is measured by X1–X3, while F2 is measured by X4 alone, with its loading fixed at 1.00 and the error variance of X4 fixed at 0.00]

8.The above two models are non-nested.


9.However, it may be of importance to test which one is better. For
example,

1.Western population: A two-factor model is generally accepted.


2.Chinese population: A researcher argues that x1-x3 are measuring a
single construct while x4 is measuring something else in Chinese.
4.Results:
1.Model A: AIC =3150, BIC =3198
2.Model B: AIC =3165, BIC =3209
3.As both AIC and BIC are smaller in Model A, Model A is preferred.
5.Mplus syntax:
1.Input: ex3d.inp
2.Output: ex3d.out
TITLE: Non-nested model

DATA: FILE IS ex3.dat;

VARIABLE: NAMES ARE y1 y2 y3 y4;

MODEL:

f1 BY y1* y2 y3; ! Free the loading of y1


f2 BY y4@1.0; ! Fix the loading at 1
y4@0.0; ! Fix the error variance at 0.0
f1@1.0; ! Fix the factor variance

OUTPUT: SAMP;

Selected output:
TESTS OF MODEL FIT

Chi-Square Test of Model Fit

Value 16.314
Degrees of Freedom 2
P-Value 0.0003

Chi-Square Test of Model Fit for the Baseline Model

Value 71.367
Degrees of Freedom 6
P-Value 0.0000

CFI/TLI

CFI 0.781
TLI 0.343

Loglikelihood

H0 Value -1570.267
H1 Value -1562.110

Information Criteria

Number of Free Parameters 12


Akaike (AIC) 3164.534
Bayesian (BIC) 3208.980
Sample-Size Adjusted BIC 3170.923
(n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)

Estimate 0.154
90 Percent C.I. 0.091 0.228
Probability RMSEA <= .05 0.005

SRMR (Standardized Root Mean Square Residual)

Value 0.058

Note. This model does not fit the data well. It is clear that it is not a good
model.

4.1 Structural equation model


1.SEM is composed of two major components (Anderson & Gerbing,
1988)11
1.Measurement (CFA) model:
1.Are the items grouped according to the theory?
2.Assessment of convergent and discriminant validity of measurement
2.Structural model:
1.What are the relationships among the latent constructs?
2.Assessment of predictive validity

11 Anderson, J. C., & Gerbing, D. W. (1988). Structural equation modeling in practice: A review and recommended two-
step approach. Psychological Bulletin, 103, 411-423.

2.Some researchers suggest analyzing SEM via two steps:


1.Step 1:
1.Formulate the SEM as a CFA model;
2.Test the CFA model;
3.If it does not fit, we know that there are problems in the
measurement model;
4.If it fits the data, go to Step 2
2.Step 2:
1.Formulate an SEM;
2.If it does not fit, we know that the problems may be attributed to the structural model, not the measurement model.
3.Note. Other researchers argue that Step 1 is not necessary.

4.2 An example: Stability of alienation


1.Wheaton et al. (1977) studied the stability of attitudes over time (1967 and 1971), including alienation and its relation to background variables such as education and occupation.12
2.Alienation:
1.Anomia subscale (Anomia)
2.Powerlessness subscale (Power)
3.Socioeconomic status (SES):
1.Duncan’s Socioeconomic Index (SEI)
2.Years of schooling (EDU)

12 Wheaton, B., Muthén, B., Alwin, D., & Summers, G. (1977). Assessing reliability and stability in panel models. In D.
R. Heise (Ed.): Sociological Methodology 1977 (pp. 84-136). San Francisco: Jossey-Bass.

4.The proposed SEM is:

[Path diagram: SES (indicators Edu, SEI) predicts AL67 and AL71;
AL67 (indicators An67, Pow67) predicts AL71 (indicators An71, Pow71).]

5.Following the two-step approach, the first step is to establish a
measurement model:

[Path diagram of the measurement model: three correlated factors, SES
(Edu, SEI), AL67 (An67, Pow67) and AL71 (An71, Pow71).]

6.Mplus syntax:

1.Data: ex4.cov (covariance matrix, not raw data)


2.Input: ex4a.inp
3.Output: ex4a.out
TITLE: Stability of Alienation: Measurement Model

DATA: FILE IS ex4.cov; ! covariance matrix as input


TYPE IS COVARIANCE; ! required to specify it
NOBSERVATIONS ARE 932; ! required to include sample size

VARIABLE: NAMES ARE anomia67 power67 anomia71 power71 educ SEI;

MODEL:
alien67 BY anomia67 power67;
alien71 BY anomia71 power71;
ses BY educ SEI;

OUTPUT: SAMP;

Interpretations:
4.χ²(df = 6) = 71.55, p < .0001; CFI = 0.97; TLI = 0.92 and
RMSEA = 0.108. The model fits the data only marginally.
5.Before analyzing a full SEM, we may improve the model fit by model
modifications.
6.Since the Anomia and Powerlessness subscales were measured twice
(1967 and 1971), it is reasonable to expect that the measurement
errors might be correlated.
7.We may use the modification indices to provide some hints. We may
add MODINDICES in OUTPUT (see ex4b.inp).
1.Modification indices (MI):
1.For each fixed parameter specified, there is a MI for it;
2.The predicted drop in the overall χ² value if that parameter is freed;
3.We can free the parameters with large MIs if they are theoretically
justified.
2.Expected parameter change (EPC) value:
1.For each fixed parameter specified, there is an EPC for it;
2.The predicted change, in either a positive or negative direction, for
each parameter if the parameter is freed;
3.The directions should be consistent with our theory.
M.I. E.P.C. Std E.P.C. StdYX E.P.C.

WITH Statements

ANOMIA71 WITH ANOMIA67 63.774 1.952 1.952 0.507
ANOMIA71 WITH POWER67 49.798 -1.533 -1.533 -0.447
POWER71 WITH ANOMIA67 49.886 -1.507 -1.507 -0.395
POWER71 WITH POWER67 37.294 1.158 1.158 0.341
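Each MI is judged against a chi-square distribution with 1 df (so values above 3.84 are significant at the .05 level). For 1 df the chi-square p-value has the closed form erfc(√(x/2)), which shows how extreme the largest indices above are. A small check in Python (not part of the original Mplus materials):

```python
import math

# A modification index is the expected chi-square drop (1 df) if the
# parameter is freed; for df = 1 the p-value is erfc(sqrt(x / 2)).
def mi_pvalue(mi):
    return math.erfc(math.sqrt(mi / 2))

for label, mi in [("ANOMIA71 WITH ANOMIA67", 63.774),
                  ("POWER71 WITH POWER67", 37.294)]:
    print(label, mi_pvalue(mi) < .001)
```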

8.We may add two correlated errors in the model.


1.Input: ex4c.inp
2.Output: ex4c.out
TITLE: Stability of Alienation: Measurement Model with correlated errors

DATA: FILE IS ex4.cov; ! covariance matrix as input


TYPE IS COVARIANCE; ! required to specify it
NOBSERVATIONS ARE 932; ! required to include sample size

VARIABLE: NAMES ARE anomia67 power67 anomia71 power71 educ SEI;


MODEL:
alien67 BY anomia67 power67;
alien71 BY anomia71 power71;
ses BY educ SEI;

! Correlated errors
anomia67 WITH anomia71;
power67 WITH power71;

OUTPUT: SAMP;

Selected output:
1. χ²(df = 4) = 4.74, p = .32; CFI = 1.00; TLI = 1.00 and RMSEA = 0.014.
The model has an excellent fit.
2. We may compare whether this model (with correlated errors) is better
than the previous one (without correlated errors) by using a chi-square
difference test: Δχ² = 66.81, Δdf = 2, p < .001.

3. Thus, the model with correlated errors is preferred.


4. We should only change those parameters that are theoretically
interpretable.
5. Moreover, the chi-square difference test may not be appropriate for
ad hoc changes.
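The mechanics of the difference test can be sketched numerically; for Δdf = 2 the chi-square p-value has the closed form exp(−x/2). A check in Python (not part of the original Mplus materials):

```python
import math

# Chi-square difference test: model without correlated errors (71.55, df = 6)
# versus the model with correlated errors (4.74, df = 4).
chi2_diff = 71.55 - 4.74   # 66.81
df_diff = 6 - 4            # 2

# For df = 2 the chi-square survival function is exp(-x / 2).
p = math.exp(-chi2_diff / 2)
print(round(chi2_diff, 2), df_diff, p < .001)
```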

4.3 A full SEM


1.We may then formulate a full SEM for testing:
1.Input: ex4d.inp
2.Output: ex4d.out
TITLE: Stability of Alienation: Full SEM Model with correlated errors

DATA: FILE IS ex4.cov; ! covariance matrix as input


TYPE IS COVARIANCE; ! required to specify it
NOBSERVATIONS ARE 932; ! required to include sample size

VARIABLE: NAMES ARE anomia67 power67 anomia71 power71 educ SEI;

MODEL:
alien67 BY anomia67 power67;
alien71 BY anomia71 power71;
ses BY educ SEI;

! Correlated errors
anomia67 WITH anomia71;
power67 WITH power71;
! Structural model
alien67 ON ses (p1);
alien71 ON alien67 (p2);
alien71 ON ses;

MODEL CONSTRAINT:
NEW(ind_effect); ! Create a new variable for indirect effect
ind_effect = p1*p2;

OUTPUT: CINTERVAL(SYMMETRIC); ! Wald CI

Selected output:
1. χ²(df = 4) = 4.74, p = .32; CFI = 1.00; TLI = 1.00 and RMSEA = 0.014.
The model fits the data very well.
2. The direct effect is -0.227, p < .001.
3. The indirect effect is -0.575 × 0.607 = -0.349, p < .05.
MODEL RESULTS Two-Tailed
Estimate S.E. Est./S.E. P-Value

ALIEN67 BY
ANOMIA67 1.000 0.000 999.000 999.000
POWER67 0.979 0.062 15.896 0.000

ALIEN71 BY
ANOMIA71 1.000 0.000 999.000 999.000
POWER71 0.922 0.059 15.498 0.000

SES BY
EDUC 1.000 0.000 999.000 999.000
SEI 0.522 0.042 12.363 0.000

ALIEN67 ON
SES -0.575 0.056 -10.195 0.000

ALIEN71 ON
ALIEN67 0.607 0.051 11.897 0.000
SES -0.227 0.052 -4.334 0.000

ANOMIA67 WITH
ANOMIA71 1.623 0.314 5.176 0.000
POWER67 WITH
POWER71 0.339 0.261 1.298 0.194

Variances
SES 6.797 0.649 10.474 0.000

Residual Variances
ANOMIA67 4.731 0.453 10.440 0.000
POWER67 2.564 0.403 6.360 0.000
ANOMIA71 4.399 0.515 8.541 0.000
POWER71 3.070 0.434 7.070 0.000
EDUC 2.802 0.507 5.527 0.000
SEI 2.646 0.181 14.598 0.000
ALIEN67 4.841 0.467 10.359 0.000
ALIEN71 4.083 0.404 10.104 0.000

New/Additional Parameters
IND_EFFE -0.349 0.041 -8.538 0.000

CONFIDENCE INTERVALS OF MODEL RESULTS

Lower .5% Lower 2.5% Estimate Upper 2.5% Upper .5%


New/Additional Parameters
IND_EFFE -0.454 -0.429 -0.349 -0.269 -0.244
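The symmetric (Wald) interval above is just the estimate plus or minus a normal critical value times the standard error. The 95% limits can be reproduced from the printed estimate and SE in Python (not part of the original Mplus materials):

```python
# Reproduce the Wald 95% CI for the indirect effect from the output above.
a, b = -0.575, 0.607   # ses -> alien67 and alien67 -> alien71 paths
se = 0.041             # delta-method SE reported by Mplus

indirect = a * b                  # -0.349
lower = indirect - 1.96 * se
upper = indirect + 1.96 * se
print(round(indirect, 3), round(lower, 3), round(upper, 3))
```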

5.1 Missing data


1.Why are there missing data?
1.Participants fail to provide all data.
2.Participants provide inappropriate responses, e.g., 6 in a 5-point
Likert scale.
3.Participants are unwilling to answer sensitive questions.
4.Participants drop out in longitudinal studies.
5.It is common to observe missing data even if the design and the
collection of data have been carefully conducted.
6.There may be different mechanisms behind the missing data.
2.Problems:
1.Decreased statistical power.

2.Biased parameter estimates.


3.Most statistical techniques assume complete data.
4.It is usually required to describe how you handle missing data in
journal submission.
3.Types of missingness mechanisms:
1.According to Rubin (1987), there are three types of missingness
mechanisms:13
2.Missing completely at random (MCAR):
1.The data on Y are said to be MCAR if the probability of missing
data on Y is unrelated to the value of Y itself or to the values of
any other variables in the data set.
2.For example, a participant forgets to report his salary by mistake.

13 Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.

3.Missing at random (MAR):


1.Data on Y are said to be MAR if the probability of missing data
on Y is unrelated to the value of Y, after controlling for other
variables in the analysis.
2.For example, a participant fails to report his salary (a missing
datum on Y) because he has a high (or low) educational level
(other variable).
3.This assumption is more realistic in applied research.
4.Missing not at random (MNAR):
1.If the data are not MAR, we say that the missing data are MNAR.
2.For example, a participant fails to report his salary because his
income is very high (or low).

5.Note. Most methods of handling missing data assume the missing
data are either MCAR or MAR.
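The three mechanisms can be mimicked in a toy simulation (hypothetical data, not from the workshop): salary goes missing completely at random, as a function of education (MAR), or as a function of salary itself (MNAR). Only under MCAR does the observed mean stay unbiased:

```python
import random
from statistics import mean

random.seed(1)

# Toy data: education predicts salary (both standardized, mean near 0).
n = 20000
educ = [random.gauss(0, 1) for _ in range(n)]
salary = [0.7 * e + random.gauss(0, 1) for e in educ]

mcar = [s for s in salary if random.random() < 0.8]   # unrelated to anything
mar = [s for s, e in zip(salary, educ) if e < 1.0]    # depends on educ only
mnar = [s for s in salary if s < 1.0]                 # depends on salary itself

# MCAR leaves the observed mean near 0; MAR and MNAR pull it downward.
print(round(mean(mcar), 2), round(mean(mar), 2), round(mean(mnar), 2))
```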

5.2 Methods of handling missing data


1.Most of the common methods are generally not recommended (see
Schafer & Graham, 2002, for a detailed review):14
1.Listwise deletion
2.Pairwise deletion
3.Mean substitution
2.Two modern approaches are generally recommended to handle missing
data. They are multiple imputation (MI) and full information maximum
likelihood (FIML).
3.FIML is the default method to handle missing data in Mplus.
4.It means that we are using the “best” method to handle missing data
automatically.

14 Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-
177.

5.Example:
1.Data: ex5.dat:
1.We use “999” as the missing value in the data.
2.Other values are possible.
3.Be careful in choosing the missing values!

2.Input: ex5a.inp
3.Output: ex5a.out
TITLE: Using FIML to handle missing data

DATA: FILE IS ex5.dat;

VARIABLE: NAMES ARE x1-x6;


MISSING ARE ALL (999);
! MISSING ARE x1(99) x2(999);

MODEL:
f1 BY x1-x3*;
f2 BY x4-x6*;

f1@1.0;
f2@1.0;

OUTPUT: SAMP;

Selected output:
SUMMARY OF ANALYSIS

Number of groups 1
Number of observations 500
TITLE: Using listwise deletion to handle missing data

DATA: FILE IS ex5.dat;


LISTWISE IS ON; ! Use listwise deletion

VARIABLE: NAMES ARE x1-x6;


MISSING ARE ALL (999);
! MISSING ARE x1(99) x2(999);

MODEL:
f1 BY x1-x3*;
f2 BY x4-x6*;

f1@1.0;
f2@1.0;
OUTPUT: SAMP;

Selected output:
SUMMARY OF ANALYSIS

Number of groups 1
Number of observations 338
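Listwise deletion discarded 162 of the 500 cases. Under the idealizing assumption that each of the six variables is missing independently at a common rate p, the complete-case fraction would be (1 − p)^6, so even a modest per-variable rate wipes out a third of the sample. A quick check in Python (not part of the original Mplus materials):

```python
# Listwise deletion kept 338 of 500 cases. Assuming each of the 6 variables
# is missing independently at the same rate p (an idealization), the
# complete-case fraction is (1 - p) ** 6; solve for the implied p.
complete_fraction = 338 / 500          # 0.676
p = 1 - complete_fraction ** (1 / 6)
print(round(p, 3))   # roughly 6% missing per variable discards 32% of cases
```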

6.1 Further topics (not covered in this workshop)


1.Categorical (ordinal) variables, e.g.,
1.Binary variables: yes/no
2.Likert scales
2.Latent growth models:
1.Modeling intra- and inter-individual differences
2.Mathematically, it is the same as the multilevel model
3.Multiple group SEM:
1.Testing equivalence of psychological process across different groups
4.Multilevel SEM:
1.Combining both multilevel model and SEM

6.2 Further resources


1.General review of SEM:
1. MacCallum, R. C., & Austin, J. T. (2000). Applications of structural equation
modeling in psychological research. Annual Review of Psychology, 51, 201–226.
2. Weston, R., & Gore Jr., P. A. (2006). A brief guide to structural equation modeling.
The Counseling Psychologist, 34, 719-751.
2.Path analysis:
1. Stage, F. K., Carter, H. C., & Nora, A. (2004). Path analysis: An introduction and
analysis of a decade of research. The Journal of Educational Research, 98, 5-12.
2. Streiner, D. L. (2005). Finding our way: An introduction to path analysis. Canadian
Journal Of Psychiatry, 50, 115-122.
3.Missing data analysis:
1. Allison, P. D. (2003). Missing data techniques for structural equation modeling.
Journal of Abnormal Psychology, 112, 545-557.
4.Mediation analysis:
1. Cheung, M.W.L. (2007). Comparison of approaches to constructing confidence
intervals for mediating effects using structural equation models. Structural Equation
Modeling, 14, 227-246.

Thank you for your patience!
