
Structural Equation Modeling

DR. ARSHAD HASSAN

Structural Equation Modeling

SEM is an extension of the general linear model that enables a researcher to test a set of regression equations simultaneously.

SEM software can test traditional models, but it also permits examination of more complex relationships and models, such as confirmatory factor analysis and path analysis.

Structural Equation Modeling

SEM is known by several names:

SEM: Structural Equation Modeling
CSA: Covariance Structure Analysis
Causal Models
Simultaneous Equation Modeling

Structural Equation Modeling

SEM is a combination of factor analysis and multiple regression.

Structural Equation Modeling

The researcher first specifies a model based on theory, then determines how to measure its constructs, collects data, and inputs the data into the SEM software package. The package fits the model to the data and produces the results, which include overall model fit statistics and parameter estimates.

Theory

Theorize your model:

What are the observed variables?
What are the latent variables?
What are the relationships between latent variables?
What are the relationships between latent variables and observed variables?
Are there correlated errors of measurement?

Structural Equation Modeling

SEM has a language all its own.

Manifest or observed variables are directly measured by researchers, while latent or unobserved variables are not directly measured but are inferred from the relationships or correlations among measured variables in the analysis.

This statistical estimation is accomplished in much the same way that an exploratory factor analysis infers the presence of latent factors from shared variance among observed variables.

Structural Equation Modeling

Independent variables, which are assumed to be measured without error, are called exogenous or upstream variables; dependent or mediating variables are called endogenous or downstream variables.

Structural Equation Modeling

SEM users represent relationships among observed and unobserved variables using path diagrams. Ovals or circles represent latent variables, while rectangles or squares represent measured variables. Residuals are always unobserved, so they are represented by ovals or circles.

Vocabulary

Measured variable

Observed variables, indicators, or manifest variables in an SEM design
Predictors and outcomes in path analysis
Squares in the diagram

Latent variable

An unobservable variable in the model; also called a factor or construct
The construct driving measured variables in the measurement model
Circles in the diagram

Vocabulary

Error or E: variance left over after prediction of a measured variable

Disturbance or D: variance left over after prediction of a factor

Exogenous variable: a variable that predicts other variables

Endogenous variable: a variable that is predicted by another variable. A predicted variable is endogenous even if it in turn predicts another variable.

Vocabulary

Parameters

The parameters of the model are regression coefficients for paths between variables and the variances/covariances of independent variables. Parameters may be fixed to a certain value (usually 0 or 1) or may be estimated. In a diagram, an asterisk represents a parameter to be estimated, while a 1 indicates that the parameter has been fixed to the value 1. When two variables are not connected by a path, the coefficient for that path is fixed at 0.

Why SEM?

Assumptions underlying the statistical analyses are clear and testable, giving the investigator full control and potentially furthering understanding of the analyses.

Graphical interface software boosts creativity and facilitates rapid model debugging.

SEM programs provide overall tests of model fit and individual parameter estimate tests simultaneously.

Regression coefficients, means, and variances may be compared simultaneously, even across multiple between-subjects groups.

Why SEM?

Measurement and confirmatory factor analysis models can be used to purge measurement error, making the estimated relationships among latent variables less contaminated by it.

SEM can fit non-standard models, including flexible handling of longitudinal data, databases with autocorrelated error structures (time series analysis), and databases with non-normally distributed variables and incomplete data.

This last feature is SEM's most attractive quality: SEM provides a unifying framework under which numerous linear models may be fit using flexible, powerful software.

SEM Assumptions

A reasonable sample size
Continuously and normally distributed endogenous variables
Model identification

Identification

Identification is a structural or mathematical requirement for the SEM analysis to take place. It refers to the idea that there is at least one unique solution for each parameter estimate in an SEM model.

Identification

Models in which there is only one possible solution for each parameter estimate are said to be just-identified.

Models for which there are an infinite number of possible parameter estimate values are said to be under-identified.

Finally, models that have more than one possible solution (but one best or optimal solution) for each parameter estimate are considered over-identified.

Model Identification

To determine whether the model is identified, compare the number of data points to the number of parameters to be estimated. Since the input data set is the sample variance/covariance matrix, the number of data points is the number of variances and covariances in that matrix, which can be calculated as m(m+1)/2, where m is the number of measured variables.
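This count can be sketched in a few lines of Python; the parameter count `n_params` below is a hypothetical value chosen for illustration.

```python
def data_points(m):
    """Number of unique variances and covariances among m measured variables."""
    return m * (m + 1) // 2

m = 6           # number of measured variables
n_params = 13   # hypothetical number of free parameters to estimate

df = data_points(m) - n_params
# df > 0: over-identified, df == 0: just-identified, df < 0: under-identified
print(data_points(m), df)  # 21 8
```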

Structural Equation Modeling

The SEM can be divided into two parts. The measurement model is the part which relates measured variables to latent variables. The structural model is the part that relates latent variables to one another.
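As an illustration, the two parts can be written in the lavaan-style model syntax used by several SEM packages; the variable names here are hypothetical.

```python
# Hypothetical two-factor model written in lavaan-style syntax:
# "=~" lines form the measurement model, the "~" line the structural model.
model_desc = """
satisfaction =~ s1 + s2 + s3
loyalty =~ l1 + l2 + l3
loyalty ~ satisfaction
"""
print(model_desc)
```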

Structural Equation Modeling

Measurement Models

Structural Equation Modeling

Structural Models

Structural Equation Modeling

Simultaneous Models

Identification of the Measurement Model

The measurement portion of the model will probably be identified if:

There is only one latent variable, it has at least three indicators that load on it, and the errors of these indicators are not correlated with one another.

There are two or more latent variables, each has at least three indicators that load on it, the errors of these indicators are not correlated, each indicator loads on only one factor, and the factors are allowed to covary.

There are two or more latent variables, but there is a latent variable on which only two indicators load; the errors of the indicators are not correlated, each indicator loads on only one factor, and none of the variances or covariances between factors is zero.

Identification of the Structural Model

This portion of the model may be identified if:

None of the latent dependent variables predicts another latent dependent variable, or

when a latent dependent variable does predict another latent dependent variable, the relationship is recursive and the disturbances are not correlated.

Handling of Incomplete Data

Typical ad hoc solutions to missing data problems include:

Listwise deletion of cases, where an entire case's record is deleted if the case has one or more missing data points.

Pairwise deletion, where bivariate correlations are computed only on cases with available data. Pairwise deletion results in different Ns for each bivariate covariance or correlation in the database.

Another commonly used ad hoc technique is substitution of the variable's mean for the missing data points on that variable.
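A minimal sketch of the three ad hoc approaches using pandas on toy data (pandas' `corr` uses pairwise-complete observations by default):

```python
import numpy as np
import pandas as pd

# Toy database with missing values
df = pd.DataFrame({
    "x": [1.0, 2.0, np.nan, 4.0, 5.0],
    "y": [2.0, np.nan, 3.0, 5.0, 6.0],
})

# Listwise deletion: drop every case with any missing data point
listwise = df.dropna()

# Pairwise deletion: each correlation is computed on the cases
# available for that pair, so the N can differ per pair
pairwise_corr = df.corr()

# Mean substitution: replace each variable's missing points with its mean
# (note this shrinks the variable's variance)
mean_sub = df.fillna(df.mean())

print(len(df), len(listwise))  # 5 3
```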

Handling of Incomplete Data

Listwise deletion can result in a substantial loss of power, particularly if many cases each have a few data points missing on a variety of variables, not to mention limiting statistical inference to individuals who complete all measures in the database.

Pairwise deletion is marginally better, but using different Ns for each covariance or correlation can have profound consequences for model fitting efforts, including impossible solutions in some instances.

Finally, mean substitution shrinks the variances of the variables where substitution took place, which is not desirable.

Handling of Incomplete Data

If the proportion of cases with missing data is small, say five percent or less, listwise deletion may be acceptable (Roth, 1994). Of course, if the five percent (or fewer) cases are not missing completely at random, inconsistent parameter estimates can result.

Otherwise, missing data experts (e.g., Little and Rubin, 1987) recommend using a maximum likelihood estimation method for analysis, which makes use of all available data points. AMOS features maximum likelihood estimation in the presence of missing data.

Reliability of Measured Variables

The variance in each measured variable is assumed to stem from variance in the underlying latent variable. Classically, the variance of a measured variable can be partitioned into true variance (that related to the true variable) and (random) error variance.

The reliability of a measured variable is the ratio of true variance to total (true + error) variance. In SEM, the reliability of a measured variable is estimated by a squared correlation coefficient: the proportion of variance in the measured variable that is explained by variance in the latent variable(s).
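A small simulation illustrates the idea, assuming a measured variable composed of a true score plus random error:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

true = rng.normal(0.0, 1.0, n)    # true-score variance = 1.00
error = rng.normal(0.0, 0.5, n)   # error variance = 0.25
measured = true + error           # total variance = 1.25

# Reliability = true variance / total variance = 1 / 1.25 = 0.8,
# estimated here as the squared correlation between the measured
# variable and the latent (true) variable.
r = np.corrcoef(true, measured)[0, 1]
reliability = r ** 2
print(round(reliability, 2))
```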

How SEM Works

Statistically, the model is evaluated by comparing two variance/covariance matrices. From the data, a sample variance/covariance matrix is calculated; from this matrix and the model, an estimated population variance/covariance matrix is computed. If the estimated population variance/covariance matrix is very similar to the known sample variance/covariance matrix, the model is said to fit the data well.
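A minimal sketch of this comparison, assuming multivariate normality so the standard maximum-likelihood discrepancy F = ln|Σ| − ln|S| + tr(SΣ⁻¹) − p applies (S the sample matrix, Σ the model-implied matrix, p the number of variables):

```python
import numpy as np

def ml_discrepancy(S, Sigma):
    """ML fit function: ln|Sigma| - ln|S| + tr(S @ inv(Sigma)) - p."""
    p = S.shape[0]
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    _, logdet_S = np.linalg.slogdet(S)
    return logdet_Sigma - logdet_S + np.trace(S @ np.linalg.inv(Sigma)) - p

S = np.array([[1.0, 0.5],
              [0.5, 1.0]])   # sample variance/covariance matrix

# A model implying exactly S fits perfectly (discrepancy 0); a model
# implying zero covariance between the two variables fits worse.
print(ml_discrepancy(S, S))          # 0.0 to numerical precision
print(ml_discrepancy(S, np.eye(2)))  # positive: worse fit
```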


Evaluating Model Fit

The Default model contains the fit statistics for the model you specified in your AMOS Graphics diagram.

The Saturated and Independence models are two baseline or comparison models automatically fitted by AMOS as part of every analysis.

The Saturated model contains as many parameter estimates as there are available degrees of freedom or inputs into the analysis; it is thus the least restricted model that AMOS can fit.

By contrast, the Independence model is one of the most restrictive models that can be fit: it contains estimates of the variances of the observed variables only. In other words, the Independence model assumes all relationships between the observed variables are zero.

Tests of Fit

The chi-square test is a test of overall model fit: when the probability value of the chi-square test is smaller than the .05 level used by convention, you would reject the null hypothesis that the model fits the data.

Because the chi-square test of absolute model fit is sensitive to sample size and to non-normality in the underlying distribution of the input variables, investigators often turn to various descriptive fit statistics to assess the overall fit of a model to the data.
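The decision rule can be illustrated with a chi-square p-value computed in closed form (valid for even degrees of freedom); the test statistic and df below are hypothetical:

```python
from math import exp, factorial

def chi2_sf(x, df):
    """Chi-square survival function (p-value); closed form for even df."""
    assert df % 2 == 0, "closed form shown here requires even df"
    half = x / 2.0
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))

# Hypothetical model chi-square of 18.3 on 8 degrees of freedom
p = chi2_sf(18.3, 8)
print(round(p, 3), p < 0.05)  # 0.019 True -> reject the model
```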

Tests of Fit

These fit statistics are similar to the adjusted R² in multiple regression analysis: the parsimony fit statistics penalize large models with many estimated parameters.

The Tucker-Lewis Index (TLI) and the Comparative Fit Index (CFI) compare the absolute fit of your specified model to the absolute fit of the Independence model. The greater the discrepancy between the overall fit of the two models, the larger the values of these descriptive statistics.

Tests of Fit

The chi-square test is an absolute test of model fit: if the probability value (P) is below .05, the model is rejected.

Hu and Bentler (1999) recommend RMSEA values below .06 and Tucker-Lewis Index values of .95 or higher.

The analysis uses an iterative procedure to minimize the difference between the sample variance/covariance matrix and the estimated population variance/covariance matrix. Maximum Likelihood (ML) estimation is the most frequently employed method.

Goodness-of-fit Statistics

Many goodness-of-fit statistics are built from the following quantities:

Tb = chi-square test statistic for the baseline model
Tm = chi-square test statistic for the hypothesized model
dfb = degrees of freedom for the baseline model
dfm = degrees of freedom for the hypothesized model

NFI = (Tb − Tm) / Tb

IFI = (Tb − Tm) / (Tb − dfm)

RMSEA = √[ (Tm − dfm) / ((N − 1) dfm) ]
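These formulas translate directly into code; the Tb, Tm, dfm, and N values below are hypothetical:

```python
from math import sqrt

def fit_indices(Tb, Tm, dfm, N):
    """NFI, IFI, and RMSEA from the quantities defined above."""
    nfi = (Tb - Tm) / Tb
    ifi = (Tb - Tm) / (Tb - dfm)
    # max(..., 0) guards against a model chi-square below its df
    rmsea = sqrt(max(Tm - dfm, 0.0) / ((N - 1) * dfm))
    return nfi, ifi, rmsea

# Hypothetical values: Tb = 500, Tm = 20, dfm = 15, N = 200
nfi, ifi, rmsea = fit_indices(500.0, 20.0, 15, 200)
print(round(nfi, 2), round(ifi, 2), round(rmsea, 3))  # 0.96 0.99 0.041
```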

Goodness-of-fit Statistics

The Normed Fit Index (NFI) is simply the difference between the two models' chi-squares divided by the chi-square for the independence model. Values of .9 or higher (some say .95 or higher) indicate good fit.

The Comparative Fit Index (CFI) uses a similar approach (with a noncentral chi-square) and is said to be a good index even with small samples. It ranges from 0 to 1, like the NFI, and .95 (some say .9) or higher indicates good fit.

Goodness-of-fit Statistics

PRATIO is the ratio of how many paths you dropped to how many you could have dropped (all of them).

The Parsimony Normed Fit Index (PNFI) is the product of the NFI and PRATIO, and the PCFI is the product of the CFI and PRATIO. The PNFI and PCFI are intended to reward models that are parsimonious (contain few paths).

Goodness-of-fit Statistics

NPAR is the number of parameters in the model.

CMIN is a chi-square statistic comparing the tested model and the independence model to the saturated model.

CMIN/DF, the relative chi-square, is an index of how much the fit of data to model has been reduced by dropping one or more paths. One rule of thumb is to decide you have dropped too many paths if this index exceeds 2 or 3.

Goodness-of-fit Statistics

RMR, the root mean square residual, is an index of the amount by which the variances and covariances estimated by your model differ from the observed variances and covariances. Smaller is better.

Goodness-of-fit Statistics

GFI, the goodness of fit index, tells you what proportion of the variance in the sample variance-covariance matrix is accounted for by the model. This should exceed .9 for a good model; for the saturated model it is a perfect 1.

AGFI (adjusted GFI) is an alternate GFI index in which the value of the index is adjusted for the number of parameters in the model. The fewer the parameters in the model relative to the number of data points (variances and covariances in the sample variance-covariance matrix), the closer the AGFI is to the GFI.

The PGFI (P for parsimony) is a GFI index adjusted to reward simple models and penalize models in which few paths have been deleted.

Goodness-of-fit Statistics

The Root Mean Square Error of Approximation (RMSEA) estimates lack of fit compared to the saturated model. An RMSEA of .05 or less indicates good fit, and .08 or less adequate fit. PCLOSE is the p value testing the null hypothesis that the RMSEA is no greater than .05.

Goodness-of-fit Statistics

Component Fit

Use substantive experience:

Are the signs correct?
Are there any nonsensical results?
What are the R²s for the individual equations?
Are there negative error variances?
Do the standard errors seem reasonable?

SEM Limitations

SEM is a confirmatory approach: you need to have established theory about the relationships.

It cannot be used to explore possible relationships when you have more than a handful of variables, although exploratory methods (e.g., model modification) can be used on top of the original theory.

SEM is not inherently causal; experimental design establishes cause. SEM is often thought of as strictly correlational but, like regression, can be used with experimental data.

Path Analysis

Theoretical assumptions (causality):

X1 and Y1 correlate.
X1 precedes Y1 chronologically.
X1 and Y1 are still related after controlling for other dependencies.

Statistical assumptions:

The model needs to be recursive.
It is OK to use ordinal data.
All variables are measured (and analyzed) without measurement error.

Path Analysis

Path analysis estimates the effects of variables in a causal system. It starts with a structural equation: a mathematical equation representing the structure of the variables' relationships to each other.

