
CFA/SEM Using Stata

Five Main Points:

1. Basics of Stata CFA/SEM syntax


2. One Factor CFA
3. Two Factor CFA
4. Structural Equation Modeling
5. A little bit of cross-group invariance

Basic CFA/SEM Syntax Using Stata:

To begin, we should start on a good note.

There is, in my opinion, really good news: in terms of conducting most analyses, the syntax
for CFA/SEM in Stata is far, far, far simpler than that of LISREL.

Feel free to applaud enthusiastically. Yay!!!!!

Syntax Basics

The most basic syntax is that which specifies the relationship between the latent constructs and
the observed variables. You do this simply by creating a latent construct (named in CAPITAL
letters) that is made up of observed variables that are in the data set.

For example:

sem (DEPRESS -> x1 x2 x3)

or

sem (x1 x2 x3 <- MONKEY)

Please note that the order between the latent construct and the observed variables does not matter,
and it does not matter what you name the latent construct. However, you do need to be careful to
have the arrow, which is simply made up of a hyphen (-) and a greater-than or less-than sign (>
or <), pointed in the right direction: away from the latent construct.

Another basic command is the cov option. This allows you to specify that the error terms for
particular observed variables be allowed to correlate.

For example:

sem (DEPRESS -> x1 x2 x3), cov (e.x1*e.x2 e.x1*e.x3)

or

sem (DEPRESS -> x1 x2 x3), cov (e.x1*e.x2) cov (e.x1*e.x3)

Again, the syntax here is relatively flexible, as you have a variety of options for how you
specify that the error terms of certain observed variables be allowed to correlate.

These are the basic, basic concepts of Stata syntax for CFA/SEM. With these, we easily have
enough information to look at an example and in doing so to become familiar with some
more CFA/SEM syntax.

One Factor CFA

The data for this example (indeed, for all the examples in this session) come from the National
Survey on Drug Use and Health (NSDUH), a really interesting, nationally representative,
cross-sectional substance use survey of individuals ages 12 and up. This particular
data set is a subset of the NSDUH comprised of truant adolescent males between the ages of 12
and 17 years old (M age = 15.3, SD = 0.057). As these data are coded, roughly 3/5 are White
Non-Hispanic (57.71%) while the remaining 2/5 are youth of color (African-American/Hispanic)
(42.29%).

Let's open up the data set and first use the "set more off" command so that Stata does not drive us
crazy by requiring us to click more every 5 seconds.

use "L:\TranCFA_red.dta"
set more off

We can start with an example of confirmatory factor analysis using five variables related to
adolescent academic involvement.

s_felt     Felt about going to school
s_work     Frequency of school feeling meaningful
s_imp      How important things learned will be in future
s_int      Courses interesting
s_job      Teacher let you know you did a good job

Typically, when I'm going to run a CFA, I begin with a quick reliability analysis and exploratory
factor analysis. We can do these really easily using Stata.

Reliability Analysis

If you are not familiar with the reliability analysis syntax using Stata, it is simple:

alpha s_felt s_work s_imp s_int s_job, item

This gives us the following output:

Test scale = mean(unstandardized items)

                                                       average
                         item-test    item-rest    interitem
Item        Obs   Sign   correlation  correlation  covariance    alpha

s_felt      726    -        0.7012       0.5393     .3249279    0.7433
s_work      725    +        0.7566       0.6008     .3123865    0.7233
s_imp       725    +        0.7146       0.5275     .3271112    0.7475
s_int       725    +        0.8168       0.6799     .2771369    0.6936
s_job       725    +        0.6330       0.4270     .368387     0.7786

Test scale                                           .3219899    0.7795

Looks like for the most part the items of the scale hang together pretty well. The Cronbach's
alpha coefficient of .7795 is clearly acceptable. It is interesting to note that both the item-test
correlation for s_job of .6330 and the item-rest correlation for s_job of .4270 are quite a bit lower
than the corresponding values for the other items. This does not seem to affect the overall
reliability of the scale and is just something to keep an eye on as we proceed.

Note: if you want to do an internal consistency analysis for a specified group, then you can
specify this too. For example, you could use the following syntax:

alpha s_felt s_work s_imp s_int s_job if racecat == 1, item
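
If you wanted the same analysis for each race category in one pass, a quick loop is one option.
This is just a sketch; it assumes racecat is coded 1 and 2 (as it is later in this handout):

* run the reliability analysis separately for each race category
forvalues g = 1/2 {
    display as text "racecat == `g'"
    alpha s_felt s_work s_imp s_int s_job if racecat == `g', item
}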

Exploratory Factor Analysis

If you are not familiar with the syntax for EFA using Stata, it is also relatively straightforward:

factor s_felt s_work s_imp s_int s_job, blanks (.45)

The blanks option is one that I like to use. It allows you to specify that factor loadings below a
specified value (in this case, 0.45) should not be displayed, so as to make the factor loading
output easier to interpret. In this example, it doesn't really matter, but it is a good option to know
about.

Also, be sure to note that you can specify principal-component factor analysis (the pcf option)
rather than the default of principal factor analysis. The selection of each of these depends upon
what kind of analysis you are running and what kind of constructs you are looking at.

Factor analysis/correlation                     Number of obs    =    725
    Method: principal factors                   Retained factors =      2
    Rotation: (unrotated)                       Number of params =      9

    Factor     Eigenvalue   Difference   Proportion   Cumulative

    Factor1       2.03562      1.98954       1.1944       1.1944
    Factor2       0.04608      0.08445       0.0270       1.2214
    Factor3      -0.03837      0.09426      -0.0225       1.1989
    Factor4      -0.13262      0.07373      -0.0778       1.1211
    Factor5      -0.20635            .      -0.1211       1.0000

LR test: independent vs. saturated: chi2(10) = 950.30 Prob>chi2 = 0.0000

Factor loadings (pattern matrix) and unique variances

    Variable    Factor1   Factor2   Uniqueness

    s_felt       0.6240               0.6017
    s_work       0.6779               0.5324
    s_imp        0.6113               0.6076
    s_int        0.7581               0.4219
    s_job        0.4880               0.7546

(blanks represent abs(loading)<.45)

The results of the exploratory factor analysis indicate that all five of these items load nicely onto
a single latent factor. Again, however, please note that the factor loading for s_job is quite a bit
lower than the other four. This makes sense, as this variable is slightly different from the other four
variables in the sense that it is in reference to a teacher's support rather than the student's
perception of the importance of school, etc.

Note: Stata displays the unrotated solution by default. If you rotate the solution afterwards
(with the rotate command), replaying factor shows the rotated loadings, and you can get the
unrotated solution back using the norotated option. This option is available for replay only.
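
For example, a minimal sketch of that replay workflow (run after the factor command above;
rotate defaults to an orthogonal varimax rotation):

* rotate the retained factors, then redisplay the unrotated loadings
rotate
factor, norotated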

At any rate, let's proceed to the CFA with all five of these variables.

One Factor CFA

Just so we can visualize it, this is the path diagram for the proposed model:

[Path diagram: one latent factor, Academic Involvement, with arrows to the five indicators
Felt About School, School Meaningful, Important for Future, Courses Interesting, and
Teacher: Good Job]

Now that we have our proposed confirmatory factor model, let's write up the syntax and see what
we get. I am going to use the stand option so that we get standardized coefficients:

sem (INVOLVE -> s_felt s_work s_imp s_int s_job), stand

Endogenous variables

Measurement: s_felt s_work s_imp s_int s_job

Exogenous variables

Latent: INVOLVE

Fitting target model:

Iteration 0:   log likelihood = -4225.2004
Iteration 1:   log likelihood = -4224.09
Iteration 2:   log likelihood = -4224.0826
Iteration 3:   log likelihood = -4224.0826

Structural equation model                       Number of obs = 725
Estimation method = ml
Log likelihood    = -4224.0826

( 1) [s_felt]INVOLVE = 1

This first part of the output tells us a few things. First, it identifies our five observed variables as
measurement variables that contribute to the identification of the latent variable we have
baptized INVOLVE. It also shows us how many maximum likelihood iterations it took
for Stata to fit the target model (3 is good! After all, this is a relatively simple model). It
also tells us our number of observations (725), the log likelihood value for the model, and that the
factor loading for s_felt was fixed to 1.0.

Next, we can look at the rest of the output:

OIM
Standardized Coef. Std. Err. z P>|z| [95% Conf. Interval]

Measurement
s_felt <-
INVOLVE .6419818 .0271475 23.65 0.000 .5887737 .6951898
_cons 3.101211 .0895102 34.65 0.000 2.925775 3.276648

s_work <-
INVOLVE .6795142 .0269338 25.23 0.000 .6267248 .7323036
_cons 3.335998 .0951546 35.06 0.000 3.149498 3.522498

s_imp <-
INVOLVE .6139033 .0288177 21.30 0.000 .5574217 .6703849
_cons 3.497442 .0990719 35.30 0.000 3.303265 3.69162

s_int <-
INVOLVE .8010629 .0225112 35.59 0.000 .7569417 .845184
_cons 3.039006 .0880265 34.52 0.000 2.866477 3.211534

s_job <-
INVOLVE .4942271 .0326628 15.13 0.000 .4302092 .558245
_cons 3.44226 .0977299 35.22 0.000 3.250713 3.633808

Variance
e.s_felt .5878594 .0348563 .5233625 .6603046
e.s_work .5382605 .0366039 .4710938 .6150035
e.s_imp .6231228 .0353825 .557494 .6964774
e.s_int .3582982 .0360658 .2941467 .4364408
e.s_job .7557396 .0322857 .6950376 .821743
INVOLVE 1 . . .

LR test of model vs. saturated: chi2(5) = 35.30, Prob > chi2 = 0.0000

This output gives us standardized factor loading values for each of the five observed variables as
well as their standard error, significance, and confidence intervals. For example, the
standardized factor loading for s_felt onto the latent construct INVOLVE was 0.64 with a
standard error of 0.027. It was significant at p < .001 and had a 95% confidence interval that
ranged from 0.59 to 0.69. All this looks good.

The output also provides us with the chi-square value of 35.30, the degrees of freedom of 5, and
the significance of the chi-square test (i.e. p < .001). These preliminary goodness of fit statistics
suggest that the model may not fit the data all that well. However, Stata can provide us with
more information.

If you use the following syntax after running your CFA model, you will get additional goodness
of fit statistics:

estat gof, stats(all)

Fit statistic Value Description

Likelihood ratio
chi2_ms(5) 35.295 model vs. saturated
p > chi2 0.000
chi2_bs(10) 953.586 baseline vs. saturated
p > chi2 0.000

Population error
RMSEA 0.091 Root mean squared error of approximation
90% CI, lower bound 0.064
upper bound 0.121
pclose 0.007 Probability RMSEA <= 0.05

Information criteria
AIC 8478.165 Akaike's information criterion
BIC 8546.958 Bayesian information criterion

Baseline comparison
CFI 0.968 Comparative fit index
TLI 0.936 Tucker-Lewis index

Size of residuals
SRMR 0.028 Standardized root mean squared residual
CD 0.811 Coefficient of determination

This provides us with some of the goodness of fit statistics we are familiar with using LISREL.
For instance, we can see that the RMSEA value is 0.091, that the CFI value is 0.968 and that the
TLI value is 0.936. The CD value of 0.811 provides information similar to the R-squared value
you get using OLS and other forms of regression. In all, these goodness of fit statistics suggest
that the fit of the model to the data is only moderate and that it may benefit from modification.

At this point, it would be helpful to examine the modification indices and see whether, purely in
an empirical sense, any additional paths could be specified that might improve model fit. Is
anyone else as excited as I am? Please don't raise your hand.

To get the modification indices we type the following after having run the CFA model:

estat mindices

Modification indices

                  MI     df   P>MI         EPC    Standard EPC

Covariance
e.s_felt
e.s_imp 13.420 1 0.00 -.0848231 -.174979
e.s_int 11.096 1 0.00 .0878639 .2376938

e.s_work
e.s_imp 27.714 1 0.00 .1175849 .2650019
e.s_int 19.436 1 0.00 -.1170219 -.3458601

e.s_int
e.s_job 5.050 1 0.02 .0517828 .1280503

EPC = expected parameter change

These modification indices give us some important information about omitted paths in the fitted
model. Two particularly salient points stand out. First, the pairs listed in the Covariance section
identify potential paths to add. Second, the numbers in the MI column represent the approximate
decrease in the chi-squared value that would result if the suggested path were added. For instance, if
a covariance were added between e.s_imp and e.s_work, the chi-squared value would decrease by
roughly 27.714.

Let's just say that we have a theoretical or conceptual reason to add a path between these error
terms (i.e. perceived feelings of the importance and meaningfulness of school are conceptually
akin and thus it is conceptually plausible that the error terms might be correlated, etc.). In such a
case, we would add this path using the following syntax:

sem (INVOLVE -> s_felt s_work s_imp s_int s_job), cov (e.s_imp*e.s_work) stand

The output for this syntax is omitted; however, if you run this model you will see that the new
chi-square value of 8.88 for the modified model is approximately 27 units lower than the initial
chi-squared value for the proposed model. If you examine the goodness of fit statistics again, you
will also see a difference between the proposed and modified models.

estat gof, stats(all)
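
If you want a formal test of whether adding the error covariance improves fit (rather than just
eyeballing the two sets of fit statistics), a likelihood-ratio test between the two nested models is
one option. A minimal sketch, where the stored-estimate names m_proposed and m_modified are
just illustrative labels:

* fit and store the proposed model, then the modified model, then compare them
sem (INVOLVE -> s_felt s_work s_imp s_int s_job)
estimates store m_proposed
sem (INVOLVE -> s_felt s_work s_imp s_int s_job), cov(e.s_imp*e.s_work)
estimates store m_modified
lrtest m_proposed m_modified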

Because I have nothing better to do on a Sunday afternoon (eek!), I created a table to show the
difference between the goodness of fit statistics for the proposed model (i.e. the model without
the path from e.s_imp to e.s_work) and the modified model.

Academic Involvement
Fit Indexes Proposed Model Modified Model
X2 (df) 35.295 (5)*** 8.876 (4)
RMSEA 0.091 0.041
CFI 0.968 0.995
TLI 0.936 0.987
CD 0.811 0.812
Note: p<.10, *p <.05, **p < .01, ***p < .001.
Coefficients in bold are statistically significant at p < .05 or lower.

As you can see, the goodness of fit of the modified model is superior to that of the proposed
model. Pretty simple, no?

Here's what the path diagram looks like with the standardized values and error terms. You have
to make this yourself using Microsoft Word or some other program; Stata won't do it for you.

[Path diagram of the modified model: Academic Involvement with standardized loadings of
.65*** (Felt About School), .63*** (School Meaningful), .56*** (Important for Future),
.84*** (Courses Interesting), and .49*** (Teacher: Good Job); error terms e1 through e5, with
a .23*** covariance between the errors for School Meaningful and Important for Future]

Two Factor CFA

So, the first example looked at a confirmatory factor analysis for a single-factor latent construct.
Using the same basic syntax, we can do a very similar procedure for a two-factor latent construct.
This is a pretty simple procedure.

Let's say we're interested in a latent construct of "school involvement" that is comprised of two
latent factors: Academic Engagement (made up of the five observed variables that we examined
above) AND Parental Involvement. In this case, parental involvement is made up of the
following two variables:

p_checkwrk     Frequency of parents checking homework
p_homework     Frequency of parents helping with homework

Let's start by looking at a proposed path diagram that can help us to conceptualize what this two-
factor model looks and feels like.

[Path diagram: Academic Engagement with indicators Felt About School, School Meaningful,
Important for Future, Courses Interesting, and Teacher: Good Job; Parental Involvement with
indicators Check Homework and Help w/ Homework]

The syntax for this model is very similar to that of a one-factor confirmatory factor analysis. The
only difference is that we add some additional syntax to specify that the observed variables
hypothesized as loading onto the latent parental involvement variable should be included as well.
Here it is:

sem (INVOLVE -> s_felt s_work s_imp s_int s_job) (PARENT -> p_checkwrk p_homework),
stand

Just looking at the measurement component of the output, we can see that the observed variables
load reasonably well onto their corresponding latent constructs.

OIM
Standardized Coef. Std. Err. z P>|z| [95% Conf. Interval]

Measurement
s_felt <-
INVOLVE .639671 .027285 23.44 0.000 .5861934 .6931486
_cons 3.099758 .0897856 34.52 0.000 2.923781 3.275734

s_work <-
INVOLVE .6813196 .0266317 25.58 0.000 .6291224 .7335168
_cons 3.329927 .0953372 34.93 0.000 3.143069 3.516784

s_imp <-
INVOLVE .62111 .0285262 21.77 0.000 .5651997 .6770203
_cons 3.505355 .0996087 35.19 0.000 3.310126 3.700585

s_int <-
INVOLVE .796785 .0223727 35.61 0.000 .7529354 .8406347
_cons 3.027399 .0880544 34.38 0.000 2.854816 3.199983

s_job <-
INVOLVE .4991685 .0326156 15.30 0.000 .4352431 .5630939
_cons 3.433842 .0978635 35.09 0.000 3.242033 3.625651

p_checkwrk <-
PARENT .8637905 .1054045 8.20 0.000 .6572015 1.07038
_cons 3.015601 .0877728 34.36 0.000 2.843569 3.187632

p_homework <-
PARENT .573103 .0740109 7.74 0.000 .4280443 .7181618
_cons 2.744652 .0813647 33.73 0.000 2.58518 2.904124

Of course, as we did with the single factor model, we could also look at the goodness of fit
statistics using the following command:

estat gof, stats(all)

And look at any modification indices using the following command:

estat mindices

Additionally, were you concerned about missing values, you could also run the model using
maximum likelihood with missing values (the method(mlmv) option), an approach that is
conceptually akin to, but methodologically simpler than, multiple imputation. This can be used
when values are either missing at random or missing completely at random:

sem (INVOLVE -> s_felt s_work s_imp s_int s_job) (PARENT -> p_checkwrk p_homework),
method(mlmv) stand

Structural Equation Modeling

Having become familiar with the basics of the syntax for confirmatory factor analysis using
Stata, it isn't a big jump to move towards structural equation modeling. Really, it just involves,
for the most part, one additional step in terms of syntax.

As we did above, we can use an example to help elucidate what structural equation modeling
might look like in Stata. Let's begin with a conceptual path diagram that can help us make sense
of what we're talking about.

[Path diagram: Religious Salience -> Academic Involvement -> Substance Use]

This model takes our initial 5-item, single factor measure of academic involvement and examines
it as a mediator of the relationship between religious salience and substance use.

Religious Salience (Alpha = 0.84) is comprised of the following two variables:

r_import     The importance of religious beliefs
r_dec        Religious beliefs influence my decisions

Substance Use (Alpha = 0.78) is comprised of three dichotomous measures of substance use
over the previous 12 months:

tobacco     Tobacco use
alcohol     Alcohol use
marij       Marijuana use

Having operationalized our latent constructs and, presumably, having run a reliability analysis
and a confirmatory factor analysis on each construct, we can move towards structural equation
modeling. The syntax involves one simple extension of CFA: rather than simply specifying the
measurement models, we also specify the ways in which the latent constructs should be related
(i.e. we specify the structural model).

Here is what it looks like for this example:

sem (RELIG -> r_import r_dec) (INVOLVE -> s_felt s_work s_imp s_int s_job) (SUB ->
tobacco marij alcohol) (RELIG -> INVOLVE) (INVOLVE->SUB), stand

As you can see below, this gives us output for both a "structural" and "measurement" component.

OIM
Standardized Coef. Std. Err. z P>|z| [95% Conf. Interval]

Structural
INVOLVE <-
RELIG .3445945 .0417522 8.25 0.000 .2627616 .4264274

SUB <-
INVOLVE -.2447217 .0445189 -5.50 0.000 -.3319772 -.1574662

Measurement
r_import <-
RELIG .7678148 .0433559 17.71 0.000 .6828388 .8527907
_cons 2.658976 .0800908 33.20 0.000 2.502001 2.815951

r_dec <-
RELIG .9419582 .0498841 18.88 0.000 .8441872 1.039729
_cons 2.512768 .0766797 32.77 0.000 2.362479 2.663058

s_felt <-
INVOLVE .6442999 .0272316 23.66 0.000 .5909269 .697673
_cons 3.101709 .0906544 34.21 0.000 2.92403 3.279388

s_work <-
INVOLVE .6878422 .0263092 26.14 0.000 .6362771 .7394072
_cons 3.327945 .0961611 34.61 0.000 3.139472 3.516417

s_imp <-
INVOLVE .6185892 .0285719 21.65 0.000 .5625892 .6745892
_cons 3.501077 .1004148 34.87 0.000 3.304268 3.697886

s_int <-
INVOLVE .7920593 .0222125 35.66 0.000 .7485235 .8355951
_cons 3.036869 .0890885 34.09 0.000 2.862259 3.21148

s_job <-
INVOLVE .4986583 .0327875 15.21 0.000 .4343959 .5629207
_cons 3.415893 .098318 34.74 0.000 3.223193 3.608593

tobacco <-
SUB .8010398 .0254266 31.50 0.000 .7512046 .8508751
_cons .8097763 .0433379 18.69 0.000 .7248356 .8947171

marij <-
SUB .788174 .0255671 30.83 0.000 .7380634 .8382846
_cons .6921557 .0418717 16.53 0.000 .6100887 .7742227

alcohol <-
SUB .6185876 .0289005 21.40 0.000 .5619438 .6752315
_cons 1.009951 .0462146 21.85 0.000 .9193716 1.100529

Here's what the structural component would look like in a path diagram:

[Path diagram: Religious Salience -> (.34***) Academic Involvement -> (-.24***) Substance Use]

If you were in the mood to be interpretive, you might say:

Academic involvement serves to mediate the relationship between religious salience and
substance use. More precisely, religious salience is associated with increased academic
involvement (β = 0.34, p < .001), which is, in turn, associated with lower levels of substance use
(β = -0.24, p < .001).

Of course, just as we did with the CFA models, you could also look at the goodness of fit
statistics:

estat gof, stats(all)

And you could look at any modification indices using the following command:

estat mindices

Another function that can be used with SEM models is the estat teffects function. This allows
us to calculate the direct, indirect, and total effects and to calculate the significance of each. The
following syntax can be used after estimating your SEM models.

estat teffects, stand compact nodirect

This gives us a great deal of output, but it is often helpful to simply focus in on the indirect and
total effects for the structural models. Each of these is below.

INDIRECT EFFECTS

Structural
INVOLVE <-

SUB <-
RELIG -.0424611 .0098129 -4.33 0.000 -.0843298

In this output, the unstandardized coefficient is the first number in the row, and the standardized
coefficient is the number on the far right. It took me a while to figure this out on my own, so it is
worth pointing out.

Looking at the standardized indirect effect of religious salience on substance use, we can see that
the indirect effect of religious salience on substance use is significant (β = -0.084, p < .001). To
be sure, this is the same number you get when you multiply the standardized coefficient from
religious salience to academic involvement by the standardized coefficient from academic
involvement to substance use (i.e. 0.3445 * -0.2447 = -0.084). Snazzy.
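
If you want to reproduce that product (and a delta-method standard error for it) yourself, nlcom
is one option after the sem command. This is just a sketch; it returns the unstandardized indirect
effect, and the coefficient names follow the depvar:indepvar pattern (you can confirm them by
replaying the model with sem, coeflegend):

* unstandardized indirect effect of RELIG on SUB, with a delta-method standard error
nlcom _b[INVOLVE:RELIG]*_b[SUB:INVOLVE]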

TOTAL EFFECTS

Structural
INVOLVE <-
RELIG .254523 .0346072 7.35 0.000 .3445945

SUB <-
INVOLVE -.166826 .0325338 -5.13 0.000 -.2447217
RELIG -.0424611 .0098129 -4.33 0.000 -.0843298

You can also look at the total effects, and their significance, of religious salience and academic
involvement on substance use. Note that, given that religious salience has no direct effect on
substance use (at least in this simplified model), its indirect effect is the same as its total effect.
Also, since academic involvement is directly associated with substance use and there is no
indirect path between these two variables, the direct effect is the same as the total effect.

On an unrelated point, it should be noted that, if you are missing the traditional SEM notation
that you get using LISREL, you can always type estat framework after you run your SEM
models and Stata will give you all of the old output that you miss so dearly.

Finally, if you have any questions about this whatsoever, I would direct you to the resources
listed below. The secret to getting this stuff is locking oneself in a dark chamber, drinking
copious amounts of whatever does the trick for you, and running model after model while
consulting textbooks, Google, peers, instructors, and scholarly manuscripts.

1. the Statalist (www.stata.com/statalist). This is your resource for all things dweeby.

2. type help sem directly into the Command window in Stata.

3. the Stata manual that I sent you all in the email.

Testing CFA Model Invariance

Finally, CFA or SEM model invariance.

Please note: looking at model invariance using Stata is possible; however, the syntax used in
doing so is substantially more complicated and, perhaps due to the newness of the software, it
still appears to be a little wonky. That said, if you want a full tutorial on how to check
measurement invariance using Stata 12, then I suggest you consult the following website:

http://www.ats.ucla.edu/stat/stata/faq/invariance.htm

Also, there is some pertinent information in the Stata SEM manual that I sent to you guys.

Be advised, however, that when I have attempted to run some of these models using the NSDUH
data and other data, I have had difficulty in terms of model convergence. That is to say, I have
had difficulty getting my models to run and to actually produce interpretable output. Stata is still
a little wonky in terms of some of the more complex CFA/SEM procedures.

Nevertheless, there are some relatively simple and interesting commands that can be used to
examine factor structure, factor loading, and error measurement equivalence.

Factor structure, loading, and error measurement equivalence

Returning to our example of academic involvement, we can examine the factor structure, factor
loading, and factor error measurement values for two comparable populations. In this case, we
can use the variable racecat to compare the measurement equivalence of academic
involvement for Caucasian adolescents (57.71%) and for Minority Adolescents (for our
purposes, African-American and Hispanic youth only) (42.29%).

We start by running confirmatory factor models for each of the subgroups using the following
syntax:

sem (INVOLVE -> s_felt s_work s_imp s_int s_job) if racecat ==1
estat gof, stats(all)

sem (INVOLVE -> s_felt s_work s_imp s_int s_job) if racecat ==2
estat gof, stats(all)

Running these two models and using the estat gof, stats(all) command to examine the
goodness of fit statistics yields the following information:

                              Caucasian Youth   Minority Youth
                              (N = 419)         (N = 307)        Difference
Measurement Model
Academic Involvement
  Felt about school               1.00             1.00              --
    error                         0.49             0.43             0.06
  School felt meaningful          1.03             0.98             0.05
    error                         0.43             0.34             0.09
  Importance of learning          0.94             1.02            -0.08
    error                         0.53             0.48             0.05
  School is interesting           1.37             1.22             0.15
    error                         0.26             0.32            -0.06
  Teacher: "good job"             0.77             0.74             0.03
    error                         0.59             0.51             0.08

Fit Indexes
  X2 (df)                     29.74 (5)***      13.21 (5)*         16.53
  RMSEA                           0.109            0.073            0.036
  CFI                             0.953            0.977           -0.024
  TLI                             0.901            0.954           -0.053
  R2                              0.816            0.789            0.027
Note: p < .10, *p < .05, **p < .01, ***p < .001.

Looking at the models as presented in this table, it seems that the model fit is slightly better for
minority youth than it is for Caucasian youth. Also, differences can be noted in terms of some of
the coefficients for the observed variables and error terms. However, simply visually examining
these differences or calculating differences is probably insufficient. Luckily, Stata provides a
helpful command in terms of examining the parameter differences (i.e. factor loadings and error
terms).

If we use the group option with the confirmatory factor model syntax, then Stata will constrain
certain parameters (by default, the measurement coefficients and intercepts) to be equal across
the groups of the specified variable. Here is the syntax:

sem (INVOLVE -> s_felt s_work s_imp s_int s_job), group (racecat)

This yields coefficients that are slightly different from the coefficients for either of the subgroups
listed above. Notably, the coefficients are also different from those yielded by a standard
confirmatory factor analysis of the data without the specification of any grouping variable (i.e.
sem (INVOLVE -> s_felt s_work s_imp s_int s_job)).

The output for the coefficients for the group specified syntax is below:

OIM
Coef. Std. Err. z P>|z| [95% Conf. Interval]

Measurement
s_felt <-
INVOLVE
[*] 1 (constrained)
_cons
[*] 2.645086 .0397695 66.51 0.000 2.567139 2.723032

s_work <-
INVOLVE
[*] 1.009841 .0705383 14.32 0.000 .8715886 1.148094
_cons
[*] 2.723091 .0386825 70.40 0.000 2.647275 2.798907

s_imp <-
INVOLVE
[*] .953468 .0738705 12.91 0.000 .8086844 1.098252
_cons
[*] 3.034508 .0393521 77.11 0.000 2.957379 3.111637

s_int <-
INVOLVE
[*] 1.239082 .077246 16.04 0.000 1.087683 1.390482
_cons
[*] 2.605361 .0422193 61.71 0.000 2.522612 2.688109

s_job <-
INVOLVE
[*] .7317976 .0658925 11.11 0.000 .6026506 .8609446
_cons
[*] 2.868235 .0359589 79.76 0.000 2.797757 2.938713

Besides the asterisks that specify that parameter estimates were constrained to be equal across
the Caucasian adolescent and Minority adolescent subgroups, this looks much like any other
output for a CFA model.

However, the output below, specifically for the error variances, is slightly different from what
we typically see for a CFA model that is not constrained to be equal across groups. Note that the
error variance is provided for both Group 1 (i.e. Caucasian adolescents) and Group 2 (i.e.
minority adolescents). Also note that, while not identical, these error variance terms are quite
similar to the error variance terms identified by running each of the models separately, as
presented in the table above.

Mean
INVOLVE
1 0 (constrained)
2 .2807259 .0490092 5.73 0.000 .1846696 .3767822

Variance
e.s_felt
1 .4874226 .0399215 .415135 .5722975
2 .4184243 .0408301 .3455861 .5066145
e.s_work
1 .4179876 .0366378 .3520084 .4963336
2 .3346605 .0350795 .2725088 .4109872
e.s_imp
1 .5167937 .0417211 .4411632 .60539
2 .4911096 .0457873 .40909 .5895735
e.s_int
1 .2925839 .0356177 .2304779 .3714254
2 .3239955 .0394574 .2551976 .4113405
e.s_job
1 .5897704 .0437315 .5099954 .6820239
2 .5143119 .0447219 .433721 .6098775
INVOLVE
1 .3382274 .0426803 .2641175 .4331322
2 .2850644 .0393423 .2175039 .3736103

Note: [*] identifies parameter estimates constrained to be equal across groups.

LR test of model vs. saturated: chi2(18) = 53.97, Prob > chi2 = 0.0000

At this point, Stata offers two simple commands that can help us to generate additional
information. The first of these provides group-level goodness of fit statistics. The syntax goes
as such:

estat ggof

Group-level fit statistics

N SRMR CD

racecat
1 419 0.039 0.808
2 306 0.038 0.788

Note: group-level chi-squared statistics are not reported because of constraints between groups.

Please note that these group-level fit statistics are rather limited. There is no RMSEA, no CFI,
no TLI, and no chi-squared test. However, the two fit statistics that are provided are easily
interpretable. For the SRMR, a value near 0 represents good fit. For the CD, a value near 1
represents good fit. As you can see, these fit statistics are rather lackluster, but Stata provides
this option and you should be aware of it nonetheless.

You can also use the estat gof, stats(residuals) command to get the corresponding overall results.
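
For example, run after the grouped model above:

estat gof, stats(residuals)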

There is another command, arguably more useful, that allows for testing parameter equality
across groups. As with the previous command, this command should be used after you
run your CFA (or SEM!) model. The syntax goes as such:

estat ginvariant

Here is the output:

Tests for group invariance of parameters

Wald Test Score Test


chi2 df p>chi2 chi2 df p>chi2

Measurement
s_felt <-
INVOLVE . . . 1.558 1 0.2120
_cons . . . 4.704 1 0.0301

s_work <-
INVOLVE . . . 0.028 1 0.8674
_cons . . . 1.661 1 0.1974

s_imp <-
INVOLVE . . . 0.876 1 0.3494
_cons . . . 0.032 1 0.8577

s_int <-
INVOLVE . . . 3.409 1 0.0649
_cons . . . 5.909 1 0.0151

s_job <-
INVOLVE . . . 0.021 1 0.8857
_cons . . . 0.284 1 0.5941

Variance
e.s_felt 1.517 1 0.2181 . . .
e.s_work 2.937 1 0.0866 . . .
e.s_imp 0.179 1 0.6720 . . .
e.s_int 0.401 1 0.5267 . . .
e.s_job 1.478 1 0.2240 . . .
INVOLVE 1.612 1 0.2042 . . .

The score tests are reported for parameters that were constrained (i.e. the coefficients and
intercepts for the observed variables), while the Wald tests are reported for parameters that were
not constrained (in this case, the error variances and the variance of INVOLVE). More precisely,
the null hypothesis is that the constraints imposed (that is, that the parameter estimates are equal
across the Caucasian adolescent and minority adolescent subgroups) are valid. A significant test
rejects the hypothesis that the corresponding constraint is valid.

Looking at the results of these tests for the factor loadings and error variances, we see no
significant tests. If you look closely, however, you can see that the relatively large difference
between the groups' factor loadings for "school is interesting" corresponds with the largest chi2
value, which has a p value of 0.06 (i.e. nearly significant). The larger the difference, the more
likely the chi2 test will be significant.

                              Caucasian Youth   Minority Youth       Chi2
                              (N = 419)         (N = 307)         (p value)
Measurement Model
Academic Involvement
  Felt about school               1.00             1.00          1.558 (0.21)
    error                         0.49             0.43
  School felt meaningful          1.03             0.98          0.028 (0.87)
    error                         0.43             0.34
  Importance of learning          0.94             1.02          0.876 (0.35)
    error                         0.53             0.48
  School is interesting           1.37             1.22          3.409 (0.06)
    error                         0.26             0.32
  Teacher: "good job"             0.77             0.74          0.021 (0.89)
    error                         0.59             0.51

Fit Indexes
  X2 (df)                     29.74 (5)***      13.21 (5)*
  RMSEA                           0.109            0.073
  CFI                             0.953            0.977
  TLI                             0.901            0.954
  R2                              0.816            0.789

In all, the results of these tests imply that none of the constraints imposed should be relaxed. In
other words, these tests lend support to parameter invariance. As I mentioned above, there are
certainly more sophisticated tests (namely, those that correspond with Table 7.2 and Dr. Tran's
textbook); however, the simple tests described here are a start that can get you along your way.
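
For what it's worth, here is a minimal sketch of the kind of global comparison the UCLA page
walks through, assuming both models actually converge (which, as noted above, is not
guaranteed): fit the group model with no cross-group equality constraints, fit it again with the
default constraints, and compare the two with a likelihood-ratio test. The stored-estimate names
are just illustrative labels.

* group model with no equality constraints vs. the default constrained group model
sem (INVOLVE -> s_felt s_work s_imp s_int s_job), group(racecat) ginvariant(none)
estimates store m_free
sem (INVOLVE -> s_felt s_work s_imp s_int s_job), group(racecat)
estimates store m_constrained
lrtest m_constrained m_free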

For more information check out the sources listed above and throughout this document.

