The Measurement of School Engagement:
Assessing Dimensionality and Measurement Invariance
Across Race and Ethnicity

Jennifer L. Glanville
Tina Wildhagen
University of Iowa

Educational and Psychological Measurement, June 2007
DOI: 10.1177/0013164406299126
http://epm.sagepub.com, hosted at http://online.sagepub.com

The purposes of this study were to (a) assess the measurement of school engagement
in prior research that used the National Educational Longitudinal Study of 1988
(NELS:88), (b) systematically develop an improved measurement model for school
engagement, and (c) examine the measurement invariance of this model across racial
and ethnic groups. Results from confirmatory factor analyses indicated that school
engagement should be measured as a multidimensional concept. A higher order
measurement model in which behavioral and psychological engagement are second-order
latent variables that influence several subdimensions is consistent with the data.
Results from a series of multiple group analyses indicated that the proposed
measurement model exhibits measurement invariance for White, African American, Latino,
and Asian students. Therefore, it is appropriate to compare the effects of the
dimensions of engagement across these groups. The results demonstrate the advantages of
confirmatory factor analysis for enhancing the understanding and measurement of
school engagement.

Keywords: school engagement; measurement invariance; confirmatory factor analysis; NELS:88

The concept of school engagement, or the extent to which students are committed
to and participate in the curriculum and other school activities, plays a prominent
role in theories of educational achievement and attainment (Newman, Wehlage,
& Lamborn, 1992). Several studies have demonstrated that higher engagement is
associated with higher academic achievement and reduced likelihood of school
dropout (Finn, Pannozzo, & Voelkl, 1995; Ogbu, 2003; Smerdon, 1999). In addition,
previous research has demonstrated that engagement is amenable to influence through
school or classroom practice (Davis & Jordan, 1994; Finn & Voelkl, 1993; Lee &
Smith, 1993; Marks, 2000). This growing body of research leads policy makers and
scholars to suggest that efforts to increase school engagement could stem high
dropout rates in urban areas (National Research Council & Institute of Medicine, 2004)

Copyright 2007 by SAGE Publications.


Downloaded from epm.sagepub.com at The University of Iowa Libraries on January 12, 2016

and increase achievement among disadvantaged minority students (Connell, Spencer,
& Aber, 1994; Finn & Rock, 1997).
Given its association with better school outcomes, the concept of school
engagement could help both policy makers and scholars gain a more complete understanding
of the process of academic achievement. However, the variety of ways that
researchers conceptualize and operationalize engagement prevents a comprehensive
understanding of the degree to which engagement is important across a variety of outcomes
(Fredricks, Blumenfeld, & Paris, 2004). In particular, researchers differ on the
dimensionality of engagement, with some treating the concept as multidimensional (e.g.,
Finn, 1993; Finn & Voelkl, 1993) and others measuring it as a single dimension (e.g.,
Connell, Halpern-Felsher, Clifford, Crichlow, & Usinger, 1995; Lee & Smith, 1993;
Marks, 2000). Combining the different dimensions of engagement in one measure
makes it difficult to discern which aspects of engagement are the most important for
improving different school outcomes. For instance, one dimension of engagement
may be more important for preventing dropping out of school, whereas another
dimension may be more important for improving standardized test scores.
Although researchers have developed scales for measuring engagement, such as
the Rochester Assessment Package for Schools (Wellborn & Connell, 1987) and
the engagement subscale of the School Success Profile (Bowen & Richman, 1995),
they have reached no consensus on which scale best measures engagement
(Fredricks et al., 2004). Given the absence of a standard measure of engagement,
researchers tend to operationalize engagement differently, even when using the
same data set. For example, although numerous studies of school engagement use
the National Education Longitudinal Study of 1988 (NELS:88), most of them apply
different measurement strategies and use different variables to operationalize
engagement. No study to date has subjected these different measurement strategies to
confirmatory factor analysis (CFA) to test whether the assumptions inherent in them are
consistent with the data. In addition, in light of the attention given to engagement in
the area of understanding lower educational achievement for disadvantaged minority
students, researchers should examine whether the dimensions of engagement operate
similarly across racial and ethnic groups.
We have three main goals in this work. First, we use CFA to evaluate previous
measurement strategies that drew on NELS:88 data. Using NELS:88 allows us to

Authors’ Note: This research was supported by a Flexible Load Assignment from the University of
Iowa and a grant from the American Educational Research Association, which receives funds for its
AERA Grants Program from the National Science Foundation and the National Center for Education
Statistics of the Institute of Education Sciences (U.S. Department of Education) under NSF Grant REC-
0310268. Opinions reflect those of the authors and do not necessarily reflect those of the granting
agencies. We thank Robin Henson, Lisa Troyer, and two anonymous reviewers for their helpful
comments on an earlier version of this article and Kenneth Bollen for advice on the analyses. Please
address correspondence to Jennifer L. Glanville, Department of Sociology, W140 Seashore Hall, Univer-
sity of Iowa, Iowa City, IA 52242; e-mail: jennifer-glanville@uiowa.edu.


replicate and assess several former measurement strategies. Second, based on
previous theory and research, we propose and evaluate a second-order
multidimensional model of engagement. Third, we evaluate the extent to which this model
demonstrates measurement invariance across White, African American, Latino,
and Asian students. No previous study has examined whether these or similar
measures of engagement are invariant across race and ethnicity. At least partial
measurement invariance is a necessary precondition for assessing effects of latent
variables across groups (Byrne, Shavelson, & Muthén, 1989).

Defining School Engagement

School engagement refers to a student's behavioral and psychological involvement
in the school curriculum. Engagement encompasses a range of behaviors and
attitudes, with researchers and theorists applying different labels to these behaviors,
such as "participation," "identification," "attachment," "motivation," and
"membership." Terms such as "alienation" and "withdrawal" signal the converse of
engagement. Thus, engagement is a general concept that includes many specific
behaviors and attitudes.
Engagement scholars often distinguish between a behavioral, or participation,
component of engagement and a psychological, or emotional, component (Finn,
1989; Fredricks et al., 2004). The behavioral component refers to participation in
classroom and school activities. Participation encompasses both basic behaviors,
such as attendance, following school rules, and avoidance of disruptive behaviors,
and higher level behaviors, such as making an effort to learn. The psychological
component refers to either positive or negative affective responses to school, such
as boredom, interest, a sense of belonging, or identification with school. It
corresponds to a sense of belonging in the school and the sense that school is valuable,
"that [students] are discernibly part of the school environment and that school
constitutes an important part of their own experience" (Finn, 1989, p. 123). Finn
(1989) argued that the separation of these two components is necessary because
each may have different antecedents.
Given the generality of the concept of engagement, it is important to examine
whether an inclusive measurement model suggests that a general concept is driving
more specific attitudes and behaviors. The range of behaviors and attitudes that
engagement encompasses could make engagement a powerful concept for
understanding students' educational outcomes. If a wide variety of behaviors and
attitudes are driven by the same underlying psychological and behavioral commitments
to school, then efforts to improve these behaviors and attitudes among
students can be directed toward the source of those attitudes and behaviors rather
than treating each separately. Moreover, as Fredricks et al. (2004) argued, because the
dimensions of engagement are interrelated, researchers should study them
simultaneously rather than in isolation from each other.

Measuring Engagement: Limitations of Current Practice

Although the concept of engagement promises to improve our understanding of
the process of educational attainment, two common practices in measuring engagement
impede researchers' ability to realize this promise. First, whereas conceptual
definitions clearly recognize the multifaceted nature of engagement, operationalizations
do not always reflect this understanding. For example, items that tap behavioral
and psychological engagement are often combined in a single scale in
research that examines the determinants of engagement and/or its consequences
(e.g., Connell et al., 1995; Lee & Smith, 1993; Marks, 2000). However, if engagement
is a multidimensional construct, then combining its dimensions in a single
measure will result in an incomplete understanding of its antecedents and consequences
(Fredricks et al., 2004). A second potential drawback of common practice
in measuring engagement is a lack of attention to measurement error. Because each
of these items only imperfectly represents a student's level of engagement, analyses
using such an index are vulnerable to the negative consequences of measurement
error (Bollen, 1989).
It is also important to consider whether engagement can be measured similarly
for all groups of students. Oppositional culture theory argues that many African
American students perform worse in school than their White counterparts because
they are less likely to believe that school is important to their future and therefore
invest less effort in school (Ogbu, 2003). Thus, several scholars suggest that
increasing engagement among disadvantaged minority students is a promising way
to decrease educational disparities (Connell et al., 1994; Finn & Rock, 1997).
However, recent research has suggested that African American students are at least as
engaged in their schools as White students (Ainsworth-Darnell & Downey, 1998).
Moreover, if measures of engagement behave differently across racial groups and
these differences are not taken into account in the measurement strategy,
comparisons of levels of engagement or its effects across groups are invalid. Therefore, it is
important to establish that items measuring engagement operate similarly across
African American and White students before comparing levels of engagement
or assessing whether engagement is more or less important in predicting school
achievement for some groups.
In the sections that follow, we use CFA to examine the measurement of school
engagement in prior research. These analyses reveal poor fits of some of the
corresponding CFA models and substantial measurement error for many of the items used
in prior research. On the basis of these analyses and the conceptual underpinnings of
school engagement, we develop a model that exhibits good fit and accounts for
measurement error. We then examine the model's invariance across different racial
and ethnic groups. Before we present these analyses, we describe our data and
methodological strategy.

Data and Method


Data
We use data from NELS:88, a survey designed and collected by the National
Center for Education Statistics (NCES). NELS:88 surveyed a nationally
representative sample of eighth graders in 1988, with oversamples of minority students.
Follow-up surveys of the same respondents were conducted in 1990, 1992, 1994, and 2000.
We use data from the first follow-up survey (1990), when most students were in the
10th grade. We chose this wave because some of the items differ across the base year
and the follow-up waves, and much of the research on engagement examines its
causes and consequences in high school. (See NCES, 1994, for a more detailed
description of the research design.) The analyses are based on the 12,210 students (9,227
White; 986 African American; 1,224 Latino; and 773 Asian) for whom there was no
missing information on the items in the proposed measurement model.

Method
We use Mplus 3.12 to estimate all models. Because some of the measures are
categorical, the most commonly used estimator for structural equation models,
maximum likelihood, can yield inconsistent parameter estimates, biased standard errors,
and an incorrect χ² (Bollen, 1989). Therefore, we use robust weighted least squares
(labeled weighted least squares with a mean and variance adjusted test statistic
[WLSMV] in Mplus). The WLSMV estimator produces consistent parameter
estimates, unbiased standard errors, and a correct χ² test statistic when there are
categorical endogenous variables (B. O. Muthén & Satorra, 1995). Note that with WLSMV
the estimation of degrees of freedom is based on the empirical data rather than on
model specification. Accordingly, the degrees of freedom reported in the tables do
not reflect the standard calculation. (See L. K. Muthén & Muthén, 1998, for details
on the calculation.) Because we correct for the categorical nature of the data, the
analyses must be conducted with the raw data instead of the covariance matrix.
We use the first follow-up weight variable to weight the data because of the
oversampling of particular groups. In addition, because of the complex sampling
procedure of NELS:88, we also correct the standard errors for clustering within schools
in the single sample models. This correction is not available for the multiple group
analyses we use to evaluate measurement invariance across race and ethnicity.
In evaluating our models, we examine several goodness-of-fit statistics. In large
samples, the power of the χ² statistic to detect very minor deviations from a perfect
fit is high (Fan, Thompson, & Wang, 1999). Therefore we also report the root mean
square error of approximation (RMSEA; Steiger & Lind, 1980), the comparative fit
index (CFI; Bentler, 1990), and the nonnormed fit index (NNFI; Tucker & Lewis,
1973). Hu and Bentler (1999) suggest that an RMSEA at or below .06 indicates a
good fit. CFI and NNFI values greater than .90 suggest an acceptable model fit
(Bentler, 1990), though values of .95 or greater are preferred (Hu & Bentler, 1999).
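As a rough illustration of how these indices relate to the χ² statistic, the standard formulas from the cited sources can be sketched as follows. This is our own sketch, not code from the article (which relied on Mplus), and the baseline-model χ² values needed for CFI and NNFI are not reported in the article:

```python
import math

def rmsea(chi2_stat, df, n):
    # Steiger & Lind (1980): sqrt(max(chi2 - df, 0) / (df * (n - 1)))
    return math.sqrt(max(chi2_stat - df, 0.0) / (df * (n - 1)))

def cfi(chi2_stat, df, chi2_null, df_null):
    # Bentler (1990): noncentrality of the target model relative to the null model
    d_target = max(chi2_stat - df, 0.0)
    d_null = max(chi2_null - df_null, d_target)
    return 1.0 - d_target / d_null if d_null > 0 else 1.0

def nnfi(chi2_stat, df, chi2_null, df_null):
    # Tucker & Lewis (1973): compares chi2/df ratios of target and null models
    return (chi2_null / df_null - chi2_stat / df) / (chi2_null / df_null - 1.0)

# Model 1 values from Table 2 (chi2 = 1,659.85, df = 82, N = 12,210)
# reproduce the reported RMSEA of .040:
print(round(rmsea(1659.85, 82, 12210), 3))  # 0.04
```

Note that with WLSMV the reported degrees of freedom are themselves estimated, so these formulas only approximate what Mplus computes internally.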

Results From Previous Models

Before developing our measurement model, we examined the results for the
measurement strategies of previous research on engagement that used NELS:88.
Measures of engagement are not typically subjected to CFA. Instead, researchers
often create scales or subscales and report the alpha reliabilities of those scales.
Accordingly, little is known about the performance of particular indicators or how
the measurement strategies of these studies will fit when incorporated in a CFA
analysis. For example, a CFA model might suggest that the dimensionality is
incorrect. In addition, some indicators could have very low pattern coefficients on the
latent variables they are meant to measure.
Table 1 describes five previous studies that used NELS:88 to operationalize
engagement. Because researchers who have drawn from NELS:88 have focused on
different aspects of engagement, they have used a variety of items. All five of these
previous studies contain a careful discussion of their measurement choices, and
their approaches have a good deal of face validity. Although Fredricks et al.’s
(2004) exhaustive review of previous operationalizations found a strong tendency
to collapse the dimensions of engagement, we found that researchers who use
NELS:88 do not do so as often. Finn and Rock (1997), Finn and Voelkl (1993), and
Lee and Smith (1993) measure separate dimensions of engagement, whereas two
studies by Smerdon (1999, 2002) collapse dimensions.
We found that the previous measurement strategies that follow a unidimensional
measurement approach have poor fits. For example, the RMSEA for Smerdon’s
(2002) model is .178 and the CFI is .805. The strategies that distinguish among
dimensions have better fits than the unidimensional approaches. However, only Finn
and Voelkl’s (1993) model has an adequate fit based on meeting the cutoffs of
all three goodness-of-fit statistics we examined (NNFI = .948, CFI = .902, and
RMSEA = .041). In general, the examination of the goodness-of-fit statistics from
these previous operationalizations strongly suggests that school engagement should
be conceived and measured as a multidimensional concept.
We also examined the pattern coefficients as a way of evaluating whether each
indicator was an acceptable measure of the construct it was intended to measure.
Although all of the pattern coefficients in each model were statistically significant,
many of them were quite small. Squaring the standardized pattern coefficient yields
the squared multiple correlation, or the percentage of the indicator that is explained

Table 1
Descriptions of Previous Operationalizations of School Engagement Using NELS:88
Study    Number and Description of Dimensions    χ²    df    NNFI    CFI    RMSEA

Finn & Rock (1997) 9: Teacher reports of classroom behavior, 1,716.25∗ 63 .927 .872 .045
hard work, and attendance; student reports
of attendance, preparation, at-risk behavior,
homework, sports, and other extracurricular
activities
Finn & Voelkl (1993) 6: Teacher reports of classroom behavior 1,626.83∗ 72 .948 .902 .041
and attendance; student reports of attendance,
preparation, at-risk behavior, and relationships
with teachers
Lee & Smith (1993) 2: Academic engagement (items related to 901.33∗ 21 .902 .863 .051
preparation) and at-risk behavior (fighting, etc.)
Smerdon (1999) 1: Seven items related to attendance, preparation, 1,827.88∗ 10 .715 .687 .102
and times spent on homework
Smerdon (2002) 1: Nine items including items relating to 8,842.61∗ 17 .759 .805 .178
relationships with teachers, interest in classes,
importance of education, sense of belonging

Note: NNFI = nonnormed fit index; CFI = comparative fit index; RMSEA = root mean square error of approximation.

∗p < .001.


by the latent variable, which is often considered the reliability of the indicator
(Bollen, 1989). We found that several of the individual items' reliabilities were less
than .3, suggesting that measurement error accounts for more than 70% of the
variation in these items. For example, in Finn and Voelkl's (1993) model, the
attendance factor explains only 29% of the variation in frequency of missing school,
and the relationships with teachers factor explains only 12% of the variation in the
respondent's agreement that there is school spirit. Although the fit of the Lee and
Smith (1993) model is borderline, it is important to underscore a key observation
about this model. Namely, they include the number of hours the student spends on
homework outside of school. Less than 9% of the variation in this item was
explained by the factor. We suspect that there are a number of potential problems
with this item, and we describe these issues in the next section.
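The reliability figures cited here follow directly from squaring the standardized loadings. A minimal sketch (the .54 loading below is simply the value implied by a 29% reliability, used only for illustration):

```python
def item_reliability(std_loading):
    # Squared multiple correlation: the share of an indicator's variance
    # explained by its latent variable (Bollen, 1989).
    return std_loading ** 2

# A standardized loading of about .54 implies the factor explains roughly 29%
# of the item's variance, leaving about 71% to measurement error:
r = item_reliability(0.54)
print(round(r, 2), round(1 - r, 2))  # 0.29 0.71
```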
To summarize, prior studies incorporating measures of school engagement from
the NELS:88 data, although compelling in terms of face validity, reveal problematic
measurement error and, in several cases, poor model fit when CFA is conducted.
Next, we use CFA to develop a modified measurement model of school engagement,
while also attending to the conceptual underpinnings of this important concept.

Proposed Measurement Model

In deriving our measurement model, we relied on the previous studies,
particularly on the Finn and Voelkl model (1993), because its fit was better than the other
previous strategies, and it was one of the most comprehensive in terms of including
potential items and dimensions of engagement. However, we omitted some of their
measures for a combination of theoretical and empirical reasons. For example, we
omitted the item pertaining to whether there is school spirit. This and other items
we omitted had very small pattern coefficients in their model. We also added other
items that were not included in the Finn and Voelkl model but are often
incorporated in other researchers' measures of engagement. We suggest that with NELS:88
data, it is possible to measure seven aspects of school engagement. After examining
the fit of a first-order model, in which all seven dimensions of engagement were
allowed to intercorrelate, we modeled these seven dimensions as reflective
indicators of second-order psychological and behavioral dimensions of engagement. The
appendix reports the numerical label, wording, and descriptive statistics for each
item in our model.

Behavioral Engagement
Our model captures four dimensions of behavioral engagement. First, attendance
is measured by self-reported frequencies of class skipping, tardiness, and the
frequency of parents receiving a warning about the student's attendance (coded so that
higher values indicate better attendance). Second, at-risk behavior is measured by
how many times the student got into a fight during the first half of the school year,
how many times he or she got into trouble for not following rules, whether other
students see him or her as a troublemaker, and the frequency of parents receiving a
warning about the student’s behavior. Third, preparation is measured by how often
the student comes to class with homework completed, with a pen or pencil, and with
books. Fourth, teacher perceptions of student effort are measured by teachers'
evaluations of how often the student completes his or her homework, whether the
student usually works hard, and how often the student is attentive in class.

Psychological Engagement
Our model captures three aspects of psychological engagement. The first two
components pertain to the value of school and education to the student. First,
academic interest is measured with the student's agreement that his or her classes are
interesting and challenging, whether the student gets a feeling of satisfaction from
doing what he or she is supposed to do in class, and how often the student tries as
hard as he or she can. Second, extrinsic motivation is measured by how important
grades are to the respondent and how strongly the student agrees that he or she goes
to school because education is important to getting a job later in life. The third aspect
of psychological engagement, positive relationships with teachers, is measured by
whether the student feels put down by his or her teachers, whether teachers listen to
the student, whether teachers praise the student’s effort, and whether the student goes
to school because teachers care about the student and expect him or her to do well.

Time Spent on Homework


The amount of time the student spends on homework is an important aspect of
school preparation and effort. However, there are several potential problems with
including time spent on homework as an indicator of preparation. One issue is that
many factors other than engagement could be driving time spent on homework. For
example, the amount of homework a teacher assigns a student will influence how
much time the student will spend on homework, and the amount of homework
assigned is likely to vary substantially across schools and tracks. Indeed, in
auxiliary analyses we found that more than 20% of the variance in hours spent on
homework was explained by track placement and the average amount of time that other
students in the school spend on homework. Another potential problem with this
item is that two students who are equally engaged, but who have different
aptitudes, will likely spend different amounts of time on homework. In preliminary
analyses, we found that when we allowed hours spent on homework to load on the
preparation latent variable, the measure had a small pattern coefficient, a finding
that is consistent with the limitations of the measure described above. Given that


Table 2
Goodness-of-Fit Statistics for First- and Second-Order Single Sample Models
Model    χ²    df    NNFI    CFI    RMSEA

1: First-order 1,659.85∗ 82 .956 .913 .040
2: Second-order, one second-order factor 3,203.04∗ 87 .917 .829 .054
3: Second-order, two second-order factors 1,891.13∗ 86 .952 .901 .041
4: Second-order, two second-order factors, alternative^a 1,732.48∗ 86 .956 .909 .040

Note: NNFI = nonnormed fit index; CFI = comparative fit index; RMSEA = root mean square error
of approximation.
a. The disturbances of academic interest and relationships with teachers first-order latent constructs were
free to correlate.

∗p < .001.

this item is often used in other research, we allow it to intercorrelate with the
second-order latent variables in our model.

Results of the Single Sample Analysis

Table 2 reports goodness-of-fit statistics for the first-order measurement model
and three second-order measurement models. The χ² and degrees of freedom
produced with WLSMV estimation should not be used in nested model comparisons
(L. K. Muthén & Muthén, 1998). Thus, we note relevant corrected differences in the
χ² values and degrees of freedom across nested models in the text. Although the χ²
value for the first-order model (Model 1) is statistically significant, the
goodness-of-fit indexes generally indicate a good fit. Therefore, the proposed factorial structure
is consistent with the data. The correlations among the latent variables are moderate
to high, suggesting that a second-order model may be reasonable.
To assess the possibility that all of the first-order dimensions of engagement
reflect a single second-order engagement factor, we first estimated a second-order
model with just one second-order engagement factor (Model 2). As expected, given
our argument that researchers should distinguish between psychological and
behavioral engagement, the fit of this model is significantly worse than the first-order
model in which it is nested (corrected χ² difference = 1638.22, df = 16, change in
CFI = −.084). Guided by the theoretical distinction between behavioral and
psychological engagement, we estimated a second-order model (Model 3) in which
behavioral engagement drives attendance, at-risk behavior, preparation, and teacher
perceptions of effort, whereas psychological engagement influences academic interest,
extrinsic motivation, and relationships with teachers. The fit of this model is not
poor. However, the nested χ² test between this model and the first-order model is
statistically significant (corrected χ² difference = 449.82, df = 15), and the other
indexes of fit are slightly worse than in Model 1. Consequently, we considered
whether any of the errors for the first-order factors should be correlated. Because
relationships with teachers and academic interest are likely to influence one another,
they likely are associated with one another above and beyond their mutual
dependence on psychological engagement. Therefore, we reestimated the second-order
model, allowing these errors to covary (Model 4). Freeing this parameter improved
the fit of the model. The nested χ² test between this model and Model 3 is
significant (χ² difference = 151.70, df = 1), and the other indicators of fit improve slightly.
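For a 1-df comparison such as Model 3 versus Model 4, the p-value can be checked with a short calculation. This is our own sketch; with WLSMV the difference must already be corrected (e.g., via the Mplus DIFFTEST procedure) before such an evaluation applies:

```python
import math

def chi2_sf_df1(x):
    # Survival function of a chi-square variate with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(x / 2.0))

# Corrected difference of 151.70 on 1 df between Models 3 and 4:
print(chi2_sf_df1(151.70) < 0.001)  # True
# Sanity check: the familiar .05 critical value of 3.84 gives p of about .05
print(round(chi2_sf_df1(3.841), 3))  # 0.05
```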
In addition to the good fit of the models that specify separate behavioral and
psychological engagement dimensions, the correlation between behavioral and
psychological engagement suggests that these are distinct dimensions. Specifically,
because the correlation between behavioral and psychological engagement is less
than 1 minus 2 times the standard error of the correlation (.75, SE = .023) in Model
4, we can conclude that there is discriminant validity between these two dimensions
(Anderson & Gerbing, 1988). These results provide empirical support for the
theoretical distinction between psychological and behavioral engagement. At the same
time, the results are consistent with the idea that behavioral and psychological
engagement are higher order constructs that explain the covariances among the
seven first-order latent variables. Also noteworthy is that the correlations between
time spent on homework and behavioral and psychological engagement are .36 and
.38, respectively. These relatively small correlations underscore the idea that time
spent on homework is related to engagement but is more than likely not a useful
measure of engagement. In an auxiliary analysis, we allowed time spent on homework
to load on behavioral engagement. The fit of this model was poor.
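The discriminant validity criterion applied here reduces to simple arithmetic: the two-standard-error band around the estimated factor correlation must exclude 1.0. A minimal sketch:

```python
def discriminant_validity(corr, se):
    # Anderson & Gerbing (1988): the two-standard-error band around the
    # estimated factor correlation should not contain 1.0.
    return corr + 2 * se < 1.0

# Behavioral vs. psychological engagement in Model 4: r = .75, SE = .023,
# so the upper bound is .75 + .046 = .796, well below 1.0:
print(discriminant_validity(0.75, 0.023))  # True
```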
Given its good fit, we designated Model 4 as our preferred model. Thus, the
remaining reported results come from this model. Tables 3 and 4 display the
standardized pattern and structure coefficients for the first-order and second-order
dimensions of engagement, respectively. All of the first- and second-order factor
pattern coefficients are statistically significant, providing evidence of convergent
validity (Anderson & Gerbing, 1988). In addition, the structure coefficients display
the expected pattern. Specifically, each item has a higher structure coefficient for
the latent variable on which its pattern coefficient is freely estimated than for latent
variables for which its pattern coefficient is constrained to zero (Graham, Guthrie,
& Thompson, 2003).
Bagozzi and Yi (1988) suggest that for CFA models, composite reliabilities
greater than .60 are acceptable. The composite reliability coefficients of each first-
order latent variable meet this threshold (.66, .77, .75, .78, .62, .76, and .88 for
attendance, at-risk behavior, preparation, teacher perceptions of student effort, aca-
demic interest, extrinsic motivation, and relationships with teachers, respectively).
However, only the composite reliability coefficient of teacher perceptions could be
considered high. These results highlight the fact that it is critical to correct for

Downloaded from epm.sagepub.com at The University of Iowa Libraries on January 12, 2016
12 Educational and Psychological Measurement

Table 3
Standardized Pattern and Structure Coefficients
for First-Order Latent Variables
B1 B2 B3 B4 P1 P2 P3

Item p rS p rS p rS p rS p rS p rS p rS

1 .67 .67 .00 −.45 .00 .32 .00 .34 .00 .30 .00 .37 .00 .23
2 .55 .55 .00 −.37 .00 .27 .00 .28 .00 .25 .00 .31 .00 .19
3 .67 .67 .00 −.45 .00 .33 .00 .34 .00 .30 .00 .37 .00 .23
4 .00 −.47 .70 .70 .00 −.40 .00 −.42 .00 −.37 .00 −.46 .00 −.29
5 .00 −.44 .65 .65 .00 −.37 .00 −.39 .00 −.34 .00 −.43 .00 −.26
6 .00 −.40 .60 .60 .00 −.34 .00 −.35 .00 −.31 .00 −.39 .00 −.24
7 .00 −.50 .74 .74 .00 −.42 .00 −.44 .00 −.40 .00 −.49 .00 −.30
8 .00 .36 .00 −.42 .73 .73 .00 .32 .00 .28 .00 .35 .00 .22
9 .00 .31 .00 −.37 .64 .64 .00 .28 .00 .24 .00 .30 .00 .19
10 .00 .36 .00 −.43 .74 .74 .00 .32 .00 .28 .00 .35 .00 .22
11 .00 .44 .00 −.52 .00 .38 .87 .87 .00 .35 .00 .43 .00 .27
12 .00 .40 .00 −.47 .00 .34 .78 .78 .00 .31 .00 .39 .00 .24
13 .00 .43 .00 −.51 .00 .37 .86 .86 .00 .34 .00 .42 .00 .26
14 .00 .36 .00 −.42 .00 .31 .00 .32 .81 .81 .00 .63 .00 .55
15 .00 .36 .00 −.42 .00 .31 .00 .32 .80 .80 .00 .62 .00 .54
16 .00 .26 .00 −.30 .00 .22 .00 .27 .57 .57 .00 .44 .00 .39
17 .00 .43 .00 −.51 .00 .37 .00 .39 .00 .61 .78 .78 .00 .47
18 .00 .30 .00 −.36 .00 .26 .00 .27 .00 .42 .55 .55 .00 .33
19 .00 .25 .00 −.29 .00 .21 .00 .22 .00 .48 .00 .43 .71 .71
20 .00 .20 .00 −.23 .00 .17 .00 .18 .00 .39 .00 .34 .57 .57
21 .00 .27 .00 −.31 .00 .23 .00 .24 .00 .52 .00 .46 .77 .77
22 .00 .20 .00 −.24 .00 .17 .00 .18 .00 .40 .00 .35 .60 .60

Note: B1 = attendance; B2 = at-risk behavior; B3 = preparation; B4 = teacher perceptions; P1 = academic interest; P2 = extrinsic motivation; P3 = relationships with teachers; p = pattern coefficient; rS = structure coefficient.

measurement error in analyses that use these NELS:88 measures or similar ones,
whether by utilizing structural equation modeling or some other method.
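Bagozzi and Yi's composite reliability can be computed directly from the standardized pattern coefficients; the sketch below (function name ours) assumes uncorrelated residuals, so each item's error variance is 1 − λ², and uses the attendance loadings from Table 3:

```python
def composite_reliability(loadings):
    """Composite reliability for standardized loadings, assuming
    uncorrelated residuals: (sum L)^2 / ((sum L)^2 + sum(1 - L^2))."""
    s = sum(loadings) ** 2
    error = sum(1 - l ** 2 for l in loadings)
    return s / (s + error)

# Attendance items from Table 3: skip (.67), late (.55), parent warning (.67)
print(round(composite_reliability([0.67, 0.55, 0.67]), 2))  # → 0.66
```

The result reproduces the .66 reported above for attendance and clears the .60 threshold.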
It is also instructive to examine the reliabilities of individual items, which can
be obtained by squaring the standardized pattern coefficients (Bollen, 1989).
Although half of the items have reliabilities of .50 or higher, there is still consider-
able measurement error in many of the NELS:88 items. The teacher perception
indicators have the most consistently high item reliabilities (.61 for student works
hard, .74 for student is attentive in class, and .76 for student completes homework).
This is likely because these items are averages of two teacher ratings and therefore
are more reliable. In contrast, all three indicators of attendance have reliabilities
under .50 (.44 for skipping class, .30 for being late to class, and .45 for parents
receiving a warning about attendance). All of the other latent variables have at least

Glanville, Wildhagen / Measuring School Engagement 13

Table 4
Standardized Pattern and Structure Coefficients
for Second-Order Latent Variables
Behavioral Psychological

Dimension p rS p rS

B1 .77 .77 .00 .57
B2 −.89 −.89 .00 −.67
B3 .65 .65 .00 .48
B4 .67 .67 .00 .50
P1 .00 .60 .79 .79
P2 .00 .74 .98 .98
P3 .00 .46 .61 .61

Note: B1 = attendance; B2 = at-risk behavior; B3 = preparation; B4 = teacher perceptions; P1 = academic interest; P2 = extrinsic motivation; P3 = relationships with teachers; p = pattern coefficient; rS = structure coefficient.

some indicators for which more than half of the variation is explained by the latent
variable. The fact that many of the reliabilities for the individual items are some-
what low underscores the need for research using these items to apply multiple
measures and correct for measurement error.
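The squared-loading calculation is simple but worth making explicit; the loadings below are the teacher-perception items from Table 3 (items 11–13):

```python
# Item reliability = squared standardized pattern coefficient (Bollen, 1989).
loadings = {"works hard": 0.78, "attentive in class": 0.86,
            "completes homework": 0.87}
item_reliability = {item: round(l ** 2, 2) for item, l in loadings.items()}
print(item_reliability)
# → {'works hard': 0.61, 'attentive in class': 0.74, 'completes homework': 0.76}
```

These values match the item reliabilities reported in the text above.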
On the whole, the results of this CFA model are mixed. On one hand, the global
fit of the model is good and the discriminant validity is high, strongly supporting
the call to measure engagement as a multidimensional rather than unidimensional
concept. On the other hand, although the composite reliability coefficients for the
first-order latent variables are high enough to be acceptable, most are not particu-
larly high, and many of the individual items exhibit a good deal of measurement
error. These latter results suggest that future surveys that include engagement items
could benefit from better measures. Until improved measures are developed,
researchers using engagement items from NELS:88 (and the more recent Educa-
tional Longitudinal Study, which includes most of the same items) will benefit
from using structural equation models to estimate the causes and consequences of
engagement because such models correct for measurement error.

Assessing Measurement Invariance Across Race and Ethnicity

We next present a series of models that are designed to assess the invariance of
our measures of engagement across White, African American, Latino, and Asian
students. The fact that many of the measures are ordered categories adds a layer of
complexity to the testing of invariance across groups. As previously noted, treating
the indicators as continuous can yield inconsistent parameter estimates, biased standard errors, and an incorrect χ2. These problems stem in part from the fact that


underlying the ordered categorical measures are continuous latent variables, often
referred to as y∗ variables. Accordingly, analyzing the covariance matrix of the
observed measures is not the same thing as analyzing the covariances of the under-
lying y∗ variables (Bollen, 1989). Yet analyzing the polychoric correlation matrix
(correlations among the underlying y∗ variables) is not a suitable approach for
multiple group analysis because this approach standardizes the variance of the
underlying y∗ variables, forcing variances to be equal across groups. Fortunately,
the conditional probability formulation does not mask variation created by group
variation in the measurement parameters (B. O. Muthén & Asparouhov, 2002). For
ordered categorical indicators, measurement invariance holds if, given the factor
scores, the conditional probabilities for the ordered indicator variables are not
dependent on the group (Millsap & Yun-Tein, 2004). This property is assessed by
examining the invariance of the pattern coefficients and the thresholds.
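The y∗ formulation can be made concrete with a small sketch: an observed ordered category simply records which interval between thresholds the continuous latent response falls into, so invariance requires the same loadings and thresholds in every group. The threshold values here are arbitrary illustrations, not estimates from the model:

```python
import bisect

def observed_category(y_star, thresholds):
    """Return the ordered category implied by a continuous latent response
    y*: category k is observed when thresholds[k-1] <= y* < thresholds[k]."""
    return bisect.bisect_right(thresholds, y_star)

# Illustrative thresholds for a 3-category item
# (e.g., 0 = never, 1 = once or twice, 2 = more than twice)
tau = [-0.5, 1.2]
print([observed_category(y, tau) for y in (-1.0, 0.3, 2.0)])  # → [0, 1, 2]
```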
We follow Millsap and Yun-Tein’s (2004) guidelines in identifying our models.
The baseline model allows most pattern coefficients, thresholds, and intercepts
to vary between groups. The following restrictions are necessary to identify the
model: the pattern coefficient of each factor's scaling indicator must be set to 1
in all groups; for continuous scaling indicators, the intercept of the scaling
indicator is set to 0; and for categorical scaling indicators, two thresholds must
be forced to be invariant across all groups. The first threshold of each additional
categorical indicator is constrained to be equal across groups. Residual variances are constrained to
1 in the first group and are free in the other groups, and the means of the second-
order latent variables are constrained to 0 in the first group.
The parameters of concern in evaluating measurement invariance are pattern
coefficients, intercepts, and thresholds (Byrne et al., 1989; Millsap & Yun-Tein,
2004). Below, we examine several invariance hypotheses. The first hypothesis is
that the same model form holds across groups. In other words, each group has the
same number of dimensions and patterns of fixed and free parameters (Bollen,
1989). If the fit of this model is acceptable, higher levels of invariance may be
examined. The second step, then, is to assess whether the pattern coefficients for the
first-order latent variables are invariant across groups. Assuming that this condition
holds, one can test the hypotheses that the thresholds and intercepts for the items
measuring the first-order factors are invariant. If the measurement parameters for
the first-order component of the model are invariant, we can then assess whether the
pattern coefficients and intercepts from the second-order component are invariant.
In multiple group analyses with large samples, there are not yet conventionally
accepted rules for rejecting a more constrained model over a less restricted model.
With very large samples, the χ2 nested model comparison is not always appropriate
because of the same excess power issue encountered when evaluating the fit of a single
model (Marsh, 1994). Because even the most unconstrained model usually has a statistically significant χ2 test statistic, it would be inconsistent to fail to reject a model
based on goodness-of-fit indexes such as the RMSEA and the CFI and then reject a


Table 5
Goodness-of-Fit Statistics for Models Testing
Invariance Across Race and Ethnicity
Model χ2 df RMSEA NNFI CFI NCI Gamma hat

1 796.762∗ 138 .040 .960 .955 .973 .995
2 746.027∗ 139 .038 .963 .959 .975 .996
3 738.884∗ 139 .038 .964 .959 .976 .996
4 731.634∗ 143 .037 .965 .960 .976 .996
5 725.301∗ 143 .037 .966 .961 .976 .996
6 697.803∗ 143 .036 .967 .962 .978 .996
7 683.297∗ 142 .035 .968 .963 .980 .997

Note: RMSEA = root mean square error of approximation; NNFI = nonnormed fit index; CFI = com-
parative fit index; NCI = McDonald’s noncentrality index. Model 1 = equality of overall structure;
Model 2 = Model 1 plus first-order pattern coefficients invariant; Model 3 = Model 2, except pattern
coefficient for parent warning about attendance free to vary for Black students; Model 4 = Model 3
plus thresholds invariant; Model 5 = Model 4 plus intercepts invariant; Model 6 = Model 5 plus
second-order pattern coefficients invariant; Model 7 = Model 6 plus intercepts of second-order indica-
tors invariant.

∗p < .001.

more constrained model because the nested model test is statistically significant, even
if the other indexes of fit do not change at all. However, there are no widely accepted
standards for comparing goodness-of-fit indexes for a series of nested models. Based
on a simulation study of multivariate normal data, Cheung and Rensvold (2002)
recommend examining changes in the CFI, Gamma hat, and McDonald’s noncentral-
ity index (NCI). In comparing a more restricted model to a less restricted one, a
decline greater than .01 in the CFI, .001 in Gamma hat, or .02 in the NCI suggests
that one should reject the null hypothesis of invariance. Although it is unclear whether
Cheung and Rensvold’s results will hold for categorical data, for lack of any other
benchmark, we examine these fit indexes as well as the χ2 and the RMSEA. Table 5
summarizes the fit indexes of models testing different levels of invariance. In Table
6 we present the corrected differences in the χ2 values and degrees of freedom across
nested models. In addition to examining several goodness-of-fit indexes, we examined 95% confidence intervals for each parameter to assess whether there was overlap in the confidence intervals among the groups. With one exception, which we note below, we found that the confidence intervals for the pattern coefficients, thresholds, and intercepts overlapped among the groups.
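The decision rule just described can be written out directly. A sketch (names ours) applying Cheung and Rensvold's (2002) cutoffs to the Model 1 versus Model 2 comparison from Table 5:

```python
CUTOFFS = {"CFI": 0.01, "gamma_hat": 0.001, "NCI": 0.02}

def reject_invariance(less_restricted, more_restricted):
    """Reject the null hypothesis of invariance if any fit index declines
    by more than its cutoff when constraints are added
    (Cheung & Rensvold, 2002)."""
    return any(less_restricted[k] - more_restricted[k] > cut
               for k, cut in CUTOFFS.items())

m1 = {"CFI": 0.955, "NCI": 0.973, "gamma_hat": 0.995}  # Model 1, Table 5
m2 = {"CFI": 0.959, "NCI": 0.975, "gamma_hat": 0.996}  # Model 2, Table 5
print(reject_invariance(m1, m2))  # → False: invariance is retained
```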

Testing for Configural Invariance


To assess the hypothesis that a common factor structure describes all four racial
and ethnic groups, we estimated our proposed measurement model with no equality


Table 6
Changes in Goodness-of-Fit Statistics Across Levels of Invariance
Model Comparison Δχ2 df p value ΔCFI ΔNCI ΔGamma hat

2 versus 1 64.126 25 <.001 .004 .002 .000
3 versus 2 26.28 1 <.001 .000 .000 .000
3 versus 1 52.528 25 <.001 .004 .002 .000
4 versus 3 94.398 35 <.001 .001 .001 .000
5 versus 4 23.481 13 .036 .001 .000 .000
6 versus 5 33.736 13 .001 .001 .001 .000
7 versus 6 24.957 10 .005 .001 .002 .000

Note: CFI = comparative fit index; NCI = McDonald's noncentrality index. χ2 and df are weighted-least-squares corrected. Model 1 = equality of overall structure; Model 2 = Model 1 plus first-order pattern coefficients invariant; Model 3 = Model 2, except pattern coefficient for parent warning about attendance free to vary for Black students; Model 4 = Model 3 plus thresholds invariant; Model 5 = Model 4 plus intercepts invariant; Model 6 = Model 5 plus second-order pattern coefficients invariant; Model 7 = Model 6 plus intercepts of second-order indicators invariant.

constraints on the free parameters across groups. As Table 5 shows, Model 1 yields
a statistically significant χ2 value (796.762, df = 138). However, the other indexes
of model fit indicate a good fit (RMSEA = .040; NNFI = .96; CFI = .96), suggesting
that it is acceptable to consider the form of the measurement model invariant
across racial and ethnic groups.

Testing for Metric Invariance for the First-Order Items


Model 2 constrains the first-order pattern coefficients to be equal across groups.
If this invariance condition is supported, it indicates that the latent variables have
the same effect on each of their respective indicators across groups. As Table 6
shows, the corrected χ2 for the nested model comparison between this model and
Model 1 is significant (64.13, df = 25). However, the CFI, NCI, and Gamma hat
do not decline by more than the cutoff values suggested by Cheung and Rensvold
(2002). In fact, the CFI and NCI are slightly higher for Model 2, probably reflecting
the added parsimony of the pattern coefficient invariance assumption. Only paren-
tal warning about poor behavior has a confidence interval that does not overlap
among the groups. The pattern coefficient for African American students is larger
than it is for students of other races and ethnicities. Accordingly, we allow this pat-
tern coefficient to vary in subsequent models. Substantively, these results indicate
that for the most part, the latent engagement variables have the same effect on their
respective observed indicators regardless of a student’s race.


Testing for Threshold Invariance for the First-Order Items


Model 4 adds threshold invariance for the categorical indicators to Model 3. If
we fail to reject the null hypothesis that thresholds are invariant across groups, then
we can assume that the latent continuous indicators (y∗ ) underlying the categorical
observed indicators (y) have the same cut points, regardless of the race or ethnicity
of a student. As Table 6 shows, the nested χ2 comparison between Models 3 and
4 is significant (94.40, df = 35). However, the CFI, NCI, and Gamma hat do not
decline. In addition, for each threshold, the 95% confidence intervals overlapped
across the groups. Thus, most of the evidence suggests that the thresholds do not
vary across groups. The invariance of pattern coefficients and thresholds implies
that for a given value of the latent variable, we expect the same latent y∗ response
scores across groups (Millsap & Yun-Tein, 2004).

Testing for Intercept Invariance for the First-Order Items


In Model 5 we test whether the intercepts for the continuous items can be con-
sidered invariant across ethnic groups. If the intercepts are invariant across groups,
then at a given value of the latent variable, the expected value of the item is the
same across groups. The χ2 test is nonsignificant at the .01 level (p = .036), and the
other indexes of fit do not decrease, suggesting that the intercepts of the continuous
variables are invariant across groups. In addition, for each intercept, the 95% confi-
dence intervals overlapped across the groups.

Second-Order Invariance
Models 6 and 7 test the invariance of the second-order pattern coefficients and
intercepts, respectively. Both tests of invariance are statistically significant at the
.01 level, but not at the .001 level. The CFI, NCI, and Gamma hat do not decline. In
addition, the 95% confidence intervals for both sets of parameters overlap among
the groups. Therefore, taken as a whole, the evidence suggests that the second-order
component of the measurement model is invariant across racial and ethnic groups.
The invariance of the second-order pattern coefficients suggests that each dimension
of behavioral and psychological engagement is equally salient across each racial
and ethnic group. Accordingly, future research could use this measurement model
to investigate whether the influences of behavioral and psychological engagement
on outcomes such as academic achievement are the same across race and ethnicity.

Conclusion
This study contributes to the literature on the measurement of school engagement in
three main ways. First, we apply CFA to evaluate the performance of previous
measures of school engagement that used NELS:88 data. The results revealed poor


fits of past strategies that collapsed indicators of engagement into too few dimen-
sions. The analyses also revealed that some of the indicators included in these pre-
vious approaches did not perform as well as measures of their intended dimensions.
Second, drawing from theory on school engagement and using the results from
the previous models, especially Finn and Voelkl (1993), we proposed an alternative
measurement model using NELS:88 data. The results of our proposed measurement
model strongly suggest that engagement is a multidimensional concept. Higher
order constructs representing behavioral and psychological engagement explain
the covariances among several first-order latent variables. Future research should
address whether the different dimensions have different antecedents and effects,
information that would contribute to both educational theory and policy. In addi-
tion, the results demonstrated that there is a good deal of measurement error in the
NELS:88 items and in some of the composites that can be developed from these
items. Consequently, research that examines the relationships between school engage-
ment and other constructs should correct for measurement error.
Third, this study tested the measurement invariance of school engagement
across racial and ethnic groups. Much has been made of the importance of school
engagement in deterring students, particularly disadvantaged students, from nega-
tive educational outcomes, such as dropping out of school (Connell et al., 1994;
Finn & Rock, 1997). However, little research thus far has actually compared the
effects of engagement across students of different racial and ethnic backgrounds.
Rigorous comparisons of the effects of engagement across race and ethnicity cannot
be made until measurement invariance has been established. Although the current
study does not speak to the causal relationship between engagement and out-
comes, the results largely indicate measurement invariance of school engagement
across racial and ethnic groups. Accordingly, future research could use the same
measurement model of engagement across these racial and ethnic groups to exam-
ine whether there are racial and ethnic differences in the relationships between
school engagement and its causes and consequences.
School engagement and related concepts figure prominently in current educational
theory and policy prescriptions. A growing body of research has observed that
higher school engagement is associated with better educational outcomes, such as
higher academic achievement and school retention (Finn et al., 1995; Ogbu, 2003;
Smerdon, 1999). Because engagement can be influenced through school or classroom
practice (Davis & Jordan, 1994; Finn & Voelkl, 1993; Lee & Smith, 1993; Marks,
2000), increasing policy attention has been directed toward efforts to foster school
engagement, particularly among disadvantaged minority youth (National Research
Council & Institute of Medicine, 2004). Given the prominence of the concept of
school engagement in current discussions, it is critical that our studies of the causes
and consequences of school engagement use the best possible measurement strategies
to ensure valid inferences. The present study demonstrates that CFA is a powerful tool
for informing researchers about the quality of the measures of engagement.

Appendix
Item Labels, Wording, and Descriptive Statistics for Proposed Measurement
Model Variable Label NELS Label Dimension of Engagement M SD

Attendance

1 Skip f1s10b How many times did you cut or skip classes in the first 1.27 2.55
half of the current school year? (0–11; multiplied by −1)
2 Late f1s10a How many times were you late for school in the first half 2.79 3.13
of the current school year? (0–11; multiplied by −1)
3 Parent warning: attendance f1s107a In the first half of the current school year, how often did .24 .51
your parents receive a warning about your attendance?
(0 = never to 2 = more than twice; reverse-coded)

At-risk behavior

4 Troublemaker f1s67f Agreement that other students see you as a troublemaker 1.32 .55
(1 = not at all to 3 = very much)
5 Trouble f1s10c How many times did you get in trouble for not following 1.24 2.17
rules in the first half of the current school year? (0–11)
6 Fights f1s9d In the first half of the school year, how many times did you .18 .45
get into a physical fight at school? (0 = never to 2 = more
than twice)
7 Parent warning: behavior f1s107c In the first half of the current school year, how often did .17 .44
your parents receive a warning about your behavior?
(0 = never to 2 = more than twice)

Preparation

8 Brings homework f1s40c How often do you go to class without your homework done? 2.97 .73
(1 = never to 4 = usually; reverse-coded)
9 Brings pencil f1s40a How often do you go to class without pencil or paper? 3.34 .73
(1 = never to 4 = usually; reverse-coded)
10 Brings books f1s40b How often do you go to class without books? 2.51 .60
(1 = never to 4 = usually; reverse-coded; usually and often
collapsed because of small percentage of usually)
Teacher perceptions of student effort (a)

11 Completes homework f1t1_15, f1t5_15 How often does the student do his or her homework? 4.00 .87
(1 = never to 5 = all of the time)
12 Works hard f1t1_2, f1t5_2 Does the student usually work hard? (1 = yes, 0 = no) .66 .42
13 Attentive in class f1t1_18, f1t5_18 How often is the student attentive in class? (1 = never 3.92 .76
to 5 = all of the time)

Academic interest

14 Classes interesting f1s66a When you compare your first year of high school to 2.76 .67
the year before that, do you agree that the subjects you're
taking are interesting and challenging? (1 = strongly
disagree to 4 = strongly agree)
15 Feeling of satisfaction f1s66b When you compare your first year of high school to the 1.88 .57
year before that, do you agree that you get a feeling of
satisfaction from doing what you're supposed to in class?
(1 = strongly disagree to 4 = strongly agree)
16 Try hard f1s27a– f1s27d In your math, English, history, and science classes, how 4.15 .93
often do you try as hard as you can? (1 = never
to 5 = almost every day; average of the four items)

Extrinsic motivation

17 Grades important f1s38a How important are good grades to you? (1 = not 2.43 .69
important/somewhat important to 3 = very important)
18 Education important f1s66d Do you agree that you go to school because education is 2.60 .55
important for getting a job later on? (1 = strongly
disagree to 4 = strongly agree; strongly disagree and
disagree collapsed because of small percentage of
strongly disagree)
Student–Teacher relationships

19 Teachers listen f1s71 Do you agree that most of your teachers really listen to 2.76 .67
what you have to say? (1 = strongly agree to 4 = strongly
disagree; reverse-coded)
20 Teachers do not put down f1s7j In class I often feel put down by my teachers. (1 = strongly 1.95 .68
agree to 4 = strongly disagree)
21 Teachers care f1s66g Do you agree that you go to school because your teachers 2.83 .76
care about you and expect you to do well in school?
(1 = strongly agree to 4 = strongly disagree;
reverse-coded)
22 Teachers praise f1s7i When I work hard on schoolwork, my teachers praise 2.59 .73
my effort. (1 = strongly agree to 4 = strongly disagree;
reverse-coded)

Homework

23 Hours of homework f1s36a2 Total time spent on homework out of school each week 4.55 4.51
(0–16 hours)

Note: N = 12,210.
a. Values are averages of both teachers' responses, unless data for only one teacher were available.


References
Ainsworth-Darnell, J. W., & Downey, D. B. (1998). Assessing the oppositional culture explanation for
racial/ethnic differences in school performance. American Sociological Review, 63, 536-553.
Anderson, J. C., & Gerbing, D. W. (1988). Structural equation modeling in practice: A review and
recommended two-step approach. Psychological Bulletin, 103, 411-423.
Bagozzi, R. P., & Yi, Y. (1988). On the evaluation of structural equation models. Journal of the
Academy of Marketing Science, 16, 74-94.
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107,
238-246.
Bollen, K. A. (1989). Structural equations with latent variables. New York: John Wiley.
Bowen, G. L., & Richman, J. M. (1995). The school success profile. Chapel Hill: University of North
Carolina at Chapel Hill.
Byrne, B. M., Shavelson, R. J., & Muthén, B. (1989). Testing for the equivalence of factor covariance
and mean structures: The issue of partial measurement invariance. Psychological Bulletin, 105,
456-466.
Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement
invariance. Structural Equation Modeling: A Multidisciplinary Journal, 9, 233-255.
Connell, J. P., Halpern-Felsher, B. G., Clifford, E., Crichlow, W., & Usinger, P. (1995). Hanging
in there: Behavioral, psychological, and contextual factors affecting whether African American
students stay in school. Journal of Adolescent Research, 10, 41-63.
Connell, J. P., Spencer, M. B., & Aber, J. L. (1994). Educational risk and resilience among African-
American youth: Context, self, action, and outcomes in school. Child Development, 65, 493-506.
Davis, J., & Jordan, W. (1994). The effects of school context, structure, and experiences on African
American males in middle and high school. Journal of Negro Education, 63, 570-587.
Fan, X., Thompson, B., & Wang, L. (1999). The effects of sample size, estimation methods, and model
specification on SEM fit indexes. Structural Equation Modeling: A Multidisciplinary Journal, 6,
56-83.
Finn, J. D. (1989). Withdrawing from school. Review of Educational Research, 59, 117-142.
Finn, J. D. (1993). School engagement and students at risk. Washington, DC: National Center for Educa-
tion Statistics.
Finn, J. D., Pannozzo, G. M., & Voelkl, K. (1995). Disruptive and inattentive-withdrawn behavior and
achievement among fourth graders. Elementary School Journal, 95, 421-434.
Finn, J. D., & Rock, D. A. (1997). Academic success among students at risk for school failure. Journal
of Applied Psychology, 82, 221-234.
Finn, J. D., & Voelkl, K. E. (1993). School characteristics related to student engagement. Journal
of Negro Education, 62, 249-268.
Fredricks, J. A., Blumenfeld, P. C., & Paris, A. H. (2004). School engagement: Potential of the concept,
state of the evidence. Review of Educational Research, 74, 59-109.
Graham, J. M., Guthrie, A. C., & Thompson, B. (2003). Consequences of not interpreting structure
coefficients in published CFA research: A reminder. Structural Equation Modeling, 10, 142-153.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conven-
tional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal,
6, 1-55.
Lee, V. E., & Smith, J. B. (1993). Effects of school restructuring on the achievement and engagement of
middle school students. Sociology of Education, 66, 164-187.
Marks, H. M. (2000). Student engagement in instructional activity: Patterns in elementary, middle and
high school years. American Educational Research Journal, 37, 153-184.
Marsh, H. W. (1994). Confirmatory factor analysis models of factorial invariance: A multifaceted
approach. Structural Equation Modeling: A Multidisciplinary Journal, 1, 5-34.


Millsap, R. E., & Yun-Tein, J. (2004). Assessing factorial invariance in ordered-categorical measures.
Multivariate Behavioral Research, 39, 479-515.
Muthén, B. O., & Asparouhov, T. (2002). Latent variable analysis with categorical outcomes: Multiple-
group and growth modeling in Mplus (Mplus Web Notes: No. 4). Los Angeles: Authors.
Muthén, B. O., & Satorra, A. (1995). Technical aspects of Muthén’s Liscomp approach to estimation of
latent variable relations with a comprehensive measurement model. Psychometrika, 60, 489-503.
Muthén, L. K., & Muthén, B. O. (1998). Mplus user’s guide. Los Angeles: Authors.
National Research Council & Institute of Medicine. (2004). Engaging schools: Fostering high school
students’ motivation to learn. Washington, DC: National Academy Press.
Newman, F. M., Wehlage, G. G., & Lamborn, S. D. (1992). The significance and sources of student
engagement. In F. M. Newman (Ed.), Student engagement and achievement in American secondary
schools (pp. 11-39). New York: Teachers College Press.
Ogbu, J. U. (2003). Black American students in an affluent suburb: A study of academic disengagement.
Mahwah, NJ: Lawrence Erlbaum.
Smerdon, B. A. (1999). Engagement and achievement: Differences between African-American and
White high school students. Research in Sociology of Education and Socialization, 12, 103-134.
Smerdon, B. A. (2002). Students’ perceptions of membership in their high schools. Sociology of Educa-
tion, 75, 287-305.
Steiger, J. H., & Lind, J. M. (1980, May). Statistically based tests for the number of common factors.
Paper presented at the annual meeting of the Psychometric Society, Iowa City, IA.
Tucker, L. R., & Lewis, C. (1973). A reliability coefficient for maximum likelihood factor analysis.
Psychometrika, 38, 1-10.
Wellborn, J. G., & Connell, J. P. (1987). Manual for the Rochester assessment package for schools.
Rochester, NY: University of Rochester.
