
Journal of Sport & Exercise Psychology, 2010, 32, 438-482

© 2010 Human Kinetics, Inc.

Introducing a Short Version of the


Physical Self Description Questionnaire:
New Strategies, Short-Form Evaluative
Criteria, and Applications
of Factor Analyses
Herbert W. Marsh,1 Andrew J. Martin,2 and Susan Jackson3
1Oxford University; 2University of Sydney; 3University of Queensland

Based on the Physical Self Description Questionnaire (PSDQ) normative archive


(n = 1,607 Australian adolescents), 40 of 70 items were selected to construct a
new short form (PSDQ-S). The PSDQ-S was evaluated in a new cross-validation
sample of 708 Australian adolescents and four additional samples: 349 Australian
elite-athlete adolescents, 986 Spanish adolescents, 395 Israeli university students,
and 760 Australian older adults. Across these six groups, the 11 PSDQ-S factors had
consistently high reliabilities and invariant factor structures. Study 1, using a
missing-by-design variation of multigroup invariance tests, showed invariance
across 40 PSDQ-S items and 70 PSDQ items. Study 2 demonstrated factorial
invariance over a 1-year interval (test–retest correlations .57–.90; Mdn = .77), and
good convergent and discriminant validity in relation to time. Study 3 showed good
and nearly identical support for convergent and discriminant validity of PSDQ
and PSDQ-S responses in relation to two other physical self-concept instruments.
Keywords: short forms, physical self-concept, multitrait-multimethod analyses,
confirmatory factor analysis

The overarching aim of this article is to describe a construct validity approach to


the development and evaluation of a new, short version of the widely used Physical
Self Description Questionnaire (PSDQ). Acknowledged as a leading multidimensional physical self-concept instrument for adolescents (e.g., Byrne, 1996), the
PSDQ is sometimes perceived as being too long for applied research: 70 items
that define 11 self-concept factors (see Appendixes A and E). Such length may
be acceptable when only the PSDQ is administered, particularly when used for
individual diagnostic purposes; however, it might be an impediment when used in
Herbert W. Marsh is with the Department of Education, Oxford University, Oxford, U.K. Andrew J.
Martin is with the Faculty of Education and Social Work, University of Sydney, Sydney, NSW, Australia. Susan Jackson is with the School of Human Movement Studies, University of Queensland, St. Lucia, QLD, Australia.



a battery of other instruments or administered on multiple occasions. To address


this concern, we have developed a short version of the PSDQ (PSDQ-S) that balances brevity and psychometric quality in relation to established guidelines for
evaluating short forms (e.g., Marsh, Ellis, Parada, Richards, & Heubeck, 2005;
Smith, McCarthy, & Anderson, 2000) and the construct validity approach that is
the basis of PSDQ research.
Even though there is a long history of short-form development, critics (e.g.,
Levy, 1968; Smith & McCarthy, 1995; Smith et al., 2000) have argued that their
use is rarely justified and that actual practice typically falls far short of ideal or even
reasonable standards (e.g., Marsh et al., 2005; Smith et al., 2000). Smith et al. (also
see Stanton, Sinar, Balzer, & Smith, 2002) argued that the fundamental problem
in short-form development is to assume that psychometric properties for the short
form based on the original sample used to select the items will generalize to a new
cross-validation sample. Importantly, Marsh et al. (2005) critically reviewed the
Smith et al. criteria, proposed ways in which they could be tested, and described
strategies for the development and evaluation of short forms that are one basis of
the present investigation. In this respect, our study is a methodological-substantive
synergy. Our substantive concern is the development of a short form of the PSDQ (the PSDQ-S). However, we also stress a methodological focus on construct validity, stronger
operationalizations of established guidelines, and new strategies for the evaluation
of short forms.

Physical Self-Concept and the Physical Self


Description Questionnaire (PSDQ)
Historically, most self-concept instruments either ignored physical self-concept
completely or treated it as a relatively unidimensional domain incorporating characteristics as diverse as fitness, health, appearance, grooming, sporting competence,
body image, sexuality, and physical activity into a single score. In self-concept
research, there is increasing evidence for a multidimensional perspective. In their
classic review of self-concept research, theory, and measurement, Shavelson,
Hubner, and Stanton (1976) developed a multidimensional, hierarchical model
that fundamentally impacted self-concept research (Marsh & Hattie, 1996). Following from this review, self-concept researchers (e.g., Byrne, 1996; Marsh &
Hattie, 1996; Wylie, 1989) have routinely evaluated responses to self-concept
instruments through the application of confirmatory factor analyses (CFAs) to
assess their structure, structural equation models (SEMs) to relate self-concept to
other constructs, and multitrait-multimethod (MTMM) analyses to establish their
convergent and discriminant validity. Early SDQ research provided strong support
for the multidimensionality of self-concept responses posited by Shavelson et al.
(Marsh, Smith, Barnes, & Butler, 1983; also see Byrne, 1996).
The original SDQ instruments were designed to measure self-concept across
a broad array of academic, social, physical, and emotional areas. They contained
two separate components of physical self-concept: global physical ability and
appearance. However, sport and exercise psychologists required a more fine-grained
view of physical self-concept. PSDQ scales reflect some of the original SDQ scales
(physical ability, physical appearance, and esteem) and parallel physical fitness


components identified in a confirmatory factor analysis (CFA) of physical fitness


measures (Marsh, 1993a) that extended Fleishman's (1964) classic research on the
structure of physical fitness. The PSDQ consists of nine specific components of
physical self-concept, a global physical scale, and a global self-esteem scale (see
Appendixes A and E for detail). Each PSDQ item is a simple declarative statement
and individuals respond using a 6-point true-false response scale. The PSDQ is
designed for high-school-aged adolescents. Although Marsh (1997, 2002) suggested that the PSDQ should be appropriate for adults, it has not previously been
systematically evaluated for responses by participants older than university-aged
students. Previous PSDQ research demonstrates:
• Good reliability (median coefficient alpha = .92) across the 11 scales (Marsh, 1996b; Marsh, Richards, Johnson, Roche, & Tremayne, 1994);
• Good test–retest stability over the short term (Mdn r = .83 for 11 PSDQ scales, 3 months) and longer term (Mdn r = .69, 14 months; Marsh, 1996b);
• A well-defined, replicable CFA factor structure (Marsh, 1996b; Marsh, Richards, Johnson, Roche, & Tremayne, 1994);
• A factor structure that is invariant over gender as shown by multiple-group CFA (Marsh et al., 1994);
• Convergent and discriminant validity as shown by multitrait-multimethod (MTMM) studies of responses to three PSC instruments (see Marsh et al., 1994);
• Convergent and discriminant validity as shown by PSDQ relations with external criteria (see Marsh, 1993c, 1996a, 1997);
• Applicability for participants aged 12–18 and for elite athletes and nonathletes (Marsh, Hey, Roche, & Perry, 1997; Marsh, Perry, Horsley, & Roche, 1995); and
• Applicability to different countries based on translated versions of the PSDQ (Marsh, Asçi, & Tomás-Marco, 2002; Marsh, Bar-Eli, Zach, & Richards, 2006).
In summary, the PSDQ is a psychometrically strong instrument that is appropriate
for a wide variety of sport and exercise research. The present research extends
this research by examining the PSDQ-S across diverse samples, including a senior
sample of participants much older than previously considered in PSDQ studies.

The Present Investigation


Aware of the many problems of short-form development (Marsh et al., 2005;
Smith et al., 2000), we wanted not only to avoid them but also to contribute
to the clarification of guidelines that assist applied researchers in this precarious
endeavor. In setting out to develop a PSDQ-S that would remain true to the PSDQ
constructs and measure them reliably, we also wanted to offer a noticeable benefit to
applied practitioners seeking to reduce administration time by reducing the length
of the PSDQ. Before the PSDQ-S can be accepted, however, we worked through
a list of probing questions that rigorously test its viability and applied some new
methods, demonstrated fully in three studies. Consistent with our perspective on


construct validity, the main focus in this initial presentation of the PSDQ-S is on
within-network studies that are a logical prerequisite of subsequent research that
addresses between-network questions that are part of a long-term research program.
Study 1 addresses the fundamental problem of short-form development:
the assumption that the psychometric qualities of the selected items generalize to other
samples. Here, we compare results based on the normative database with a new
cross-validation sample, as well as results from four additional samples that are
quite diverse in terms of language, age, and nature of the sample. We focus on
psychometric properties (reliability, CFA factor structure, and multigroup invariance) of the PSDQ and PSDQ-S based on these six samples.
Study 2 evaluates stability over time and internal validity, illustrating an
application of MTMM analyses to this within-network issue. Do the new factors
show a consistent expression over time, and are short-form items equally reliable
when used repeatedly? Do the relationships between different facets of self-concept
change over time, or can a researcher wanting to conduct a longitudinal or an
intervention study assume a consistent pattern over repeated administrations? In
addressing these questions, we evaluate the stability of PSDQ-S responses over
a 1-year interval for high school students before and after graduation from high
school and for older adults who are either retired or approaching retirement age.
In different sets of analyses, we evaluate the invariance of factor structures over
time, the effects of age and gender, and support for convergent and discriminant
validity in relation to time.
Study 3 is in line with the Stanton et al. (2002; also see Donnellan, Oswald,
Baird, & Lucas 2006) recommendation that short and long forms should have
equivalent relations to external criteria. This study applies a more demanding
application of MTMM analyses based on responses to the PSDQ, Fox's Physical
Self Perception Profile (PSPP; Fox, 1990; Fox & Corbin, 1989), and Richards's
Physical Self Concept (PSC; Marsh et al., 1994; Richards, 1988) instruments.
Here, based on separate analyses of the 40 PSDQ-S and 70 PSDQ items, we extend
previous studies and evaluate how well the strong support for the convergent and
discriminant validity of PSDQ responses generalizes to the PSDQ-S.

Overarching Methods
Description of Six Samples
The total sample in this investigation consisted of 4,800 participants from six diverse
groups designed to test generalizability of results over age, nationality, and diverse
groups. Groups 1–4 (G1–G4) completed the original 70-item version of the PSDQ.
On the basis of data from the normative archive of responses to the long form of
the PSDQ (G1), we selected 40 items to be included in the PSDQ-S that were used
in two new studies (Groups 5 and 6; G5 and G6). Thus, all six groups completed
the 40 PSDQ-S items: G1–G4 completed all 70 PSDQ items, whereas G5 and G6
completed only the 40 PSDQ-S items.
Normative Archive Group (G1). The normative archive group consisted of 1,607
adolescent high school students (predominantly aged 12–18) broadly representative
of high school students in greater metropolitan Sydney, Australia, who completed
the PSDQ. Reanalyses of these data were the basis for selecting the 40 PSDQ-S items.


Spanish Group (G2). The Spanish group consisted of 986 Spanish adolescents
matched to G1 in terms of gender, age, and demographic variables. The PSDQ
was translated into Spanish, followed by a back-translation procedure widely
described in the literature (for further discussion, see Marsh, Tomás-Marco, &
Asçi, 2002).
Elite Athlete Group (G3). Group 3 consisted of 349 elite adolescent athletes
(63% males, 37% females, M age = 13.3 years) from Westfield Sports High School
(see Marsh, Hey, et al., 1997), one of the most prestigious sports high schools in
Australia.
Israeli University Group (G4). Group 4 consists of 395 Israeli university students
(60% women; M age = 24.4 years): physical education students from a teacher preparation
program at Zinman College, Wingate Institute (77%), and School of Management
students from Ben-Gurion University. All participants completed the 70-item version
of the PSDQ and two other physical self-concept instruments. The translation of
the instruments followed a translation and back-translation process typically used
in cross-cultural research (see Marsh, Bar-Eli, et al., 2006).
Cross-Validation Group (G5). Group 5, the cross-validation sample, consists of

708 adolescents (51% female; M age = 17 years) in their last year of high school,
from metropolitan Sydney (see Parker, Martin, Martinez, Marsh, & Jackson, 2010).
Participants completed the PSDQ-S as part of a larger battery of tests related to
physical activity, health, and the transition from high school. Although the PSDQ-S
was specifically designed for use with this sample, the sample differs in age from
the normative archive (aged 12–18) used to select the 40 PSDQ-S items.

Senior Group (G6). Group 6, the senior sample, consists of 760 participants

who completed the PSDQ-S as part of a larger battery of tests related to physical
activity, psychological factors, health, and retirement in older adults (Jackson,
Miller, Cotterill, Brown, & O'Dwyer, 2007). Participants ranged in age from 52
to 93 years (M = 63, SD = 6.8). In terms of age, this sample is clearly different
from the other samples and from any previous large-scale PSDQ studies (or studies
based on any other multidimensional physical self-concept instrument). In this respect, results based
on this sample constitute an important extension of PSDQ research, as well as
providing a demanding test of the PSDQ-S.

Additional Samples. Two additional samples were constructed from these


six groups of participants that were the basis of Studies 2 and 3. In Study 2, we
considered 212 participants from the cross-validation G5 and 553 participants
from the senior G6 who completed the 40-item PSDQ-S on two occasions (interval
of 1 year). Participants in both these groups who completed surveys at Time 1 were
asked if we could approach them to complete the same survey 1 year later. In Study
3, we considered a subset of 322 participants from G1 (Australian adolescents) and
all participants from G4 (Israeli university students) who had completed the PSDQ
as well as Fox's PSPP (Fox & Corbin, 1989) and Richards's PSC (Richards, 1988;
Marsh et al., 1994). Combining data from these previously published studies, we
employed new statistical analyses and compared responses based on the original
70 PSDQ items and the 40 PSDQ-S items.


Criteria Applied in the Selection of Items for the PSDQ-S


Short Form
Based on reanalysis of the normative archive for the 70-item PSDQ, we selected
a subset of 40 items as the basis for the PSDQ-S. The goals of the item selection
procedure were to: (a) substantially reduce the length of the PSDQ; (b) measure
and preserve the content of all 11 factors on the PSDQ; (c) retain a minimum of 3
items per scale (original scales had 6 or 8 items); (d) maintain reliability estimates
of at least .80; and (e) provide a factor structure in which goodness-of-fit indexes
are acceptable. In the selection of items (for further discussion, see Marsh et al.,
2005; also see Maïano et al., 2008), items were selected that:
1. best measured the intended construct as inferred on the basis of corrected item–total correlations (from the reliability procedure) and the size of standardized
factor loadings in CFA (these two criteria are combined as they provide
essentially the same information);
2. had minimal cross-loadings as evidenced by LISREL's modification indexes,
indicating the extent to which the fit would be improved if an item were
allowed to load on a factor other than the one that it was intended to measure,
and expected size of the cross-loading;
3. had minimal correlated uniquenesses (CUs), particularly with other items in
the same scale. (If two items within the same scale had a substantial correlated
uniqueness, only one item was retained. Hence, this guideline superseded the
other guidelines.);
4. maintained the breadth of content of the original construct (based on subjective
evaluations of the content of each item);
5. were sufficient in number to maintain a coefficient estimate of reliability of
at least .80 (and to retain more items for scales found to be less reliable).
No attempt was made to select items explicitly on the basis of mean responses
so that the mean scores on the short form would be similar to those on the long form.
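Criterion 1 above relies on corrected item–total correlations. As a minimal illustration (not the authors' code; the data here are simulated 6-point responses), the statistic can be computed by correlating each item with the sum of the remaining items in its scale:

```python
import numpy as np

def corrected_item_total(responses: np.ndarray) -> np.ndarray:
    """Correlate each item with the sum of the *other* items in the scale.

    responses: (n_respondents, n_items) matrix of item scores.
    Returns one corrected item-total correlation per item.
    """
    n_items = responses.shape[1]
    total = responses.sum(axis=1)
    corrs = np.empty(n_items)
    for j in range(n_items):
        rest = total - responses[:, j]  # total score excluding item j
        corrs[j] = np.corrcoef(responses[:, j], rest)[0, 1]
    return corrs

# Simulated 6-point ratings for a hypothetical 6-item scale
rng = np.random.default_rng(0)
latent = rng.normal(size=500)
items = np.clip(np.round(3.5 + latent[:, None]
                         + rng.normal(scale=1.0, size=(500, 6))), 1, 6)
print(corrected_item_total(items))
```

Items with weak corrected correlations (or small CFA loadings, which convey essentially the same information) would be candidates for removal.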

Statistical Analysis
Confirmatory factor analyses in the present investigation were conducted with
LISREL (version 8.8) using maximum likelihood estimation (for further discussion
of SEM, see Byrne, 1998; Jöreskog & Sörbom, 1996; Marsh, 2007). The main focus
of these analyses is to ascertain that the factor structure based on the 40 PSDQ-S
items was well defined and to compare it with the factor structure based on the
70 PSDQ items. Consistent with current practice, we emphasize the root mean
square error of approximation (RMSEA) to evaluate goodness of fit, but, following
Marsh (2007; Marsh, Balla, & Hau, 1996; Marsh, Balla, & McDonald, 1988) we
also consider the Tucker–Lewis index (TLI) and the comparative fit index (CFI),
as well as presenting the normal-theory χ2 test statistic (the default in LISREL),
and an evaluation of parameter estimates. Values for RMSEA of less than .05 and
.08 are taken to reflect a close fit and a reasonable fit, respectively. The TLI and
CFI vary along a 0-to-1 continuum in which values greater than .90 and .95 are
typically taken to reflect acceptable and excellent fits to the data. The CFI contains


no penalty for a lack of parsimony, so that improved fit due to the introduction of
additional parameters may reflect capitalization on chance, whereas the TLI and
RMSEA contain penalties for a lack of parsimony.
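These indexes follow directly from the model and null-model χ2 statistics. The sketch below uses the standard formulas (the null-model values in the CFI/TLI calls would come from LISREL output and are not reported here; only the RMSEA check uses figures from Table 2):

```python
import math

def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (lower is better)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2: float, df: int, chi2_null: float, df_null: int) -> float:
    """Comparative fit index: 1 minus the ratio of model to null-model misfit."""
    d_model = max(chi2 - df, 0.0)
    d_null = max(chi2_null - df_null, d_model, 1e-12)
    return 1.0 - d_model / d_null

def tli(chi2: float, df: int, chi2_null: float, df_null: int) -> float:
    """Tucker-Lewis index; penalizes complexity via the chi2/df ratios."""
    return ((chi2_null / df_null) - (chi2 / df)) / ((chi2_null / df_null) - 1.0)

# Model 1a in Table 2: chi2 = 2,198, df = 685, N = 1,607
print(round(rmsea(2198, 685, 1607), 3))  # 0.037, matching the tabled value
```

The TLI's χ2/df ratios are what give it (like RMSEA, unlike CFI) a built-in penalty for adding parameters.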
To compare factor structures based on PSDQ or PSDQ-S responses in multiple
groups (or over multiple occasions), we conducted formal tests of invariance of
parameters based on a set of partially nested models. In this taxonomy of factorial
invariance tests, we compare increasingly demanding models that impose invariance constraints on factor loadings, factor variances, factor covariances, and item
uniquenesses (Marsh, 2007). Cheung and Rensvold (2002) and Chen (2007) suggested that if the decrease in fit for the more parsimonious model is less than .01
for incremental fit indexes like the CFI, then there is reasonable support for the
more parsimonious model. Chen (2007) suggests that when the RMSEA increases
by less than .015, there is support for the more constrained model. For indexes that
incorporate a penalty for lack of parsimony, it is possible for a more restrictive
(more parsimonious) model to result in a better fit than a less restrictive model
(Marsh, 2007), thus providing strong support for the more parsimonious model.
However, we emphasize that the use of fit indexes like these remains controversial
(e.g., Barrett, 2007) and proposed cut-off values are rough guidelines that should
not be interpreted as golden rules (Marsh, 2007; Marsh, Hau, & Wen, 2004). The
evaluation of model fit and selection of a best model require a subjective integration of multiple sources of evidence (different indexes, evaluation of parameter
estimates, evaluation of construct validity, and common sense) and ultimately a
degree of subjectivity and professional judgment (Chen, Curran, Bollen, Kirby, &
Paxton, 2008; Marsh, 2007; Marsh et al., 1988).
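A small helper makes the Cheung–Rensvold and Chen guidelines explicit. The function below is illustrative only (and, as the text stresses, such cutoffs are rough guidelines, not golden rules); the CFI values in the example are the MG2E comparison reported in Study 1, while the RMSEA values are invented:

```python
def supports_invariance(cfi_free: float, cfi_constrained: float,
                        rmsea_free: float, rmsea_constrained: float,
                        max_cfi_drop: float = 0.01,
                        max_rmsea_rise: float = 0.015) -> bool:
    """Retain the more parsimonious (constrained) model only if CFI drops by
    less than .01 (Cheung & Rensvold) and RMSEA rises by less than .015 (Chen)."""
    return ((cfi_free - cfi_constrained) < max_cfi_drop
            and (rmsea_constrained - rmsea_free) < max_rmsea_rise)

# CFI pair from the MG2E comparison in Study 1; RMSEA pair is hypothetical
print(supports_invariance(0.977, 0.966, 0.050, 0.055))  # False: CFI drop > .01
```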
For large studies, the inevitable missing data represents a potentially important
problem, particularly when the amount of missing data exceeds 5% (e.g., Graham,
2009; Graham & Hofer, 2000). Here, however, the amounts of missing data were
extremely small (less than 0.7% in all samples and 0.04% overall), so that it was
unlikely to be a serious problem. Nevertheless, a growing body of research has
emphasized potential problems with traditional pairwise, listwise, and mean substitution approaches to missing data (e.g., Graham, 2009; Graham & Hofer, 2000;
Little & Rubin, 1987), leading us to implement the expectation maximization
(EM) algorithm, a widely recommended approach to imputation for missing data,
as operationalized using missing value analysis in SPSS (version 17).
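For readers who want a concrete picture of EM imputation, the following simplified "regression EM" sketch (not the SPSS implementation used in the study, and omitting the conditional-covariance correction that a full EM adds in the M-step) alternates conditional-mean imputation with re-estimation of the mean and covariance:

```python
import numpy as np

def em_impute(X, n_iter=50):
    """Impute np.nan entries assuming multivariate normality: alternate between
    (E) filling missing cells with their conditional mean given the observed
    cells and current parameters, and (M) re-estimating the mean/covariance."""
    X = np.array(X, dtype=float)
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.where(miss)[1])  # crude starting values
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        sigma = np.cov(X, rowvar=False)
        for i in np.where(miss.any(axis=1))[0]:
            m, o = miss[i], ~miss[i]
            # regression of missing cells on observed cells for row i
            beta = sigma[np.ix_(m, o)] @ np.linalg.pinv(sigma[np.ix_(o, o)])
            X[i, m] = mu[m] + beta @ (X[i, o] - mu[o])
    return X

# Demonstration on simulated bivariate-normal data with 20% of y missing
rng = np.random.default_rng(1)
x = rng.normal(size=400)
y = 0.8 * x + 0.6 * rng.normal(size=400)
data = np.column_stack([x, y])
truth = data[:80, 1].copy()
data[:80, 1] = np.nan
filled = em_impute(data)
print(np.corrcoef(filled[:80, 1], truth)[0, 1])
```

With less than 0.7% missing in any sample here, the choice of imputation method matters little; the sketch simply shows why EM is preferred to mean substitution, which ignores the covariance structure.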

Study 1: Cross-Validation of Psychometric


Properties of Items Selected for the PSDQ-S:
Do the 40 PSDQ-S Items Measure the Same
Factors on the Long and Short Forms?
Study 1 Methods
The overarching purpose of Study 1 is to compare the psychometric properties of
responses to the PSDQ and PSDQ-S based on responses by the six groups. We begin
with reliability analyses of PSDQ and PSDQ-S factors in each of the six groups,
followed by single-group CFAs (see earlier overarching methodology section for
details of the CFAs). Next, we evaluate tests of the invariance of the factor structure


for responses to the 40 PSDQ-S items based on G1 (the archive group used to
select the 40 items from the 70 PSDQ items) and G5 (the cross-validation group
who completed the PSDQ-S), as well as the other four groups. Finally, we apply a
new missing-by-design approach to multigroup invariance tests specifically introduced by Marsh et al. (2005) to evaluate short forms. In this approach to missing
data, the researcher forms separate groups for participants with different patterns
of missing data (e.g., Allison, 1987; Kaplan, 2000; Muthén, Kaplan, & Hollis,
1987; Jöreskog & Sörbom, 1996). Although the approach is rarely used because
the number of missing data patterns is typically very large (and group sizes very
small), it becomes practical when different groups are given different forms of the
same instrument. This multigroup missing-by-design approach to the evaluation of
short forms of an instrument was introduced by Marsh et al., but to our knowledge
has rarely been used to evaluate short forms (but see Maïano et al., 2008). For
purposes of just this analysis, we used the full information maximum likelihood
(FIML) estimation approach to missing data. We present further discussion of the
rationale for this approach as part of the results of Study 1.

Study 1 Results
Are the PSDQ-S Scales Reliable? How Do They Compare With the PSDQ
Factors? Reliability estimates (Table 1) are consistently high for all factors based

on the 40 PSDQ-S items across all groups (mean αs across the 11 PSDQ-S scales
vary from .84 to .91). All 11 PSDQ-S scales had reliability estimates of at least .80
in both groups, and reliability is marginally higher in the new cross-validation
G5 (.81–.94; M = .89) than in the norm G1 (.82–.94; M = .86). Across all six groups
the mean reliability estimates are consistently greater than .80 (mean alphas vary
from .81 to .91; Table 1), as is the case for most of the individual scales.
It is interesting to compare the reliabilities of the 70 PSDQ items and the 40
PSDQ-S items for Groups 1–4. Not surprisingly, the reliabilities are marginally
lower for the PSDQ-S responses, as reliability varies in part according to the number
of items. For example, the mean reliabilities for the normative archive G1 are .89
(70 PSDQ items) and .86 (40 PSDQ-S items). Because the norm G1 was the basis
for selecting items for the PSDQ-S, the small size of this difference might be attributed
to capitalization on chance in the item selection. However, the difference
in mean reliabilities is of a similar size in the other three groups that completed
the 70-item PSDQ. In conclusion, there is only a very small drop in reliability in
moving from the PSDQ to the PSDQ-S.
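The small long- vs. short-form gap follows directly from the standard coefficient alpha formula, in which reliability depends partly on the number of items. A minimal sketch with simulated parallel items (illustrative data, not the PSDQ archive):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha: k/(k-1) * (1 - sum(item variances)/variance(total))."""
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

# Alpha shrinks as items are dropped, mirroring the 70- vs 40-item comparison
rng = np.random.default_rng(0)
latent = rng.normal(size=1000)
scale6 = latent[:, None] + rng.normal(size=(1000, 6))  # 6 parallel items
print(cronbach_alpha(scale6), cronbach_alpha(scale6[:, :3]))
```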

How Well Does the PSDQ-S Factor Structure Fit the Data for Diverse Samples?
How Does It Compare With the Fit of the PSDQ Factor Structure? In pursuing
these questions, we began with a series of single-group CFAs for each of the
six groups (labeled SG1–SG6 in Table 2). For Groups 1–4, we did separate analyses
comparing the factor structures based on responses to the 40 PSDQ-S items (e.g.,
SG1A) and the 70 PSDQ items (SG1B), whereas for Groups 5 and 6 it was only
possible to do analyses on the 40 PSDQ-S items. Two features are clear in these
analyses. First, the goodness-of-fit statistics are very good for all 8 CFAs in relation
to traditional guidelines of good and acceptable levels of fit. Thus, for example,
all 8 TLIs and CFIs are greater than .96. Second, the goodness-of-fit statistics for
Groups 1–4 are all marginally better for responses to the 40 PSDQ-S items than to

Table 1  Coefficient Alpha Estimates of Reliability for Long (70 Items) and Short (40 Items) Form of the Physical Self Description Questionnaire (PSDQ) Based on Responses From Six Groups

                      Long Form (70 items)                Short Form (40 items)
Scale           Items   G1    G2    G3    G4      Items   G1    G2    G3    G4    G5    G6
Activity          6    .88   .89   .79   .79       4     .85   .84   .80   .75   .90   .88
Appearance        6    .90   .87   .91   .78       3     .88   .84   .91   .74   .87   .86
Body Fat          6    .95   .92   .94   .88       3     .91   .87   .92   .82   .89   .93
Coordination      6    .86   .83   .86   .82       5     .84   .80   .86   .79   .89   .91
Endurance         6    .90   .89   .91   .93       3     .82   .80   .84   .90   .90   .83
Esteem            6    .88   .84   .88   .78       5     .83   .70   .85   .73   .81   .82
Flexibility       8    .87   .86   .90   .86       3     .86   .87   .88   .77   .89   .91
Gen. Physical     6    .93   .92   .93   .91       3     .88   .88   .87   .85   .91   .91
Health            8    .83   .79   .86   .81       5     .82   .80   .85   .79   .87   .78
Sport             6    .93   .91   .82   .91       3     .92   .88   .81   .86   .94   .94
Strength          6    .89   .89   .87   .86       3     .85   .88   .84   .88   .91   .91
Median           11    .890  .890  .880  .860     11    .850  .840  .850  .790  .890  .910
Mean             11    .893  .874  .879  .848     11    .860  .839  .857  .807  .889  .880
SD               11    .035  .041  .046  .055     11    .034  .040  .037  .059  .033  .051

Note. Group 1 = archive adolescents; Group 2 = Spanish adolescents; Group 3 = elite-athlete adolescents; Group 4 = Israeli university students; Group 5 = cross-validation sample; Group 6 = senior sample. Respondents in the first four groups completed the 70-item PSDQ. However, because all 40 items on the PSDQ-S come from the original PSDQ items, it was possible to do parallel analyses on items from the PSDQ and PSDQ-S. Participants in Groups 5 and 6 completed only the PSDQ-S.


the 70 PSDQ items. Although this might not be surprising for the norm G1 that was
the basis of the selection of items (i.e., items that contributed most to the misfit
of the data were eliminated and then the a priori model was fit to the same data),
there was a similar trend for the other three groups who completed the 70 PSDQ
items. The very good fit of the separate analyses based on the 40 PSDQ-S items for
each of these six diverse groups provides strong support for the generalizability
of the a priori PSDQ-S factor structure.
Even though the fit of the a priori model in all six groups is relevant, our particular focus is on the comparability of the factor solutions in the norm G1 and the
cross-validation G5. Before formally evaluating tests of invariance, it is appropriate
to evaluate parameter estimates for SG1A (norm G1) and SG5A (cross-validation
G5), in which no invariance constraints are imposed. All parameter estimates for
both models are presented in Appendix B and summarized in Table 3 (Study 1).
Compared with the factor solution based on the norm group, in the cross-validation
G5, factor loadings were as high or higher (M factor loading of .819 vs. .784),
uniquenesses were as small or smaller (M uniqueness of .320 vs. .377), and factor
correlations were smaller (M factor correlation of .431 vs. .498). Nevertheless, in
each case the pattern of results across different items and different factors was very
similar in the two factor solutions (see profile similarity indexes, PSIs, in Table 3).
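The PSIs referenced here are simply Pearson correlations between corresponding parameter estimates from two solutions. A minimal sketch (the loading values below are hypothetical, chosen only to show a similar profile across two groups):

```python
import numpy as np

def psi(est_a, est_b):
    """Profile similarity index: Pearson r between corresponding parameter
    estimates (e.g., the 40 factor loadings) from two separate solutions."""
    return float(np.corrcoef(est_a, est_b)[0, 1])

# Hypothetical loadings from two groups with a similar pattern
arc = np.array([0.78, 0.85, 0.62, 0.91, 0.74])
cvs = np.array([0.81, 0.88, 0.55, 0.93, 0.79])
print(round(psi(arc, cvs), 2))
```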
Do the 40 PSDQ-S Items Measure the Same Factors on the Long and Short
Forms? Invariance Across Archive (G1) and Cross-Validation (G5) Groups.

In the multiple-group models considered here (Study 1, models MG1A–MG1E in Table
4), we evaluate formal tests of the invariance of the parameter estimates across
the normative archive group (G1) used to develop the PSDQ-S and the new cross-validation
group (G5) used to test the PSDQ-S. We begin with MG1A (configural
invariance), which imposes no invariance constraints, and then move to progressively
more demanding models in which different sets of parameter estimates are constrained
to be invariant across the two groups.
Particularly for fit indexes that control for model parsimony (TLI and RMSEA),
the added constraints result in almost no change in goodness of fit for any of the
models. The changes in the fit indexes are clearly less than the values traditionally
used to support the more parsimonious model (increases of less than .01 for CFI,
Cheung & Rensvold, 2002; decreases in RMSEA of less than .015, Chen, 2007).
Even the slight decrements in fit due to the constraint of the uniquenesses (consistent
with earlier results showing that responses were more reliable and uniquenesses
were smaller in the cross-validation sample) are small.
Do the 40 PSDQ-S Items Measure the Same Factors on the Long and Short
Forms? Invariance Across All Six Groups. In Models MG2A–MG2E (Table 4),

we extend the five invariance models to include all six groups. However, the pattern
of results is again similar to that based on the two-group comparisons. For models
MG2A–MG2D, the changes in fit indexes were very small and clearly consistent
with guidelines used to support the inclusion of the increasingly demanding
invariance constraints. It is only for Model MG2E (which imposed the invariance
of uniquenesses as well as factor loadings and factor variances/covariances) that
there is a discernible decrement in fit. Thus, for example, comparing MG2D with
MG2E, the change in CFI (.977 vs. .966) was slightly larger than the .01 guideline
proposed by Cheung and Rensvold (2002), although the change in RMSEA was
Table 2  Summary of Goodness of Fit for Models Based on Responses for Each of Six Samples for Short (A; 40 Items) or Long (B; 70 Items) Form

Model    χ2       df    TLI    CFI    RMSEA   95% CI       Description
SG1 Archive (Adolescent)
 1a      2,198     685  .990   .991   .037    .035–.039    Archive (40 items, N = 1,607)
 1b     10,578   2,290  .985   .986   .048    .047–.048    Archive (70 items, N = 1,607)
SG2 Spanish (Adolescent)
 2a      2,118     685  .975   .978   .046    .044–.048    Spanish (40 items, N = 986)
 2b      7,710   2,290  .973   .974   .049    .048–.050    Spanish (70 items, N = 986)
SG3 Elite Sport (Adolescent)
 3a      1,289     685  .976   .979   .050    .046–.055    Elite Athlete (40 items, N = 349)
 3b      5,443   2,290  .964   .966   .063    .061–.065    Elite Athlete (70 items, N = 349)
SG4 Israeli (Young Adult)
 4a      1,449     685  .959   .964   .053    .049–.057    Israeli (40 items, N = 395)
 4b      5,661   2,290  .946   .943   .066    .064–.068    Israeli (70 items, N = 395)
SG5 Short Form (Adolescent)
 5a      2,865     685  .971   .975   .067    .065–.070    Cross-Validation (40 items, N = 708)
SG6 Short Form (Senior)
 6a      2,277     685  .977   .980   .055    .053–.058    Senior (40 items, N = 760)

Note. CFI = comparative fit index; TLI = Tucker–Lewis index; RMSEA = root mean square error of approximation; df = degrees of freedom. For Groups 1–4, who completed the long form, separate analyses were done for the 40 items from the short form (A) and for all 70 items from the long form (B). For Groups 5 and 6, who completed the short form, analyses are presented only for the 40 items.

Table 3 Summary Statistics Comparing Sizes of Parameter Estimates From Different Studies

Description                                   M     Mdn   SD    High  Low   PSIa

Study 1: CFA Estimates for Archive (ARC, Group 1) and Cross-Validation (CVS, Group 5) Samplesb
40 Nonzero factor loadings
 Archive (Group 1)                           .784  .804  .089  .920  .612  .918
 Cross-valid (Group 5)                       .819  .851  .095  .937  .508
55 Factor correlations
 Archive (Group 1)                           .498  .541  .204  .837  .099  .776
 Cross-valid (Group 5)                       .431  .441  .221  .777  .010
40 Residuals (uniquenesses)
 Archive (Group 1)                           .377  .354  .137  .625  .154  .755
 Cross-valid (Group 5)                       .320  .277  .143  .742  .122

Study 2: Comparison of MTMM Correlations Based on the Longitudinal Samplec
11 MTHM corrs (convergent validities)        .796  .800  .085  .910  .590
110 HTHM corrs (diff time, diff trait)       .362  .375  .158  .730  .030
55 HTMM-T1 corrs (same time, diff trait)     .421  .440  .187  .800  .030
55 HTMM-T2 corrs (same time, diff trait)     .421  .440  .187  .800  .030

Study 3: Comparison of MTMM Correlations Based on Three PSC Instrumentsc
9 MTHM-Match corrs (convergent validity)
 Long                                        .818  .818  .059  .921  .747  .969
 Short                                       .807  .810  .060  .890  .733
6 MTHM-Close corrs (convergent validity)
 Long                                        .632  .635  .056  .697  .532  .804
 Short                                       .636  .643  .040  .674  .586
152 HTHM corrs (diff instr, diff trait)
 Long                                        .336  .366  .138  .689  .037  .987
 Short                                       .336  .367  .138  .659  .034
55 HTMM-MPSDQ corrs (same instr, diff trait)
 Long                                        .425  .437  .166  .716  .085  .959
 Short                                       .429  .423  .167  .748  .145
10 HTMM-FPSPP corrs (same instr, diff trait)
 Long                                        .713  .660  .122  .891  .530  1.000
 Short                                       .713  .660  .122  .891  .530
21 HTMM-RPSC corrs (same instr, diff trait)
 Long                                        .390  .362  .138  .716  .126  1.000
 Short                                       .390  .357  .139  .716  .126

Note. aProfile similarity indexes (PSIs) represent the correlation between two sets of corresponding parameter estimates from two different analyses (e.g., factor loadings in Groups 1 and 5). bStudy 1 compared factor structures for the PSDQ-S based on the archive (N = 1,607) and cross-validation (n = 708) groups; the PSI is the correlation between estimates in the archive and cross-validation groups. Also see Models SG1a and SG5a in Table 2 and the full solutions in Appendix B. cStudies 2 (N = 700) and 3 (N = 717) are multitrait-multimethod studies. MTHM = monotrait-heteromethod correlations (same trait, different methods, where method refers to time in Study 2 and instrument in Study 3; convergent validities); HTHM = heterotrait-heteromethod correlations; HTMM-T1 and HTMM-T2 = heterotrait-monomethod correlations at Time 1 and Time 2 in Study 2; HTMM-MPSDQ, HTMM-FPSPP, and HTMM-RPSC = heterotrait-monomethod correlations for each of the three instruments in Study 3. For Study 3, separate analyses were done for the 70 PSDQ (Long) and 40 PSDQ-S (Short) items; the PSI is the correlation between estimates in these two analyses.

Table 4 Summary of Goodness of Fit for CFA Models in the Present Investigation

Model     χ2      df     CFI   TLI   RMSEA  95% CI     Description

Study 1: G1 Versus G5 (40 Items)a
MG1a      5,063   1,370  .986  .984  .049   .047-.050  No invariance
MG1b      5,221   1,399  .986  .984  .049   .047-.050  Factor loadings (FL) invariant
MG1c      5,283   1,410  .985  .984  .049   .047-.050  FL and factor variances (FV) invariant
MG1d      5,493   1,454  .985  .984  .049   .048-.050  FL and factor covariances (FCV) invariant
MG1e      5,690   1,465  .984  .983  .050   .049-.051  FL, FV, FCV invariant
MG1f      5,895   1,505  .981  .980  .050   .049-.052  FL, FV, FCV, uniquenesses (UNQ) invariant

Study 1: Invariance Across All Six Groups (40 Items)b
MG2a     12,130   4,110  .982  .980  .049   .048-.050  No invariance
MG2b     12,852   4,266  .981  .979  .050   .049-.051  FL invariant
MG2c     13,289   4,310  .980  .978  .051   .050-.052  FL, FV invariant
MG2d     14,173   4,530  .979  .978  .052   .050-.053  FL, FCV invariant
MG2e     15,119   4,585  .976  .977  .053   .052-.054  FL, FV, FCV invariant
MG2f     18,561   4,785  .966  .967  .061   .059-.061  FL, FV, FCV, UNQ invariant

Study 1: G1 (70 Items) Versus G5 (40 Items), Missing by Designc
MG3a     11,513   4,640              .036   .035-.037  No invariance
MG3b     11,668   4,669              .036   .035-.037  FL invariant
MG3c     11,731   4,680              .036   .035-.037  FL, FV invariant
MG3d     11,981   4,713              .037   .036-.037  FL, FCV invariant
MG3e     12,111   4,735              .037   .037-.038  FL, FV, FCV invariant
MG3f     13,057   4,775              .039   .038-.040  FL, FV, FCV, UNQ invariant

Study 2: Longitudinal Invariance Over Timed
LGI0      8,510   2,849  .982  .980  .051   .050-.052  Invariant (IN) = none; no correlated uniquenesses
LGIa      6,012   2,809  .989  .987  .039   .037-.040  IN = none
LGIb      6,074   2,838  .989  .987  .039   .037-.040  IN = FL
LGIc      6,092   2,849  .989  .987  .039   .037-.040  IN = FL, FV
LGId      6,159   2,893  .989  .988  .038   .037-.040  IN = FL, FCV
LGIe      6,176   2,904  .989  .988  .038   .037-.040  IN = FL, FV, FCV
LGIf      6,278   2,944  .988  .988  .038   .037-.040  IN = FL, FV, FCV, UNQ

Study 2: MIMIC Models of Sex and Age-Group Effectse
MIMICa    6,762   3,118  .987  .986  .039   .038-.040  Sex (Fr, Nt), Age (Fr, Nt), Sex-Age (Fr, Nt)
MIMICb    6,855   3,151  .987  .986  .039   .038-.040  Sex (Fr, It), Age (Fr, It), Sex-Age (Fr, It)
MIMICc    7,577   3,184  .986  .985  .043   .041-.044  Sex (FI = 0), Age (FI = 0), Sex-Age (FI = 0)
MIMICd    7,426   3,173  .986  .985  .042   .041-.043  Sex (Fr, It), Age (FI = 0), Sex-Age (FI = 0)
MIMICe    7,011   3,173  .987  .986  .040   .039-.041  Sex (FI = 0), Age (Fr, It), Sex-Age (FI = 0)
MIMICf    6,883   3,162  .987  .986  .039   .038-.041  Sex (Fr, It), Age (Fr, It), Sex-Age (FI = 0)

Study 3: CFA MTMM Models Across Three Physical Self-Concept Instrumentsf
Each instrument separately
PSPP      5,039     395  .922  .914  .128   .125-.131  30 items, PSPP only
PSC       2,251     539  .955  .950  .067   .064-.070  35 items, PSC only
PSDQ      8,405   2,290  .966  .964  .061   .060-.062  70 items, PSDQ only
PSDQ-S    2,037     685  .974  .971  .052   .050-.055  40 items, PSDQ-S only
All three instruments
MTMMa    24,919   8,657  .970  .969  .051   .050-.052  135 items: 70 PSDQ, 30 PSPP, 35 PSC
MTMMb    15,172   5,102  .969  .967  .052   .051-.053  105 items: 40 PSDQ-S, 30 PSPP, 35 PSC

Note. CFI = comparative fit index, TLI = Tucker-Lewis index, RMSEA = root mean square error of approximation, df = degrees of freedom, CI = confidence interval. aTests of invariance across Group 1 (archive) and Group 5 (cross-validation) for the 40 PSDQ-S items. bTests of invariance across all six groups for the 40 PSDQ-S items. cTests of invariance across Group 1 (archive; 70 PSDQ items) and Group 5 (cross-validation; 40 PSDQ-S items), using the missing-by-design analysis. dTests of invariance across time for the 40 PSDQ-S items; all models except LGI0 include a priori correlated uniquenesses relating matching items administered on the two occasions. eMultiple-indicator-multiple-cause (MIMIC) models of relations of sex, age group, and sex × age-group interactions with the 11 PSDQ-S factors at Time 1 and Time 2 (based on Model LGIf). In the different models, these correlations are Fr = free, It = invariant over time, Nt = not invariant over time, FI = 0 = fixed to be zero. fSeparate CFAs were done for each instrument (PSPP, PSC, PSDQ, PSDQ-S), followed by separate MTMM analyses of all three instruments based on the PSDQ and the PSDQ-S.


less than the guideline proposed by Chen (2007). As was the case in the two-group
comparison, earlier results (see reliability estimates in Table 1) demonstrate that
the measurement error is systematically lower in the two groups who completed
only the 40 PSDQ-S items (G5 and G6) compared with the other groups who
completed all 70 PSDQ items.
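The change-in-fit guidelines applied throughout these invariance comparisons can be summarized in a small decision helper. The following Python sketch is only illustrative (the function name and the packaging of the two guidelines into one rule are ours, not the article's):

```python
def invariance_supported(cfi_free, cfi_constrained,
                         rmsea_free, rmsea_constrained,
                         d_cfi=0.01, d_rmsea=0.015):
    """Apply the Cheung-Rensvold (2002) change-in-CFI guideline (.01) and the
    Chen (2007) change-in-RMSEA guideline (.015) to a pair of nested models.
    Returns True when neither index deteriorates by more than its guideline,
    i.e., when the added invariance constraints are considered supported."""
    cfi_drop = cfi_free - cfi_constrained        # positive value = worse fit
    rmsea_rise = rmsea_constrained - rmsea_free  # positive value = worse fit
    return cfi_drop <= d_cfi and rmsea_rise <= d_rmsea

# The CFI change of .977 vs. .966 discussed above slightly exceeds the .01
# guideline, so the invariant-uniquenesses model is not supported on CFI:
print(invariance_supported(0.977, 0.966, 0.053, 0.061))  # False
# A trivial change, by contrast, supports the added constraints:
print(invariance_supported(0.986, 0.985, 0.049, 0.049))  # True
```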
Do the Long and Short Forms Really Measure the Same Factors? Missing-by-Design CFA Approach to Evaluate Short Forms. Analyses thus far have compared responses to the 40 PSDQ-S items for the normative archive G1 that
completed the 70-item PSDQ and the cross-validation G5 that completed only the
40-item PSDQ-S. Whereas these results argue that the 40 items on the short form
measure the same factors for the long and short forms of the PSDQ, this does not
necessarily imply that the 40 items on the short form measure the same factors as
the 70 items on the long form. In particular, Marsh et al. (2005) and Smith et al.
(2000) suggested that due to the typical item selection processes, factors defined
by short forms might be more narrowly defined than the corresponding factors
based on the long form. Marsh and colleagues demonstrated a novel CFA approach
to this critical issue. Specifically, they applied the rarely used multiple group
missing-by-design approach that is ideally suited to the comparison of short and
long forms of an instrument. As applied in the present investigation, responses to
the 30 items from the long form that were not included on the PSDQ-S are treated
as missing responses in the cross-validation group. In tests of invariance across the
two groups (MG3A-MG3F in Table 4), we evaluate the invariance of factor loadings
and uniquenesses for the 40 items common to both groups and the invariance of
the factor variance-covariance matrix.
The pattern of goodness-of-fit indexes for the different models is clearly consistent with results for the corresponding models MG1A-MG1F based on only the
40 PSDQ-S items. In particular, even the model with complete invariance across
all (common) parameter estimates fits well (RMSEA = .036). There is almost
no difference in fit statistics for MG3A, imposing no invariance constraints with
models constraining factor loadings, factor variances, and factor covariances to
be invariant for the 40 common items. It is only with the added constraint of the
invariance of the 40 uniquenesses that there is a slight decrement in fit (RMSEA =
.039), although even this is small in relation to Chen's (2007) guideline of changes
of .015. In summary, the results of Study 1 provide clear support for the conclusion
that the PSDQ and PSDQ-S measure the same factors.
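The missing-by-design setup just described can be illustrated with a short data-preparation sketch. This is a hypothetical Python/pandas illustration (the item labels, group labels, and the assumption that the first 40 of the 70 items stand in for the short form are ours; the actual PSDQ item ordering differs):

```python
import numpy as np
import pandas as pd

# Hypothetical item labels: 70 long-form items, with the first 40 standing in
# for the short-form items (illustrative only).
long_items = [f"item{i:02d}" for i in range(1, 71)]
short_items = long_items[:40]

def stack_missing_by_design(long_df, short_df):
    """Stack a long-form group and a short-form group into a single data set.
    The 30 items never administered to the short-form group become NaN, so a
    multigroup CFA estimated with full-information maximum likelihood can
    treat them as missing by design."""
    short_full = short_df.reindex(columns=long_items)  # absent columns -> NaN
    short_full["group"] = "cross-validation"
    long_full = long_df.copy()
    long_full["group"] = "archive"
    return pd.concat([long_full, short_full], ignore_index=True)

# Toy data: 3 "archive" respondents (70 items) and 2 "cross-validation"
# respondents (40 items) on a 1-6 response scale.
rng = np.random.default_rng(0)
archive = pd.DataFrame(rng.integers(1, 7, size=(3, 70)), columns=long_items)
crossval = pd.DataFrame(rng.integers(1, 7, size=(2, 40)), columns=short_items)
stacked = stack_missing_by_design(archive, crossval)
print(stacked.shape)  # (5, 71): 70 items plus a group indicator
```

The stacked data set, not any imputation, is the point: the 30 noncommon items are simply absent by design for the cross-validation group, and the estimation method accounts for that.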

Study 2
Longitudinal Stability of PSDQ-S Responses:
A Multitrait-Multimethod Approach
In Study 2, we consider test-retest responses over a potentially turbulent 1-year
interval. For our adolescent sample (G5), this interval constituted the transition
from high school to life after school. For the senior sample (G6), this interval was
a period when many were either approaching or actually coping with retirement.


Study 2 Methods
Thus far, we have looked at invariance across different samples. In Study 2 (also see
the earlier presentation of the overarching methodology), we consider invariance
over time for a sample of participants from Groups 5 and 6 who completed the
PSDQ-S on two occasions. This component has three main aims: to evaluate (a) the invariance of parameter estimates over approximately a 1-year interval, adapting the logic that we used to test invariance over multiple groups in Study 1; (b) support for convergent and discriminant validity over time, based on a MTMM analysis with time as the method variable; and (c) the effects of gender, age group (adolescents from G5 and seniors from G6), and their interaction, using a multiple-indicator-multiple-cause (MIMIC) model.
The MTMM design is used widely to assess convergent and discriminant
validity, and is one of the standard criteria for evaluating self-concept instruments
(e.g., Byrne, 1996; Marsh & Hattie, 1996; Shavelson et al., 1976; Wylie, 1989). The
MTMM design provides a strong approach to evaluating stability of responses to a
multidimensional instrument, as emphasized by Campbell and O'Connell (1967),
who specifically operationalized the multiple methods in their MTMM paradigm
as multiple occasions. Marsh and colleagues (Marsh, Tomás-Marco, et al., 2002;
Marsh, Aşçı, et al., 2002) also recommended this approach to evaluate support for
the convergent and discriminant validity for new short forms in relation to temporal
stability over time.
Campbell and Fiske's (1959) paradigm is, perhaps, the most widely used construct validation design. Although their original guidelines are still widely used to
evaluate MTMM data, important problems with their guidelines are well known
(see reviews by Marsh, 1988, 1993b; Marsh & Grayson, 1995). Ironically, even in
highly sophisticated CFA approaches to MTMM data, a single scale score (often an average of multiple items) is used to represent each trait-method combination. Marsh (1993b; Marsh & Hocevar, 1988), however, argued that it is stronger
to incorporate the multiple indicators explicitly into the MTMM design. When
multiple indicators are used to represent each scale, CFAs at the item level result
in a MTMM matrix of latent correlations, thereby eliminating many of the objections to the Campbell-Fiske guidelines. This is the approach chosen for Study 2.
MIMIC models (Kaplan, 2000; Jöreskog & Sörbom, 1996; Marsh et al., 2005;
Marsh, Tracey, & Craven, 2006) are like multivariate correlation and regression
models in which latent variables (e.g., multiple dimensions of self-concept) are
related to discrete contrast or grouping variables (e.g., age, gender, age × gender).
The MIMIC approach is more flexible than the traditional multivariate analysis of
variance approach in allowing a mixture of continuous and discrete independent
variables (i.e., contrast, background, or grouping variables). Although it is similar
to a multiple regression approach to ANOVA, the MIMIC model has the important
advantage that the dependent variables are latent variables based on multiple indicators, corrected for measurement error. In the evaluation of background grouping
variables, we constructed three single-degree-of-freedom contrast variables to represent the two age groups (0 = adolescents in G5, 1 = senior citizens in G6), gender (0 = male, 1 = female), and the corresponding two-way interaction.
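The coding of these background variables can be sketched as follows (a minimal illustration; the function name and array layout are ours):

```python
import numpy as np

def contrast_codes(female, senior):
    """Build the three single-df contrast variables used in the MIMIC models:
    gender (0 = male, 1 = female), age group (0 = adolescent, 1 = senior),
    and their two-way interaction (the elementwise product)."""
    female = np.asarray(female, dtype=float)
    senior = np.asarray(senior, dtype=float)
    return np.column_stack([female, senior, female * senior])

# Four illustrative respondents: male adolescent, female adolescent,
# male senior, female senior; only the female senior scores 1 on the
# interaction contrast.
X = contrast_codes([0, 1, 0, 1], [0, 0, 1, 1])
print(X[:, 2].tolist())  # [0.0, 0.0, 0.0, 1.0]
```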
Study 2 was based on analyses of responses to individual items. Although the
use of ex-post facto correlated uniquenesses (CUs) should be avoided (e.g., Marsh,
2007), Marsh and Hau (1996; Marsh, 2007), Jöreskog (1979), and others argue

PSDQ Short Form 455

that the failure to include a priori CUs when the same item is used on multiple occasions systematically biases parameter estimates such that test-retest correlations among matching latent factors are systematically inflated, and that this bias can also carry over to other parameter estimates. In preliminary analyses, the fit of the model without CUs (LGI0, Table 4) was good (TLI = .980, RMSEA = .051), but the corresponding model with CUs (LGIa, Table 4) was substantially better (TLI = .987, RMSEA = .039). Furthermore, the test-retest correlations were systematically larger in LGI0 (.60 to .98; test-retest correlations in Table 5) than LGIa (.56 to .90). Indeed, some test-retest correlations in LGI0 approached 1.0 and the solution was technically improper (nonpositive definite factor variance-covariance matrix), consistent with our claim that parameter estimates in this model are biased. On the basis of these results, CUs were retained in all subsequent models.
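The a priori CUs pair each item with itself across the two occasions. A minimal sketch that generates such a specification in lavaan-style syntax (the "_t1"/"_t2" naming convention and the use of lavaan-style "~~" residual-covariance lines are our assumptions; the article does not specify the SEM software used):

```python
def correlated_uniqueness_syntax(items):
    """Generate lavaan-style residual-covariance lines that pair each item's
    Time 1 and Time 2 administrations (a priori correlated uniquenesses).
    Item names and suffixes here are illustrative placeholders."""
    return [f"{it}_t1 ~~ {it}_t2" for it in items]

lines = correlated_uniqueness_syntax([f"item{i:02d}" for i in range(1, 41)])
print(len(lines), lines[0])  # 40 item01_t1 ~~ item01_t2
```

For the 40 PSDQ-S items measured on two occasions this yields exactly the 40 a priori CUs, matching the 40-parameter difference between the df of models LGIe and LGIf in Table 4.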

Study 2 Results
Longitudinal Factor Structure of PSDQ-S Responses. Models LGIA-LGIE parallel the five models previously used to test multigroup invariance in Study 1
(see Table 4). Here, however, the invariance tests evaluate the equality of parameter estimates across two different occasions for the same respondents rather than invariance
across different groups of respondents. Inspection of the fit indexes (Study 2,
Table 4) demonstrates clear support for the invariance over time of the factor
loadings, factor variances, factor covariances, and item uniquenesses. Indeed,
based on indexes that control for parsimony (TLI and RMSEA), the most highly
constrained model with complete invariance of these parameters fits the data as well
or better than the least constrained model with no invariance constraints. Hence,
the longitudinal invariance models applied to the PSDQ-S provide good support
for the invariance of factor solutions over time and for this aspect of construct
validity. Given the potentially turbulent 1-year interval considered, the test-retest
correlations were remarkably large in that all but Health self-concept were at least
.70 and many were more than .80.

Multitrait-Multimethod (MTMM) Analysis With Time as the Method Factor. Here we pursue MTMM analyses to evaluate support for the convergent and discriminant validity of the PSDQ-S responses in relation to time. The 22 × 22 correlation matrix relating responses to the 11 T1 PSDQ-S factors and the corresponding 11 T2 factors (see Appendix C) can be viewed as a MTMM matrix. As this is a
matrix of latent correlations, the correlations have been purged of measurement
error. Furthermore, because there is good support for the invariance of the factor
solution over time, we are confident that the same latent constructs are being
measured on each occasion. Our main emphasis here is on the 22 × 22 matrix of
correlations among the latent constructs that can be viewed as a latent MTMM
matrix with 11 traits (latent PSDQ-S factors) and 2 methods (occasions). From
this perspective, the 11 stability coefficients are seen as the convergent validities
(monotrait-heteromethod correlations).
The 11 convergent validities (those shaded in gray in Appendix C) represent
correlations between the same factors assessed by different methods (monotrait-heteromethod correlations, or test-retest stabilities when the multiple methods are
the different occasions). These convergent validities are consistently high (M = .80,
Table 3) and only the coefficient for Health (.59) is less than .75. The correlations

Table 5 Selected Effects From Longitudinal Invariance Tests (Standard Errors in Parentheses) From Study 2: Test-Retest Correlations and Effects of Gender, Age Group, and Their Interaction

            AC     AP     BF     CO     EN     ES     FL     GP     HE     SP     ST

Test-Retest Correlations for Selected Models
LGI0       .78    .74    .83    .76    .98    .80    .79    .74    .60    .92    .87
          (.02)  (.02)  (.02)  (.02)  (.02)  (.02)  (.02)  (.02)  (.03)  (.01)  (.12)
LGIa       .77    .69    .81    .75    .87    .74    .78    .73    .56    .90    .85
          (.02)  (.02)  (.02)  (.02)  (.02)  (.02)  (.02)  (.02)  (.03)  (.01)  (.12)
LGIf       .77    .70    .81    .75    .87    .74    .78    .73    .57    .90    .85
          (.02)  (.02)  (.02)  (.02)  (.02)  (.02)  (.02)  (.02)  (.03)  (.01)  (.12)

Effects of Gender (Female = 1, Male = 0), Age Group (1 = Senior, 0 = Adolescent), and Their Interaction (Fem-Age)
Female    -.10   -.06   -.14   -.19   -.21   -.13   -.01   -.17   -.04   -.27   -.19
          (.03)  (.03)  (.03)  (.03)  (.03)  (.03)  (.03)  (.03)  (.03)  (.03)  (.03)
Age-Group -.22   -.15   -.24   -.17   -.32    .10   -.07    .05   -.26   -.34   -.14
          (.03)  (.03)  (.03)  (.03)  (.03)  (.03)  (.03)  (.03)  (.03)  (.03)  (.04)
Fem-Age    .04    .06    .09    .08    .03    .05    .02    .07    .02    .01    .07
          (.04)  (.03)  (.04)  (.03)  (.04)  (.04)  (.04)  (.04)  (.03)  (.04)  (.04)

Note. For each of the 11 PSDQ-S factors (AC through ST; see Appendix A for a definition of the factors), test-retest correlations are presented for three models (see the models for longitudinal invariance over time in Table 4). The effects of the background variables are based on the addition of these variables to Model LGIf, in which parameter estimates are constrained to be invariant over time.

between different factors administered on different occasions (mean r = .36, heterotrait-heteromethod, HTHM, correlations in the square submatrix, not including
the convergent validities in the shaded diagonal) are substantially lower than the
convergent validities. The correlations between different factors administered on
the same occasion (mean r = .42, heterotrait-monomethod, HTMM, correlations in
the diagonal submatrices) are only slightly larger than the heterotrait-heteromethod
(HTHM) correlations. In the application of the Campbell-Fiske guidelines, each
convergent validity is compared with the 40 HTHM (different trait, different
method) and 40 HTMM (different trait, same method) correlations involving the
same trait. Across the 11 PSDQ-S factors, this requires 440 HTHM and 440 HTMM
comparisons. For all 880 comparisons, the convergent validity is systematically
larger than the comparison coefficient.
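This comparison logic can be sketched for an arbitrary latent MTMM matrix with time as the method factor. The following is a simplified illustration (the function name and the exact enumeration of comparisons are ours; the article's tally of 440 + 440 comparisons may be based on a different enumeration):

```python
import numpy as np

def campbell_fiske_counts(R, k):
    """Count how often each convergent validity in a latent MTMM matrix
    exceeds the comparison correlations involving the same trait.

    R is a 2k x 2k correlation matrix whose first k rows/columns are the
    Time 1 factors and whose last k are the matching Time 2 factors."""
    B = R[:k, k:]  # Time 1 x Time 2 block; its diagonal holds the
                   # convergent validities (test-retest stabilities)
    wins = total = 0
    for i in range(k):
        cv = B[i, i]
        comparisons = (
            [B[i, j] for j in range(k) if j != i]            # HTHM, row i
            + [B[j, i] for j in range(k) if j != i]          # HTHM, column i
            + [R[i, j] for j in range(k) if j != i]          # HTMM at Time 1
            + [R[k + i, k + j] for j in range(k) if j != i]  # HTMM at Time 2
        )
        total += len(comparisons)
        wins += sum(abs(cv) > abs(c) for c in comparisons)
    return int(wins), int(total)

# A toy 3-trait example in which every convergent validity (.8) dominates
# every comparison correlation (.3), so every comparison is a "win".
R = np.full((6, 6), 0.3)
np.fill_diagonal(R, 1.0)
for i in range(3):
    R[i, 3 + i] = R[3 + i, i] = 0.8
print(campbell_fiske_counts(R, 3))  # (24, 24)
```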
Effects of Gender, Age Group, and Their Interaction. Are the physical self-concepts of adolescents systematically different from those of older adults? Do gender differences shown in previous adolescent research generalize to older adults? Study 2 offers a unique opportunity to address these issues, testing the effects of
gender, age-group, and their interaction over time. In pursuit of this issue, we begin
with Model LGIf (invariant factor loadings, factor variances/covariances, and item
uniquenesses; see Study 2 in Table 4). Although not a necessary prerequisite for
analyses presented here, this invariance facilitates the interpretation of comparisons
over time. To Model LGIf, we add contrast variables representing the main effects
of gender (0 = male, 1 = female) and age group (0 = adolescents, 1 = older adult)
and their interaction. Consistent with our focus on taxonomies of nested models,
we consider alternative MIMIC models in which one or more of these three effects
are constrained to be zero or invariant over time.
In Model MIMICa, all correlations are freely estimated relating the three
contrast variables (gender, age group, and their interaction) to the 11 PSDQ-S factors at Time 1 and Time 2 (i.e., 3 × 11 × 2 = 66 correlations). In Model MIMICb,
the 33 correlations relating the three demographic variables to 11 PSDQ-S factors
at Time 1 are constrained to be equal to the corresponding 33 correlations at
Time 2. Each of the goodness-of-fit indexes is almost exactly the same for these
two models, providing strong support for the invariance of these relations over
time. In subsequent models, we test the effects of further constraints on model
MIMICb.
In Models MIMICc-MIMICf, we test the effects of constraining various
sets of correlations to be zero (because these correlations are constrained to be
invariant over time, fixing all correlations associated with any one of the contrast
variables to be zero at one time results in correlations being zero for the other
time as well). In MIMICc, all 66 correlations are constrained to be zero. This
results in a substantial decrement in fit (e.g., a change in chi-square from 6855 in
MIMICb to 7577 in MIMICc). In subsequent models, only the effects of gender
are freely estimated (but constrained to be invariant over time, MIMICd), only
the effects of age group are freely estimated (MIMICe), or the effects of sex
and age-group are freely estimated (MIMICf). The juxtaposition of these results
demonstrates that there are significant effects of gender and even larger effects of
age-group, but almost no effects due to the interaction of these two demographic
variables. We now turn to actual parameter estimates based on these analyses.


Women in our two longitudinal groups have consistently lower self-concepts than men for most of the PSDQ-S factors. The largest differences are for Sport,
Endurance, and Strength, but there are also statistically significant differences
in Coordination, Body Fat, Activity, and the two global scales (Global Physical
and Esteem). However, men and women do not differ significantly in terms of
Physical Appearance, Health, and Flexibility.
The senior G6 has significantly lower self-concepts for most of the PSDQ-S
factors. The largest differences are for Sport, Endurance, Health, and Body Fat.
Interestingly, however, the senior G6 has Global Physical self-concepts that are
as good as or slightly better than the adolescent age group, and significantly
higher levels of self-esteem. Although not an a priori prediction, this pattern
is not surprising. For global self-esteem, there is good evidence that scores
decline during childhood and early adolescence, level out during middle adolescence, and then continue to increase into adulthood (e.g., Marsh, 1989; also
see Brandtstädter & Greve, 1994; Carstensen & Freund, 1994). Hence, at least
the direction of differences for the global scales is consistent with extrapolations from previous research. Our suppositions about the juxtaposition between
the global scales and the specific physical scales are more speculative (but see
Alaphilippe, 2008). As people get older, their physical attributes decline and they
are generally aware of this. However, they also become more accepting of these
effects and develop strategies to protect their sense of self that leads to positive
and resilient self-esteem (e.g., Alaphilippe, 2008; Brandtstädter & Greve, 1994;
Carstensen & Freund, 1994). Furthermore, self-concept is highly dependent on
frame of reference effects as well as other standards. For specific PSDQ-S factors, the self-concepts are closely tied to actual performances so that they are
strongly influenced by declines in these objective external standards, and they
show some decline with age. However, for self-esteem and, to a lesser extent,
the global physical scale, respondents have a lot more flexibility in operationalizing the frame of reference, using social comparison processes such that their
self-concepts are in relation to others of a similar age. This suggests that these
older participants understand that they have diminished attributes in many physical areas, but apparently have come to grips with these differences in terms of
how they think about themselves globally. These heuristic speculations about
the juxtaposition between age, self-esteem, and specific components of physical
self-concept warrant further research.
The gender × age-group interactions are consistently small and mostly nonsignificant, indicating that the pattern of gender differences generalizes reasonably well across the age groups, and that the pattern of age-group differences generalizes reasonably well across gender. There are, nevertheless, a few interaction effects that are marginally significant. In each case, these suggest that gender differences are somewhat smaller for the senior age group. Nevertheless, the results show a consistent pattern of gender differences across the two diverse age groups, extending previous research that is usually based on a more limited age range of younger participants.


Study 3
MTMM Study of Responses to Three Physical Self-Concept
Instruments
Study 3 Methods. The purpose of Study 3 is to compare support for the convergent and discriminant validity of the PSDQ and PSDQ-S in relation to responses to two
other physical self-concept instruments. Thus, Study 3 is a MTMM study in which
multiple traits are multiple physical self-concept factors and the multiple methods
are the three physical self-concept instruments (also see the earlier presentation
of the overarching methodology). Study 3 is based on data from the Marsh et al.
(1994) and Marsh, Bar-Eli, et al. (2006) studies, but differs from the previous studies.
In particular, we begin with data from both studies, providing a sufficiently large
sample (N = 717; 322 from Marsh et al. and 395 from Marsh, Bar-Eli, et al.) to
allow us to evaluate appropriately large CFA models (with more than 500 estimated
parameters) based on responses from all 135 items designed to measure a total of
23 latent factors representing the three physical self-concept instruments. However,
the most important difference, and the main focus of Study 3, is the comparison
of results based on separate analyses of all 70 PSDQ items and the 40 PSDQ-S items, providing an apparently new approach to the evaluation of short forms.

Richards's Physical Self-Concept Instrument (PSC). The PSC (Marsh et al., 1994; Richards, 1988) is a 35-item instrument that measures six specific components
of physical self-concept and one general physical satisfaction factor (see Appendix
A). Each item is a simple declarative statement and participants respond on an
8-point true/false response scale. PSC responses have good psychometric properties
that generalize across age and gender (Marsh et al., 1994; Richards, 1988).

Fox's Physical Self-Perception Profile. The PSPP (Fox, 1990; Fox & Corbin,
1989) is a 30-item inventory with four specific scales and one general physical
self-worth factor (see Appendix A). Fox (1990) reported factor analyses indicating
that each item loads most highly on the factor that it is designed to measure, and
that individual scale reliabilities are in the .80s. Unlike the standard Likert response
formats used with the PSC and PSDQ instruments, the PSPP uses a nonstandard
response format based on Harter (1985), in which each item consists of a matched
pair of statements, one negative and one positive (e.g., "Some people feel that they are not very good when it comes to sports" but "Others feel that they are really good at just about every sport."). Each subscale consists of six items in which participants are presented with two contrasting descriptions of people, and are asked which description is most like them and whether the selected description is "sort of true" or "really true" for them. Responses are scored on a 1-to-4 scale in which 1 represents a "really true for me" response to the negative statement and 4 represents a "really true for me" response to the positive statement (for reviews of
problems with this idiosyncratic response scale, see Marsh et al., 1994, and Marsh,
Bar-Eli, et al., 2006; also see Wylie, 1989).
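The 1-to-4 scoring rule just described can be expressed as a tiny helper (the function name and the boolean encoding of responses are ours):

```python
def score_pspp_item(chose_positive, really_true):
    """Score one structured-alternative item on the 1-4 scale described in
    the text: 1 = "really true" for the negative statement, 2 = "sort of
    true" for the negative statement, 3 = "sort of true" for the positive
    statement, and 4 = "really true" for the positive statement."""
    if chose_positive:
        return 4 if really_true else 3
    return 1 if really_true else 2

scores = [score_pspp_item(c, r)
          for c, r in [(False, True), (False, False), (True, False), (True, True)]]
print(scores)  # [1, 2, 3, 4]
```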
Matching and Closely Related Factors From the Three Instruments. Multitrait-multimethod analyses have the potential to reveal important problems in the interpretation of scale scores based on the label that is attached to them. The

standard MTMM design is based on strictly parallel measures of the same traits
assessed by different methods. However, this is unlikely to be the case when the
multiple methods consist of different multidimensional instruments independently
developed by different researchers. In particular, the extent to which matching traits
actually match is likely to vary. Recognizing this issue, Marsh et al. (1994) used a
detailed content analysis of items in each of the 23 scales to classify correlations
between factors into four a priori categories used here as well (see Appendix D
for the details of this classification): 9 convergent validities in which the scales
are most closely matched (MTHM-match); 6 convergent validities in which the
scales are less closely matched (MTHM-close); the 152 heterotrait-heteromethod
(HTHM) correlations, between different (nonmatching) traits measured by different
instruments; and 86 heterotrait-monomethod (HTMM) correlations among different
traits measured by the same instrument.
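The four-category classification can be sketched as a function of the instruments' factor lists and the a priori sets of matched and closely matched pairs (all names below are illustrative placeholders; see Appendix D for the actual classification). With 11 PSDQ, 5 PSPP, and 7 PSC factors, the pair counts reproduce the totals above: 86 HTMM pairs and 167 cross-instrument pairs, which the a priori scheme splits into 9 + 6 + 152.

```python
def classify_correlations(traits_by_instrument, matched, close):
    """Classify every between-factor pair into the four a priori categories.

    traits_by_instrument maps an instrument name to its factor labels;
    matched and close are sets of frozensets of (instrument, factor) pairs."""
    factors = [(inst, t) for inst, ts in traits_by_instrument.items() for t in ts]
    counts = {"MTHM-match": 0, "MTHM-close": 0, "HTHM": 0, "HTMM": 0}
    for i in range(len(factors)):
        for j in range(i + 1, len(factors)):
            pair = frozenset([factors[i], factors[j]])
            if factors[i][0] == factors[j][0]:  # same instrument
                counts["HTMM"] += 1
            elif pair in matched:
                counts["MTHM-match"] += 1
            elif pair in close:
                counts["MTHM-close"] += 1
            else:
                counts["HTHM"] += 1
    return counts

# Anonymous placeholder factors with the correct counts per instrument:
instruments = {"PSDQ": [f"d{i}" for i in range(11)],
               "PSPP": [f"p{i}" for i in range(5)],
               "PSC": [f"c{i}" for i in range(7)]}
counts = classify_correlations(instruments, matched=set(), close=set())
print(counts["HTMM"], counts["HTHM"])  # 86 167
```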

Study 3 Results
CFA Results. We began with separate CFAs for each of the three instruments (Table 4, Study 3). Consistent with earlier results, the fit of the a priori factor
structure for the 70-item PSDQ is very good (TLI = .964, CFI = .966), while that
for the 40-item PSDQ-S was even better (TLI = .971, CFI = .974). Although the fit
of the a priori model based on the PSC is also good (TLI = .950, CFI = .955), the
fit of the a priori model based on the PSPP is marginal (TLI = .914, CFI = .922).
This pattern of results is consistent with other research based on these instruments
(e.g., Marsh et al., 1994; Marsh, Bar-Eli, et al., 2006).
Next we conducted two separate CFAs on the combined set of items from
all three instruments. Separate analyses were done on the entire set of 135 items
(including 70 PSDQ items) and on the subset of 105 items (including only the 40
PSDQ-S items). Both analyses resulted in very good fit statistics: TLI = .969, CFI
= .970 for the 135 items, and TLI = .967, CFI = .969 for the 105 items. Although
both sets of fit statistics are very good and similar, it is interesting to note that this
is the first analysis in which the goodness of fit based on the 40 PSDQ-S items is
marginally poorer than that based on the 70 PSDQ items. The apparent explanation
for this anomaly is that PSDQ items provide a better fit than the other two instruments (particularly the PSPP), so that reducing the proportion of PSDQ items from
70/135 to 40/105 results in a marginally poorer fit. Consistent with this interpretation, earlier analyses of these data showed that when only PSDQ and PSDQ-S
items were considered, the fit was better for the PSDQ-S items.

MTMM Results. Correlations among the 23 latent constructs (Appendix D)
resemble a traditional MTMM matrix, with two important exceptions. First, the
correlations represent relations among latent constructs that are appropriately
corrected for measurement error, instead of correlations among simple scale scores
that contain measurement error and that may not reflect the underlying empirical
factor structure. Second, unlike the traditional MTMM design, there is no one-to-one matching of the traits measured by each of the three instruments (the multiple
methods in this MTMM application). Whereas there is an unambiguous matching
of some of the scales in the three instruments (e.g., the three strength factors), some
scales are not as closely matched (see earlier discussion).

PSDQ Short Form 461

Overall, there is very good support for convergent validity of responses to the
three instruments, and the results are similar for the 70 PSDQ
and 40 PSDQ-S items. The nine convergent validities in the matched category
(appended with the letter a in Appendix D) are very high (mean rs = .82 for PSDQ
and .81 for PSDQ-S; see 9 MTHM-match convergent validities, Study 3 Table 3).
In support of the a priori classification scheme, the six convergent validities in the
less closely matched category (appended with the letter b in Appendix D) are
smaller, but still similar for PSDQ and PSDQ-S analyses (mean rs = .63 for PSDQ
and .64 for PSDQ-S; see 6 MTHM-close convergent validities, Study 3 Table 3).
There is also good support for discriminant validity in that the 152 heterotrait-heteromethod correlations are substantially smaller than both categories of convergent validity and similar across the PSDQ and PSDQ-S analyses (mean rs =
.37 for PSDQ and .37 for PSDQ-S; see 152 HTHM correlations, Study 3 Table
3). Consistent with the MTMM logic, these HTHM correlations are consistently
smaller than the convergent validities involving the same traits.
In evaluating the latent MTMM correlation matrixes (Appendix D), it is also
useful to evaluate correlations among the factors representing the same instrument
(the HTMM correlations). The means of the 55 correlations among the 11 PSDQ
factors are moderate and similar across the two analyses (.43 for both PSDQ and
PSDQ-S items). The highest correlations involve the global scales and four subscales (coordination, sports competence, physical activity, and endurance), but all
11 PSDQ factors tend to be moderately correlated.
The 10 correlations among the 5 PSPP factors varied from .53 to .91 (mean r
= .71). Whereas the highest correlations tend to involve the global physical self-worth factor, nearly all the correlations are high. These large correlations among
the PSPP factors may call into question the discriminant validity of responses to
this instrument.
Correlations among the 7 PSC factors tend to be the lowest of the three instruments (mean r = .34). Unlike the other two instruments, the highest correlations
among PSC factors do not involve the general physical satisfaction scale. The
highest correlations are between the body and physical appearance factors, and
the three correlations relating physical activity to physical strength, health, and
competence.
A systematic evaluation of the MTMM matrix of correlations among latent
factors provides strong support for the convergent and discriminant validity of
PSDQ-S responses in relation to those from the other two instruments. Notably,
PSDQ-S convergent validities were generally higher than those involving the other
two instruments. Psychometrically, the PSPP appeared to be the weakest of the three
instruments, resulting in a poorer fit to the data, a systematic method effect related
to the idiosyncratic response scale, and inflated correlations among factors. The
comparison of the PSDQ and PSC instruments was not so straightforward. Whereas
the psychometric properties of the PSDQ appeared to be slightly stronger than those
of the PSC, the differences were not substantial. Hence, the major differences seem
to be the brevity of the PSC compared with the comprehensiveness of the PSDQ.
The juxtaposition of results based on the 70-item PSDQ and the 40-item
PSDQ-S demonstrates that their performances are nearly identical. In particular,
convergent validities are nearly identical in the two analyses (in support of
convergent validity) as are the HTHM and HTMM correlations (in support of

462 Marsh, Martin, and Jackson

discriminant validity). Thus, Study 3 provides particularly strong support for the
equivalence of the PSDQ and PSDQ-S for convergent and discriminant validity in
relation to responses to other physical self-concept instruments.

Discussion
Evaluation of PSDQ-S Results in Relation to Smith et al.'s
(2000) Guidelines
In the present investigation, we sought to evaluate the appropriateness of the new,
short version of the widely used and highly regarded PSDQ instrument. Smith et
al. (2000) proposed a broad set of generic guidelines for the evaluation of short
forms of existing psychological measures that provided a valuable starting point
for our evaluation of the PSDQ-S. Marsh et al. (2005) translated what Smith et
al. refer to as the nine (negatively worded) "sins" of short-form development into
(positively worded) guidelines used here to highlight limitations and directions
for further research, and to discuss new and existing strategies to operationalize
these guidelines.
1. Start with a strong instrument. The existing research literature demonstrates
that the PSDQ is a strong instrument. Although there is a temptation to treat
this first methodological guideline in a token manner, we agree with Marsh
et al. (2005) and Smith et al. (2000) that this recommendation is critically
important, underpinning the entire process of short form development and
many of the subsequent guidelines. Specifically, psychological measures should
be developed from a strong theoretical basis, and this theoretical perspective
should be an important component of the evaluation of corresponding short
forms.
2. Show that the short form retains the content coverage of each factor. In the
development of the PSDQ-S, this was achieved conceptually by using this
guideline as one criterion for selecting items. The following two sets of results
from the present investigation are of particular relevance to this guideline.
First, the invariance of the factor structures based on responses to the full
70-item PSDQ by the normative archive G1 and the new 40-item PSDQ-S
by the cross-validation sample demonstrated that the content coverage of the
two instruments is reasonably invariant. This apparently unusual application
of the missing-by-design approach to the cross-validation of long and short
forms is uniquely suited to the evaluation of Guideline 2 in ways that are not
possible when only the items from the short form are cross-validated. Second,
the juxtaposition of PSDQ and PSDQ-S results in relation to responses to
the two other apparently strongest physical self-concept instruments in the
MTMM analysis (Study 3) provides a rigorous and new test of Guideline 2.
Both sets of analyses provided extremely good and equally strong support for
the convergent and discriminant validity of the PSDQ and PSDQ-S responses.
3. Show that each factor on the short form is adequately reliable. Smith et al.
considered reliability coefficients below .7 to be inadequate; we were successful
in obtaining reliability estimates of at least .8 for each of the PSDQ-S scales.
PSDQ-S results satisfy this criterion.


4. Demonstrate that the short form has adequate overlapping variance with
the original, long form based on independent administrations. Smith et al.
recommend that both the short and long forms be administered to the same
new sample. Although we laud the rigor of this "gold standard," we argue
that it is typically impractical and, perhaps, unnecessary or even counterproductive. Problems arise when the instrument is long or takes a long time
to administer, when participants are likely to remember responses to specific
items, when the order of the instruments is not counter-balanced, and when
the new sample is not large and broadly representative of participants in the
original sample or the range of participants for which the instrument is likely
to be used. We argue that our CFA multigroup invariance approach in Study
1 with two independent samples (one completing the short form and one
completing the long form) is a practically viable alternative. This alternative
may even be methodologically stronger than the alternative proposed by Smith
et al., particularly if the original sample is a large, broadly representative,
normative archive. It is also important, as shown in the present investigation,
to demonstrate that the factor structure based on the short form provides an
appropriate solution in relation to diverse groups for which the instrument
is likely to be used. Hence, the present investigation provides strong support
for the invariance of the factor structures across the original normative
archive and diverse cross-validation samples, providing clear evidence that
the factors on the short form are highly similar to those on the long form and
thus satisfying this criterion (also see the discussion of Guideline 2, which
overlaps substantially with Guideline 4).
5. Show that the short form retains the factor structure of the original form.
Although our study clearly satisfied this guideline, Smith et al. do not discuss
in detail how to evaluate it. Recent advances in multigroup tests
of factorial invariance demonstrated here provide a strong approach to testing
this criterion that was presented as a more generic guideline by Smith et al.
We note, however, that our approach in no way conflicts with that proposed
by Smith et al. Rather, it provides a stronger, more methodologically elegant
approach to operationalizing tests of this guideline. Our approach to meeting
this guideline overlaps substantially with discussion of Guideline 2, particularly
the missing-by-design approach to multigroup CFA.
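For readers who want to operationalize such invariance comparisons, the Cheung and Rensvold (2002) guideline cited in our references is a common convention: support invariance when CFI drops by less than .01 from the less to the more constrained model. A minimal sketch (our illustration; the fit values shown are hypothetical, not results from this study):

```python
# Conventional decision rule (Cheung & Rensvold, 2002): treat the more
# constrained model as supporting invariance if CFI drops by < .01.
def supports_invariance(cfi_unconstrained, cfi_constrained, cutoff=0.01):
    return (cfi_unconstrained - cfi_constrained) < cutoff

# Hypothetical fit indices, loosely in the range reported for the PSDQ-S.
print(supports_invariance(0.974, 0.969))  # small drop -> True
print(supports_invariance(0.974, 0.950))  # large drop -> False
```

The same comparison would be repeated at each step of the invariance hierarchy (loadings, then intercepts, then uniquenesses).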
6. If appropriate, show that factors on the short form preserve the content of
subdomains or facets of each factor in the long form. This guideline is not
relevant to the present investigation, as the PSDQ factors do not contain
subdomains or facets. Again, however, it would be useful to provide detailed
guidance on how to operationalize this criterion. Logically, appropriate tests
might be based on higher-order CFAs, where subdomains are treated as
separate first-order factors and the domains that they reflect are treated as
higher-order factors (e.g., Marsh, 1985; 2007). These analyses should test the
a priori structure of the measure and multiple group tests of invariance like
those pursued here should demonstrate that the hierarchical structure in the
long form is retained in the short form.
7. Show that each factor has validity in an independent sample. Guideline 7
is difficult to implement for multidimensional instruments designed from
a construct validity approach in which there are a potentially large number
of relevant criteria for each factor, particularly when the researcher also
seeks support for all the other guidelines in the same study. Realistically,
this guideline is best satisfied by the accumulated results from an ongoing
research program. Indeed, we argue that it would be counter-productive and
inappropriate to rely on the results from a single study to demonstrate the
construct validity of any psychological measure. Rather, consistent with our
construct validity approach, we argue that researchers should focus initially
on within-construct issues using approaches such as those demonstrated here
as a logical prerequisite to conducting between-construct research. To move
too quickly to potentially superficial between-construct research is to risk
within-construct problems that characterize many psychological measures and
particularly short forms. Importantly, we do not argue that this guideline is
inappropriate for the evaluation of a short form, but merely that such an
evaluation must be part of a long-term research program. Where we differ
with Smith et al. is not on the thoroughness of the support that we seek, but on
whether it is necessary to present an evaluation of between-network validity
as part of the same study where the focus is on a within-construct validation
of a new short form in relation to responses from the original, long form.
There is good indirect support for this criterion based on several features of our
research. First, there is solid, accumulated evidence for the construct validity
of PSDQ responses against a wide variety of different validity criteria based
on previous research. Second, there is very strong evidence for the invariance
of responses based on the long and short forms of the PSDQ in the present
investigation. Putting these two features together, it is highly likely that the
comprehensive evaluation of the PSDQ-S in relation to a diverse range of
validity criteria from accumulated research will provide this support.
Finally, Study 3 provides particularly convincing evidence that the strong
support for the convergent and discriminant validity of PSDQ responses in
relation to other physical self-concept instruments generalizes extremely well
to the PSDQ-S. Indeed, at least when based on a multidimensional instrument,
MTMM analysis (with multiple traits assessed by multiple methods, and parallel analyses for the short and long forms) appears to be a particularly strong
approach for evaluation of Guideline 7, and provides strong support for the
construct validity of the PSDQ-S and its equivalence with the PSDQ.
In addressing related complications, Stanton et al. (2002; also see Donnellan
et al., 2006; Marsh, Ellis, et al., 2005; Maïano et al., 2008) suggested that
responses to the short form should retain the same pattern of relations with
external criteria as did the long form. This alternative
perspective is more consistent with the construct validity approach in the
present investigation. Indeed, the MTMM approach demonstrated in Study 3
provides an ideal way to test this assumption.
8. If appropriate, show that classification rates remain high with the short form.
This guideline is not typically appropriate for PSDQ studies, as the PSDQ is
not typically used to classify students into discrete classification categories.
Whereas there may be some limited applications where criterion measures
are true categories, making this approach appropriate, we note that arbitrarily
dividing a reasonably continuous criterion variable into discrete categories is
fraught with well-known methodological problems (for further discussion,
see MacCallum, Zhang, Preacher, & Rucker, 2002). Thus, this criterion may
have limited application, and we recommend that this guideline be
used with due caution unless the criterion variables truly represent discrete
categories.
9. Show that the tradeoff in savings of time and resources is acceptable in relation
to potential loss of validity. Realistically, this criterion has to be evaluated in
relation to a particular application. Particularly in the context of using the
PSDQ-S as part of a more extensive test battery, reducing the number of items
by 30 (from 70 to 40) can be argued to reduce the testing time by nearly half. Based on
our psychometric analyses, the losses in reliability and validity appear to be
small (e.g., the mean reliability was .89 for the long form and .85 for the short
form based on the normative archive sample that completed the long form, but
was .89 and .91 in the cross-validation adolescent and older adult samples that
completed the short form). Particularly in relation to the results of the MTMM
studies based on responses to three different physical self-concept instruments,
there appears to be almost no loss at all in relation to support for convergent
and discriminant validity. Furthermore, the goodness-of-fit statistics were
consistently better for the PSDQ-S than the PSDQ. From these perspectives,
the tradeoff is appropriate for a researcher who places a high value on reducing
the number of items.

Limitations and Directions for Further Research


Statistical methodology is advancing at such a rapid pace that there are potential
statistical limitations in all applied research. One of these is the missing data problem. Because there was so little missing data (0.4%) in the present investigation,
the use of single imputation (or almost any other procedure; Graham, 2009) is justified. Indeed, when there is little missing data, Graham (2009, p. 556) specifically
recommends the use of the expectation-maximization (EM) algorithm to impute a
single data set, stating that, "This single imputed data set is known to yield good
parameter estimates, close to the population average." However, evolving missing data strategies (e.g., multiple imputation, use of auxiliary variables, and full
information maximum likelihood; see Graham, 2009) should be used in studies
with larger amounts of missing data.
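As a concrete sketch of this bookkeeping (illustrative toy data, not the study's data set, and simple column-mean imputation standing in for the EM algorithm Graham recommends), one can first quantify the missingness and then complete the matrix:

```python
# Sketch: quantify missingness and perform a crude single imputation.
# With ~0.4% missing, as in the study, the choice of procedure matters little.
from statistics import mean

def missing_rate(rows):
    """Proportion of missing (None) cells in a list-of-rows data matrix."""
    cells = [v for row in rows for v in row]
    return sum(v is None for v in cells) / len(cells)

def impute_column_means(rows):
    """Replace each missing cell by the mean of its column's observed values."""
    cols = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in cols]
    return [[v if v is not None else col_means[j]
             for j, v in enumerate(row)] for row in rows]

data = [[4, 5], [5, None], [6, 6], [3, 4]]   # toy item responses
rate = missing_rate(data)                    # 1 of 8 cells missing
completed = impute_column_means(data)
```

An EM-based or multiple-imputation procedure would replace `impute_column_means` while the surrounding logic stays the same.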
In the present investigation we used continuous variable CFAs for 6-category,
ordinal PSDQ responses. Our rationale is that: (a) ordinal CFA models (e.g.,
Jöreskog, 1994) require sample sizes far in excess of those in our six samples;
Jöreskog and Sörbom, 1996, suggested a minimum sample size of 1.5k(k + 1),
where k = number of items (i.e., N = 7455 for 70 items); (b) ordinal models of multigroup invariance (e.g., Millsap & Yun-Tein, 2004) are considerably more complex,
requiring even larger sample sizes; (c) simulation studies have consistently shown
that ML estimates are relatively unbiased when based on approximately normally
distributed, ordered categorical data with at least five categories (Babakus, Ferguson,
& Jöreskog, 1987; Beauducel & Herzberg, 2006; DiStefano, 2002; Dolan, 1994;
Johnson & Creech, 1983; Muthén & Kaplan, 1985; West, Finch, & Curran, 1995).
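The rule of thumb in point (a) is simple to evaluate directly (a one-line sketch of the cited heuristic, not software from the study):

```python
# Joreskog and Sorbom (1996) heuristic for ordinal CFA: N >= 1.5 * k * (k + 1),
# where k is the number of items.
def min_n_ordinal_cfa(k):
    return 1.5 * k * (k + 1)

print(min_n_ordinal_cfa(70))  # 7455.0 -- the 70-item PSDQ, as cited in the text
print(min_n_ordinal_cfa(40))  # 2460.0 -- still well above our sample sizes
```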


For example, the classic study by Muthén and Kaplan concluded that alternatives
like ordinal CFA are likely to be intractable for analyses with more than 25–30
measured variables so that:
It is therefore reassuring to find that these normal theory estimators perform
quite well even with ordered categorical and moderately skewed/kurtotic variables, at least when the sample size is not small. . . . If most variables have
univariate skewness and kurtoses in the range of −1.0 to +1.0, not much distortion can be expected. (p. 187)
Other research has focused more on the number of response categories,
leading Beauducel and Herzberg to conclude that ordinal CFA procedures do not
out-perform maximum likelihood estimation when the number of categories is
large. The critical issues seem to be having at least five categories, and not having
excessive nonnormality (e.g., skews and kurtoses consistently greater than 2.0).
Importantly, the PSDQ uses a 6-category response scale and, for the 70 items, the
skews (−2 to +.20; Mdn = −.67; 83% less than 1 in absolute value) and kurtoses
(−1.43 to 3.68; Mdn = .30; 95% less than 1 in absolute value) were not sufficiently
large to undermine the robustness of the maximum likelihood estimation (e.g.,
Muthén & Kaplan, 1985; Boomsma, 1982; Boomsma & Hoogland, 2001; Hau &
Marsh, 2004). However, for studies with fewer than four response categories and
more extreme nonnormality, it may be necessary to juxtapose continuous variable
CFAs with evolving ordinal approaches to CFA.
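The screening described above amounts to two moment calculations and a threshold. A minimal pure-Python sketch (our illustration, using excess kurtosis so that a normal distribution scores 0, with the |2.0| bound discussed in the text):

```python
# Sketch: flag items whose skewness or excess kurtosis exceeds |2.0|
# before trusting normal-theory ML estimation with ordered-categorical data.
from statistics import mean

def skewness(xs):
    m, n = mean(xs), len(xs)
    s2 = sum((x - m) ** 2 for x in xs) / n
    return (sum((x - m) ** 3 for x in xs) / n) / s2 ** 1.5

def excess_kurtosis(xs):
    m, n = mean(xs), len(xs)
    s2 = sum((x - m) ** 2 for x in xs) / n
    return (sum((x - m) ** 4 for x in xs) / n) / s2 ** 2 - 3

def ml_safe(xs, bound=2.0):
    """True if the item's distribution is tame enough for ML estimation."""
    return abs(skewness(xs)) < bound and abs(excess_kurtosis(xs)) < bound

responses = [1, 2, 3, 4, 5, 6]   # one hypothetical 6-point item
```

Any item failing `ml_safe` would prompt a check against an ordinal estimator.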
In the present investigation we used the venerable coefficient alpha as a measure
of internal consistency for preliminary comparisons of results for the PSDQ and
PSDQ-S in the different samples. This approach is justified in that, under appropriate
circumstances (e.g., no correlated measurement errors), coefficient alpha provides
a lower-bound estimate of reliability that is consistent with the use of summated
scale scores, and provides a balance between the number of items and the degree
of correlation among the items. Importantly, the computation of coefficient alpha
was only a starting point and was followed by detailed CFA evaluations of the
structure of the responses to the PSDQ and PSDQ-S. However, as pointed out by
Sijtsma (2009), coefficient alpha does not index whether the test is unidimensional,
1-factorial, consistent, or homogeneous. Although a detailed critique of coefficient
alpha and alternative estimates of reliability is beyond the scope of this article (but
see Green & Yang, 2009; Sijtsma, 2009), it is important to note that the decisions
on the selection of items based on item-total statistics from the coefficient alpha
analyses were essentially identical to those based on factor loadings for the subsequent CFAs (so much so that the two were combined as the initial basis for selection
of items; see earlier discussion). Nevertheless, reliability estimates presented here
are likely to be slightly conservative in relation to evolving CFA-based approaches
to reliability (e.g., Bentler, 2009; McDonald, 1999; Raykov & Shrout, 2002; also
see Green & Yang, 2009; Sijtsma, 2009).
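For completeness, the computation behind these internal-consistency comparisons is the familiar coefficient alpha formula; the sketch below (generic, with toy data rather than PSDQ responses) implements it for summated scales:

```python
# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).
def cronbach_alpha(items):
    """items: one list of scores per item, all over the same respondents."""
    k, n = len(items), len(items[0])
    def var(xs):  # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(var(it) for it in items) / var(totals))

# Three items answered identically by three respondents -> alpha of 1.0.
alpha = cronbach_alpha([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
```

As the text notes, alpha is a lower-bound estimate; CFA-based coefficients relax its assumptions.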
Since publication of the Marsh et al. (2005) study, there have been several
other methodologically oriented studies on how to develop short forms (e.g.,
Donnellan et al., 2006; Maïano et al., 2008; also see Stanton et al., 2002). Each
of these studies is largely consistent with the recommendations by Marsh et al.
and their extension in the present investigation. One possible qualification to this
conclusion is a greater emphasis on the use of external criteria to compare support for the construct validity of the new short form with the original long form.
Marsh et al. argued that validation in relation to an appropriately diverse range of
external criteria was typically beyond the scope of the first study to introduce a
new short form and required ongoing research, except, perhaps, for a measure that
was only designed to measure one very narrowly defined criterion. However, in the
present investigation we demonstrated a potentially important MTMM approach
that should be useful in comparing long and short forms of multidimensional
constructs in relation to appropriately selected external criteria. While we applied
this approach to responses to the three major physical self-concept instruments, the
MTMM approach could also be applied to other sets of criteria such as the ratings
by significant others (e.g., coaches, teachers, parents, peers), or a profile of objective
measures that are selected to match the factors in the long and short forms of the
instrument in question. Indeed, we suggest that our MTMM approach in Study 3
is a methodologically stronger strategy for comparing the long and short forms in
relation to relationships with external criteria, and more fully integrates this goal
into an overall construct validity perspective.

Summary Conclusion
Particularly given that Smith et al. (2000) found no research that adequately satisfied
their guidelines, we judge that the PSDQ-S was quite successful even in relation to
these ideal standards. Major caveats to this success are our ad hoc cross-validation
samples, which were not really matched to the normative sample used to select PSDQ-S
items, and the limited support for between-network construct validity based on new data
for participants who only completed the PSDQ-S (Guideline 7). While there are
clearly advantages in using a carefully matched cross-validation sample to test the
short form, there are also limitations in the generalizability of the results to other
samples for which the instrument might be used. In this respect, we recommend that
researchers evaluate long and short forms across a variety of different samples, as in
the present investigation. We also recognize that an important focus of the Smith et
al. (2000) guidelines was based on newly collected data using only the items that
comprise the short form. While we clearly agree that this is an important aspect,
limiting the study to only newly collected data is likely to be counter-productive,
particularly when the short-form items are a proper subset of the long-form items
and there is a wealth of studies using the long form that can be reanalyzed using
the items on the short form. In the present investigation, for example, it was very
useful to demonstrate such strong support for the PSDQ-S factor structure from such
diverse samples (some who completed the 70-item PSDQ and some who completed
the 40-item PSDQ-S). In this respect, we argue that the reanalysis of existing data
based on the long form and its comparison with parallel analyses based on the items
in the short form typically should be the central component of the evaluation of
newly devised short forms.
In summary, the results of the present investigation demonstrate that the strong
support for the psychometric properties and construct validity of the widely used
PSDQ instrument generalizes very well to the short-form PSDQ-S. Thus, researchers who place a high value on reducing the number of items are advised to use
the PSDQ-S rather than the PSDQ. We also provide new insights into guidelines
for the construction and evaluation of short forms, and strategies for how these
can be operationalized. No longer can a researcher simply select an ad hoc subset
of items from a long form and claim that this subset of items constitutes a viable
alternative to the long form. In this respect, the present investigation represents a
substantive-methodological synergy that focuses on both the substantive aim of
evaluating the PSDQ-S and the methodological aim of how to evaluate short forms
more effectively.
Acknowledgments
This research was supported in part by a grant from the UK Economic and Social Research
Council, a visiting fellowship from the University of Sydney, and an Australian Research
Council (ARC) grant. The authors would like to thank Ines Tomas, Gary Richards, Michael
Bar-Eli, Sima Zach, Clark Perry, and other colleagues who have contributed to earlier
PSDQ research; to Alexandre Morin, Benjamin Nagengast, Arief Liem, and Jackie Cheng
for helpful comments on earlier versions of this article; and to Wendy Brown for work on
the overall ARC project.

References
Alaphilippe, D. (2008). Self-esteem in the elderly. Psychologie & Neuropsychiatrie du
Vieillissement, 6, 167176.
Allison, P.D. (1987). Estimation of linear models with incomplete data. In C.C. Clogg (Ed.),
Sociological Methodology (pp. 71103). San Francisco: Jossey-Bass.
Babakus, E., Ferguson, E., & Jreskog, K.G. (1987). The sensitivity of confirmatory maximum likelihood factor analysis to violations of measurement scale and distributional
assumptions. JMR, Journal of Marketing Research, 24, 222228.
Barrett, P. (2007). Structural equation modeling: Adjudging model fit. Personality and
Individual Differences, 42, 815824.
Beauducel, A., & Herzberg, P.Y. (2006). On the performance of maximum likelihood versus
means and variance adjusted weighted least squares estimation in CFA. Structural
Equation Modeling, 13, 186203.
Bentler, P.M. (2009). Alpha, dimension-free, and model-based internal consistency reliability.
Psychometrika, 74, 137143.
Boomsma, A. (1982). Robustness of LISREL against small sample sizes in factor analysis
models. In K.G. Jreskog & H. Wold (Eds.), Systems under indirect observation:
Causality, structure, prediction (Vol. 1, pp. 149173). Amsterdam: North-Holland.
Boomsma, A., & Hoogland, J.J. (2001). The robustness of LISREL modeling revisited. In
R. Cudeck, S. du Toit, & D. Srbom (Eds.), Structural equation modeling: Present
and future. A Festschrift in honor of Karl Jreskog (pp. 139168). Lincolnwood, IL:
Scientific Software International.
Brandtstadter, J., & Greve, W. (1994). The aging self - stabilizing and protective processes.
Developmental Review, 14, 5280.
Byrne, B.M. (1996). Measuring self-concept across the life span: Issues and instrumentation. Washington, DC: American Psychological Association.
Byrne, B.M. (1998). Structural equation modeling with LISREL, PRELIS, and SIMPLIS:
Basic concepts, applications and programming. Mahwah, NJ: Erlbaum.
Campbell, D.T., & Fiske, D.W. (1959). Convergent and discriminant validation by the
multitrait-multimethod matrix. Psychological Bulletin, 56, 81105.
Campbell, D.T., & OConnell, E.J. (1967). Method factors in multitrait-multimethod matrices: Multiplicative rather than additive? Multivariate Behavioral Research, 2, 409426.

PSDQ Short Form 469

Carstensen, L.L., & Freund, A.M. (1994). The resilience of the aging self. Developmental
Review, 14, 8192.
Chen, F.F. (2007). Sensitivity of goodness of fit indices to lack of measurement invariance.
Structural Equation Modeling, 14, 464504.
Chen, F., Curran, P.J., Bollen, K.A., Kirby, J., & Paxton, P. (2008). An empirical evaluation
of the use of fixed cutoff points in RMSEA test statistic in structural equation models.
Sociological Methods & Research, 36, 462494.
Cheung, G.W., & Rensvold, R.B. (2002). Evaluating goodness-of-fit indexes for testing
measurement invariance. Structural Equation Modeling, 9, 233255.
DiStefano, C. (2002). The impact of categorization with confirmatory factor analysis. Structural Equation Modeling, 9, 327346.
Donnellan, M.B., Oswald, F.L., Baird, B.M., & Lucas, R.E. (2006). The Mini-IPIP scales:
Tiny-yet-effective measures of the Big Five factors of personality. Psychological
Assessment, 18, 192–203.
Dolan, C.V. (1994). Factor analysis of variables with 2, 3, 5 and 7 response categories: A
comparison of categorical variable estimators using simulated data. The British Journal
of Mathematical and Statistical Psychology, 47, 309–326.
Fleishman, E.A. (1964). The structure and measurement of physical fitness. Englewood
Cliffs, NJ: Prentice-Hall.
Fox, K.R. (1990). The Physical Self-Perception Profile manual. DeKalb, IL: Office for
Health Promotion, Northern Illinois University.
Fox, K.R., & Corbin, C.B. (1989). The Physical Self-Perception Profile: Development and
preliminary validation. Journal of Sport & Exercise Psychology, 11, 408–430.
Graham, J.W. (2009). Missing data analysis: Making it work in the real world. Annual
Review of Psychology, 60, 549–576.
Graham, J.W., & Hofer, S.M. (2000). Multiple imputation in multivariate research. In T.D.
Little, K.U. Schnabel, & J. Baumert (Eds.), Modeling longitudinal and multilevel data:
Practical issues, applied approaches, and specific examples (pp. 201–218). Mahwah,
NJ: Erlbaum.
Green, S.B., & Yang, Y. (2009). Commentary on coefficient alpha: A cautionary tale. Psychometrika, 74, 121–135.
Harter, S. (1985). Manual for the Self-Perception Profile for Children. Denver, CO: University of Denver.
Hau, K.T., & Marsh, H.W. (2004). The use of item parcels in structural equation modelling:
Nonnormal data and small sample sizes. The British Journal of Mathematical and
Statistical Psychology, 57, 327–351.
Jackson, S.A., Miller, Y.D., Cotterill, T., Brown, W., & O'Dwyer, S. (2007, September).
Physical activity in the lives of older adults: The relative contribution of demographic,
health, psychological, social, and environmental correlates. Paper presented at the
Annual Australian Psychological Society Conference, Brisbane, Australia.
Johnson, D.R., & Creech, J.C. (1983). Ordinal measures in multiple indicator models: A
simulation study of categorization error. American Sociological Review, 48, 398–407.
Jöreskog, K.G. (1979). Statistical estimation of structural models in longitudinal investigations. In J.R. Nesselroade & P.B. Baltes (Eds.), Longitudinal research in the study of
behavior and development (pp. 303–351). New York: Academic Press.
Jöreskog, K.G. (1994). Structural equation modeling with ordinal variables. In Lecture
Notes–Monograph Series: Vol. 24. Multivariate analysis and its applications (pp.
297–310). Stable URL: http://www.jstor.org/stable/4355811
Jöreskog, K.G., & Sörbom, D. (1996). LISREL 8: Structural equation modeling with the
SIMPLIS command language. Chicago: Scientific Software International.
Kaplan, D. (2000). Structural equation modeling: Foundations and extensions. Newbury
Park, CA: Sage.

470 Marsh, Martin, and Jackson

Levy, P. (1968). Short-form tests: A methodological review. Psychological Bulletin, 69,
410–416.
Little, R.J.A., & Rubin, D.B. (1987). Statistical analysis with missing data. New York: Wiley.
MacCallum, R.C., Zhang, S., Preacher, K.J., & Rucker, D.D. (2002). On the practice of
dichotomization of quantitative variables. Psychological Methods, 7, 19–40.
Maïano, C., Morin, A.J.S., Ninot, G., Monthuy-Blanc, J., Stephan, Y., Florent, J-F., et al.
(2008). A short and very short form of the physical self-inventory for adolescents:
Development and factor validity. Psychology of Sport and Exercise, 9, 830–847.
Marsh, H.W. (1985). The structure of masculinity/femininity: An application of confirmatory
factor analysis to higher-order factor structures and factorial invariance. Multivariate
Behavioral Research, 20, 427–449.
Marsh, H.W. (1988). Multitrait-multimethod analyses. In J.P. Keeves (Ed.), Educational
research methodology, measurement and evaluation: An international handbook.
Oxford: Pergamon Press.
Marsh, H.W. (1989). Age and sex effects in multiple dimensions of self-concept: Preadolescence to early adulthood. Journal of Educational Psychology, 81, 417–430.
Marsh, H.W. (1993a). The multidimensional structure of physical fitness: Invariance over
gender and age. Research Quarterly for Exercise and Sport, 64, 256–273.
Marsh, H.W. (1993b). Multitrait-multimethod analyses: Inferring each trait/method combination with multiple indicators. Applied Measurement in Education, 6, 49–81.
Marsh, H.W. (1993c). Physical fitness self-concept: Relations to field and technical indicators of physical fitness for boys and girls aged 9–15. Journal of Sport & Exercise
Psychology, 15, 184–206.
Marsh, H.W. (1996a). Construct validity of Physical Self-Description Questionnaire
responses: Relations to external criteria. Journal of Sport & Exercise Psychology,
18(2), 111–131.
Marsh, H.W. (1996b). Physical Self Description Questionnaire: Stability and discriminant
validity. Research Quarterly for Exercise and Sport, 67, 249–264.
Marsh, H.W. (1997). The measurement of physical self-concept: A construct validation
approach. In K. Fox (Ed.), The physical self: From motivation to well-being (pp.
27–58). Champaign, IL: Human Kinetics.
Marsh, H.W. (2002). A multidimensional physical self-concept: A construct validity approach
to theory, measurement, and research. Psychology: The Journal of the Hellenic Psychological Society, 9, 459–493.
Marsh, H.W. (2007). Application of confirmatory factor analysis and structural equation modeling in sport and exercise psychology. In G. Tenenbaum & R.C. Eklund (Eds.), Handbook of sport psychology (3rd ed., pp. 774–798). Hoboken, NJ: Wiley.
Marsh, H.W., Asci, F.H., & Tomas-Marco, I. (2002). Multitrait-multimethod analyses of
two physical self-concept instruments: A cross-cultural perspective. Journal of Sport
& Exercise Psychology, 24, 99–119.
Marsh, H.W., Balla, J.R., & Hau, K.T. (1996). An evaluation of incremental fit indices: A
clarification of mathematical and empirical processes. In G.A. Marcoulides & R.E.
Schumacker (Eds.), Advanced structural equation modeling techniques. Hillsdale, NJ:
Erlbaum.
Marsh, H.W., Balla, J.R., & McDonald, R.P. (1988). Goodness-of-fit indices in confirmatory
factor analysis: The effect of sample size. Psychological Bulletin, 103, 391–410.
Marsh, H.W., Bar-Eli, M., Zach, S., & Richards, G.E. (2006). Construct validation of Hebrew
versions of three physical self-concept measures: An extended multitrait-multimethod
analysis. Journal of Sport & Exercise Psychology, 28, 310–343.
Marsh, H.W., Ellis, L., Parada, L., Richards, G., & Heubeck, B.G. (2005). A short version of
the Self Description Questionnaire II: Operationalizing criteria for short-form evaluation with new applications of confirmatory factor analyses. Psychological Assessment,
17, 81–102.
PSDQ Short Form 471

Marsh, H.W., & Grayson, D. (1995). Latent-variable models of multitrait-multimethod
data. In R.H. Hoyle (Ed.), Structural equation modeling: Issues and applications (pp.
177–198). Thousand Oaks, CA: Sage.
Marsh, H.W., & Hattie, J. (1996). Theoretical perspectives on the structure of self-concept.
In B.A. Bracken (Ed.), Handbook of self-concept (pp. 38–90). New York: Wiley.
Marsh, H.W., & Hau, K.-T. (1996). Assessing goodness of fit: Is parsimony always desirable?
Journal of Experimental Education, 64, 364–390.
Marsh, H.W., Hau, K.T., & Wen, Z. (2004). In search of golden rules: Comment on hypothesis-testing approaches to setting cutoff values for fit indexes and dangers in overgeneralizing
Hu and Bentler's (1999) findings. Structural Equation Modeling, 11, 320–341.
Marsh, H.W., Hey, J., Roche, L.A., & Perry, C. (1997). The structure of physical self-concept:
Elite athletes and physical education students. Journal of Educational Psychology, 89,
369–380.
Marsh, H.W., & Hocevar, D. (1988). A new procedure for analysis of multitrait-multimethod
data: An application of second-order confirmatory factor analysis. Journal of Applied
Psychology, 73, 107–111.
Marsh, H.W., Perry, C., Horsely, C., & Roche, L. (1995). Multidimensional self-concepts
of elite athletes: How do they differ from the general population? Journal of Sport &
Exercise Psychology, 17, 70–83.
Marsh, H.W., Richards, G.E., Johnson, S., Roche, L., & Tremayne, P. (1994). Physical Self
Description Questionnaire: Psychometric properties and a multitrait-multimethod
analysis of relations to existing instruments. Journal of Sport & Exercise Psychology,
16, 270–305.
Marsh, H.W., Smith, I.D., Barnes, J., & Butler, S. (1983). Self-concept: Reliability, stability, dimensionality, validity and the measurement of change. Journal of Educational
Psychology, 75, 772–790.
Marsh, H.W., Tomas-Marco, I., & Asci, F.H. (2002). Cross-cultural validity of the Physical
Self Description Questionnaire: Comparison of factor structures in Australia, Spain, and
Turkey. Research Quarterly for Exercise and Sport, 73(3), 257–270.
Marsh, H.W., Tracey, D.K., & Craven, R.G. (2006). Multidimensional self-concept structure for preadolescents with mild intellectual disabilities: A hybrid multigroup-MIMIC
approach to factorial invariance and latent mean differences. Educational and Psychological Measurement, 66, 795–818.
McDonald, R.P. (1999). Test theory: A unified approach. Hillsdale, NJ: Erlbaum.
Millsap, R.E., & Yun-Tein, J. (2004). Assessing factorial invariance in ordered-categorical
measures. Multivariate Behavioral Research, 39, 479–515.
Muthén, B.O., & Kaplan, D. (1985). A comparison of some methodologies for the factor
analysis of non-normal Likert variables. The British Journal of Mathematical and Statistical Psychology, 38, 171–189.
Muthén, B.O., Kaplan, D., & Hollis, M. (1987). On structural equation modeling with data
that are not missing completely at random. Psychometrika, 52, 431–462.
Parker, P.D., Martin, A.J., Martinez, C., Marsh, H.W., & Jackson, S.A. (2010). Stages of
change in physical activity: A validation study in late adolescents. Health Education
& Behavior, 37, 318–329.
Raykov, T., & Shrout, P. (2002). Reliability of scales with general structure: Point and
interval estimation using a structural equation modeling approach. Structural Equation
Modeling, 9, 195–212.
Richards, G.E. (1988). Physical Self Concept Scale. Sydney: Australian Outward Bound
Foundation.
Shavelson, R.J., Hubner, J.J., & Stanton, G.C. (1976). Self-concept: Validation of construct
interpretations. Review of Educational Research, 46, 407–441.
Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach's
alpha. Psychometrika, 74, 107–120.


Smith, G.T., & McCarthy, D.M. (1995). Methodological considerations in the refinement of
clinical assessment instruments. Psychological Assessment, 7, 300–308.
Smith, G.T., McCarthy, D.M., & Anderson, K.G. (2000). On the sins of short-form development. Psychological Assessment, 12, 102–111.
Stanton, J.M., Sinar, E.F., Balzer, W.K., & Smith, P.C. (2002). Issues and strategies for
reducing the length of self-report scales. Personnel Psychology, 55, 167–194.
West, S.G., Finch, J.F., & Curran, P.J. (1995). Structural equation models with nonnormal
variables: Problems and remedies. In R.H. Hoyle (Ed.), Structural equation modeling
(pp. 56–75). Newbury Park, CA: Sage.
Wylie, R.C. (1989). Measures of self-concept. Lincoln: University of Nebraska Press.
Manuscript received: November 1, 2009
Revision accepted: April 1, 2010

Appendix A
Summary Description of the 11 PSDQ Scales and Number of Items in Short and Long Versions
Note. Each item is denoted by a two-letter abbreviation for the factor. For the
PSDQ, the two numbers in parentheses at the end of each scale refer to the number
of items on the original (long) PSDQ and the new (short) PSDQ-S. The other
instruments, only used in Study 3, also have the number of items in parentheses.

Marsh Physical Self-Description Questionnaire (PSDQ)


1. Health (HE): Not getting sick often, getting well quickly (8,5).
2. Coordination (CO): Being good at coordinated movements, smooth physical
movements (6,5).
3. Activity (AC): Being physically active, doing lots of physical activities regularly
(6,4).
4. Body Fat (BF): Not being overweight, not being too fat (6,3).
5. Sport (SP): Being good at sports, being athletic, having good sports skills (6,3).
6. Global Physical (GP): Feeling positive about one's physical self (6,3).
7. Appearance (AP): Being good looking, having a nice face (6,3).
8. Strength (ST): Being strong, a powerful body, lots of muscles (6,3).
9. Flexibility (FL): Being able to bend and turn your body easily in different
directions (8,3).
10. Endurance (EN): Being able to run a long way without stopping, not tiring
when exercising hard (6,3).
11. Global Esteem (ES): Overall positive feelings about self (6,5).
Fox's Physical Self-Perception Profile (PSPP)
12. Strength (FST): physical strength (6).
13. Body (FBD): have an attractive body, have an admired physique or figure (6).
14. Condition (FCON): partake of regular exercise, maintain a high level of stamina
and fitness (6).
15. Sport (FSP): sporting competence and skills (6).
16. Global Physical (FGP): overall physical self-worth (6).
Richards's Physical Self-Concept (PSC)
17. Body (RBD): self-perceptions of body shape and proportion (5).
18. Physical Satisfaction (RSA): Reflects a desire to be different in each of the
other components (e.g., "I would like to be more . . .") (5).
19. Appearance (RAP): self-perceptions of being physically attractive (5).
20. Health (RHE): self-perceptions of being physically healthy (5).
21. Action (RAC): self-perceptions of being physically active (5).
22. Competence (RCOM): physical coordination and agility (5).
23. Strength (RST): self-perceptions of physical strength (5).
Appendix B
CFA Solutions for Normative Archive and Cross-Validation Samples

[Table not reproduced here: standardized factor loadings, item uniquenesses, and factor correlations for the 11 PSDQ-S factors (AC, AP, BF, CO, EN, ES, FL, GP, HE, SP, ST), reported separately for the normative archive sample and the cross-validation sample.]

Note. Solutions for the two groups were standardized to a common metric, but no parameters were constrained to be invariant across the two groups. In this respect, the chi-square and df for Model SG1B (CFA of the 40 items by the archive group) and Model SG4B (CFA of the 40 items by the cross-validation group) sum to equal the corresponding values for Model MG1A (the corresponding multiple-group test with no invariance constraints).

Appendix C
MTMM Matrix of Factor Correlations Relating T1 and T2 Responses (See Model LGIf, Table 4): Solutions for PSDQ-S Responses

[Table not reproduced here: latent correlations among the 11 PSDQ-S factors at Time 1 (T1) and Time 2 (T2), and their correlations with gender (FEM = female), age group (SEN = senior), and the gender-by-age interaction (FxS).]

Note. The shaded coefficients are test–retest correlations for matching PSDQ-S factors at T1 and T2 (the convergent validities in an MTMM analysis). For a summary of the MTMM coefficients, see Table 3 (Study 2). Statistically significant correlations are greater than .07 (p < .05) and .10 (p < .01).

Appendix D
MTMM Matrix of Factor Correlations Relating Responses to Three Physical Self-Concept Instruments, Based on the PSDQ (70 Items) and the PSDQ-S (40 Items)

[Table not reproduced here: factor correlations relating the 11 PSDQ factors (prefix M), the 5 PSPP factors (prefix F), and the 7 PSC factors (prefix R), reported separately for analyses based on the 70-item PSDQ and the 40-item PSDQ-S.]

Note. Two confirmatory factor analyses (CFAs) each posited 23 physical self-concept factors representing the three self-concept instruments: Marsh PSDQ, 11 factors; Fox PSPP, 5 factors; Richards PSC, 7 factors. In the a priori model, an independent clusters model was posited in which each item was allowed to load on one and only one factor (i.e., the one that it was designed to measure). Separate analyses were conducted with the 70-item PSDQ and the 40-item PSDQ-S. Based on content analysis of the items in each scale, Marsh et al. (1994) posited factors from each instrument likely to be most highly related (shaded convergent validities appended with the letter a) and those expected to be less closely matching (shaded convergent validities appended with the letter b).

Appendix E1

PSDQ-S INSTRUMENT

All information supplied will be kept strictly confidential.

NAME:                MALE / FEMALE (circle one)                PROGRAM:
AGE:        (years)        (months)                DATE:                GROUP:

PLEASE READ THESE INSTRUCTIONS FIRST

This is not a test - there are no right or wrong answers.
This is a chance to look at yourself. It is not a test. There are no right answers, and everyone will have different answers. Be sure that your answers show how you feel about yourself. PLEASE DO NOT TALK ABOUT YOUR ANSWERS WITH ANYONE ELSE. We will keep your answers private and not show them to anyone.
The purpose of this study is to see how people describe themselves physically. In the following pages you will be asked to think about yourself physically: for example, how good looking you are, how strong you are, how good you are at sports, whether you exercise regularly, whether you are physically coordinated, whether you get sick very often, and so forth. Answer each sentence quickly as you feel now. Please do not leave any sentence blank.
When you are ready to begin, please read each sentence and decide your answer (you may read quietly to yourself as I read aloud). There are six possible answers for each question: "True", "False", and four answers in between. There are six boxes next to each sentence, one for each of the answers. The answers are written at the top of the boxes. Choose your answer to a sentence and put a tick in the box under the answer you choose. DO NOT say your answer out loud or talk about it with anyone else.
Before you start there are three examples below. A student named Bob has already answered the first two examples to show you how to do it. In the third example you must choose your own answer by ticking a box.
1 = False   2 = Mostly False   3 = More false than true   4 = More true than false   5 = Mostly true   6 = True

SOME EXAMPLES

A. I am a creative person.            1  2  3  4  (5)  6
(The 5 has been circled because the person answering believes the statement "I am a creative person" is mostly true. That is, the statement is mostly like him/her.)

B. I am good at writing poetry.       1  (2)  3  4  5  6
(The 2 has been circled because the person answering believes that the statement is mostly false as far as he/she is concerned. That is, he/she feels he/she does not write good poetry.)

C. I enjoy playing with pets.         1  2  3  4  (5)  (6)
(The 5 has been circled because at first the person thought that the statement was mostly true, but then the person corrected it to 6 to show that the statement was very true about him/her.)

Please do not leave any statements blank. If unsure, please ASK FOR HELP.


© Herb Marsh, 2009. SELF Research Centre, University of Oxford, Oxford, UK.

Please circle the number which is the most correct statement about you (1 = False, 6 = True).

01. I feel confident when doing coordinated movements.  1 2 3 4 5 6
02. I am a physically strong person.  1 2 3 4 5 6
03. I am quite good at bending, twisting and turning my body.  1 2 3 4 5 6
04. I can run a long way without stopping.  1 2 3 4 5 6
05. Overall, most things I do turn out well.  1 2 3 4 5 6
06. I usually catch whatever illness (flu, virus, cold, etc.) is going around.  1 2 3 4 5 6
07. Controlling movements of my body comes easily to me.  1 2 3 4 5 6
08. I often do exercise or activities that make me breathe hard.  1 2 3 4 5 6
09. My waist is too large.  1 2 3 4 5 6
10. I am good at most sports.  1 2 3 4 5 6
11. Physically, I am happy with myself.  1 2 3 4 5 6
12. I have a nice looking face.  1 2 3 4 5 6
13. I have a lot of power in my body.  1 2 3 4 5 6
14. My body is flexible.  1 2 3 4 5 6
15. I am sick so often that I cannot do all the things I want to do.  1 2 3 4 5 6
16. I am good at coordinated movements.  1 2 3 4 5 6
17. I have too much fat on my body.  1 2 3 4 5 6
18. I am better looking than most of my friends.  1 2 3 4 5 6
19. I can perform movements smoothly in most physical activities.  1 2 3 4 5 6
20. I do physically active things (e.g., jog, dance, bicycle, aerobics, gym, swim) at least three times a week.  1 2 3 4 5 6
21. I am overweight.  1 2 3 4 5 6
22. I have good sports skills.  1 2 3 4 5 6
23. Physically, I feel good about myself.  1 2 3 4 5 6
24. Overall, I am no good.  1 2 3 4 5 6
25. I get sick a lot.  1 2 3 4 5 6
26. I find my body handles coordinated movements with ease.  1 2 3 4 5 6
27. I do lots of sports, dance, gym, or other physical activities.  1 2 3 4 5 6
28. I am good looking.  1 2 3 4 5 6
29. I could do well in a test of strength.  1 2 3 4 5 6
30. I can be physically active for a long period of time without getting tired.  1 2 3 4 5 6
31. Most things I do, I do well.  1 2 3 4 5 6
32. When I get sick, it takes me a long time to get better.  1 2 3 4 5 6
33. I do sports, exercise, dance or other physical activities almost every day.  1 2 3 4 5 6
34. I play sports well.  1 2 3 4 5 6
35. I feel good about who I am physically.  1 2 3 4 5 6
36. I think I would perform well on a test measuring flexibility.  1 2 3 4 5 6
37. I am good at endurance activities like distance running, aerobics, bicycling, swimming, or cross-country skiing.  1 2 3 4 5 6
38. Overall, I have a lot to be proud of.  1 2 3 4 5 6
39. I have to go to the doctor because of illness more than most people my age.  1 2 3 4 5 6
40. Nothing I ever do seems to turn out right.  1 2 3 4 5 6

Thank you



Appendix E2
Scoring Instructions: Each of the 40 items on the PSDQ-S instrument is denoted
by three codes (below). The first is a two-digit number (01-40) indicating the item
number on the PSDQ-S instrument. The second is a two-letter abbreviation (see
Appendix A) for the factor, followed by the number of the item from the original
PSDQ. Note that the number following the two-letter abbreviation is based on
the original long form of the PSDQ, and not all of these items appear on the short
form of the PSDQ (PSDQ-S). For example, there is no AC1 because AC1 (i.e.,
the first item of the AC scale) corresponds to an item that was not selected to be
part of the PSDQ-S. An * indicates that the item should be reverse scored when
computing scale scores (i.e., subtract the response from 7 to get the reverse-scored
response). The third two-digit number (in parentheses) is the item number on the
original PSDQ. To compute a simple score for each PSDQ-S scale, sum the
responses to the items from that scale and divide by the number of items (making
sure to reverse score negatively worded items, those denoted by *, before
summing the responses).
ITEM CODES: 01 CO1 (02); 02 ST1 (08); 03 FL1 (09); 04 EN1 (10); 05 ES1
(11); 06 HE2* (12); 07 CO2 (13); 08 AC2 (14); 09 BF2* (15); 10 SP2 (16);
11 GP2 (17); 12 AP2 (18); 13 ST2 (19); 14 FL2 (20); 15 HE3* (23); 16 CO3
(24); 17 BF3* (26); 18 AP3 (29); 19 CO4 (35); 20 AC4 (36); 21 BF4* (37);
22 SP4 (38); 23 GP4 (39); 24 ES4* (44); 25 HE5* (45); 26 CO5 (46); 27 AC5
(47); 28 AP5 (50); 29 ST5 (51); 30 EN5 (54); 31 ES5 (55); 32 HE6* (56); 33
AC6 (58); 34 SP6 (60); 35 GP6 (61); 36 FL6 (64); 37 EN6 (65); 38 ES6 (66);
39 HE7* (67); 40 ES8* (70).
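The scoring rule above can be sketched in a few lines of Python. The scale memberships and reverse-keyed items below are read directly off the ITEM CODES list; the function name `score_psdqs` and the dict-based interface are illustrative conventions, not part of the instrument.

```python
# Scale membership of the 40 PSDQ-S items, taken from the ITEM CODES list
# in Appendix E2. Item numbers refer to the PSDQ-S (short-form) numbering.
SCALES = {
    "HE": [6, 15, 25, 32, 39],   # Health (all items reverse scored)
    "CO": [1, 7, 16, 19, 26],    # Coordination
    "AC": [8, 20, 27, 33],       # Activity
    "BF": [9, 17, 21],           # Body Fat (all items reverse scored)
    "SP": [10, 22, 34],          # Sport
    "GP": [11, 23, 35],          # Global Physical
    "AP": [12, 18, 28],          # Appearance
    "ST": [2, 13, 29],           # Strength
    "FL": [3, 14, 36],           # Flexibility
    "EN": [4, 30, 37],           # Endurance
    "ES": [5, 24, 31, 38, 40],   # Global Esteem (items 24 and 40 reverse scored)
}
# Items marked with * in the ITEM CODES list (negatively worded).
REVERSED = {6, 9, 15, 17, 21, 24, 25, 32, 39, 40}

def score_psdqs(responses):
    """responses: dict mapping PSDQ-S item number (1-40) to a 1-6 rating.
    Returns mean scale scores, reverse scoring starred items (7 - response)."""
    scores = {}
    for scale, items in SCALES.items():
        vals = [7 - responses[i] if i in REVERSED else responses[i] for i in items]
        scores[scale] = sum(vals) / len(vals)
    return scores

# Example: a respondent answering 6 ("True") to every item scores 6.0 on
# positively worded scales and 1.0 on fully reverse-scored scales like Health.
all_true = {i: 6 for i in range(1, 41)}
print(score_psdqs(all_true)["HE"])  # 1.0
print(score_psdqs(all_true)["SP"])  # 6.0
```

Because reverse scoring is applied before averaging, every scale score stays on the instrument's original 1-6 metric, which keeps scores comparable across scales.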

