Phys Health

Journal of Occupational Health Psychology Copyright 2005 by the Educational Publishing Foundation
2005, Vol. 10, No. 4, 363–381 1076-8998/05/$12.00 DOI: 10.1037/1076-8998.10.4.363
The Physical Health Questionnaire (PHQ): Construct Validation of a Self-Report Scale of Somatic Symptoms
Aaron C. H. Schat, McMaster University
E. Kevin Kelloway, St. Mary’s University
Serge Desmarais, University of Guelph
The authors report the results of 3 studies that were conducted to evaluate the psychometric properties of the Physical Health Questionnaire (PHQ), a brief self-report scale of somatic symptoms. In Study 1, exploratory factor analysis results revealed 4 empirically distinct dimensions of somatic symptoms: gastrointestinal problems, headaches, sleep disturbances, and respiratory illness. In Study 2, this structure was replicated using confirmatory factor analysis, and correlations of the PHQ dimensions with measures of negative affect, psychological health, and job performance provided further validity evidence. In Study 3, a minor revision to the wording of several items helped to address the limitations of one of the PHQ subscales. Together, these results provide evidence of the construct validity of the PHQ.
Keywords: physical health, somatic symptoms, validity, measurement, work stress
Aaron C. H. Schat, DeGroote School of Business, McMaster University, Hamilton, Ontario, Canada; E. Kevin Kelloway, Department of Management, St. Mary’s University, Halifax, Nova Scotia, Canada; Serge Desmarais, Department of Psychology, University of Guelph, Guelph, Ontario, Canada.

An earlier version of this article was presented at the 17th Annual Convention of the Society of Industrial and Organizational Psychology, Toronto, Ontario, Canada, April 2002. Portions of this research were supported by an Ontario Graduate Scholarship and a Social Sciences and Humanities Research Council of Canada (SSHRC) doctoral fellowship to Aaron C. H. Schat and SSHRC research grants to E. Kevin Kelloway and Serge Desmarais. We thank Greg Chung-Yan for his comments on an earlier version of this article.

Correspondence concerning this article should be addressed to Aaron C. H. Schat, DeGroote School of Business, McMaster University, 1280 Main Street West, Hamilton, Ontario, L8S 4M4, Canada. E-mail: schata@mcmaster.ca

More than 20 years ago, Schwab (1980) lamented the lack of attention investigators paid to the construct validity of measures that they used in organizational research. In his article, Schwab suggested that researchers often proceed to examine substantive relationships between variables before adequate evidence of the construct validity of the variables’ measures has been gathered. Although the situation may have improved somewhat since Schwab’s article, several recent articles (e.g., Hinkin, 1995; Judd & McClelland, 1998) have acknowledged that researchers in social and organizational psychology, organizational behavior, and management pay too little attention to issues of measurement, and in particular, to evaluating the construct validity of measurement scales. This lack of attention to construct validity can lead to erroneous conclusions about how the variables being investigated are related and, more generally, can impede scientific progress. Thus, it behooves researchers to give more consideration to psychometric issues in their research.

Accordingly, in this article, we present the results of a psychometric examination of the Physical Health Questionnaire (PHQ), a scale measuring somatic symptoms. An earlier version of this scale was originally developed by Spence, Helmreich, and Pred (1987) and has since been revised and used by a number of other researchers (e.g., Rogers & Kelloway, 1997; Schat & Kelloway, 2000, 2003). None of the studies using the early or revised versions of the scale, however, have reported extensive consideration of its psychometric properties. The aim of the present research is to rigorously examine the construct validity of the PHQ to determine whether it constitutes a meaningful and valid measure of self-reported somatic health for use in organizational research. This validation study is particularly important given the existence of several published studies that have used earlier versions of the scale.

Conceptualizing Physical Health

Before we discuss issues related to the measurement of physical health, we briefly review its construct domain and discuss the biological mechanisms by which psychological functioning (e.g., stress, emotions) is posited to influence physical health. We note, however, that a comprehensive review of these issues is beyond the scope of the present research and refer interested readers to more thorough treatments thereof (e.g., Cohen & Herbert, 1996; Herbert & Cohen, 1993; Kiecolt-Glaser & Glaser, 1988).

Over the past number of years, there has been a growing recognition of the relationship between human psychological and physical health and a growing base of literature empirically demonstrating this connection (for reviews, see Cohen & Herbert, 1996; Herbert & Cohen, 1993). Psychological functioning, and in particular exposure to stressful life experiences and emotional reactions to same, has been implicated as a potential contributor to a wide range of physical diseases and symptoms, including upper respiratory infections, gastrointestinal problems, and cardiovascular illness. A substantial body of research has accumulated that examines the potential linkages between psychological stress and physical disease. One linkage for which substantial evidence exists is the role of immunosuppression (the suppression of immune function in response to exposure to stressors) in increasing people’s susceptibility to infectious disease (Cohen, 1996). As Cohen noted, however, the complexity of the immune system, coupled with the methodological limitations of measuring this complexity in human studies, makes it difficult to gather clear evidence of the mechanisms by which stress influences immune functioning. The available evidence suggests that experiencing stressors and negative affective reactions to them influence sympathetic nervous system activation and hormonal responses, which in turn affect the immune system (e.g., Cohen, 1996).

One domain in which the association between stress and health has received considerable attention is the occupational health literature. Research in this area has found that exposure to work-related stress is associated with suppressed immune functioning (O’Leary, 1990) and increased risk of infectious disease (e.g., Cohen & Williamson, 1991; Schaubroeck, Jones, & Xie, 2001) as well as musculoskeletal complaints (e.g., myalgia; Lundberg et al., 1999; see also Carayon, Smith, & Haims, 1999), increased fibrinogen levels in women (Davis, Matthews, Meilahn, & Kiss, 1985), asthma, ulcers, and the risk of stroke (Quick, Quick, Nelson, & Hurrell, 2001). Given these findings, it is not surprising that the empirical evidence supports an association between job stress and health care utilization and costs (e.g., Ganster, Fox, & Dwyer, 2001; Manning, Jackson, & Fusilier, 1996).

Measuring Physical Health

Physical strain measures have ranged from measuring relatively minor somatic symptoms such as sleep disturbances, upper respiratory infections, and digestive problems (e.g., Spence et al., 1987) to life-threatening conditions such as elevated blood pressure (Barling & Kelloway, 1996), hypertension (Schwartz, Pickering, & Landsbergis, 1996), and coronary heart disease (Karasek & Theorell, 1990; Krantz, Contrada, Hill, & Friedler, 1988). Although behavioral and physiological indicators of somatic health have been used in a number of studies of work stress (e.g., Evans & Johnson, 2000), self-reports constitute the most widely used method of measurement in this area and, in addition to the PHQ, include the Occupational Stress Indicator (Cooper, Sloan, & Williams, 1988), the Symptoms Checklist (Bartone, Ursano, Wright, & Ingraham, 1989), and the Physical Symptoms Inventory (Spector & Jex, 1998). Despite the wide use of physical health measures in occupational health research, their validity has rarely been the subject of focused investigation and is often not assessed beyond internal consistency reliability. In fact, as Spector and Jex (1998) noted, “considering the widespread use of self-report measures of job-related stressors and strains, it is surprising that relatively little attention has been paid to demonstrating construct validity of specific scales” (p. 359).

Why has so little attention been paid to validating measures of physical health in the literature? We offer two reasons for this. First, there appears to be a lack of attention to construct validity in the study of organizational behavior in general (Schwab, 1980), of which the study of work-related stress and strain is a part. Second, in occupational health and work stress research, physical health is rarely the focus of investigation; rather, it tends to be used as a stress outcome variable (i.e., strain). Thus, its measurement typically receives only cursory consideration, often limited to a brief overview of the scale and its internal consistency reliability. This is also true of the PHQ, which has been used as an outcome measure in studies of Type A behavior (Spence et al., 1987) and workplace aggression (Rogers & Kelloway, 1997; Schat & Kelloway, 2000, 2003).

The widespread use of self-report measures, however, makes the availability of a psychometrically sound, yet efficiently administered, self-report measure of somatic health critical to research in this area.
In this article, we report the results of three studies that were conducted to examine the psychometric properties of the PHQ. In the first study, we provide an initial assessment of validity by conducting exploratory factor analysis and reliability analysis on the PHQ items. In the second study, we extend these results by replicating the factor structure obtained in Study 1, assessing measurement invariance across two different samples and providing further tests of construct validity by examining the associations between the PHQ dimensions and measures of negative affect, psychological health, and job performance. Study 2 also included additional reliability (i.e., internal consistency) analyses of the items composing the PHQ dimensions. In the third study, we evaluate whether a minor wording change to several PHQ items helps to overcome a limitation observed in Studies 1 and 2 and whether the factor structure remains consistent following this change.

Physical Health Questionnaire

The PHQ is a shortened and modified version of the health scale developed by Spence et al. (1987) in their study of the Type A behavior pattern. Because the health scale was not the primary focus of their study, their discussion of its development was rather brief. The information available in their article suggests that 32 items were developed to tap four dimensions of somatic health: quality of sleep, digestion problems, headaches, and respiratory problems. On the basis of item–total correlations, the 32-item scale was reduced to a 22-item scale. Spence et al. reported that the four subscales were significantly intercorrelated (ranging from .17 to .43) and exhibited internal consistency reliabilities above α = .75.

A revised and abbreviated (14-item) version of Spence et al.’s (1987) scale was used by Rogers and Kelloway (1997) and Schat and Kelloway (2000, 2003) in their studies of workplace aggression and violence. In their study, Rogers and Kelloway (1997) used three of the four original subscales (omitting the items pertaining to respiratory difficulties from their analyses) as indicators of a single latent variable of somatic health, although they grouped the items into subscales based on conceptual rather than empirical grounds. Schat and Kelloway (2000, 2003) used the same 14-item scale as Rogers and Kelloway (1997), but rather than using the four subscales separately, they created an overall index of somatic health based on all the items. In both of these studies, the subscale and overall scale reliabilities were above α = .80.

Although previous research using this scale suggests that the original and revised versions of the scale have reasonable levels of internal consistency, a more comprehensive evaluation of its psychometric properties, particularly the construct validity of the revised version, is needed. In the following two studies, we sought to provide a more rigorous psychometric evaluation of the PHQ than has been done previously.

Study 1

The primary purpose of this study was to provide a preliminary empirical assessment of the dimensionality of the PHQ via exploratory factor analysis (EFA). Its secondary purpose was to examine the internal consistency reliability (i.e., Cronbach’s alpha) of the items comprising the obtained factors. Spence et al. (1987) wrote items to tap four dimensions of somatic health (i.e., quality of sleep, digestion problems, headaches, and respiratory problems). Accordingly, to provide evidence of discriminant validity, the four dimensions of somatic health the items assess should be empirically distinguishable from one another (although, because they all reflect manifestations of somatic symptoms, we expected them to correlate to some degree). Therefore, the results of the factor analysis should demonstrate a four-factor solution in which the PHQ items have high loadings on the somatic health dimensions to which they correspond and low cross-loadings (< .3) with the other PHQ factors.

Method

Participants¹

In this study, surveys were sent to 496 staff members from a hospital in Ontario, Canada. Of the surveys, 197 were returned, representing a response rate of 39.7%. Of the respondents, 92% were female and 8% were male. Participants’ mean age was 43.4 years (SD = 8.5), and mean organizational tenure was 14.9 years (SD = 7.7). Nurses and health care aides composed approximately 55% of the sample, with various other occupations (e.g., clerical, dietary, housekeeping, technicians) composing the remaining 45%.

¹ The sample from which data for this study come corresponds to Sample 1 of Schat and Kelloway’s (2000) study; however, in this study we focus on item-level data for the PHQ, whereas the focus of their earlier study was scale-level data.
Measure

To assess somatic health, we used the PHQ, a modified version of Spence et al.’s (1987) measure of health. The scale consisted of 14 items (see the Appendix) pertaining to the frequency with which respondents experience sleep disturbances, headaches, respiratory infections, and gastrointestinal problems. Items 1–11 were rated on a 7-point frequency scale, ranging from 1 (not at all) to 7 (all of the time). Items 12–14 had different frequency-related response options (see the Appendix for details). Prior to analyzing the data, we reverse coded all items but Item 4 (endorsement of this item indicated the absence of symptoms, whereas endorsement of all other items indicated the presence of symptoms) so that higher mean scores reflect better somatic health. The results suggest that the different wording of Item 4 did not adversely affect the factor loadings that were observed for this item.

Data Analysis

To obtain an initial assessment of the factor structure of the PHQ, we used principal-components extraction and rotated the extracted factors to a varimax criterion. The obtained solution is then cross-validated with data from independent samples using confirmatory factor analysis (CFA) in Study 2. This approach, in which EFA precedes CFA, is generally considered the most appropriate approach when a scale’s factor structure is first being examined (Tabachnick & Fidell, 1996) and has been suggested as a viable strategy for theory development and analysis (Gerbing & Hamilton, 1996).

Results

Prior to conducting the EFA on Sample 1, we examined the distributions of the PHQ items. Although several items demonstrated slight skewness (e.g., Items 10 and 13), there were no serious violations of univariate (e.g., the skewness coefficients for these items were < 2) or multivariate normality, and all other assumptions were met. One concern that did emerge, however, was the amount of missing data produced by Item 14. Sixteen cases were missing data on this item, whereas no other item had more than 3 cases missing data. The likely explanation for this pattern of missing data is that any respondent who responded with “0 times” to Item 13 (reflecting no serious respiratory illnesses) would not be able to respond to Item 14 (which reflects the duration of the respiratory illness and has, as its lowest response option, “1 day”). Inspection of the original data reveals that 11 of the 16 respondents with missing data on Item 14 answered “0 times” to Item 13 and a number of them actually wrote “0 times” or “N/A” (not applicable) beside that item. Excluding these cases results in a reduction in sample size (N = 173 following list-wise deletion) but more importantly results in the systematic exclusion of healthier respondents (i.e., those who reported experiencing no serious respiratory infections), compromising the representativeness of the sample. One means by which this systematic exclusion could be avoided would be to impute the lowest response option (i.e., a value of 1) on Item 14 for these respondents (resulting in N = 184). Although this strategy is not ideal, it is preferable to excluding such individuals from the sample, which runs the risk of the results being idiosyncratic to more symptomatic members of the population and, therefore, less representative. To determine the appropriateness of this strategy, we conducted the EFA on both the original (nonimputed) data (N = 173) and the adjusted (in which values of 1 were imputed on Item 14 for respondents who responded “0 times” to Item 13) data (N = 184). In both analyses, the sample sizes and the number of items on the PHQ (14 items) represent case-to-variable ratios that exceed the general guidelines that have been suggested for EFA (5:1; Gorsuch, 1983). Because the results were nearly identical,² we decided to adopt the adjustment strategy described earlier, believing its limitations are less problematic than excluding a group of less symptomatic respondents. We also note that in Study 3, we report on the results of an investigation in which the wording and response options of this item are revised to overcome this limitation.

² Owing to space limitations, we do not report the detailed EFA results for the nonadjusted sample here. However, they are available on request.

Principal-components extraction with varimax rotation was performed on the 14 PHQ items using SPSS for Windows. In both analyses, four factors were extracted that cumulatively explained 68.9% of the item variance. The item factor loadings, communalities, and proportions of variance for individual factors are found in Table 1, along with item means and standard deviations. The four factors represented the following four aspects of physical symptoms: Gastrointestinal Problems, Headaches, Sleep Disturbance, and Respiratory Infections. Following varimax rotation, the four factors explained 20.04%, 17.74%, 17.42%, and 13.66% of the item variance, respectively. Although we hypothesized that the factors are likely intercorrelated, we used orthogonal (varimax) rotation because data from Monte Carlo simulations have shown that varimax rotation accurately represents known parameter estimates (Gerbing & Hamilton, 1996). We note, however, that the
Table 1
Means, Standard Deviations, Factor Loadings, Communalities, and Proportions of Variance for Principal-Components Extraction With Varimax Rotation for Physical Health Questionnaire Items (Study 1)

                                                Factor loading
                                                Gastrointestinal             Sleep        Respiratory
Item                                M     SD    Problems          Headaches  Disturbance  Infections   Communalities
1. How often have you had
   difficulty getting to sleep
   at night?                        4.82  1.42  .24               .21        .65          .11          .54
2. How often have you
   woken up during the
   night?                           4.14  1.53  .06               .05        .85          .11          .74
3. [How often have you had]
   nightmares or disturbing
   dreams?                          5.54  1.34  .36               .28        .56          .13          .53
4. How often has your
   sleep been peaceful and
   undisturbed?                     4.61  1.54  .07               .17        .85          .03          .76
5. [How often have you]
   experienced headaches?           4.59  1.62  .11               .82        .14          .01          .71
6. How often did you get a
   headache when there
   was a lot of pressure on
   you to get things done?          4.98  1.63  .21               .89        .17          .09          .88
7. [How often did you get a]
   headache when you were
   frustrated because things
   were not going the way
   they should have or
   when you were annoyed
   at someone?                      5.25  1.57  .24               .84        .23          .07          .83
8. [How often have you]
   suffered from an upset
   stomach (indigestion)?           5.42  1.39  .79               .16        .23          .14          .72
9. How often did you have
   to watch that you ate
   carefully to avoid
   stomach upsets?                  5.40  1.77  .90               .07        .08          .14          .83
10. How often did you feel
    nauseated (“sick to your
    stomach”)?                      5.84  1.20  .75               .16        .26          .17          .68
11. How often were you
    constipated or did you
    suffer from diarrhea?           5.32  1.53  .63               .24        .03          .19          .50
12. How many times have
    you had minor colds
    (that made you feel
    uncomfortable but didn’t
    keep you sick in bed or
    make you miss work)?            5.19  1.34  .14               .14        .06          .77          .64
13. How many times have
    you had respiratory
    infections more severe
    than minor colds that
    “laid you low” (such as
    bronchitis, sinusitis, etc.)?   6.19  1.18  .28               .06        .14          .72          .63
14. When you had a bad
    cold or flu, how long did
    it typically last?              4.37  1.95  .09              -.05        .09          .79          .65
Percentage of variance
   (following rotation)                         20.04             17.74      17.42        13.65

Note. Factor loadings exceeding .40 are presented in boldface.
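As a rough arithmetic check on Table 1, the sketch below recomputes two quantities from the rotated loadings: each item's communality (the sum of its squared loadings across the four factors) and each factor's share of item variance (the sum of squared loadings in its column divided by the number of items). This is only an illustration and not the SPSS procedure used in the study; the names `LOADINGS`, `TABLE_COMMUNALITIES`, `communalities`, and `variance_shares` are hypothetical. Because the published loadings are rounded to two decimals, the recomputed values can differ from the tabled ones in the second decimal place.

```python
# Rotated loadings transcribed from Table 1, one row per PHQ item (Items 1-14).
# Column order: Gastrointestinal Problems, Headaches, Sleep Disturbance,
# Respiratory Infections.
LOADINGS = [
    (0.24, 0.21, 0.65, 0.11),
    (0.06, 0.05, 0.85, 0.11),
    (0.36, 0.28, 0.56, 0.13),
    (0.07, 0.17, 0.85, 0.03),
    (0.11, 0.82, 0.14, 0.01),
    (0.21, 0.89, 0.17, 0.09),
    (0.24, 0.84, 0.23, 0.07),
    (0.79, 0.16, 0.23, 0.14),
    (0.90, 0.07, 0.08, 0.14),
    (0.75, 0.16, 0.26, 0.17),
    (0.63, 0.24, 0.03, 0.19),
    (0.14, 0.14, 0.06, 0.77),
    (0.28, 0.06, 0.14, 0.72),
    (0.09, -0.05, 0.09, 0.79),
]

# Communalities as printed in Table 1, used only for comparison.
TABLE_COMMUNALITIES = [0.54, 0.74, 0.53, 0.76, 0.71, 0.88, 0.83,
                       0.72, 0.83, 0.68, 0.50, 0.64, 0.63, 0.65]


def communalities(loadings):
    """Sum of squared loadings across factors, one value per item."""
    return [sum(value * value for value in row) for row in loadings]


def variance_shares(loadings):
    """Proportion of total item variance attributed to each rotated factor."""
    n_items = len(loadings)
    n_factors = len(loadings[0])
    return [
        sum(row[j] ** 2 for row in loadings) / n_items
        for j in range(n_factors)
    ]


if __name__ == "__main__":
    for item, (h2, tabled) in enumerate(
            zip(communalities(LOADINGS), TABLE_COMMUNALITIES), start=1):
        print(f"Item {item:2d}: recomputed {h2:.3f}, tabled {tabled:.2f}")
    for share in variance_shares(LOADINGS):
        print(f"Factor share of variance: {100 * share:.2f}%")
```

The recomputed shares should land close to the percentages in the bottom row of Table 1, which is a quick way to catch transcription errors in a loading matrix.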
results of an EFA involving principal-axis extraction and oblique (i.e., promax) rotation were consistent with the reported results and that the hypothesized factor correlations were incorporated into our CFA model in Study 2.

Using Cronbach’s (1951) coefficient alpha formula (as implemented in SPSS for Windows), we calculated the internal consistency reliabilities of the four PHQ subscales. These analyses revealed Cronbach’s alpha of .83, .88, .80, and .66, for Gastrointestinal Problems, Headaches, Sleep Disturbance, and Respiratory Infections, respectively.

Discussion

This study was carried out as an initial step in the validation of the PHQ and involved assessing its factor structure and internal consistency reliability. The results of the EFA supported the hypothesized four-dimensional conceptualization corresponding to the dimensions of physical health that constituted the original scale (Spence et al., 1987). Specifically, the results showed that the PHQ is composed of four factors representing the following four types of physical symptoms: Gastrointestinal Problems, Headaches, Sleep Disturbances, and Respiratory Infections. The EFA solution showed that the factors were empirically distinguishable and accounted for substantial amounts of item variance. In addition, the PHQ dimensions demonstrated acceptable levels of internal consistency reliability, although Cronbach’s alpha was somewhat low for the Respiratory Infections subscale (i.e., α = .66).

One of the limitations that emerged in this study was that Item 14 was characterized by a greater proportion of missing data than other PHQ items, likely because of an idiosyncrasy in the response options used for that item. In particular, a number of respondents who endorsed the lowest response option for Item 13 (which measures frequency of experiencing serious respiratory symptoms) did not select a response on Item 14 (which measures duration of serious respiratory symptoms). To prevent the systematic exclusion of these respondents from the analyses, we provisionally adopted an approach whereby the lowest response option was imputed on Item 14 for these respondents. Comparison of the EFA results with and without this adjustment showed that they were nearly identical. Therefore, we elected to adopt the same strategy in Study 2.

Together, the results of Study 1 provide preliminary construct validity evidence for the PHQ. However, these results are only preliminary, and additional assessment of the PHQ’s construct validity is needed. This should include replicating the obtained factor solution on independent data, investigating whether a hierarchical factor solution provides a good representation of the structure of the PHQ, and examining the relationships between the PHQ dimensions and measures of other relevant variables (e.g., affect, psychological health) to provide additional evidence of convergent and discriminant validity. These issues were examined in Study 2.

Study 2

The purpose of Study 2 was to extend our investigation of the validity of the PHQ. To this end, we examined the following issues. First, to evaluate the robustness of the exploratory factor structure, we conducted a CFA of the items using a sample of social service employees. This involved comparing three models: a one-factor model, a four-factor orthogonal model, and a four-factor oblique model. If the one-factor model demonstrated superior fit, it would demonstrate that the PHQ factors are not empirically distinguishable. Superior fit of either of the four-factor models would demonstrate the empirical distinctiveness of the PHQ dimensions. However, because we expect the four somatic health dimensions to be related to one another, we hypothesize that the four-factor oblique model will demonstrate significantly better fit than the four-factor orthogonal model. Accordingly, we present the following hypotheses:

Hypothesis 1: Each PHQ item will be significantly associated with one of the four dimensions of somatic health (to reflect the pattern of results that emerged from the EFA).

Hypothesis 2: The four PHQ factors will be empirically distinguishable from one another yet still exhibit significant intercorrelation.

The CFA model that represents this hypothesis is a four-factor oblique model (in which the four factors are allowed to correlate freely with one another). A more sophisticated variation of this model is a hierarchical model comprising four first-order factors representing the four types of somatic symptoms and one second-order factor representing general physical health. Various commentators have suggested that hierarchical factor analysis has been underutilized (e.g., Floyd & Widaman, 1995) and can be useful for explaining complex factor solutions (Nunnally &
Bernstein, 1994). The fact that previous research using the PHQ (e.g., Schat & Kelloway, 2003; Spence et al., 1987) has used the scale to compute a single overall index of somatic health provides additional impetus for such a model. Therefore, in addition to the models described above, we also specified a model to test whether a hierarchical factor solution adequately represents the structure of somatic symptoms as measured by the PHQ.

Second, these CFA results were cross-validated using data from an independent sample of health care workers. Specifically, we tested the invariance of the parameters in the final CFA model (i.e., the factor loadings and intercorrelations) across the two samples in the study.

Third, the validity of the PHQ was further assessed by using CFA to test whether the PHQ dimensions are correlated with yet empirically distinct from a measure of negative affect. Negative affect comprises one element of the nomological network (see Cronbach & Meehl, 1955) of somatic health. For example, there is evidence to suggest that negative affect may become manifest in physical symptoms (Watson & Clark, 1984; see also Spector, Zapf, Chen, & Frese, 2000). Moreover, on the basis of the physical health literature reviewed earlier (e.g., Cohen, 1996), one of the pathways by which exposure to stressors has been proposed to adversely influence immune functioning is via negative affect (and its potential influence on nervous system and hormonal functioning). As a result, a significant correlation between a measure of negative affect and the PHQ dimensions could be considered as evidence of the PHQ’s construct validity.

Although we believe that negative affect is an important element of the nomological network of somatic health, we are mindful that other researchers have expressed the concern that negative affect may confound measures of strain, such as somatic symptoms (Brief, Burke, George, Robinson, & Webster, 1988). As a result, in addition to testing the hypothesis that there is an association between negative affect and the PHQ dimensions, our analyses also include an assessment of the extent to which they are empirically distinct from one another.

Hypothesis 3: The PHQ factors will be significantly associated with, but empirically distinct from, negative affect.

Fourth, psychological health is also part of the nomological network of somatic health, and both psychological and physical symptoms represent potential stress outcomes (i.e., strains; Pratt & Barling, 1988). Because previous research has found psychological health to be correlated with somatic symptoms (Herbert & Cohen, 1993) such as headaches (Rasmussen, 1993) and colds and flus (Stone, Reed, & Neale, 1987), moderate correlations between the PHQ factors and psychological health would provide further construct validity evidence for the PHQ.

Hypothesis 4: The PHQ factors will be positively correlated with a measure of psychological health.

Fifth, we examined the correlation between the PHQ dimensions and a self-report measure of job performance to further assess the PHQ’s discriminant validity. Research has shown that exposure to work-related stressors can adversely affect job performance (e.g., Motowidlo, Packard, & Manning, 1986; Stewart & Barling, 1996). As a result, reductions in job performance can be considered a type of behavioral strain, just as somatic symptoms are a type of physical strain. However, physical symptoms and job performance represent very different types of strain that would not be expected to correlate to a high degree. Therefore, we examined this correlation to assess the discriminant validity of the PHQ.³

³ Because predicting a nonsignificant correlation amounts to setting out to prove the null hypothesis, we do not include a formal hypothesis regarding the relationship between the PHQ and job performance.

Finally, we examine the internal consistency of the subscales to replicate and extend the results of Study 1 and those obtained by previous researchers (Rogers & Kelloway, 1997; Schat & Kelloway, 2000, 2003; Spence et al., 1987).

Method

Participants⁴

⁴ The two samples in this study correspond to samples whose data were used in previous research by Schat and Kelloway (2000, 2003). Specifically, Sample 1 corresponds to Sample 2 of their 2000 study, and Sample 2 corresponds to the sample used in their 2003 study. However, the analyses reported here are distinct from those reported in their earlier studies as they reported only scale-level data for the PHQ, whereas in this study we report on item-level data.

Participants in this study came from two sources. Sample 1 comprised employees of a social service agency responsible for the administration of group homes for developmentally disabled adults in Ontario, Canada. Of the 670 surveys that were sent, 205 were returned, for a response
rate of 30.6%. Seventy-seven percent of the respondents were female, and the mean age and organizational tenure of participants were 35.4 years (SD = 10.5) and 4.7 years (SD = 3.7), respectively. Approximately 90% of the sample consisted of residential counselors and support workers, 5% were program managers, and the remainder included vocational counselors and relief staff. Sample 2 comprised employees from three small health care settings in Ontario, Canada. A total of 863 surveys were distributed, of which 229 were returned, resulting in a response rate of 26.5%. Eighty-seven percent of the respondents were female. Mean age and organizational tenure of participants were 40.9 (SD = 9.8) and 10 years (SD = 7.3), respectively. A number of different occupations were represented in the sample, including nurses and health care aides (approximately 45%), laboratory and technical staff (13%), and social workers and counselors (7%).

Measures

PHQ. To assess somatic health, we used the same version of the PHQ described in Study 1.

the fit of a one-factor model and a four-factor orthogonal model. Convergence of the factor structure across the EFA and CFA and superior fit of the four-factor oblique model compared with the one-factor and four-factor orthogonal models would provide evidence of the scale’s construct validity.

Second, after comparing these first-order models, we assessed the fit of a hierarchical CFA model, consisting of four first-order factors (Gastrointestinal Problems, Headaches, Sleep Disturbance, and Respiratory Illness) and one second-order factor (General Physical Health) using Sample 1 data. The invariance of the parameters of this model across Samples 1 and 2 was then assessed via multisample analyses conducted using LISREL 8 (Jöreskog & Sörbom, 1996). To test for the invariance of the PHQ factor loadings and correlations across the two samples, we compared models in which these parameters are constrained to be equal with a model in which the parameters are freely estimated (i.e., not constrained to be equivalent). The constrained models are nested within the freely estimated model, allowing for the chi-square difference test to test for differences in the fit of the models. A significant difference in the chi-square between the two models would suggest a lack of invariance between the two samples.
Psychological health. Psychological health was mea- Third, to further assess the validity of the PHQ, we
sured with the 12-item version of the General Health Ques- compared the fit of a series of CFA models comprising the
tionnaire (GHQ; Banks et al., 1980). The GHQ is often used four PHQ factors and a measure of negative affect. The
to detect minor levels of psychiatric disturbance in the three models tested were (a) a five-factor oblique model, in
general population and consists of items relating to depres- which the four PHQ factors and negative affect represent
sion, self-confidence, and problem-solving. A 7-point re- separate factors; (b) a one-factor model, in which all PHQ
sponse scale was used, with responses ranging from 1 and negative affect items load onto a single latent variable;
(never) to 7 (always); higher mean scores indicate greater and (c) a four-factor oblique model, in which the negative
psychological health. Previous research using the 12-item affect items load onto all four PHQ factors. Of the three
version of the GHQ has found its internal consistency to models to be examined, the one-factor model is the most
range from ␣ ⫽ .82 to .90 (Banks et al., 1980; Schat & constrained model and the four-factor model the least con-
Kelloway, 2000). In this study, ␣s ⫽ .86 and .90 for strained. On balance, the superiority of the five-factor model
Samples 1 and 2, respectively. would provide the strongest support for Hypothesis 3, be-
Negative affect. Negative affect was measured with cause in this model, the PHQ dimensions and negative
Warr’s (1990) six-item scale. Participants were asked how affect are correlated with one another but represent inde-
often their jobs made them uneasy, contented, depressed, pendent factors. The one-factor model stands in nested
relaxed, enthusiastic, and tense. A 7-point response scale sequence to the five-factor model, but the four-factor model
ranging from 1 (never) to 7 (all of the time) was used. does not. Therefore, the chi-square difference test was used
Positive items (e.g., contented) were reverse coded so that to compare the five- and one-factor models, and other fit
higher scores represented more negative affect. Internal indices (e.g., root-mean-square error of approximation
consistency of the scale was ␣ ⫽ .84 and .86 for Samples 1 [RMSEA] and goodness-of-fit index [GFI]) were used to
and 2, respectively. compare the five- and four-factor models. To limit the
Job performance. Participants provided self-ratings of complexity of reporting the results, we report the results
their job performance using Stewart and Barling’s (1996) using Sample 1 data only. However, we note that the results
20-item scale of interpersonal job performance. This scale were consistent in Sample 2 and are available on request.
was designed to measure performance in direct care roles, Fourth, the PHQ dimensions were correlated with mea-
and therefore a number of participants not in these roles did sures of psychological health and job performance, to assess
not provide ratings (resulting in a slightly smaller sample convergent and discriminant validity, respectively. Finally,
size for analyses in which this scale is used). A 7-point internal consistency analyses were conducted.
response scale was used, ranging from 1 (strongly disagree)
to 7 (strongly agree). Internal consistency of the scale was
␣ ⫽ .92 and .90 for Samples 1 and 2, respectively.
Results
Data Analysis Confirmatory Factor Analysis of PHQ Items
(Sample 1)
In this study, data analyses were carried out in the fol-
lowing steps. First, the replicability of the final model
determined via EFA in Study 1 was assessed using CFA,
Before conducting CFA on Samples 1 and 2, we
based on data from Sample 1. Specifically, the fit of the tested all assumptions and found them to be within
hypothesized four-factor oblique model was compared with acceptable limits. As occurred in Study 1, the amount
of missing data on Item 14 was higher than for the other PHQ items. In Sample 1, 8 respondents were missing data on Item 14, whereas no other item had more than 2 respondents with missing data. In Sample 2, 18 respondents were missing data on Item 14, whereas no other item had more than 2 respondents with missing data. In each sample, all but one of the Item 14 nonrespondents had endorsed the lowest option on Item 13. Therefore, using the procedure outlined in Study 1, values of 1 were imputed for these respondents, resulting in final samples of N = 194 and N = 222 for Samples 1 and 2, respectively. The case-to-variable ratios in these analyses exceed 9:1, meeting the general guidelines that have been suggested for CFA (between 5:1 and 10:1; Bentler & Chou, 1987).

We conducted CFA on the covariance matrix of the PHQ items using maximum likelihood estimation as implemented in LISREL 8 (Jöreskog & Sörbom, 1996). Fit indices for the four-factor oblique and orthogonal models and a one-factor model are presented in Table 2. As shown, the one-factor model provides poor fit to the data, χ²(77, N = 194) = 541.78, p < .001; RMSEA = .20, p < .001; GFI = .68, as does the four-factor orthogonal model, χ²(77, N = 194) = 272.08, p < .001; RMSEA = .12, p < .001; GFI = .82, whereas the four-factor oblique model provided adequate but not excellent fit, χ²(71, N = 194) = 148.43, p < .001; RMSEA = .075, p < .01; GFI = .90. CFA models rarely provide good fit to the data, because of the large number of parameter constraints inherent in such models (i.e., relatively few of the total number of possible paths are estimated). Therefore, assessment of model fit should involve considerations of comparative as well as absolute fit (Kelloway, 1998). Accordingly, because both alternative models we examined are nested within the four-factor oblique model, a chi-square difference test was conducted to determine which of the three models provided the best fit. These results showed that the hypothesized four-factor oblique model provided significantly better fit than the one-factor, χ²diff(6) = 393.35, p < .001, and four-factor orthogonal models, χ²diff(6) = 123.65, p < .001.

Standardized parameter estimates and associated R² values for the four-factor oblique model are presented in Table 3. As shown, all model parameters were significant (p < .001) and explained substantial item variance (R² ranged from .24 to .73). Parameter estimates representing the association between the four factors ranged from β = .33 to β = .55.

The fit of the hierarchical model was acceptable, χ²(73, N = 194) = 155.54, p < .001; RMSEA = .077, p < .01; GFI = .90; comparative fit index [CFI] = .93; and similar to that of the four-factor oblique model, providing support for the hierarchical structure of physical health. A graphical representation of this structure, including the standardized parameter estimates representing the relationships among variables within the model, is presented in Figure 1. As shown, all parameters are significant in the expected direction.

Multisample CFA of PHQ Items to Assess Measurement Invariance

To more rigorously assess the factor structure and validity of the PHQ, we examined the invariance of the hierarchical model’s parameters (i.e., first- and second-order factor loadings and factor covariances) between Samples 1 and 2. The model was first simultaneously estimated in both samples, χ²(146, n = 194 [Sample 1], n = 222 [Sample 2]) = 306.03, p < .001. Next, the parameters in Sample 1 were fixed to be equivalent to the parameters of Sample 2, χ²(164, n = 194 [Sample 1], n = 222 [Sample 2]) = 319.28, p < .001. The
Table 2
Fit Indices for the Confirmatory Factor Analysis Models of the Physical Health Questionnaire (Study 2, Sample 1)

Model                              χ²          df    RMSEA    GFI    CFI    PGFI
1-factor                           541.78**    77    .20**    .68    .59    .50
4-factor orthogonal                272.08**    77    .12**    .82    .83    .60
4-factor oblique (hypothesized)    148.43**    71    .075*    .90    .93    .61

Note. RMSEA = root-mean-square error of approximation; GFI = goodness-of-fit index; CFI = comparative fit index; PGFI = parsimonious goodness-of-fit index.
* p < .01. ** p < .001.
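
The nested-model comparisons reported for the Table 2 models can be reproduced from the tabled values alone. The sketch below is a minimal illustration, not the authors' LISREL procedure: it subtracts the chi-square and df of the less constrained model from those of the more constrained model and evaluates the difference against a chi-square distribution. The names `chi2_sf` and `chi2_difference` are hypothetical, and the tail probability is computed with a standard-library recurrence that assumes integer df.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Closed-form starting points: df = 1 uses erfc, df = 2 uses exp.
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(h))
    else:
        a, q = 1.0, math.exp(-h)
    # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q

def chi2_difference(chi2_restricted, df_restricted, chi2_full, df_full):
    """Return (chi-square difference, df difference, p value) for nested models."""
    diff = chi2_restricted - chi2_full
    ddf = df_restricted - df_full
    return diff, ddf, chi2_sf(diff, ddf)

# (chi-square, df) pairs copied from Table 2.
table2 = {
    "1-factor": (541.78, 77),
    "4-factor orthogonal": (272.08, 77),
    "4-factor oblique": (148.43, 71),
}

full_chi2, full_df = table2["4-factor oblique"]
for name in ("1-factor", "4-factor orthogonal"):
    chi2, df = table2[name]
    diff, ddf, p = chi2_difference(chi2, df, full_chi2, full_df)
    print(f"{name} vs. oblique: chi2diff({ddf}) = {diff:.2f}, p = {p:.3g}")
```

The same helper applies to any pair of nested models, provided the more constrained model is passed first.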
Table 3
Standardized Parameter Estimates and R² Values for Confirmatory Factor Analysis of Physical Health Questionnaire Items (Study 2, Sample 1)

                                                            Gastrointestinal             Sleep        Respiratory
Item                                            M     SD    Problems          Headaches  Disturbance  Infections   R²
1. [...] difficulty getting to sleep at night?  4.79  1.55                               .75                       .57
2. How often have you woken up during the
   night?                                       4.05  1.73                               .85                       .71
3. [...] dreams?                                5.59  1.42                               .58                       .34
4. How often has your sleep been peaceful and
   undisturbed?                                 4.53  1.53                               .67                       .45
5. [...] experienced headaches?                 4.53  1.63                    .73                                  .53
6. [...] headache when there was a lot of
   pressure on you to get things done?          4.72  1.72                    .85                                  .73
7. [...] frustrated because things were not
   going the way they should have or when you
   were annoyed at someone?                     5.36  1.54                    .85                                  .72
8. How often have you suffered from an upset
   stomach (indigestion)?                       5.10  1.64  .85                                                    .73
9. How often did you have to watch that you
   ate carefully to avoid stomach upsets?       5.38  1.75  .83                                                    .69
10. How often did you feel nauseated (“sick to
    your stomach”)?                             5.53  1.40  .79                                                    .63
11. How often were you constipated or did you
    suffer from diarrhea?                       5.28  1.60  .59                                                    .34
12. How many times have you had minor colds
    (that made you feel uncomfortable but
    didn’t keep you sick in bed or make you
    miss work)?                                 4.52  1.69                                            .74          .54
13. How many times have you had respiratory
    infections more severe than minor colds that
    “laid you low” (such as bronchitis,
    sinusitis, etc.)?                           5.97  1.34                                            .70          .49
14. When you had a bad cold or flu, how long
    did it typically last?                      4.07  1.89                                            .49          .24

Note. All parameters were significant at p < .001. [...] marks item wording that is missing from this copy of the table.
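
In a standardized solution in which each item loads on a single factor, an item's R² equals the square of its standardized loading, so the two value columns of Table 3 can be cross-checked against each other. The sketch below does this with the tabled values. The variable names and the 0.02 tolerance are illustrative assumptions; the tolerance allows for the published loadings being rounded to two decimals.

```python
# (standardized loading, reported R2) for Items 1-14, copied from Table 3.
table3 = [
    (.75, .57), (.85, .71), (.58, .34), (.67, .45),   # Sleep Disturbance
    (.73, .53), (.85, .73), (.85, .72),               # Headaches
    (.85, .73), (.83, .69), (.79, .63), (.59, .34),   # Gastrointestinal Problems
    (.74, .54), (.70, .49), (.49, .24),               # Respiratory Infections
]

TOLERANCE = 0.02  # assumed allowance for two-decimal rounding

def implied_r2(loading):
    """Variance explained in an item by its single factor (standardized metric)."""
    return loading ** 2

for item, (loading, reported) in enumerate(table3, start=1):
    gap = abs(implied_r2(loading) - reported)
    status = "ok" if gap <= TOLERANCE else "check"
    print(f"Item {item:2d}: loading^2 = {implied_r2(loading):.3f}, "
          f"reported R2 = {reported:.2f} ({status})")
```

A row flagged "check" would point to a transcription problem rather than a substantive finding.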
difference of fit between these two models, χ²diff(18) = 13.25, ns, was nonsignificant, suggesting that the magnitudes of these parameters are invariant across the two samples.

An additional model, in which the item error variances were declared invariant (along with the parameters reflecting factor loadings and covariances), was also specified and its fit compared with the freely estimated model. Its overall fit was χ²(178, n = 194 [Sample 1], n = 222 [Sample 2]) = 347.89, p < .001. The results of the chi-square difference test, χ²diff(32) = 41.86, ns, showed a nonsignificant difference in the fit
[Figure 1: path diagram. The second-order factor Physical Health points to four first-order factors, each measured by its items. Standardized item loadings shown in the figure: Sleep Disturbance (Item 1 = .75, Item 2 = .85, Item 3 = .58, Item 4 = .66); Headaches (Item 5 = .73, Item 6 = .86, Item 7 = .85); Gastrointestinal Problems (Item 8 = .85, Item 9 = .83, Item 10 = .79, Item 11 = .59); Respiratory Infections (Item 12 = .76, Item 13 = .67, Item 14 = .48). The four second-order loadings shown are .55, .74, .71, and .66.]

Figure 1. Graphical representation of the hierarchical structure of the Physical Health Questionnaire (based on the confirmatory factor analysis of Study 2, Sample 1 data). All parameters are significant at p < .001. The item numbers indicated in the figure are consistent with the item numbers indicated in the Appendix.
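
The degrees of freedom reported for the Study 2 models, including the hierarchical model and its two-sample invariance tests, follow from counting sample moments and free parameters. The sketch below is an illustrative bookkeeping exercise only. It assumes 14 items and one common way of tallying free parameters (the source reports the resulting df, not the parameter breakdown), and `model_df` is a hypothetical helper.

```python
def model_df(n_items, n_free_params, n_groups=1):
    """df = distinct variances and covariances across groups minus free parameters."""
    moments_per_group = n_items * (n_items + 1) // 2
    return n_groups * moments_per_group - n_free_params

N_ITEMS = 14                      # PHQ items
LOADINGS = 14                     # one first-order loading per item (assumed tally)
ERROR_VARIANCES = 14              # one error variance per item
FACTOR_COVARIANCES = 4 * 3 // 2   # covariances among four oblique factors
SECOND_ORDER_LOADINGS = 4         # paths from the general factor

df_one_factor = model_df(N_ITEMS, LOADINGS + ERROR_VARIANCES)
df_orthogonal = model_df(N_ITEMS, LOADINGS + ERROR_VARIANCES)
df_oblique = model_df(N_ITEMS, LOADINGS + ERROR_VARIANCES + FACTOR_COVARIANCES)

hier_params = LOADINGS + ERROR_VARIANCES + SECOND_ORDER_LOADINGS
df_hierarchical = model_df(N_ITEMS, hier_params)

# Two-sample versions of the hierarchical model.
df_free = model_df(N_ITEMS, 2 * hier_params, n_groups=2)
equal_loadings = LOADINGS + SECOND_ORDER_LOADINGS   # parameters held equal across samples
df_equal_loadings = model_df(N_ITEMS, 2 * hier_params - equal_loadings, n_groups=2)
df_equal_all = model_df(N_ITEMS, 2 * hier_params - hier_params, n_groups=2)

for label, value in [
    ("1-factor", df_one_factor),
    ("4-factor orthogonal", df_orthogonal),
    ("4-factor oblique", df_oblique),
    ("hierarchical", df_hierarchical),
    ("two-sample, freely estimated", df_free),
    ("two-sample, loadings equal", df_equal_loadings),
    ("two-sample, loadings and errors equal", df_equal_all),
]:
    print(f"{label}: df = {value}")
```

Under these assumptions the counts agree with the df values reported in the text, and each equality constraint imposed across samples adds one degree of freedom.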
between these models, suggesting that the parameters reflecting the factor loadings, factor covariances, and error variances are invariant across the two samples. This is particularly strong evidence of measurement invariance (Vandenberg, 2002) and provides additional validity evidence for the PHQ.

CFA of PHQ and Negative Affect Items (Sample 1)

To further assess the discriminant validity of the PHQ, we compared the fit of a series of CFA models comprising the four PHQ factors and a measure of negative affect. Fit indices for the tested models are presented in Table 4. As expected, the five-factor oblique model demonstrated significantly better fit to the data than the one-factor model, χ²diff(10) = 584.28, p < .001. Also, as shown in Table 4, the fit of the less constrained four-factor model (in which the negative affect items load on all four PHQ factors) was generally similar to the five-factor model on a number of the fit indices (e.g., chi-square, GFI). However, the latter had a lower χ²/df ratio (2.20) and better parsimonious fit indices (parsimonious goodness-of-fit index [PGFI] = .64; parsimonious normed fit index [PNFI] = .68) than the former (χ²/df ratio = 2.41; PGFI = .58; PNFI = .62), suggesting that although negative affect and the PHQ are correlated, allowing negative affect items to directly load on the PHQ factors does not improve, and actually
Table 4
Fit Indices for the Confirmatory Factor Analysis Models Including the Physical Health Questionnaire (PHQ) and Negative Affect Items

Model        χ²          df     RMSEA     GFI    CFI    PGFI
1-factorᵃ    937.07**    170    .170**    .62    .52    .50
4-factorᵇ    352.54**    146    .089**    .84    .87    .58
5-factorᶜ    352.79**    160    .084**    .83    .88    .64

Note. RMSEA = root-mean-square error of approximation; GFI = goodness-of-fit index; CFI = comparative fit index; PGFI = parsimonious goodness-of-fit index.
ᵃ This model comprises all PHQ and negative affect items fixed to load on a single factor. ᵇ This model comprises the four PHQ factors, with negative affect items fixed to load on all four factors. ᶜ This model comprises the four PHQ factors and the negative affect items fixed to load on separate factors.
** p < .001.
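
The χ²/df ratios cited for the four- and five-factor models, and the chi-square difference between the nested one- and five-factor models, can be recomputed from Table 4. This is a small arithmetic sketch. The dictionary layout and the function name `normed_chi2` are illustrative assumptions.

```python
# Values copied from Table 4.
table4 = {
    "1-factor": {"chi2": 937.07, "df": 170, "pgfi": .50},
    "4-factor": {"chi2": 352.54, "df": 146, "pgfi": .58},
    "5-factor": {"chi2": 352.79, "df": 160, "pgfi": .64},
}

def normed_chi2(model):
    """Chi-square divided by its degrees of freedom (lower suggests better fit)."""
    return model["chi2"] / model["df"]

for name, model in table4.items():
    print(f"{name}: chi2/df = {normed_chi2(model):.2f}, PGFI = {model['pgfi']:.2f}")

# Only the 1-factor and 5-factor models are nested, so only they are differenced.
chi2_diff = table4["1-factor"]["chi2"] - table4["5-factor"]["chi2"]
df_diff = table4["1-factor"]["df"] - table4["5-factor"]["df"]
print(f"1-factor vs. 5-factor: chi2diff({df_diff}) = {chi2_diff:.2f}")
```

Because the four-factor model is not nested within the five-factor model, the ratio and the parsimony-adjusted indices are the basis for comparing those two.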
reduces, model fit. These results demonstrate that negative affect and physical health (as measured by the PHQ) are associated but distinct constructs. Standardized parameter estimates representing the association between negative affect and the PHQ dimensions were β = .47 (Sleep Disturbance), β = .33 (Headaches), β = .38 (Gastrointestinal Problems), and β = .39 (Respiratory Illness).⁵ These associations support our hypotheses and provide convergent validity evidence for the PHQ.

Correlations Between the PHQ and GHQ

Zero-order correlations between psychological health and the PHQ dimensions were examined and, as hypothesized, revealed significant associations between these variables in both samples. Specifically, the correlations between psychological health and the four PHQ dimensions for Samples 1 and 2 were, respectively, .36 and .45 (Gastrointestinal Problems), .43 and .42 (Headaches), .50 and .62 (Sleep Disturbance), and .27 and .23 (Respiratory Infections).

Correlations Between the PHQ and Job Performance

Zero-order correlations between the PHQ dimensions and self-reported job performance were also examined.⁶ As expected, no significant correlations emerged. Specifically, the correlations between these variables were, for Samples 1 and 2, respectively, .10 and .002 (Gastrointestinal Problems), .06 and −.04 (Headaches), .05 and .001 (Sleep Disturbance), and .09 and −.12 (Respiratory Infections). The lack of correlation provides discriminant validity evidence for the PHQ.

⁵ The zero-order correlations between these variables are .43, .29, .35, and .31 (all significant at p < .001).

⁶ Many items on the job performance scale would not apply to employees who have no contact with patients or clients, resulting in somewhat higher levels of missing data on this scale than others in the study. Therefore, these analyses are based on sample sizes of N = 182 for Sample 1 and N = 193 for Sample 2.

Internal Consistency Analyses

Internal consistency analyses of the four PHQ subscales revealed the following Cronbach’s alpha values for Samples 1 and 2, respectively: .84 and .86 for the Gastrointestinal Problems subscale, .84 and .86 for the Headaches subscale, .79 and .84 for the Sleep Disturbance subscale, and .66 and .61 for the Respiratory Infections subscale.

Discussion

The results of this study extend the results of Study 1 in several ways. First, the CFAs replicated the factor structure of the PHQ that was observed using EFA in Study 1. The hierarchical CFA results extended these findings by demonstrating that a hierarchical solution, in which the four types of somatic symptoms represent first-order factors and general physical health represents the second-order factor, provides a good representation of the structure of the PHQ. Furthermore, this solution was cross-validated by demonstrating the invariance of the parameters in samples of social service employees and health care workers. Second, the CFAs including negative affect and PHQ items demonstrated the superior fit of the
model in which the four PHQ dimensions and negative affect represented correlated yet empirically distinct factors, providing construct validity evidence. Third, the significant correlations between the GHQ and PHQ dimensions and the nonsignificant correlations between the measure of job performance and the PHQ dimensions provide additional evidence of construct validity. Finally, the reliability analyses showed that three of the four subscales exhibited internal consistency coefficients of α = .79 or greater.

The most significant limitation in these results with respect to the validity of the PHQ is the relatively low levels of internal consistency exhibited by the items on the Respiratory Illness factor (α = .66 in the Study 1 sample and .66 and .61 in Samples 1 and 2 of Study 2, respectively). We believe that one explanation for the low reliability lies with the nature of the response format used for these three items. As shown in the Appendix, the response options for these items differ from one another and those of the other 11 items on the scale. This may create some difficulty for respondents, resulting in increased levels of measurement error in their responses and, in turn, attenuating internal consistency of these items. In addition, we also discussed the concern with higher levels of missing data occurring for Item 14 as compared with the other scale items and suggested that this was also due to the nature of the response options. In Study 3 we sought to address these concerns by investigating the impact of modifications to the wording of these items and their response options.

Study 3

The purpose of this study was to address a limitation of the PHQ that emerged from the results of Studies 1 and 2. In both of these studies, the 3 items reflecting respiratory illness exhibited unacceptable psychometric properties. One of these items (Item 14) was particularly problematic, as its rate of missing data was higher than other scale items. In this study, we revised the wording of the 3 items on this scale to allow for the use of the same response format as used for the other scale items and conducted a series of analyses to assess the impact of this revision on the psychometric properties of the PHQ and, in particular, on the 3 items composing the Respiratory Illness subscale.

Method

Participants

This study used data from two samples. Both samples comprised students enrolled in introductory psychology classes at the University of Guelph (in Ontario, Canada) who were invited to participate in research (or complete a small assignment if they preferred to not participate in research) in exchange for course credit. The survey included measures of a number of variables related to students’ employment experiences, attitudes, behavior, and well-being. Data were collected from two different samples of students, 1 year apart. Sample 1 consisted of 136 participants, 82.9% of whom were women. The average age was 20.41 years (SD = 3.68). Participants came from various programs, the most prevalent of which were psychology (44.7%), biological sciences (9.8%), family and social relations (6.5%), and commerce (5.7%). Seven respondents did not provide complete data on the PHQ, resulting in a final sample of N = 129.

Sample 2 consisted of 198 participants, 81.8% of whom were women. The average age was 19.38 years (SD = 3.05). As with Sample 1, students were enrolled in a variety of programs, including biological sciences (17.3%), psychology (16.3%), family and social relations (8.7%), and English (6.1%). Following the deletion of 11 respondents with missing data on the PHQ, the final sample was N = 187.

Measure

The PHQ, with revisions made to the wording and response format of Items 12–14, was used to assess somatic health. The wording of these items was changed so that the response options used for the other scale items, a 7-point scale ranging from 1 (not at all) to 7 (all of the time), would also apply to Items 12–14. Below we provide the revised items:

12. How often have you had minor colds (that made you feel uncomfortable but didn’t keep you sick in bed or make you miss work/school)?

13. How often have you had respiratory infections more severe than minor colds (such as bronchitis, sinusitis, etc.) that “laid you low”?

14. When you have a bad cold or flu, how often does it last longer than it should?

Data Analysis

To assess the impact of these revisions, we investigated three issues. First, to determine if the revisions alleviate the concerns with missing data on Item 14, we examined the pattern of response and nonresponse to this item. Second, we conducted multisample CFA to cross-validate the factor structure of the PHQ that was demonstrated in Studies 1 and 2. Finally, we assessed the internal consistency reliability of the PHQ subscales, focusing on the Respiratory Illness
subscale to determine whether the wording changes improved its internal consistency.

Results

Examination of Pattern of Missing Data

In Sample 1 there was only one case with missing data on Item 14, and in Sample 2 there were no cases with missing data. Across all PHQ items, in Sample 1, there were 8 items with no missing data (Items 1, 2, 4, 7, 8, 9, 11, and 12), 4 items with one case missing data (Items 3, 10, 13, and 14), and 2 items with two cases missing data (Items 5 and 6). In Sample 2, there were 5 items with no missing data (Items 8–11 and 14), 7 items with one case missing data (Items 1, 3, 4, 5, 7, 12, and 13), 1 item with two cases missing data (Item 6), and 1 item with three cases of missing data (Item 2). Therefore, the data suggest that the revision to Item 14 reduced the rate of nonresponse to this item. Moreover, the pattern of missing data across all PHQ items appears to be nonsystematic.

Multisample Hierarchical CFA of PHQ Items

Before conducting the CFA on Samples 1 and 2, we examined the data and found no violations of the assumptions underlying the analyses. In addition, the case-to-variable ratios of the analyses conducted are within the acceptable range (e.g., between 5:1 and 10:1; Bentler & Chou, 1987). The CFAs were based on the covariance matrix of the PHQ items and were conducted using maximum likelihood estimation as implemented in LISREL 8 (Jöreskog & Sörbom, 1996). The model was first freely estimated in both samples exhibiting acceptable fit, χ²(146, n = 129 [Sample 1], n = 187 [Sample 2]) = 280.53, p < .001; RMSEA = .072, ns; GFI = .92; CFI = .94, with all hypothesized parameters significant in the expected directions. A second model was then estimated with the Sample 2 parameters (factor loadings, covariances, and error variances) fixed to be invariant to those of Sample 1 and also exhibited acceptable fit, χ²(178, n = 129 [Sample 1], n = 187 [Sample 2]) = 322.55, p < .001; RMSEA = .065, ns; GFI = .91; CFI = .93. Model comparison showed a nonsignificant difference between the fit of these two models, χ²diff(32) = 42.02, ns, demonstrating the invariance of the parameters between the two samples.

Internal Consistency Analyses

Internal consistency analyses of the four PHQ subscales revealed the following Cronbach’s alpha values for Samples 1 and 2, respectively: .84 and .86 for the Gastrointestinal Problems subscale, .90 in both samples for the Headaches subscale, .81 in both samples for the Sleep Disturbance subscale, and .70 and .77 for the Respiratory Illness subscale. The values for the Respiratory Illness subscale are particularly notable as they represent a substantial improvement in reliability compared with the values observed in Studies 1 and 2, prior to the revision of these subscale items (α < .70 in all three samples).

Discussion

This study was undertaken to evaluate the effects of revisions made to the wording and response format of the Respiratory Illness items on the PHQ, the purpose of which was to address problems with missing data and reliability that emerged in Studies 1 and 2. Results of this study represent three important contributions of the revisions made to the PHQ Items 12–14 and, more generally, to the validity of the PHQ. First, the revisions contributed to a reduction in missing data on Item 14. Nonresponse to this item was problematic not only because it reduced sample size (and, therefore, statistical power) but, more significantly, because the pattern of nonresponse systematically excluded the data of healthier respondents (because the item was more likely to result in the nonresponse of someone experiencing none of the symptoms indicated by the item). Therefore, that the wording change led to a substantial reduction in missing data suggests that it should serve to enhance both sample size and representativeness in research using the revised PHQ.

Second, the results suggest that the wording revision improved the internal consistency of the Respiratory Illness subscale items. Prior to the revision, its internal consistency was less than α = .70, whereas following the revision it exceeded α = .70. This level of reliability suggests that the amount of error with which the PHQ measures the four dimensions of somatic health falls within conventionally acceptable levels.

Finally, the CFA results show that the hierarchical factor structure consisting of four first-order factors and one second-order general physical health factor extends to the revised PHQ. Furthermore, because the data in this study came from a sample of undergraduate students, the results also demonstrate the
generalizability of the structure of the PHQ to a population of undergraduate students (albeit a select population thereof).

General Discussion

This research was undertaken to evaluate the structure and psychometric properties of the PHQ. The factor analysis results across the three studies converged in suggesting that a model consisting of four first-order factors composed of Gastrointestinal Problems, Headaches, Sleep Disturbances, and Respiratory Illness, and a second-order factor composed of general physical health, most closely reflected the data. The EFA solution in Study 1 showed that the factors were empirically distinguishable and accounted for substantial amounts of item variance. The CFA results in Study 2 replicated the EFA results and further established the superiority of the model hypothesizing four oblique first-order factors and a higher-order general factor over orthogonal and common factor models.

The demonstration of the factor structure of the PHQ using EFA on a sample of health care staff (Study 1) and its cross-validation using CFA on samples of social services employees, health care staff (Study 2), and undergraduate students (Study 3) supports the robustness and generalizability of the scale’s factor structure and provides strong evidence of its construct validity. However, it remains for future research to examine whether the dimensionality of the PHQ generalizes to other populations and is equivalent for both men and women.

The analyses examining the relationships between negative affect and the PHQ dimensions demonstrated that they are significantly associated and empirically distinct. The significant associations between negative affect and physical health support our hypotheses and are consistent with previous research (e.g., Cohen, 1996; Watson & Clark, 1984). The CFA results also demonstrate that the PHQ factors are empirically distinguishable from negative affect, which is important given concerns that have appeared in the literature regarding the potential confounding influence of negative affect on self-reports of stress and strain (e.g., Brief et al., 1988).

The significant correlations between the PHQ dimensions and the GHQ and the lack of correlation between the PHQ and a measure of self-reported job performance provide additional evidence of the scale’s construct validity. The correlations between the PHQ dimensions and the GHQ ranged from .23 to .62, which are high enough to demonstrate meaningful association between the constructs, yet low enough to suggest that the PHQ is not redundant with the GHQ. On a more substantive level, these correlations suggest that it is meaningful and important to separately measure somatic and psychological health as two different manifestations of strain when a comprehensive understanding of participants’ reported health is required. The absolute values of the correlations between the PHQ dimensions and self-reported job performance ranged from .001 to .12, providing evidence of the PHQ’s discriminant validity.

Overall, the internal consistency analyses of the PHQ subscale items suggest that all four subscales can be reliably measured with the PHQ. In Studies 1 and 2, the Respiratory Illness subscale exhibited lower internal consistency than the other three subscales (i.e., α < .70). Following these studies, we posited that the different response format used for the items on this subscale was responsible for this low internal consistency. The results of Study 3, in which the effects of modifications to the items’ wording and response format were assessed, substantiated our position, as the internal consistency coefficients exceeded α = .70 in both samples.

Limitations and Directions for Future Research

The data in this study are based entirely on self-reports, which makes it impossible to objectively verify the presence of physical symptoms reported by participants using the PHQ. Despite this and related concerns with self-reports of physical symptoms, the available data suggest that alternatives to self-reports, such as physiological measures, are themselves fraught with limitations and may be less accurate than self-reports (Fried, Rowland, & Ferris, 1984). Furthermore, most stress process models (see Kahn & Byosiere, 1990, for a review) emphasize the importance of people’s perceptions and subjective experiences, which self-reports are often deemed to measure. Nonetheless, future research that correlates more objective measures of somatic health (e.g., medical records) with the PHQ dimensions would provide additional evidence of its construct validity.

Given the growing literature on the effects of survey characteristics including response format on survey responses (see Schwarz, 1999, for a review), research examining the effects and appropriateness of different response alternatives would also be useful. The present version of the PHQ uses general frequency-based response options, ranging from not at all to all of the time. Other response formats that merit investigation would include options based on specific frequencies (e.g., ranging from never to daily) or symptom intensity (e.g., ranging from mild to severe).

A number of investigators (e.g., Bollen & Lennox, 1991; Fayers, Hand, Bjordal, & Groenvold, 1997; Spector & Jex, 1998) have suggested the importance of distinguishing between causal and effect indicators in measurement. Traditional psychometric theory is based on the premise that the latent variable determines the indicators, and therefore, assumes that all items are effect indicators. With the causal approach, however, this is reversed, whereby the items determine the construct. Bollen and Lennox suggested that

As Spector and Jex (1998) noted, there are hundreds of measures used in the occupational health literature, yet very few articles in this literature on scale development and validation. Therefore, there are many measures being used for which limited validity evidence is available. Although comprehensive scale development and validation is desirable prior to widespread use of the scale, post hoc validation of existing measures is superior to the absence thereof, and we encourage this type of research on measures for which validity evidence is lacking. The accumulation of the evidence derived from such research is important on several levels: (a) It ensures that the measures used are valid; (b) it renders the knowledge derived from this research more accurate; and, in turn, (c) it increases the likelihood that interventions
the type of indicator on which a measure is based can based on such knowledge will be effective.
influence the type of measurement model that is used
for validation purposes. For example, they argued
that factor and reliability analyses are appropriate for Conclusion
effect but not causal indicator measures.
To date, the distinction between causal and effect Taken together, our results suggest that the PHQ is
indicators has not been widely adopted in the orga- a psychometrically sound instrument that can be used
nizational research literature (Spector & Jex, 1998), to measure four dimensions of somatic health: gas-
and the traditional psychometric approach continues trointestinal problems, headaches, sleep disturbances,
to be the norm for scale validation. Accordingly, we and respiratory illness. In addition, composed of only
elected to apply the traditional psychometric ap- 14 items, the PHQ is a rather brief measure of so-
proach in our assessment of the validity of the PHQ. matic health. With most organizational research stud-
We acknowledge, however, that a number of the ies involving complex, multivariate designs, instru-
PHQ items may be appropriately classified as causal ments that are both efficient to administer and
indicators, and we suggest that researchers should psychometrically sound are becoming increasingly
investigate the potential implications of this classifi- important (Stanton, Sinar, Balzer, & Smith, 2002).
cation for the PHQ and other organizational Although future research that further examines the
measures. psychometric properties of the scale is required, the
Despite the validity evidence that we present for PHQ makes an important contribution to research in
the PHQ, we note that this series of studies does not organizational behavior, occupational health psy-
fully establish its construct validity, and additional chology, social psychology, and other disciplines in
evidence of convergent and discriminant validity is which a valid self-report measure of physical health
needed. Establishing construct validity is a continu- is required.
ous process that unfolds over time with numerous
studies (e.g., Nunnally & Bernstein, 1994; Schwab,
References
1980), and therefore, a single study— or even several
studies such as we report here— can contribute lim- Banks, M. J., Clegg, C. W., Jackson, P. R., Kemp, N. J.,
ited validity evidence. The validity evidence we Stafford, E. M., & Wall, T. D. (1980). The use of the
present is based on data collected using a single General Health Questionnaire as an indication of mental
method (i.e., self-reports). According to the classic health in occupational settings. Journal of Occupational
Psychology, 53, 187–194.
guidelines suggested by Campbell and Fiske (1959), Barling, J., & Kelloway, E. K. (1996). Job insecurity and
further evidence of convergent and discriminant va- health: The moderating role of workplace control. Stress
lidity could be provided through the use of different Medicine, 12, 253–260.
methods. Indeed, we encourage additional validation Bartone, P. T., Ursano, R. J., Wright, K. M., & Ingraham,
L. H. (1989). The impact of a military air disaster on the
research on the PHQ and other occupational health health of assistance workers. Journal of Nervous and
measures whose construct validity and psychometric Mental Disease, 177, 317–328.
properties have received too little research attention. Bentler, P. M., & Chou, C. P. (1987). Practical issues in
structural equation modeling. Sociological Methods and Research, 16, 78–117.
Bollen, K., & Lennox, R. (1991). Conventional wisdom on measurement: A structural equation perspective. Psychological Bulletin, 110, 305–314.
Brief, A. P., Burke, M. J., George, J. M., Robinson, B., & Webster, J. (1988). Should negative affectivity remain an unmeasured variable in the study of job stress? Journal of Applied Psychology, 73, 193–199.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait–multimethod matrix. Psychological Bulletin, 56, 81–105.
Carayon, P., Smith, M. J., & Haims, M. C. (1999). Work organization, job stress, and work-related musculoskeletal disorders. Human Factors, 41, 644–663.
Cohen, S. (1996). Psychological stress, immunity, and upper respiratory infections. Current Directions in Psychological Science, 5, 86–90.
Cohen, S., & Herbert, T. B. (1996). Health psychology: Psychological factors and physical disease from the perspective of human psychoneuroimmunology. Annual Review of Psychology, 47, 113–142.
Cohen, S., & Williamson, G. M. (1991). Stress and infectious disease in humans. Psychological Bulletin, 109, 5–24.
Cooper, C. L., Sloan, S., & Williams, S. (1988). Occupational stress indicator. Windsor, England: NFER-Nelson.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.
Davis, M. C., Matthews, K. A., Meilahn, E. N., & Kiss, J. E. (1995). Are job characteristics related to fibrinogen levels in middle aged women? Health Psychology, 14, 310–318.
Evans, G. W., & Johnson, D. (2000). Stress and open-office noise. Journal of Applied Psychology, 85, 779–783.
Fayers, P. M., Hand, D. J., Bjordal, K., & Groenvold, M. (1997). Causal indicators in quality of life research. Quality of Life Research, 6, 393–406.
Floyd, F. J., & Widaman, K. F. (1995). Factor analysis in the development and refinement of clinical assessment instruments. Psychological Assessment, 7, 286–299.
Fried, Y., Rowland, K. M., & Ferris, G. R. (1984). The physiological measurement of work stress: A critique. Personnel Psychology, 37, 583–615.
Ganster, D. C., Fox, M. L., & Dwyer, D. J. (2001). Explaining employees’ health care costs: A prospective examination of stressful job demands, personal control and psychological reactivity. Journal of Applied Psychology, 86, 954–964.
Gerbing, D. W., & Hamilton, J. G. (1996). Viability of exploratory factor analysis as a precursor to confirmatory factor analysis. Structural Equation Modeling, 3, 62–72.
Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Erlbaum.
Herbert, T. B., & Cohen, S. (1993). Stress and immunity in humans: A meta-analytic review. Psychosomatic Medicine, 55, 364–379.
Hinkin, T. R. (1995). A review of scale development practices in the study of organizations. Journal of Management, 21, 967–988.
Jöreskog, K. G., & Sörbom, D. (1996). LISREL 8 user’s reference guide. Chicago: Scientific Software International.
Judd, C. M., & McClelland, G. H. (1998). Measurement. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), The handbook of social psychology (4th ed., Vol. 1, pp. 180–232). Boston: McGraw-Hill.
Kahn, R. L., & Byosiere, P. (1990). Stress in organizations. In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 3, pp. 571–650). Palo Alto, CA: Consulting Psychologists Press.
Karasek, R. A., & Theorell, T. (1990). Healthy work: Stress, productivity and the reconstruction of working life. New York: Basic Books.
Kelloway, E. K. (1998). Using LISREL for structural equation modeling: A researcher’s guide. Thousand Oaks, CA: Sage.
Kiecolt-Glaser, J. K., & Glaser, R. (1988). Psychological influences on immunity: Making sense of the relationship between stressful life events and health. In G. P. Chrousos, D. L. Loriaux, & P. W. Gold (Eds.), Mechanisms of physical and emotional stress (pp. 237–247). New York: Plenum Press.
Krantz, D. S., Contrada, R. J., Hill, D. R., & Friedler, E. (1988). Environmental stress and biobehavioral antecedents of coronary heart disease. Journal of Consulting and Clinical Psychology, 56, 333–341.
Lundberg, U., Dohns, I. E., Melin, B., Sandjo, L., Palmerud, G., Kadefors, R., et al. (1999). Psychophysiological stress responses, muscle tension, and neck and shoulder pain among supermarket cashiers. Journal of Occupational Health Psychology, 4, 245–255.
Manning, M. R., Jackson, C. N., & Fusilier, M. R. (1996). Occupational stress and health care use. Journal of Occupational Health Psychology, 1, 100–109.
Motowidlo, S. J., Packard, J. S., & Manning, M. R. (1986). Occupational stress: Its causes and consequences for job performance. Journal of Applied Psychology, 71, 618–629.
Nunnally, J. C., & Bernstein, I. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.
O’Leary, A. (1990). Stress, emotion, and human immune function. Psychological Bulletin, 108, 363–382.
Pratt, L. I., & Barling, J. (1988). Differentiating between daily events, acute and chronic stressors: A framework and its implications. In J. J. Hurrell Jr., L. R. Murphy, S. L. Sauter, & C. L. Cooper (Eds.), Occupational stress: Issues and developments in research (pp. 41–53). New York: Taylor & Francis.
Quick, J. C., Quick, J. D., Nelson, D. L., & Hurrell, J. J., Jr. (2001). Preventive stress management in organizations. Washington, DC: American Psychological Association.
Rasmussen, B. K. (1993). Migraine and tension-type headache in a general population: Precipitating factors, female hormones, sleep pattern and relation to lifestyle. Pain, 53, 65–72.
Rogers, K., & Kelloway, E. K. (1997). Violence at work: Personal and organizational outcomes. Journal of Occupational Health Psychology, 2, 63–71.
Schat, A. C. H., & Kelloway, E. K. (2000). The effects of perceived control on the outcomes of workplace aggression and violence. Journal of Occupational Health Psychology, 5, 386–402.
Schat, A. C. H., & Kelloway, E. K. (2003). Reducing the adverse consequences of workplace aggression and violence: The buffering effects of organizational support. Journal of Occupational Health Psychology, 8, 110–122.
Schaubroeck, J., Jones, J. R., & Xie, J. L. (2001). Individual differences in using control to cope with job demands: Effects on susceptibility to infectious disease. Journal of Applied Psychology, 86, 265–278.
Schwab, D. P. (1980). Construct validity in organizational behavior. In B. M. Staw & L. L. Cummings (Eds.), Research in organizational behavior (Vol. 2, pp. 3–43). Greenwich, CT: JAI Press.
Schwartz, J. E., Pickering, T. G., & Landsbergis, P. A. (1996). Work-related stress and blood pressure: Current theoretical models and considerations from a behavioral medicine perspective. Journal of Occupational Health Psychology, 1, 287–310.
Schwarz, N. (1999). Self-reports: How the questions shape the answers. American Psychologist, 54, 93–105.
Spector, P. E., & Jex, S. M. (1998). Development of four self-report measures of job stressors and strain: Interpersonal Conflict at Work Scale, Organizational Constraints Scale, Quantitative Workload Inventory, and Physical Symptoms Inventory. Journal of Occupational Health Psychology, 3, 356–367.
Spector, P. E., Zapf, D., Chen, D. Y., & Frese, M. (2000). Why negative affectivity should not be controlled in job stress research: Don’t throw out the baby with the bath water. Journal of Organizational Behavior, 21, 79–95.
Spence, J. T., Helmreich, R. L., & Pred, R. S. (1987). Impatience versus achievement strivings in the Type A pattern: Differential effects on students’ health and academic performance. Journal of Applied Psychology, 72, 522–528.
Stanton, J. M., Sinar, E. F., Balzer, W. K., & Smith, P. C. (2002). Issues and strategies for reducing the length of self-report scales. Personnel Psychology, 55, 167–194.
Stewart, W., & Barling, J. (1996). Daily work stress, mood, and interpersonal job performance: A mediational model. Work and Stress, 10, 336–351.
Stone, G. C., Reed, B. R., & Neale, J. M. (1987). Changes in daily event frequency precedes episodes of physical symptoms. Journal of Human Stress, 13, 70–74.
Tabachnick, B. G., & Fidell, L. S. (1996). Using multivariate statistics (3rd ed.). New York: HarperCollins.
Vandenberg, R. J. (2002). Toward a further understanding of and improvement in measurement invariance methods and procedures. Organizational Research Methods, 5, 139–158.
Warr, P. (1990). The measurement of well-being and other aspects of mental health. Journal of Occupational Psychology, 63, 193–210.
Watson, D., & Clark, L. A. (1984). Negative affectivity: The disposition to experience aversive emotional states. Psychological Bulletin, 96, 465–490.

Appendix
Physical Health Questionnaire
The following items focus on how you have been feeling physically during the past [period of time]. Please respond by
circling the appropriate number.
Response options for Items 1–11: 1 = Not at all, 2 = Rarely, 3 = Once in a while, 4 = Some of the time, 5 = Fairly often, 6 = Often, 7 = All of the time.

Over the past [period of time] . . .
1. [...] difficulty getting to sleep at night? 1 2 3 4 5 6 7
2. How often have you woken up during the night? 1 2 3 4 5 6 7
3. [...] dreams? 1 2 3 4 5 6 7
4. How often has your sleep been peaceful and undisturbed? 1 2 3 4 5 6 7
5. [...] experienced headaches? 1 2 3 4 5 6 7
6. [...] headache when there was a lot of pressure on you to get things done? 1 2 3 4 5 6 7
7. [...] frustrated because things were not going the way they should have or when you were annoyed at someone? 1 2 3 4 5 6 7
8. How often have you suffered from an upset stomach (indigestion)? 1 2 3 4 5 6 7
9. How often did you have to watch that you ate carefully to avoid stomach upsets? 1 2 3 4 5 6 7
10. How often did you feel nauseated (“sick to your stomach”)? 1 2 3 4 5 6 7
11. How often were you constipated or did you suffer from diarrhea? 1 2 3 4 5 6 7
12. How many times have you had minor colds (that made you feel uncomfortable but didn’t keep you sick in bed or make you miss work)? 0 times, 1–2 times, 3 times, 4 times, 5 times, 6 times, 7+ times
13. How many times have you had respiratory infections more severe than minor colds that “laid you low” (such as bronchitis, sinusitis, etc.)? 0 times, 1–2 times, 3 times, 4 times, 5 times, 6 times, 7+ times
14. When you had a bad cold or flu, how long did it typically last? 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7+ days
Note. Item 4 should be reverse scored. This version of the Physical Health Questionnaire was used in Studies 1 and 2. The
wording of and response alternatives for Items 12–14 were revised for Study 3 (see text for details).
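
As an illustration of the scoring implied by the note above, the following Python sketch reverse scores Item 4 and averages items within each of the four dimensions. Several details are assumptions rather than a scoring key given in the text: the item-to-dimension mapping is inferred from the item wording (Items 1–4 sleep, 5–7 headaches, 8–11 gastrointestinal, 12–14 respiratory), every item is coded by the position of the chosen response option (1 to 7), and subscale scores are item means rather than sums. The function name `score_phq` is hypothetical.

```python
from statistics import mean

# Item-to-subscale mapping: an assumption inferred from the item wording
# in the Appendix, not a scoring key given in the text.
SUBSCALES = {
    "sleep_disturbances": (1, 2, 3, 4),
    "headaches": (5, 6, 7),
    "gastrointestinal_problems": (8, 9, 10, 11),
    "respiratory_illness": (12, 13, 14),
}
REVERSE_SCORED = {4}          # stated in the Appendix note
SCALE_MIN, SCALE_MAX = 1, 7   # seven response positions per item


def score_phq(responses):
    """Return one mean score per subscale for a single respondent.

    `responses` maps item number (1-14) to the position of the chosen
    response option (1-7). Averaging items is an assumption of this sketch.
    """
    scored = {}
    for items in SUBSCALES.values():
        for item in items:
            if item not in responses:
                raise ValueError(f"missing response for item {item}")
            value = responses[item]
            if not SCALE_MIN <= value <= SCALE_MAX:
                raise ValueError(f"item {item}: {value} is outside 1-7")
            # Reverse scoring flips the scale: 1 becomes 7, 2 becomes 6, ...
            if item in REVERSE_SCORED:
                value = SCALE_MIN + SCALE_MAX - value
            scored[item] = value
    return {
        name: mean(scored[item] for item in items)
        for name, items in SUBSCALES.items()
    }


if __name__ == "__main__":
    # Hypothetical respondent: lowest option everywhere except Item 4.
    example = {item: 1 for item in range(1, 15)}
    example[4] = 7  # peaceful sleep "all of the time", reversed to 1
    print(score_phq(example))
```

Under these assumptions, higher subscale scores indicate more frequent symptoms. The sketch applies to the version of the questionnaire shown here; the revised Items 12–14 from Study 3 may call for different coding.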
Received April 10, 2003

Revision received July 11, 2004
Accepted April 18, 2005

