5 authors, including:
Johannes Kruse
Justus-Liebig-Universität Gießen
All content following this page was uploaded by Johannes Kruse on 08 September 2014.
ORIGINAL PAPER
Abstract Background: The treatment of mental disorders in Germany is mainly done by primary care physicians. Several studies have shown that primary care physicians have difficulty in diagnosing these disorders. Recently, several self-report questionnaires have been developed that can be used as screening instruments to identify psychopathology in primary care settings and in the community. The aim of this paper was to investigate the screening properties of the General Health Questionnaire (GHQ-12) and the Symptom Check-List (SCL-90-R) in a primary care setting in Germany. Method: A randomly selected sample (n = 408) of adult outpatients from 18 primary care offices in Düsseldorf was screened using the German versions of the GHQ-12 and the SCL-90-R. A structured diagnostic interview (SCID) and an impairment rating (IS) were used as a gold standard to which both questionnaires were compared. Test performance was evaluated by receiver operating characteristic (ROC) analysis. Results: We found no difference in the performance of the general scores of the two questionnaires. Both instruments were able to detect cases. Complex scoring methods offered no advantages over simpler ones for the GHQ-12. ROC analysis confirmed that the SCL-90-R subscales "anxiety" and "depression" showed acceptable concurrent validity for the diagnostic groups anxiety and depression (according to DSM-III-R). Conclusions: GHQ-12 and SCL-90-R appeared to be useful tools for identifying mental disorders in primary care practice and research. The use of GHQ-12 or SCL-90-R, employed as a first step, supplemented by a second-stage interview, may enhance the detection rate of mental disorder in primary care settings.

N. Schmitz (✉) · J. Kruse · C. Heckrath · L. Alberti · W. Tress
Clinic for Psychosomatic Medicine and Psychotherapy, Heinrich-Heine University, Bergische Landstrasse 2, H19, D-40605 Düsseldorf, Germany
e-mail: schmitzn@uni-duesseldorf.de, Tel.: +49-211-9224723, Fax: +49-211-9224709
* Grant support for this investigation was given by the German Federal Ministry for Research and Technology (BMFT).

Introduction

The problem of identifying mental disorders is increasingly recognised as an important health care issue. Mental disorders result in substantial patient suffering and health care cost and are present in primary care in at least 20-36% of primary care outpatients (Spitzer et al. 1994; Tress et al. 1997). In fact, more patients with mental disorders are cared for in the primary care sector than in the mental health sector (e.g. Manderscheid et al. 1993). However, several studies have shown that primary care physicians have difficulty in diagnosing these disorders in the majority of patients, who usually present with somatic symptoms suggestive of a medical condition, while volunteering few psychological complaints. As Spitzer et al. (1994) pointed out, major obstacles to the recognition of mental disorders by primary care physicians include inadequate knowledge of the diagnostic criteria, uncertainty about the best question to ask for evaluating whether those criteria are met, and time limitations inherent in a busy office setting. Katzelnick et al. (1997) have shown that identification and treatment of mental disorders in primary care can reduce disability and health care utilisation and improve quality of life.
Recently, several self-report questionnaires have been developed that can be used as screening instruments to identify psychopathology in primary care settings and in the community. Examples are the General Health Questionnaire (GHQ, Goldberg 1972) and the Symptom Check-List (SCL-90-R, Derogatis 1977; Franke 1995). Both SCL-90-R and GHQ are well-researched instruments and are frequently used for case identification (e.g. Gureje and Obikoye 1990; Koeter 1992; Witnitzer et al. 1992; Lykouras et al. 1996). Once an appropriate
cut-off point has been chosen, the questionnaires can be used as screening devices for the detection of psychological distress.
A valid comparison between the German version of the GHQ-12 and the SCL-90-R cannot be made because to date the two instruments have not been examined simultaneously using the same case criterion and the same population.
The aim of the present paper was to compare the criterion validity of SCL-90-R (90 items) with the criterion validity of a short form of GHQ (GHQ-12, 12 items) in a primary care setting, while the external criterion is the presence or absence of psychopathology as indicated by an experienced psychotherapist. Receiver operating characteristic (ROC) analysis was used as an evaluation technique for these two tests, using data from an epidemiological investigation in primary care (Tress et al. 1997).

Impairment Score

The Impairment Score (IS, German: BSS, Schepank 1995; Franz et al. 1997) is a standardised instrument which allows trained and clinically experienced interviewers to assess the severity, ranging from "not at all" (0) to "extremely" (4), of clinically present psychological impairment on three subscales: physical, psychic, and socio-communicative (behaviour) impairment due only to mental disorders (not caused by somatic reasons). The total range of the sum-score is from 0 (no impairment) up to the maximum value 12 (extreme impairment); patients with a sum-score of 4, 5 or 6 can be described as medium symptomatic while patients with a score above 7 can be described as severely symptomatic.

Structured Clinical Interview

The Structured Clinical Interview (SCID, Wittchen et al. 1990) for the Diagnostic and Statistical Manual of Mental Disorders (DSM-III-R) is a diagnostic instrument that is widely used by researchers and clinicians in order to guide the diagnostic evaluation process. SCID ascertains the presence and severity of psychological signs and symptoms in the 4 weeks prior to the interview.
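The Impairment Score sum-score described above can be illustrated with a small sketch. The function names are hypothetical, and the handling of scores outside the two stated bands is an assumption: the text names a band for 4 to 6 and for scores above 7 only, so scores of 0 to 3 and exactly 7 are returned as unspecified rather than guessed.

```python
def impairment_sum_score(physical: int, psychic: int, socio_communicative: int) -> int:
    """Add the three subscale ratings, each from 0 ("not at all") to 4 ("extremely")."""
    ratings = (physical, psychic, socio_communicative)
    for rating in ratings:
        if rating not in (0, 1, 2, 3, 4):
            raise ValueError("each subscale rating must be an integer from 0 to 4")
    return sum(ratings)  # ranges from 0 (no impairment) to 12 (extreme impairment)


def describe_sum_score(sum_score: int) -> str:
    """Map a sum-score to the descriptive bands stated in the text."""
    if not 0 <= sum_score <= 12:
        raise ValueError("sum-score must lie between 0 and 12")
    if 4 <= sum_score <= 6:
        return "medium symptomatic"
    if sum_score > 7:
        return "severely symptomatic"
    # Assumption: the text gives no label for 0-3 or for exactly 7.
    return "not specified in the text"


if __name__ == "__main__":
    total = impairment_sum_score(physical=1, psychic=2, socio_communicative=2)
    print(total, describe_sum_score(total))
```

The sketch only covers the arithmetic of the sum-score; the ratings themselves come from a trained interviewer and cannot be derived from questionnaire data.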
Sensitivity is defined as the number of true cases of a disorder detected by the test (true-positives) divided by the number of all diseased subjects. Conversely, specificity is defined as the number of true-negatives (nondiseased subjects who were considered negative by the test), divided by the number of all nondiseased subjects. Positive predictive value is defined as the proportion of test-positive subjects who are true cases, while negative predictive value is defined as the proportion of test-negative subjects who are nondiseased. The predictive values are clinically useful but depend very strongly on prevalence.
With the introduction of receiver operating characteristic (ROC) analysis, an innovative method has become available for the graphic description of the relationship between sensitivity and specificity and their relationship to different cut-off points. A ROC curve is obtained by combining sensitivity and specificity data over all cut-off points. At each cut-off point, sensitivity is plotted as a function of 1 - specificity (or false positive rate). The points can be connected by a smooth curve. For a perfectly accurate test (sensitivity = 1, specificity = 1), the ROC curve rises to the point (0, 1) and continues as a horizontal line to (1, 1). For a random test whose discriminatory ability is no better than chance, the ROC curve will be a diagonal line connecting the points (0, 0) and (1, 1). This has been referred to as the "line of no information". Most actual tests will produce curves lying between these two extremes. The higher the sensitivity and specificity at various cut-off points, the more closely the curve approaches the upper left corner of the graph. A measure used for the overall performance of an instrument is the area under the curve (AUC). Parametric and nonparametric methods exist that allow the calculation of the AUC and the comparison of tests. In this study, we used a computer program (ROCFIT, Metz et al. 1993), which fits a curve by a maximum likelihood technique.

Results

Preliminary analysis showed that there was a linear relationship between the GHQ-12 and the SCL-90-R. The correlation was 0.64 for the total scores, suggesting that a general factor for the two instruments may be present.
Figure 1 shows the distribution of the Global Severity Index (GSI) for the cases and non-cases. As expected, there are large overlapping parts of the empirical distributions, although the distributions are different for the first and second moments. As shown in Fig. 2, similar results can be found for the GHQ General Score.
Differences in the total scores can be found for the cases regarding gender: women are characterised by higher scores for the global indices (SCL-90-R: female mean = 0.94 (SD = 0.65), male mean = 0.71 (SD = 0.44), t = 2.01, df = 140, P = 0.037; GHQ: female mean = 5.07 (SD = 4.12), male mean = 3.68 (SD = 3.42), t = 1.91, df = 140, P = 0.059), while there are no significant differences for the population of non-cases.
The most prevalent items of the SCL-90-R and the GHQ are shown in Table 2. To make the prevalence of the items comparable, the items of the SCL-90-R were dichotomised according to the scaling of the GHQ. Somatic symptoms seemed to be common in this sample, but were not more prevalent than psychological symptoms. Similar results were found by Araya et al. (1992) in a primary care sample in Chile.
Figure 3 presents the performance of the SCL-90-R and GHQ-12 total scores as screening instruments for mental disorders in primary care. The curves were obtained by plotting sensitivity against false-positive rate for all cut-off points of the two screening tests and fitting a smooth maximum likelihood curve. Both scores show acceptable concurrent validity (i.e. ROC curves well above the diagonal line of no information). Moreover, the ROC curve for the SCL-90-R total score (GSI) is nearly identical to that of the GHQ total score.
A numerical presentation for sensitivity, specificity and predictive values is provided in Table 3 and Table 4. The differences between SCL-90-R and GHQ were small in this German primary care sample. A two-sample Z-test (two-tailed) was applied to test the difference for statistical significance (Erdreich and Lee 1981). There was no significant difference between the areas under the curves [SCL-90-R: mean AUC = 0.75 (SD = 0.026); GHQ: mean AUC = 0.73 (SD = 0.028); Z = 0.73, P = 0.464]. No significant difference was found between ROC curves for GHQ bimodal and Likert scaling procedure. Differences between the performance of the two questionnaires were, therefore, probably due to chance.
In a second step we examined validity of the SCL-90-R anxiety and depression subscales. Validity was established with the DSM-III-R diagnosis for anxiety (300.00-02; 300.21-23; 300.29-30; 309.89) and depression (296.20-23; 296.29; 300.40; 311.00) disorders. Both scales show acceptable concurrent validity for the two diagnostic groups (i.e. ROC curves well above the main diagonal). Results are shown in Table 5. In comparison, the GHQ-12 was not designed to screen for different diagnostic groups. Although several studies have found a two-factor structure of the GHQ-12 (e.g. Gureje 1991), it is more appropriate to use the GHQ-12 as a global screening instrument. Alternatively, another version of the GHQ (GHQ-28; 28 items with anxiety and depression subscales) can be used to screen for depression and anxiety.

Discussion

This is the first study reporting on the comparative performance of the GHQ-12 and the SCL-90-R in a German primary care setting, using identical external criteria.
The major finding is that the GHQ-12 and the SCL-90-R general scores performed equally well in detecting mental disorders in a sample of 18 randomly selected primary care clinics in Düsseldorf, Germany. The small difference between the questionnaires in their ability to detect cases of psychological morbidity was not statistically significant. The two screening instruments were found to be easy to administer, although there was a difference in the time spent by patients to complete them (roughly 2-5 min for the GHQ-12 and 10-20 min for the SCL-90-R). The questions in both questionnaires were well understood by the respondents.
In our opinion, the SCL-90-R compared to the GHQ-12 has some definite advantages. The SCL-90-R covers a broad range of psychological problems and symptoms of psychopathology. A global index as a measure for general distress and nine primary symptom dimensions can be computed. The analysis of the anxiety and depression subgroups indicated that the subscales can be used as screening instruments, too. In contrast, the GHQ-12 contains only 12 items. As a consequence, only a general distress factor can be computed from the items.
On the other hand, there is often a time limitation in primary care clinics. Administration and evaluation of the SCL-90-R needs much more time than the application of the GHQ-12.
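The validity coefficients in Tables 3 and 4 can be related to the definitions of sensitivity, specificity and predictive values with a short sketch. It is an illustration only: the function names are hypothetical, the prevalence of 36.8% is the rate reported in this study, and the area is computed with the trapezoidal rule over the tabulated thresholds, which is a simpler nonparametric stand-in for the maximum likelihood fit of ROCFIT and will not reproduce the fitted AUC exactly.

```python
from typing import List, Sequence, Tuple

# Sensitivity and specificity of the SCL-90-R (GSI) as listed in Table 3.
SCL_SENSITIVITY = [0.75, 0.64, 0.57, 0.52, 0.46, 0.39]
SCL_SPECIFICITY = [0.59, 0.74, 0.78, 0.82, 0.85, 0.95]


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a given prevalence using Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


def trapezoid_auc(sensitivities: Sequence[float],
                  specificities: Sequence[float]) -> float:
    """Area under the empirical ROC curve through the given operating points.

    The points (1 - specificity, sensitivity) are sorted by false-positive
    rate and anchored at (0, 0) and (1, 1) before applying the trapezoidal rule.
    """
    points: List[Tuple[float, float]] = sorted(
        (1.0 - sp, se) for se, sp in zip(sensitivities, specificities)
    )
    points = [(0.0, 0.0)] + points + [(1.0, 1.0)]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Same test characteristics, two different prevalences.
    for prevalence in (0.368, 0.10):
        ppv, npv = predictive_values(0.75, 0.59, prevalence)
        print(f"prevalence={prevalence:.3f} PPV={ppv:.2f} NPV={npv:.2f}")
    print(f"trapezoidal AUC (SCL-90-R): {trapezoid_auc(SCL_SENSITIVITY, SCL_SPECIFICITY):.2f}")
```

At the study prevalence, the GSI threshold of 0.4 gives a positive predictive value of about 0.52 and a negative predictive value of about 0.80, close to the tabulated 0.51 and 0.81. Lowering the prevalence to 10% with the same sensitivity and specificity reduces the positive predictive value to roughly 0.17, which illustrates the dependence on prevalence noted in the statistical methods.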
Table 3 Validity coefficients for the SCL-90-R at different thresholds for all screened individuals

Threshold                  0.4   0.5   0.6   0.7   0.8   0.9
Sensitivity                0.75  0.64  0.57  0.52  0.46  0.39
Specificity                0.59  0.74  0.78  0.82  0.85  0.95
Positive predictive value  0.51  0.58  0.60  0.62  0.63  0.77
Negative predictive value  0.81  0.77  0.76  0.75  0.73  0.70

Table 4 Validity coefficients for the GHQ at different thresholds for all screened individuals

Threshold                  1.5   2.5   3.5   4.5   5.5   6.5
Sensitivity                0.68  0.60  0.51  0.46  0.39  0.32
Specificity                0.65  0.74  0.81  0.85  0.89  0.93
Positive predictive value  0.53  0.57  0.60  0.64  0.67  0.72
Negative predictive value  0.78  0.76  0.74  0.73  0.72  0.71

If a primary care physician is only interested in general psychological distress, the GHQ-12 questionnaire can be used as a screening instrument in the primary care sector. However, if there is a need for more detailed information concerning depression and anxiety, the GHQ-28 may be used. A more detailed diagnosis is attained by using the SCL-90-R. Nevertheless, some limitations of the screening instruments must be recognised. GHQ-12 and SCL-90-R focus on breaks in normal function, rather than upon lifelong traits. Additionally, neither questionnaire is able to detect personality and adjustment disorders (e.g. Goldberg and Williams 1991). As a result, misclassification may occur. Figures 1 and 2 show great overlaps of the symptom scores between cases and non-cases. There are several cases with low scores on GHQ-12 and SCL-90-R, indicating low levels of psychological distress. Further analysis of these subjects indicated that more than half of the cases with diagnoses of "adjustment disorders" and "psychological factors affecting physical condition" have GHQ sum scores of less than 2 and GSI scores of less than 0.5. In fact, these diagnostic groups are not well identified by the two screening instruments. On the other hand, all subjects in this study were physically ill, which may result in a general distress factor (high scores on both scales) for the non-cases, too. Further sources of misclassification may occur due to inaccuracy of diagnoses and the denial of symptoms in questionnaires.
Some restrictions must be kept in mind regarding the measurement of mental disorders with the GHQ-12 and the SCL-90-R in the present study. First, the present study was not designed to assess validity of the two instruments in a primary care setting. No further self-report measures (e.g. Inventory of Interpersonal Problems, Beck Depression Inventory, Spielberger Anxiety Inventory, etc.) were used to study construct validity of the instruments.
Second, interviewing, diagnosing and rating were done by one mental health professional. Although this mental health professional was supervised by a team of researchers (physicians and psychologists), the single-interviewer design may limit the reliability and validity of the diagnostic data (e.g. Horowitz et al. 1979).
Third, the study was conducted at 18 primary care clinics located in Düsseldorf, Germany. Although the primary care physicians were selected randomly, the present study does not comprise a nationally representative sample. Differences across cities and countries may occur due to differences in sociodemographics. The high prevalence rate of mental disorder (36.8%) needs to be replicated in further studies.
Fourth, there was a small sample size in the diagnostic groups anxiety and depression. Comorbidity appeared in both groups. However, our findings are consistent with other studies. Sandanger et al. (1998) used the Hopkins Symptom Checklist-25 (HSCL-25) in a Norwegian population survey as a screening instrument for mental disorders. They found a better performance for the subscales depression and anxiety than for the general score.
Despite these limitations, the goals for the field testing of the German version of the SCL-90-R and the GHQ-12 in a primary care setting were achieved. The instruments showed acceptable qualities for diagnosing mental health disorders in the primary care sector. The use of GHQ-12 or SCL-90-R, employed as a first step, supplemented by a second-stage interview, may enhance the detection rate of mental disorder in primary care settings.
There is a need for future research in this field. Due to time limitation in primary care clinics, a computerised administration of the self-report measures could be used in primary care.
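As a rough idea of what the scoring step of a computerised administration might look like, the following sketch computes the two GHQ-12 scorings mentioned in the Results (bimodal and Likert) from one set of answers. The function names are hypothetical. It assumes that each of the 12 items is recorded as 0 to 3 in order of increasing distress, that bimodal scoring counts the two higher response categories as 1, and that 2.5 is used only because it is one of the thresholds in Table 4, not because it is a recommended cut-off.

```python
from typing import Sequence, Tuple

N_ITEMS = 12  # the GHQ-12 has 12 items


def score_ghq12(responses: Sequence[int]) -> Tuple[int, int]:
    """Return (bimodal_score, likert_score) for one completed GHQ-12.

    Assumption: each response is coded 0, 1, 2 or 3 in order of increasing distress.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be coded 0, 1, 2 or 3")
    bimodal = sum(1 for r in responses if r >= 2)  # 0-0-1-1 scoring, range 0 to 12
    likert = sum(responses)                        # 0-1-2-3 scoring, range 0 to 36
    return bimodal, likert


def screens_positive(bimodal_score: int, threshold: float = 2.5) -> bool:
    """Flag a respondent whose bimodal score exceeds the chosen threshold."""
    return bimodal_score > threshold


if __name__ == "__main__":
    answers = [0, 1, 2, 3] * 3  # made-up answers for illustration only
    bimodal, likert = score_ghq12(answers)
    print(bimodal, likert, screens_positive(bimodal))
```

A positive screen in this sketch would only select a patient for the second-stage interview; it is not a diagnosis.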