All content following this page was uploaded by Stevan Lars Nielsen on 01 May 2016.
CITATION
Nissen-Lie, H. A., Goldberg, S. B., Hoyt, W. T., Falkenström, F., Holmqvist, R., Nielsen, S. L., &
Wampold, B. E. (2016, April 28). Are Therapists Uniformly Effective Across Patient Outcome
Domains? A Study on Therapist Effectiveness in Two Different Treatment Contexts. Journal of
Counseling Psychology. Advance online publication. http://dx.doi.org/10.1037/cou0000151
Journal of Counseling Psychology © 2016 American Psychological Association
2016, Vol. 63, No. 3, 000 0022-0167/16/$12.00 http://dx.doi.org/10.1037/cou0000151
Bruce E. Wampold
This document is copyrighted by the American Psychological Association or one of its allied publishers.
As established in several studies, therapists differ in effectiveness. A vital research task now is to
understand what characterizes more or less effective therapists, and investigate whether this differential
effectiveness systematically depends on client factors, such as the type of mental health problem. The
purpose of the current study was to examine whether therapists are universally effective across patient
outcome domains reflecting different areas of mental health functioning. Data were obtained from 2 sites:
the Research Consortium of Counseling and Psychological Services in Higher Education (N = 5,828) in
the United States and from primary and secondary care units (N = 616) in Sweden. Outcome domains
were assessed via the Outcome Questionnaire-45 (Lambert et al., 2004) and the CORE-OM (Evans et al.,
2002). Multilevel models with observations nested within patients were used to derive a reliable estimate
for each patient’s change (which we call a multilevel growth d) based on all reported assessment points.
Next, 2 multilevel confirmatory factor analytic models were fit in which these effect sizes (multilevel ds)
for the 3 subscales of the OQ-45 (Study 1) and 6 subscales of CORE-OM (Study 2) were indicators of
1 common latent factor at the therapist level. In both data sets, such a model, reflecting a global therapist
effectiveness factor, yielded large factor loadings and excellent model fit. Results suggest that therapists
effective (or ineffective) within one outcome domain are also effective within another outcome domain.
Tentatively, therapist effectiveness can thus be conceived of as a global construct.
Helene A. Nissen-Lie, Department of Psychology, University of Oslo; Simon B. Goldberg and William T. Hoyt, Department of Counseling Psychology, University of Wisconsin-Madison; Fredrik Falkenström and Rolf Holmqvist, Department of Behavioural Sciences and Learning, Linköping University; Stevan Lars Nielsen, Department of Psychology, Brigham Young University; Bruce E. Wampold, Department of Counseling Psychology, University of Wisconsin-Madison and Modum Bad Psychiatric Center, Vikersund, Norway.
Correspondence concerning this article should be addressed to Helene A. Nissen-Lie, Department of Psychology, University of Oslo, P.O. Box 1094 Blindern, 0317 Oslo, Norway. E-mail: h.a.nissen-lie@psykologi.uio.no

When patients seek help from a therapist, they typically hope that the therapist has the ability to help them with the particular problem or difficulty with which they struggle. Unless they know about the therapist's particular area of expertise, they must trust that the therapist is able to address a range of psychological problems. Likewise, when training therapists, most universities and training institutions would expect their trainees to be able to work with a range of mental health concerns when they enter the profession (Norcross & Beutler, 2000). Usually, the aim is to foster a clinically versatile and therapeutically flexible clinician who can treat a range of patients. Similarly, a mental health clinic would likely want to employ clinicians able to treat most concerns in the patient population they serve (see Benton, Robertson, Tseng, Newton, & Benton, 2003).

Does reality meet the expectation that therapists are skilled across problem domains? In other words, is therapist effectiveness a global factor, or does it depend on the type of patient difficulty? We explore this question with client outcome data from the United States and Sweden in two different investigations using new methods to study therapist uniformity.

Previous research has persuasively demonstrated that therapists do differ in effectiveness (Baldwin & Imel, 2013; Crits-Christoph & Mintz, 1991; Kim, Wampold, & Bolt, 2006; Kraus, Castonguay, Boswell, Nordberg, & Hayes, 2011; Lutz, Leon, Martinovich, Lyons, & Stiles, 2007; Nissen-Lie, Monsen, Ulleberg, & Rønnestad, 2013; Okiishi, Lambert, Nielsen, & Ogles, 2003; Wampold & Brown, 2005). In a meta-analysis of therapist effects based on k = 45 studies, Baldwin and Imel (2013) reported that 5% of the variability in outcomes was explained by differences among therapists. While 5% may seem modest, it should be understood in context. For one, the effect of receiving psychotherapy versus not receiving it is generally estimated to be at or below Cohen's d = 0.80
2 NISSEN-LIE ET AL.
(Wampold & Imel, 2015), which translates into explaining about 14% of the variation in outcome (Baldwin & Imel, 2013). In addition, the proportion of variance explained by the therapist is roughly equivalent in magnitude to the variability in outcomes attributable to other key therapy ingredients, most notably the therapeutic alliance (Horvath, Del Re, Flückiger, & Symonds, 2011). Further, even small differences can yield substantial real-world effects. Saxon and Barkham (2012), for example, found that of 119 therapists in practice treating almost 2,000 patients, 19 therapists had outcomes that were considered "below average." If the patients treated by these therapists had been seen by one of the other 100 therapists instead, an additional 265 patients of the total patient sample would likely have recovered (Laska, Gurman, & Wampold, 2014). For these 265 patients, the effect of the therapist [...]

[...] also effective with more severely distressed patients. In other words, therapist effectiveness was consistent across patient age and symptom severity (Wampold & Brown, 2005). In support of this notion, a recently published study by Green, Barkham, Kellett, and Saxon (2014) reported a remarkably high correlation (r = .96) between therapist rankings in treating depressive symptoms and in treating anxiety symptoms in a large sample of patients (N = 1,122) seen by 21 practitioners, indicating that therapists who are more effective at treating depression are also more effective at treating anxiety. This correlation may be less noteworthy than it appears, however, because treating depression and treating anxiety probably do not represent very different clinical challenges: both are internalizing symptoms, whereas a mixture of externalizing and internalizing symptoms might present more variation and thus potentially re- [...]

This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
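The step from d = 0.80 to "about 14% of the variation in outcome" can be checked with the standard conversion from Cohen's d to a point-biserial correlation, r = d / sqrt(d^2 + 4), which assumes two groups of equal size; squaring r gives the proportion of variance explained. The sketch below illustrates that equal-n conversion; it is not necessarily the exact computation used by Baldwin and Imel (2013), and the function names are hypothetical.

```python
import math


def d_to_r(d: float) -> float:
    """Convert Cohen's d to a point-biserial r (assumes equal group sizes)."""
    return d / math.sqrt(d ** 2 + 4)


def variance_explained(d: float) -> float:
    """Proportion of outcome variance (r squared) tied to group membership."""
    return d_to_r(d) ** 2


if __name__ == "__main__":
    # Psychotherapy vs. no psychotherapy, d = 0.80 (Wampold & Imel, 2015)
    share = variance_explained(0.80)
    print(f"r = {d_to_r(0.80):.3f}, r^2 = {share:.3f}")
```

With d = 0.80 this gives r of about 0.37 and r squared of about 0.138, i.e., roughly 14%, which is the benchmark against which the 5% therapist share is being compared.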
[...] clients or vice versa. Therapists' own gender did not account for this difference.

In addition to this finding, a number of studies seem to find evidence of a specific cultural competence of psychotherapists (Hayes, Owen, & Bieschke, 2015; Imel et al., 2011). These studies, examining disparities in therapists' effectiveness in treating racial/ethnic minority (REM) patients compared with majority (White) patients, have found evidence that some therapists produce better outcomes with REM patients than other therapists do, suggesting that therapeutic competence in treating majority clients and the more specific cultural competence needed when treating patients from cultural minorities are distinct. However, again, the effects found were relatively small, and generally those therapists who obtained better outcomes with minority (or REM) patients also obtained better outcomes with majority (White) patients, which seems to indicate that the two competencies overlap.

There are many methodological approaches to the question of therapist uniformity, but all the studies mentioned above operationally defined therapist effectiveness or competence on the basis of client outcomes, as recommended in the literature (e.g., Wampold, 2005). In our study we build on this analytic approach and propose that psychotherapists' global competence across outcome domains should be defined as the psychotherapist's ability to achieve positive psychotherapy outcomes across domains. In addition, we propose some new analytic tools to examine the extent to which therapist effects depend on patient outcome domain.

First, we investigated outcome measures with subscales (or domains) that are relevant for all patients (i.e., symptom distress, work functioning, interpersonal problems), irrespective of clinical diagnosis. This may be important because some problem types (like mania) occur at lower base rates, which will attenuate associations at the therapist level and increase the likelihood that therapists appear ineffective in a given problem area when this may simply reflect not encountering patients with a given difficulty, or not addressing that area of difficulty during treatment. Second, the study uses data from two different treatment settings (even different cultures), including a primary sample (Study 1) with data from the United States and a replication¹ (Study 2) with data from Sweden. We address the question of therapist uniformity across outcomes through the use of two different but commonly used outcome measures (i.e., the OQ-45 and the CORE-OM, see below). We hope this design reduces cultural, site, and instrument effects. Also, particularly in the primary study, we have a large sample size, with many patients per therapist to ensure reliable estimates of therapist effectiveness (see Baldwin, Imel, & Atkins, 2012; Baldwin & Imel, 2013). Finally, by applying a multilevel modeling approach both when deriving a reliable patient effect size and when addressing effectiveness across domains, we account for the nested structure of the data (with patient-level effect sizes in outcome nested within therapists). The aims of the present study were:

1. To investigate the conjecture of therapist uniformity with new analytic methods
2. To investigate the extent to which therapists are effective across outcome domains

¹ When we use the term replication study, we mean replication of the statistical methods used to examine our research question, not a replication of all methodological features, such as inclusion criteria, sample characteristics, treatments, and so on.

Method

Participants and Procedures (Study 1)

Data were obtained from the treatment research archive at the counseling center of a large, public university in the Western United States. Data were collected over the course of 18.43 years (from December 1995 to May 2014) on therapists in practice during that period. Psychotherapy at the counseling center was provided without session limits or extra fees beyond academic tuition. Patients completed the Outcome Questionnaire-45 (OQ-45; Lambert et al., 2004) prior to each session. We limited our analyses of the available data in several ways in keeping with studies examining naturalistic psychotherapy data (e.g., Baldwin, Berkeljon, Atkins, Olsen, & Nielsen, 2009). First, we included only outcome data from individual counseling sessions (excluding group and couples therapy). Second, to avoid cross-classification of patients and therapists, we included individuals who met with only one therapist at a time. Third, we included only the first episode of care with this therapist, considering an episode of care as ending if a period of 120 days had elapsed between sessions. Fourth, we included only those patients who attended at least three sessions and completed at least three OQ-45 measures. Fifth, we included only those patients whose first OQ-45 total score was in the clinical range (i.e., 63 or above; Lambert et al., 2004). This was especially important because these data come from a university treatment center; this cutoff is commonly used when findings are meant to generalize to broader clinical practice. Lastly, we included only those patients whose therapist had 10 or more cases in the data set. Setting a minimum number of clients per therapist was intended to allow more reliable estimates of therapist-level outcomes (Baldwin et al., 2012). The data set included OQ measurements from a total of 49,600 sessions. Patients attended on average 8.58 sessions (SD = 8.48, range = 3 to 153).

Patients. Based on these requirements, sufficient data were available for 5,828 patients: 3,672 (63.0%) were women and 2,156 were men; average age at intake was 22.63 years (SD = 4.11). Reported ethnicities were 81.9% Caucasian; 6.0% Hispanic; 3.4% Asian; 1.4% Indigenous American; 1.3% Pacific Islander; 0.8% Black; 0.5% Other; and 4.6% gave no report. No diagnostic assessment was conducted, so we do not have information on clinical diagnoses of the patients enrolled. Consent was obtained prior to initiating treatment. Patients had agreed to use of these de-identified records in research, and the university's human subject review board approved use of these de-identified records.

Therapists. Psychotherapy was provided by 158 therapists, 65 (41.1%) women and 93 men. On average, therapists saw 36.89 patients in the data set (SD = 47.76, range = 10 to 333). Of the psychotherapy sessions provided, 30.5% were provided by trainees, 38.7% by licensed professionals, and 30.8% by therapists who straddled these two statuses. The majority of therapists described themselves as following an integrative or eclectic approach to treatment, adopting techniques, interventions, and styles as seemed to fit the therapeutic situation.
Exceptions were one therapist who described himself as a practitioner of Rational Emotive Behavior therapy, another who described herself as a psychodynamically oriented therapist, and two others who identified themselves as ACT (Acceptance and Commitment Therapy) therapists. We do not have information on the total amount of clinical experience these therapists had, but as a proxy we know that they had between 0.32 and 17.93 years of data available in the data set, with a mean of 4.91 years (SD = 5.15). Additionally, therapists were on average 33.08 years old (SD = 9.14, range = 22.72 to 70.10), with an average of 5.38 years (SD = 7.71, range = 0.05 to 40.10) since they began their graduate training.

[...] therapist uniformity), the patients selected met the same criteria as in Study 1: (a) patients had attended therapy and completed assessments for at least three sessions and (b) patients had been treated by a therapist who had at least 10 patients enrolled in the study. The sample meeting these criteria was 616 outpatients treated by 38 therapists (see below for more information on these participants). The average patient caseload per therapist enrolled in this study was 16.20 patients (range = 10–28). The mean number of sessions was 7.20 (range = 3–55) in the primary care sample and 7.65 (range = 3–54) in the psychiatric sample. Primary care patients were included during a period of 6 months (November 2009 through April 2010). The psychiatry data were collected from 2011 through 2014. The patient received an enve- [...]
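The Study 1 inclusion rules (first episode of care, ended by a 120-day gap; at least three sessions with OQ-45 scores; an intake OQ-45 total of 63 or above; therapists with at least 10 cases), of which Study 2 shares the session and caseload criteria, can be sketched as a simple filter. This is a hypothetical illustration, not the authors' code: the `Session` record and its fields are assumptions, a gap of exactly 120 days is treated as ending the episode, the caseload is counted after the patient-level rules, and the data are assumed to be already restricted to individual therapy with one therapist per patient.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional

EPISODE_GAP_DAYS = 120  # a gap this long ends an episode of care
MIN_SESSIONS = 3        # sessions attended and OQ-45 forms completed
CLINICAL_CUTOFF = 63    # intake OQ-45 total must be at or above this
MIN_CASELOAD = 10       # retained patients required per therapist


@dataclass(frozen=True)
class Session:
    """Hypothetical record: one individual session for one patient."""
    patient_id: str
    therapist_id: str
    day: date
    oq_total: Optional[float]  # None if no OQ-45 was completed


def first_episode(sessions: List[Session]) -> List[Session]:
    """Keep sessions up to the first gap of EPISODE_GAP_DAYS or more."""
    ordered = sorted(sessions, key=lambda s: s.day)
    episode = [ordered[0]]
    for prev, cur in zip(ordered, ordered[1:]):
        if (cur.day - prev.day).days >= EPISODE_GAP_DAYS:
            break
        episode.append(cur)
    return episode


def apply_criteria(sessions: List[Session]) -> Dict[str, List[Session]]:
    """Return {patient_id: first-episode sessions} for patients who qualify."""
    by_patient: Dict[str, List[Session]] = defaultdict(list)
    for s in sessions:
        by_patient[s.patient_id].append(s)

    kept: Dict[str, List[Session]] = {}
    for pid, rows in by_patient.items():
        episode = first_episode(rows)
        scored = [s for s in episode if s.oq_total is not None]
        if len(episode) < MIN_SESSIONS or len(scored) < MIN_SESSIONS:
            continue
        if scored[0].oq_total < CLINICAL_CUTOFF:  # intake not in clinical range
            continue
        kept[pid] = episode

    # Caseload is counted after the patient-level rules (an assumption).
    caseload: Dict[str, set] = defaultdict(set)
    for pid, episode in kept.items():
        caseload[episode[0].therapist_id].add(pid)
    return {
        pid: ep for pid, ep in kept.items()
        if len(caseload[ep[0].therapist_id]) >= MIN_CASELOAD
    }
```

Ordering the rules this way means a therapist can drop out of the sample because too few of their patients survive the patient-level filters; the text does not say whether the 10-case minimum was applied before or after those filters.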
[...] included in Study 2 and the treatments they received are viewed as representative of mental health care in Sweden (see also Holmqvist et al., 2014; Falkenström et al., 2016).

Therapists. The 31 therapists in primary care were social workers (n = 20), psychologists (n = 10), and one psychiatric nurse. Their mean practice experience was 10.0 years. Twenty-eight therapists had postgraduate training in psychotherapy (in addition to the training included in their basic education), and eight had also received advanced postgraduate psychotherapy training, leading to certification according to the Swedish system. Seventeen therapists had training in psychodynamic therapy (PDT), 15 in cognitive therapy, and nine in cognitive-behavioral therapy (CBT) or behavior therapy. Many therapists had training in several methods. In the secondary care sample there were seven therapists. These therapists were psychologists or social workers with basic or advanced training in psychotherapy. Most of them had received training in both CBT and PDT. Most therapists in Study 2 were female (80%).

Outcome Measure (Study 2)

In Study 2, patient outcome domains were measured with the Clinical Outcome in Routine Evaluation–Outcome Measure (CORE-OM; Evans et al., 2002). The CORE-OM comprises 34 items developed to assess four main (problematic) aspects of a patient's life: Subjective Well-Being, Problems, Function, and Risk, which in turn reflect 10 different outcome domains: 1) Subjective well-being (e.g., "I have felt overwhelmed by problems"), measured by four items; 2) Depression (e.g., "I have felt despairing and hopeless"), measured by four items; 3) Anxiety (e.g., "I have felt panic or terror"), measured by four items; 4) Trauma (e.g., "Unwanted images or memories have been distressing me"), measured by two items; 5) Physical symptoms (e.g., "I have difficulty getting to sleep or staying asleep"), measured by two items; 6) Problems in close relationships (e.g., "I have felt I have no friends"), measured by four items; 7) Problems in general functioning (e.g., "I have been able to do most things I needed to")², measured by four items; 8) Problems in social functioning (e.g., "I have felt humiliated or shamed by other people"), measured by four items; 9) Risk to self (e.g., "I made plans to end my life"), measured by four items; and 10) Risk to others (e.g., "I have been physically violent to others"), measured by two items. Only patients' scores on Subscales 1, 2, 3, 6, 7, and 8 were used in the present (replication) study. We excluded Subscales 4, 9, and 10 (trauma, risk to self, and risk to others), which were relatively infrequently reported and thus not relevant to most patients, and Subscale 5 (Physical symptoms) because of its poor internal consistency. Respondents rated how often each event or situation occurred within the last week on a 5-point Likert-type scale ranging from 0 (Never) to 4 (Almost all the time). The CORE-OM has demonstrated good psychometric properties by all relevant criteria: reliability (i.e., the scores demonstrate high internal consistency and test–retest reliability), convergent validity (i.e., meaningful correlations with similar instruments such as the BDI, BAI, SCL-90, and IIP-32), and discriminant validity (i.e., scores discriminate well between clinical and nonclinical samples), as well as sensitivity to change (in terms of substantial and statistically significant improvements on all scores in three different treatment settings; see Evans et al., 2002). Similar psychometric support is available for the Swedish translation of the questionnaire (Elfström et al., 2013). In the current study, internal consistency was acceptable for six of the seven subscales examined: α = .72 (Subjective well-being), .80 (Depression), .77 (Anxiety), .66 (Problems in close relationships), .80 (Problems in general functioning), and .60 (Problems in social relationships). The one exception was Physical symptoms (α = .42), likely because this subscale is based on only two items; it was therefore excluded from analyses (see below). Again, we relied on the factors proposed by the scale developers (Evans et al., 2002) when examining therapist uniformity in these data. As with the OQ-45, factor analyses of the CORE-OM yield a somewhat complex structure: some analyses show two different factors representing psychological distress and risk, others show three factors reflecting positively keyed items, negatively keyed items, and risk items, and still others show lower order factors loading on a higher order latent factor for psychological distress (Lyne, Barrett, Evans, & Barkham, 2006). We discuss the implications of the pretreatment factor structure in the present sample in the Results section.

² This and seven other items of the CORE-OM are reverse scored.

Statistical Analyses

Subscale scores of the OQ-45 and CORE-OM were computed as the mean of the items in each subscale, following published scoring methods. This was done so that our results generalize to other researchers and clinicians using these same measures. We examined therapist variation in baseline scores on these subscales by computing intraclass correlations (ICCs), denoting therapist variance in proportion to total variance (patient and therapist variance). Additionally, we examined intercorrelations of the subscales at baseline, both at the within level (patient level) and at the between level (therapist level), to indicate their overlap and to assess the degree to which they were independent at each level.

Effect size estimation procedure. To prepare for the therapist uniformity analyses using a multilevel confirmatory factor analytic framework (see below), we calculated patient-level effect sizes, which we call multilevel growth ds. To do so, we fitted two-level multilevel models (MLMs), with outcome observations nested within patients, predicting scores on OQ-45 or CORE-OM subscales from session number (centered at the first session). These MLMs were fit using the R programming language (version 3.1.0; R Development Core Team, 2014) and the "nlme" multilevel modeling package (Pinheiro, Bates, DebRoy, Sarkar, & the R Development Core Team, 2013). Separate models were constructed for each OQ-45 and CORE-OM subscale. First, simpler models were constructed that included only a fixed effect for session number. These models were then compared with more complex models that included additional random slope parameters for session number. Formal model comparison was used to assess the contribution of the random slopes. An example of these more complex models is as follows:

$$Y_{ij} = \gamma_{00} + \gamma_{10}(\text{Session \#})_{i} + [b_{00j} + b_{10j}(\text{Session \#})_{i} + e_{ij}] \qquad (1)$$
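Equation 1 gives each patient j an intercept and a session slope of their own (the fixed effects γ_00 and γ_10 plus the patient-specific deviations b_00j and b_10j). As a rough, standard-library illustration of what those patient-level slopes capture, the sketch below fits a separate least-squares line for each patient, with session number centered at the first session. This is a no-pooling simplification, not the authors' nlme model: a mixed model would shrink each patient's slope toward the fixed slope γ_10, and the conversion of slopes into multilevel growth ds (the authors' Equation 2) is not reproduced here. The function names and the (patient_id, session, score) record layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def ols_line(sessions: List[float], scores: List[float]) -> Tuple[float, float]:
    """Least-squares (intercept, slope), with session centered at the first session."""
    if len(set(sessions)) < 2:
        raise ValueError("need at least two distinct session numbers")
    x = [s - min(sessions) for s in sessions]  # 0 at the first session
    x_bar, y_bar = mean(x), mean(scores)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, scores))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope  # intercept = predicted score at session 1


def patient_slopes(records: Iterable[Tuple[str, float, float]]) -> Dict[str, float]:
    """Map patient_id -> per-patient slope from (patient_id, session, score) rows."""
    grouped: Dict[str, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
    for pid, session, score in records:
        grouped[pid][0].append(session)
        grouped[pid][1].append(score)
    return {pid: ols_line(xs, ys)[1] for pid, (xs, ys) in grouped.items()}
```

Negative slopes indicate decreasing subscale scores, i.e., improvement on these symptom-keyed measures. With only three or four sessions per patient, such unpooled slopes are noisy, which is one reason a multilevel model is preferred for this step.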
[...] final random term (e_ij) is the error or residual term reflecting the unexplained variation in OQ-45 or CORE-OM scores.

Both the fixed and random slope components were used in order [...]

[...] for SD, IR, and SR subscales, all p < .001). Thus, the MLM equation defined above (Equation 1) represents the model used to estimate random slope parameters. Patient-level random slopes were then converted into multilevel growth ds (Equation 2).

Patients showed substantial change in OQ scores from baseline to posttreatment, as indexed by the magnitude of their multilevel growth d effect sizes. Effect sizes for the OQ total score and the three OQ subscales were as follows: d_total: mean = −0.74, 95% CI [−0.76, −0.72], SD = 0.67; d_SD: mean = −0.74, 95% CI [−0.76, −0.72], SD = 0.65; d_SR: mean = −0.48, 95% CI [−0.50, −0.46], SD = 0.54; d_IR: mean = −0.38, 95% CI [−0.40, −0.36], SD = 0.48. None of the 95% CIs contained zero, indicating that patient-level change over time differed significantly from zero.

Table 1
Intraclass Correlation Coefficients (ICCs) of Baseline Scores of CORE-OM and OQ-45

Subscale                              Baseline ICC
CORE-OM
  WB (Subjective well-being)          .091
  A  (Anxiety)                        .100
  D  (Depression)                     .086
  CR (Close relationships)            .080
  GF (General functioning)            .074
  SR (Social relationships)           .057
OQ-45
  SD (Symptom distress)               .003
  SR (Social role performance)        .004
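The ICCs in Table 1 express therapist variance as a share of total (therapist plus patient) variance in baseline scores. The text does not state which estimator was used; a common, simple choice is the one-way ANOVA estimator of ICC(1), sketched below for patients grouped by therapist, with the usual adjusted group size n0 for unequal caseloads. This is an assumption-based illustration: variance components from a multilevel model would give similar but not identical values, and this estimator can dip below zero when therapist differences are negligible. The function name is hypothetical.

```python
from statistics import mean
from typing import Sequence


def icc1(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA ICC(1): share of variance lying between groups.

    `groups` holds one list of baseline scores per therapist.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more scores than groups")
    group_means = [mean(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Adjusted average group size for unbalanced caseloads.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)
```

An ICC near .003, as for the OQ-45 subscales, means that baseline severity barely differed across therapists' caseloads, whereas values around .06 to .10 for the CORE-OM subscales indicate somewhat more clustering of baseline severity by therapist.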
modeling the covariation between the indicators (i.e., change in an OQ-45 subscale) as loading on a latent effectiveness factor at the therapist level while simultaneously modeling the patient-level correlations (see Figure 2). To accomplish this we used the Bayesian estimator³ in both data sets, which applies Markov chain Monte Carlo (MCMC) simulation. MCMC simulation is less sensitive to sample size issues (Hamaker & Klugkist, 2011). Two MCMC chains were run, using 50,000 iterations to achieve stable convergence. Convergence was checked using the potential scale reduction factor (< 1.05) and the Kolmogorov–Smirnov test for differences between the two MCMC chains (which should be nonsignificant for all parameters). Mplus default priors were used, with infinite variances to make them noninformative, so that estimates were based only on the data. When Bayesian estimation is used, a model fit test statistic is obtained even if the model is "just identified" (i.e., has 0 degrees of freedom), as it was in the primary (U.S.) study with only three indicators loading on one latent factor. In this case, model fit tests the distributional assumptions of the model (Asparouhov & Muthén, 2010). In the replication (Swedish) study, the model fit can be interpreted directly, since it models one latent factor based on six indicators. Detailed results from the two CFAs follow.

Study 1. The standardized output from the model in the U.S. sample yielded large and significant factor loadings for all subscales of the OQ: λs = .97, .94, and .62 for SD, SR, and IR, respectively (all ps < .001), indicating that a latent general factor was well represented by the three subscale effect sizes. The patient-level correlations that were fit in the same model were also significant, albeit smaller than the factor loadings (SD with IR = .68, p < .001; SD with SR = .71, p < .001; and SR with IR = .53, p < .001). The R-squared indices at the therapist level showed significant explained variance for all subscales (SD = .93, p < .001; SR = .88, p < .001; and IR = .38, p < .01; see Figure 2). The model fit was excellent, with a posterior predictive p value of .46 (this value is .50 if fit is perfect).

Study 2. Replicating Study 1, in the model of uniform therapist effectiveness across outcome domains, the standardized factor loadings of the indicators (i.e., subscale effect sizes) were large; four were statistically significant (Well-being, Depression, Close relationships, and Social relationships, p < .05), while two (Anxiety and General functioning) were not. R-squared indices at the therapist level were all statistically significant at p < .001, ranging from .50 (General functioning) to .90 (Depression). As in Study 1, patient-level correlations were also statistically significant and slightly smaller in size (ranging from .50 to .90) than the therapist-level factor loadings. In this model of six indicators loading on one latent factor, model fit was also very good, with a posterior predictive p value of .40. See Figure 3 for a visualization of the MLM CFA of the Swedish data. See also Table 3 for intercorrelations between subscales at the within (patient) level in this model.

In sum, both Studies 1 and 2 suggest that therapists who are effective (or ineffective) within one outcome domain are also effective (or ineffective) within the other domains of these two commonly used instruments.

Discussion

Although there is accumulating evidence that therapists on average differ in effectiveness (Baldwin & Imel, 2013), few studies have investigated the extent to which this differential efficacy systematically depends on patient problem domain (e.g., Kraus et al., 2011; Wampold & Brown, 2005). This study, comprising one primary study and one replication, used new methods to investigate change and to assess the therapist uniformity conjecture. Using two common outcome measures (OQ-45, CORE-OM), we assessed the interrelationships between their separate problem domains reflecting different outcomes. The findings of the main study indicate that these domains are relatively distinct. In the second study they are less distinct; we discuss this in the limitations. To measure therapist uniformity across outcome domains, we computed a multilevel growth d, which takes advantage of all available assessment points, accounts for treatment length, and yields an effect size in Cohen's d units.

In the multilevel confirmatory factor analyses, modeling these multilevel ds as indicators of one latent therapist factor, fit indices suggested an excellent model fit in both studies. At the patient level, large correlations between the multilevel ds of the outcome domains indicated that patients who improve more in one domain also improve more in the other domains. At the therapist level, which directly addresses the therapist uniformity conjecture, factor loadings on the latent factor of therapist effectiveness across domains (the "therapist g-factor") were large, supporting the notion of therapist uniformity. Large loadings at the therapist level were present even when simultaneously modeling the relationships between the subscale multilevel ds at the patient level in both data [...]

[Figure 2 diagram. Between-Therapists level: latent T_g with loadings .97 (d_SD), .94 (d_SR), and .62 (d_IR). Within-Therapists level: d_SD, d_SR, and d_IR intercorrelated (.71, .68, .53).]
Figure 2. Model of therapist uniformity from the multilevel confirmatory factor analysis (MLM CFA) using OQ subscales (Study 1). Note: d_SD, d_SR, and d_IR are patient-level estimates of change in the subscales of Symptom distress (SD), Social role performance (SR), and Interpersonal relationships (IR) from the OQ-45 (Lambert et al., 2004); T_g is the latent therapist "g-factor." Bold characters mark the latent factor and factor loadings. All factor loadings and intercorrelations are significant, p < .001. Bayesian estimation was used in model fitting. The model showed excellent fit (posterior predictive p value = .46).

³ In the main study, we first estimated the model using the default estimator, maximum likelihood with robust standard errors (MLR), but the results showed a nonpositive definite covariance matrix due to a negative (but nonsignificant) variance estimate (for Symptom distress). This was likely due to an overly complex model at the between level relative to the between-level sample size.
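The convergence check described above relies on the potential scale reduction factor (PSRF), which compares the variance between MCMC chains with the variance within them; values close to 1 indicate that the chains agree. Below is a minimal sketch of the classic Gelman–Rubin form of the statistic for a single parameter, using the 1.05 cutoff mentioned in the text. Mplus computes its own PSRF and its exact computation may differ, so this illustrates the idea rather than reproducing Mplus output; the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence

PSRF_CUTOFF = 1.05  # convergence criterion mentioned in the text


def psrf(chains: Sequence[Sequence[float]]) -> float:
    """Gelman-Rubin potential scale reduction factor for one parameter."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    within = mean(variance(c) for c in chains)         # W: mean within-chain variance
    between = n * variance([mean(c) for c in chains])  # B: spread of chain means
    pooled = (n - 1) / n * within + between / n        # pooled posterior variance estimate
    return sqrt(pooled / within)


def converged(chains: Sequence[Sequence[float]]) -> bool:
    """True when the PSRF falls below the cutoff."""
    return psrf(chains) < PSRF_CUTOFF
```

Two chains that wander in different regions of the parameter space inflate B relative to W and push the PSRF well above 1.05; with very short chains the statistic can even fall slightly below 1, which is one reason long runs such as the 50,000 iterations reported here are used.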
THERAPIST UNIFORMITY 9
adherence to a protocol, or even competence in delivering certain therapeutic interventions (Barber, 2009; Beutler et al., 2004; Goldberg et al., 2016; Tracey et al., 2014).

An additional line of basic research in support of this notion can be found within the study of psychopathology. A growing body of evidence suggests that clinical disorders or outcome domains do not represent distinct categories; indeed, comorbidity is the rule rather than the exception (Gadermann et al., 2012). Furthermore, there is evidence that mental disorders are sequentially comorbid and dimensional rather than distinct categories. In a recent study, one general psychopathology dimension, the "p factor," explained the variation among patients with different mental health concerns (see Caspi et al., 2014). Evidence drawn from genetics likewise supports the possibility of shared genetic risk factors underlying multiple forms of psychopathology (e.g., Hyman, 2010). Higher scores on such a general pathology factor are associated with greater life impairment, worse prognosis, and more early-life damage (Caspi et al., 2014). If this proposed p factor is valid, our findings are even less surprising, even though we did not study therapist effectiveness across clinical diagnoses. There could well be a match between patients' general pathology and therapists' global effectiveness, rather than a match between specific problem areas and skills in treating each of them. The observations of Caspi et al. (2014) at the patient level correspond, in turn, with the findings of Saxon and Barkham (2012). Those findings suggest that therapist differences increase with the severity of patients' problems rather than with the type of problem.

As mentioned before, findings of differential therapist effectiveness with cultural minorities (Imel et al., 2011; Hayes et al., 2015) are indicative of a specific cultural competence. Even so, this competence may be better conceived of as a form of Multicultural Orientation (MCO; Owen, 2013). MCO is characterized, for example, by therapists' cultural humility, openness, and cultural comfort rather than by a specific skill or cultural knowledge; that is, a way of being rather than a way of doing (Owen, 2013). These ideas are receiving empirical support (e.g., Owen et al., 2015). To us, they seem to align with an understanding that highly effective therapists have an openness and a flexibility that allow them to tailor interventions to the specific needs of a client, as well as a sensitivity toward context. This corresponds well with the notion of therapists' appropriate responsiveness (see Stiles et al., 1998). Appropriate responsiveness is defined as the therapist's ability to achieve optimal benefit for the client by continually modifying responses to the state of the client and the interpersonal interaction (Hatcher, 2015). This promising concept may well distinguish more effective from less effective therapists in general.

The current study has several important limitations. First, the patient outcome domains that were measured were somewhat global in nature (i.e., generally not disorder-specific). They thus may not have detected the specificity of therapist effectiveness, if such specificity indeed exists. Testing the uniformity conjecture likely involves a trade-off between two contrasting options. One option is to assess a wide spectrum of patient problems, parts of which are likely irrelevant for a sizable portion of a patient sample. The other is to assess a more limited set of generally relevant outcome domains, at the cost of problem-area specificity. Therapist specificity in effectiveness might have emerged had we applied the current statistical methods to specific diagnoses rather than to the more general outcome domains that were used. Despite the evidence of therapist uniformity, it remains possible that therapist effectiveness is both global (i.e., cuts across different outcome domains) and specific (i.e., depends on some aspect of patients' problems or pathology; Kraus et al., 2011). Future work might shed more light on this possibility.

A related limitation of this study is that the subscales of the OQ and CORE-OM were correlated at pretreatment. The correlations among the three OQ subscales were relatively low (viz., .29). They were larger for the CORE-OM (viz., .57 within therapists and .65 between therapists). One could argue that the CORE-OM outcomes measured were not sufficiently distinct at pretreatment. However, these correlations are similar to those among the subscales of well-accepted measures such as the SCL-90 (Hafkenscheid, 1993) and the TOP (Kraus et al., 2005).

Moreover, two of the six CORE-OM subscales did not yield factor loadings that reached statistical significance. These results thus might not seem to fully support the therapist uniformity conjecture. However, the two nonsignificant loadings (for Anxiety and General functioning) were large (i.e., .81 and .70). Their failure to reach significance may reflect low statistical power, since the number of therapists in this study was rather modest (n = 38).

A third limitation is that the U.S. data come from a single university center, which may limit the generalizability of the findings. A general lack of detailed information about participants is a problem in this kind of study. To better determine whether the sample is representative, we would have wanted to know the patients' level of education, income status, gender and sexual orientation, and clinical diagnoses.

Notwithstanding the limitations of the current study, we suggest that this work contributes to the literature by (a) probing further the question of therapist uniformity; (b) proposing analytic methods to investigate therapist uniformity; and (c) providing further evidence that therapists are uniformly effective across the subscales of two widely used outcome measures (i.e., the OQ and the CORE-OM). We hope our effort can stimulate other researchers to further our collective knowledge of the nature of therapist effectiveness and its interplay with client factors.

References
Adelson, J. L., & Owen, J. (2012). Bringing the psychotherapist back: Basic concepts for reading articles examining therapist effects using multilevel modeling. Psychotherapy, 49, 152–162. http://dx.doi.org/10.1037/a0023990
Amble, I., Gude, T., Stubdal, S., Øktedalen, T., Skjorten, A. M., Andersen, B. J., . . . Wampold, B. E. (2014). Psychometric properties of the Outcome Questionnaire-45.2: The Norwegian version in an international context. Psychotherapy Research, 24, 504–513. http://dx.doi.org/10.1080/10503307.2013.849016
Anderson, T., Ogles, B. M., Patterson, C. L., Lambert, M. J., & Vermeersch, D. A. (2009). Therapist effects: Facilitative interpersonal skills as a predictor of therapist success. Journal of Clinical Psychology, 65, 755–768. http://dx.doi.org/10.1002/jclp.20583
Asparouhov, T., & Muthén, B. O. (2010). Bayesian analysis using Mplus: Technical implementation. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.310.3903&rep=rep1&type=pdf
Baldwin, S. A., Berkeljon, A., Atkins, D. C., Olsen, J. A., & Nielsen, S. L. (2009). Rates of change in naturalistic psychotherapy: Contrasting dose-effect and good-enough level models of change. Journal of Consulting
and Clinical Psychology, 77, 203–211. http://dx.doi.org/10.1037/a0015235
Baldwin, S. A., & Imel, Z. E. (2013). Therapist effects: Findings and methods. In M. J. Lambert (Ed.), Bergin and Garfield's handbook of psychotherapy and behavior change (6th ed., pp. 258–297). Hoboken, NJ: Wiley.
Baldwin, S. A., Imel, Z. E., & Atkins, D. C. (2012). The influence of therapist variance on the dependability of therapists' alliance scores: A brief comment on "The dependability of alliance assessments: The alliance-outcome correlation is larger than you think" (Crits-Christoph et al., 2011). Journal of Consulting and Clinical Psychology, 80, 947–951. http://dx.doi.org/10.1037/a0027935
Barber, J. P. (2009). Toward a working through of some core conflicts in psychotherapy research. Psychotherapy Research, 19, 1–12. http://dx[…]
Benton, S. A., Robertson, J. M., Tseng, W. C., Newton, F. B., & Benton, S. L. (2003). Changes in counseling center client problems across 13 years. Professional Psychology: Research and Practice, 34, 66–72. http://dx.doi.org/10.1037/0735-7028.34.1.66
Beutler, L. E., Malik, M., Alimohamed, S., Harwood, T. M., Talebi, H., Noble, S., & Wong, E. (2004). Therapist variables. In M. J. Lambert (Ed.), Bergin & Garfield's handbook of psychotherapy and behavior change (5th ed., pp. 227–306). New York, NY: Wiley.
Caspi, A., Houts, R. M., Belsky, D. W., Goldman-Mellor, S. J., Harrington, H., Israel, S., . . . Moffitt, T. E. (2014). The p factor: One general psychopathology factor in the structure of psychiatric disorders? Clinical Psychological Science, 2, 119–137. http://dx.doi.org/10.1177/2167702613497473
Chambless, D. L., & Hollon, S. D. (1998). Defining empirically supported therapies. Journal of Consulting and Clinical Psychology, 66, 7–18. http://dx.doi.org/10.1037/0022-006X.66.1.7
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Crits-Christoph, P., & Mintz, J. (1991). Implications of therapist effects for the design and analysis of comparative studies of psychotherapies. Journal of Consulting and Clinical Psychology, 59, 20–26. http://dx.doi.org/10.1037/0022-006X.59.1.20
Del Re, A. C., Flückiger, C., Horvath, A. O., Symonds, D., & Wampold, B. E. (2012). Therapist effects in the therapeutic alliance-outcome relationship: A restricted-maximum likelihood meta-analysis. Clinical Psychology Review, 32, 642–649. http://dx.doi.org/10.1016/j.cpr.2012.07.002
Elfström, M. L., Evans, C., Lundgren, J., Johansson, B., Hakeberg, M., & Carlsson, S. G. (2013). Validation of the Swedish version of the Clinical Outcomes in Routine Evaluation Outcome Measure (CORE-OM). Clinical Psychology & Psychotherapy, 20, 447–455. http://dx.doi.org/10.1002/cpp.1788
Evans, C., Connell, J., Barkham, M., Margison, F., McGrath, G., Mellor-Clark, J., & Audin, K. (2002). Towards a standardised brief outcome measure: Psychometric properties and utility of the CORE-OM. The British Journal of Psychiatry, 180, 51–60. http://dx.doi.org/10.1192/bjp.180.1.51
Falkenström, F., Josefsson, A., Berggren, T., & Holmqvist, R. (2016). How much therapy is enough? Comparing dose-effect and good-enough models in two different settings. Psychotherapy, 53, 130–139. http://dx.doi.org/10.1037/pst0000039
Gadermann, A. M., Alonso, J., Vilagut, G., Zaslavsky, A. M., & Kessler, R. C. (2012). Comorbidity and disease burden in the National Comorbidity Survey Replication (NCS-R). Depression and Anxiety, 29, 797–806. http://dx.doi.org/10.1002/da.21924
Goldberg, S. B., Rousmaniere, T., Miller, S. D., Whipple, J., Nielsen, S. L., Hoyt, W. T., & Wampold, B. E. (2016). Do psychotherapists improve with time and experience? A longitudinal analysis of outcomes in a clinical setting. Journal of Counseling Psychology, 63, 1–11. http://dx.doi.org/10.1037/cou0000131
Goldfried, M. R. (2009). Searching for therapy change principles: Are we there yet? Applied & Preventive Psychology, 13, 32–34. http://dx.doi.org/10.1016/j.appsy.2009.10.013
Green, H., Barkham, M., Kellett, S., & Saxon, D. (2014). Therapist effects and IAPT Psychological Wellbeing Practitioners (PWPs): A multilevel modelling and mixed methods analysis. Behaviour Research and Therapy, 63, 43–54. http://dx.doi.org/10.1016/j.brat.2014.08.009
Hamaker, E. L., & Klugkist, I. (2011). Bayesian estimation of multilevel models. In J. J. Hox & J. K. Roberts (Eds.), Handbook for advanced multilevel analysis (pp. 137–161). New York, NY: Routledge/Taylor & Francis Group.
Hatcher, R. L. (2015). Interpersonal competencies: Responsiveness, tech[…]757. http://dx.doi.org/10.1037/a0039803
Hayes, J. A., Owen, J., & Bieschke, K. J. (2015). Therapist differences in symptom change with racial and ethnic minority clients. Psychotherapy, 52, 308–314.
Hafkenscheid, A. (1993). Psychometric evaluation of the Symptom Checklist (SCL-90) in psychiatric patients. Personality & Individual Differences, 14, 751–756.
Holmqvist, R., Ström, T., & Foldemo, A. (2014). The effects of psychological treatment in primary care in Sweden—A practice-based study. Nordic Journal of Psychiatry, 68, 204–212. http://dx.doi.org/10.3109/08039488.2013.797023
Horvath, A. O., Del Re, A. C., Flückiger, C., & Symonds, D. (2011). Alliance in individual psychotherapy. Psychotherapy, 48, 9–16. http://dx.doi.org/10.1037/a0022186
Hox, J. (2010). Multilevel analysis: Techniques and applications (2nd ed.). Mahwah, NJ: Erlbaum.
Hyman, S. E. (2010). The diagnosis of mental disorders: The problem of reification. Annual Review of Clinical Psychology, 6, 155–179. http://dx.doi.org/10.1146/annurev.clinpsy.3.022806.091532
Imel, Z. E., Baldwin, S., Atkins, D. C., Owen, J., Baardseth, T., & Wampold, B. E. (2011). Racial/ethnic disparities in therapist effectiveness: A conceptualization and initial study of cultural competence. Journal of Counseling Psychology, 58, 290–298. http://dx.doi.org/10.1037/a0023284
Kim, D., Wampold, B., & Bolt, D. (2006). Therapist effects in psychotherapy: A random-effects modeling of the National Institute of Mental Health Treatment of Depression Collaborative Research Program data. Psychotherapy Research, 16, 161–172. http://dx.doi.org/10.1080/10503300500264911
Kim, S., Beretvas, S. N., & Sherry, A. R. (2010). A validation of the factor structure of OQ-45 scores using factor mixture modeling. Measurement and Evaluation in Counseling and Development, 42, 275–295. http://dx.doi.org/10.1177/0748175609354616
Kraus, D. R., Castonguay, L., Boswell, J. F., Nordberg, S. S., & Hayes, J. A. (2011). Therapist effectiveness: Implications for accountability and patient care. Psychotherapy Research, 21, 267–276. http://dx.doi.org/10.1080/10503307.2011.563249
Kraus, D. R., Seligman, D. A., & Jordan, J. R. (2005). Validation of a behavioral health treatment outcome and assessment tool designed for naturalistic settings: The Treatment Outcome Package. Journal of Clinical Psychology, 61, 285–314. http://dx.doi.org/10.1002/jclp.20084
Lambert, M. J., Burlingame, G. M., Umphress, V., Hansen, N. B., Vermeersch, D. A., Clouse, G. C., & Yanchar, S. C. (1996). The reliability and validity of the Outcome Questionnaire. Clinical Psychology & Psychotherapy, 3, 249–258. http://dx.doi.org/10.1002/(SICI)1099-0879(199612)3:4<249::AID-CPP106>3.0.CO;2-S
Lambert, M. J., Morton, J. J., Hatfield, D., Harmon, C., Hamilton, S., Reid, R. C., . . . Burlingame, G. B. (2004). Administration and scoring manual
for the Outcome Questionnaire-45. Orem, UT: American Professional Credentialing Services.
Laska, K. M., Gurman, A. S., & Wampold, B. E. (2014). Expanding the lens of evidence-based practice in psychotherapy: A common factors perspective. Psychotherapy, 51, 467–481. http://dx.doi.org/10.1037/a0034332
Lutz, W., Leon, S. C., Martinovich, Z., Lyons, J. S., & Stiles, W. B. (2007). Therapist effects in outpatient psychotherapy: A three-level growth curve approach. Journal of Counseling Psychology, 54, 32–39. http://dx.doi.org/10.1037/0022-0167.54.1.32
Lyne, K. J., Barrett, P., Evans, C., & Barkham, M. (2006). Dimensions of variation on the CORE-OM. The British Journal of Clinical Psychology, 45, 185–203. http://dx.doi.org/10.1348/014466505X39106
Muthén, L. K., & Muthén, B. O. (1998–2011). Mplus user's guide (6th
R Development Core Team. (2014). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from http://www.R-project.org/
Rice, K. G., Suh, H., & Ege, E. (2014). Further evaluation of the Outcome Questionnaire-45.2. Measurement and Evaluation in Counseling and Development, 47, 102–117. http://dx.doi.org/10.1177/0748175614522268
Saxon, D., & Barkham, M. (2012). Patterns of therapist variability: Therapist effects and the contribution of patient severity and risk. Journal of Consulting and Clinical Psychology, 80, 535–546. http://dx.doi.org/10.1037/a0028898
Schottke, H., Flückiger, C., Goldberg, S. B., Eversmann, J., & Lange, J. (2015). Predicting psychotherapy outcome based on therapist interpersonal skills: A five-year longitudinal study of a therapist assessment protocol. Psychotherapy Research. Advance online publication. http://