Best Practice & Research Clinical Rheumatology

Vol. 19, No. 4, pp. 593–607, 2005


doi:10.1016/j.berh.2005.03.003
available online at http://www.sciencedirect.com

Clinically important outcomes in low back pain

Raymond W.J.G. Ostelo* PhD, PT


Institute for Research in Extramural Medicine (EMGO Institute), VU University Medical Center,
van der Boechorststraat 7, 1081 BT Amsterdam, The Netherlands
Research Group ‘Allied Health Care’, Amsterdam School of Allied Health Education, Amsterdam, The Netherlands

Henrica C.W. de Vet PhD


Institute for Research in Extramural Medicine (EMGO Institute), VU University Medical Center,
van der Boechorststraat 7, 1081 BT Amsterdam, The Netherlands

Four important domains directly related to low back pain are: pain intensity,
low-back-pain-specific disability, patient satisfaction with treatment outcome, and work disability. Within each of
the domains, different questionnaires have been proposed. This chapter focuses on validated and
widely used questionnaires. Details of the background and the measurement properties, and of
the minimally clinically important change (MCIC) using these questionnaires, are described. The
MCIC can be estimated using various methods and there is no consensus in the literature on what
the most appropriate technique is. This chapter focuses primarily on two adequate and frequently
used methods for estimating the MCIC. We argue that the MCIC should not be considered as a
fixed value and that the MCIC values presented in this chapter should be used as indications only.
For patients with subacute or chronic low back pain, the MCIC for pain on a visual analogue scale
(VAS) should at least be 20 mm and for acute low back pain it seems reasonable to suggest that
the MCIC should at least be at the level of approximately 35 mm. If a numerical rating scale (NRS)
is used it seems reasonable to suggest that the MCIC should at least be 3.5 and 2.5 for patients
with acute and chronic low back pain, respectively. For functional disability as measured with the
Roland Disability Questionnaire it seems reasonable that the MCIC should at least be 3.5 points,
whereas an MCIC of at least 10 points seems reasonable when the Oswestry Disability Index is used. For global
perceived effect, we argue that the MCIC is most appropriately defined in terms of at least ‘much
improved’ or ‘very satisfied’, instead of including ‘slightly improved’. Finally, we argue that, from
the point of view of cost effectiveness, every day of earlier return to work is important. The exact
value for the MCIC can be determined, taking into account the aim of the measurement, the initial
scores, the target population and the method used to assess MCIC.

Key words: low back pain; minimally clinically important change; questionnaires; reproducibility;
validity.

* Corresponding author. Tel.: +31 20 444 8149; Fax: +31 20 444 6775.
E-mail address: r.ostelo@vumc.nl (R.W.J.G. Ostelo).

1521-6942/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
594 R. W. J. G. Ostelo and H. C. W. de Vet

The clinical success of the treatment of low back pain has traditionally been measured in
terms of physiological parameters (e.g. nerve conduction) or improvement in physical
findings (e.g. range of motion or muscle strength). Non-specific low back pain itself is
characterized by pain, disability and work absence. These concepts are closely linked
but the relationship between them is subtle and complex and influenced by many
factors. It is therefore important to make a clear distinction between pain, disability and
work loss, and to assess each separately. Furthermore, it is now widely recognized that
patients’ perspectives are essential in making medical decisions and judging the results
of treatment. Patients’ opinions of their low back pain determine not only the
subjective expression of complaints (e.g. pain) but also their utilization of health care
and their resumption of work. These patient-based outcome measures have become of
major importance and there has been a rapid growth in their number and type. In 2000,
an international panel of experts in low back pain agreed that a core set of measures
would include five domains: pain intensity, low-back-pain-specific disability, work
disability, generic functional status, and patient satisfaction with the process of care as
well as with the treatment outcome.1

OBJECTIVE

The current chapter focuses on four of the important domains that are directly related
to low back pain: pain intensity, low-back-pain-specific disability, patient satisfaction
with treatment outcome, and work disability. Measures of generic functional status are
not specifically designed for low back pain but have their strength in their applicability to
patients across different types of condition. Many different questionnaires have been
proposed within each of the four domains. We will elaborate on those questionnaires
that are validated and widely used in low back pain and will present details on their
background, measurement properties and minimally clinically important change.

MEASUREMENT PROPERTIES

Validity

Validity is the essential issue in the quality of a measurement instrument. A
measurement is valid if it measures what it is intended to measure. Although the
terminology in validity testing can be rather confusing, in general, three ‘types’ of
validity can be distinguished.2 Content validity focuses on the question of whether the
items of a scale adequately cover the domain they are intended to measure. Criterion
validity is the correlation of a scale with another measure of the domain of interest
(e.g. functional disability) that is accepted as the so-called ‘gold standard’. For the
measures that we focus on in this paper (and also in the field of low back pain in
general), hardly any gold standards exist. Therefore, it is conventional to compare
scores of a questionnaire with scores of established measures: this is called construct
validity. It refers to the extent to which scores on a particular instrument relate to
other measures in a manner that is consistent with theoretically derived hypotheses
concerning the constructs that are being measured (Table 1).3
Clinically important outcomes in LBP 595

Table 1. Definitions of measurement properties.

Measurement property    Definition

Validity                A measurement is valid if it measures what it is intended to measure. In
                        general, three ‘types’ of validity are distinguished: content validity, criterion
                        validity and construct validity
Reproducibility         Reproducibility includes two concepts: reliability and agreement. Agreement
                        represents the lack of measurement error. Reliability represents the
                        extent to which individuals can be distinguished from each other, despite
                        measurement error
Minimal clinically      The smallest change that is important to patients
important change

Reproducibility

The concept of reproducibility reflects the amount of error, both random and systematic,
inherent in any measure, as revealed by repetition of the measurement. Reproducibility includes
two concepts, reliability and agreement, but the distinction between these terms is
not always appreciated.4 Agreement represents the lack of measurement error;
reliability represents the extent to which individuals can be distinguished from each
other, despite measurement error. In other words, reliability parameters
(e.g. correlation coefficients) are important when the aim is to discriminate between
individuals, and agreement parameters (e.g. standard error of measurement) are
important when one wants to detect changes in health status.

Minimally clinically important change

Observed change between two measures may be due to various sources. Thus an
important question is whether the observed change (i.e. improvement) of a patient on
repeated measures is merely caused by measurement error, as can be expressed by
agreement parameters, or by real change that is clinically relevant. In the literature, this
is often referred to as a minimal clinically important difference or change. Jaeschke5
defines minimally clinically important difference (MCID) as “the smallest difference in
score in the domain of interest which patients perceive as beneficial and would
mandate, in the absence of troublesome side effects and excessive cost, a change in the
patient’s management”. Stratford et al6 define the MCID as ‘the smallest change that is
important to patients’. We prefer to use the term ‘minimally clinically important
change’ (MCIC) for the change of health status within patients and the term MCID to
indicate differences between patients. Estimating the MCIC of relevant outcome
measures enables a comparison between interventions at the patient level and can
contribute to the relevance and interpretability of change scores.
Several reviews and studies have presented a clear overview of the different
methods to assess MCIC and provided some priorities for future research.7–11
However, there is no consensus in the literature on the most appropriate technique for
determining the MCIC; different methods to estimate the MCIC result in different
values. This chapter therefore presents a range of MCIC values for the included
questionnaires as reported in the literature. We will focus primarily on two adequate
and frequently used methods for estimating the MCIC. Both methods use a global rating
scale of change as an external criterion. We believe the change can be defined as
‘clinically important’ because individual patients graded their own health status.
The first method aims to estimate the smallest change possible to detect, with a 95%
probability beyond the measurement error. First, the standard error of measurement
(SEM) is estimated. The SEM indicates the precision of the outcome measure. To be
95% confident that observed change is real change and not caused by measurement
error, the smallest change possible to detect is calculated as $1.96 \times \sqrt{2} \times \mathrm{SEM}$.
Observed change is a result of two measurements (baseline and follow-up); therefore,
measurement errors can occur twice, hence the factor $\sqrt{2}$. If, based on the external criterion,
only ‘unchanged’ patients are included in the analysis, i.e. patients with no clinically
relevant improvement, this smallest-change-possible-to-detect can be interpreted as
the MCIC. Patients with a change score smaller than or equal to this
smallest-change-possible-to-detect have a chance of more than 95% that no real change has occurred.
Patients with change scores larger than this smallest-change-possible-to-detect have a
less than 5% chance that no real change has occurred. This chance is so small that these
patients can be considered to have made a real, clinically important, change.
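
As an illustration, the calculation used in the first method can be sketched in a few lines of Python. The function name and the SEM value of 5.4 mm are hypothetical and chosen only for the example; they are not taken from the studies discussed in this chapter, and the SEM itself is assumed to have been estimated beforehand in ‘unchanged’ patients.

```python
import math


def smallest_detectable_change(sem: float, z: float = 1.96) -> float:
    """Smallest change detectable beyond measurement error: z * sqrt(2) * SEM.

    sqrt(2) reflects that measurement error occurs at both baseline and follow-up.
    """
    if sem < 0:
        raise ValueError("SEM must be non-negative")
    return z * math.sqrt(2) * sem


# Hypothetical SEM of 5.4 mm on a 100-mm VAS (illustrative value only).
sdc = smallest_detectable_change(5.4)
print(f"Smallest detectable change: {sdc:.1f} mm")
```

A change score larger than the returned value would, under this method, be read as exceeding measurement error with 95% confidence.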
The second method aims to estimate the optimal cut-off point. Deyo and Centor12
suggested viewing measurement scales as diagnostic tests for discriminating between
improved and unchanged patients. When the questionnaires are considered as
diagnostic tests and the global rating scale of change as the gold standard, the
sensitivity is the number of patients correctly identified as ‘improved’ by the
questionnaire divided by the number of patients categorized as ‘improved’ according
to the global rating scale of change. The specificity is the number of patients correctly
identified as ‘unchanged’ by the questionnaire divided by the number of ‘unchanged’
patients according to the global rating scale of change. The receiver operating
characteristic (ROC) curve is the result of using different cut-off points for change
scores, each with a given sensitivity and specificity. When determining the MCIC, the
optimal cut-off point is usually defined as the point that yields the lowest overall
misclassification, that is, the optimal balance between sensitivity and specificity. It
is important to realize that using this cut-off point might result in some ‘unchanged’
patients being classified falsely as ‘improved’ because the cut-off takes into account both
sensitivity and specificity. The first method, aiming to estimate the smallest change
possible to detect, takes into account only the ‘unchanged’ patients.
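
The second method can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any cited study: the data are invented, a larger change score is assumed to mean more improvement, a patient is classified as ‘improved’ when the change score is at or above the cut-off, and only observed change scores are tried as candidate cut-offs. The function name `optimal_cutoff` is hypothetical.

```python
def optimal_cutoff(change_scores, improved):
    """Return (cutoff, sensitivity, specificity) with the fewest misclassifications.

    change_scores: change score per patient (higher = more improvement).
    improved: True if the global rating scale labels the patient 'improved'.
    Both groups must contain at least one patient.
    """
    n_improved = sum(improved)
    n_unchanged = len(improved) - n_improved
    best = None
    for cutoff in sorted(set(change_scores)):
        # True positives: 'improved' patients at or above the cut-off.
        tp = sum(1 for s, imp in zip(change_scores, improved) if imp and s >= cutoff)
        # True negatives: 'unchanged' patients below the cut-off.
        tn = sum(1 for s, imp in zip(change_scores, improved) if not imp and s < cutoff)
        errors = (n_improved - tp) + (n_unchanged - tn)
        if best is None or errors < best[0]:
            best = (errors, cutoff, tp / n_improved, tn / n_unchanged)
    return best[1], best[2], best[3]


# Invented example data: change scores and the external criterion.
scores = [0, 1, 2, 2, 3, 4, 5, 6, 7, 8]
improved = [False, False, False, True, False, True, True, True, True, True]

cutoff, sensitivity, specificity = optimal_cutoff(scores, improved)
print(cutoff, round(sensitivity, 2), round(specificity, 2))
```

With real data, the sensitivity and specificity at each candidate cut-off are the points that make up the ROC curve.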
A final remark regarding the MCIC is that we think that it should not be considered
as a fixed value. The exact value for the MCIC should be determined, taking into
account the aim of the measurement, the initial scores and the method used to assess
MCIC. The MCIC values presented in this paper should therefore be used as
indications only.

PAIN INTENSITY

Pain intensity can be defined as how much a patient is hurt by his or her low back pain.
Pain intensity is a quantitative estimate of the severity or magnitude of perceived pain.
Von Korff et al13 provide a comprehensive review of the assessment of pain intensity
using self-report. The two most commonly used methods to assess pain intensity are
the visual analogue scale (VAS) and numerical rating scale (NRS). It is important to
realize that, no matter how numeric the values provided by these measurement
instruments appear to be, the measurement of pain intensity remains a subjective
interpretation of the pain experience and the patient’s assignment of the value to the
measurement scale. There is growing evidence to suggest that pain intensity, in
combination with its interference with activities, contributes to an underlying construct
of global pain severity.13 There is a large body of research on the assessment of back
pain’s interference with activities. Measures to assess how back pain interferes with
activities will be discussed later. Findings from methodological studies indicate that the
recall of key parameters of chronic recurrent pain, which include pain intensity, has
acceptable levels of validity for at least a 3-month recall period.13 This indicates that
self-report pain intensity measures with an extended recall period can yield useful
information on pain outcomes (Table 2).

Table 2. Indications for the minimally clinically important change per questionnaire.

Domain                Questionnaire          Minimally clinically important change

Pain intensity        VAS                    >20 mm for chronic low back pain;
                                             >35 mm for acute low back pain
                      NRS                    >2.5 points for chronic low back pain;
                                             >3.5 points for acute low back pain
Functional status     RDQ                    >3.5 points for all types of low back pain
                      ODI                    >10 points for all types of low back pain
Patient satisfaction  GPE                    Most appropriately defined in terms of at
                                             least ‘much improved’ or ‘very satisfied’
                                             instead of including ‘slightly improved’
Work disability       Return to work (days)  Every day of earlier return to work is
                                             important

GPE, global perceived effect; NRS, numerical rating scale; ODI, Oswestry Disability Index; RDQ, Roland
Disability Questionnaire; VAS, visual analogue scale.

Visual analogue scale (VAS)

A VAS consists of a line, usually 100-mm long, with ends labelled as the extremes of pain
(e.g. ‘no pain’ to ‘pain as bad as it could be’). Specific points along this line might be
labelled with intensity-denoting adjectives or numbers. Patients are asked to indicate
which point along the line best represents their pain intensity and the distance from the
no-pain end to the mark made by the patient is the patient’s pain intensity score.
There is much evidence to support the validity of the VAS of pain intensity. Many
studies have demonstrated the construct validity: pain intensity scores as measured by
the VAS correlated positively with other self-reported measures of pain intensity.13 The
reliability of VAS scores has also been demonstrated.14–16 The VAS has a high number of
response categories: because it is usually measured in millimeters, a 100-mm VAS can
be considered as having 101 response levels. This makes the VAS potentially more
sensitive to changes in pain intensity than measures with a more limited number of
response categories. Although research comparing the VAS to other measures
indicates minimal differences in sensitivity to change most of the time, when differences
are found the VAS is usually more sensitive than other measures, especially those with a
limited number of response categories.13 Although the VAS is easy to administer,
investigators who plan to use VAS measures must explain the measurement scale and
procedures carefully to reduce the number of patients who fail to complete the scale correctly.
Minimally clinically important change of the VAS


Hagg et al17 assessed clinically important changes in a population of patients with chronic low back
pain using the patient’s global assessment of treatment effect as external criterion. The
authors first estimated the smallest change that it was possible to detect with 95%
probability beyond the measurement error, which turned out to be 15 mm. Then they
analysed the difference in mean change scores between patients who considered
themselves ‘unchanged’ and patients who considered themselves ‘better’. This
difference turned out to be 18–19 mm, exceeding the smallest change possible to
detect. Beurskens et al18 assessed the responsiveness of the VAS in patients suffering
from non-specific low back pain for at least 6 weeks using the optimal cut-off point. The
patient’s global perceived effect was considered to be the gold standard. It appeared
that using a cut-off point from 10 to 18 mm on a VAS for pain intensity discriminates
best between patients who were improved and patients who were unchanged. It seems
reasonable for clinical decision making (and for power calculation of sample size for
studies on low back pain) to suggest that for patients with subacute or chronic low back
pain the MCIC should at least be 20 mm.
The MCIC has also been assessed in patients with acute low back pain. It turned out that
the smallest change possible to detect with 95% probability beyond measurement error,
as measured with a VAS in unchanged patients, was 36.2 mm (95% confidence interval
(CI) 32.4, 41.0) (Ostelo, unpublished data). So, for acute low back pain it seems
reasonable to suggest that the MCIC should at least be at the level of approximately
35 mm.

Numerical rating scales (NRS)

An NRS involves asking patients to rate the pain from 0 to 10 (an 11-point scale), 0 to
20 (a 21-point scale) or 0 to 100 (a 101-point scale), with the understanding that
0 represents one end of the pain intensity continuum (i.e. no pain) and 10 or 100
represents the other extreme of pain intensity (i.e. pain as bad as it could be). The
11-point scale is most frequently used in low back pain studies. The patient is asked to
tick a score that best represents the intensity of his or her pain. The construct validity
of the NRS has been well documented.13 It demonstrates positive and significant
correlations with other measures of pain intensity and is easy to administer and to
score; it can also be administered over the phone. Overall, the NRS is a simple and
robust measurement method.

Minimally clinically important change of the NRS


Farrar et al19 assessed the clinically important change in chronic pain intensity as
measured on the NRS. Data from 2724 subjects in 10 placebo-controlled trials of
various chronic pain conditions, including chronic low back pain, were analysed. A
7-point patient global impression of change was used as the external criterion for what
patients considered to be a clinically important improvement. A consistent relationship
was demonstrated regardless of the study, disease type, age, sex, study result or
treatment group. The authors concluded that, on average, a decrease of 2 or more
points was associated with the category ‘much improved’ of the global rating scale of
change. Decreases of at least 4 points corresponded to ‘very much improved’.
However, this value does not represent the minimal detectable change as described
earlier in this paper because it does not take into account the measurement error.
Furthermore, it appeared that using a cut-off point of a 2-point decrease on the NRS
discriminates best between patients who were improved and patients who were
unchanged.
Van der Roer et al41 used data from a randomized controlled trial of 442 low back
pain patients. The MCIC was estimated over a 12-week period and, as described earlier,
both methods were used for estimating the MCIC using the global perceived effect
scale (GPE) as an external criterion and determinant for clinically important change. It
turned out that using cut-off points of 3.5 and 2.5, for patients with acute and chronic low
back pain, respectively, discriminates best between patients who were improved and
patients who were unchanged. The smallest change possible to detect with 95%
probability beyond the measurement error turned out to be 4.7 points and 4.5 points
for patients with acute and chronic back pain, respectively. High initial scores need to be
reduced more to be labelled as clinically important. In sum, it seems reasonable to
suggest that the MCIC should be at least 3.5 for patients with acute low back pain,
whereas for chronic low back pain 2.5 seems reasonable.

LOW-BACK-PAIN-SPECIFIC DISABILITY

The World Health Organization20 defines disability as ‘any restriction or lack of ability
(resulting from an impairment) to perform an activity in the manner or within the range
considered normal for a human being’. In low back pain, this is often interpreted as pain
interfering with activities such as mobility, dressing, sitting and standing. Patients can
give this information by completing disability questionnaires. Questionnaires are more
consistent and reliable than interviews (e.g. history taking) because they present the
questions in exactly the same way to every patient, every time.21 Many questionnaires
are available but definite statements about the superiority of one back-specific measure
over another cannot be made. However, an international expert panel recommends
using one of the two widely used measures, the Roland–Morris Disability
Questionnaire (RDQ) or the Oswestry Disability Index (ODI).1 It is important to
emphasize that the differences between these instruments are small: both cover about
the same content, are widely used, have been extensively tested and are applicable in a
wide variety of settings.

Roland–Morris disability questionnaire

The RDQ22 is derived from the Sickness Impact Profile. The RDQ contains 24 yes/no
items. Patients are asked whether the statements apply to them that day (i.e. the last
24 hours). The RDQ-24 score is calculated by adding up the number of items with a
‘yes’, which will range from 0 (no disability) to 24 (maximum disability). Several
modifications have been suggested but these seem to provide only modest
improvements over the original version and the original version has been
recommended for use.1
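
The scoring rule of the original 24-item version can be illustrated with a short Python sketch. The function name and the example answers are hypothetical; the sketch assumes that every item has been answered and does not cover the modified versions.

```python
def rdq_score(answers):
    """Score the 24-item RDQ: the number of items answered 'yes'.

    answers: sequence of 24 booleans (True = 'yes', the statement applies today).
    Returns an integer from 0 (no disability) to 24 (maximum disability).
    """
    if len(answers) != 24:
        raise ValueError("the original RDQ has exactly 24 items")
    return sum(1 for answer in answers if answer)


# Hypothetical patient who ticks 'yes' on 9 of the 24 statements.
example_answers = [True] * 9 + [False] * 15
print(rdq_score(example_answers))
```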
The RDQ focuses on a limited range of physical functions, including walking, bending
over, sitting, lying down, dressing, sleeping, self-care and daily activities. This limited
range is both a strength and a weakness for its content validity. The weakness is that
psychological and social problems are not included and, in situations where these
domains are important, the RDQ should be combined with specific measures of these
domains. The strength is that, due to its restricted nature, the RDQ is easy to score,
understand and interpret.23
The RDQ has demonstrated positive and significant correlations with other
measures of self-reported disability, such as the Oswestry Disability Index, the Quebec
Back Pain Disability Scale and the physical subscales of the SF-36. The RDQ does not
attempt to measure psychological and social problems associated with low back pain
and, as could be expected, RDQ scores correlate less well with measures intended to
measure psychosocial disability. The RDQ shows only modest correlation with direct
measures of physical function.23

Minimally clinically important change of the RDQ


The MCIC of the RDQ has been assessed in a number of studies. Stratford et al6
assessed the optimal cut-off value that best classified patients as those who had
achieved an important change and those who had not achieved an important change. In
total, 226 patients with low back pain for less than 6 weeks were included. The optimal
cut-off value turned out to be 5 points. Furthermore, these authors suggest that it
seems reasonable to take the initial RDQ score into account: an MCIC of 1–2 points for
patients with low initial scores (low level of disability) and an MCIC of 7–8 points for
patients with high initial scores (high level of disability).
A study by Beurskens et al18 included 81 patients with non-specific low back pain for
at least 6 weeks; the methodology to assess the MCIC was similar to that used in
Stratford et al’s study. Beurskens et al reported an optimal cut-off value of 2.5–5 points
for classifying patients who reported an important change and those who had not.
Patrick et al24 suggested an MCIC of 2–3 points based on a study that included
patients with sciatica, using the 23-item version of the RDQ.
A more recent study assessed the MCIC in patients with residual complaints after
lumbar disc surgery.25 It turned out that the smallest clinically relevant change possible
to detect with 95% probability beyond measurement error, as measured with the RDQ,
was 5.4 points. The optimal cut-off point of the RDQ-24 to discriminate between
patients who had significantly improved versus patients who were stable turned out to
be 3.5. However, using a cut-off point of 3.5 means that some stable patients will be
classified falsely as improved.
The above studies included various types of low back pain (acute, subacute and chronic).
The results did not yield consistent differences between these types of back pain but it
can be anticipated that, as in measuring pain intensity, the MCIC for acute low back pain is
probably larger than the MCIC for chronic low back pain. Therefore, in sum, it seems
reasonable to suggest that the MCIC for all types of low back pain should at least be 3.5.

Oswestry disability index

The ODI has 10 items that refer to activities of daily living that might be disrupted by
LBP. Answers should relate to the situation of ‘today’.26 Each item has six response
alternatives, ranging from ‘no problem’ to ‘not possible’. The ODI score is calculated as
follows: if the first statement (‘no problem’) is marked, the score is 0; if the last
statement (‘not possible’) is marked, the score is 5. Intervening statements are scored
according to rank. So, for each item of six statements the maximum score is 5. If all 10
items are completed the score is calculated as $(\text{total scored} / 50) \times 100$, where 50
is the total possible score, to obtain the score expressed as a percentage. So the total ODI score
ranges from 0 (no disability) to 100 (maximum disability). Like the RDQ, the ODI
focuses on a limited range of physical functions, including standing, walking, lifting,
sitting, lying down, dressing and personal care. As with the RDQ this limited range is
both a strength and a weakness for its content validity. The ODI was validated and
improved in a study by a Medical Research Council group and this version (2.0) is now
recommended for general use.23 The construct validity of the ODI has been assessed
by correlation with other questionnaires measuring low-back-pain-specific disability.
The results were satisfactory and justify the use of the ODI.23 Reproducibility was
originally tested by Fairbank26, who included patients with chronic low back pain.
Patients were tested twice within 24 hours and the correlation between the two
scores was high. However, another study with a greater time interval resulted in a poor
test–retest correlation.23
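
The ODI scoring rule described above can be sketched in Python as follows. The function name and the example responses are hypothetical, and the sketch covers only the case described in the text, in which all 10 items are completed.

```python
def odi_score(item_scores):
    """Score the ODI as a percentage when all 10 items are completed.

    item_scores: 10 integers, each from 0 ('no problem') to 5 ('not possible').
    Returns (total scored / 50) * 100, ranging from 0 to 100.
    """
    if len(item_scores) != 10:
        raise ValueError("this sketch requires all 10 items to be completed")
    if any(score not in range(6) for score in item_scores):
        raise ValueError("each item is scored from 0 to 5")
    return 100 * sum(item_scores) / 50


# Hypothetical responses: item scores sum to 16 out of a possible 50.
example_items = [2, 1, 3, 2, 0, 1, 2, 3, 1, 1]
print(odi_score(example_items))
```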

Minimally clinically important change of the ODI


In the study mentioned above, Beurskens et al18 also assessed the optimal cut-off of the
ODI to discriminate between patients who had significantly improved versus patients
who remained stable. The authors present a cut-off value of 4–6 points of the 100
points for the ODI. But, as already stated, using a cut-off point as assessed by this ROC
methodology means that some stable patients will be classified falsely as improved.
Hagg et al17 estimated the MCIC in a study including 289 patients treated surgically or
non-surgically. The smallest clinically relevant change possible to detect with 95%
probability beyond measurement error, as measured with the ODI, was 10 points.
Although the studies included various types of low back pain it was not possible
specifically to differentiate between acute and chronic low back pain. However, as with
measuring pain intensity, it can be anticipated that for acute low back pain the MCIC is most
likely larger than the MCIC for chronic low back pain. Overall, it seems reasonable to
suggest that the MCIC should at least be 10 points. However, as with the RDQ, this
value should be used as an indication. Depending on the aim of the measurement, the
exact value for the MCIC can be determined, taking baseline values, the target
population and the method used into account.

GLOBAL PERCEIVED EFFECT

Patient satisfaction is measured in many ways. Hudak and Wright27 reviewed the
various instruments and provided a framework for understanding the strengths and the
weaknesses of various instruments. Their main conclusion was that it is not possible to
recommend any particular single measure because no measure alone is ideal.
Satisfaction with care should be assessed separately from satisfaction with treatment
outcome. Here, we focus on the satisfaction with treatment outcome. Global ratings of
change are often used to measure satisfaction with treatment outcome. Norman et al28
have raised two major concerns regarding the use of the global rating scales: (i) the
reliability and validity of global ratings are unknown; and (ii) global ratings are typically
correlated with the patient’s present status and are not an unbiased measure of change.
Despite these criticisms, other authors regard global rating scales of change as clinically
relevant outcome measures and as being valid and responsive measures of the patients’
perceived benefit.1,29,30 Most physicians would be reluctant to label a patient as
improved or deteriorated against the patient’s personal assessment.
Also in the field of low back pain, the use of a global rating of change is common.
Based on the review by Hudak and Wright27, the expert panel recommends the use of
a 7-point rating scale.1 There are many variations of this global question. Hudak and
Wright27 prefer the following question: ‘All things considered, how satisfied are you
with the results of your recent treatment?’ 1 = extremely satisfied, 2 = very satisfied,
3 = somewhat satisfied, 4 = mixed (approximately equal satisfaction and dissatisfaction),
5 = somewhat dissatisfied, 6 = very dissatisfied, and 7 = extremely dissatisfied. Unlike
the expert panel, Hudak and Wright also include an eighth option: ‘not sure/no
opinion’.
Another frequently used 7-point scale asks the patients to score their perceived
change after the treatment, ranging from: 1 = completely recovered, 2 = much
improved, 3 = slightly improved, 4 = no change, 5 = slightly worsened, 6 = much
worsened, and 7 = worse than ever. Sometimes a 6-point scale is used, which excludes
the option ‘worse than ever’.

Reliability and validity

In general, there is very little information regarding reliability and validity of global rating
scales of change.28 Fritz and Irrgang31 assessed the construct validity of the global rating
scale of change and concluded that the global rating of change could be used to separate
‘stable’ patients from ‘improved’ patients in the dimension physical impairment.
However, the global rating scale they used ranges from −7 (‘a very great deal worse’)
through 0 (‘about the same’) to +7 (‘a very great deal better’) and is therefore difficult
to compare with the 7-point scale.

Minimally clinically important change


Some studies clustered the categories of the global rating into an ‘improved’ and an
‘unchanged group’. Typically, categories 1 (= completely recovered) and 2 (= much
improved) were considered as ‘improved’ whereas category 3 (= slightly improved)
was considered as ‘unchanged’.17,18,25,32 There was support for including ‘slightly
improved’ in the ‘unchanged’ group because there were statistically significant
differences between the change scores of disability and pain of the ‘slightly improved’
group and the ‘much improved’ group. Moreover, these studies did not show
statistically significant differences between the change scores of the ‘slightly improved’
group and the ‘no change’ group, which were both included in the ‘unchanged’ group.
However, conceptually it seems more reasonable to define a clinically important change
in terms of at least ‘much improved’ or ‘very satisfied’ instead of only ‘slightly
improved’.
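The clustering described above can be sketched in a few lines of Python. This is only an illustration: the function name `classify_global_change` is hypothetical, and grouping categories 5 to 7 as ‘worsened’ is an assumption, because the studies cited here address only the ‘improved’ and ‘unchanged’ groups.

```python
# Labels of the 7-point global perceived change scale described in the text.
GLOBAL_CHANGE_LABELS = {
    1: "completely recovered",
    2: "much improved",
    3: "slightly improved",
    4: "no change",
    5: "slightly worsened",
    6: "much worsened",
    7: "worse than ever",
}


def classify_global_change(category: int) -> str:
    """Cluster a 7-point global rating into a coarse group.

    Categories 1-2 count as 'improved' and 3-4 as 'unchanged', as in the text.
    Treating 5-7 as 'worsened' is an assumption made for this sketch.
    """
    if category not in GLOBAL_CHANGE_LABELS:
        raise ValueError(f"category must be 1-7, got {category!r}")
    if category <= 2:
        return "improved"
    if category <= 4:
        return "unchanged"
    return "worsened"


for cat, label in GLOBAL_CHANGE_LABELS.items():
    print(cat, label, "->", classify_global_change(cat))
```

Placing ‘slightly improved’ in the ‘unchanged’ group follows the conceptual argument that a clinically important change requires at least ‘much improved’.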

WORK DISABILITY/WORK STATUS

An important consequence of low back pain and disability is that people are unable to
work, resulting in work absence due to sick leave and work disability. Parameters of
work absence are the duration of the absence, the percentage of work disability in
cases of partial sick leave, and the reason of work absence. Duration of sick leave is
usually expressed in number of days of work absence due to a low back pain. If sick
leave is measured continuously (e.g. by registries) time to return to work is the most
adequate outcome measure. If work status is measured at certain points in time,
percentage of persons returned to work is the parameter of interest. To assess
the total days of sick leave in a specific period, the duration of all episodes of sick leave
in this period are summed. When calculating days of sick leave, partial work should be
considered. For example, a 1-week absence for a person with an 80% job amounts to
4 days of sick leave, not 5. In addition, partial sick leave should be taken into account.
For example, a person on sick leave for 100% for 2 weeks and then for 50% for an
additional 2 weeks is counted as having had 3 weeks of sick leave. There are different
definitions in the determination of the period of sick leave: some researchers might
consider that 1 day back to work marks a new episode of sick leave because of back
pain33; others consider a period of sick leave to be ended if a person resumes work
for a defined period. For example, successful return to work was defined by
Steenstra34 as 4 weeks of work resumption without any day of sick leave in this
4-week period. It is recognized that persons who return to work are not as
productive as before they developed back pain. It is a challenge for future research to
measure productivity at work.
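The two worked examples above (an 80% job, and full followed by 50% sick leave) can be reproduced with a short Python sketch. The class and function names are hypothetical, and the 5-day working week is an assumption taken from the ‘4 days of sick leave, not 5’ example.

```python
from dataclasses import dataclass

WORKDAYS_PER_WEEK = 5  # assumption: a full-time week has 5 working days


@dataclass
class SickLeaveEpisode:
    weeks: float
    sick_leave_fraction: float = 1.0  # 1.0 = 100% sick leave, 0.5 = 50% sick leave
    contract_fraction: float = 1.0    # 1.0 = full-time job, 0.8 = 80% job

    def days(self) -> float:
        """Days of sick leave, weighted for partial work and partial sick leave."""
        return (self.weeks * WORKDAYS_PER_WEEK
                * self.contract_fraction * self.sick_leave_fraction)


def total_sick_leave_days(episodes) -> float:
    """Sum the weighted days over all episodes in the period of interest."""
    return sum(episode.days() for episode in episodes)


# A 1-week absence for a person with an 80% job.
part_time = [SickLeaveEpisode(weeks=1, contract_fraction=0.8)]
print(total_sick_leave_days(part_time))

# 100% sick leave for 2 weeks, then 50% for another 2 weeks.
partial = [SickLeaveEpisode(weeks=2), SickLeaveEpisode(weeks=2, sick_leave_fraction=0.5)]
print(total_sick_leave_days(partial) / WORKDAYS_PER_WEEK, "weeks")
```

Multiplying the two fractions assumes that partial work and partial sick leave combine proportionally; the text does not discuss a case in which both apply at once.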

Reliability and validity

There are several methods to measure employees’ work absence. The most
important data come from registries held by the company or occupational health
service; data collected by self-administered questionnaires or by interview are also
extremely useful. Data from registries are usually considered to be valid because they
are registered prospectively. However, short periods of sickness absence might be
missed by registries35; in addition, the reason registered for work absence is usually provided
by the worker. To assess criterion validity, several studies have compared self-
reported sick leave data with data from registries, and expressed the results in terms
of sensitivity and specificity. Sensitivity is the percentage of persons with sick leave due
to low back pain according to the registry, who report sick leave in the questionnaire/
interview; specificity indicates the percentage of persons without work absence
according to the registries, who report no work absence in the questionnaire/
interview. The specificity of questionnaires is high, meaning that persons seldom
report sick leave due to low back pain when there has been no sick leave according to
the registry. The sensitivity varied from 88% in the study by Burdorf35 to 55% in van
Poppel’s study.36 In both studies, sensitivity depended on the period of recall, the level
of education of the workers and the duration of the episode of sick leave. Fredriksson
et al37 assessed the validity of self-reported sick leave due to musculoskeletal diseases
by comparing these data with registered sick leave. They concluded that the validity of
retrospectively collected self-reported sick-leave data was sufficient for use as a
measure of musculoskeletal morbidity in the analyses of associations with work-
related conditions. However, because of the relatively low sensitivity, such data will
underestimate the prevalence of sick leave and should not be used for surveys of
morbidity.
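As an illustration of the definitions of sensitivity and specificity given above, the following Python sketch compares self-reported sick leave with registry data, taking the registry as the criterion. The function name and the ten-worker data set are invented for this example and do not come from any of the cited studies.

```python
def sensitivity_specificity(registry, self_report):
    """Compare self-reported sick leave with registry data (the criterion).

    Both arguments are equal-length sequences of booleans, one per worker:
    True means sick leave due to low back pain was recorded or reported.
    """
    if len(registry) != len(self_report):
        raise ValueError("registry and self_report must have the same length")
    pairs = list(zip(registry, self_report))
    true_pos = sum(1 for r, s in pairs if r and s)
    false_neg = sum(1 for r, s in pairs if r and not s)
    true_neg = sum(1 for r, s in pairs if not r and not s)
    false_pos = sum(1 for r, s in pairs if not r and s)
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one worker with and one without registered sick leave")
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Invented data for ten workers (not taken from any cited study).
registry = [True, True, True, True, False, False, False, False, False, False]
self_report = [True, True, True, False, False, False, False, False, False, True]
sens, spec = sensitivity_specificity(registry, self_report)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

In this invented example one of the four workers with registered sick leave does not report it, which is the kind of under-reporting that lowers sensitivity.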
The ability to recall sick leave accurately is good over a period of 2–3 months35,38,
but diminishes as the period becomes longer.38 Separate episodes of sick leave are
more difficult to report than the question of whether sick leave due to back pain has
occurred during a specified period.36 The employee usually reports the reason for
sick leave; this holds both for the registries and the self-reported sick leave. However,
in case of low back pain the difference with doctors’ reports about back pain will be
small. Reliability of work absence data has seldom been reported. Fredriksson et al37
showed that self-reported sick leave related to musculoskeletal diseases is fairly
reliable.

Minimally clinically important change


The minimally clinically important change has not been defined for work absence;
there are two reasons for this. First, determining minimally clinically important
changes is especially relevant for outcome measures with unknown metrics, such as
disability questionnaires. The metric of work absence, days off work, is well
known. Second, sick leave and work disability are regarded more as social and
economic issues than as indicators of morbidity.39 In other words, the question of
whether a certain change in sick leave is important, is mostly answered from a
socioeconomic perspective. Besides the fact that long duration of sick leave can lead
to social isolation, sick leave and work disability have major economic consequences.
The indirect costs of lost productivity due to work absence usually outweigh the
direct medical costs.40 Therefore, from a socioeconomic point of view, work absence
is a very important issue and each day of earlier return to work is important from that
perspective.

SUMMARY

This chapter focuses on the four most important domains that are directly related to
low back pain: pain intensity, low-back-pain-specific disability, patient satisfaction with
treatment outcome and work disability. It elaborates on those questionnaires that are
validated and widely used in low back pain. The two most commonly used methods to
assess pain intensity are the visual analogue scale (VAS) and numerical rating scale
(NRS). It seems reasonable to suggest that for the VAS the MCIC should at least be
20 mm and at least be 35 mm for chronic and acute low back pain, respectively. For
the NRS it seems reasonable to suggest that the MCIC should at least be 2.5 and at
least 3.5 points for chronic and acute low back pain, respectively. The Roland–Morris
Disability Questionnaire (RDQ) and the Oswestry Disability Index (ODI) are two
widely used questionnaires for measuring low-back-pain-specific disability and both
have been tested extensively. For the RDQ it seems reasonable to suggest that the
MCIC should at least be 3.5. When using the ODI it seems reasonable to suggest that
the MCIC should at least be 10 points. However, these values for the MCIC should be
used as an indication and, depending on the aim of the measurement, the exact value
for the MCIC can be determined and initial disability scores and the target population
can be taken into account. For measuring the satisfaction with treatment outcome, a
7-point rating scale of change is advised and there is support for considering
categories 1 (= completely recovered) and 2 (= much improved)
as a ‘clinically relevant improvement’, whereas category 3 (= slightly improved) should be
considered as ‘unchanged’. There are two frequently used methods to measure work
absence: data from registries that are held by the company or occupational health
service, and data collected by self-administered questionnaires or by interview. Data
from registries are usually considered to be a valid measure because they are registered
prospectively. The indirect costs due to lost productivity because of work absence
usually outweigh the medical costs. Therefore, every day of earlier return to work is
important.
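The indicative MCIC values summarized above could be applied to an individual change score along the following lines. This is a minimal sketch under stated assumptions: the names `MCIC_INDICATIONS` and `exceeds_mcic` are hypothetical, improvement is assumed to be a decrease in score, and the values remain indications rather than fixed cut-offs.

```python
# Indicative MCIC values from the summary; they are indications, not fixed cut-offs.
MCIC_INDICATIONS = {
    ("VAS", "chronic"): 20.0,  # mm
    ("VAS", "acute"): 35.0,    # mm
    ("NRS", "chronic"): 2.5,   # points
    ("NRS", "acute"): 3.5,     # points
    ("RDQ", None): 3.5,        # points
    ("ODI", None): 10.0,       # points
}


def exceeds_mcic(instrument, baseline, follow_up, stage=None):
    """Return True if the improvement reaches the indicative MCIC.

    Assumption: lower scores mean less pain or disability, so improvement is
    baseline minus follow-up. `stage` ("acute" or "chronic") is only used for
    the VAS and NRS; the summary gives a single value for the RDQ and ODI.
    """
    key = (instrument, stage if instrument in ("VAS", "NRS") else None)
    if key not in MCIC_INDICATIONS:
        raise ValueError(f"no indicative MCIC for {instrument!r} with stage {stage!r}")
    return (baseline - follow_up) >= MCIC_INDICATIONS[key]


# A 25 mm improvement on the VAS, judged against the chronic and acute values.
print(exceeds_mcic("VAS", 70, 45, stage="chronic"))
print(exceeds_mcic("VAS", 70, 45, stage="acute"))
```

Depending on the aim of the measurement, the table could be replaced by values that take initial disability scores and the target population into account.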
Practice points

• to improve the comparability between studies and patients’ scores, it is
important to administer widely used questionnaires that have been well
validated
• the minimally clinically important change, as presented in this chapter, should be
used as an indication. Depending on the aim of the measurement, the exact
value for the MCIC can be determined, taking initial disability scores and the
target population into account

Research agenda

• to develop reliable and valid external criteria to determine the MCIC
• to study the influence of baseline scores on the MCIC
• to determine whether the MCIC can better be expressed in scale points or in
percentages of improvement
• to develop measures to quantify productivity at work

REFERENCES

*1. Bombardier C. Outcome assessments in the evaluation of treatment of spinal disorders: summary and
general recommendations. Spine 2000; 25: 3100–3103.
2. Streiner DL & Norman GR. Health Measurement Scales. 3rd edn. Oxford: Oxford University Press; 2003.
3. Kirshner B & Guyatt G. A methodological framework for assessing health indices. Journal of Chronic
Diseases 1985; 38: 27–36.
4. de Vet HC, Terwee CB & Bouter LM. Current challenges in clinimetrics. Journal of Clinical Epidemiology
2003; 56: 1137–1141.
5. Jaeschke R, Singer J & Guyatt GH. Measurement of health status. Ascertaining the minimal clinically
important difference. Controlled Clinical Trials 1989; 10: 407–415.
6. Stratford PW, Binkley JM, Riddle DL & Guyatt GH. Sensitivity to change of the Roland–Morris back pain
questionnaire: part 1. Physical Therapy 1998; 78: 1186–1196.
*7. Wells G, Beaton D, Shea B et al. Minimal clinically important differences: review of methods. Journal of
Rheumatology 2001; 28: 406–412.
8. Lee JS, Hobden E, Stiell IG & Wells GA. Clinically important change in the visual analog scale after
adequate pain control. Academic Emergency Medicine 2003; 10: 1128–1130.
9. Crosby RD, Kolotkin RL & Williams GR. Defining clinically meaningful change in health-related quality of
life. Journal of Clinical Epidemiology 2003; 56: 395–407.
* 10. Bombardier C, Hayden J & Beaton DE. Minimal clinically important difference. Low back pain: outcome
measures. Journal of Rheumatology 2001; 28: 431–438.
* 11. Beaton DE, Boers M & Wells GA. Many faces of the minimal clinically important difference
(MCID): a literature review and directions for future research. Current Opinion in Rheumatology 2002;
14: 109–114.
12. Deyo RA & Centor RM. Assessing the responsiveness of functional scales to clinical change: an analogy to
diagnostic test performance. Journal of Chronic Diseases 1986; 39: 897–906.
* 13. Von Korff M, Jensen MP & Karoly P. Assessing global pain severity by self-report in clinical and health
services research. Spine 2000; 25: 3140–3151.
14. Carlsson AM. Assessment of chronic pain. I. Aspects of the reliability and validity of the visual analogue
scale. Pain 1983; 16(1): 87–101.
15. Revill SI, Robinson JO, Rosen M & Hogg MI. The reliability of a linear analogue for evaluating pain.
Anaesthesia 1976; 31: 1191–1198.
16. Sriwatanakul K, Kelvie W, Lasagna L et al. Studies with different types of visual analog scales for
measurement of pain. Clinical Pharmacology and Therapeutics 1983; 34: 234–239.
17. Hagg O, Fritzell P & Nordwall A. The clinical importance of changes in outcome scores after treatment
for chronic low back pain. European Spine Journal 2003; 12: 12–20.
18. Beurskens AJ, de Vet HC & Koke AJ. Responsiveness of functional status in low back pain: a comparison of
different instruments. Pain 1996; 65: 71–76.
19. Farrar JT, Young Jr. JP, LaMoreaux L et al. Clinical importance of changes in chronic pain intensity
measured on an 11-point numerical pain rating scale. Pain 2001; 94: 149–158.
20. World Health Organization (WHO). International Classification of Impairments, Disabilities and Handicaps.
Geneva: WHO; 1980.
21. Waddell G. The Back Pain Revolution. Edinburgh: Churchill Livingstone; 1998.
22. Roland M & Morris R. A study of the natural history of back pain. Part I: development of a reliable and
sensitive measure of disability in low-back pain. Spine 1983; 8: 141–144.
* 23. Roland M & Fairbank J. The Roland–Morris Disability Questionnaire and the Oswestry Disability
Questionnaire. Spine 2000; 25: 3115–3124.
24. Patrick DL, Deyo RA, Atlas SJ et al. Assessing health-related quality of life in patients with sciatica. Spine
1995; 20: 1899–1908.
25. Ostelo RW, de Vet HC, Knol DL & van den Brandt PA. 24-item Roland–Morris Disability Questionnaire
was preferred out of six functional status questionnaires for post-lumbar disc surgery. Journal of Clinical
Epidemiology 2004; 57: 268–276.
26. Fairbank JC, Couper J, Davies JB & O’Brien JP. The Oswestry low back pain disability questionnaire.
Physiotherapy 1980; 66: 271–273.
* 27. Hudak PL & Wright JG. The characteristics of patient satisfaction measures. Spine 2000; 25:
3167–3177.
28. Norman GR, Stratford P & Regehr G. Methodological problems in the retrospective computation
of responsiveness to change: the lesson of Cronbach. Journal of Clinical Epidemiology 1997; 50:
869–879.
29. Bombardier C, Tugwell P, Sinclair A et al. Preference for endpoint measures in clinical trials: results of
structured workshops. Journal of Rheumatology 1982; 9: 798–801.
30. Fries JF. Toward an understanding of patient outcome measurement. Arthritis and Rheumatism 1983; 26:
697–704.
31. Fritz JM & Irrgang JJ. A comparison of a modified Oswestry Low Back Pain Disability Questionnaire and
the Quebec Back Pain Disability Scale. Physical Therapy 2001; 81: 776–788.
32. Davidson M & Keating JL. A comparison of five low back disability questionnaires: reliability and
responsiveness. Physical Therapy 2002; 82: 8–24.
33. de Vet HC, Heymans MW, Dunn KM et al. Episodes of low back pain: a proposal for uniform definitions to
be used in research. Spine 2002; 27: 2409–2416.
34. Steenstra IA, Anema JR, Bongers PM et al. Cost effectiveness of a multi-stage return to work program for
workers on sick leave due to low back pain, design of a population based controlled trial
[ISRCTN60233560]. BMC Musculoskeletal Disorders 2003; 4: 26.
* 35. Burdorf A, Post W & Bruggeling T. Reliability of a questionnaire on sickness absence with specific
attention to absence due to back pain and respiratory complaints. Occupational and Environmental Medicine
1996; 53: 58–62.
* 36. van Poppel MN, de Vet HC, Koes BW et al. Measuring sick leave: a comparison of self-
reported data on sick leave and data from company records. Occupational Medicine (London) 2002;
52: 485–490.
37. Fredriksson K, Toomingas A, Torgen M et al. Validity and reliability of self-reported retrospectively
collected data on sick leave related to musculoskeletal diseases. Scandinavian Journal of Work, Environment
and Health 1998; 24: 425–431.
* 38. Severens JL, Mulder J, Laheij RJ & Verbeek AL. Precision and accuracy in measuring absence from
work as a basis for calculating productivity costs in The Netherlands. Social Science and Medicine
2000; 51: 243–249.
39. Bourbonnais R, Vinet A, Vezina M & Gingras S. Certified sick leave as a non-specific morbidity indicator: a
case-referent study among nurses. British Journal of Industrial Medicine 1992; 49: 673–678.
40. van Tulder MW, Koes BW & Bouter LM. A cost-of-illness study of back pain in The Netherlands. Pain
1995; 62: 233–240.
41. van der Roer N, Ostelo RWJG, van Tulder MW & de Vet HCW. MCIC of three important outcome
measures in low back pain. Spine 2005; (in press).
