Helen A. Clare, MAppSc,a Roger Adams, PhD,b and Christopher G. Maher, PhDc

Background and Purpose: The poor reliability of lateral shift detection has been attributed to lack of rater
training, biologic variation, and test reactivity. This study aimed to remove the potential confounding arising from
biological variation and test reactivity and control the level of rater experience/training in making judgments of lateral
shift.
Subjects: One hundred forty-eight raters with 3 levels of clinical physical therapy experience and training in the
McKenzie method participated.
Method: The raters viewed photographic slides of 45 patients with low back pain. Slides were judged on a
numerical scale for presence and direction of a shift. Intrarater reliability was evaluated using the intraclass
correlation coefficient (ICC) and interrater reliability was evaluated using both the ICC and κ statistic.
Results: Reliability of shift judgments was only moderate for all groups (eg, ICC [2,1] values ranged from 0.48 to
Conclusion: Lateral shift judgements have only moderate reliability, even when trained raters judge stable stimuli.
We propose that the photo model employed can be used to explore the source of error in this process. (J
Manipulative Physiol Ther 2003;26:476-80)
Key Indexing Terms: Low Back Pain; Lumbar Spine; Lateral Shift; Reliability of Testing; McKenzie Method

A recent survey of physical therapists in the United
States1 reported that a McKenzie evaluation was
one of the most common evaluations performed for
patients with low back pain (LBP) and that almost half the
therapists viewed the McKenzie method as the most useful
management approach for low back pain. Similar results
have been reported for British and Irish physiotherapists.2,3
The method has received support as an effective LBP
treatment in a systematic review of activity prescription for
back pain4 and also in Danish clinical practice guidelines,5
based on the 2 existing clinical trials.6,7 Subsequent to the
completion of both reviews, Cherkin et al8 published a
clinical trial evaluating 3 approaches (chiropractic manipulation, McKenzie therapy, and an educational booklet) and



Private practice of physiotherapy, Sydney, Australia, and PhD
candidate, School of Physiotherapy, The University of Sydney,
Sydney, Australia.
Senior lecturer, School of Physiotherapy, The University of
Sydney, Sydney, Australia.
Associate Professor, School of Physiotherapy, University of
Sydney, Sydney, Australia.
Submit requests for reprints to: Helen Clare, PT, MAppSc, 16
Ayres Road, St Ives NSW 2075, Sydney, Australia (e-mail:
Paper submitted June 6, 2002.
Copyright © 2003 by National University of Health Sciences.
0161-4754/2003/$30.00 + 0


found chiropractic manipulation and McKenzie therapy to
have similar effects and costs. However, both treatments
provided only marginally better outcomes than an educational booklet.8 In this environment, further information
about the use of the basic criteria in the method is needed.
The principal aim of the McKenzie assessment is to first
determine those suitable for treatment with this approach.
Suitable patients must fit one of 3 syndromes: postural,
dysfunction, or derangement.9 The derangement syndrome
is further divided into 7 subsyndromes on the basis of pain
location, the behavior of the pain in response to the application of repeated spinal movements, and on the presence or
absence of deformities including a lateral shift. Because
classification determines the specific treatment used by the
treating clinician, accurate classification is believed essential for the effective management of the LBP patient.
In employing the method, the presence of a lateral shift is
determined by visual inspection at the time the patient’s
posture is evaluated. If a lateral shift is deemed to be
present, lateral glide movements are performed to assess if
these alter the patient symptoms. Where this is the case, the
shift is classified as “relevant” and directs the initial treatment approach.9 The initial step of detecting a shift is of
paramount importance, because it is only if a shift is identified to be present that its relevance is determined.
Journal of Manipulative and Physiological Therapeutics, Volume 26, Number 8. Clare, Adams, and Maher: Reliability of Shift Detection. 477

A lateral shift is defined as a lateral displacement of the trunk in relation to the pelvis.9 The prevalence of a lateral
shift has proved hard to establish, probably because of the problems with measurement of this attribute. Porter and
Miller10 suggest that it is an uncommon feature, citing a prevalence of 5.6%; however, later studies report approximate
prevalences of 20%11 and 80%.12
The reliability of therapists in determining the presence of a lateral shift has been evaluated in 6 studies to date,
and the reliability estimates are in the range poor to moderate. Riddle and Rothstein11 examined the intertester
reliability of assessments of LBP patients made by physical therapists using the McKenzie method. They also aimed to
determine whether training in the McKenzie method influenced reliability. Forty-nine physical therapists from 8 clinics
examined 363 patients. Sixteen of the therapists had attended at least 1 postgraduate course in the McKenzie method.
The paired assessments were completed consecutively, with a time interval between examinations. They found a high error
rate in the determination of the presence of a lateral shift (60% agreement, κ = 0.26) and concluded that this was a
possible source of error in the determination of the syndrome classifications. In the Kilby et al13 study, 2
physiotherapists with some training in the McKenzie method simultaneously evaluated 41 patients. They visually
determined whether a lateral shift deformity was present for each patient. There was only 55% agreement on the presence
or absence of a lateral shift, a value similar to the findings of Nelson et al,14 who reported that the detection of
lumbar tilt (lateral shift) had high interobserver error. However, these studies did not provide κ values.
Donahue et al12 attempted to improve the reliability of the determination of the presence and direction of a lateral
shift by using a simple measuring device, but the reported κ value for the decisions indicated very poor reliability.
McLean et al15 investigated 3 different techniques for measuring trunk list and concluded that the use of a plumb line
provided the most reliable measures; however, there was no summary reliability statistic reported to allow comparison
to other studies, and there was insufficient data to allow calculation of this statistic. Improved reliability in
determining the presence of a lateral shift (78% agreement, κ = 0.52) was demonstrated by Razmjou et al16 for therapists
observing the same patient assessment. The 2 physical therapists involved in this study were both trained extensively
in the McKenzie method and assessed the patients simultaneously in an attempt to reduce the error related to repeated
examinations.
Based on the research to date, however, it remains unclear whether a lateral shift can be detected with acceptable
reliability. The measuring devices used to date do not seem to improve reliability. It is therefore worthwhile to
explore the source of disagreement. Two hypotheses have been offered for the poor reliability observed:
1. The attribute is inherently unstable and changes with repeated examination.16
2. The attribute is subtle, and clinical experience and training are necessary to reliably measure a lateral shift.17
One way to explore the first hypothesis is to use a model of clinical practice that allows for greater control than
would be possible in the clinic, for example, the use of photographs as the stimuli to be rated rather than real
patients. This method avoids the potentially confounding effect of the biologic variation of the shift, allows for an
unlimited number of repetitions of the same stimuli, and also allows for a much larger panel of raters than is
practical in a traditional clinical reliability study. To explore the effect of clinical experience and training, we
selected a cross section of raters, including first-year undergraduate students, graduate physiotherapists with no
formal training in the McKenzie method, and graduate physiotherapists with a minimum of 70 hours training in the
McKenzie method.
The aims of the study were to investigate:
● the intrarater/interrater reliability of judgements of lateral shift made from inspection of photographs of patients
with low back pain,
● whether interrater reliability and discriminability were influenced by level of education in the McKenzie method.

METHOD
Project Overview
The design of the experiment required raters to inspect a set of photographic slides of patients with low back pain
and to judge whether a shift was present.

Subjects
Patients with low back pain. Patients attending a private physiotherapy clinic for low back pain were invited to
participate in the study. The criteria for inclusion were that they were currently experiencing low back pain with or
without radiation to the leg. All subjects gave written consent prior to participating. The photographs of the patients
had been taken by the first author on the same day that she performed a full clinical examination of these patients.
On the same visit, demographic and clinical data were recorded for each patient. Information was collected from the
subjects regarding their gender, age, height, weight, working status, location of symptoms, duration of symptoms,
frequency, previous history of LBP, pain intensity, and functional status (Table 1).
Raters. The raters consisted of:
● 60 first-year undergraduate physical therapy students with no clinical experience or training in the McKenzie method.

● 46 graduate physical therapists with some clinical experience but no formal training in the McKenzie method.
● 42 graduate physical therapists who had clinical experience and had completed a minimum of 70 hours of formal
training in the McKenzie method.

478. Clare, Adams, and Maher: Reliability of Shift Detection. Journal of Manipulative and Physiological Therapeutics, October 2003

Table 1. Subject characteristics

Characteristic
Number of subjects                   45
Age (y)                              50.6 (14)
Height (cm)                          164 (12)
Weight (kg)                          73 (15)
Pain intensity (VAS cm)              5.6 (1.7)
Quebec Disability score              47.3 (19)
Female gender                        58%
Past LBP                             87%
Frequency of pain (% constant)       51%
Duration of symptoms
  Acute (≤7 days)                    18%
  Subacute (>7 days - 7 weeks)       31%
  Chronic (>7 weeks)                 51%
Radiation into leg                   56%
Radiation below the knee             27%
Working normal duties                44%

Data for continuous variables are mean values with SDs in parentheses; categorical variables are percentages.

Procedure
Investigator HC conducted a complete clinical examination of each patient and then asked the patient to stand within a
doorway with their back toward a camera (Canon EOS 3000 88, Tokyo, Japan). The camera was placed on a tripod 3 meters
from the subjects, set on auto focus, and was able to be activated from a distance. A photograph was then immediately
taken. The patient then resumed their normal treatment. The photographs were converted into slides and duplicates were
made, which resulted in 90 slides of the 45 patients. These were randomly positioned in a slide tray so that the order
of the second set of slides varied from the first set.
The slides were shown to the 3 sets of raters. The instructions given to the raters were that they were to determine
the presence or not of a lumbar lateral shift. They were read the following: “McKenzie 1981 defines a lateral shift as
when the top half of the patient’s body has moved laterally in relation to the bottom half.” The assessors were
provided with a data collection form and were instructed that for each subject slide they were required to make 2
determinations. The first determination consisted of 1 of 3 choices: left lateral shift present, right lateral shift
present, shift absent. The second determination required them to indicate the level of certainty of the first
determination by rating it either certain or uncertain. The assessors were instructed not to share their views about
each slide with others. The data sheets were collected, and the information was entered for analysis.

Data Analysis
Reliability of detecting a shift. The raters’ judgements were converted to a 5-point scale of confidence that the
patient had a right shift: -2 = certain shifted to the left, -1 = uncertain shifted to the left, 0 = neutral,
1 = uncertain shifted to the right, 2 = certain shifted to the right. Intrarater and interrater reliability were
evaluated by calculating the intraclass correlation coefficient (ICC [2,1]) for each group, using the SPSS Macro
ICCSF2.SPS within SPSS 10.0 (SPSS, Chicago, Ill). Ninety-five percent CIs for each statistic were calculated.
Intrarater reliability was determined by comparing the judgments of lateral shifts of the first presentation of the 45
subjects with the second presentation. This was performed for all raters. The intrarater reliability (ICC 2,1) for each
subject was determined and then a group mean value and 95% CI for the group was determined. This was done for each of
the 3 groups. The interrater reliability (as determined by the ICC and κ) was calculated for each of the 3 groups.
This analysis considers the data as continuous data, whereas others may consider the data to represent ordinal data.
However, the argument is unnecessary, as Fleiss and Cohen18 have shown that weighted κ and the ICC are equivalent. To
allow comparison with other studies that have evaluated reliability with κ, we calculated the multirater κ (an
unweighted form of κ) using the MKAPPASC.SPS macro in SPSS 10.0.

RESULTS
Intrarater reliabilities, as expressed by ICC values with 95% CIs, are shown in Table 2. The ICC values ranged from
0.49 to 0.64, which falls within the range of ICC values described by Fleiss19 as representing fair to good
reliability. The interrater reliabilities, expressed as ICC values, ranged from 0.48 to 0.59, again in the range
representing fair to good reliability. The κ (Table 2) ranged from 0.26 to 0.38, again suggesting fair reliability.20
For both intrarater and interrater reliability, inspection of the 95% CIs reveals that the McKenzie group had
statistically significantly greater reliability than the other groups.

DISCUSSION
Despite using a simplified model of clinical practice that removed any potential for reactivity and biologic
variation, the reliability of shift detection remained unacceptably low. While the McKenzie trained raters were more
reliable in judging a shift than the other 2 groups of raters, the absolute difference between groups was small and was
revealed as statistically significant because of the high power of the study.
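The link between rater numbers and statistical power can be illustrated with a short sketch. It summarises per-rater ICC values as a group mean with a 95% CI and checks whether two groups' intervals overlap. The text does not state how the group CI was computed, so the normal-approximation formula, the function names, and all numbers below are assumptions chosen for illustration; none of them are study data.

```python
import math
import statistics


def mean_ci(values, z=1.96):
    """Return (mean, lower, upper) for a normal-approximation 95% CI.

    Assumption: mean +/- z * SD / sqrt(n); the paper does not say
    which formula was used for the group interval.
    """
    n = len(values)
    mean = statistics.mean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


def intervals_overlap(a, b):
    """True if two (mean, lower, upper) intervals share any point."""
    return a[1] <= b[2] and b[1] <= a[2]


# Hypothetical per-rater ICC values (illustrative only, not study data).
offsets = [-0.2, -0.1, 0.0, 0.1, 0.2]
group_a = [0.50 + offsets[i % 5] for i in range(50)]
group_b = [0.60 + offsets[i % 5] for i in range(50)]

# 50 raters per group: the interval half-width shrinks with sqrt(n).
large = intervals_overlap(mean_ci(group_a), mean_ci(group_b))
# 5 raters per group: same spread and same 0.10 difference, wider intervals.
small = intervals_overlap(mean_ci(group_a[:5]), mean_ci(group_b[:5]))
print(large, small)
```

With the same between-group difference, the intervals separate only for the larger panels, which is the sense in which a small absolute difference can still be statistically significant. Non-overlap of two 95% CIs is a conservative check rather than a formal test.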

Our study had unusually high power because we used a model that allowed for a large rater pool (range 42-60 raters),
whereas the typical reliability study has 2 raters. As for the difference in ICC values not being large in absolute
terms, the highest value still did not reach the benchmark for excellent reliability (0.75) suggested by Fleiss.19 Our
result is consistent with other studies11,12 that have noted major problems with the detection of lumbar shifts.
Subsequent to completing this study, an additional study has been published that has evaluated the reliability of
shift detection.21 Interestingly, they reported a similar value for agreement on detecting the presence (κ = 0.2) and
direction of a lateral shift (κ = 0.4) between 2 experienced McKenzie trained physical therapists. Associates also
alerted us to an earlier study that similarly used slides of patients as the stimuli to be rated.22 In contrast to all
other reliability studies, the authors reported perfect agreement in judging lateral shifts. We are unable to offer an
explanation for this result.
Without further investigations to determine which cues are influencing the decision of the raters, we are unable to
provide an explanation for the difficulty in determining the presence of a lateral shift. However, with the model that
we utilized in this study, we could rigorously evaluate factors such as training, anthropometric variables of the
stimuli, and visual accuracy of the raters. Once these have been established, a protocol may be able to be developed to
improve the reliability of detection of the lateral shift. We would view this endeavor as similar to the one we
embarked on 7 years ago when we reported similarly low reliability for physical therapists’ judgements of lumbar
posteroanterior spinal stiffness. A series of studies has led us to a greater understanding of the issue, and we have
now developed a protocol that allows physical therapists to accurately judge stiffness.23

Table 2. Intrarater and interrater reliability of shift judgements

Columns: Raters; Intrarater* ICC; Interrater† ICC; Interrater† Kappa
Rows: First-year students; Graduate physical therapist; McKenzie trained physical therapists
[Cell layout not recoverable from the source. Kappa values with 95% CI: 0.26 (0.25-0.27), 0.36 (0.35-0.37),
0.38 (0.37-0.39). ICC point estimates: 0.48, 0.49, 0.53, 0.56, 0.59, 0.64; ICC lower bounds: 0.42, 0.43, 0.46, 0.53,
0.55, 0.57; ICC upper bounds: 0.51, 0.53, 0.59, 0.61, 0.63, 0.71.]

ICC, intraclass correlation coefficient.
*Group mean value and 95% CI of the point estimate ICC for each subject.
†Point estimate and 95% CI for a single ICC or Kappa that compares multiple raters.

CONCLUSION
Despite the task of judging the presence or absence of a lateral shift being simplified by the removal of biologic
variation and test reactivity, the reliability of the raters in this study was unacceptable. We recommend that this
model utilizing photographs of LBP patients be used to further study the features of the lateral shift that influence
the rater’s decision as to its presence and direction.

ACKNOWLEDGMENTS
This study was approved by the Human Research Ethics Committee of the University of Sydney.

REFERENCES
1. Battie MC, Cherkin DC, Dunn R, Ciol MA, Wheeler K. Managing low-back pain: attitudes and treatment preferences of
physical therapists. Phys Ther 1994;74:219-26.
2. Foster NE, Thompson KA, Baxter GD, Allen JM. Management of nonspecific low-back pain by physiotherapists in Britain
and Ireland. Spine 1999;24:1332-42.
3. Hurly DA, Dusior TE, McDonough SM, Moore AP, Linton SJ, Baxter GD. Biopsychosocial screening questionnaire for
patients with low-back pain: preliminary report of utility in physiotherapy practice in Northern Ireland. Clin J Pain
2000;16:214-28.
4. Maher C, Latimer J, Refshauge K. Prescription of activity for low-back pain: what works? Aust J Physiother
1999;45:121-32.
5. Danish Institute for Health and Technology Assessment. Low-back pain. Frequency, management and prevention from an
HITA perspective. Danish Health Technol Assess 1999;1:1-106.
6. Stankovic R, Johnell O. Conservative treatment of acute low-back pain. A 5-year follow-up study of two methods of
treatment. Spine 1995;20:469-72.
7. Nwuga G, Nwuga V. Relative therapeutic efficacy of the Williams and McKenzie protocols in back pain management.
Physiother Pract 1985;1:99-105.
8. Cherkin D, Deyo R, Battie M, Street J, Barlow W. A comparison of physical therapy, chiropractic manipulation, and
provision of an educational booklet for the treatment of patients with low-back pain. N Engl J Med 1998;339:1021-29.
9. McKenzie RA. The lumbar spine: mechanical diagnosis and therapy. Waikanae, New Zealand: Spinal Publication Limited;
1981. p. 24-6.
10. Porter RW, Miller CG. Back pain and trunk list. Spine 1986;11:596-600.
11. Riddle DL, Rothstein JM. Intertester reliability of McKenzie’s classifications of the syndrome types present in
patients with low-back pain. Spine 1993;18:1333-44.
12. Donahue MS, Riddle DL, Sulivan MS. Intertester reliability of a modified version of McKenzie’s lateral shift
assessments obtained on patients with low-back pain. Phys Ther 1996;76:706-16.

13. Kilby J, Stigant M, Robert A. The reliability of back pain assessment by physiotherapists, using a “McKenzie
algorithm.” Physiotherapy 1990;76:579-83.
14. Nelson MA, Allen P, Clamp S, DeDombal F. Reliability and reproducibility of clinical findings in low-back pain.
Spine 1979;4:97-101.
15. McLean IP, Gillan MG, Ross JC, Aspden RM, Porter RW. A comparison of methods for measuring trunk list. Spine
1996;21:1667-70.
16. Razmjou H, Kramer JF, Yamada R. Intertester reliability of the McKenzie evaluation in assessing patients with
mechanical low-back pain. J Orthop Sports Phys Ther 2000;30:368-89.
17. Donelson R. Letter to editor. Spine 1994;19:1414.
18. Fleiss JL, Cohen J. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of
reliability. Educ Psychol Meas 1973;71:505-13.
19. Fleiss J. The design and analysis of clinical experiments. New York: John Wiley; 1986. p. 1-32.
20. Landis J, Koch G. The measurement of observer agreement for categorical data. Biometrics 1977;33:159-74.
21. Kilpikoski S, Airaksinen MD, Kankaanpaa M, Leminen P, Videman T, Alen M. Interexaminer reliability of low-back pain
assessment using the McKenzie method. Spine 2002;27:E207-E214.
22. Tenhula J, Rose S, Delitto A. Association between direction of lateral lumbar shift, movement tests, and side of
symptoms in patients with low-back pain syndrome. Phys Ther 1990;70:480-85.
23. Chiradejnant A, Maher C, Latimer J. Objective manual assessment of lumbar PA stiffness is now possible. J
Manipulative Physiol Ther 2003;26:34-9.
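As a supplementary illustration of the scoring described under Data Analysis, the sketch below converts the two determinations made for each slide into the 5-point scale and computes a single-measure, two-way random-effects ICC, commonly written ICC(2,1). The study used an SPSS macro; this Python version is only an assumed equivalent based on the standard mean-squares formula, and the function names and example ratings are hypothetical.

```python
def to_scale(direction, certain):
    """Map (direction, certainty) to the 5-point scale from the text:
    -2/-1 = certain/uncertain left, 0 = neutral, 1/2 = uncertain/certain right.
    """
    if direction not in ("left", "right", "absent"):
        raise ValueError(f"unknown direction: {direction!r}")
    if direction == "absent":
        return 0
    magnitude = 2 if certain else 1
    return magnitude if direction == "right" else -magnitude


def icc_2_1(ratings):
    """ICC(2,1) for a list of n subject rows, each holding k rater scores."""
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)                # between-subjects mean square
    jms = ss_cols / (k - 1)                # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Hypothetical judgements for three slides by one rater.
judgements = [("left", True), ("right", False), ("absent", True)]
scores = [to_scale(d, c) for d, c in judgements]

# Hypothetical scores: 4 slides (rows) rated by 3 raters (columns).
example = [[-2, -1, -2], [0, 0, 1], [2, 1, 2], [1, 0, 0]]
print(scores, round(icc_2_1(example), 2))
```

The same function covers the intrarater case if the two columns are a rater's first and second viewing of each slide.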

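The unweighted multirater κ mentioned under Data Analysis can be sketched in the same way. The version below follows Fleiss' formulation for a fixed number of raters per subject; whether the MKAPPASC.SPS macro computes exactly this statistic is not stated in the text, so this is an assumption, and the function name and example data are hypothetical.

```python
from collections import Counter


def fleiss_kappa(judgements, categories):
    """Unweighted multirater kappa (Fleiss' formulation).

    judgements: one list per subject, holding one category label per rater;
    every subject must be rated by the same number of raters.
    """
    n_subjects = len(judgements)
    n_raters = len(judgements[0])
    totals = Counter()
    agreement = 0.0
    for row in judgements:
        counts = Counter(row)
        totals.update(counts)
        # Proportion of agreeing rater pairs for this subject.
        agreement += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    p_observed = agreement / n_subjects
    p_expected = sum(
        (totals[c] / (n_subjects * n_raters)) ** 2 for c in categories
    )
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical judgements: 4 slides, 3 raters each.
categories = ["left", "absent", "right"]
slides = [
    ["left", "left", "absent"],
    ["right", "right", "right"],
    ["absent", "absent", "left"],
    ["right", "absent", "right"],
]
print(round(fleiss_kappa(slides, categories), 2))
```

Because this statistic ignores the ordering of the categories, it treats a left-versus-right disagreement the same as a left-versus-absent one, which is one reason it can come out lower than an ICC computed on the ordered scale.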