You are on page 1of 30



Measurement Properties of the Neck
Disability Index: A Systematic Review
Downloaded from at University of Washington on December 1, 2014. For personal use only. No other uses without permission.

t is estimated that a third of all adults will experience neck pain However, the episodic, fluctuating nature

I during the course of 1 year,12 and 70% is the approximate lifetime
prevalence.6,28 About 19% of the population may suffer from chronic
neck pain at any given time,6 creating a substantial societal burden.
Monitoring outcomes is a key component of monitoring the effects of
evidence-based care, in justifying services, and in program evaluation.
of neck pain and the lack of clear and con-
sistent physiological findings complicate
this process. A fundamental component
of monitoring outcomes is having reliable
and valid tools with known measurement
properties.27 In the case of neck disorders,
self-reported pain and disability are usu-
TIJK:O:;I?=D0 Systematic review of clinical group differences or absolute reliability (standard ally the primary focus.
measurement. error of measurement [SEM] or minimum detect- Isolated studies on outcomes mea-
Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. All rights reserved.

TE8@;9J?L;0 To find and synthesize evidence
able change [MDC]). Most studies suggest that the sures provide a context and method-
NDI has acceptable reliability, although intraclass specific view of the measure’s ability to
on the psychometric properties and usefulness of
correlation coefficients (ICCs) range from 0.50 to
the neck disability index (NDI). provide useful clinical information. It is
0.98. Longer test intervals and the definition of sta-
T879A=HEKD:0 The NDI is the most commonly ble can influence reliability estimates. A number
only by synthesizing information from
used outcome measure for neck pain, and a of high-quality published (Korean, Dutch, Spanish, multiple studies that we can understand
synthesis of knowledge should provide a deeper French, Brazilian Portuguese) and commercially how a measurement tool performs across
understanding of its use and limitations. supported translations are available. The NDI is different contexts and applications. The
TC;J>E:I7D:C;7IKH;I0 Using a standard considered a 1-dimensional measure that can be synthesis of larger pools of data is a mech-
search strategy (1966 to September 2008) and interpreted as an interval scale. Some studies
anism to provide more stable estimates of
question these assumptions. The MDC is around
Journal of Orthopaedic & Sports Physical Therapy®

4 databases (Medline, CINAHL, Embase, and
5/50 for uncomplicated neck pain and up to 10/50 measurement errors and benchmarks for
PsychInfo), a structured search was conducted
and supplemented by web and hand searching. In for cervical radiculopathy. The reported clinically change/outcomes. In fact, 2 systematic
total, 37 published primary studies, 3 reviews, and important difference (CID) is inconsistent across reviews have been conducted that ad-
1 in-press paper were analyzed. Pairs of raters con- different studies ranging from 5/50 to 19/50. The dress self-report measures for neck pain.
ducted data extraction and critical appraisal using NDI is strongly correlated (0.70) to a number
Both compared different outcome mea-
structured tools. Ranking of quality and descriptive of similar indices and moderately related to both
physical and mental aspects of general health. sures using a semistructured process and
synthesis were performed.
concluded that the Neck Disability Index
TH;IKBJI0 Horizon estimation suggested the T9ED9BKI?ED0 The NDI has sufficient support
and usefulness to retain its current status as the (NDI) was the most commonly used self-
potential for 1 missed paper. The agreement
most commonly used self-report measure for report instrument for evaluating status
between raters on quality assessments was high
(kappa = 0.82). Half of the studies reached a qual- neck pain. More studies of CID in different clinical in neck pain clinical research.34,37 Both
ity level greater than 70%. Failures to report clear populations and the relationship to subjective/ reviews alluded to the psychometric and
work/function categories are required. J Orthop
psychometric objectives/hypotheses or to rational- clinical properties of the tools, but neither
ize the sample size were the most common design Sports Phys Ther 2009;39(5):400-417. doi:10.2519/
jospt.2009.2930 attempted to deal with specific properties
flaws. Studies often focused on less clinically
or to synthesize this knowledge. The de-
applicable properties, like construct validity or TA;OMEH:I0 cervical spine, outcome measure,
group reliability, than transferable data, like known reliability, validity veloper of the NDI published a summary
paper in 2008 summarizing a 17-year his-

Co-director, Hand and Upper Limb Centre Clinical Research Laboratory, St Joseph’s Health Centre, London, ON, Canada; Associate Professor, School of Rehabilitation Science,
McMaster University, Hamilton, ON, Canada; Funded by a New Investigator Award, Canadian Institutes of Health Research. 2 Lecturer and PhD Candidate, School of Physical
Therapy, The University of Western Ontario, London, ON, Canada; Funded by a Doctoral Fellowship, Canadian Institutes of Health Research. 3 Physiotherapy Student, School of
Physical Therapy, The University of Western Ontario, London, ON, Canada. 4 Emeritus Professor, Department of Clinical Epidemiology and Biostatistics, McMaster University,
Hamilton, ON, Canada. Address correspondence to Dr Joy MacDermid, Co-director, Hand and Upper Limb Centre Clinical Research Lab, Monsignor Roney Ambulatory Care
Centre, 930 Richmond Street, London, ON, N6A 3J4 Canada. E-mail:

400 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy

tory with the NDI.46 The author stated
that “The Neck Disability Index has been Search Strategy
used in over 300 publications, translated Key words: (NDI OR Neck Disability Index) AND (reliability OR validity OR
responsiveness OR clinically important difference OR MCID OR Rasch OR
into 22 languages…and endorsed for use factor analysis OR translation OR validation).
by a number of clinical practice guideline Dates: January 1966 to September 2008
committees…making it the most widely Other: Google searches and hand searches of retrieved study references lists
used and most strongly validated instru-
ment for assessing suffering disability in
patients with neck pain.” Located citations (n = 189):
Downloaded from at University of Washington on December 1, 2014. For personal use only. No other uses without permission.

While previous reviews have claimed D mbase (n = 66)
D "edline (n = 63)
to be systematic, none have included a
D INAHL (n = 66)
key element of systematic review, criti- D %sycInfo (n = 1)
cal appraisal of the quality of individual D and searching (n = 2)
studies. This may reflect inherent dif- D ontact author (n = 1)
ficulties in performing the critical ap-
praisal due to lack of instrumentation.
(3=5/ +,<=;+-=;/?3/@7 
The lead author of this current review
has published a scale and interpretation
guide25,26 for this purpose. Psychometric Unique articles accepted for full A-5>./.+=+,<=;+-=;/?3/@
studies are important to establish mea- review (n =  ): D #8.+=+8;78=;/5/?+7= unique)
Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. All rights reserved.

surement properties like the relative dif- D mbase (n = 30) D tudy design criteria (n = 0)
ficulty of items, appropriate grouping of D "edline (n = 28)
D #!7

items into subscales, reliability, validity, D and searching (n = 2)
and responsiveness. In addition to psy- Horizon estimation of database
D $ther (n = 1) </+;-2;/=;3/?+5
chometric properties, clinicians are con-
cerned with issues on feasibility, floor/
ceiling effects, availability of different Included: A-5>./.+0=/;0>55=/A=;/?3/@7 = 2):
language/cultural adaptations, and ad- D tructured reviews (data No data on NDI (n = 2)
ministration burden for themselves and extraction) (n = 3) !+71>+1/

  7  the psychometric issues but also need in. as we prefer. “usefulness” have been used in reference to these practical considerations.” or. When thera- pists try to integrate the NDI into their Quality summary of appraised primary papers (n = 36): %88.+3<+5736) kept for data extraction plicability. The purpose of */.+-=7877153<2=/<= their patients. 88.” or “clinical utility/ap.  7   formation on usefulness.  7   clinical practice.B188. they are concerned with +3.<=.3=3-+5+99. D >55=/A=-. Terms like “clinician/pa- D ata extraction (n = 38) excluded from critical appraisal tient friendliness.  7 .@3=27153<2 D %rimary studies Journal of Orthopaedic & Sports Physical Therapy® +.

47 Some evaluators choose to provide a score out of 100%.The systematic review evidence flowchart. In the end. items. this study was to conduct a systematic re. lifting. sleep.J>E:I sex life. <?=KH. personal care. resulting in the 10-item scale. each item is summed for a score varying L the NDI using the Oswestry Low Back Pain Index (OLBPI) as a tem- plate for identifying items and a scoring These 10 items were modified for clarity and relevance based on feedback from 5 individuals with a history of whiplash in- from 0 to 50. The consulting team added 4 a 6-point scale from 0 (no disability) to 5 :[l[befc[dje\j^[D:? items: headache. particularly as a strategy to deal with questions left metric. ing the psychometric properties and use- fulness of the NDI. driving. The numeric response for ernon and Mior47 developed and work. from the original scale: pain intensity.47 The questions are measured on ing team. and 5 new C. NDI contained 5 items from the OLBPI. They initially selected 6 items jury and the review team. and 2 of which were modified. (full disability). concentration. and submitted this to a consult.D:?N7). journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | 401 . A-/55/7=( 7  view that would summarize the quality and content of current research regard. reading. the unanswered (7FF.

org at University of Washington on December 1. The most common flaws observed in appraisal). with agreement on indi. 97% yield).13 with additions from clusions and recommendations. The following keywords accompanying guide (7FF. and then Medline. PsychInfo.00.34. with the default compromise of as. gible studies: (NDI OR Neck Disability interpretation guide used to assess indi- Downloaded from www. Index) AND (reliability OR validity OR vidual study quality were developed by H. an adjudicator was required.). and in a previous psychometric review by sidered this ranking when making con- Embase. First.82. the rat. If the of the findings for psychometric proper- process. surement (SEM). Journal of Orthopaedic & Sports Physical Therapy® the following inclusion criteria: reported clarification of the meaning or inter.IKBJI responsiveness OR clinically important the lead author.) for studies crossed different populations. lish abstracts and non-English full text [1 3. abstract/title and full review are noted in vidual items varying from 0. Two neck disabil- by Foster and Goldsmith for SAS soft.26 All authors met for a calibra. studies provided specific conclusions or 402 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy .26 The items from the difference OR MCID OR Rasch OR fac. esis/objectives.'). pairs of raters indepen. (). The data extraction form was de. PsychInfo did not contribute any infor- the <?=KH. with 37% of papers reaching or exceeding Dutch.'). We A database search was performed us."J78B. the overall estimated search strategy would have been Embase studies retrieved and the net results of kappa was 0. and on at least 1 psychometric property of the pretation of specific items of the scale addressed different psychometric prop- NDI in patients with neck pain and was if they felt this was contributing to erties.(. Critical appraisal forms and associated source document. All rights reserved. CINAHL. 4.I) through 5. a recent compre- The first stage of study identification ment) were resolved by recheck of the hensive review of the NDI by the develop- was of title/abstracts. denominators for different studies. We used the following consensus process mation after these 3 databases.14 most common source of disagree. conclusions.M. or the extent of compliance with the ity instrument reviews were identified. we conducted Google searches and hand searches of retrieved associated ratings for individual studies. original manuscript. Raters then discussed their percep. (2) inadequate sample review (<?=KH. no meta-analyses terpretation of critical appraisal items. stract data extraction but not full critical manuscript complied with item re. and 1 unpublished documented in the manuscript it was reporting specific psychometric hypoth- in press manuscript proceeded to a full treated as “not done”). which included papers written Euchaute et al. their disagreement. was variable. ranging from 21% to 96%. but few were comprehensive. articles were missed and the efficiency of 1. Few articles using previously developed data follow-up and some psychometric stud. mation to evaluate the potential that any when independent assessments differed: In total. For personal use only. 0-2.43 to 1. who also developed an there was no formal mechanism to weight 2008 (<?=KH. although this was not required during properties evaluated. the study. based on the quality of the were used to search all databases for eli. An optimal study references lists. In addition.25. heterogeneity of study populations and each item to clarify the meaning and in. We conducted a Horizon esti. addressed at least 1 psychometric proper- our search using a procedure developed ment was based on factual content ty of the NDI (J78B. No other uses without permission. veloped by adaptation of categories used rank ordered studies on quality and con- ing Medline.. Quality of the individual studies written in French or English (2 with Eng. were performed. in which they independently resolution or continued to disagree by marized in J78B. 2014. er did not include formal critical appraisal pendently reviewed by at least 2 of the 2. 37 primary studies.EDB?D. [ LITERATURE REVIEW ] B_j[hWjkh[I[WhY^WdZIjkZo extraction forms and quality appraisal ies are cross-sectional. although between January 1966 and September the first author. appraisal.37 Similarly. first. 78B. interventions. The number of scores of rater pairs. Interrater reliability on quality ratings was calculated based on preconsensus J a possibility that our search strategy missed 1 article (95% confidence interval. quality tool are listed in J78B. disagree on an item by 1 point. if the raters continued to size calculations/justification.). At this point. of individual studies. raters clarified if the disagree. then CINAHL. and (3) The data extraction and review pro.jospt. and time intervals. Each paper’s score was converted into a spectrum of psychometric proper- dently evaluated an assigned subset of a percentage because 1 item was based on ties. 41 studies were identified that Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. In total. An article was accepted if it met thor/instrument developer (J.D:?N 8" 7L7?B. quirements (if something was not the psychometric articles were (1) not 3 structured reviews. with the he Horizon analysis14 indicated tor analysis OR translation OR valida- tion). raters were not comfortable with this ties across all identified studies is sum- tion review. leaving unequal ?Z[dj_ÓYWj_ed tools. which were inde. Factual content/oversights (the although neither included formal critical Charlie Goldsmith). Due to the reviewed 1 paper then met and discussed 2 points. Most studies addressed Following this.25. 1 Spanish] were included for ab.46 The primary authors. tions of the extent to which the study a score of 75% on the quality rating (J78B. absence of error estimates such as confi- cess was conducted by pairs of raters and ers decided if they were comfortable dence intervals or standard error of mea- was based on the use of structured instru. Pairs of raters consulted the first au. ware (full details available from author item. A descriptive synthesis ments and a predetermined consensus signing the lesser of the 2 scores.

42% II. responsiveness Intervention: PT 14 wk Retest interval: baseline evaluation and Site: PT Clinics after 5-7 visits Cleland et al 20089 Patients: stable. Intervention: ND Downloaded from www. symptom duration. chronic 23 Primary purpose to evaluate Intervention: 1 of 3 manual therapy uncomplicated NP another scale. mean age. group 2. mean age. 46 d (SRM and MDC) Included: age 18-60 y. construct validity Intervention: PT Included: patients with at least 3 mo of NP Retest interval: 7 d Bolton 20045 Patients: nonspecific NP 125 Responsiveness Intervention: chiropractic Site: 8 chiropractic clinics and 1 teaching clinic Retest interval: pre-Rx and 4-6 wk postinitial Rx Chan et al 20087 Included: patients with chronic nontraumatic NP with a 20 Construct validity Intervention: ND duration of greater than 3 mo Retest interval: Cross-sectional Excluded: cervical radiculopathy. and 20 had no NP but had other within 48 hours. 25% F. Intervention: manual therapy duration. varying severity and duration of WAD (level of WAD: 23% I. 43 y) 203 Brazilian Portuguese translation. and 6% III). and community rheumatology clinic into private PT practice journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | 403 . 2014. or pending legal action regarding their NP Cook et al 200611 Patients: 62% M. NDI score greater than 10%. 71 Validity Intervention: none all patients were less than 24 mo since MVA. responsiveness exercise program Retest interval: 4 wk Goolkasian et al 200216 Patients: 12 M. Cleland et al 200610 Patients: 47% F. reliability. Excluded: any signs or symptoms consistent with a nonmusculoskeletal etiology. 83 d. validity Intervention: PT Site: an outpatient setting Retest interval: admission and Included: 2 cohorts. and chronic NP Retest interval: unknown Site: chiropractic college clinics in Vancouver and Toronto Included: English speaking. 51 y. Site: Spain responsiveness Aslan et al 20083 Patients: NP 88 Reliability. Unknown validity Hoving et al 200319 Patients: mean age. retest 19 had chronic NP. at University of Washington on December 1. 38 Reliability. validity. subacute. home consistency. NP. n = 71 Site: patients recruited from PT clinics. responsiveness Retest interval: 10 subjects retested randomly at 24 h and a separate 10 subjects retested at 7 d Gay et al 200715 Patients: 7 M. fied version) Retest interval: 5 occasions. 43 y. 17 y or older Heijmans et al 200218 Patients: chronic WAD 61 Dutch translation. 59% of patients were M. many occupations.' Summary of Studies Addressing Psychometrics of the NDI IjkZo FefkbWj_ed n Fhef[hj_[i. validity. Reliability. 16 F. symptom 138 Reliability (SEM and ICC). 39 y. 2 or more Journal of Orthopaedic & Sports Physical Therapy® signs consistent with nerve root compression. Intervention: ND Site: midsize hospital in southern Brazil validity. neck trauma Chok et al 20008 Patients: NP 46 Reliability. plus heat. GPs. construct validity. prior surgery to the cervical or thoracic spine. 43 y. 21 F. chronic NP 33 Responsiveness Intervention: group 1. 237 Validity Intervention: unknown acute. ankylosing spondylitis. reliability.lWbkWj[Z ?dj[hl[dj_ed%H[j[ij?dj[hlWb 1 Ackelman et al 2002 Patients: 59 Swedish patients (28 M. NP with or without referral of symptoms to the upper extremity or extremities. NDI: internal approaches. 50 y. mean age. responsiveness Intervention: PT patients were in the acute phase after a neck sprain. mean age. responsiveness Retest interval: after 1 session symptom duration. For personal use only. mean age. 40 y. rheumatoid arthritis. mean age. 60% F. twenty 59 (38 for modi.jospt. WAD within the past 6 wk. 18-89 y (mean. No other uses without permission. at 3 w and 3 mo musculoskeletal symptoms Andrade et al 20082 Patients: nonspecific or posttraumatic NP 48 Spanish translation. 31 F). 22 tested twice for reliability and 24 discharge that completed baseline and discharge assessments Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. central nervous system involvement. PT. All rights reserved. 42 y. J78B. aged. Site: PT clinics involved in a study of botulin toxin Included: received PT that they considered inadequate for NP Excluded: surgical candidates Retest interval: within 1 wk Hains et al 199817 Patients: 62% F.

68%. responsiveness Intervention: unknown variety of neck disorders. 23. 88% of patients 50 years or younger. ND late). infection Kovacs et al 200822 Patients: acute. NDI. 250) Convergent validity.' Summary of Studies Addressing Psychometrics of the NDI (continued) IjkZo FefkbWj_ed n Fhef[hj_[i. 1 year 33%. 54% F. 13 wk. 18-69 y. systemic disease. [ LITERATURE REVIEW ] J78B. validity. 66%. neck trauma. gradual onset 185 Iranian translation. aged 18-65 y Excluded: fracture dislocation. central nervous system Kumbhare et al 200523 Patients: the first group comprised of 81 patients from 241 (81 patients. bidity. 534 Convergent validity Intervention: anterior cervical thy. 6/10 Rebbeck et al 200736 Patients: 3 cohorts primary care insurance (early and (99. 183 CID Intervention: random allocation to PT. ness 67%. responsiveness Intervention: PT 5/wk for 3 wk Excluded: inflammatory arthritis. 146 (69 for retest) Validity. responsive. responsiveness Intervention: ND diagnosis of NP Retest interval: first and last visit or Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. pain. 48%. neurological injury. 18-55 y. prior history of chronic pain. No other uses without permission. 2 or more missing items on NDI Pietrobon et al 200234 Structured review of different neck disability scales NA Validity NA Pool et al 200735 Patients: 61% F. For personal use only. NDI score. include a content analysis at item level Riddle et al 199838 Patients: 64% F. Retest interval: 1-3 d without therapy Downloaded from www. nonspecific NP. F: 80%. relative Intervention: ND 78%. 26%. 2014. responsiveness Intervention: PT Ontario Canada who had grade II WAD. age. with a primary 180 Reliability. reliability. dura. adequate mental capacity within 4 wk of surgery decompression and fusion Sites: 23 clinics Excluded: revisions/tumors 404 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . validity. responsiveness Intervention: chiropractic Site: 8 chiropractic clinics and 1 teaching clinic Retest interval: postinitial Rx. Retest interval: none Site: primary care or PT clinics floor/ceiling. Spanish translation. NR. GP care 14. and chronic who visited their 221 (54 from pilot. Intervention: ND physician for NP plus 167) validity. mean age. validity. 6 wk of NP 102 Reliability. second 160 controls) Retest interval: follow-up varied group subjects from a convenience sample who did between 4-12 w not have NP and who were not being treated Site: tertiary outpatient clinic Lee et al 200624 Patients: 17 M. 26%. internal consistency. responsiveness Retest interval: 1-d retest and Site: 9 primary care and 12 specialty services from 9 re-evaluation at 14 d regions in Spain Excluded: functional illiteracy. Site: patients were chosen from 3 hospitals and 7 7th or 8th Rx clinics in 5 Korean cities McCarthy et al 200729 Patients: NP 160 (34 retest) Reliability. baseline VAS. 18 Resnick 200537 An overview of instruments in a structured review by NA Validity NA a single author and evaluator. tion 2-6 wk. 70%. All rights reserved. spine injury. Reliability. 7-12 wk. VA Excluded: under treatment for problems other than c-spine Scolasky et al 200739 Patients: NP cohort cervical radiculopathy or myelopa. 55/100 reliability. convergent validity. 31 F. at University of Washington on December 1. most common neck strain Retest interval: admission and or NP discharge Site: 1 of 4 clinics in Richmond. 134. 4-6 d post-Rx Kose et al 200721 Patients: 83% F. comor.lWbkWj[Z ?dj[hl[dj_ed%H[j[ij?dj[hlWb 20 Humphreys et al 2002 Patients: NA 125 Reliability. translation Intervention: routine Rx than 3 mo following whiplash injury from 10 different practices. 46 y. age. interpretability Journal of Orthopaedic & Sports Physical Therapy® Nieto et al 200833 Patients: Catalan-speaking. manual therapy. responsiveness Intervention: ND Site: Finland Retest interval: 1 and 3 y post-MVA Mousavi et al 200731 Patients: NP 3 mo 30%. employed: 80%. validity Intervention: various Site: spinal clinic Retest interval: 1-2 wk Miettinen et al 200430 Patients: NP who had been involved in an MVA 144 Validity.jospt. with subacute pain of less 150 Factor validity. surgery.

tertiary care teaching hospital and outpatient clinic) Abbreviations: CID. “mildly” 134 Reliability. responsiveness Intervention: 6-wk individualized disabled preinjury and presented for medical care submaximal exercise program. positive upper limb tension test. general practitioner. Neck Disability Index. UK. NA NA NA oper) that includes psychometrics. retest reli- Excluded: symptoms below the elbow. rheumatology department (ie. males. 31 F. M. validity Intervention: none 200253 Site: patients recruited from outpatient clinics. Numeric Pain Rat- ing Scale. RCT. Pain-Specific Functional Scale. contraindicated session weeks 5. NP. 2 follow-up phone calls no neck radiograph since MVA. baseline pain of 5-6/10. 31 Reliability. etc) Westaway et al 199850 Patients: Age. item consistency. intraclass correlation coefficient. PSFS. responsiveness Intervention: none longer than 6 wk.lWbkWj[Z ?dj[hl[dj_ed%H[j[ij?dj[hlWb 42 Stewart et al 2007 Patients: 17 M. tertiary care teaching hospital and outpatient clinic) Wlodyka-Demaille et al Patients: 43 F. United Kingdom. PT. reha. Retest interval: 24 h bilitation department. 6. numeric rating. low blood pressure. GP. and following 1-4 w of Rx by 1 certified Canadian orthopae- dic manipulative PT White et al 200451 Patients: chronic mechanical NP 133 Validity Intervention: in treatment RCT Site: recruited from PT and rheumatology waiting lists Retest interval: before Rx initiated. physiotherapy. All rights reserved. systemic or other major disorder Wlodyka-Demaille et al Patients: mean age. 2 sessions week 3. poor English Stratford et al 199943 Patients: NP 50 Reliability. Intervention: routine Rx Site: primary care in 3 rural centers as stable) internal consistency. 12-80 y. Site: referred by physician to PT 72 h. translation signs. nonchronic NP of musculoskel. responsiveness Intervention: ND 200452 Site: patients recruited from outpatient clinics. MVA. Downloaded from www. pain bother-someness. whiplash-associated disorder. neck pain. responsiveness Intervention: PT Site: 5 PT clinics in North America Retest interval: 1-3 wk Trouli et al 200844 Patients: NP 65 (43 classified Factor validity. Rx. neurological ability. prediction and effect sizes across treatment RCTs that used the NDI. at at 2 hospitals in UK 1 wk and 8 wk Excluded: undergoing litigation. not described.' Summary of Studies Addressing Psychometrics of the NDI (continued) IjkZo FefkbWj_ed n Fhef[hj_[i. standard error of measurement. vascular or neurological disorders. NPRS. SRM. No other uses without permission. with pain-free interval of at least 3 mo Retest interval: 1-wk follow-up Site: referred by GPs in Rotterdam and region Inclusion criteria: aged 18 and sufficient Dutch Exclusion criteria: specific cause of NP (ie. VA. 2. current PT neck Rx. 1 red flags. validity Intervention: none chronic nontraumatic Retest interval: 2 d Site: Clinic of Canadian Memorial Chiropractic College Journal of Orthopaedic & Sports Physical Therapy® Vos et al 200649 Patients: 119 F.jospt. WAD. responsiveness Intervention: routine Rx etal origin “cervical dysfunction” Retest interval: initial assessment. chronic WAD (>3 mo). a score of at least 20% on primary supervised by PT outcome measures: NPRS. Intervention: ND studies of n = 188 and the other n = 333 Rasch modeling of item. VAS. severe or greater depressive symptoms. NP 101 Reliability. 6 cases removed because of 2 missing items on NDI van der Velde et al45 Patients: 55% or more F in 2 cohorts from previous 521 Factor analysis. rehab Retest interval: 11 2 mo department. females. within 1 mo of MVA. 70% WAD in past 4-6 wk. F. NDI. journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | 405 . LBP. rheumatology department (ie. minimal detectable change. 28 M. clinically important difference. Site: different clinics items Vernon 200048 Combination of instrument review (NDI and other). 31 F. plus NA Content validity (comparison of NA some impairment sincerity of effort testing items) Vernon 200846 A structured review by a single author (the NDI devel. Virginia. standardized response mean. content validity. validity. ICC. 30% 48 Reliability. treatment. MDC. mean age. 68 at University of Washington on December 1. to exercise. For personal use only. visual analog scale. PSFS Retest interval: 3 sessions in week Excluded: previous neck surgery. randomized controlled trial. nerve root compromise. evalua- Included: pain for 5-6 wk. 49 y 71 Validity. recurrent acute NP lasting no 187 Reliability. 49 y. 2014. and tion of reduced 8-item scale and total NDI of 13-14 differential item functioning of Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. 4. SEM. no formal critical appraisal of studies Vernon et al 199147 Patients: 17 M. validity. NR. J78B. known or suspected 1. ND. motor vehicle accident.

lWbkWj_ed9h_j[h_W IjkZo† ' ( ) * + . 5. Ap- propriate statistical error estimates. comprised of less clinically useful data. Measurement techniques were standardized. Appropriate statistics-point estimates. 9. Specific inclusion/exclusion criteria. Riddle et al 199838 2 2 2 2 0 0 2 1 2 2 1 2 75 Lee et al24 2 1 2 2 0 0 2 2 1 2 2 2 75 42 Stewart et al 2007 1 2 2 1 0 NA 2 2 2 2 1 1 73 Pool 200735 2 2 1 0 1 1 1 1 2 2 2 2 71 Kose et al 200721 1 2 1 2 1 2 1 1 2 2 1 1 71 Kovacs et al 200822 1 2 1 2 1 1 1 2 1 2 2 1 71 Hains et al 199817 2 1 2 1 0 NA 2 1 2 1 1 2 68 Ackelman et al 20021 1 2 1 2 0 2 1 2 2 2 0 1 67 Vos et al 200649 2 2 1 1 0 0 2 2 1 2 2 1 66 Journal of Orthopaedic & Sports Physical Therapy® Vernon et al 199147 2 1 1 2 0 2 1 1 2 2 0 2 66 52 Wlododycka et al 2 2 0 0 0 1 1 1 2 2 2 2 63 White et al51 1 2 1 1 0 1 1 1 2 2 1 2 63 Goolkasian et al 200216 1 1 2 1 0 1 1 1 2 1 2 2 63 Gay et al 200715 2 2 1 2 0 1 1 0 1 2 2 1 63 Nieto et al 200833 2 2 1 1 1 NA 1 1 2 2 0 1 61 Pietrobon et al 200234 2 0 1 2 1 0 2 NA 2 1 0 2 59 Mousavi et al 200731 2 1 1 2 1 0 1 0 1 2 1 1 59 Scolasky 200739 1 2 0 0 2 .jospt. The authors referenced specific procedures for administration. Thorough literature review to define the research question. ing NDI validation studies was typically into clinical practice but. 2014. The type of data collected dur. For personal use only. so no full critical appraisal. Sample size. Data were presented for each hypothesis. Quality scores were rated in content for the NDI-related methods. 12.37. / '& '' '( JejWb Downloaded from www. and interpretation of procedures. No other uses without permission.2.46 406 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . † Papers where NDI evaluations were performed while evaluating a different measure as the primary purpose. 4. 8. 0 0 1 1 1 0 36 Chok et al 20058 1 1 0 1 0 0 1 1 1 0 1 1 33 Miettenen et al 200430 0 2 0 0 0 0 0 0 1 0 1 1 21 Abbreviations: NA. 11. 2. Follow-up. 10. Specific hypotheses. because they were in non-English text. Appropri- ate scope of psychometric properties. - .46 Two papers had data extraction from the English abstract and study tables but no critical appraisal. [ LITERATURE REVIEW ] numbers that could readily be integrated to rely on generalities when making con.18 One paper had item comparison but no NDI data. 6. 2 2 1 1 0 0 50 Bolton 20045 1 1 0 1 0 0 2 1 2 1 1 2 50 7 Chan et al 2008 1 2 0 0 0 NA 1 1 2 2 0 1 43 Humphreys et al 200220 1 1 1 1 0 0 0 0 2 2 0 2 42 Rebbeck 200736 0 1 1 1 2 .( Quality of Studies on the Psychometric of the Neck Disability Index ?j[c. scoring. 7. at University of Washington on December 1. Cleland et a 20089 2 2 1 2 2 2 2 2 2 2 2 2 96 Van der Velde et al45 2 2 2 1 2 2 2 2 2 2 2 2 96 Kumbhare et al 200523 2 2 2 2 2 2 2 2 1 2 2 2 96 Westway et al 199850 2 2 2 2 0 2 2 1 2 2 2 2 88 Cook et al 200611 2 2 2 2 0 2 1 2 2 2 2 2 88 10 Cleland et al 2006 2 2 2 2 0 2 1 2 2 2 2 2 88 Trouli et al 2008 1 2 1 2 1 2 2 1 2 2 2 2 83 Hoving et al 200319 2 2 1 2 0 NA 2 2 2 2 1 2 82 Aslan et al 20083 2 2 1 2 0 1 2 2 2 2 1 2 79 Stratford et al 199943 2 1 1 1 1 2 2 1 2 2 2 2 79 McCarthy et al 200729 2 1 1 2 1 1 2 2 1 2 2 2 79 Wlodyka-Demaille et al 200253 1 2 1 2 1 2 1 1 2 2 1 2 75 Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. tended clusions.34. 3. All rights reserved. J78B. Three structured reviews underwent data extraction but no critical appraisal. not applicable to paper. Valid conclusions and clinical recommendations. * Evaluation criteria: 1.

0. 0. It was suggested that ficient. Again.. SEM. (specific-to-neck) scales and 3 (nonspe- important limitations. versus more useful J78B. These authors concluded that 0. Eight English-language of this review suggested other scales had for the cervical spine published up un. or minimal de- MDC. 0. 7 d. but without use of coefficients 1 d.49 absolute measurement error (like SEMs.njhWYj[Z data for clinical comparisons. where reliability is assessed as mean difference be- group reliability. while changes in individual patients. whiplash- associated disorder.89 (acute)1 Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. Downloaded from www.96 Journal of Orthopaedic & Sports Physical Therapy® formal quality evaluation was performed.9024 databases.9329 Five standardized neck pain scales were 7-21 d. like correlations indicating construct convergent validity.9211 multiple raters or formal critical ap.6444 more often than more useful indicators of Patients classified as stable over 1 wk (GRC. 4. 5 with 90% CI in patients with neck pain48 IkccWhoe\Fh[l_eki MDC. ICC = 0. GRC. tween retests and the 2 standard deviation brackets around that difference.8621 view was based on a formal search of 2 3-7 d. SEM. global rating of change.7 with 90% CI43 IjhkYjkh[ZH[l_[mi Patients classified as stable over 1 wk (GRC. ci.90 in first cohort8 tional Disability Scale. No other uses without permission.94-0. the NDI had the strength of being most studied amongst these 3 and was at that time the only instrument revalidated in The results of a subsequent systematic vidual studies. or a formal process to syn- different study populations.941 properties.5  344 relation coefficients (ICCs). This re. and the North.6. standard error of the measurement. like known group differ- ences that could be used as comparative H[b_WX_b_jo :WjW. Similarly. 10. 3 to –3/7). 2014. The instruments (and year must be read to the patient and the increased number of neck pain scales (a published) are listed in J78B. ICC. 2 d. SEM. 7 d..971 strengths and weaknesses of the NDI 3 d.7844 Three papers were identified where the MDC. This item analysis indicated that the most journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | 407 . All rights reserved. The authors review of studies on outcome measures thesize results. –0. minimal detectable change. The psy- Patient-Specific Functional Scale was total of 11 scales).22 3 of the scales were similar in terms of 0.5 in patients with neck pain35 tectable change [MDC]). 10. in patients with 6 wk of neck pain 0.66 in WAD49 authors performed a comprehensive structured review with use of a formal Test-retest reliability 1 d.9353 praisal. intraclass correlation coef- wick Park Scale. 1 d. 7 d. the content of instruments was compared parisons between patients are virtually formed.9049 hand searching of relevant journals. 0. patients in acute condition. 0. and 7 d.97 (n = 30/185)31 structure and psychometric properties: Patients classified as stable on GRC.99 (chronic) to 0. was reported SEM Patients classified as stable. 7-14 d. MDC. r = 0.92 (chronic) to 0. MDC. impossible. 0. 3 to –3/7). 1. cific validated for the neck) scales were that the Neck Pain and Disability Scale velopment is ongoing as it identified an reviewed. using GRC scores of 3 to –3/7. such as intraclass cor. 0. For personal use only. 0. r = 0.90 (acute)11 tation tracking using the citation index. r = 0. ments of systematic review were not per. 0. MDC.982. 1. These included til 2004 suggested that instrument de. reviewed in a qualitative manner.7350 when compared to other scales. 0. confidence interval. MDC Patients classified as stable (GRC 3 to –3/7). other ele.93 (acute)11 search for articles. Mean retest difference As per Altman and Bland technique.9 mean retest differences. 1-3 d.81-0.509 the at University of Washington on December 1. 8. r = 0. the Copenhagen Neck Func.) Summary of Reliability Properties information. MDC.951 in terms of content and measurement 90 d. –1. tant because it established the relative 2 d.jospt.”34 data extractors.983 ditional papers (up to year 2000). quality ratings for indi. No 7-14 d. k = 0. r = 0. 19.48 (chronic) to 0. Abbreviations: CI. The first review34 is impor. MEDLINE and CINAHL. in healthy controls.9443 identified and compared qualitatively 21 d. including use of multiple raters/ more quantitatively in a summary table.2 in patients with cervical radiculopathy10 MDC. 0.93 correspondence with experts to find ad. WAD. while databases chometric properties of each scale were considered “very sensitive to functional were used to identify papers. but com.

and sitting upright. Because of previous reports of a 1-factor solution.19. personal care. and lifting.53 š L7IZ_iWX_b_jo"F9II"C9II"DFGXWi[b_d["9IGXWi[b_d[ š L7IZ_iWX_b_jo"f^oi_YWbhWj_d]"ckiYb[ifWic"d[YaHEC"d[Yai[di_j_l_jo21 š =[d[hWb^[Wbj^"l_jWb_jo"ieY_Wb\kdYj_ed"f^oi_YWb\kdYj_ed31 š 9EI0fW_d_dd[Ya"Whci"f^oi_YWbiocfjeci"\kdYj_ed$FioY^ebe]_YWbZ_ijh[ii"^[Wbj^YWh[ki["39 a Patient-Specific Problem Elicitation Technique7. For personal use only.F9I"I<#). and frustration.1c_bZ"(-1ceZ[hWj["*&1i[l[h["**WdZl[hoi[l[h["-&$ š FW_d0defW_d"((1c_bZ")&1ceZ[hWj["). It was determined that 75% of patients identify problems with sleep.19.WdZl[hoi[l[h[".jospt.21L7IfW_d"9IG"DFG22 š L7IXeZ_bofW_d31 Correlation with other scales reported as moderate (0. concentration.1_cfehjWdjY^Wd]["/35 š H[Yel[h[Z".$+e\fWj_[dji^WZ_d_j_WbiYeh[im_j^_d'C:9Z_ijWdY[\hecj^[X[ijfeii_Xb[Wdim[h&h[l[Wb_d]deY[_b_d][÷[YjiWYYehZ_d]jeW'+Yh_j[h_ed$44 š '&%+&Wia[Z\ehYbWh_ÓYWj_edZkh_d]IfWd_i^WZWfjWj_ed$22 š CeijYeccedboc_ii_d]_j[ciedXWi[b_d[Wii[iic[djWdZ'mabWj[h_dYbkZ[Zb_\j_d]''"h[WZ_d]/"Zh_l_d]*+"h[Yh[Wj_ed'44 Internal consistency š 9edi_ij[djbo^_]^9hedXWY^Wbf^W0&$-&#&$/.0)53 š <WYjeh'"\kdYj_edWdZZ_iWX_b_jo_j[ci0(". role activity. missing matic neck pain. and sleeping. The correlation between these factors was 0. More than half of subjects identified difficulties with frustration.17 Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®.53 VAS pain.623 Construct. sleep disturbance. The second factor on functional disability included the items personal care.1i[l[h[".'22 š 7cekdje\Y^Wd][0'&$+"Yecfb[j[boh[Yel[h[Z1ckY^_cfhel[Z". [ LITERATURE REVIEW ] J78B.16.70 Journal of Orthopaedic & Sports Physical Therapy® š FI<I"50DFG"DF:I":H?"L7IWYj_l_jo#Y^hed_Y"L7IfW_dWdZWYj_l_jo"WYkj[$1.47 Factor validity Support for 1 factor š '\WYjehYedÓhc[Z. š '\WYjeh2 Support for 2 factors š <WYjehWdWboi_i0(\WYjehi[njhWYj[Z[_][dlWbk[i"1.66. work.e\_j[cic_ii_d]"Zh_l_d]ceije\j[db[\jc_ii_d]Zk[jedejWffb_YWXb[$21 š ?d?hWd"*(Z_ZdejWdim[hZh_l_d]"/h[WZ_d]$31 š Beea[Z\ehY[_b_d][÷[YjiXkjdejZ[j[Yj[Z$31 š . driving. convergent Correlations with other scales reported 0. and general exercise. analyses were completed to deter- mine differences between the 1. the latter being significantly better.17. concentration.1ib_]^jbo_cfhel[Z")$-1deY^Wd]["&$(. No other uses without permission. lifting. š 7kj^ehiki[ZWfWj_[dj_dj[hl_[mje[b_Y_jfheXb[ciYWbb[Zj^[fheXb[c[b_Y_jWj_edj[Y^d_gk[je_Z[dj_\o\kdYj_edWbfheXb[ci_dfWj_[djim_j^Y^hed_YdedjhWk# ness. work.51 and 0. driving.38.7 š D:?mWideji[di_j_l[jej^[_j[ciYedY[djhWj_ed'$*"'$'ehf[hiedWbYWh[&$-"&$. Individual problems ranked most important by subjects were driving. Sleep disturbance had items.njhWYj[Z Content (including š 9ecfWh_iede\_j[ciWYheiiD:?"DFG"WdZj^[9ef[d^W][dD[Ya<kdYj_edWb:_iWX_b_jo?dZ[n1_j[ciedWbb)0ib[[f_d]WdZh[WZ_d]1_j[cied(iYWb[i0fW_d" analyses of ques. and recreation. mobility."'&53 š <WYjeh("d[YafW_d_j[ci0'"*"+"/53 š <WYjeh'"fW_dWdZ_dj[h\[h[dY[m_j^Ye]d_j_l[\kdYj_ed_d]33 š <WYjeh("\kdYj_edWbZ_iWX_b_jo š Ki_d]WdeXb_gk[hejWj_edj[Y^d_gk["'\WYjehÇfW_dWdZ_dj[h\[h[dY[e\Ye]d_j_l[\kdYj_ed_d]"È[nfbW_d[Z*(e\j^[lWh_WdY[WdZYedjW_d[Zj^[\ebbem_d]_j[ci0 pain intensity.#.29. ceiling/floor the highest prevalence. gardening.70): š I<#). The highest mean severity scores were found Downloaded from www. housework.24.50.$19 š D:?Z_Zdej_dYbkZ[j^[[njh[c[ie\l[ho[Wioehl[hoZ_øYkbj_j[ci"cWdom[h[i_c_bWh_dZ_øYkbjoWdZj^[h[\eh[fej[dj_Wbboh[ZkdZWdj$24 š IeY_WbYedi[gk[dY[iWh[Wd_cfehjWdjfWhje\ikX`[YjÊii_jkWj_edj^Wj_idejc[dj_ed[Z_dD:?$1 š LWb_ZWj[Z\ehckbj_fb[eh_]_die\d[YafW_d"\kdYj_ed"WdZYb_d_YWbi_]diWdZiocfjeci$34 š . reading.and 2-factor solutions. cooking. driving. A scree plot confirmed a 2-factor solution as optimal. headaches. emotion. at University of Washington on December 1.*lWh_WdY[edÓhij\WYjeh11. Less common but mentioned by more than 30% of effects) patients were looking into cupboards. 2014. r for VAS test and retest groups of 0. 6 were included on the NDI. recreation48 tion. Of the 11 problems identified by most subjects. appropriate.43.C9I">7:I"DF7:"L7IfW_d"Y^hed_Y1. in depression. working overhead. headaches. known Detected differences between group differences š :_iWX_b_job[l[bi0deZ_iWX_b_jo"'.* Summary of Validity Properties LWb_Z_jo :WjW. and symptoms.15. Construct.$11. All rights reserved.30-0.

mobilization. Pain Coping Strategy Scale. appraisal. PCSS.37 from the MAPI website. This publication provided the Most recently. SF-36 MCS.46 This review includ. a summary of therapy. Neck Pain and Disability Score. common content items on neck disability ed a historical perspective.46 Abbreviations: CSQ. DRI. NPQ. 408 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . Short-Form 36 mental component scale. the developer of the results from prognostic studies that use most extensive review of the NDI to date NDI published a comprehensive 17-year the NDI. mild disability. standing/running. 3041 š 7j(*ma"h[Yel[h[Z"'*1f[hi_ij[djfW_d"(. and driving. exercise. acupuncture. medication. MCSS. Hospital Anxiety and Depression Scale. NPAD. a list of translations available cervical pillow. a summary of ments. laser. and relaxation intensity. Patient-specific Functional Scale. VAS. Short-Form 36 physical component scale. scales were self-care/activities of daily liv. work. psychometric properties from 22 differ.32. visual analogue scale. PSFS. ROM. pain ent studies. moderate/severe disability. Multi- country Survey Study. Cervical Spine Outcomes Questionnaire. ing (ADL). and tables listing mean change but did not include independent critical review of the NDI. NPDS. SF-36 PCS. including manipulation. HADS. Neck Pain and Disability Scale. Northwick Park Neck Pain Questionnaire.8. 10-28. range of motion. scores in effect sizes with different treat. Disability Rating Index.

One group of authors set reliability9. demonstrated a ceiling effect when using Ackelman and Lindgren1 changed each inally.1 if more than 2 items were left objectively distinguish further decline in patients require support to understand unscored. cess retained the sound measurement 5 minutes to analyze the (Dutch) NDI. if a subject population. clinical studies to define patients who had Translation A number of papers have Spanish (Spain/US).org) to produce additional rived and no validation of these categories large high-quality studies indicated lower translations through standardized was performed. (Australian/UK/US). sure” and “social activities” had differ. In the same study. Nederland32 found that pa- ent meanings in French and American al43 reported that the time for patients to tients defined as recovered at 24 weeks Downloaded from www. NDI scores vary from 0 to 50. moderate disability. with a severe pathology scored 48 out English and all translated versions is swered. Polish.9 for scores that fall in the “severe disabil- a high rate of nonresponse in an Iranian provide a percentage score as a strategy to ity” interval.8) min. NDI. and nonresponse is more com. some authors1. They reached consensus on dis. amount of time is required. and none have crepancies in 4 areas. as well as ers identified that the driving item had of a total of 50. where 0 is over the spectrum of possible NDI scores. Cook et al11 questionnaire could be completed in the as they represent patients for whom ensured good readability of the Brazilian/ waiting room and thus did not add any pain and disability estimates may be Portuguese version of the NDI through additional time to the patient’s visit. no process was efficients above 0. Danish. Wlodyka-Demaille measured the time taken to complete the those with moderate to severe disability et al53 reported that the concepts of “lei. and conceptual considered “no activity limitation” and 50 Riddle et al38 reported that 6% of patients equivalence. IkccWhoe\Fh_cWhoIjkZ_[i French (Canadian/Switzerland/ validated. it may be difficult to deemed to be acceptable. However. described. is considered “complete disability.4 (6. Reliability There is considerable evi- lesser extent. recovered as having NDI scores of less addressed issues around readability. In of missing items.”47 Orig. as they may have attained a the items. Possible reasons translation methodology (although the “normal limit” of the NDI between 0 for variation in reliability between stud- psychometric testing was not performed). mon for task items like driving and.” Some authors report that it comprehension difficulties. reading. that is. The majority of authors report the NDI out both “low” or “normal” scores. For that reason. (). 5 to 14. near maximum score. All rights reserved.jospt.1. ac- These translations included English behind setting this benchmark was not tual difference in clinical subpopulations.30 Again. For personal use only. Italian. made by the developers on the handling al49 reported a floor-ceiling effect on the ability due to neck pain was of interest.10 (J78B. the score may not be valid. the subjects were not included. and cultural translations. Few studies have specifi- reports. Miettinen30 used a recovery al24 concluded that their translation pro. described for how these ratings were de. or minimum number of items regarding “personal care” and “con- Journal of Orthopaedic & Sports Physical Therapy® the Spanish version. However. no disability. in a number of studies.31 Overall readability in the account for questions that are left unan. 16% of patients had items required for validity. Finnish. whereas Hains et al17 and Vos et item of the tool to specify that only dis. those with mild disability as usually in the context of language and specifically address or state how they having scores of 5 to 14 (10%-28%). although some Lindgren. no explicit recommendations were the NDI. More recently. they provided complete the NDI was about 3 minutes. that it takes 8-10 minutes to complete and cutoff of 20/50 at 3 years. a committee who reviewed the translator be measurable. but these did has been suggested that if 3 or more items is difficult to observe change in scores at not appear to be related to educational are missing.46 the extreme ends of the scale. Administration Burden Few authors than 4 (8%). and definitions of “stable. cutoff. Norwegian. Sterling et al40 used data from Readability/Language and Cultural Germany).10 In the study by Ackelman and of 50 on the NDI. recent proqolid. ?dj[hfh[jWX_b_jo%IkX]hekf_d]e\ cally addressed this issue. where reliability co- The developer reported that he worked plete disability. and this cutoff has not been study process. 2014.34.” journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | 409 .2 Oth. 25 to 34. patients defined this at University of Washington on December 1. mild disability. se. experiential equivalence. No other uses without permission.53 Through a series of utes (range. to a ing interpretation of NDI scores: 0 to 4. liability of the NDI in both acute and published in peer-reviewed literature. for level and reliability was excellent. Stratford et al43 commented that the effects have practical clinical relevance. Vernon and Mior47 offered the follow. and subsequently examples in these areas to make the while Wlodyka-Demaille et al53 reported defined a score of less than 15 as a recovery questionnaire more understandable for that it took a mean (SD) of 7. 15 dence to support appropriate retest re- Not all translations have been to 24. and greater than 35.38 For example. cultures. chronic populations. the methodology ies are random differences in samples. sion in the Korean version. including seman. com. however.90 have been reported with an independent organization (www. :_÷[h[dj_WbEkjYec[i specifically analyzed how the MDC varies tic equivalence. Lee et a single report of therapist burden stating 28 or greater. In contrast. Stratford et Conversely. There was only as having persistent pain had a score of translating and back translating. it centration. all agree that only a short as having scores of greater than 15 (30%). their status. had a score of 14 or less.18 Floor-Ceiling Effect Floor and ceiling properties of the original English ver. idiomatic equivalence. 1-60 minutes). vere disability.11 In the Swedish version. and 20 points. invalid and for whom changes may not Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®.

I 9^Wd][_dD:?IYeh[i%+& pain (J78B. ES (improved cohort): 0.20.89 to 0. respec.1 2.50 or 0.04 1.4 8-10 Park Neck Pain Questionnaire.njhWYj[Z information on acuity and found test-re.92 Long term 1.46. 84% CI42 tration.25 –0.805 studies by Cleland et al9.67 a –3 to +3 global rating of change.66 ally high (0.80/0. the Neck Mobilization 7 Acupuncture 0.93. All rights reserved. For personal use only.19.38.49 De. the more common Overall 1.53 Lower was reli. many studies did not sepa.1.88 at 4-6 wk5 tively. 84% CI42 SRM (total cohort): 0. J78B. These Mean change –4. high-quality ES (total cohort): 0.)). 2014.16.90).19.92 7 Pain and Disability Score.5.10. making it difficult to deter.10 who reported Improved 0.53 It was also reported by Hains et adequate convergent validity has been detected between the Hospital Anxiety al17 that the single pain item or the total demonstrated for a variety of measures and Depression Scale and the NDI indi.87-1. 0.91 study sample.50.I IHC comes from Cleland et al. which included stable pa. *).2 points for their patient Stable –0.77 0. weaker correla.53 The Patient- Manipulation 1.38.98 on NDI versus 0.68). analogue scale (VAS) pain ratings. and an r of 0. 84% CI42 terval was 72 hours from initial adminis. The highest estimate .47.85-0.04 0.jospt. The developer suggests that   DFG3 MDC be 5 points.5/4.77 at University of Washington on December 1.25 0.10 suggested that Mehi[ KdY^Wd][Z ?cfhel[Z reliability may be lower than generally    FW_d%:_iWX_b_jo FW_d%:_iWX_b_jo FW_d%:_iWX_b_jo reported (ICCs of 0.72)9 with acute and chronic neck pain.60/0. 84% CI24 SRM: 1.0 12.55 0. Responsiveness test reliability varied from an r of Short term 0.30. [ LITERATURE REVIEW ] Some studies did not define these impor- tant factors. In a study of the Neck Pain and Disability Index response to botulin toxin Validity Construct convergent validity measured by Cohen d was 0.93 0. convergent validity status.04.77 the lowest comes from Voss et al.1. Improved 0.73 0.11.44 for patient and physician GRC16 has been established between the NDI and a variety of other questionnaires In a review of randomized controlled trials using the NDI46 that can be used for patients with neck . SRM (improved cohort): 1.50 Two well-powered.40 for former and 0.I IHC in a different order have been exception.49.03 1. tients’ perceived pain and psychological purported constructs (J78B. Only a few studies presented the SEM Change in group after 11 mo (no standardized Rx during gap) (compared   DF:?"DFG"L7IfW_d"L7I^WdZ_YWf"ceh[i_]d_ÓYWdjP value than all but or the MDC.50.16 tients with recurrent neck pain.I IHC . ES or SRM ES: 0.50.2/12.17 estimate for MDC is around 5/50 (10%).11. No other uses without permission. 0. the Northwick Exercise 0. 84% CI42 acute condition when the test-retest in. For example.10.83.3/–4.24.93 Journal of Orthopaedic & Sports Physical Therapy® an MDC of 10. For example. 84% CI24 ability reported for individuals with an ES (total cohort): 0. Response Burden.66 points for their Overall 0. and Disabil- Medication 5 ity Rating Index have the strongest cor- relation with the NDI.15/0.24/–0. The 410 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy .51.73 to CID CID: 5 points43 CID: 7 points10 0.95 1. score of the NDI both predicted visual with related constructs. reported when the retest was based on a 7Ykj[ ?dikhWdY[BWj[ single occasion but administered in dif- ferent languages31 or presented questions .6 9-10 Specific Functional Scale.0 studies based their definition of stable on ES –0.53 Others provided some FioY^ec[j_YFhef[hj_[i :WjW.53 which is consistent with the similarity in their tions were observed with less similar in. Change24 ES SRM spite a range of 2 to 10.1. mine why variations in reliability exist.24. Deteriorated –0. cates that there is a link between the pa. specificity. Summary of Responsiveness.48. CID: 19 (sensitivity.67 –0.95.+ Cultural Adaptation. Although dices.99 in patients Downloaded from www.40 and 0.34 population with cervical radiculopathy. ES: 1. Administrative rate patients with acute or chronic neck pain.13 0.49 who Change42 ES SRM reported an MDC of 1. ICCs Change in 2 cohorts36 (similar to CWOM or FRS) Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®.1-1.

0.5450 Correlation to PSFS.22 5 min (stated. r = 0. and items of “personal FioY^ec[j_Yfhef[hj_[i :WjW. VAS in 150 patients with whiplash injury and activity. the items Administrative (continued) of “work” and “driving” had the strongest correlation (0.77).33).njhWYj[Z care” and “headaches” had the weakest Longitudinal validity correlation Comparisons: correlation (0. Wlodyka-  m_j^ej^[hY^Wd][iYeh[i š =F.38. Similarly. Correlation of English and Turkish NDI the r for test and retest was 0.733 the NDI (n = 521 patients) was available Response burden Time to complete: 9 min. and in 7 min 59 s (1 min 26 s) by those with low one (p . Italian for Switzerland. All rights reserved. 0. 0.17 Conversely. Translated into Greek.49   š HE9Ykhl["&$-. –0.” suggesting that the   š 9ehh[bWj_edD:?Y^Wd][jeD8G&$. tent to which pain and disability are sepa- Dutch.proqoulid.21 4 min. MID 10. French Canadian.” This 1.601 found 2 factors that they termed “pain Correlation of NDI change to GRC change. Interitem corre- J78B.344 and interference with cognitive function- Correlation to clinicians prediction of change. total score (0. 90%.83 change scores50 ing” and “functional disability. French for Switzerland.95. For personal use only. English for the UK.jospt.   š F[Whiedr.45 “fulfilled in 6 min 08 s (54 s) by those with middle-high cultural level. lations perform similarly well. at University of Washington on December 1.5.WdZL7I&$+ scale may have 2 dimensions (pain and AUC. 70% Nieto et al33 conducted a factor analysis With COM. 0. specificity. Spanish for Spain. Norwe- gian.+ Cultural Adaptation. based on a manuscript now measured)29 in press that was shared by the English for Australia.88 tions about pain and disability treated as Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®.76 Correlation to Disability Rating Index.31 Portugese-Brazilian. Finnish. Polish.21 Spanish. with input from the developer44 one-dimensional and may reflect the ex- Translations available on the MAPI website www.84).22 Iranian. Response Burden.839 function) in some subgroups. No other uses without permission.5 (pooled). Danish. 3. items are positively correlated with the Summary of Responsiveness. Italian.36 0."&$*&"/+9?53 Demaille et al53 extracted 2 domains in   š 7K9"&$-/"/+9?53 the French version. rate concepts across different pathologies German. sensitivity. not clear how/if for appraisal. SF-36 physical.66 and A highly powered Rasch analysis of 0. 0.$49 ity” and “Neck pain. Portuguese.73-0. VAS pain. Spanish for the United States and samples. English for the United States. 2014. ICC = 0. 0.11 French53 brief disability scales which include ques- Agreement between translated version 69% identical answers.88. 0. ROC.or 2-factor effect has been observed on other Cultural adaptation Language/cultural translation/validation Turkish.54-0. German for Switzerland. “Functional Disabil- Downloaded from www.

In particular. sional scale. CWOM. ROC. Dif- demonstrated that there is a higher cor. able to determine whether response op- ing characteristic. Overall. personal care. ferential item functioning was detected. NDI to the model and that 4 items had sions of the NDI. or nontraumatic injury. whether have supported the NDI as a 1-dimen. effect size. scales. disordered response thresholds. that the NDI items did not contribute to from a musculoskeletal or neural source. VAS. an acute population.001)”2 An advantage of this approach is that it Journal of Orthopaedic & Sports Physical Therapy® formally assesses whether an instrument Adminstrative burden 5 min to analyze18 satisfies rules for interval scale mea- Abbreviations: AUC. Items on changing the order of the questions does sional disability instrument for patients headaches and recreation had significant not change the validity of the tool.85. Short-Form 36 physical component scale. treatment. visual analog scale ably interpreted as numbers. 0. with neck pain. relation between the VAS and items on vides a relatively weak indication of struc. global perceived effect. operating characteristic curve. NDI. However. Cronbach’s B has gener. COM. receiver operat. GPE. core surement by examining fit with a Rasch outcome measure. which is consistent with on lifting. the content validity has also been evaluated using fac. this approach is Northwick Park Neck Pain Questionnaire. GRC. This study determined that there was a lack of fit of same study administered 7 different ver. Items man and Lindgren1 included individuals ally exceeded . and work also with acute and chronic conditions. tions are ordered and might be reason- standardized response mean. indicating that some items did not have the NDI pertaining to pain and activity in tural validity or dimensionality. NPQ. and values reported for other unidimensional demonstrated disordered thresholds. Ackel. clinically important difference. global rating of change. SF-36 physical. neck disability index. Rx. and concluded that The NDI was developed as a unidimen. Structural the same meaning across subgroups. deviation from model expectations. PSFS. Pain-Specific Functional Scale. CID. Unidimensionality tests showed that dif- of the NDI has demonstrated validity in tor analysis in a small number of studies ferent subsets of items gave significantly evaluation of pain and disability in acute with inconsistent findings. SRM. suggesting and chronic neck conditions. Neck Pain Disability Index. NPDI. model. ES. Some studies different person estimates. Hains et al17 observed that all on headaches did not fit with the trait journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | 411 . this statistic alone pro. core whiplash outcome measure. while others suggest it has a single underlying construct. The item and whether stemming from a traumatic 2 factors.

10 researchers often assume that most un.. self-efficacy. it should be appreciated that an increasing number of factors may captured by other items. There is adequate evidence that the with longer test-retest intervals may be ference (CID) (J78B. For example.10. (Nonspecific Validated for the Neck) Scales on a single occasion. a CID intervals (0-3 days). Only a few studies presented more cal recommendations regarding its use. the absolute measurement error Recalibration of the disability experience varying from 0. estimates that reflect different definitions tervals it is more necessary to establish of stable. pose. ate self-reported changes are defined as clinically relevant indicators of measure- 412 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . most relevant. Conversely. However. In longer test-retest in. changes in Responsiveness The NDI appears to properties will vary according to pur. cal studies.23. ICCs English Language (Specific-to-Neck) and 3 reported when the test-retest was based J78B.jospt. Even if a patient’s severity of Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. Thus. The relative calibrated against multiple internal and to measurements of neck pain intensity. re-evaluations. and how “sta- Neck Pain and Disability Visual Analog Scale 1999 ble” is defined within a given time win- Functional Rating Index 2001 dow. to 0. The reliability data cur- treated patients remain relatively stable rently published in the literature provide :?I9KII?ED over 1 to 3 days. This suggests Dehj^m_YaFWhaD[YaFW_dGk[ij_eddW_h['//* that lower test-retest reliability across Disability Rating Index 1994 longer test intervals may reflect changes in scores due to the episodic nature of Downloaded from www. factors like coping. in different clinical subgroups. mediated by psychosocial influences and item version and had similar correlations eletal or neurogenic origin. that a revised 8-item NDI would satisfy to strong evidence for a spectrum of psy. This is with different test-retest intervals allow- J his study synthesized current research in 37 studies addressing often based on a –3 to +3 score on global ing users to select estimates most similar the psychometric properties of the rating of change instrument. affect scores as test-retest intervals are that the NDI is not unidimensional. In short-term test-retest intervals estimating error across shorter clinical with neck pain of neural origin. patients. known group validity. when using ty is a plausible explanation for lower es- abling chronic whiplash-associated dis. 9[hl_YWbIf_d[EkjYec[iGk[ij_eddW_h[(&&( While it is important to understand short- M^_fbWi^:_iWX_b_joGk[ij_eddW_h[(&&* term and long-term stability of clinical measurements. early at University of Washington on December 1.90). All rights reserved. importance of different psychometric external references. pain or task difficulty remains consis- these requirements and provide interval. although it is less clinical research studies and reliability can be said to have occurred when there likely that patients defined as stable will studies. there is moderate extended. whereas shorter intervals may is a change of greater than 5 points. be appropriate for consideration when compared to a 7-point change in patients tervals. when using the NDI support may contribute to alterations in ering the typical reported effect sizes or to evaluate clinical change in individual perceived disability over a 5-week period. neck pain with symptoms of musculosk. reliability studies ies calculated the clinically important dif. Only a few stud. a form of test-retest intervals. patients with small to moder. would contribute to measurement error in clini- detect change over time. the level measurement of neck disability. have been ex- Neck Disability Index 1991 ceptionally high (0. +).60 when measured in the and responsiveness should be considered in the absence of changes in pain intensi- total cohort of subjects with “mildly” dis.43. the patient has remained stable. No other uses without permission. Some studies have used as much as Extended Aberdeen Spine Pain Scale 2001 a 5-week interval in patients defined as 8ekhd[cekj^D[YaGk[ij_eddW_h[(&&( stable to determine test-retest reliability. but able evidence.47 When neck NDI is stable over short test-retest time important to consider when designing pain is of musculoskeletal origin. be more important. it is impor- the improved cohort (J78B.+). For personal use only. Conversely.95 when measured in only disability. The the NDI in patients with acute or chronic experience of pain and disability will be 8-item NDI was consistent with the 10. the NDI to differentiate different levels of timates of reliability in studies with larger orders. chometric properties supporting use of tent over the course of several weeks. definition. Overall. tant to consider that these variables do indicates that the NDI has good ability to ences between known subgroups. with the NDI ad- Reviewed in a Previous Systematic Review 37 ministered in different languages31 or different question order. NDI and was able to provide some clini. [ LITERATURE REVIEW ] stable in this analysis.5. 2014.42 This discriminative validation that tests differ. They concluded within the limits prescribed by the avail. Patient-Specific Functional Scale 1998 Copenhagen Neck Functional Disability Scale 1998 neck pain. by to their situation and use. Thus.43 have stable scores in longer test-retest in. Therefore. Journal of Orthopaedic & Sports Physical Therapy® standardized response mean (SRM). or social have good responsiveness when consid.

although evidence review did not attempt to evaluate differ. there are no levels of evidence that It is an important consideration that formed head-to-head comparisons of create clear categories for study quality. Although the value of 5/50 or 10% commonly used cal presentation. a mechanism to place greater emphasis adaptation and knowledge translation. published translations used at ers. chometrics and usefulness by adapting the NDI itself should be balanced with as well as in subsequent translations. teria for synthesis process for psychomet. MDC tional Scale. more. However. estimates come from studies with lower struments. like the Patient-Specific Func. unclear.10 who reported an MDC of for confusion. The highest estimate comes from scoring variations can create potential sults from individual studies into global Cleland et al. If full-text papers written in only English or an MDC of 1. ently subjective elements. Although the first author of this review It is certainly true that no other in- ber of fully powered studies that vary on has addressed the critical appraisal of strument has undergone sufficient de- factors such as the type of intervention. For these reasons. Despite a range of 2 to 10. and around 5/50 (10%). and the for this purpose. In and expanding a framework used by oth. est was from Voss et al. clinicians should consider cal utility. in both its original English format.10. among studies. Neither previous systematic these variations are reflected on the NDI responsive. These include its brevity and is not surprising that the larger MDC supplementing the NDI with other in. No other uses without permission. like NDI is the most responsive neck scale.jospt. Ceiling/floor effects have recommendations. The reasons for this ing worsening or improvement. we rank ordered studies by cal practice and research requires tre- that another instrument was superior to quality to allow the reader and ourselves mendous efforts in translation/cultural the NDI in terms of responsiveness. synthesized. the SEM or the MDC. Despite variability. this consideration. comorbidity. change to different instruments or in stand. making it difficult to synthesize re- points. The quired to provide more precise estimates chometric evidence. Across the different studies. nature of the questionnaire. There was agreement across studies on the findings from high-quality studies. and the at University of Washington on December 1.25. We summarized the information on psy. therefore. 2014. both least some of the recommended proce. For personal use only. the NDI has study population. this cross-validation is well under way.4 Due to the brief must be taken within the context of the cient and the sample variability. This reviews only high-quality studies are research and practice. we were able to extract data from English large range of MDC values are largely tively. Further. so it these ranges. the NDI has a number of fea- deviations or lower reliability coefficients scale. the fact that it has been translated into a Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®.2 points for their patient population been suggested as a potential problem. tural or language subgroups. estimates are highly variable review incorporated critical appraisal. tine evaluation of neck disability.50 which is able to sample ness to detection of clinical change. although pose. of 40 or higher and 10 or lower might as the majority of translation and valida- the more common estimate for MDC is be considered problematic for detect. 10. The most recent sug- general. there are inher- being expressed in unit of the original dures for valid translation and demon. making the scale more amenable to that can be applied across different cul- reliability estimates. All rights reserved. and pur- veloper of the NDI suggests that MDC is 5 minimal administrative burden. strated equivalence. any suggestions of that the NDI is easy to read and under. is important to have outcome measures interval and the definition of stability on thus. introducing new instruments into clini- different neck disability scales indicated Therefore. method to synthesize the extracted psy. varies across the subgroups and how Journal of Orthopaedic & Sports Physical Therapy® While the NDI has been shown to be ric studies. individual studies by developing a tool velopment or validation to replace the length of follow-up. In some systematic instrument is suitable for both clinical in different clinical circumstances. However. although either larger standard changes are relevant at these ends of the Overall. as study results score and based on the reliability coeffi. suggesting that a num. pain/disability experience in neck pain tend to occur over 2 weeks or less. ment error. respec. responses.66 for their study. One limitation of our review stems future studies may focus on whether the priate for most clinical comparisons that from the lack of agreed upon quality cri. although there is no consistent definition that the scope of our search retrieved Downloaded from www. A second limitation in our review is with cervical radiculopathy.49 who reported of what constitutes a ceiling or floor. none of the studies that per. then scores have a substantial impact on our results. was not an outcome measure. neck pain. the Patient-Specific Functional Scale. Evidence suggests that smaller abstracts in non-English text. which one assumes that the MDC is usually at 5 French. While we tried to make the process gestion is that an 8-item version of the journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | 413 . It variations reflect the effects of test-retest items that are of high or low difficulty. as objective as possible. when evaluating and measurement principles suggest that ent instruments and. We would also suggest that within tures that suggest that it has good clini- can increase the value of the MDC. it is important to patients who score at the extremes may designed to comment on whether the see how the instrument performs across benefit from supplemental scales. in clinical situations appears to be appro. change for patients with this type of clini. The de. We don’t expect this limitation to included stable patients with recurrent points (and up to 10 points). number of languages and its responsive- ICCs. different contexts and purposes. interventions.26 there is no clear NDI as the instrument of choice in rou- nature of the neck condition will be re. tion articles were printed in English.

Published cultural validation has been as these are not usually published with state hypotheses that are to be evalu. mechanisms for these to be obtained ish. clinical contexts and in longitudinal be anticipated (2 weeks). Norwegian. suggested that removal of items might im. as Dutch.49. as well as those Scale should be considered. and in the literature. a ficiently validated nor occupational 414 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy .1. tients respond to items of the NDI benefits. tion in score to demonstrate treatment gaps in defining clinically useful compar. original pilot testing was conducted on a official manuscripts related to the 4. commercial cian to seek out these tools by contacting 2. should be considered as approaching a and important difference for different ceiling/floor effect (45/5.BEFC. These sive in numerous patient populations. and up to 10 points might enhance performance. 6. Finnish. moves to improve and CID in different subgroups are for Australia. to the subjective categories reported current benchmarks set for normality whiplash-associated disorders. literature and provide more accurate Dutch. Because these are not ated.DJ answered questions must be divided An advantage of this suggestion is that E<J>. sites provide translations in English developers.24.48. most importantly. The MDC is accepted as 5 points. The CID is approximately 7 points. French. supplementation of the NDI would be important in assisting clinicians including patients with acute and with a Patient-Specific Functional in using the NDI to set short. French.L. variations visions to the 8-item NDI in different timeframe where this change could in scoring exist in the literature. veloped using a clinimetric process and by interested users when publishing and Spanish for the United States. as recommended by the developer.and long. Ital- this review suggests further investigation ity of their translations and provide ian for Switzerland. Studies that determine SEM. benefit clinical practice. French for Switzerland.23. Spanish for Spain. When the allow clinicians to provide more accurate 1. Defining detectable change on the NDI. are needed. at University of Washington on December 1. very small sample. French Cana- While the NDI may be considered the prognostication/outcome evaluation. thus. properties in different clinical groups ric was used. The NDI is reliable.D:7J?EDI<EH percentage score used to adjust for un- and can be considered as interval data. goals should require a minimum of 5 their relative priority.10. Others have return to work. like disorder I and II.:.D:? by 2 to restore the expected metric. cervical radiculopathy. guese. and Portu- script. Short-term therapy weighs pain and disability according to confidence to indicate “normal” ver. the with musculoskeletal dysfunction.9ECC. Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. For example. chronic conditions. and points change for whiplash-associated the possibility that the addition of items targets for important milestones. and respon. baseline score is outside of the 10-to- prognosis and outcome evaluation. so or comparing scores as to whether scores need for this revision. published with the validation manu. No other uses without permission. Because the NDI was not de. Pol- is needed. 9ED9BKI?ED%A. 40 range. [ LITERATURE REVIEW ] NDI performs as well as the original NDI H.45 <KJKH. 3. 5. but should include and for levels of disability are arbitrary. Studies that determine NDI bench. MDC. languages: English. Clinicians may relate an NDI score effectively with payers. This leaves open sus different levels of disability. Future studies on the psychometric clinical reports to ascertain what met- a practical issue for clinicians is the dif. it is not clear whether translation process.49 tion should be exercised when presenting studies are required to determine the 5.11. cau. Qualitative studies and cognitive be set for a minimum 7-point reduc- Perhaps. Italian. Patients who score in the range of ei- the work on MDC and CID is sparse and self-reported disability as reflected ther 40 to 50 or 0 to 10 on the NDI inconsistent. Downloaded from www. respectively). well as the expected outcome.jospt. and to communicate more suffering from neck pain associated 7. for whiplash-associated disorder III. Authors should consider the availabil.53 In addition. longer-term treatment goals should are out of 50 points or 100%. Portuguese. However. valid. clinical subgroups and the impact of dif. Iranian. it is easier to change to a reduced-item Caution should be used when reading instrument than a different one.43 ative data and benchmarks. All rights reserved. there are interviewing that evaluate how pa. German for Switzerland. this should be re-evaluated within a Journal of Orthopaedic & Sports Physical Therapy® prove performance. 2014.OFE?DJI making it difficult to detect subsequent ferent prognostic variables on these would worsening or improvement. “gold standard” among outcome measures. English for the United the accessibility of translations would needed to resolve confusion in the States. ficulty of obtaining official translations. German.45 Currently. The NDI should be scored out of 50 ability benchmarks have not been suf- establish valid benchmarks. but varies considerably across stud- the NDI captures all of the important 4. metric property being analyzed. Studies that compare the proposed re. performed for the NDI in the following validation studies. English for the UK. For personal use only.43 a statement reflecting that these dis- and specific data analyses are needed to 2.10.1. term goals. dian. would inform our understanding of 6. are warranted and should specifically 3. ies and may be up to 10 for cervical concepts for patients with neck pain or marks that clinicians could use with radiculopathy. it is the responsibility of the clini. Danish. Finally. Korean. 1. including the specific psycho. Importantly.1.

Chiro Med. Al- '.doi.07. cultural adaptation and validation of the Brazil.doi. Risk guide.90914. Stratford P. Pain. )'$ Mousavi SJ. et al. Psychometric proper. http://dx.75367. Thorofare. Maher CG. Prevalence.53069. Nicholson LL. 5.doi. http://dx. 2004.32:3047-3051. 15 brs.29:2410-2417.16:2111-2117. Atamaz F. (/$ McCarthy MJ. non-traumatic neck pain. Elvers JWH. J '&$ Cleland JA.brs.17:165-173.1186/1471-2474-9-42 ). 2007. evaluation form. Waalen J.$ Heijmans WPG.doi.39:358-362.02.08. Karaduwman A. Parkinson WL. Montazeri A. Hobbs H.jospt.33:E362-365. dx. those with mild assessed instruments for measuring chronic J Hand Ther. Royuela A. MacDermid JC.0000201241. 1994. Sterling et al41 suggest ')$ Eechaute C.doi.85:496-501.33:E630-635. http://dx.1016/j. Leino E. Bolton JE. 2004. http:// 2004. thy. Bolton JE. Downloaded from www. http://dx. Vaes P.doi.2340/16501977-0060 Vet HC.1097/ http://dx. http://dx. 2002. J Whiplash Rel )*$ Pietrobon R. 2008.2007.1097/01. Spine.doi. 2008:387-388.1097/01. A “normal” score of between 0 to 20 org/10. http://dx.27:801-807. Huguet A. et  )$ Aslan E.$ Chok B. Use of generic versus Disability Index and patient specific functional metric testing of Korean language versions region-specific functional status measures on journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | 415 . general population.doi. Coeytaux RR. Turkish patients with neck pain. 1991. Stratford PW. J Rehabil 2003. Documenting out.doi. plete disability. and its validity compared with 8ekhd[cekj^Gk[ij_eddW_h[_dWiWcfb[e\ the short form-36 health survey questionnaire. ated with chronic. disability associated with whiplash-associated whiplash patients: usefulness of the neck dis- ing clinically significant improvement. consequences for clinical decision making. Sensitivity and specificity of outcome problem elicitation technique for measuring ))$ Nieto R. Aras B. The clinimetric qualities of patient.2008. No other uses without permission.0b013e31817144e1 patients with chronic whiplash.doi.http://dx.doi. http:// BRS. Guidelines for the process of cross-cultural '/$ Hoving JL. Comparison G. 2005. Sievers K. Development and psycho.30 Vernon and Mior48 suggest ian Portuguese version of the Neck Disability M. 2008:389-392. Yakut Y. construct validity. Neck disability index Dutch 2007. 2002. Madson TJ. reliability Physiol Ther. Horizon Estimation and consequences of chronic neck pain in Fin- Using SAS Software.31:1841-1845. Wheeler AH. Lewis M.89:69-74. Bago J. Med Clin (Barc). Van Aerschot L.1080/09638280400020615 for cervical spine pathology: a narrative review. 2002.  ($ Andrade Ortega JA.0000257595. http://dx.1097/01. 2000.1186/1471-2474-8-6 Knekt P. Northwick Park neck pain questionnaire. In: Law M. Nederlands Tijd. '$ Ackelman BH. Ijzerman MJ. Niere KR. 2001.30:259-262. The reliability and construct validity of the Neck Halaki M.005 Musculoskelet Disord. 25 and 34 se. (-$ MacDermid JC.29:E47-51. Disabil org/10. of 4 neck pain and disability questionnaires. Refshauge KM. 2007. T '*$ Foster GA. Sand T. MacDermid JC. Bu.doi. Spine. 2002. Bouter LM. 2006. on outcome measures to hand therapy practice. Rehabil. 2006.32:696-702.126 org/10. for psychometric articles. [Validation of a Spanish version pain and disability scale: test-retest reliability and naires to predict long-term health problems of the Neck Disability Index]. O’Leary EF. All rights reserved.1:5-22. Eur Spine J. Iranian versions of the Neck Disability Index and validity of neck disability index in patients '.134:1356-1367. Gomez E. J Manipulative al.1097/ Physiotherapy Singapore.jmpt.$ MacDermid (+$ MacDermid JC. bil Med. Psycho. Simsek ties of the neck disability index. 2007. J Reha.112:94-99. In: Law tinen et al. Schrader H. Minimal clinically important change of math. Spine. Carey TS. Standard scales for measure-  -$ Chan Ci En M.doi. Lindgren U. For personal use only. BMC Musculoskelet Disord. 2008.1016/j.005 (($ Kovacs FM. interpretation and 24 moderate disability. Spine.21:75-80.0b013e31815ce6dd BRS. 5 and 14 mild disability. BMC org/10. Stewart metric properties of the Neck Disability Index ()$ Kumbhare DA. Spine.4:113-134. Papageorgiou AC.2004. Aromaa A. The reliability of the Vernon and Mior neck of the Neck Disability Index and the Neck disability index. Validity ('$ Kose G. )+$ Pool JJ.jht. et al.D9. Critical appraisal of study quality disability. those with moderate to severe disability org/10. org/10. Clair DA. DeVellis RF. M. Spine.93:317-325. Evidence-Based Rehabili- Index and Neck Pain and Disability Scale. Disorders. org/10.35035. mechanical neck pain. 2004. Guillemin F. 2007. Adams RD. Grevitt MP. Subjective outcome assessments apmr. Applying evidence that individuals who have recovered have Duquet W.a5 points. San Antonio.$ Riddle DL.52 2008. Hermens HJ. demands considered. Palmer JA. Spine.102:273-281. Spine. Lindgren KA. and ankle instability: a systematic review. )($ Nederhand MJ. http://dx. 2002. NJ: Slack Inc. Disability in subacute measures in patients with neck pain: detect.27:515-522.03. The cultural adaptation. (. 2008.32:E825-831. with neck pain: a Turkish version study. (&$ Humphreys H. Schipholt HJA. Predictive value of fear avoid- MB.16 quality for psychometric articles. 1998.3:16-19. Clin J Spine.” has been suggested by Miet. Delgado Martinez AD. chbinder R. of a modified version of the neck disability in- Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®.brs.1097/BRS. Goldsmith CH. determinants. Measurement of cervical flexor whiplash. Richardson JK. which represents “none to mild ''$ Cook C. 2007. ability index. WJ. Zilvold G. Richardson general population. Silcocks P. Spine. Gretz SS. and the Neck Pain and Disability Scale.34:284-287. Ostelo RW.2007. DC.doi.$ Bovim G. discussion 2418. Ferraz schrift Voor Fysiotherpie. (.18:245-250. Critical appraisal of study no disability. Oder dx. 2008. http://dx. Mior S.doi. 2008. scale in patients with cervical radiculopa. Yagly N. of instruments to measure neck pain disability. http://dx.0b013e31815cf75b  /$ Cleland JA. The neck The possibility to use simple validated question- mecija Ruiz R. Childs JD. and Phys Med Rehabil. )-$ Resnick DN.1097/ version (NDI-DV): investigation of reliability in BRS.1016/ SAS Global Forum 2008. land. Validity of the neck disability index. disorders. Translation and validation study of the IE.doi. ance in developing chronic neck pain disability: adaptation of self-report measures. Oostendorp RAB. and greater than 35 com. http://dx. Spine. factors for neck pain: a longitudinal study in the Based Rehabilitation. TX: 2008. Braga L. Arch Phys Med Rehabil.22 org/10. '($ Croft PR. Asman S. ).31:1621-1627.25:3186-3191. Validity and reliability patients with chronic uncomplicated neck pain. Arch 2000. de Man Ther. Cross. Balsor B. NJ: Slack Inc.0000221989. Parnianpour M. Miro J.I '+$ Gay RE. Whitman JM. The reliability and application metric characteristics of the Spanish version Rating Scale for patients with neck pain. Neck pain in the comes in neck pain patients. 2005.H.1197/j. that a score between 0 and 4 represents 2006. et al. Disability Scale for measuring disability associ. endurance following whiplash.doi. Maher CG. Thorofare.31:598-602.<. Cieslak KR. Spine.0b013e31817eb836  . after whiplash injury.$ Rebbeck TJ. http://dx. Am J Epidemiol. Hoving JL.$ Makela M. disability have a score of 10 to 28. Green S.brs.19:1307-1309. Impivaara O.$ Goolkasian P. 2014. Turk Journal of Orthopaedic & Sports Physical Therapy®  *$ Beaton DE. Evidence- vere disability.009 )&$ Miettinen T.1007/s00586-007-0503-y dex. a NDI score of 8 or less. 2007. http://dx. Heliovaara M. the Neck Disability Index and the Numerical  . Airaksinen O. Bombardier C. Hepguler S. Evaluation of the core outcome measure in and Numeric Pain Rating Scale in patients with et al. Whitman JM.9:42.0000227268. Psycho.  '-$ Hains F. have a score greater than 30. A ment of functional outcome for cervical pain of the Neck Disability Index and Neck Pain and comparison of four disability scales for or dysfunction: a systematic review. J Manipulative Physiol at University of Washington on December 1. Spine. Edmondston SJ. Bae SS. (*$ Lee H. eds. Fritz JM. Pain. of the Neck Disability Index in physiotherapy.

Bogduk study of reliability and validity. Rasch analysis provides +'$ White P. Arthritis Rheum. Kakavelakis KN.doi. Physical and new insights into the measurement properties for neck pain: validation of a new outcome mea- psychological factors maintain long-term of the neck disability index. Eur Spine J. All rights reserved. Albert TJ.12598. Jull G. The abil- *'$ Sterling M. ability measures for chronic whiplash. The Neck Disability Index: state-of. the-art. pain.1016/ Koes BW. http://dx. Lionis CD.doi. 4. predictive capacity post-whiplash injury. Assessment of self-rated dis.27:331-338.doi. Evaluate the RFC experience and receive a personalized certificate of continuing education credits.jbspin. 2. 3rd. Char. Hogg-Johnston S. The chometric properties of the Cervical Spine Greek version in a sample of neck pain patients. Choose an article to study and when ready.122:102-108.0000105535.004 org/10. Poiraudeau S. For personal use only. No other uses without permission. Westaway MD. and the Journal website maintains a history of the quizzes you have taken and the credits and certificates you have been awarded in the “My CEUs” section of your “My JOSPT” account.AE *-$ Vernon H. Translation 0119-7 )/$ Skolasky RL.04. http://dx.spinee. Physiother at University of Washington on December 1. http:// to standard assessment tools used in spine dx. impairment and sincerity of effort in ity scales for neck pain.1002/ Downloaded from www. [ LITERATURE REVIEW ] patients with cervical spine disorders.006 dx. Catanzariti *($ Stewart M. 2004. Nicholas M.2003. Spine.2006.005 Hurwitz E. 416 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy .jospt. 2002. Maher CG. http:// orders. You may review all of your answers—including the questions you missed. The RFC program offers you 2 opportunities to pass the quiz.ORG EARN CEUs With JOSPT’s Read for Credit Program Journal of Orthopaedic & Sports Physical Therapy® JOSPT’s Read for Credit (RFC) program invites Journal readers to study and analyze selected JOSPT articles and successfully complete online quizzes about them for continuing education credit. Spadoni 2000. 2008. ing individual patients. http://dx. Spine.1016/j. art.7:174-179. Paganas AN.78:951-963. translation and validation of 3 functional disabil- 2007. 2008.2006.$ Vernon H. Catanzariti 2006. Go to Pain. ability. French N.9:106. http://dx. Beaton D.71:317-326.8:155-167.$ Vernon HT. Poiraudeau S. Using the Neck */$ Vos CJ. http://dx. *.doi. G. WWW. Refshauge KM. J Muscskel Pain. Tennant A. Kenardy J.1097/ patient-specific functional scale: validation of its EkjYec[iGk[ij_eddW_h[WdZ_jih[bWj_edi^_f BMC Musculoskelet Disord.doi.61:544-551. 1736. J Manipulative Physiol Ther. Revel M. Sports Phys Ther. Padfield B.6d whiplash-associated disorder. Prescott P. in general practice.31:491-502. Reliability and @ Disability Index to make decisions concern. 1991-2008.?D<EHC7J?ED Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. click “Take Exam” for that article. Jull Binkley JM. 2014. Spine. Psy. J Manipulative JF. 2004. org/10. Lewith G. 2004. Verhagen AP. Take the quiz.1016/j.1186/1471-2474-9-106 use in persons with neck dysfunction. 2006. Spine J.24399 1998. Riley LH. +($ Wlodyka-Demaille S. Phys **$ Trouli Vicenzino B. Arch Phys Med Rehabil. JF.doi.JOSPT. sure. of the Neck Disability Index and validation of the +&$ Westaway MD. Revel M.1007/s00586-006- Ther. 1991.doi. 2007. Stratford PW. http://dx. and click the link in the “Read for Credit” box in the right-hand column of the home page.08. Antono. J Orthop research. 3. Riddle DL. You receive 0.014 *. poulou MD. Fermanian J. Fermanian J.14:409-415.2 CEUs for each quiz passed. 2009. Mior S. *)$ Stratford PW.doi. Login and pay for the quiz by credit card. Rannou F. Responsiveness of pain and dis.71056.BRS.32:580-585. brs.29:1923-1930.01. The Neck Disability Index: a +)$ Wlodyka-Demaille S.51:107-112. The core outcomes *&$ Sterling M. *+$ van der Velde G. Vernon HT. Kenardy J. To participate in the program: 1.83:376-382. ity to change of three questionnaires for neck acterization of acute whiplash-associated dis. 1998.0000256380. responsiveness of the Dutch version of the Neck CEH.2008. http://dx. Binkley JM. Physiol Ther.doi.1097/01. Rannou F.1016/j. Joint Bone Spine. Disability Index in patients with acute neck pain 1999.

Q The pain is very mild at the moment. ie. but no more. Q I can lift heavy weights. Q The pain is very severe at the moment. Q I have no neck pain at the moment.jospt. Dr. Q The pain is fairly severe at the moment. but no more. I wash with difficulty and stay in bed. Q I can hardly do recreational activities due to neck at University of Washington on December 1. 2014. Q My sleep is greatly disturbed for up to 3-5 hours. but it causes extra neck pain. Q I need help every day in most aspects of self -care. journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | 417 . Q I have a great deal of difficulty concentrating. Q My sleep is mildly disturbed for up to 1-2 hours. Q I have some neck pain with all recreational activities. Section 3: Lifting Q I can’t drive my car at all because of neck pain. Q I have a lot of difficulty concentrating. Section 4: Work Section 10: Recreation Q I can do as much work as I want. Q I can lift only very light weights. Q I can look after myself normally without causing extra neck pain. No other uses without permission. please contact Q I have moderate headaches that come frequently. and I am slow and careful Q I can drive my car without neck pain. Downloaded from www. Q I have moderate headaches that come infrequently. Section 6: Concentration fects your ability to manage everyday-life activities. Section 5: Headaches Patient Name _______________________________________ Date _____________ Q I have no headaches at all. Journal of Orthopaedic & Sports Physical Therapy® they are conveniently positioned Q I can’t read as much as I want because of moderate neck pain. one box that applies to you. Q My sleep is slightly disturbed for less than 1 hour. Q My sleep is moderately disturbed for up to 2-3 hours. Q I can drive my car with only slight neck pain. Q I can lift heavy weights without causing extra neck pain. but it gives me extra neck pain. Q I can look after myself normally. Q I can read as much as I want with slight neck pain. Q I can hardly drive at all because of severe neck pain. APPENDIX A NECK DISABILITY INDEX This questionnaire is designed to help us better understand how your neck pain af. Q I can’t read at all. Q I can only do my usual work. Q Neck pain prevents me from lifting heavy weights. Q I have neck pain with most recreational activities. Section 1: Pain Intensity Q I can’t concentrate at Q I have severe headaches that come frequently. Please mark the box that most closely describes your Q I have a fair degree of difficulty concentrating. Score __________ [50] Q I have slight headaches that come infrequently. Section 7: Sleeping Q The pain is moderate at the moment. 1991. although you may consider that two of the statements in Q I can concentrate fully with slight difficulty. Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. Q I can’t do any recreational activities due to neck pain. Copyright: Vernon H & Hagino C. are conveniently positioned. Section 9: Reading Q Neck pain prevents me from lifting heavy weights off the floor but I can manage if items Q I can read as much as I want with no neck pain. Q I can’t do any work at all. Section 2: Personal Care Q My sleep is completely disturbed for up to 5-7 hours. Section 8: Driving Q It is painful to look after myself. Q I do not get dressed. Q I can hardly do any work at all. All rights reserved. For permission to use the NDI. Howard Vernon at hvernon@cmcc. but I can manage light weights if Q I can read as much as I want with moderate neck pain. Q I have some neck pain with a few recreational activities. Q I cannot lift or carry anything at all. Q I have no neck pain during all recreational activities. Q I can drive as long as I want with moderate neck pain. present-day situation. Q I can’t drive as long as I want because of moderate neck pain. Please mark in each section the Q I can concentrate fully without difficulty. Q I have no trouble sleeping. any one section relate to you. Q I need some help but manage most of my personal care. Q I can’t do my usual work. Q I have headaches almost all the time. For personal use only. on a table. Q The pain is the worst imaginable at the moment. Q I can’t read as much as I want because of severe neck pain. Q I can do most of my usual work.

jospt. Data Extracted Population studied Population Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. To make data extraction as useful as possible and to avoid the need for repeated data extractions. Intervention Reliability Reliability (relative) Journal of Orthopaedic & Sports Physical Therapy® Reliability (absolute) Minimum detectable change (MDC) Content/structural validity Internal consistency Content validity Floor/ceiling effects Factorial validity Item response/Rasch analyses C1 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . [ LITERATURE REVIEW ] APPENDIX B DATA EXTRACTION FORM FOR STUDIES EVALUATING THE PSYCHOMETRICS OF OUTCOME MEASURES Authors: _______________________ Year: ____________ Rater: ___________ Downloaded from www. All rights reserved. 2014. read the accompanying guide and be as specific as possible when extracting information. No other uses without permission. Instructions When using the data extraction at University of Washington on December 1. it is important to realize that the purpose of data extraction is to remove or extract the specific information reported by authors within a study. not to evaluate the validity or value of that piece of information. For personal use only.

jospt. No other uses without permission. Construct/criterion validity Known groups Convergent Downloaded from www. 2014. Divergent Longitudinal validity Concurrent criterion Predictive criterion Responsiveness/clinical change Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. Responsiveness Minimally clinical important difference (MCID) Usefulness/practicality Journal of Orthopaedic & Sports Physical Therapy® Readability Interpretability Time to administer Administration burden Cultural applicability © JC MacDermid 2008 journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | C2 . For personal use only. All rights at University of Washington on December 1.

Evaluation of the quality of articles (also called critical appraisal) is performed in a separate step. No other uses without permission. it is important to realize that the purpose of data extraction is to remove or extract the specific information reported by authors within a study. The accompanying extraction form can then be used to collect specific information about these psychometric or utility properties from particular studies. demographics. not to evaluate the validity or value of that information. Some consider that sufficient time between tests is within 48 hours for acute conditions. frequency. and the follow-up interval Reliability Reliability description The extent to which the same results are obtained Test procedures or measures are typically reapplied on repeated administrations of the same measure when on repeated occasions in individuals considered to no change in status has occurred (reliability) or how have a stable condition during that timeframe in precise the scores are on repeated measurements which repeated testing occurs. OR by the same individual compare to the variability between rater (intrarater). Studies will not address every aspect. you may be able to conduct a meta-analysis of some properties such as minimal detectable change (MDC). Methods That Might Be Used to Collect This Property Information/Relevant Statistics Population studied Population A description of the study population Report meaningful demographics and indicators of the population studied: sample size. how and where subjects were chosen. Relative reliability evaluates the extent to may be performed on different occasions (test- which variations on repeated assessment of the same retest) for self-report measures. server-based scale is used. There is no clear or easy method to synthesize this information. Reliability (relative) The extent to which variability in test scores on repeated ICCs and their associated confidence intervals tests of the same person are related to variability overall (including between people) C3 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . you may elect to present the synthesized data in a descriptive way by creating a summary table in each category. and 4–14 days for chronic conditions. Repeated testing (agreement). Report the type of reliability evaluated and coefficients obtained. In some cases. SEM is used to pres- ent a quantitative estimate of the reliability in the original units of measure. Listed here are an explanation of each property Downloaded from www. 2014. the following guide has been devised to describe the separate properties that might be evaluated in a psychometric study. The most common statistic used is the intraclass correlation coefficient (ICC) for quantitative data and Kappa for nominal data. To make data extraction as useful as possible and to avoid the need for repeated data extractions. To this end. [ LITERATURE REVIEW ] DATA EXTRACTION GUIDE FOR STUDIES EVALUATING THE PSYCHOMETRICS OF OUTCOME MEASURES Instructions Psychometric studies may evaluate a wide spectrum of potential psychometric properties and aspects that relate to the utility or usability of outcome measures. Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. it is useful to collect information within standardized domains so that this information might be more easily synthesized to provide a summary of our collective knowledge about psychometrics and the utility of any given outcome measure. clinically important difference (CID). it is best to be as specific as possible when extracting information. For personal use at University of Washington on December 1. In summarizing what is known about the psychometrics and utility of outcome measures. acute versus chron- ic. different test instruments (inter-instrument) are evaluated. Intervention Interventions (if applicable) applied for longitudinal Description of the nature. setting. Based on the findings of extraction. or different raters (interrater) if a individuals. When using the data extraction form. If you find some studies with similar designs. pathology/ disorder. or standard error of measurement (SEM). and intensity Journal of Orthopaedic & Sports Physical Therapy® studies of the intervention.jospt. All rights reserved. and how it might be measured within a study.

SEM (possibly expressed as coefficient of varia- no change in status has occurred (reliability) or how tion in older articles) precise the scores are on repeated measurements (agreement). During translation. Cognitive interviews were done with patients to determine the meaning of items. Report the number of factors derived and the extent to which these findings agree with the instrument structure or original factor structure. journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | C4 . listed. the completeness of the interest. so extract how these attributes were determined as well as the percentage. Extract what was done and what was found. Factorial validity The extent to which factor analysis supports assump. Expert panels or Delphi procedures were used Journal of Orthopaedic & Sports Physical Therapy® to select items or evaluate the validity of the instrument. it is important to consider the population the necessary content to capture the concept of to which the measure applies. No other uses without permission.jospt. 5. Note that different studies may define floor/ceiling differently. and limits of 2 standard deviations are shown) Downloaded from www. content. extent to which items on a given measure reflect tent validity. Content/structural validity Internal consistency The extent to which items on a test or subscale are Cronbach’s alpha is the inter-item correlation usu- related (an indication of the consistency of the concept ally reported. Reliability (absolute) The extent to which the same results are obtained on This may be reported as repeated administrations of the same measure when 1. Report alpha and whether it relates to measured) the entire instrument or specific subscales. Patients and experts were involved during item selection/reduction. Altman and Bland graphical technique where the ability in individual units portray the actual amount of difference on repeated tests for each individual is variation expected between repeated applications— plotted versus their mean score (and the overall reported in the original units of measure. MDC Based on the reliability and defined level of confidence. 4. 1. Content validity The extent to which the domain of interest is adequately A variety of techniques can be used to assess the sampled by the items in the measure. Some of the techniques typically found are Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. Floor-ceiling effects When the measure fails to demonstrate a worse score in Descriptive statistics of the distribution of scores patients who clinically deteriorated and/or an improved were presented graphically or numerically and score in patients who clinically improved indicate at University of Washington on December 1.Factor analysis is conducted and compared to the tions surrounding constructs measured as defined by inherent structure of the instrument or factor analy- the measure or as indicated by subscale structure sis on which its construction is based. Specific quantitative estimates of reli. specific study of the meaning of the questions to another cultural or language group was studied. For personal use only. and emphasis. Patients were consulted for reading and comprehension. Report the percentage of patients who sustained a floor or ceiling effect. Extract the number and level of confidence this measure reflects the amount required to change before achieving a level of confidence that exceeds the random error that occurs in stable patients. 2014. 3. In assessing con. 2. the content’s relevance. 2. All rights reserved.

All rights reserved. Concurrent criterion Comparing a scale and its criterion at a single point in Extract the test names and correlations time assesses concurrent validity. Predictive validity is evaluated by determining the extent to which the results of administering an outcome measure at one point in time is associated with a future status or outcome. consider grouping into strong (>0. [ LITERATURE REVIEW ] Item response/ Rasch The extent to which items cross a range of difficulty. In other cases. their mean scores. Extract the test names and correlations. Results are validity where it is assumed they measure different evaluated to determine whether they are acceptable constructs. In thought to represent similar constructs or divergent each case. particularly if one is considered more accepted or rigorous. individual ability curves. the order of item placement in a test and/or the discriminative ability of specific items are defined. in accordance with the hypotheses. Constructed hypotheses “known groups” OR in relation to other indices that can assess convergent validity where measures are are similar (convergent) or different (divergent). it is sometimes argued that there is no criterion and.jospt. and level of tions that they should be different. Construct validity can be cross-sectional measured OR the expected differences between or longitudinal (predictive).70) correlations Divergent Relationship to scales/tests assumed unrelated Extract test names and correlations Longitudinal validity Relationship between change scores among similar Extract test names and correlations Journal of Orthopaedic & Sports Physical Therapy® scales/tests measured on different time points where change is expected Criterion validity description Criterion validation is determined by comparing a given Authors will state that their measure is being com- outcome measure to an accepted standard of measure. when summa- rizing. No other uses without permission. 2014. therefore. Report whether these analyses were conducted and if they determined that items crossed a range of difficulty. Construct/criterion validity Construct validity Constructs are artificial frameworks that are not The extent to which scores relate to other mea- description directly observable. or Items can be placed along a spectrum using item analyses a spectrum of the concept measured response theory or Rasch analysis.40–0. Known groups Known groups are constructed on theoretical assump. For personal use only. correlation or agreement between the measures. Concurrent validity is assessed by comparing a scale and its criterion at a single point in time. and the difference significance is assessed. Predictive criterion Predictive validity is evaluated by determining the Extract the test names and correlations and time extent to which the results of administering an outcome interval measure at 1 point in time is associated with a future status or outcome. differential item functioning (DIF)—report if done and whether items demonstrated DIF. the term comparability is used when scales measure the same subjective construct.Extract the groups. Convergent Relationship between similar scales/tests Extract test names and correlations.70) and moderate (0. hypotheses are formulated. Most commonly. pared against a specific instrument and report the For subjective constructs such as pain and at University of Washington on December 1. validation focuses on construct validity. C5 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. and comparison of ability estimation. Rasch analysis can be used to evaluate Downloaded from www. Analyses might address item difficulty. Construct validity is the extent to sures in a manner consistent with theoretically which measures perform according to a priori defined derived hypothesis concerning the domains that are constructs.

Comparative data for relevant subgroups (acute versus chronic. If this is not specifically reported in the text of the study. Hypotheses are formulated and results are in agreement. in patients’ management. and CID are 3 different terms for How clinically meaningful is defined may vary (global the same concept. Responsiveness/clinical change Responsiveness The ability to detect important change over time in the Extract indicators of responsiveness. standard response mean. patients perceive to be beneficial mandates a change minimally clinical important difference (MCID). For example. Information is provided about or CID and the method/cut-off used to define impor- what difference in score would be clinically meaningful. For personal use only.jospt. etc) 2. journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | C6 . Subjects are evaluated at size. rating of change with different methods of rating or establishing cutoffs for importance are often used). MCID. However. Authors State method and results obtained provide specific information on readability as evaluated by the target population or specific tools are applied to evaluate readability (eg. during a systematic review. Usefulness/practicality Readability The questionnaire is understandable by all patients. stable. This must be done in a systematic way and the method should be documented. tance. grade level can be assigned in some software Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. normal/abnormal Journal of Orthopaedic & Sports Physical Therapy® Time to administer Authors provide specific information on time to administer. improved (or have improved an important amount) is or worse. Authors analyze subgroup differences provide information on the interpretation of scores: 1. CID The difference in scores in the domain of interests that Extract the minimally important difference (MID). compared across different measures (and/or to patients who have not improved). MID. compare scoring algorithms and administrative burden and report this. No other uses without at University of Washington on December 1. Extract time Administration burden Ease of method used to calculate the questionnaire’s score. packages) Interpretability State the type of categorization and relevant scores The degree to which one can assign qualitative meaning to and. if appropriate. it cannot be extracted. Validation of subjective categories of outcome: excellent/ good/fair. All rights reserved. the method used to validate/ quantitative scores or define subgroups by scores. 2014. and the method for 2 points in time and change in patients who had assessing whether patients were improved. including effect concept being measured. multiple raters should evaluate the relative com- plexity of these scoring algorithms. Downloaded from www. Extract study data on scoring method burden Authors provide specific information on the ease of scoring. to extract the specific scoring algorithm for an instrument and compare it to others.

In cross-cultural adaptation. including a structured translation process that includes forward and backward translation and retesting Downloaded from www. Highlight the results of cross-cultural adaptation as well as any languages/cultural adaptations that have been reported. For personal use only. No other uses without permission.jospt. however. an annotation should be made to note that the psychometric property applies to an adapted version. 2014. Cross- cultural adaptation may be used to translate measures into different languages and to convert any items having specific meaning to culturally relevant items. they can be documented in the appropriate data collection boxes above. reliability of the translated at University of Washington on December 1. Journal of Orthopaedic & Sports Physical Therapy® C7 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . [ LITERATURE REVIEW ] Cultural applicability The extent to which an instrument contains items that will Extract language/cultural translation performed be meaningful across different cultural groups or subgroups (and relevant findings) for which the measure is relevant or may be applied. All rights reserved. a number of analyses may be performed. and a formal evalua- tion of the meaning of items in different cultures. If specific reliability and validity values are reported. © JC MacDermid 2008 Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®.

Was appropriate retention/follow-up obtained? (Studies involving retesting or follow-up only) Journal of Orthopaedic & Sports Physical Therapy® Measurements 7. 2014. Were appropriate statistical tests conducted to obtain point estimates of the psychometric property? 11. For personal use only. Documentation: Were specific descriptions provided or referenced that explain the measures and their correct application/interpretation (to a standard that would allow replication)? 8. Were analyses conducted for each specific hypothesis or purpose? 10. Standardized methods: Were administration and application of measurement techniques within the study standardized. Was an appropriate sample size used? 6. Was an appropriate scope of psychometric properties considered? 5. No other uses without permission. benchmark comparisons. All rights reserved. Study question 2 1 0 1. Were specific psychometric hypotheses identified? 4. Were appropriate inclusion/exclusion criteria defined? Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. minimally important difference [MID])? journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | C8 . standard error of measurement [SEM].Were appropriate ancillary analyses done to describe properties beyond the point estimates (confidence intervals [CI]. and did they consider potential sources of error/misinterpretation? Analyses 9. 3. Was the relevant background research cited to define what is currently known about the psychometric properties of the measures under study and the need or potential contributions of the current research question? Study design 2. CRITICAL APPRAISAL OF STUDY DESIGN FOR PSYCHOMETRIC ARTICLES: EVALUATION FORM Authors: _______________________ Year: ____________ Rater: ___________ Evaluation Criteria Score Downloaded from at University of Washington on December 1.

© JC MacDermid 2008 Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. then sum items. For personal use only.jospt. if for a specific paper an item is deemed at University of Washington on December 1. No other uses without permission. analysis. [ LITERATURE REVIEW ] Recommendations 12. Journal of Orthopaedic & Sports Physical Therapy® C9 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . or. 2014. and results? Subtotals (of columns 1 and 2) Downloaded from www. and multiply by 100 to get the percentage score. Were the conclusions/clinical recommendations supported by the study objectives. divide by 2 times the number of items. Total score % (sum of subtotals/24 × 100). All rights reserved.

and specific hypotheses on reli- ability and validity were not stated. 3 2 Authors identified specific hypotheses. For personal use only. For example. If there is no documentation of an action. No other uses without permission. internal consistency and 1 test/form of validity). and established predictive. Study design 2 2 Specific inclusion/exclusion criteria for the study were defined. 2014. 1 All of the above criteria were not fulfilled (little reference to previous research and present gaps in knowledge).jospt. eg. Information on the type of patients is briefly defined. Some aspects of both reliability and validity were examined concurrently. CRITICAL APPRAISAL OF STUDY DESIGN FOR PSYCHOMETRIC ARTICLES Interpretation Guide To decide which score to provide for each item on a quality checklist. journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | C10 . as well as expected relationships (strength Journal of Orthopaedic & Sports Physical Therapy® of associations) or constructs. longitudinal/concurrent. however. interrater. read the following descriptors. but not all information on participants and place is provided. but not clearly defined in terms of specific hypotheses. 1 Some. evaluative or discriminative properties 3. scope was narrow and did not meet the above criteria (eg. 2. Presented a critical and unbiased view of the current state of knowledge. and the rationale was not founded on previous literature. construct (known group differences. expert review/survey or qualitative interviews) or statistical (eg. Performed a thorough literature review. test- retest). responsiveness. and minimally important difference (MID)] 2. A detailed focus on validity that includes multiple forms of validity (content) judgmental (structured.) 4 2 An appropriate scope of psychometric properties would be indicated by: 1. which included the specific type of reliability (intra/interrater or test- retest) or validity (construct/criterion/content. All rights reserved. intraclass correlation coefficients (ICC). treat it as not done. 0 A foundation for the current research question was not clear. 1 Two or more psychometric properties were evaluated. but without additional information. yielding a study group generalizable to a clinical situation. age/sex/diagnosis and the name or type of the practice is given. indicating what is currently known about the psychometric properties of the instruments or tests under study from previous research studies. and appropriate Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. 3. Pick the descriptor that sounds most like the study being evaluated with respect to a given item. (“The purpose of this study was to investigate the reliability and validity of…” can be rated as zero if no further detail on the types of reliability and validity or the nature of specific hypotheses is provided. 0 Specific types of reliability or validity under evaluation were not clearly defined. criterion (concurrent/predictive). but a clear rationale was provided for the research question. 0 The scope of psychometric properties was narrow as indicated by evaluation of only 1 form of reliability or validity. convergent/divergent) being tested. 4. demographic information was at University of Washington on December 1. A priori hypothesis definitions are provided of level of reliability and for validity. Indicated how the current research question evolves from a gap in the current knowledge base. the practice setting was described. 0 No information on the type of clinical settings or study participants is provided. factor analyses). convergent/divergent associations). Question Score Descriptors Study question 1 2 The authors: Downloaded from www. standard error of measurement (SEM). Established a research question based on the above. 1. but is insufficient to allow the reader to generalize the study to a specific population. as well as both relative and absolute reliability [eg. 1 Types of reliability and validity being tested were stated. A detailed focus on reliability that includes multiple forms of reliability (at least 2 of intrarater.

For impairment measures. Downloaded from www. C11 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . including test administration and scoring. 0 Less than 70% of the patients eligible for study were reevaluated. then the text describes the procedures in sufficient detail so they can be replicated. or a limited description of procedures is included in the text. were performed in a standardized way. or by demarcat- ing which analyses addressed specific psychometric properties. too many missing items] or individuals [language. this includes calibration of any equipment. If no manual is referenced. but did not present specific sample size calculations or post-hoc power analyses. For simple reliability/validity statistics if sample is greater than 100 but without justification for sample size. 6 2 Ninety percent or more of the patients enrolled for study were reevaluated. This may be accomplished through organization of the results under specific subheadings. any special equipment required. 0 Minimal description of procedures is provided and without appropriate references. and test procedures. training required.jospt. 8 2 All of the measurements. Data were presented for each hypothesis/research question. use of consistent measurement tools and scoring. and interpretation of test procedures that includes any necessary information about positioning/active participation of the subject. Post-hoc power analyses and/or confidence intervals (CI) confirm that the sample size was sufficient to define relatively precise estimates of reliability or validity. but authors did not clearly link analyses to hypotheses. this is characterized by a statement of who administered the forms and by what process (definition of rules for exclusion of forms [eg. No other uses without permission. Measurements 7 2 The authors provided or referenced a published manual/article that outlines specific procedures for administration. use of standardized instructions. Journal of Orthopaedic & Sports Physical Therapy® 1 No obvious sources of bias in how tests were performed/administered. scoring (including how scoring algorithms handle missing data).org at University of Washington on December 1. 0 No description of the extent to which the above standards were maintained or an obvious source of bias in data collection methods. 1 Data were presented for each hypothesis. 0 Data were not presented for each hypothesis or psychometric property outlined in the purposes or methods. 1 Procedures are referenced without any details. [ LITERATURE REVIEW ] 5 2 Authors performed a sample size calculation and obtained their recruitment targets. a priori exclusion of any participants likely to give invalid results or unable to complete testing (not exclusion after enrollment). 1 The authors provided a rationale for the number of subjects included in the study. Analyses 9 2 Authors clearly defined which specific analyses were conducted for each of the stated specific hypotheses of the study. All rights reserved. but minimal attention or description of the extent to which the above standards were maintained. and examiner procedures/actions. For personal use only. etc]). calibration of equipment if necessary. comprehension. 1 More than 70% of the patients eligible for study were reevaluated. Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. 2014. 0 Size of the sample was not rationalized or is clearly underpowered. cost. For self-report.

Clinical relevance—minimal detectable change (MDC). Validity associations—Pearson correlations for normally distributed data. SEM Correlation matrices for validity analysis may not require that each individual correlation be presented with its Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®.org at University of Washington on December 1. Reliability (Relative. For personal use only. 1 Authors made conclusions and clinical recommendations that were general. but not both. All rights reserved.jospt. Journal of Orthopaedic & Sports Physical Therapy® 0 Authors made vague conclusions without any clinical recommendations. at least 2 of the following were presented: 1. 10 2 Appropriate statistical tests were conducted: 1. Responsiveness—standardized response means or effect sizes or other recognized responsiveness indices were used 1 Appropriate statistical tests were used in some instances. Recommendations 12 2 Authors made specific conclusions and clinical recommendations that were clearly related to the specific hypotheses stated at the beginning of the study and supported by the data presented. 2014. Comparison to appropriate benchmarks or standards 3. SEM or plot of score differences versus average score showing mean and 2 standard deviation (SD) limit—Altman and Bland) 2. data. however. MID 3. with post-hoc tests that adjusted for multiple testing 4. or other correlations if appropriate b. ICCs for quantitative data. Kappa for nominal data). 1 Either CIs or appropriate benchmarks were used. but basically supported by the study data. Appropriate confidence intervals (CI) 2. Validity tests of significant difference—an appropriate global test such as analysis of variance was used where indicated. Spearman rank correlations for ordinal Downloaded from www. associated CI. but suboptimal choices were made in other analyses. 0 Benchmarks or CIs were either used inappropriately or not included. © JC MacDermid 2008 journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | C12 . and conclusions OR recommendations contradicted the actual data presented. Validity a. CIs and benchmarks should be used according to standards for this type of analysis. No other uses without permission. absolute . 0 Inappropriate use of statistical tests 11 2 For key indicators such as reliability coefficients indices. OR authors made conclusions and clinical recommendations for only some of the study hypotheses.