Measurement Properties of the Neck
Disability Index: A Systematic Review
Downloaded from www.jospt.org at University of Washington on December 1, 2014. For personal use only. No other uses without permission.

t is estimated that a third of all adults will experience neck pain However, the episodic, fluctuating nature

I during the course of 1 year,12 and 70% is the approximate lifetime
prevalence.6,28 About 19% of the population may suffer from chronic
neck pain at any given time,6 creating a substantial societal burden.
Monitoring outcomes is a key component of monitoring the effects of
evidence-based care, in justifying services, and in program evaluation.
of neck pain and the lack of clear and con-
sistent physiological findings complicate
this process. A fundamental component
of monitoring outcomes is having reliable
and valid tools with known measurement
properties.27 In the case of neck disorders,
self-reported pain and disability are usu-
TIJK:O:;I?=D0 Systematic review of clinical group differences or absolute reliability (standard ally the primary focus.
measurement. error of measurement [SEM] or minimum detect- Isolated studies on outcomes mea-
Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. All rights reserved.

TE8@;9J?L;0 To find and synthesize evidence
able change [MDC]). Most studies suggest that the sures provide a context and method-
NDI has acceptable reliability, although intraclass specific view of the measure’s ability to
on the psychometric properties and usefulness of
correlation coefficients (ICCs) range from 0.50 to
the neck disability index (NDI). provide useful clinical information. It is
0.98. Longer test intervals and the definition of sta-
T879A=HEKD:0 The NDI is the most commonly ble can influence reliability estimates. A number
only by synthesizing information from
used outcome measure for neck pain, and a of high-quality published (Korean, Dutch, Spanish, multiple studies that we can understand
synthesis of knowledge should provide a deeper French, Brazilian Portuguese) and commercially how a measurement tool performs across
understanding of its use and limitations. supported translations are available. The NDI is different contexts and applications. The
TC;J>E:I7D:C;7IKH;I0 Using a standard considered a 1-dimensional measure that can be synthesis of larger pools of data is a mech-
search strategy (1966 to September 2008) and interpreted as an interval scale. Some studies
anism to provide more stable estimates of
question these assumptions. The MDC is around
Journal of Orthopaedic & Sports Physical Therapy®

4 databases (Medline, CINAHL, Embase, and
5/50 for uncomplicated neck pain and up to 10/50 measurement errors and benchmarks for
PsychInfo), a structured search was conducted
and supplemented by web and hand searching. In for cervical radiculopathy. The reported clinically change/outcomes. In fact, 2 systematic
total, 37 published primary studies, 3 reviews, and important difference (CID) is inconsistent across reviews have been conducted that ad-
1 in-press paper were analyzed. Pairs of raters con- different studies ranging from 5/50 to 19/50. The dress self-report measures for neck pain.
ducted data extraction and critical appraisal using NDI is strongly correlated (0.70) to a number
Both compared different outcome mea-
structured tools. Ranking of quality and descriptive of similar indices and moderately related to both
physical and mental aspects of general health. sures using a semistructured process and
synthesis were performed.
concluded that the Neck Disability Index
TH;IKBJI0 Horizon estimation suggested the T9ED9BKI?ED0 The NDI has sufficient support
and usefulness to retain its current status as the (NDI) was the most commonly used self-
potential for 1 missed paper. The agreement
most commonly used self-report measure for report instrument for evaluating status
between raters on quality assessments was high
(kappa = 0.82). Half of the studies reached a qual- neck pain. More studies of CID in different clinical in neck pain clinical research.34,37 Both
ity level greater than 70%. Failures to report clear populations and the relationship to subjective/ reviews alluded to the psychometric and
work/function categories are required. J Orthop
psychometric objectives/hypotheses or to rational- clinical properties of the tools, but neither
ize the sample size were the most common design Sports Phys Ther 2009;39(5):400-417. doi:10.2519/
jospt.2009.2930 attempted to deal with specific properties
flaws. Studies often focused on less clinically
or to synthesize this knowledge. The de-
applicable properties, like construct validity or TA;OMEH:I0 cervical spine, outcome measure,
group reliability, than transferable data, like known reliability, validity veloper of the NDI published a summary
paper in 2008 summarizing a 17-year his-

Co-director, Hand and Upper Limb Centre Clinical Research Laboratory, St Joseph’s Health Centre, London, ON, Canada; Associate Professor, School of Rehabilitation Science,
McMaster University, Hamilton, ON, Canada; Funded by a New Investigator Award, Canadian Institutes of Health Research. 2 Lecturer and PhD Candidate, School of Physical
Therapy, The University of Western Ontario, London, ON, Canada; Funded by a Doctoral Fellowship, Canadian Institutes of Health Research. 3 Physiotherapy Student, School of
Physical Therapy, The University of Western Ontario, London, ON, Canada. 4 Emeritus Professor, Department of Clinical Epidemiology and Biostatistics, McMaster University,
Hamilton, ON, Canada. Address correspondence to Dr Joy MacDermid, Co-director, Hand and Upper Limb Centre Clinical Research Lab, Monsignor Roney Ambulatory Care
Centre, 930 Richmond Street, London, ON, N6A 3J4 Canada. E-mail: jmacderm@uwo.ca

400 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy

tory with the NDI.46 The author stated
that “The Neck Disability Index has been Search Strategy
used in over 300 publications, translated Key words: (NDI OR Neck Disability Index) AND (reliability OR validity OR
responsiveness OR clinically important difference OR MCID OR Rasch OR
into 22 languages…and endorsed for use factor analysis OR translation OR validation).
by a number of clinical practice guideline Dates: January 1966 to September 2008
committees…making it the most widely Other: Google searches and hand searches of retrieved study references lists
used and most strongly validated instru-
ment for assessing suffering disability in
patients with neck pain.” Located citations (n = 189):
Downloaded from www.jospt.org at University of Washington on December 1, 2014. For personal use only. No other uses without permission.

While previous reviews have claimed D mbase (n = 66)
D "edline (n = 63)
to be systematic, none have included a
D INAHL (n = 66)
key element of systematic review, criti- D %sycInfo (n = 1)
cal appraisal of the quality of individual D and searching (n = 2)
studies. This may reflect inherent dif- D ontact author (n = 1)
ficulties in performing the critical ap-
praisal due to lack of instrumentation.
(3=5/ +,<=;+-=;/?3/@7 
The lead author of this current review
has published a scale and interpretation
guide25,26 for this purpose. Psychometric Unique articles accepted for full A-5>./.+=+,<=;+-=;/?3/@
studies are important to establish mea- review (n =  ): D #8.+=+8;78=;/5/?+7= unique)
Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. All rights reserved.

surement properties like the relative dif- D mbase (n = 30) D tudy design criteria (n = 0)
ficulty of items, appropriate grouping of D "edline (n = 28)
D #!7

items into subscales, reliability, validity, D and searching (n = 2)
and responsiveness. In addition to psy- Horizon estimation of database
D $ther (n = 1) </+;-2;/=;3/?+5
chometric properties, clinicians are con-
cerned with issues on feasibility, floor/
ceiling effects, availability of different Included: A-5>./.+0=/;0>55=/A=;/?3/@7 = 2):
language/cultural adaptations, and ad- D tructured reviews (data No data on NDI (n = 2)
ministration burden for themselves and extraction) (n = 3) !+71>+1/

B188.” or. “usefulness” have been used in reference to these practical considerations. 88. as we prefer.+3<+5736) kept for data extraction plicability.@3=27153<2 D %rimary studies Journal of Orthopaedic & Sports Physical Therapy® +.” or “clinical utility/ap.  7 . Terms like “clinician/pa- D ata extraction (n = 38) excluded from critical appraisal tient friendliness. D >55=/A=-. When thera- pists try to integrate the NDI into their Quality summary of appraised primary papers (n = 36): %88.  7   formation on usefulness.<=.  7   clinical practice. they are concerned with +3.3=3-+5+99.+-=7877153<2=/<= their patients. The purpose of */.  7  the psychometric issues but also need in.

D:?N7). The numeric response for ernon and Mior47 developed and work. NDI contained 5 items from the OLBPI. resulting in the 10-item scale. A-/55/7=( 7  view that would summarize the quality and content of current research regard. personal care. from the original scale: pain intensity. and 5 new C. ing the psychometric properties and use- fulness of the NDI.The systematic review evidence flowchart. the unanswered (7FF. journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | 401 . lifting. items. They initially selected 6 items jury and the review team.47 Some evaluators choose to provide a score out of 100%. In the end. this study was to conduct a systematic re. particularly as a strategy to deal with questions left metric. <?=KH. sleep. and 2 of which were modified. each item is summed for a score varying L the NDI using the Oswestry Low Back Pain Index (OLBPI) as a tem- plate for identifying items and a scoring These 10 items were modified for clarity and relevance based on feedback from 5 individuals with a history of whiplash in- from 0 to 50. and submitted this to a consult. reading.47 The questions are measured on ing team. concentration. The consulting team added 4 a 6-point scale from 0 (no disability) to 5 :[l[befc[dje\j^[D:? items: headache. driving.J>E:I sex life. (full disability).

). raters were not comfortable with this ties across all identified studies is sum- tion review.jospt. mation to evaluate the potential that any when independent assessments differed: In total. although between January 1966 and September the first author. veloped by adaptation of categories used rank ordered studies on quality and con- ing Medline. their disagreement. an adjudicator was required. addressed at least 1 psychometric proper- our search using a procedure developed ment was based on factual content ty of the NDI (J78B. A descriptive synthesis ments and a predetermined consensus signing the lesser of the 2 scores.'). An article was accepted if it met thor/instrument developer (J. No other uses without permission. 4.I) through 5. disagree on an item by 1 point."J78B.D:?N 8" 7L7?B. Raters then discussed their percep. The data extraction form was de.82. Pairs of raters consulted the first au. An optimal study references lists. esis/objectives. stract data extraction but not full critical manuscript complied with item re. Journal of Orthopaedic & Sports Physical Therapy® the following inclusion criteria: reported clarification of the meaning or inter. if the raters continued to size calculations/justification.EDB?D. although this was not required during properties evaluated. Index) AND (reliability OR validity OR vidual study quality were developed by H. CINAHL.. Most studies addressed Following this.) for studies crossed different populations. [ LITERATURE REVIEW ] B_j[hWjkh[I[WhY^WdZIjkZo extraction forms and quality appraisal ies are cross-sectional. and on at least 1 psychometric property of the pretation of specific items of the scale addressed different psychometric prop- NDI in patients with neck pain and was if they felt this was contributing to erties. the rat. studies provided specific conclusions or 402 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . If the of the findings for psychometric proper- process. 41 studies were identified that Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. Quality of the individual studies written in French or English (2 with Eng. For personal use only. and (3) The data extraction and review pro.34. who also developed an there was no formal mechanism to weight 2008 (<?=KH. lish abstracts and non-English full text [1 3. of individual studies. 0-2. The following keywords accompanying guide (7FF. pairs of raters indepen.25.37 Similarly.25. The most common flaws observed in appraisal). absence of error estimates such as confi- cess was conducted by pairs of raters and ers decided if they were comfortable dence intervals or standard error of mea- was based on the use of structured instru.00.26 All authors met for a calibra.14 most common source of disagree.43 to 1.46 The primary authors.IKBJI responsiveness OR clinically important the lead author. and time intervals. (2) inadequate sample review (<?=KH. were performed.13 with additions from clusions and recommendations. we conducted Google searches and hand searches of retrieved associated ratings for individual studies. Each paper’s score was converted into a spectrum of psychometric proper- dently evaluated an assigned subset of a percentage because 1 item was based on ties.(. with the default compromise of as. We conducted a Horizon esti. We A database search was performed us. Few articles using previously developed data follow-up and some psychometric stud. Interrater reliability on quality ratings was calculated based on preconsensus J a possibility that our search strategy missed 1 article (95% confidence interval. original manuscript. quirements (if something was not the psychometric articles were (1) not 3 structured reviews. surement (SEM). At this point.). PsychInfo. and then Medline. denominators for different studies. with agreement on indi. In total. er did not include formal critical appraisal pendently reviewed by at least 2 of the 2.'). tions of the extent to which the study a score of 75% on the quality rating (J78B. PsychInfo did not contribute any infor- the <?=KH. The number of scores of rater pairs. 78B. abstract/title and full review are noted in vidual items varying from 0. In addition. Due to the reviewed 1 paper then met and discussed 2 points. the overall estimated search strategy would have been Embase studies retrieved and the net results of kappa was 0. We used the following consensus process mation after these 3 databases. the study. quality tool are listed in J78B. which included papers written Euchaute et al. and in a previous psychometric review by sidered this ranking when making con- Embase. no meta-analyses terpretation of critical appraisal items. leaving unequal ?Z[dj_ÓYWj_ed tools. (). conclusions. in which they independently resolution or continued to disagree by marized in J78B. 97% yield). raters clarified if the disagree. interventions. based on the quality of the were used to search all databases for eli. Two neck disabil- by Foster and Goldsmith for SAS soft. appraisal.26 The items from the difference OR MCID OR Rasch OR fac. with the he Horizon analysis14 indicated tor analysis OR translation OR valida- tion).org at University of Washington on December 1. and 1 unpublished documented in the manuscript it was reporting specific psychometric hypoth- in press manuscript proceeded to a full treated as “not done”). 2014. 1 Spanish] were included for ab. then CINAHL. ware (full details available from author item. All rights reserved. but few were comprehensive. 37 primary studies. with 37% of papers reaching or exceeding Dutch. a recent compre- The first stage of study identification ment) were resolved by recheck of the hensive review of the NDI by the develop- was of title/abstracts. First. Factual content/oversights (the although neither included formal critical Charlie Goldsmith). gible studies: (NDI OR Neck Disability interpretation guide used to assess indi- Downloaded from www. or the extent of compliance with the ity instrument reviews were identified. ranging from 21% to 96%. Critical appraisal forms and associated source document. was variable. which were inde. first. articles were missed and the efficiency of 1. heterogeneity of study populations and each item to clarify the meaning and in.M.

group 2. Site: PT clinics involved in a study of botulin toxin Included: received PT that they considered inadequate for NP Excluded: surgical candidates Retest interval: within 1 wk Hains et al 199817 Patients: 62% F. responsiveness Intervention: PT 14 wk Retest interval: baseline evaluation and Site: PT Clinics after 5-7 visits Cleland et al 20089 Patients: stable. Intervention: ND Site: midsize hospital in southern Brazil validity. rheumatoid arthritis. neck trauma Chok et al 20008 Patients: NP 46 Reliability. varying severity and duration of WAD (level of WAD: 23% I.jospt. 38 Reliability.org at University of Washington on December 1. mean age. responsiveness Retest interval: 10 subjects retested randomly at 24 h and a separate 10 subjects retested at 7 d Gay et al 200715 Patients: 7 M. at 3 w and 3 mo musculoskeletal symptoms Andrade et al 20082 Patients: nonspecific or posttraumatic NP 48 Spanish translation. mean age. reliability. 39 y. NP with or without referral of symptoms to the upper extremity or extremities.lWbkWj[Z ?dj[hl[dj_ed%H[j[ij?dj[hlWb 1 Ackelman et al 2002 Patients: 59 Swedish patients (28 M. symptom duration. validity. and community rheumatology clinic into private PT practice journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | 403 . construct validity. 16 F. 42% II. For personal use only. 2 or more Journal of Orthopaedic & Sports Physical Therapy® signs consistent with nerve root compression. 22 tested twice for reliability and 24 discharge that completed baseline and discharge assessments Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. All rights reserved. Unknown validity Hoving et al 200319 Patients: mean age. fied version) Retest interval: 5 occasions. chronic NP 33 Responsiveness Intervention: group 1. Reliability. Site: Spain responsiveness Aslan et al 20083 Patients: NP 88 Reliability.' Summary of Studies Addressing Psychometrics of the NDI IjkZo FefkbWj_ed n Fhef[hj_[i. plus heat. 43 y. mean age. home consistency. mean age. ankylosing spondylitis. NP. retest 19 had chronic NP. 43 y) 203 Brazilian Portuguese translation. construct validity Intervention: PT Included: patients with at least 3 mo of NP Retest interval: 7 d Bolton 20045 Patients: nonspecific NP 125 Responsiveness Intervention: chiropractic Site: 8 chiropractic clinics and 1 teaching clinic Retest interval: pre-Rx and 4-6 wk postinitial Rx Chan et al 20087 Included: patients with chronic nontraumatic NP with a 20 Construct validity Intervention: ND duration of greater than 3 mo Retest interval: Cross-sectional Excluded: cervical radiculopathy. 18-89 y (mean. mean age. twenty 59 (38 for modi. central nervous system involvement. responsiveness Retest interval: after 1 session symptom duration. NDI: internal approaches. 25% F. 43 y. prior surgery to the cervical or thoracic spine. Intervention: ND Downloaded from www. 42 y. 40 y. No other uses without permission. and 6% III). 31 F). many occupations. validity. 71 Validity Intervention: none all patients were less than 24 mo since MVA. 21 F. responsiveness exercise program Retest interval: 4 wk Goolkasian et al 200216 Patients: 12 M. 51 y. 2014. reliability. validity Intervention: PT Site: an outpatient setting Retest interval: admission and Included: 2 cohorts. chronic 23 Primary purpose to evaluate Intervention: 1 of 3 manual therapy uncomplicated NP another scale. Intervention: manual therapy duration. NDI score greater than 10%. PT. 83 d. n = 71 Site: patients recruited from PT clinics. and 20 had no NP but had other within 48 hours. responsiveness Intervention: PT patients were in the acute phase after a neck sprain. symptom 138 Reliability (SEM and ICC). 59% of patients were M. or pending legal action regarding their NP Cook et al 200611 Patients: 62% M. J78B. and chronic NP Retest interval: unknown Site: chiropractic college clinics in Vancouver and Toronto Included: English speaking. improved. 60% F. 46 d (SRM and MDC) Included: age 18-60 y. subacute. aged. 50 y. GPs. WAD within the past 6 wk. Cleland et al 200610 Patients: 47% F. mean age. Excluded: any signs or symptoms consistent with a nonmusculoskeletal etiology. 237 Validity Intervention: unknown acute. 17 y or older Heijmans et al 200218 Patients: chronic WAD 61 Dutch translation.

reliability. 66%. 2014. 18-55 y.jospt. 26%. NR. mean age. nonspecific NP. 54% F. neck trauma. 2 or more missing items on NDI Pietrobon et al 200234 Structured review of different neck disability scales NA Validity NA Pool et al 200735 Patients: 61% F. validity Intervention: various Site: spinal clinic Retest interval: 1-2 wk Miettinen et al 200430 Patients: NP who had been involved in an MVA 144 Validity.' Summary of Studies Addressing Psychometrics of the NDI (continued) IjkZo FefkbWj_ed n Fhef[hj_[i. NDI. spine injury. 6 wk of NP 102 Reliability.lWbkWj[Z ?dj[hl[dj_ed%H[j[ij?dj[hlWb 20 Humphreys et al 2002 Patients: NA 125 Reliability. NDI score. responsiveness Intervention: ND diagnosis of NP Retest interval: first and last visit or Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. 31 F. No other uses without permission. Site: patients were chosen from 3 hospitals and 7 7th or 8th Rx clinics in 5 Korean cities McCarthy et al 200729 Patients: NP 160 (34 retest) Reliability. aged 18-65 y Excluded: fracture dislocation. and chronic who visited their 221 (54 from pilot. second 160 controls) Retest interval: follow-up varied group subjects from a convenience sample who did between 4-12 w not have NP and who were not being treated Site: tertiary outpatient clinic Lee et al 200624 Patients: 17 M. systemic disease. relative Intervention: ND 78%. 146 (69 for retest) Validity. 68%. with subacute pain of less 150 Factor validity. comor. bidity. manual therapy. validity. employed: 80%. 183 CID Intervention: random allocation to PT. include a content analysis at item level Riddle et al 199838 Patients: 64% F. validity. central nervous system Kumbhare et al 200523 Patients: the first group comprised of 81 patients from 241 (81 patients. GP care 14. pain.org at University of Washington on December 1. baseline VAS. For personal use only. adequate mental capacity within 4 wk of surgery decompression and fusion Sites: 23 clinics Excluded: revisions/tumors 404 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . subacute. 26%. age. neurological injury. gradual onset 185 Iranian translation. prior history of chronic pain. responsiveness Intervention: chiropractic Site: 8 chiropractic clinics and 1 teaching clinic Retest interval: postinitial Rx. 48%. [ LITERATURE REVIEW ] J78B. surgery. F: 80%. convergent validity. validity. Retest interval: none Site: primary care or PT clinics floor/ceiling. ness 67%. 88% of patients 50 years or younger. infection Kovacs et al 200822 Patients: acute. 134. Spanish translation. Reliability. 6/10 Rebbeck et al 200736 Patients: 3 cohorts primary care insurance (early and (99. 46 y. age. with a primary 180 Reliability. responsiveness Intervention: unknown variety of neck disorders. 18-69 y. 7-12 wk. 23. Intervention: ND physician for NP plus 167) validity. internal consistency. 18 Resnick 200537 An overview of instruments in a structured review by NA Validity NA a single author and evaluator. responsiveness Intervention: PT 5/wk for 3 wk Excluded: inflammatory arthritis. responsiveness Retest interval: 1-d retest and Site: 9 primary care and 12 specialty services from 9 re-evaluation at 14 d regions in Spain Excluded: functional illiteracy. most common neck strain Retest interval: admission and or NP discharge Site: 1 of 4 clinics in Richmond. 1 year 33%. 250) Convergent validity. translation Intervention: routine Rx than 3 mo following whiplash injury from 10 different practices. 55/100 reliability. All rights reserved. responsive. 4-6 d post-Rx Kose et al 200721 Patients: 83% F. 534 Convergent validity Intervention: anterior cervical thy. tion 2-6 wk. dura. Retest interval: 1-3 d without therapy Downloaded from www. ND late). responsiveness Intervention: ND Site: Finland Retest interval: 1 and 3 y post-MVA Mousavi et al 200731 Patients: NP 3 mo 30%. interpretability Journal of Orthopaedic & Sports Physical Therapy® Nieto et al 200833 Patients: Catalan-speaking. 70%. 13 wk. VA Excluded: under treatment for problems other than c-spine Scolasky et al 200739 Patients: NP cohort cervical radiculopathy or myelopa. responsiveness Intervention: PT Ontario Canada who had grade II WAD.

Virginia.org at University of Washington on December 1. nonchronic NP of musculoskel. Pain-Specific Functional Scale. F. neurological ability. no formal critical appraisal of studies Vernon et al 199147 Patients: 17 M. Site: different clinics items Vernon 200048 Combination of instrument review (NDI and other). PSFS. responsiveness Intervention: none longer than 6 wk. responsiveness Intervention: 6-wk individualized disabled preinjury and presented for medical care submaximal exercise program. translation signs. and tion of reduced 8-item scale and total NDI of 13-14 differential item functioning of Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. validity Intervention: none chronic nontraumatic Retest interval: 2 d Site: Clinic of Canadian Memorial Chiropractic College Journal of Orthopaedic & Sports Physical Therapy® Vos et al 200649 Patients: 119 F. Intervention: routine Rx Site: primary care in 3 rural centers as stable) internal consistency. SRM. 31 Reliability. Intervention: ND studies of n = 188 and the other n = 333 Rasch modeling of item. NR. ND. GP. UK. visual analog scale. 6 cases removed because of 2 missing items on NDI van der Velde et al45 Patients: 55% or more F in 2 cohorts from previous 521 Factor analysis. treatment. males. 70% WAD in past 4-6 wk. pain bother-someness. randomized controlled trial. MVA.lWbkWj[Z ?dj[hl[dj_ed%H[j[ij?dj[hlWb 42 Stewart et al 2007 Patients: 17 M. and following 1-4 w of Rx by 1 certified Canadian orthopae- dic manipulative PT White et al 200451 Patients: chronic mechanical NP 133 Validity Intervention: in treatment RCT Site: recruited from PT and rheumatology waiting lists Retest interval: before Rx initiated. contraindicated session weeks 5. baseline pain of 5-6/10. females. within 1 mo of MVA. NPRS. Numeric Pain Rat- ing Scale. VA. clinically important difference. 12-80 y. LBP. 49 y 71 Validity. 30% 48 Reliability. 68 M. low blood pressure. chronic WAD (>3 mo). For personal use only. evalua- Included: pain for 5-6 wk.jospt. 4. severe or greater depressive symptoms. validity Intervention: none 200253 Site: patients recruited from outpatient clinics. VAS. numeric rating. poor English Stratford et al 199943 Patients: NP 50 Reliability. M. Retest interval: 24 h bilitation department. NA NA NA oper) that includes psychometrics. content validity. positive upper limb tension test. Rx. 31 F. a score of at least 20% on primary supervised by PT outcome measures: NPRS. retest reli- Excluded: symptoms below the elbow. etc) Westaway et al 199850 Patients: Age. 2014. ICC. J78B. minimal detectable change. whiplash-associated disorder. intraclass correlation coefficient. journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | 405 . mean age. motor vehicle accident. All rights reserved. rheumatology department (ie. 1 red flags. rheumatology department (ie. tertiary care teaching hospital and outpatient clinic) Abbreviations: CID. not described. NP 101 Reliability. validity. to exercise. Neck Disability Index. responsiveness Intervention: PT Site: 5 PT clinics in North America Retest interval: 1-3 wk Trouli et al 200844 Patients: NP 65 (43 classified Factor validity.' Summary of Studies Addressing Psychometrics of the NDI (continued) IjkZo FefkbWj_ed n Fhef[hj_[i. systemic or other major disorder Wlodyka-Demaille et al Patients: mean age. No other uses without permission. responsiveness Intervention: routine Rx etal origin “cervical dysfunction” Retest interval: initial assessment. plus NA Content validity (comparison of NA some impairment sincerity of effort testing items) Vernon 200846 A structured review by a single author (the NDI devel. 6. responsiveness Intervention: ND 200452 Site: patients recruited from outpatient clinics. 2 follow-up phone calls no neck radiograph since MVA. reha. recurrent acute NP lasting no 187 Reliability. prediction and effect sizes across treatment RCTs that used the NDI. nerve root compromise. 2. 2 sessions week 3. “mildly” 134 Reliability. NDI. general practitioner. PT. tertiary care teaching hospital and outpatient clinic) Wlodyka-Demaille et al Patients: 43 F. neck pain. vascular or neurological disorders. NP. standardized response mean. rehab Retest interval: 11 2 mo department. 49 y. physiotherapy. RCT. current PT neck Rx. Site: referred by physician to PT 72 h. known or suspected 1. 28 M. 31 F. SEM. item consistency. at at 2 hospitals in UK 1 wk and 8 wk Excluded: undergoing litigation. validity. United Kingdom. standard error of measurement. Downloaded from www. with pain-free interval of at least 3 mo Retest interval: 1-wk follow-up Site: referred by GPs in Rotterdam and region Inclusion criteria: aged 18 and sufficient Dutch Exclusion criteria: specific cause of NP (ie. WAD. MDC. PSFS Retest interval: 3 sessions in week Excluded: previous neck surgery.

[ LITERATURE REVIEW ] numbers that could readily be integrated to rely on generalities when making con. 8. Measurement techniques were standardized. Valid conclusions and clinical recommendations. comprised of less clinically useful data.18 One paper had item comparison but no NDI data. Data were presented for each hypothesis. 7. 11.jospt. - . ing NDI validation studies was typically into clinical practice but. All rights reserved. 2 2 1 1 0 0 50 Bolton 20045 1 1 0 1 0 0 2 1 2 1 1 2 50 7 Chan et al 2008 1 2 0 0 0 NA 1 1 2 2 0 1 43 Humphreys et al 200220 1 1 1 1 0 0 0 0 2 2 0 2 42 Rebbeck 200736 0 1 1 1 2 . Specific inclusion/exclusion criteria. Thorough literature review to define the research question. The type of data collected dur. For personal use only.34. 0 0 1 1 1 0 36 Chok et al 20058 1 1 0 1 0 0 1 1 1 0 1 1 33 Miettenen et al 200430 0 2 0 0 0 0 0 0 1 0 1 1 21 Abbreviations: NA. J78B. 2. 12. not applicable to paper. Quality scores were rated in content for the NDI-related methods. Appropri- ate scope of psychometric properties. and interpretation of procedures. Riddle et al 199838 2 2 2 2 0 0 2 1 2 2 1 2 75 Lee et al24 2 1 2 2 0 0 2 2 1 2 2 2 75 42 Stewart et al 2007 1 2 2 1 0 NA 2 2 2 2 1 1 73 Pool 200735 2 2 1 0 1 1 1 1 2 2 2 2 71 Kose et al 200721 1 2 1 2 1 2 1 1 2 2 1 1 71 Kovacs et al 200822 1 2 1 2 1 1 1 2 1 2 2 1 71 Hains et al 199817 2 1 2 1 0 NA 2 1 2 1 1 2 68 Ackelman et al 20021 1 2 1 2 0 2 1 2 2 2 0 1 67 Vos et al 200649 2 2 1 1 0 0 2 2 1 2 2 1 66 Journal of Orthopaedic & Sports Physical Therapy® Vernon et al 199147 2 1 1 2 0 2 1 1 2 2 0 2 66 52 Wlododycka et al 2 2 0 0 0 1 1 1 2 2 2 2 63 White et al51 1 2 1 1 0 1 1 1 2 2 1 2 63 Goolkasian et al 200216 1 1 2 1 0 1 1 1 2 1 2 2 63 Gay et al 200715 2 2 1 2 0 1 1 0 1 2 2 1 63 Nieto et al 200833 2 2 1 1 1 NA 1 1 2 2 0 1 61 Pietrobon et al 200234 2 0 1 2 1 0 2 NA 2 1 0 2 59 Mousavi et al 200731 2 1 1 2 1 0 1 0 1 2 1 1 59 Scolasky 200739 1 2 0 0 2 . 3. 10.46 Two papers had data extraction from the English abstract and study tables but no critical appraisal.2.lWbkWj_ed9h_j[h_W IjkZo† ' ( ) * + . scoring. Follow-up. * Evaluation criteria: 1. 5. Specific hypotheses. rather. Sample size. because they were in non-English text. 2014. Three structured reviews underwent data extraction but no critical appraisal. 4.37. tended clusions.org at University of Washington on December 1. † Papers where NDI evaluations were performed while evaluating a different measure as the primary purpose. so no full critical appraisal. 6. Ap- propriate statistical error estimates.( Quality of Studies on the Psychometric of the Neck Disability Index ?j[c. Appropriate statistics-point estimates.46 406 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . No other uses without permission. 9. The authors referenced specific procedures for administration. / '& '' '( JejWb Downloaded from www. Cleland et a 20089 2 2 1 2 2 2 2 2 2 2 2 2 96 Van der Velde et al45 2 2 2 1 2 2 2 2 2 2 2 2 96 Kumbhare et al 200523 2 2 2 2 2 2 2 2 1 2 2 2 96 Westway et al 199850 2 2 2 2 0 2 2 1 2 2 2 2 88 Cook et al 200611 2 2 2 2 0 2 1 2 2 2 2 2 88 10 Cleland et al 2006 2 2 2 2 0 2 1 2 2 2 2 2 88 Trouli et al 2008 1 2 1 2 1 2 2 1 2 2 2 2 83 Hoving et al 200319 2 2 1 2 0 NA 2 2 2 2 1 2 82 Aslan et al 20083 2 2 1 2 0 1 2 2 2 2 1 2 79 Stratford et al 199943 2 1 1 1 1 2 2 1 2 2 2 2 79 McCarthy et al 200729 2 1 1 2 1 1 2 2 1 2 2 2 79 Wlodyka-Demaille et al 200253 1 2 1 2 1 2 1 1 2 2 1 2 75 Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®.

the NDI had the strength of being most studied amongst these 3 and was at that time the only instrument revalidated in The results of a subsequent systematic vidual studies. GRC. 0.9024 databases. Abbreviations: CI.92 (chronic) to 0. 3 to –3/7).5 in patients with neck pain35 tectable change [MDC]). 0.96 Journal of Orthopaedic & Sports Physical Therapy® formal quality evaluation was performed.983 ditional papers (up to year 2000). 1 d.7350 when compared to other scales. 2014. 19. Eight English-language of this review suggested other scales had for the cervical spine published up un. the Copenhagen Neck Func.509 the NDI. It was suggested that ficient. reviewed in a qualitative manner. 0. impossible. in healthy controls. like known group differ- ences that could be used as comparative H[b_WX_b_jo :WjW. Mean retest difference As per Altman and Bland technique.81-0. 0.99 (chronic) to 0. the content of instruments was compared parisons between patients are virtually formed. 5 with 90% CI in patients with neck pain48 IkccWhoe\Fh[l_eki MDC.7 with 90% CI43 IjhkYjkh[ZH[l_[mi Patients classified as stable over 1 wk (GRC. No other uses without permission.9211 multiple raters or formal critical ap. or a formal process to syn- different study populations. 2 d.22 3 of the scales were similar in terms of 0.97 (n = 30/185)31 structure and psychometric properties: Patients classified as stable on GRC. 10.7844 Three papers were identified where the MDC. k = 0. SEM. like correlations indicating construct convergent validity.90 (acute)11 tation tracking using the citation index. The first review34 is impor. Downloaded from www. r = 0. whiplash- associated disorder.90 in first cohort8 tional Disability Scale. 7-14 d. SEM. WAD. Again.) Summary of Reliability Properties information. Similarly.jospt. These included til 2004 suggested that instrument de. while databases chometric properties of each scale were considered “very sensitive to functional were used to identify papers. r = 0. MDC. The instruments (and year must be read to the patient and the increased number of neck pain scales (a published) are listed in J78B. r = 0. 1-3 d. 4. 0. confidence interval.941 properties. MDC Patients classified as stable (GRC 3 to –3/7).951 in terms of content and measurement 90 d. ICC = 0. cific validated for the neck) scales were that the Neck Pain and Disability Scale velopment is ongoing as it identified an reviewed.982. 1. These authors concluded that 0. SEM. All rights reserved.2 in patients with cervical radiculopathy10 MDC.. and the North. but without use of coefficients 1 d. 0. such as intraclass cor. quality ratings for indi.org at University of Washington on December 1. 1.6. MDC. The psy- Patient-Specific Functional Scale was total of 11 scales). This re. was reported SEM Patients classified as stable. r = 0. intraclass correlation coef- wick Park Scale. tween retests and the 2 standard deviation brackets around that difference. 0. in patients with 6 wk of neck pain 0. 7 d. 0. (specific-to-neck) scales and 3 (nonspe- important limitations. ci. 3 to –3/7).66 in WAD49 authors performed a comprehensive structured review with use of a formal Test-retest reliability 1 d. –1. 10. standard error of the measurement. minimal detectable change.9329 Five standardized neck pain scales were 7-21 d.48 (chronic) to 0. global rating of change.5  344 relation coefficients (ICCs).9353 praisal.93 correspondence with experts to find ad. 0. The authors review of studies on outcome measures thesize results. For personal use only. 0.9049 hand searching of relevant journals. ICC. where reliability is assessed as mean difference be- group reliability. MEDLINE and CINAHL.93 (acute)11 search for articles.. MDC.9443 identified and compared qualitatively 21 d. 0. patients in acute condition. using GRC scores of 3 to –3/7. while changes in individual patients. This item analysis indicated that the most journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | 407 . tant because it established the relative 2 d. 7 d. including use of multiple raters/ more quantitatively in a summary table. but com. No 7-14 d. or minimal de- MDC. r = 0.89 (acute)1 Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. versus more useful J78B. –0. other ele.njhWYj[Z data for clinical comparisons.49 absolute measurement error (like SEMs.6444 more often than more useful indicators of Patients classified as stable over 1 wk (GRC.94-0.971 strengths and weaknesses of the NDI 3 d. ments of systematic review were not per. MDC.9 mean retest differences. 7 d. 8. and 7 d.”34 data extractors.8621 view was based on a formal search of 2 3-7 d.

47 Factor validity Support for 1 factor š '\WYjehYedÓhc[Z. mobility. A scree plot confirmed a 2-factor solution as optimal. The highest mean severity scores were found Downloaded from www. personal care. It was determined that 75% of patients identify problems with sleep. gardening. appropriate. and general exercise.$+e\fWj_[dji^WZ_d_j_WbiYeh[im_j^_d'C:9Z_ijWdY[\hecj^[X[ijfeii_Xb[Wdim[h&h[l[Wb_d]deY[_b_d][÷[YjiWYYehZ_d]jeW'+Yh_j[h_ed$44 š '&%+&Wia[Z\ehYbWh_ÓYWj_edZkh_d]IfWd_i^WZWfjWj_ed$22 š CeijYeccedboc_ii_d]_j[ciedXWi[b_d[Wii[iic[djWdZ'mabWj[h_dYbkZ[Zb_\j_d]''"h[WZ_d]/"Zh_l_d]*+"h[Yh[Wj_ed'44 Internal consistency š 9edi_ij[djbo^_]^9hedXWY^Wbf^W0&$-&#&$/.0)53 š <WYjeh'"\kdYj_edWdZZ_iWX_b_jo_j[ci0(". [ LITERATURE REVIEW ] J78B. cooking. in depression. and recreation.1c_bZ"(-1ceZ[hWj["*&1i[l[h["**WdZl[hoi[l[h["-&$ š FW_d0defW_d"((1c_bZ")&1ceZ[hWj["). and lifting.jospt.29. ceiling/floor the highest prevalence.70): š I<#). sleep disturbance. 6 were included on the NDI. recreation48 tion. known Detected differences between group differences š :_iWX_b_job[l[bi0deZ_iWX_b_jo"'.43. The correlation between these factors was 0. convergent Correlations with other scales reported 0. 2014.'22 š 7cekdje\Y^Wd][0'&$+"Yecfb[j[boh[Yel[h[Z1ckY^_cfhel[Z". driving. and symptoms. lifting. All rights reserved. Individual problems ranked most important by subjects were driving. and sleeping. driving. driving.*lWh_WdY[edÓhij\WYjeh11. concentration.$11.1_cfehjWdjY^Wd]["/35 š H[Yel[h[Z".* Summary of Validity Properties LWb_Z_jo :WjW.38. š 7kj^ehiki[ZWfWj_[dj_dj[hl_[mje[b_Y_jfheXb[ciYWbb[Zj^[fheXb[c[b_Y_jWj_edj[Y^d_gk[je_Z[dj_\o\kdYj_edWbfheXb[ci_dfWj_[djim_j^Y^hed_YdedjhWk# ness.24.19.623 Construct.#.21L7IfW_d"9IG"DFG22 š L7IXeZ_bofW_d31 Correlation with other scales reported as moderate (0. missing matic neck pain.C9I">7:I"DF7:"L7IfW_d"Y^hed_Y1."'&53 š <WYjeh("d[YafW_d_j[ci0'"*"+"/53 š <WYjeh'"fW_dWdZ_dj[h\[h[dY[m_j^Ye]d_j_l[\kdYj_ed_d]33 š <WYjeh("\kdYj_edWbZ_iWX_b_jo š Ki_d]WdeXb_gk[hejWj_edj[Y^d_gk["'\WYjehÇfW_dWdZ_dj[h\[h[dY[e\Ye]d_j_l[\kdYj_ed_d]"È[nfbW_d[Z*(e\j^[lWh_WdY[WdZYedjW_d[Zj^[\ebbem_d]_j[ci0 pain intensity. concentration. headaches. emotion. role activity.70 Journal of Orthopaedic & Sports Physical Therapy® š FI<I"50DFG"DF:I":H?"L7IWYj_l_jo#Y^hed_Y"L7IfW_dWdZWYj_l_jo"WYkj[$1.17 Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. No other uses without permission.53 VAS pain. analyses were completed to deter- mine differences between the 1.50.org at University of Washington on December 1.1i[l[h[". Of the 11 problems identified by most subjects. For personal use only.njhWYj[Z Content (including š 9ecfWh_iede\_j[ciWYheiiD:?"DFG"WdZj^[9ef[d^W][dD[Ya<kdYj_edWb:_iWX_b_jo?dZ[n1_j[ciedWbb)0ib[[f_d]WdZh[WZ_d]1_j[cied(iYWb[i0fW_d" analyses of ques. Sleep disturbance had items. lifting. housework. More than half of subjects identified difficulties with frustration.7 š D:?mWideji[di_j_l[jej^[_j[ciYedY[djhWj_ed'$*"'$'ehf[hiedWbYWh[&$-"&$. and frustration. the latter being significantly better. The second factor on functional disability included the items personal care.17. š '\WYjeh2 Support for 2 factors š <WYjehWdWboi_i0(\WYjehi[njhWYj[Z[_][dlWbk[i"1.53 š L7IZ_iWX_b_jo"F9II"C9II"DFGXWi[b_d["9IGXWi[b_d[ š L7IZ_iWX_b_jo"f^oi_YWbhWj_d]"ckiYb[ifWic"d[YaHEC"d[Yai[di_j_l_jo21 š =[d[hWb^[Wbj^"l_jWb_jo"ieY_Wb\kdYj_ed"f^oi_YWb\kdYj_ed31 š 9EI0fW_d_dd[Ya"Whci"f^oi_YWbiocfjeci"\kdYj_ed$FioY^ebe]_YWbZ_ijh[ii"^[Wbj^YWh[ki["39 a Patient-Specific Problem Elicitation Technique7. reading.19.WdZl[hoi[l[h[". Less common but mentioned by more than 30% of effects) patients were looking into cupboards. work.and 2-factor solutions. working overhead. Because of previous reports of a 1-factor solution.1ib_]^jbo_cfhel[Z")$-1deY^Wd]["&$(. r for VAS test and retest groups of 0.e\_j[cic_ii_d]"Zh_l_d]ceije\j[db[\jc_ii_d]Zk[jedejWffb_YWXb[$21 š ?d?hWd"*(Z_ZdejWdim[hZh_l_d]"/h[WZ_d]$31 š Beea[Z\ehY[_b_d][÷[YjiXkjdejZ[j[Yj[Z$31 š .15.F9I"I<#).30-0.$19 š D:?Z_Zdej_dYbkZ[j^[[njh[c[ie\l[ho[Wioehl[hoZ_øYkbj_j[ci"cWdom[h[i_c_bWh_dZ_øYkbjoWdZj^[h[\eh[fej[dj_Wbboh[ZkdZWdj$24 š IeY_WbYedi[gk[dY[iWh[Wd_cfehjWdjfWhje\ikX`[YjÊii_jkWj_edj^Wj_idejc[dj_ed[Z_dD:?$1 š LWb_ZWj[Z\ehckbj_fb[eh_]_die\d[YafW_d"\kdYj_ed"WdZYb_d_YWbi_]diWdZiocfjeci$34 š .51 and 0. headaches.16. Construct. and sitting upright.66. work.

a summary of ments. SF-36 MCS. acupuncture. SF-36 PCS. a list of translations available cervical pillow. and tables listing mean change but did not include independent critical review of the NDI. Pain Coping Strategy Scale. 10-28. mobilization. Cervical Spine Outcomes Questionnaire. scales were self-care/activities of daily liv. VAS. This publication provided the Most recently. medication. Neck Pain and Disability Score. including manipulation. Disability Rating Index. range of motion. MCSS. ROM. HADS. the developer of the results from prognostic studies that use most extensive review of the NDI to date NDI published a comprehensive 17-year the NDI. Hospital Anxiety and Depression Scale. standing/running. and relaxation intensity. Short-Form 36 physical component scale. scores in effect sizes with different treat. ing (ADL).46 Abbreviations: CSQ. Multi- country Survey Study. laser.8. common content items on neck disability ed a historical perspective. exercise. NPAD. Neck Pain and Disability Scale. visual analogue scale. PSFS. work. PCSS. mild disability.32. Patient-specific Functional Scale. pain ent studies. 408 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . Northwick Park Neck Pain Questionnaire. NPDS. DRI. Short-Form 36 mental component scale. and driving. appraisal. NPQ. psychometric properties from 22 differ. moderate/severe disability. 3041 š 7j(*ma"h[Yel[h[Z"'*1f[hi_ij[djfW_d"(.46 This review includ. a summary of therapy.37 from the MAPI website.

the methodology ies are random differences in samples. Sterling et al40 used data from Readability/Language and Cultural Germany). 16% of patients had items required for validity. There was only as having persistent pain had a score of translating and back translating. as they may have attained a the items. se. amount of time is required. had a score of 14 or less.” journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | 409 . chronic populations. Danish. Finnish. (). to a ing interpretation of NDI scores: 0 to 4. although some Lindgren. Reliability There is considerable evi- lesser extent. near maximum score. For that reason. Few studies have specifi- reports. no disability. com. moderate disability. no explicit recommendations were the NDI. Wlodyka-Demaille measured the time taken to complete the those with moderate to severe disability et al53 reported that the concepts of “lei. and subsequently examples in these areas to make the while Wlodyka-Demaille et al53 reported defined a score of less than 15 as a recovery questionnaire more understandable for that it took a mean (SD) of 7.90 have been reported with an independent organization (www.46 the extreme ends of the scale. They reached consensus on dis. as well as ers identified that the driving item had of a total of 50.53 Through a series of utes (range. where 0 is over the spectrum of possible NDI scores. and greater than 35. NDI scores vary from 0 to 50. it may be difficult to deemed to be acceptable. idiomatic equivalence. a committee who reviewed the translator be measurable. However. the score may not be valid.1 if more than 2 items were left objectively distinguish further decline in patients require support to understand unscored. Stratford et Conversely. made by the developers on the handling al49 reported a floor-ceiling effect on the ability due to neck pain was of interest.31 Overall readability in the account for questions that are left unan.9 for scores that fall in the “severe disabil- a high rate of nonresponse in an Iranian provide a percentage score as a strategy to ity” interval. and nonresponse is more com. Italian.11 In the Swedish version. cultures. mon for task items like driving and. Lee et a single report of therapist burden stating 28 or greater. Possible reasons translation methodology (although the “normal limit” of the NDI between 0 for variation in reliability between stud- psychometric testing was not performed). For personal use only. or minimum number of items regarding “personal care” and “con- Journal of Orthopaedic & Sports Physical Therapy® the Spanish version.30 Again. their status.4 (6.” Some authors report that it comprehension difficulties.38 For example. No other uses without permission. no process was efficients above 0. (Australian/UK/US). mild disability. described. ?dj[hfh[jWX_b_jo%IkX]hekf_d]e\ cally addressed this issue. Nederland32 found that pa- ent meanings in French and American al43 reported that the time for patients to tients defined as recovered at 24 weeks Downloaded from www. however. 5 to 14.”47 Orig. with a severe pathology scored 48 out English and all translated versions is swered. Norwegian. for level and reliability was excellent. cess retained the sound measurement 5 minutes to analyze the (Dutch) NDI. recovered as having NDI scores of less addressed issues around readability.1. In the same study. where reliability co- The developer reported that he worked plete disability. and none have crepancies in 4 areas. Administration Burden Few authors than 4 (8%). The majority of authors report the NDI out both “low” or “normal” scores. that is.org at University of Washington on December 1. 15 dence to support appropriate retest re- Not all translations have been to 24.org) to produce additional rived and no validation of these categories large high-quality studies indicated lower translations through standardized was performed. some authors1. All rights reserved. they provided complete the NDI was about 3 minutes. IkccWhoe\Fh_cWhoIjkZ_[i French (Canadian/Switzerland/ validated. cutoff. experiential equivalence. ac- These translations included English behind setting this benchmark was not tual difference in clinical subpopulations. but these did has been suggested that if 3 or more items is difficult to observe change in scores at not appear to be related to educational are missing. if a subject population. 1-60 minutes). patients defined this population.10 In the study by Ackelman and of 50 on the NDI. including seman. that it takes 8-10 minutes to complete and cutoff of 20/50 at 3 years. invalid and for whom changes may not Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. and 20 points. and conceptual considered “no activity limitation” and 50 Riddle et al38 reported that 6% of patients equivalence. 25 to 34. Cook et al11 questionnaire could be completed in the as they represent patients for whom ensured good readability of the Brazilian/ waiting room and thus did not add any pain and disability estimates may be Portuguese version of the NDI through additional time to the patient’s visit. Vernon and Mior47 offered the follow.2 Oth. all agree that only a short as having scores of greater than 15 (30%). Miettinen30 used a recovery al24 concluded that their translation pro. clinical studies to define patients who had Translation A number of papers have Spanish (Spain/US). and this cutoff has not been study process.8) min.jospt. Stratford et al43 commented that the effects have practical clinical relevance. described for how these ratings were de. sure” and “social activities” had differ. Polish. One group of authors set reliability9. recent proqolid. reading. the subjects were not included. liability of the NDI in both acute and published in peer-reviewed literature. it centration.18 Floor-Ceiling Effect Floor and ceiling properties of the original English ver. demonstrated a ceiling effect when using Ackelman and Lindgren1 changed each inally.10 (J78B. those with mild disability as usually in the context of language and specifically address or state how they having scores of 5 to 14 (10%-28%). NDI. is considered “complete disability. In of missing items. 2014. However. whereas Hains et al17 and Vos et item of the tool to specify that only dis.34. and definitions of “stable. in a number of studies. :_÷[h[dj_WbEkjYec[i specifically analyzed how the MDC varies tic equivalence. and cultural translations. sion in the Korean version. More recently. vere disability. In contrast.

njhWYj[Z information on acuity and found test-re.48. 0.92 Long term 1.19. J78B.49 De.1-1.11.83.I IHC .85-0. tients’ perceived pain and psychological purported constructs (J78B. SRM (improved cohort): 1. 84% CI24 ability reported for individuals with an ES (total cohort): 0.98 on NDI versus 1. the Northwick Exercise 0. The highest estimate . which included stable pa. Change24 ES SRM spite a range of 2 to 10.80/0.0 studies based their definition of stable on ES –0.49.03 1.95 1. weaker correla. Responsiveness test reliability varied from an r of specificity.77 0.50.16. Only a few studies presented the SEM Change in group after 11 mo (no standardized Rx during gap) (compared   DF:?"DFG"L7IfW_d"L7I^WdZ_YWf"ceh[i_]d_ÓYWdjP value than all but or the MDC. convergent validity status.47.)).13 0.I IHC comes from Cleland et al.org at University of Washington on December 1. ES: 1.53 The Patient- Manipulation 1. making it difficult to deter. In a study of the Neck Pain and Disability Index response to botulin toxin Validity Construct convergent validity measured by Cohen d was 0.87-1. For example.38. No other uses without permission. For example.55 0. Response Burden.60/0.53 Others provided some FioY^ec[j_YFhef[hj_[i :WjW.66 ally high (0.40 and 0.53 Lower was reli.50 or 0.51.3/–4. The 410 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . Summary of Responsiveness. 0. ES (improved cohort): 0. These Mean change –4.24/–0.2 points for their patient Stable –0.15/0.67 a –3 to +3 global rating of change.04 0. and Disabil- Medication 5 ity Rating Index have the strongest cor- relation with the NDI. Deteriorated –0.95.93 0. 84% CI42 terval was 72 hours from initial adminis. Administrative rate patients with acute or chronic neck pain. mine why variations in reliability exist. and an r of 0. the more common Overall 1.24.2/12.I IHC in a different order have been exception. the Neck Mobilization 7 Acupuncture 0. 84% CI42 tration. ES or SRM ES: 0.93 Journal of Orthopaedic & Sports Physical Therapy® an MDC of 10.17 estimate for MDC is around 5/50 (10%).10 who reported Improved 0. Improved 0.92 7 Pain and Disability Score.1.50. reported when the retest was based on a 7Ykj[ ?dikhWdY[BWj[ single occasion but administered in dif- ferent languages31 or presented questions .53 It was also reported by Hains et adequate convergent validity has been detected between the Hospital Anxiety al17 that the single pain item or the total demonstrated for a variety of measures and Depression Scale and the NDI indi.34 population with cervical radiculopathy.16.19.72)9 with acute and chronic neck pain. All rights reserved. For personal use only.67 –0.25 0.1 2. 84% CI42 acute condition when the test-retest in.66 points for their Overall 0. ICCs Change in 2 cohorts36 (similar to CWOM or FRS) Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®.77 the lowest comes from Voss et al.88 at 4-6 wk5 tively. 2014. high-quality ES (total cohort): 0. Although dices.0 12.+ Cultural Adaptation.38.20. analogue scale (VAS) pain ratings.73 0.805 studies by Cleland et al9.I 9^Wd][_dD:?IYeh[i%+& pain (J78B.43.73 to CID CID: 5 points43 CID: 7 points10 0.6 9-10 Specific Functional Scale.5/4.91 study sample.30. 84% CI24 SRM: 1.04 1. Short term 0. cates that there is a link between the pa.1.40 for former and 0.91. score of the NDI both predicted visual with related constructs.1.50. The developer suggests that   DFG3 MDC be 5 points.49 who Change42 ES SRM reported an MDC of 1. many studies did not sepa. CID: 19 (sensitivity.25 –0.4 8-10 Park Neck Pain Questionnaire.11. respec.90).47.44 for patient and physician GRC16 has been established between the NDI and a variety of other questionnaires In a review of randomized controlled trials using the NDI46 that can be used for patients with neck .77.53 which is consistent with the similarity in their tions were observed with less similar in.24.5. *).10. [ LITERATURE REVIEW ] Some studies did not define these impor- tant factors.89 to 0. 84% CI42 SRM (total cohort): 0.10 suggested that Mehi[ KdY^Wd][Z ?cfhel[Z reliability may be lower than generally    FW_d%:_iWX_b_jo FW_d%:_iWX_b_jo FW_d%:_iWX_b_jo reported (ICCs of 0.68).99 in patients Downloaded from www.77 0.50 Two well-powered.17.73 0.jospt.16 tients with recurrent neck pain.

njhWYj[Z care” and “headaches” had the weakest Longitudinal validity correlation Comparisons: correlation (0.org at University of Washington on December 1. the items Administrative (continued) of “work” and “driving” had the strongest correlation (0.73-0.proqoulid. Spanish for Spain.” This 1. with input from the developer44 one-dimensional and may reflect the ex- Translations available on the MAPI website www. r = 0. Interitem corre- J78B.77). items are positively correlated with the Summary of Responsiveness. Italian. 0. Correlation of English and Turkish NDI the r for test and retest was 0.49   š HE9Ykhl["&$-.36 0. ICC = 0.17 Conversely. SF-36 physical.11 French53 brief disability scales which include ques- Agreement between translated version 69% identical answers. German for Switzerland.5. Finnish.5450 Correlation to PSFS. and items of “personal FioY^ec[j_Yfhef[hj_[i :WjW. English for the UK. Wlodyka-  m_j^ej^[hY^Wd][iYeh[i š =F.86.or 2-factor effect has been observed on other Cultural adaptation Language/cultural translation/validation Turkish.88. Response Burden. Similarly. 90%. French. Norwe- gian. 70% Nieto et al33 conducted a factor analysis With COM.45 “fulfilled in 6 min 08 s (54 s) by those with middle-high cultural level.$49 ity” and “Neck pain.jospt. For personal use only. Spanish for the United States and samples.601 found 2 factors that they termed “pain Correlation of NDI change to GRC change.733 the NDI (n = 521 patients) was available Response burden Time to complete: 9 min. sensitivity. 0. 3. No other uses without permission. Polish.WdZL7I&$+ scale may have 2 dimensions (pain and AUC.22 Iranian.33).54-0. and in 7 min 59 s (1 min 26 s) by those with low one (p .76 Correlation to Disability Rating Index. English for the United States. 0. –0.21 4 min.22 5 min (stated.5 (pooled). 0. 0.+ Cultural Adaptation. lations perform similarly well.83 change scores50 ing” and “functional disability.84). 2014. All rights reserved. based on a manuscript now measured)29 in press that was shared by the authors."&$*&"/+9?53 Demaille et al53 extracted 2 domains in   š 7K9"&$-/"/+9?53 the French version. VAS pain. ROC.com: English for Australia. total score (0. French Canadian. “Functional Disabil- Downloaded from www. 0. MID 10. Italian for Switzerland. specificity.344 and interference with cognitive function- Correlation to clinicians prediction of change.839 function) in some subgroups. not clear how/if for appraisal.   š F[Whiedr. tent to which pain and disability are sepa- Dutch. French for Switzerland. VAS in 150 patients with whiplash injury and activity.66 and A highly powered Rasch analysis of 0.21 Spanish. Portuguese. rate concepts across different pathologies German.88 tions about pain and disability treated as Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®.95.38. Danish. 0.31 Portugese-Brazilian.” suggesting that the   š 9ehh[bWj_edD:?Y^Wd][jeD8G&$. Translated into Greek.

and work also with acute and chronic conditions. personal care. effect size. model. clinically important difference. treatment. COM. ROC. The item and whether stemming from a traumatic 2 factors. Short-Form 36 physical component scale. an acute population. with neck pain. PSFS. Rx. SRM. relation between the VAS and items on vides a relatively weak indication of struc. global rating of change. Dif- demonstrated that there is a higher cor. tions are ordered and might be reason- standardized response mean. CID. core whiplash outcome measure. while others suggest it has a single underlying construct. Neck Pain Disability Index. this statistic alone pro. GRC. CWOM. that the NDI items did not contribute to from a musculoskeletal or neural source. SF-36 physical. core surement by examining fit with a Rasch outcome measure. suggesting and chronic neck conditions. Some studies different person estimates. This study determined that there was a lack of fit of same study administered 7 different ver. ES.001)”2 An advantage of this approach is that it Journal of Orthopaedic & Sports Physical Therapy® formally assesses whether an instrument Adminstrative burden 5 min to analyze18 satisfies rules for interval scale mea- Abbreviations: AUC. the content validity has also been evaluated using fac.85. and values reported for other unidimensional demonstrated disordered thresholds. Items man and Lindgren1 included individuals ally exceeded . Ackel. global perceived effect. 0. disordered response thresholds. Cronbach’s B has gener. sional scale. However. NDI. able to determine whether response op- ing characteristic. neck disability index. Hains et al17 observed that all on headaches did not fit with the trait journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | 411 . ferential item functioning was detected. GPE. NDI to the model and that 4 items had sions of the NDI. scales. Structural the same meaning across subgroups. In particular. Overall. VAS. which is consistent with on lifting. this approach is Northwick Park Neck Pain Questionnaire. NPDI. visual analog scale ably interpreted as numbers. deviation from model expectations. operating characteristic curve. Unidimensionality tests showed that dif- of the NDI has demonstrated validity in tor analysis in a small number of studies ferent subsets of items gave significantly evaluation of pain and disability in acute with inconsistent findings. and concluded that The NDI was developed as a unidimen. NPQ. Items on changing the order of the questions does sional disability instrument for patients headaches and recreation had significant not change the validity of the tool. receiver operat. whether have supported the NDI as a 1-dimen. indicating that some items did not have the NDI pertaining to pain and activity in tural validity or dimensionality. or nontraumatic injury. Pain-Specific Functional Scale.

90). [ LITERATURE REVIEW ] stable in this analysis. whereas shorter intervals may is a change of greater than 5 points. and how “sta- Neck Pain and Disability Visual Analog Scale 1999 ble” is defined within a given time win- Functional Rating Index 2001 dow. affect scores as test-retest intervals are that the NDI is not unidimensional. +). Only a few stud. In short-term test-retest intervals estimating error across shorter clinical with neck pain of neural origin. factors like coping. would contribute to measurement error in clini- detect change over time. Overall. pain or task difficulty remains consis- these requirements and provide interval. estimates that reflect different definitions tervals it is more necessary to establish of stable. Even if a patient’s severity of Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. ate self-reported changes are defined as clinically relevant indicators of measure- 412 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . by to their situation and use. have been ex- Neck Disability Index 1991 ceptionally high (0. known group validity. In longer test-retest in. self-efficacy. This suggests Dehj^m_YaFWhaD[YaFW_dGk[ij_eddW_h['//* that lower test-retest reliability across Disability Rating Index 1994 longer test intervals may reflect changes in scores due to the episodic nature of Downloaded from www. there is moderate extended. the level measurement of neck disability.23. but able evidence. The relative calibrated against multiple internal and to measurements of neck pain intensity. Only a few studies presented more cal recommendations regarding its use. re-evaluations. Some studies have used as much as Extended Aberdeen Spine Pain Scale 2001 a 5-week interval in patients defined as 8ekhd[cekj^D[YaGk[ij_eddW_h[(&&( stable to determine test-retest reliability. For example. patients with small to moder. This is with different test-retest intervals allow- J his study synthesized current research in 37 studies addressing often based on a –3 to +3 score on global ing users to select estimates most similar the psychometric properties of the rating of change instrument. importance of different psychometric external references. the absolute measurement error Recalibration of the disability experience varying from 0. be appropriate for consideration when compared to a 7-point change in patients tervals.. Therefore. that a revised 8-item NDI would satisfy to strong evidence for a spectrum of psy. chometric properties supporting use of tent over the course of several weeks. 9[hl_YWbIf_d[EkjYec[iGk[ij_eddW_h[(&&( While it is important to understand short- M^_fbWi^:_iWX_b_joGk[ij_eddW_h[(&&* term and long-term stability of clinical measurements. or social have good responsiveness when consid. early recovery.43 have stable scores in longer test-retest in. There is adequate evidence that the with longer test-retest intervals may be ference (CID) (J78B. most relevant. Journal of Orthopaedic & Sports Physical Therapy® standardized response mean (SRM). patients. with the NDI ad- Reviewed in a Previous Systematic Review 37 ministered in different languages31 or different question order. The reliability data cur- treated patients remain relatively stable rently published in the literature provide :?I9KII?ED over 1 to 3 days. definition.5. in different clinical subgroups. it is impor- the improved cohort (J78B. to 0.org at University of Washington on December 1. No other uses without permission. 2014. (Nonspecific Validated for the Neck) Scales on a single occasion. pose. when using the NDI support may contribute to alterations in ering the typical reported effect sizes or to evaluate clinical change in individual perceived disability over a 5-week period.42 This discriminative validation that tests differ.+). tant to consider that these variables do indicates that the NDI has good ability to ences between known subgroups.43. ICCs English Language (Specific-to-Neck) and 3 reported when the test-retest was based J78B.95 when measured in only disability.60 when measured in the and responsiveness should be considered in the absence of changes in pain intensi- total cohort of subjects with “mildly” dis. the NDI to differentiate different levels of timates of reliability in studies with larger orders. it should be appreciated that an increasing number of factors may captured by other items. All rights reserved.47 When neck NDI is stable over short test-retest time important to consider when designing pain is of musculoskeletal origin. Conversely. Thus.10 researchers often assume that most un.10. Thus. reliability studies ies calculated the clinically important dif. However. mediated by psychosocial influences and item version and had similar correlations eletal or neurogenic origin. cal studies. a CID intervals (0-3 days). a form of test-retest intervals.jospt. They concluded within the limits prescribed by the avail. Patient-Specific Functional Scale 1998 Copenhagen Neck Functional Disability Scale 1998 neck pain. when using ty is a plausible explanation for lower es- abling chronic whiplash-associated dis. neck pain with symptoms of musculosk. be more important. although it is less clinical research studies and reliability can be said to have occurred when there likely that patients defined as stable will studies. Conversely. NDI and was able to provide some clini. changes in Responsiveness The NDI appears to properties will vary according to pur. the patient has remained stable. The the NDI in patients with acute or chronic experience of pain and disability will be 8-item NDI was consistent with the 10. For personal use only.

comorbidity.10. ently subjective elements. This reviews only high-quality studies are research and practice. in both its original English format. Ceiling/floor effects have recommendations. respec. the Patient-Specific Functional Scale. We would also suggest that within tures that suggest that it has good clini- can increase the value of the MDC. estimates come from studies with lower struments. change to different instruments or in stand. tine evaluation of neck disability. These include its brevity and is not surprising that the larger MDC supplementing the NDI with other in. as study results score and based on the reliability coeffi. Although the first author of this review It is certainly true that no other in- ber of fully powered studies that vary on has addressed the critical appraisal of strument has undergone sufficient de- factors such as the type of intervention. none of the studies that per. we rank ordered studies by cal practice and research requires tre- that another instrument was superior to quality to allow the reader and ourselves mendous efforts in translation/cultural the NDI in terms of responsiveness. and around 5/50 (10%). suggesting that a num. For these reasons. teria for synthesis process for psychomet. interventions. this consideration.50 which is able to sample ness to detection of clinical change.25. While we tried to make the process gestion is that an 8-item version of the journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | 413 . estimates are highly variable review incorporated critical appraisal.66 for their study. among studies. All rights reserved. Despite a range of 2 to 10. 10. the NDI has study population.26 there is no clear NDI as the instrument of choice in rou- nature of the neck condition will be re. which one assumes that the MDC is usually at 5 French. tural or language subgroups. is important to have outcome measures interval and the definition of stability on thus.org at University of Washington on December 1. However. of 40 or higher and 10 or lower might as the majority of translation and valida- the more common estimate for MDC is be considered problematic for detect. so it these ranges. If full-text papers written in only English or an MDC of 1. as objective as possible. responses. We summarized the information on psy. The quired to provide more precise estimates chometric evidence. we were able to extract data from English large range of MDC values are largely tively. Although the value of 5/50 or 10% commonly used cal presentation. neck pain. Further. 2014. There was agreement across studies on the findings from high-quality studies. any suggestions of that the NDI is easy to read and under. synthesized. clinicians should consider cal utility. different contexts and purposes. strated equivalence. nature of the questionnaire. therefore. when evaluating and measurement principles suggest that ent instruments and. One limitation of our review stems future studies may focus on whether the priate for most clinical comparisons that from the lack of agreed upon quality cri.49 who reported of what constitutes a ceiling or floor. varies across the subgroups and how Journal of Orthopaedic & Sports Physical Therapy® While the NDI has been shown to be ric studies. there are inher- being expressed in unit of the original dures for valid translation and demon. then scores have a substantial impact on our results. making the scale more amenable to that can be applied across different cul- reliability estimates. Despite variability. a mechanism to place greater emphasis adaptation and knowledge translation. method to synthesize the extracted psy. No other uses without permission. We don’t expect this limitation to included stable patients with recurrent points (and up to 10 points). chometrics and usefulness by adapting the NDI itself should be balanced with as well as in subsequent translations.4 Due to the brief must be taken within the context of the cient and the sample variability. individual studies by developing a tool velopment or validation to replace the length of follow-up. tion articles were printed in English. making it difficult to synthesize re- points. A second limitation in our review is with cervical radiculopathy. MDC tional Scale. and the for this purpose. change for patients with this type of clini. there are no levels of evidence that It is an important consideration that formed head-to-head comparisons of create clear categories for study quality. Neither previous systematic these variations are reflected on the NDI responsive. number of languages and its responsive- ICCs. est was from Voss et al. the NDI has a number of fea- deviations or lower reliability coefficients scale. and the low. The reasons for this ing worsening or improvement. the fact that it has been translated into a Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. The most recent sug- general.10 who reported an MDC of for confusion. and pur- veloper of the NDI suggests that MDC is 5 minimal administrative burden. this cross-validation is well under way. like the Patient-Specific Func. it is important to patients who score at the extremes may designed to comment on whether the see how the instrument performs across benefit from supplemental scales. It variations reflect the effects of test-retest items that are of high or low difficulty. For personal use only. Across the different studies. more.jospt. both least some of the recommended proce. The highest estimate comes from scoring variations can create potential sults from individual studies into global Cleland et al. ment error. although pose.2 points for their patient population been suggested as a potential problem. introducing new instruments into clini- different neck disability scales indicated Therefore. although evidence review did not attempt to evaluate differ. although either larger standard changes are relevant at these ends of the Overall. like NDI is the most responsive neck scale. Evidence suggests that smaller abstracts in non-English text. the SEM or the MDC. although there is no consistent definition that the scope of our search retrieved Downloaded from www. unclear. The de. However. in clinical situations appears to be appro. pain/disability experience in neck pain tend to occur over 2 weeks or less. published translations used at ers. In some systematic instrument is suitable for both clinical in different clinical circumstances. In and expanding a framework used by oth. was not an outcome measure.

Studies that determine NDI bench.1.45 Currently.45 <KJKH. 2014.23. The NDI is reliable. For personal use only. valid. German for Switzerland. These sive in numerous patient populations. and points change for whiplash-associated the possibility that the addition of items targets for important milestones. to the subjective categories reported current benchmarks set for normality whiplash-associated disorders. However. ficulty of obtaining official translations. cau. guese. Defining detectable change on the NDI. published with the validation manu. a ficiently validated nor occupational 414 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . so or comparing scores as to whether scores need for this revision.:. Patients who score in the range of ei- the work on MDC and CID is sparse and self-reported disability as reflected ther 40 to 50 or 0 to 10 on the NDI inconsistent. term goals. For example. English for the UK.48. would inform our understanding of 6. well as the expected outcome. 6. When the allow clinicians to provide more accurate 1.jospt. tion in score to demonstrate treatment gaps in defining clinically useful compar. should be considered as approaching a and important difference for different ceiling/floor effect (45/5. 3. are needed. Studies that compare the proposed re. French Cana- While the NDI may be considered the prognostication/outcome evaluation.24. 5. ies and may be up to 10 for cervical concepts for patients with neck pain or marks that clinicians could use with radiculopathy. veloped using a clinimetric process and by interested users when publishing and Spanish for the United States.DJ answered questions must be divided An advantage of this suggestion is that E<J>. there are interviewing that evaluate how pa. goals should require a minimum of 5 their relative priority. original pilot testing was conducted on a official manuscripts related to the 4. it is the responsibility of the clini. French. but should include and for levels of disability are arbitrary. Qualitative studies and cognitive be set for a minimum 7-point reduc- Perhaps. The NDI should be scored out of 50 ability benchmarks have not been suf- establish valid benchmarks. Because these are not ated. Swedish. Others have return to work.49.D:7J?EDI<EH percentage score used to adjust for un- and can be considered as interval data. Importantly. mechanisms for these to be obtained ish. thus.1. clinical contexts and in longitudinal be anticipated (2 weeks). MDC.43 ative data and benchmarks. as recommended by the developer. and to communicate more suffering from neck pain associated 7. Clinicians may relate an NDI score effectively with payers. Future studies on the psychometric clinical reports to ascertain what met- a practical issue for clinicians is the dif. as well as those Scale should be considered.L. literature and provide more accurate Dutch.10.43 a statement reflecting that these dis- and specific data analyses are needed to 2.10. commercial cian to seek out these tools by contacting 2. variations visions to the 8-item NDI in different timeframe where this change could in scoring exist in the literature. This leaves open sus different levels of disability. languages: English. Downloaded from www. chronic conditions.org at University of Washington on December 1. “gold standard” among outcome measures. Norwegian. supplementation of the NDI would be important in assisting clinicians including patients with acute and with a Patient-Specific Functional in using the NDI to set short. it is not clear whether translation process. this should be re-evaluated within a Journal of Orthopaedic & Sports Physical Therapy® prove performance. and in the literature.49 tion should be exercised when presenting studies are required to determine the 5. 9ED9BKI?ED%A. Authors should consider the availabil.OFE?DJI making it difficult to detect subsequent ferent prognostic variables on these would worsening or improvement. benefit clinical practice. All rights reserved. 40 range. sites provide translations in English developers. Korean. Short-term therapy weighs pain and disability according to confidence to indicate “normal” ver. No other uses without permission. properties in different clinical groups ric was used. Spanish for Spain. most importantly.9ECC. English for the United the accessibility of translations would needed to resolve confusion in the States. tients respond to items of the NDI benefits. moves to improve and CID in different subgroups are for Australia. clinical subgroups and the impact of dif. suggested that removal of items might im. longer-term treatment goals should are out of 50 points or 100%. very small sample. and up to 10 points might enhance performance. baseline score is outside of the 10-to- prognosis and outcome evaluation. Pol- is needed. Iranian. Portuguese. The MDC is accepted as 5 points. like disorder I and II. Published cultural validation has been as these are not usually published with state hypotheses that are to be evalu. dian. and Portu- script. including the specific psycho. it is easier to change to a reduced-item Caution should be used when reading instrument than a different one. and respon. 1. Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. Studies that determine SEM. The CID is approximately 7 points.1. metric property being analyzed. Italian. Finnish. for whiplash-associated disorder III.BEFC. [ LITERATURE REVIEW ] NDI performs as well as the original NDI H. Danish.and long. cervical radiculopathy.D:? by 2 to restore the expected metric. German. Because the NDI was not de.11.53 In addition. French for Switzerland. but varies considerably across stud- the NDI captures all of the important 4. are warranted and should specifically 3. performed for the NDI in the following validation studies. Ital- this review suggests further investigation ity of their translations and provide ian for Switzerland. respectively). the with musculoskeletal dysfunction. Finally. as Dutch. French.

(.$ Chok B. Documenting out.1016/j.1097/ Physiotherapy Singapore. Clair DA. dx. scale in patients with cervical radiculopa.27:801-807. A ment of functional outcome for cervical pain of the Neck Disability Index and Neck Pain and comparison of four disability scales for or dysfunction: a systematic review. J '&$ Cleland JA.34:284-287.1097/ version (NDI-DV): investigation of reliability in BRS.doi.0000221989. MacDermid JC.$ Makela M.29:E47-51. Horizon Estimation and consequences of chronic neck pain in Fin- Using SAS Software. 1998. Eur Spine J.25:3186-3191. Schrader H.112:94-99. Papageorgiou AC. Stewart metric properties of the Neck Disability Index ()$ Kumbhare DA. For personal use only. J Manipulative al.0b013e31815cf75b  /$ Cleland JA.doi. Disabil org/10. Chiro Med. and ankle instability: a systematic review.31:1841-1845.31:598-602.1197/j.93:317-325.D9.16:2111-2117. (/$ McCarthy MJ. cultural adaptation and validation of the Brazil.  ($ Andrade Ortega JA. Disorders. Elvers JWH.brs. general population. Translation and validation study of the IE. http://dx.39:358-362. Clin J Pain. Gomez E.27:515-522. NJ: Slack Inc.02. Richardson general population.17:165-173. Downloaded from www. Airaksinen O. Madson TJ. Al- '.jht. Lewis M. Spine. Aromaa A. The reliability and construct validity of the Neck Halaki M. 2007.doi. Disability Scale for measuring disability associ. Parkinson WL. 2004.doi.85:496-501. plete disability. Ferraz schrift Voor Fysiotherpie. Yakut Y. BMC Musculoskelet Disord.0b013e31817144e1 patients with chronic whiplash. 2004.1007/s00586-007-0503-y dex.1016/j.1:5-22. 2005.org/10.1016/j. Spine.doi. Sensitivity and specificity of outcome problem elicitation technique for measuring ))$ Nieto R.1080/09638280400020615 for cervical spine pathology: a narrative review. bil Med. Yagly N.org/10. Thorofare.22 org/10.  '-$ Hains F.0000257595. disorders. Evidence-Based Rehabili- Index and Neck Pain and Disability Scale. Evaluation of the core outcome measure in and Numeric Pain Rating Scale in patients with et al.doi. Fritz JM. Nederlands Tijd. In: Law M.doi.0b013e31817eb836  . 2000. Hoving JL. Zilvold G.$ Riddle DL. de Man Ther. No other uses without permission. San Antonio.35035. mechanical neck pain.brs. et al. )($ Nederhand MJ. Spine. 2007.1097/ 2008. Miro J. of a modified version of the neck disability in- Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. 2008. have a score greater than 30. M. Evidence- vere disability. O’Leary EF. Bombardier C. 15 brs. 2004. Sievers K. Whitman JM.2004.009 )&$ Miettinen T. Spine. Wheeler AH. 5. Bu. Neck disability index Dutch 2007. Hobbs H.doi. Spine.130:85-89. DC. http://dx.4:113-134. J Manipulative Physiol Ther.3:16-19. Validity ('$ Kose G.1097/01. construct validity.doi. et  )$ Aslan E. The neck The possibility to use simple validated question- mecija Ruiz R. Arch Phys Med Rehabil. Psychometric proper.0b013e31815ce6dd BRS. and its validity compared with 8ekhd[cekj^Gk[ij_eddW_h[_dWiWcfb[e\ the short form-36 health survey questionnaire. http://dx. 2008:389-392. 2002. Bolton JE.102:273-281. '$ Ackelman BH. Spine. MacDermid JC. http://dx. The clinimetric qualities of patient. 25 and 34 se.org/10. Goldsmith CH.32:696-702.1097/01. The reliability of the Vernon and Mior neck of the Neck Disability Index and the Neck disability index. Turk Journal of Orthopaedic & Sports Physical Therapy®  *$ Beaton DE. (-$ MacDermid JC.org at University of Washington on December 1. (+$ MacDermid JC. 2002. Edmondston SJ. Richardson JK. http://dx. Lindgren U.75367. Guillemin F. non-traumatic neck pain. evaluation form. and Phys Med Rehabil. Silcocks P. Risk guide. Parnianpour M.1186/1471-2474-9-42 ). on outcome measures to hand therapy practice. Schipholt HJA. DeVellis RF.2007. 2008. eds.http://dx. 2008:387-388. Applying evidence that individuals who have recovered have Duquet W. Prevalence. ated with chronic. Hepguler S. Northwick Park neck pain questionnaire.H.I '+$ Gay RE. 2002. Leino E. Minimal clinically important change of math.07. factors for neck pain: a longitudinal study in the Based Rehabilitation. after whiplash injury. Stratford PW. Validity and reliability patients with chronic uncomplicated neck pain. http://dx.0000227268. Grevitt MP.126 org/10. Use of generic versus Disability Index and patient specific functional metric testing of Korean language versions region-specific functional status measures on journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | 415 . 1994. thy. http://dx. Atamaz F. Rehabil.$ MacDermid JC.33:E630-635. http://dx. ability index. )+$ Pool JJ. disability have a score of 10 to 28. WJ. Measurement of cervical flexor whiplash. (&$ Humphreys H. Cross. Aras B.jospt. http://dx. and the Neck Pain and Disability Scale. Cieslak KR. Carey TS. Psycho. Spine.doi. 2006. endurance following whiplash.org/10.30:259-262. reliability Physiol Ther. Med Clin (Barc). consequences for clinical decision making. 2007. Subjective outcome assessments apmr.03.doi.0000201241. Delgado Martinez AD. Critical appraisal of study no disability.31:1621-1627. chbinder R. Coeytaux RR. Royuela A. Predictive value of fear avoid- MB.1097/01.doi. org/10. Sterling et al41 suggest ')$ Eechaute C. 2006.doi.32:E825-831. Braga L. http:// BRS. A “normal” score of between 0 to 20 org/10. 2007. (. Niere KR. J Rehabil Med. http://dx. All rights reserved. Whitman JM. Maher CG. eds. In: Law tinen et al. http://dx. Adams RD. Montazeri A.brs.a5 points. ance in developing chronic neck pain disability: adaptation of self-report measures. Huguet A. Simsek ties of the neck disability index. Spine. those with moderate to severe disability org/10. Iranian versions of the Neck Disability Index and validity of neck disability index in patients '.19:1307-1309.90914.18:245-250.005 Musculoskelet Disord.<.org/10. tation.2008.org/10. 1991. for psychometric articles. )'$ Mousavi SJ. '($ Croft PR. that a score between 0 and 4 represents 2006. Karaduwman A. demands considered. J Whiplash Rel )*$ Pietrobon R. of instruments to measure neck pain disability. determinants.32:3047-3051.$ Bovim G.16 quality for psychometric articles. Vaes P. 2008. Balsor B. http://dx. Asman S. 2014.89:69-74. Validity of the neck disability index. Pain. interpretation and 24 moderate disability. Arch 2000.21:75-80. with neck pain: a Turkish version study.org/10.doi.2007. http://dx.doi. Spine. Bolton JE. Oostendorp RAB. Spine. Am J Epidemiol. Spine. a NDI score of 8 or less.org/10. (*$ Lee H. org/10. 2008. et al.29:2410-2417. Standard scales for measure-  -$ Chan Ci En M.” has been suggested by Miet.08.1097/01. Critical appraisal of study quality disability. discussion 2418. 2003. Spine. J Reha. 2005. Bae SS. Lindgren KA. Hermens HJ. Sand T. [Validation of a Spanish version pain and disability scale: test-retest reliability and naires to predict long-term health problems of the Neck Disability Index].jmpt. Development and psycho. 2002. http:// 2004. T '*$ Foster GA. Bouter LM. Mior S.8:6. Maher CG.doi. Nicholson LL. Psycho. Impivaara O. SAS Global Forum 2008.1186/1471-2474-8-6 Knekt P.30 Vernon and Mior48 suggest ian Portuguese version of the Neck Disability M. Bago J.1097/BRS. 2001.9:42. Palmer JA. 2002. Spine. Waalen J. 2007. Van Aerschot L. 2007. land. Heliovaara M. Gretz SS.2340/16501977-0060 Vet HC. 5 and 14 mild disability. Stratford P. Green S. of 4 neck pain and disability questionnaires. of the Neck Disability Index in physiotherapy. http://dx. Pain.134:1356-1367.$ Goolkasian P. Thorofare.52 2008.org/10. Childs JD. Oder G. The reliability and application metric characteristics of the Spanish version Rating Scale for patients with neck pain.doi. dx. et al. Guidelines for the process of cross-cultural '/$ Hoving JL. disability associated with whiplash-associated whiplash patients: usefulness of the neck dis- ing clinically significant improvement. which represents “none to mild ''$ Cook C. ). Neck pain in the comes in neck pain patients.$ Heijmans WPG. Refshauge KM.$ Rebbeck TJ. Turkish patients with neck pain. Ijzerman MJ.33:E362-365.005 (($ Kovacs FM. Comparison G.53069. those with mild assessed instruments for measuring chronic J Hand Ther. Disability in subacute measures in patients with neck pain: detect. TX: 2008. The cultural adaptation. NJ: Slack Inc. the Neck Disability Index and the Numerical  . Ostelo RW. )-$ Resnick DN. and greater than 35 com. BMC org/10.

in general practice. Lionis CD.8:155-167. Spine. Beaton D. of the Neck Disability Index and validation of the +&$ Westaway MD.78:951-963.83:376-382. and the Journal website maintains a history of the quizzes you have taken and the credits and certificates you have been awarded in the “My CEUs” section of your “My JOSPT” account. Pain. No other uses without permission. For personal use only. The RFC program offers you 2 opportunities to pass the quiz.24399 pain.08.2 CEUs for each quiz passed. 2004.12598.1007/s00586-006- Ther. Koes BW. Mior S. Catanzariti *($ Stewart M.doi.?D<EHC7J?ED Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. Eur Spine J.doi. Revel M.14:409-415.71056. 3.doi. Vernon HT. Take the quiz.jospt. You receive 0.2008. Reliability and @ Disability Index to make decisions concern. 2002. Evaluate the RFC experience and receive a personalized certificate of continuing education credits. Choose an article to study and when ready.doi.32:580-585.29:182-188. Physical and new insights into the measurement properties for neck pain: validation of a new outcome mea- psychological factors maintain long-term of the neck disability index.15:1729.jbspin.JOSPT.1002/ Downloaded from www. http://dx. Kenardy J. 2. Physiother Canada. 2004. Physiol Ther. Joint Bone Spine. Spine.6d whiplash-associated disorder. Rannou F.7:174-179. [ LITERATURE REVIEW ] patients with cervical spine disorders. the-art. The Neck Disability Index: state-of.2006. J Muscskel Pain.2003.org/10.doi. Westaway MD. Antono. Prescott P. French N. Phys **$ Trouli MN. 2007. Fermanian J. Paganas AN. *)$ Stratford PW. Rasch analysis provides +'$ White P. Arthritis Rheum.1016/j. Refshauge KM.$ Vernon HT. Riddle DL. poulou MD. Using the Neck */$ Vos CJ. 2014. Disability Index in patients with acute neck pain 1999. *. sure. pain. 2004. Spadoni 2000. 5. Kenardy J.1016/j. *+$ van der Velde G. The chometric properties of the Cervical Spine Greek version in a sample of neck pain patients. J Orthop research.AE *-$ Vernon H.01. ing individual patients. You may review all of your answers—including the questions you missed. Poiraudeau S.doi.61:544-551.1097/01. Hogg-Johnston S.org and click the link in the “Read for Credit” box in the right-hand column of the home page. impairment and sincerity of effort in ity scales for neck pain.9:106. Bogduk study of reliability and validity. Spine. Riley LH. art. 2009. Rannou F.004 org/10. WWW.spinee. Padfield B. http://dx. Catanzariti 2006. Spine J. Assessment of self-rated dis. http://dx. Maher CG. Nicholas M. The Neck Disability Index: a +)$ Wlodyka-Demaille S.doi. All rights reserved. ability measures for chronic whiplash.1186/1471-2474-9-106 use in persons with neck dysfunction.jospt. Jull G.014 *.org at University of Washington on December 1.1097/01. Jull G.005 Hurwitz E.org/10.122:102-108.$ Vernon H. +($ Wlodyka-Demaille S. http:// to standard assessment tools used in spine dx. Translation 0119-7 )/$ Skolasky RL. Char. org/10. 2008. 4. Tennant A. Revel M. Responsiveness of pain and dis.71:317-326. 3rd.org/10.ORG EARN CEUs With JOSPT’s Read for Credit Program Journal of Orthopaedic & Sports Physical Therapy® JOSPT’s Read for Credit (RFC) program invites Journal readers to study and analyze selected JOSPT articles and successfully complete online quizzes about them for continuing education credit. 1998. Lewith G. G.org/10. jmpt. http://dx. Go to www. responsiveness of the Dutch version of the Neck CEH. 1736.org/10. ity to change of three questionnaires for neck acterization of acute whiplash-associated dis. Verhagen AP. 2008. patient-specific functional scale: validation of its EkjYec[iGk[ij_eddW_h[WdZ_jih[bWj_edi^_f BMC Musculoskelet Disord. Stratford PW. 416 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy .04. Binkley JM. Psy.org/10.27:331-338. J Manipulative Physiol Ther. Sports Phys Ther.07.0000105535. Login and pay for the quiz by credit card.1016/j. http://dx. Kakavelakis KN. 1991. ability. To participate in the program: 1. Fermanian J. 1998. JF.51:107-112. http://dx.org/10.006 dx. brs. The abil- *'$ Sterling M. http://dx.1016/j. 1991-2008. translation and validation of 3 functional disabil- 2007. predictive capacity post-whiplash injury. J Manipulative JF. http:// orders. The core outcomes *&$ Sterling M.BRS.29:1923-1930. Binkley JM.31:491-502.doi. 2006. Poiraudeau S.2006.doi. Arch Phys Med Rehabil. Vicenzino B.0000256380. click “Take Exam” for that article. Albert TJ.

Q The pain is very severe at the moment. Section 1: Pain Intensity Q I can’t concentrate at all. one box that applies to you. and I am slow and careful Q I can drive my car without neck pain. Q I have some neck pain with a few recreational activities. Q I need help every day in most aspects of self -care. Q I can lift only very light weights. although you may consider that two of the statements in Q I can concentrate fully with slight difficulty. Q I have no neck pain at the moment. are conveniently positioned. but no more. No other uses without permission. Q I have headaches almost all the time. Q I have a lot of difficulty concentrating. Q I can’t do any work at all. Q I can’t drive as long as I want because of moderate neck pain. Score __________ [50] Q I have slight headaches that come infrequently.org at University of Washington on December 1. Section 2: Personal Care Q My sleep is completely disturbed for up to 5-7 hours. Q The pain is fairly severe at the moment. journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | 417 . Q I can’t read as much as I want because of severe neck pain. Q I can do most of my usual work. ie. Q I can look after myself normally without causing extra neck pain. on a table. but I can manage light weights if Q I can read as much as I want with moderate neck pain. Q I have a great deal of difficulty concentrating. All rights reserved. Q I have some neck pain with all recreational activities. Q My sleep is moderately disturbed for up to 2-3 hours. Section 6: Concentration fects your ability to manage everyday-life activities.jospt. Q Neck pain prevents me from lifting heavy weights. Q The pain is very mild at the moment. Q I have moderate headaches that come infrequently. Q I can only do my usual work. Please mark the box that most closely describes your Q I have a fair degree of difficulty concentrating. Section 8: Driving Q It is painful to look after myself. Q My sleep is greatly disturbed for up to 3-5 hours. Q I can lift heavy weights without causing extra neck pain. Q I can hardly do recreational activities due to neck pain. Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. Howard Vernon at hvernon@cmcc. Q I can’t do my usual work. any one section relate to you. Section 9: Reading Q Neck pain prevents me from lifting heavy weights off the floor but I can manage if items Q I can read as much as I want with no neck pain. Q I cannot lift or carry anything at all. For permission to use the NDI. Section 4: Work Section 10: Recreation Q I can do as much work as I want. Q I do not get dressed. Copyright: Vernon H & Hagino C. Q My sleep is slightly disturbed for less than 1 hour.ca Q I have severe headaches that come frequently. Q I can read as much as I want with slight neck pain. but it causes extra neck pain. I wash with difficulty and stay in bed. Q The pain is the worst imaginable at the moment. Q I have no neck pain during all recreational activities. Q I can drive my car with only slight neck pain. Please mark in each section the Q I can concentrate fully without difficulty. 2014. For personal use only. Q I can drive as long as I want with moderate neck pain. present-day situation. Q I can’t read at all. 1991. Q I have neck pain with most recreational activities. Section 7: Sleeping Q The pain is moderate at the moment. Journal of Orthopaedic & Sports Physical Therapy® they are conveniently positioned Q I can’t read as much as I want because of moderate neck pain. please contact Q I have moderate headaches that come frequently. Q My sleep is mildly disturbed for up to 1-2 hours. Section 3: Lifting Q I can’t drive my car at all because of neck pain. Q I can hardly do any work at all. Q I have no trouble sleeping. Dr. Q I can hardly drive at all because of severe neck pain. Q I can look after myself normally. Q I can’t do any recreational activities due to neck pain. Downloaded from www. APPENDIX A NECK DISABILITY INDEX This questionnaire is designed to help us better understand how your neck pain af. but it gives me extra neck pain. Q I can lift heavy weights. but no more. Section 5: Headaches Patient Name _______________________________________ Date _____________ Q I have no headaches at all. Q I need some help but manage most of my personal care.

2014.org at University of Washington on December 1. Data Extracted Population studied Population Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®.jospt. read the accompanying guide and be as specific as possible when extracting information. it is important to realize that the purpose of data extraction is to remove or extract the specific information reported by authors within a study. [ LITERATURE REVIEW ] APPENDIX B DATA EXTRACTION FORM FOR STUDIES EVALUATING THE PSYCHOMETRICS OF OUTCOME MEASURES Authors: _______________________ Year: ____________ Rater: ___________ Downloaded from www. Intervention Reliability Reliability (relative) Journal of Orthopaedic & Sports Physical Therapy® Reliability (absolute) Minimum detectable change (MDC) Content/structural validity Internal consistency Content validity Floor/ceiling effects Factorial validity Item response/Rasch analyses C1 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . not to evaluate the validity or value of that piece of information. Instructions When using the data extraction form. To make data extraction as useful as possible and to avoid the need for repeated data extractions. For personal use only. All rights reserved. No other uses without permission.

All rights reserved. Responsiveness Minimally clinical important difference (MCID) Usefulness/practicality Journal of Orthopaedic & Sports Physical Therapy® Readability Interpretability Time to administer Administration burden Cultural applicability © JC MacDermid 2008 journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | C2 . Divergent Longitudinal validity Concurrent criterion Predictive criterion Responsiveness/clinical change Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®.org at University of Washington on December 1. For personal use only.jospt. Construct/criterion validity Known groups Convergent Downloaded from www. No other uses without permission. 2014.

demographics. The accompanying extraction form can then be used to collect specific information about these psychometric or utility properties from particular studies. it is useful to collect information within standardized domains so that this information might be more easily synthesized to provide a summary of our collective knowledge about psychometrics and the utility of any given outcome measure. you may elect to present the synthesized data in a descriptive way by creating a summary table in each category. clinically important difference (CID). All rights reserved. and 4–14 days for chronic conditions. If you find some studies with similar designs. you may be able to conduct a meta-analysis of some properties such as minimal detectable change (MDC). it is best to be as specific as possible when extracting information. and intensity Journal of Orthopaedic & Sports Physical Therapy® studies of the intervention. Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. how and where subjects were chosen. To make data extraction as useful as possible and to avoid the need for repeated data extractions. In summarizing what is known about the psychometrics and utility of outcome measures. Methods That Might Be Used to Collect This Property Information/Relevant Statistics Population studied Population A description of the study population Report meaningful demographics and indicators of the population studied: sample size. Some consider that sufficient time between tests is within 48 hours for acute conditions. In some cases. or standard error of measurement (SEM). server-based scale is used. No other uses without permission. There is no clear or easy method to synthesize this information. OR by the same individual compare to the variability between rater (intrarater). Reliability (relative) The extent to which variability in test scores on repeated ICCs and their associated confidence intervals tests of the same person are related to variability overall (including between people) C3 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . not to evaluate the validity or value of that information. pathology/ disorder. different test instruments (inter-instrument) are evaluated. and the follow-up interval Reliability Reliability description The extent to which the same results are obtained Test procedures or measures are typically reapplied on repeated administrations of the same measure when on repeated occasions in individuals considered to no change in status has occurred (reliability) or how have a stable condition during that timeframe in precise the scores are on repeated measurements which repeated testing occurs. or different raters (interrater) if a individuals.jospt. When using the data extraction form. acute versus chron- ic. Based on the findings of extraction.org at University of Washington on December 1. [ LITERATURE REVIEW ] DATA EXTRACTION GUIDE FOR STUDIES EVALUATING THE PSYCHOMETRICS OF OUTCOME MEASURES Instructions Psychometric studies may evaluate a wide spectrum of potential psychometric properties and aspects that relate to the utility or usability of outcome measures. and how it might be measured within a study. Listed here are an explanation of each property Downloaded from www. Relative reliability evaluates the extent to may be performed on different occasions (test- which variations on repeated assessment of the same retest) for self-report measures. The most common statistic used is the intraclass correlation coefficient (ICC) for quantitative data and Kappa for nominal data. Report the type of reliability evaluated and coefficients obtained. the following guide has been devised to describe the separate properties that might be evaluated in a psychometric study. it is important to realize that the purpose of data extraction is to remove or extract the specific information reported by authors within a study. Evaluation of the quality of articles (also called critical appraisal) is performed in a separate step. To this end. Studies will not address every aspect. Repeated testing (agreement). For personal use only. setting. Intervention Interventions (if applicable) applied for longitudinal Description of the nature. 2014. SEM is used to pres- ent a quantitative estimate of the reliability in the original units of measure. frequency.

content. Extract what was done and what was found. Report the number of factors derived and the extent to which these findings agree with the instrument structure or original factor structure. During translation. No other uses without permission.Factor analysis is conducted and compared to the tions surrounding constructs measured as defined by inherent structure of the instrument or factor analy- the measure or as indicated by subscale structure sis on which its construction is based. the content’s relevance.jospt. it is important to consider the population the necessary content to capture the concept of to which the measure applies. Factorial validity The extent to which factor analysis supports assump. SEM (possibly expressed as coefficient of varia- no change in status has occurred (reliability) or how tion in older articles) precise the scores are on repeated measurements (agreement). Floor-ceiling effects When the measure fails to demonstrate a worse score in Descriptive statistics of the distribution of scores patients who clinically deteriorated and/or an improved were presented graphically or numerically and score in patients who clinically improved indicate this. Extract the number and level of confidence this measure reflects the amount required to change before achieving a level of confidence that exceeds the random error that occurs in stable patients. and limits of 2 standard deviations are shown) Downloaded from www. Specific quantitative estimates of reli. For personal use only.org at University of Washington on December 1. listed. Report alpha and whether it relates to measured) the entire instrument or specific subscales. Note that different studies may define floor/ceiling differently. 2. specific study of the meaning of the questions to another cultural or language group was studied. 2. Patients and experts were involved during item selection/reduction. 4. Altman and Bland graphical technique where the ability in individual units portray the actual amount of difference on repeated tests for each individual is variation expected between repeated applications— plotted versus their mean score (and the overall reported in the original units of measure. All rights reserved. 1. Patients were consulted for reading and comprehension. 2014. the completeness of the interest. Content validity The extent to which the domain of interest is adequately A variety of techniques can be used to assess the sampled by the items in the measure. Content/structural validity Internal consistency The extent to which items on a test or subscale are Cronbach’s alpha is the inter-item correlation usu- related (an indication of the consistency of the concept ally reported. so extract how these attributes were determined as well as the percentage. MDC Based on the reliability and defined level of confidence. and emphasis. Cognitive interviews were done with patients to determine the meaning of items. journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | C4 . In assessing con. 3. Expert panels or Delphi procedures were used Journal of Orthopaedic & Sports Physical Therapy® to select items or evaluate the validity of the instrument. Reliability (absolute) The extent to which the same results are obtained on This may be reported as repeated administrations of the same measure when 1. extent to which items on a given measure reflect tent validity. Some of the techniques typically found are Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. 5. Report the percentage of patients who sustained a floor or ceiling effect.

their mean scores.org at University of Washington on December 1. Known groups Known groups are constructed on theoretical assump. particularly if one is considered more accepted or rigorous. Predictive criterion Predictive validity is evaluated by determining the Extract the test names and correlations and time extent to which the results of administering an outcome interval measure at 1 point in time is associated with a future status or outcome.Extract the groups. and comparison of ability estimation. Concurrent validity is assessed by comparing a scale and its criterion at a single point in time. in accordance with the hypotheses. Constructed hypotheses “known groups” OR in relation to other indices that can assess convergent validity where measures are are similar (convergent) or different (divergent). therefore. Most commonly. Results are validity where it is assumed they measure different evaluated to determine whether they are acceptable constructs.70) correlations Divergent Relationship to scales/tests assumed unrelated Extract test names and correlations Longitudinal validity Relationship between change scores among similar Extract test names and correlations Journal of Orthopaedic & Sports Physical Therapy® scales/tests measured on different time points where change is expected Criterion validity description Criterion validation is determined by comparing a given Authors will state that their measure is being com- outcome measure to an accepted standard of measure. Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. In thought to represent similar constructs or divergent each case. Concurrent criterion Comparing a scale and its criterion at a single point in Extract the test names and correlations time assesses concurrent validity. For personal use only. it is sometimes argued that there is no criterion and. or Items can be placed along a spectrum using item analyses a spectrum of the concept measured response theory or Rasch analysis.70) and moderate (0. hypotheses are formulated. Report whether these analyses were conducted and if they determined that items crossed a range of difficulty. the term comparability is used when scales measure the same subjective construct. Extract the test names and correlations. pared against a specific instrument and report the For subjective constructs such as pain and disability. consider grouping into strong (>0. and the difference significance is assessed. All rights reserved. when summa- rizing. In other cases. C5 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . Rasch analysis can be used to evaluate Downloaded from www. Construct validity can be cross-sectional measured OR the expected differences between or longitudinal (predictive). Predictive validity is evaluated by determining the extent to which the results of administering an outcome measure at one point in time is associated with a future status or outcome.40–0.jospt. No other uses without permission. Analyses might address item difficulty. correlation or agreement between the measures. and level of tions that they should be different. validation focuses on construct validity. individual ability curves. differential item functioning (DIF)—report if done and whether items demonstrated DIF. Construct/criterion validity Construct validity Constructs are artificial frameworks that are not The extent to which scores relate to other mea- description directly observable. 2014. the order of item placement in a test and/or the discriminative ability of specific items are defined. Convergent Relationship between similar scales/tests Extract test names and correlations. [ LITERATURE REVIEW ] Item response/ Rasch The extent to which items cross a range of difficulty. Construct validity is the extent to sures in a manner consistent with theoretically which measures perform according to a priori defined derived hypothesis concerning the domains that are constructs.

Hypotheses are formulated and results are in agreement. patients perceive to be beneficial mandates a change minimally clinical important difference (MCID). Validation of subjective categories of outcome: excellent/ good/fair. CID The difference in scores in the domain of interests that Extract the minimally important difference (MID). All rights reserved. MCID. Comparative data for relevant subgroups (acute versus chronic. to extract the specific scoring algorithm for an instrument and compare it to others. Subjects are evaluated at size. in patients’ management. Responsiveness/clinical change Responsiveness The ability to detect important change over time in the Extract indicators of responsiveness. etc) 2. during a systematic review. stable. This must be done in a systematic way and the method should be documented. grade level can be assigned in some software Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. tance. improved (or have improved an important amount) is or worse. and the method for 2 points in time and change in patients who had assessing whether patients were improved. If this is not specifically reported in the text of the study. compare scoring algorithms and administrative burden and report this.jospt. MID. Downloaded from www. Usefulness/practicality Readability The questionnaire is understandable by all patients. Extract study data on scoring method burden Authors provide specific information on the ease of scoring. if appropriate. For example. 2014. it cannot be extracted. journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | C6 .org at University of Washington on December 1. normal/abnormal Journal of Orthopaedic & Sports Physical Therapy® Time to administer Authors provide specific information on time to administer. Extract time Administration burden Ease of method used to calculate the questionnaire’s score. including effect concept being measured. and CID are 3 different terms for How clinically meaningful is defined may vary (global the same concept. For personal use only. No other uses without permission. Authors State method and results obtained provide specific information on readability as evaluated by the target population or specific tools are applied to evaluate readability (eg. Information is provided about or CID and the method/cut-off used to define impor- what difference in score would be clinically meaningful. packages) Interpretability State the type of categorization and relevant scores The degree to which one can assign qualitative meaning to and. Authors analyze subgroup differences provide information on the interpretation of scores: 1. However. rating of change with different methods of rating or establishing cutoffs for importance are often used). standard response mean. compared across different measures (and/or to patients who have not improved). multiple raters should evaluate the relative com- plexity of these scoring algorithms. the method used to validate/ quantitative scores or define subgroups by scores.

For personal use only. 2014. including a structured translation process that includes forward and backward translation and retesting Downloaded from www. No other uses without permission. © JC MacDermid 2008 Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. an annotation should be made to note that the psychometric property applies to an adapted version. Highlight the results of cross-cultural adaptation as well as any languages/cultural adaptations that have been reported. they can be documented in the appropriate data collection boxes above. and a formal evalua- tion of the meaning of items in different cultures. All rights reserved. a number of analyses may be performed.org at University of Washington on December 1. reliability of the translated version. [ LITERATURE REVIEW ] Cultural applicability The extent to which an instrument contains items that will Extract language/cultural translation performed be meaningful across different cultural groups or subgroups (and relevant findings) for which the measure is relevant or may be applied. however.jospt. If specific reliability and validity values are reported. Journal of Orthopaedic & Sports Physical Therapy® C7 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . In cross-cultural adaptation. Cross- cultural adaptation may be used to translate measures into different languages and to convert any items having specific meaning to culturally relevant items.

2014. Were appropriate statistical tests conducted to obtain point estimates of the psychometric property? 11. benchmark comparisons. 3. standard error of measurement [SEM]. minimally important difference [MID])? journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | C8 . Were analyses conducted for each specific hypothesis or purpose? 10. Was appropriate retention/follow-up obtained? (Studies involving retesting or follow-up only) Journal of Orthopaedic & Sports Physical Therapy® Measurements 7. Was an appropriate sample size used? 6. No other uses without permission.Were appropriate ancillary analyses done to describe properties beyond the point estimates (confidence intervals [CI]. Was the relevant background research cited to define what is currently known about the psychometric properties of the measures under study and the need or potential contributions of the current research question? Study design 2. Standardized methods: Were administration and application of measurement techniques within the study standardized. CRITICAL APPRAISAL OF STUDY DESIGN FOR PSYCHOMETRIC ARTICLES: EVALUATION FORM Authors: _______________________ Year: ____________ Rater: ___________ Evaluation Criteria Score Downloaded from www. Were appropriate inclusion/exclusion criteria defined? Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®.jospt.org at University of Washington on December 1. Study question 2 1 0 1. Was an appropriate scope of psychometric properties considered? 5. All rights reserved. For personal use only. and did they consider potential sources of error/misinterpretation? Analyses 9. Were specific psychometric hypotheses identified? 4. Documentation: Were specific descriptions provided or referenced that explain the measures and their correct application/interpretation (to a standard that would allow replication)? 8.

org at University of Washington on December 1. [ LITERATURE REVIEW ] Recommendations 12. and multiply by 100 to get the percentage score. divide by 2 times the number of items. and results? Subtotals (of columns 1 and 2) Downloaded from www. © JC MacDermid 2008 Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. Journal of Orthopaedic & Sports Physical Therapy® C9 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . if for a specific paper an item is deemed inappropriate. Were the conclusions/clinical recommendations supported by the study objectives. then sum items. For personal use only. No other uses without permission. Total score % (sum of subtotals/24 × 100). analysis. or. All rights reserved.jospt. 2014.

0 The scope of psychometric properties was narrow as indicated by evaluation of only 1 form of reliability or validity.org at University of Washington on December 1. Presented a critical and unbiased view of the current state of knowledge. but without additional information. If there is no documentation of an action. as well as both relative and absolute reliability [eg. For example. Information on the type of patients is briefly defined. 1 Some. Study design 2 2 Specific inclusion/exclusion criteria for the study were defined. and the rationale was not founded on previous literature. journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | C10 . however. convergent/divergent associations). A detailed focus on reliability that includes multiple forms of reliability (at least 2 of intrarater.jospt. internal consistency and 1 test/form of validity). standard error of measurement (SEM). 1 Two or more psychometric properties were evaluated. but a clear rationale was provided for the research question. treat it as not done. convergent/divergent) being tested. 0 No information on the type of clinical settings or study participants is provided. as well as expected relationships (strength Journal of Orthopaedic & Sports Physical Therapy® of associations) or constructs. 1 All of the above criteria were not fulfilled (little reference to previous research and present gaps in knowledge). 4. age/sex/diagnosis and the name or type of the practice is given. demographic information was presented. CRITICAL APPRAISAL OF STUDY DESIGN FOR PSYCHOMETRIC ARTICLES Interpretation Guide To decide which score to provide for each item on a quality checklist. 3 2 Authors identified specific hypotheses. but not clearly defined in terms of specific hypotheses. the practice setting was described. responsiveness. 1. and minimally important difference (MID)] 2. which included the specific type of reliability (intra/interrater or test- retest) or validity (construct/criterion/content. All rights reserved. indicating what is currently known about the psychometric properties of the instruments or tests under study from previous research studies. construct (known group differences. evaluative or discriminative properties 3. Question Score Descriptors Study question 1 2 The authors: Downloaded from www. Indicated how the current research question evolves from a gap in the current knowledge base. 2014. A priori hypothesis definitions are provided of level of reliability and for validity. and specific hypotheses on reli- ability and validity were not stated. 1 Types of reliability and validity being tested were stated. (“The purpose of this study was to investigate the reliability and validity of…” can be rated as zero if no further detail on the types of reliability and validity or the nature of specific hypotheses is provided. read the following descriptors. interrater. and appropriate Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. expert review/survey or qualitative interviews) or statistical (eg. Some aspects of both reliability and validity were examined concurrently. For personal use only. yielding a study group generalizable to a clinical situation. criterion (concurrent/predictive).) 4 2 An appropriate scope of psychometric properties would be indicated by: 1. but is insufficient to allow the reader to generalize the study to a specific population. intraclass correlation coefficients (ICC). A detailed focus on validity that includes multiple forms of validity (content) judgmental (structured. scope was narrow and did not meet the above criteria (eg. Performed a thorough literature review. No other uses without permission. 2. but not all information on participants and place is provided. and established predictive. factor analyses). longitudinal/concurrent. 0 Specific types of reliability or validity under evaluation were not clearly defined. eg. 0 A foundation for the current research question was not clear. test- retest). 3. Established a research question based on the above. Pick the descriptor that sounds most like the study being evaluated with respect to a given item.

Measurements 7 2 The authors provided or referenced a published manual/article that outlines specific procedures for administration. 0 Less than 70% of the patients eligible for study were reevaluated. and test procedures. If no manual is referenced. Analyses 9 2 Authors clearly defined which specific analyses were conducted for each of the stated specific hypotheses of the study. Post-hoc power analyses and/or confidence intervals (CI) confirm that the sample size was sufficient to define relatively precise estimates of reliability or validity. were performed in a standardized way. 0 Data were not presented for each hypothesis or psychometric property outlined in the purposes or methods. For impairment measures. any special equipment required. but minimal attention or description of the extent to which the above standards were maintained. scoring (including how scoring algorithms handle missing data). calibration of equipment if necessary. 8 2 All of the measurements. and examiner procedures/actions. For simple reliability/validity statistics if sample is greater than 100 but without justification for sample size. or by demarcat- ing which analyses addressed specific psychometric properties. 0 Size of the sample was not rationalized or is clearly underpowered. use of consistent measurement tools and scoring. For personal use only. this includes calibration of any equipment. 1 Data were presented for each hypothesis. Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. 0 No description of the extent to which the above standards were maintained or an obvious source of bias in data collection methods. 2014. C11 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . or a limited description of procedures is included in the text.org at University of Washington on December 1. training required. Downloaded from www. a priori exclusion of any participants likely to give invalid results or unable to complete testing (not exclusion after enrollment). cost. but authors did not clearly link analyses to hypotheses. this is characterized by a statement of who administered the forms and by what process (definition of rules for exclusion of forms [eg. use of standardized instructions. No other uses without permission. 1 The authors provided a rationale for the number of subjects included in the study. too many missing items] or individuals [language. including test administration and scoring. This may be accomplished through organization of the results under specific subheadings. comprehension. etc]). 1 Procedures are referenced without any details. Journal of Orthopaedic & Sports Physical Therapy® 1 No obvious sources of bias in how tests were performed/administered. but did not present specific sample size calculations or post-hoc power analyses. [ LITERATURE REVIEW ] 5 2 Authors performed a sample size calculation and obtained their recruitment targets. Data were presented for each hypothesis/research question. For self-report. 6 2 Ninety percent or more of the patients enrolled for study were reevaluated.jospt. then the text describes the procedures in sufficient detail so they can be replicated. and interpretation of test procedures that includes any necessary information about positioning/active participation of the subject. 1 More than 70% of the patients eligible for study were reevaluated. 0 Minimal description of procedures is provided and without appropriate references. All rights reserved.

but suboptimal choices were made in other analyses. ICCs for quantitative data. Reliability (Relative. Validity a. Clinical relevance—minimal detectable change (MDC). All rights reserved. MID 3. with post-hoc tests that adjusted for multiple testing 4. but not both. OR authors made conclusions and clinical recommendations for only some of the study hypotheses. or other correlations if appropriate b. Validity tests of significant difference—an appropriate global test such as analysis of variance was used where indicated.org at University of Washington on December 1. © JC MacDermid 2008 journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | C12 . 10 2 Appropriate statistical tests were conducted: 1.jospt. absolute . data. No other uses without permission. 2014. SEM or plot of score differences versus average score showing mean and 2 standard deviation (SD) limit—Altman and Bland) 2. Spearman rank correlations for ordinal Downloaded from www. and conclusions OR recommendations contradicted the actual data presented. associated CI. Appropriate confidence intervals (CI) 2. SEM Correlation matrices for validity analysis may not require that each individual correlation be presented with its Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. Recommendations 12 2 Authors made specific conclusions and clinical recommendations that were clearly related to the specific hypotheses stated at the beginning of the study and supported by the data presented. Journal of Orthopaedic & Sports Physical Therapy® 0 Authors made vague conclusions without any clinical recommendations. Responsiveness—standardized response means or effect sizes or other recognized responsiveness indices were used 1 Appropriate statistical tests were used in some instances. 0 Inappropriate use of statistical tests 11 2 For key indicators such as reliability coefficients indices. Comparison to appropriate benchmarks or standards 3. For personal use only. Kappa for nominal data). CIs and benchmarks should be used according to standards for this type of analysis. however. 0 Benchmarks or CIs were either used inappropriately or not included. at least 2 of the following were presented: 1. 1 Either CIs or appropriate benchmarks were used. Validity associations—Pearson correlations for normally distributed data. but basically supported by the study data. 1 Authors made conclusions and clinical recommendations that were general.