Measurement Properties of the Neck Disability Index: A Systematic Review
Downloaded from www.jospt.org at University of Washington on December 1, 2014. For personal use only. No other uses without permission.

STUDY DESIGN: Systematic review of clinical measurement.

OBJECTIVE: To find and synthesize evidence on the psychometric properties and usefulness of the neck disability index (NDI).

BACKGROUND: The NDI is the most commonly used outcome measure for neck pain, and a synthesis of knowledge should provide a deeper understanding of its use and limitations.

METHODS AND MEASURES: Using a standard search strategy (1966 to September 2008) and 4 databases (Medline, CINAHL, Embase, and PsychInfo), a structured search was conducted and supplemented by web and hand searching. In total, 37 published primary studies, 3 reviews, and 1 in-press paper were analyzed. Pairs of raters conducted data extraction and critical appraisal using structured tools. Ranking of quality and descriptive synthesis were performed.

RESULTS: Horizon estimation suggested the potential for 1 missed paper. The agreement between raters on quality assessments was high (kappa = 0.82). Half of the studies reached a quality level greater than 70%. Failures to report clear psychometric objectives/hypotheses or to rationalize the sample size were the most common design flaws. Studies often focused on less clinically applicable properties, like construct validity or group reliability, than transferable data, like known group differences or absolute reliability (standard error of measurement [SEM] or minimum detectable change [MDC]). Most studies suggest that the NDI has acceptable reliability, although intraclass correlation coefficients (ICCs) range from 0.50 to 0.98. Longer test intervals and the definition of stable can influence reliability estimates. A number of high-quality published (Korean, Dutch, Spanish, French, Brazilian Portuguese) and commercially supported translations are available. The NDI is considered a 1-dimensional measure that can be interpreted as an interval scale. Some studies question these assumptions. The MDC is around 5/50 for uncomplicated neck pain and up to 10/50 for cervical radiculopathy. The reported clinically important difference (CID) is inconsistent across different studies ranging from 5/50 to 19/50. The NDI is strongly correlated (0.70) to a number of similar indices and moderately related to both physical and mental aspects of general health.

CONCLUSION: The NDI has sufficient support and usefulness to retain its current status as the most commonly used self-report measure for neck pain. More studies of CID in different clinical populations and the relationship to subjective/work/function categories are required. J Orthop Sports Phys Ther 2009;39(5):400-417. doi:10.2519/jospt.2009.2930

KEY WORDS: cervical spine, outcome measure, reliability, validity

1 Co-director, Hand and Upper Limb Centre Clinical Research Laboratory, St Joseph’s Health Centre, London, ON, Canada; Associate Professor, School of Rehabilitation Science, McMaster University, Hamilton, ON, Canada; Funded by a New Investigator Award, Canadian Institutes of Health Research. 2 Lecturer and PhD Candidate, School of Physical Therapy, The University of Western Ontario, London, ON, Canada; Funded by a Doctoral Fellowship, Canadian Institutes of Health Research. 3 Physiotherapy Student, School of Physical Therapy, The University of Western Ontario, London, ON, Canada. 4 Emeritus Professor, Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, ON, Canada. Address correspondence to Dr Joy MacDermid, Co-director, Hand and Upper Limb Centre Clinical Research Lab, Monsignor Roney Ambulatory Care Centre, 930 Richmond Street, London, ON, N6A 3J4 Canada. E-mail: jmacderm@uwo.ca

Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. All rights reserved.

It is estimated that a third of all adults will experience neck pain during the course of 1 year,12 and 70% is the approximate lifetime prevalence.6,28 About 19% of the population may suffer from chronic neck pain at any given time,6 creating a substantial societal burden. Monitoring outcomes is a key component of monitoring the effects of evidence-based care, in justifying services, and in program evaluation. However, the episodic, fluctuating nature of neck pain and the lack of clear and consistent physiological findings complicate this process. A fundamental component of monitoring outcomes is having reliable and valid tools with known measurement properties.27 In the case of neck disorders, self-reported pain and disability are usually the primary focus.

Isolated studies on outcomes measures provide a context and method-specific view of the measure’s ability to provide useful clinical information. It is only by synthesizing information from multiple studies that we can understand how a measurement tool performs across different contexts and applications. The synthesis of larger pools of data is a mechanism to provide more stable estimates of measurement errors and benchmarks for change/outcomes. In fact, 2 systematic reviews have been conducted that address self-report measures for neck pain. Both compared different outcome measures using a semistructured process and concluded that the Neck Disability Index (NDI) was the most commonly used self-report instrument for evaluating status in neck pain clinical research.34,37 Both reviews alluded to the psychometric and clinical properties of the tools, but neither attempted to deal with specific properties or to synthesize this knowledge. The developer of the NDI published a summary paper in 2008 summarizing a 17-year history with the NDI.46 The author stated
that “The Neck Disability Index has been used in over 300 publications, translated into 22 languages…and endorsed for use by a number of clinical practice guideline committees…making it the most widely used and most strongly validated instrument for assessing suffering disability in patients with neck pain.”

While previous reviews have claimed to be systematic, none have included a key element of systematic review, critical appraisal of the quality of individual studies. This may reflect inherent difficulties in performing the critical appraisal due to lack of instrumentation. The lead author of this current review has published a scale and interpretation guide25,26 for this purpose. Psychometric studies are important to establish measurement properties like the relative difficulty of items, appropriate grouping of items into subscales, reliability, validity, and responsiveness. In addition to psychometric properties, clinicians are concerned with issues on feasibility, floor/ceiling effects, availability of different language/cultural adaptations, and administration burden for themselves and their patients. Terms like “clinician/patient friendliness,” or “clinical utility/applicability,” or, as we prefer, “usefulness” have been used in reference to these practical considerations. When therapists try to integrate the NDI into their clinical practice, they are concerned with the psychometric issues but also need information on usefulness. The purpose of this study was to conduct a systematic review that would summarize the quality and content of current research regarding the psychometric properties and usefulness of the NDI.

FIGURE. The systematic review evidence flowchart.
Search strategy
  Key words: (NDI OR Neck Disability Index) AND (reliability OR validity OR responsiveness OR clinically important difference OR MCID OR Rasch OR factor analysis OR translation OR validation).
  Dates: January 1966 to September 2008
  Other: Google searches and hand searches of retrieved study references lists
Located citations (n = 189): Embase (n = 66), Medline (n = 63), CINAHL (n = 66), PsycInfo (n = 1), hand searching (n = 2), contact author (n = 1)
Title and abstract review
  Excluded at abstract review: no data or not relevant; study design criteria (n = 0)
  Unique articles accepted for full review: Embase (n = 30), Medline (n = 28), hand searching (n = 2), other (n = 1)
Horizon estimation of database search retrieval
Excluded after full text review (n = 2): no data on NDI (n = 2)
Included: structured reviews (data extraction) (n = 3); data extraction (n = 38); primary studies kept for data extraction with non-English full text were excluded from critical appraisal
Quality summary of appraised primary papers (n = 36)

METHODS

Development of the NDI

Vernon and Mior47 developed the NDI using the Oswestry Low Back Pain Index (OLBPI) as a template for identifying items and a scoring metric. They initially selected 6 items from the original scale (pain intensity, personal care, lifting, sleep, driving, sex life) and submitted this to a consulting team. The consulting team added 4 items: headache, concentration, reading, and work. These 10 items were modified for clarity and relevance based on feedback from 5 individuals with a history of whiplash injury and the review team, resulting in the 10-item scale. In the end, the NDI contained 5 items from the OLBPI, 2 of which were modified, and 5 new items.47 The questions are measured on a 6-point scale from 0 (no disability) to 5 (full disability). The numeric response for each item is summed for a score varying from 0 to 50. Some evaluators choose to provide a score out of 100%, particularly as a strategy to deal with questions left unanswered (APPENDIX A).

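The scoring rule for the NDI (10 items, each scored 0 to 5, summed to a total out of 50, optionally reported as a percentage) can be sketched in a few lines. This is a minimal illustration rather than the developers' scoring procedure: the function name `ndi_score` is hypothetical, and the treatment of unanswered items (dividing by the maximum possible for the answered items only) is an assumption about how a percentage score might accommodate missing responses.

```python
from typing import Optional, Sequence


def ndi_score(responses: Sequence[Optional[int]]) -> dict:
    """Score one NDI questionnaire.

    responses: 10 item scores, each 0 (no disability) to 5 (full disability).
    None marks an unanswered item (an assumed convention, not from the source).
    """
    if len(responses) != 10:
        raise ValueError("the NDI has 10 items")
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("no items were answered")
    if any(not 0 <= r <= 5 for r in answered):
        raise ValueError("item scores must be between 0 and 5")

    total = sum(answered)
    return {
        # The raw 0-50 score is only defined when every item is answered.
        "raw": total if len(answered) == 10 else None,
        # Assumed handling of missing items: rescale by the answered maximum.
        "percent": 100.0 * total / (5 * len(answered)),
        "answered": len(answered),
    }


if __name__ == "__main__":
    print(ndi_score([1, 2, 3, 0, 5, 2, 1, 0, 4, 2]))
    print(ndi_score([1, 2, 3, 0, 5, 2, 1, 0, 4, None]))
```

Reporting both values keeps the raw total comparable with results expressed out of 50 (such as an MDC of 5/50) while still giving a usable percentage when an item is left blank.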
Literature Search and Study Identification

A database search was performed using Medline, CINAHL, Embase, and PsychInfo, which included papers written between January 1966 and September 2008 (FIGURE). The following keywords were used to search all databases for eligible studies: (NDI OR Neck Disability Index) AND (reliability OR validity OR responsiveness OR clinically important difference OR MCID OR Rasch OR factor analysis OR translation OR validation). In addition, we conducted Google searches and hand searches of retrieved study references lists. We conducted a Horizon estimation to evaluate the potential that any articles were missed and the efficiency of our search using a procedure developed by Foster and Goldsmith for SAS software (full details available from author Charlie Goldsmith).14

The first stage of study identification was of title/abstracts, which were independently reviewed by at least 2 of the primary authors. An article was accepted if it met the following inclusion criteria: reported on at least 1 psychometric property of the NDI in patients with neck pain and was written in French or English (2 with English abstracts and non-English full text [1 Dutch, 1 Spanish] were included for abstract data extraction but not full critical appraisal). Following this, 37 primary studies, 3 structured reviews, and 1 unpublished in press manuscript proceeded to a full review (FIGURE).

The data extraction and review process was conducted by pairs of raters and was based on the use of structured instruments and a predetermined consensus process. First, pairs of raters independently evaluated an assigned subset of articles using previously developed data extraction forms and quality appraisal tools.25,26 All authors met for a calibration review, in which they independently reviewed 1 paper then met and discussed each item to clarify the meaning and interpretation of critical appraisal items. Pairs of raters consulted the first author/instrument developer (J.M.) for clarification of the meaning or interpretation of specific items of the scale if they felt this was contributing to their disagreement.

The data extraction form was developed by adaptation of categories used in a previous psychometric review by Euchaute et al,13 with additions from the lead author. Critical appraisal forms and associated interpretation guide used to assess individual study quality were developed by the first author, who also developed an accompanying guide (APPENDIX B, available online).25,26 The items from the quality tool are listed in TABLE 2, with the associated ratings for individual studies.

We used the following consensus process when independent assessments differed:
1. Raters then discussed their perceptions of the extent to which the study manuscript complied with item requirements (if something was not documented in the manuscript it was treated as “not done”). At this point, raters clarified if the disagreement was based on factual content or the extent of compliance with the item.
2. Factual content/oversights (the most common source of disagreement) were resolved by recheck of the source document.
3. If the raters continued to disagree on an item by 1 point, the raters decided if they were comfortable with the default compromise of assigning the lesser of the 2 scores.
4. If the raters were not comfortable with this resolution or continued to disagree by 2 points, an adjudicator was required, although this was not required during the study.

Interrater reliability on quality ratings was calculated based on preconsensus scores of rater pairs; the overall estimated kappa was 0.82, with agreement on individual items varying from 0.43 to 1.00. Each paper’s score was converted into a percentage because 1 item was based on follow-up and some psychometric studies are cross-sectional, leaving unequal denominators for different studies. We rank ordered studies on quality and considered this ranking when making conclusions and recommendations, although there was no formal mechanism to weight conclusions, based on the quality of the original manuscript. Due to the heterogeneity of study populations and properties evaluated, no meta-analyses were performed. A descriptive synthesis of the findings for psychometric properties across all identified studies is summarized in TABLES 3 through 5.

RESULTS

The Horizon analysis14 indicated a possibility that our search strategy missed 1 article (95% confidence interval, 0-2, 97% yield). An optimal search strategy would have been Embase first, then CINAHL, and then Medline. PsychInfo did not contribute any information after these 3 databases. The number of studies retrieved and the net results of abstract/title and full review are noted in the FIGURE.

In total, 41 studies were identified that addressed at least 1 psychometric property of the NDI (TABLE 1). Two neck disability instrument reviews were identified,34,37 although neither included formal critical appraisal of individual studies. Similarly, a recent comprehensive review of the NDI by the developer did not include formal critical appraisal of individual studies.46 The studies crossed different populations, interventions, and time intervals and addressed different psychometric properties.

Quality of the individual studies was variable, ranging from 21% to 96%, with 37% of papers reaching or exceeding a score of 75% on the quality rating (TABLE 2). The most common flaws observed in the psychometric articles were (1) not reporting specific psychometric hypothesis/objectives, (2) inadequate sample size calculations/justification, and (3) absence of error estimates such as confidence intervals or standard error of measurement (SEM). Most studies addressed a spectrum of psychometric properties, but few were comprehensive. Few studies provided specific conclusions or

lWbkWj[Z ?dj[hl[dj_ed%H[j[ij?dj[hlWb 1 Ackelman et al 2002 Patients: 59 Swedish patients (28 M. varying severity and duration of WAD (level of WAD: 23% I. 59% of patients were M. For personal use only. central nervous system involvement. No other uses without permission. 21 F. chronic NP 33 Responsiveness Intervention: group 1. responsiveness Intervention: PT 14 wk Retest interval: baseline evaluation and Site: PT Clinics after 5-7 visits Cleland et al 20089 Patients: stable. 42% II. responsiveness Retest interval: 10 subjects retested randomly at 24 h and a separate 10 subjects retested at 7 d Gay et al 200715 Patients: 7 M. neck trauma Chok et al 20008 Patients: NP 46 Reliability. Intervention: manual therapy duration. 51 y. improved. many occupations. mean age. and community rheumatology clinic into private PT practice journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | 403 . 2014. NP with or without referral of symptoms to the upper extremity or extremities. J78B. mean age. 2 or more Journal of Orthopaedic & Sports Physical Therapy® signs consistent with nerve root compression. n = 71 Site: patients recruited from PT clinics. chronic 23 Primary purpose to evaluate Intervention: 1 of 3 manual therapy uncomplicated NP another scale. ankylosing spondylitis.org at University of Washington on December 1. 42 y. 39 y. fied version) Retest interval: 5 occasions. reliability. symptom 138 Reliability (SEM and ICC). group 2. twenty 59 (38 for modi. 40 y. and 20 had no NP but had other within 48 hours. 43 y. at 3 w and 3 mo musculoskeletal symptoms Andrade et al 20082 Patients: nonspecific or posttraumatic NP 48 Spanish translation. 
construct validity Intervention: PT Included: patients with at least 3 mo of NP Retest interval: 7 d Bolton 20045 Patients: nonspecific NP 125 Responsiveness Intervention: chiropractic Site: 8 chiropractic clinics and 1 teaching clinic Retest interval: pre-Rx and 4-6 wk postinitial Rx Chan et al 20087 Included: patients with chronic nontraumatic NP with a 20 Construct validity Intervention: ND duration of greater than 3 mo Retest interval: Cross-sectional Excluded: cervical radiculopathy. 18-89 y (mean. WAD within the past 6 wk. mean age. mean age. mean age. and chronic NP Retest interval: unknown Site: chiropractic college clinics in Vancouver and Toronto Included: English speaking. 25% F. Site: Spain responsiveness Aslan et al 20083 Patients: NP 88 Reliability. Site: PT clinics involved in a study of botulin toxin Included: received PT that they considered inadequate for NP Excluded: surgical candidates Retest interval: within 1 wk Hains et al 199817 Patients: 62% F. symptom duration.jospt. Cleland et al 200610 Patients: 47% F. plus heat. 16 F. Excluded: any signs or symptoms consistent with a nonmusculoskeletal etiology. validity Intervention: PT Site: an outpatient setting Retest interval: admission and Included: 2 cohorts. 60% F. Intervention: ND Site: midsize hospital in southern Brazil validity. NP. aged. and 6% III). 71 Validity Intervention: none all patients were less than 24 mo since MVA. 237 Validity Intervention: unknown acute. 46 d (SRM and MDC) Included: age 18-60 y. prior surgery to the cervical or thoracic spine. 50 y. All rights reserved. 31 F). responsiveness Retest interval: after 1 session symptom duration. 38 Reliability.' Summary of Studies Addressing Psychometrics of the NDI IjkZo FefkbWj_ed n Fhef[hj_[i. 22 tested twice for reliability and 24 discharge that completed baseline and discharge assessments Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. 
responsiveness Intervention: PT patients were in the acute phase after a neck sprain. construct validity. or pending legal action regarding their NP Cook et al 200611 Patients: 62% M. GPs. retest 19 had chronic NP. rheumatoid arthritis. 43 y. reliability. 83 d. 17 y or older Heijmans et al 200218 Patients: chronic WAD 61 Dutch translation. home consistency. Reliability. responsiveness exercise program Retest interval: 4 wk Goolkasian et al 200216 Patients: 12 M. Intervention: ND Downloaded from www. subacute. NDI score greater than 10%. mean age. validity. NDI: internal approaches. validity. PT. 43 y) 203 Brazilian Portuguese translation. Unknown validity Hoving et al 200319 Patients: mean age.

aged 18-65 y Excluded: fracture dislocation. spine injury. validity. 66%. 146 (69 for retest) Validity. comor. 4-6 d post-Rx Kose et al 200721 Patients: 83% F. and chronic who visited their 221 (54 from pilot. 54% F. 250) Convergent validity. 31 F. 48%. neurological injury. 18 Resnick 200537 An overview of instruments in a structured review by NA Validity NA a single author and evaluator. 55/100 reliability. nonspecific NP. 7-12 wk. responsive. subacute. 18-69 y. NDI score. Site: patients were chosen from 3 hospitals and 7 7th or 8th Rx clinics in 5 Korean cities McCarthy et al 200729 Patients: NP 160 (34 retest) Reliability. tion 2-6 wk. Spanish translation. 23. GP care 14. 2 or more missing items on NDI Pietrobon et al 200234 Structured review of different neck disability scales NA Validity NA Pool et al 200735 Patients: 61% F. No other uses without permission. NDI. gradual onset 185 Iranian translation. validity Intervention: various Site: spinal clinic Retest interval: 1-2 wk Miettinen et al 200430 Patients: NP who had been involved in an MVA 144 Validity. translation Intervention: routine Rx than 3 mo following whiplash injury from 10 different practices. prior history of chronic pain. 2014. 13 wk. 6/10 Rebbeck et al 200736 Patients: 3 cohorts primary care insurance (early and (99. manual therapy. interpretability Journal of Orthopaedic & Sports Physical Therapy® Nieto et al 200833 Patients: Catalan-speaking. surgery. For personal use only. dura. responsiveness Intervention: PT 5/wk for 3 wk Excluded: inflammatory arthritis. 70%. responsiveness Retest interval: 1-d retest and Site: 9 primary care and 12 specialty services from 9 re-evaluation at 14 d regions in Spain Excluded: functional illiteracy. 134. convergent validity. [ LITERATURE REVIEW ] J78B. central nervous system Kumbhare et al 200523 Patients: the first group comprised of 81 patients from 241 (81 patients. baseline VAS. ness 67%. 26%. 
responsiveness Intervention: ND Site: Finland Retest interval: 1 and 3 y post-MVA Mousavi et al 200731 Patients: NP 3 mo 30%. responsiveness Intervention: ND diagnosis of NP Retest interval: first and last visit or Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. 26%. NR. include a content analysis at item level Riddle et al 199838 Patients: 64% F. systemic disease. employed: 80%. reliability. mean age. 1 year 33%. Retest interval: none Site: primary care or PT clinics floor/ceiling. ND late). neck trauma. Retest interval: 1-3 d without therapy Downloaded from www. VA Excluded: under treatment for problems other than c-spine Scolasky et al 200739 Patients: NP cohort cervical radiculopathy or myelopa. adequate mental capacity within 4 wk of surgery decompression and fusion Sites: 23 clinics Excluded: revisions/tumors 404 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . 183 CID Intervention: random allocation to PT. 534 Convergent validity Intervention: anterior cervical thy. validity. responsiveness Intervention: unknown variety of neck disorders. All rights reserved. F: 80%. age. second 160 controls) Retest interval: follow-up varied group subjects from a convenience sample who did between 4-12 w not have NP and who were not being treated Site: tertiary outpatient clinic Lee et al 200624 Patients: 17 M. 46 y. Intervention: ND physician for NP plus 167) validity. pain. bidity. infection Kovacs et al 200822 Patients: acute. responsiveness Intervention: PT Ontario Canada who had grade II WAD. Reliability.lWbkWj[Z ?dj[hl[dj_ed%H[j[ij?dj[hlWb 20 Humphreys et al 2002 Patients: NA 125 Reliability.' Summary of Studies Addressing Psychometrics of the NDI (continued) IjkZo FefkbWj_ed n Fhef[hj_[i. responsiveness Intervention: chiropractic Site: 8 chiropractic clinics and 1 teaching clinic Retest interval: postinitial Rx. with a primary 180 Reliability. 
most common neck strain Retest interval: admission and or NP discharge Site: 1 of 4 clinics in Richmond. 88% of patients 50 years or younger. internal consistency. 6 wk of NP 102 Reliability. validity. age.org at University of Washington on December 1. 18-55 y. 68%. with subacute pain of less 150 Factor validity. relative Intervention: ND 78%.jospt.

nonchronic NP of musculoskel. GP. neurological ability. rehab Retest interval: 11 2 mo department. general practitioner. tertiary care teaching hospital and outpatient clinic) Wlodyka-Demaille et al Patients: 43 F. recurrent acute NP lasting no 187 Reliability. prediction and effect sizes across treatment RCTs that used the NDI. systemic or other major disorder Wlodyka-Demaille et al Patients: mean age. content validity. within 1 mo of MVA. Downloaded from www. at at 2 hospitals in UK 1 wk and 8 wk Excluded: undergoing litigation. 28 M. vascular or neurological disorders. nerve root compromise. NDI. current PT neck Rx. F.jospt. evalua- Included: pain for 5-6 wk. severe or greater depressive symptoms. with pain-free interval of at least 3 mo Retest interval: 1-wk follow-up Site: referred by GPs in Rotterdam and region Inclusion criteria: aged 18 and sufficient Dutch Exclusion criteria: specific cause of NP (ie. randomized controlled trial. ND. poor English Stratford et al 199943 Patients: NP 50 Reliability. ICC. intraclass correlation coefficient. known or suspected 1. Virginia. 49 y. females. PSFS Retest interval: 3 sessions in week Excluded: previous neck surgery. pain bother-someness. Pain-Specific Functional Scale. clinically important difference. item consistency. 2 follow-up phone calls no neck radiograph since MVA. 49 y 71 Validity. 12-80 y. NR. motor vehicle accident. retest reli- Excluded: symptoms below the elbow. chronic WAD (>3 mo). rheumatology department (ie. rheumatology department (ie. responsiveness Intervention: 6-wk individualized disabled preinjury and presented for medical care submaximal exercise program. responsiveness Intervention: PT Site: 5 PT clinics in North America Retest interval: 1-3 wk Trouli et al 200844 Patients: NP 65 (43 classified Factor validity. not described. Site: referred by physician to PT 72 h. 31 F. Intervention: routine Rx Site: primary care in 3 rural centers as stable) internal consistency. 31 Reliability. J78B. 
no formal critical appraisal of studies Vernon et al 199147 Patients: 17 M. positive upper limb tension test. WAD. 30% 48 Reliability. standardized response mean. United Kingdom. to exercise. responsiveness Intervention: routine Rx etal origin “cervical dysfunction” Retest interval: initial assessment. 31 F. numeric rating. mean age.org at University of Washington on December 1. Intervention: ND studies of n = 188 and the other n = 333 Rasch modeling of item. low blood pressure. 68 M.' Summary of Studies Addressing Psychometrics of the NDI (continued) IjkZo FefkbWj_ed n Fhef[hj_[i. M. NP 101 Reliability. validity Intervention: none 200253 Site: patients recruited from outpatient clinics. “mildly” 134 Reliability. baseline pain of 5-6/10. 6. Retest interval: 24 h bilitation department. Numeric Pain Rat- ing Scale. journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | 405 . NA NA NA oper) that includes psychometrics. 2014. visual analog scale. a score of at least 20% on primary supervised by PT outcome measures: NPRS. and tion of reduced 8-item scale and total NDI of 13-14 differential item functioning of Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. males.lWbkWj[Z ?dj[hl[dj_ed%H[j[ij?dj[hlWb 42 Stewart et al 2007 Patients: 17 M. tertiary care teaching hospital and outpatient clinic) Abbreviations: CID. 1 red flags. validity. responsiveness Intervention: ND 200452 Site: patients recruited from outpatient clinics. validity Intervention: none chronic nontraumatic Retest interval: 2 d Site: Clinic of Canadian Memorial Chiropractic College Journal of Orthopaedic & Sports Physical Therapy® Vos et al 200649 Patients: 119 F. treatment. 2 sessions week 3. Site: different clinics items Vernon 200048 Combination of instrument review (NDI and other). standard error of measurement. VA. PT. 
plus NA Content validity (comparison of NA some impairment sincerity of effort testing items) Vernon 200846 A structured review by a single author (the NDI devel. responsiveness Intervention: none longer than 6 wk. 2. 70% WAD in past 4-6 wk. LBP. physiotherapy. SRM. NPRS. whiplash-associated disorder. translation signs. MDC. reha. etc) Westaway et al 199850 Patients: Age. PSFS. validity. RCT. No other uses without permission. NP. Rx. neck pain. All rights reserved. minimal detectable change. MVA. UK. 4. and following 1-4 w of Rx by 1 certified Canadian orthopae- dic manipulative PT White et al 200451 Patients: chronic mechanical NP 133 Validity Intervention: in treatment RCT Site: recruited from PT and rheumatology waiting lists Retest interval: before Rx initiated. SEM. Neck Disability Index. contraindicated session weeks 5. 6 cases removed because of 2 missing items on NDI van der Velde et al45 Patients: 55% or more F in 2 cohorts from previous 521 Factor analysis. VAS. For personal use only.

rather.( Quality of Studies on the Psychometric of the Neck Disability Index ?j[c. Thorough literature review to define the research question. because they were in non-English text. For personal use only. 12.18 One paper had item comparison but no NDI data. 8. and interpretation of procedures. - . 11. Valid conclusions and clinical recommendations. [ LITERATURE REVIEW ] numbers that could readily be integrated to rely on generalities when making con.2. 5.lWbkWj_ed9h_j[h_W IjkZo† ' ( ) * + . 0 0 1 1 1 0 36 Chok et al 20058 1 1 0 1 0 0 1 1 1 0 1 1 33 Miettenen et al 200430 0 2 0 0 0 0 0 0 1 0 1 1 21 Abbreviations: NA.46 Two papers had data extraction from the English abstract and study tables but no critical appraisal. 2 2 1 1 0 0 50 Bolton 20045 1 1 0 1 0 0 2 1 2 1 1 2 50 7 Chan et al 2008 1 2 0 0 0 NA 1 1 2 2 0 1 43 Humphreys et al 200220 1 1 1 1 0 0 0 0 2 2 0 2 42 Rebbeck 200736 0 1 1 1 2 . No other uses without permission. The type of data collected dur. Data were presented for each hypothesis. tended clusions. Appropri- ate scope of psychometric properties.org at University of Washington on December 1.jospt.37. The authors referenced specific procedures for administration. Ap- propriate statistical error estimates. not applicable to paper. Follow-up. ing NDI validation studies was typically into clinical practice but. 2014. Three structured reviews underwent data extraction but no critical appraisal. Specific inclusion/exclusion criteria.46 406 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . 10. * Evaluation criteria: 1. J78B. † Papers where NDI evaluations were performed while evaluating a different measure as the primary purpose. 
Riddle et al 199838 2 2 2 2 0 0 2 1 2 2 1 2 75 Lee et al24 2 1 2 2 0 0 2 2 1 2 2 2 75 42 Stewart et al 2007 1 2 2 1 0 NA 2 2 2 2 1 1 73 Pool 200735 2 2 1 0 1 1 1 1 2 2 2 2 71 Kose et al 200721 1 2 1 2 1 2 1 1 2 2 1 1 71 Kovacs et al 200822 1 2 1 2 1 1 1 2 1 2 2 1 71 Hains et al 199817 2 1 2 1 0 NA 2 1 2 1 1 2 68 Ackelman et al 20021 1 2 1 2 0 2 1 2 2 2 0 1 67 Vos et al 200649 2 2 1 1 0 0 2 2 1 2 2 1 66 Journal of Orthopaedic & Sports Physical Therapy® Vernon et al 199147 2 1 1 2 0 2 1 1 2 2 0 2 66 52 Wlododycka et al 2 2 0 0 0 1 1 1 2 2 2 2 63 White et al51 1 2 1 1 0 1 1 1 2 2 1 2 63 Goolkasian et al 200216 1 1 2 1 0 1 1 1 2 1 2 2 63 Gay et al 200715 2 2 1 2 0 1 1 0 1 2 2 1 63 Nieto et al 200833 2 2 1 1 1 NA 1 1 2 2 0 1 61 Pietrobon et al 200234 2 0 1 2 1 0 2 NA 2 1 0 2 59 Mousavi et al 200731 2 1 1 2 1 0 1 0 1 2 1 1 59 Scolasky 200739 1 2 0 0 2 . Sample size. Cleland et a 20089 2 2 1 2 2 2 2 2 2 2 2 2 96 Van der Velde et al45 2 2 2 1 2 2 2 2 2 2 2 2 96 Kumbhare et al 200523 2 2 2 2 2 2 2 2 1 2 2 2 96 Westway et al 199850 2 2 2 2 0 2 2 1 2 2 2 2 88 Cook et al 200611 2 2 2 2 0 2 1 2 2 2 2 2 88 10 Cleland et al 2006 2 2 2 2 0 2 1 2 2 2 2 2 88 Trouli et al 2008 1 2 1 2 1 2 2 1 2 2 2 2 83 Hoving et al 200319 2 2 1 2 0 NA 2 2 2 2 1 2 82 Aslan et al 20083 2 2 1 2 0 1 2 2 2 2 1 2 79 Stratford et al 199943 2 1 1 1 1 2 2 1 2 2 2 2 79 McCarthy et al 200729 2 1 1 2 1 1 2 2 1 2 2 2 79 Wlodyka-Demaille et al 200253 1 2 1 2 1 2 1 1 2 2 1 2 75 Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. Quality scores were rated in content for the NDI-related methods. 7. 4. comprised of less clinically useful data. 2. / '& '' '( JejWb Downloaded from www. Specific hypotheses. scoring. 9. so no full critical appraisal. Measurement techniques were standardized. All rights reserved. 6.34. 3. Appropriate statistics-point estimates.
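The percentage totals in the quality table can be reproduced from the item ratings. The sketch below is an illustration under stated assumptions, not the authors' scoring code: it assumes each of the 12 items is rated 0, 1, or 2, that items marked NA are dropped from the denominator (consistent with the statement that cross-sectional studies leave unequal denominators), and that totals are rounded to the nearest whole percent. The function name `quality_percent` is hypothetical.

```python
from typing import Optional, Sequence


def quality_percent(item_scores: Sequence[Optional[int]]) -> float:
    """Convert 12 item ratings (0, 1, or 2; None for NA) into a percentage.

    NA items are excluded from both numerator and denominator (assumption).
    """
    if len(item_scores) != 12:
        raise ValueError("the quality tool has 12 items")
    rated = [s for s in item_scores if s is not None]
    if not rated:
        raise ValueError("no items were rated")
    if any(s not in (0, 1, 2) for s in rated):
        raise ValueError("item ratings must be 0, 1, or 2")
    return 100.0 * sum(rated) / (2 * len(rated))


if __name__ == "__main__":
    # Rows taken from the quality table; None stands for NA.
    bolton = [1, 1, 0, 1, 0, 0, 2, 1, 2, 1, 1, 2]
    hoving = [2, 2, 1, 2, 0, None, 2, 2, 2, 2, 1, 2]
    print(round(quality_percent(bolton)))
    print(round(quality_percent(hoving)))
```

Under these assumptions the Bolton row gives 12/24 and the Hoving row gives 18/22, which match the tabulated totals of 50 and 82 after rounding.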

0. 0.48 (chronic) to 0. 0. 7 d.90 (acute)11 tation tracking using the citation index. confidence interval. 0. and 7 d. global rating of change. tween retests and the 2 standard deviation brackets around that difference. patients in acute condition. This re.9443 identified and compared qualitatively 21 d. was reported SEM Patients classified as stable. r = 0.org at University of Washington on December 1.941 properties. 1. cific validated for the neck) scales were that the Neck Pain and Disability Scale velopment is ongoing as it identified an reviewed.90 in first cohort8 tional Disability Scale. MEDLINE and CINAHL.982. 0. 0.5  344 relation coefficients (ICCs). 7 d. 7 d. r = 0. It was suggested that ficient.66 in WAD49 authors performed a comprehensive structured review with use of a formal Test-retest reliability 1 d. Again. No other uses without permission. quality ratings for indi. 1-3 d.8621 view was based on a formal search of 2 3-7 d. MDC. 3 to –3/7). in healthy controls. MDC.983 ditional papers (up to year 2000). –1. 2014. ments of systematic review were not per. standard error of the measurement. where reliability is assessed as mean difference be- group reliability.9024 databases.”34 data extractors.5 in patients with neck pain35 tectable change [MDC]). or a formal process to syn- different study populations. The instruments (and year must be read to the patient and the increased number of neck pain scales (a published) are listed in J78B. 2 d. Similarly. Mean retest difference As per Altman and Bland technique. while changes in individual patients. and the North. All rights reserved.7350 when compared to other scales.93 (acute)11 search for articles. in patients with 6 wk of neck pain 0.22 3 of the scales were similar in terms of 0.96 Journal of Orthopaedic & Sports Physical Therapy® formal quality evaluation was performed. 
the content of instruments was compared parisons between patients are virtually formed.7844 Three papers were identified where the MDC.jospt.9353 praisal. while databases chometric properties of each scale were considered “very sensitive to functional were used to identify papers. like known group differ- ences that could be used as comparative H[b_WX_b_jo :WjW. SEM.njhWYj[Z data for clinical comparisons. including use of multiple raters/ more quantitatively in a summary table.89 (acute)1 Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®.. versus more useful J78B.9049 hand searching of relevant journals. These authors concluded that 0. Eight English-language of this review suggested other scales had for the cervical spine published up un. the Copenhagen Neck Func. 1 d.971 strengths and weaknesses of the NDI 3 d. 4.49 absolute measurement error (like SEMs. 0. impossible. ci.9 mean retest differences. MDC. Abbreviations: CI. 7-14 d. 0.93 correspondence with experts to find ad..81-0. ICC. k = 0. (specific-to-neck) scales and 3 (nonspe- important limitations. r = 0.) Summary of Reliability Properties information. 0. using GRC scores of 3 to –3/7. like correlations indicating construct convergent validity. minimal detectable change. –0.92 (chronic) to 0. 19. other ele.7 with 90% CI43 IjhkYjkh[ZH[l_[mi Patients classified as stable over 1 wk (GRC. No 7-14 d. r = 0.94-0. 0.99 (chronic) to 0. 0. 8. The psy- Patient-Specific Functional Scale was total of 11 scales). 10. 5 with 90% CI in patients with neck pain48 IkccWhoe\Fh[l_eki MDC. MDC Patients classified as stable (GRC 3 to –3/7).97 (n = 30/185)31 structure and psychometric properties: Patients classified as stable on GRC. r = 0.951 in terms of content and measurement 90 d. For personal use only.9329 Five standardized neck pain scales were 7-21 d. Downloaded from www. 1.6. 
the NDI had the strength of being most studied amongst these 3 and was at that time the only instrument revalidated in The results of a subsequent systematic vidual studies.6444 more often than more useful indicators of Patients classified as stable over 1 wk (GRC. tant because it established the relative 2 d. ICC = 0. WAD. but without use of coefficients 1 d. whiplash- associated disorder. This item analysis indicated that the most journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | 407 . SEM. or minimal de- MDC.2 in patients with cervical radiculopathy10 MDC. but com. intraclass correlation coef- wick Park Scale. The first review34 is impor. 10. GRC.509 the NDI. such as intraclass cor. These included til 2004 suggested that instrument de. The authors review of studies on outcome measures thesize results. 3 to –3/7). MDC. SEM. reviewed in a qualitative manner.9211 multiple raters or formal critical ap.

and general exercise. recreation48 tion.7 š D:?mWideji[di_j_l[jej^[_j[ciYedY[djhWj_ed'$*"'$'ehf[hiedWbYWh[&$-"&$.19.C9I">7:I"DF7:"L7IfW_d"Y^hed_Y1. missing matic neck pain.19.#. and lifting.53 VAS pain. appropriate.1c_bZ"(-1ceZ[hWj["*&1i[l[h["**WdZl[hoi[l[h["-&$ š FW_d0defW_d"((1c_bZ")&1ceZ[hWj[").org at University of Washington on December 1.50. It was determined that 75% of patients identify problems with sleep. concentration. the latter being significantly better.38.jospt. The second factor on functional disability included the items personal care. headaches.e\_j[cic_ii_d]"Zh_l_d]ceije\j[db[\jc_ii_d]Zk[jedejWffb_YWXb[$21 š ?d?hWd"*(Z_ZdejWdim[hZh_l_d]"/h[WZ_d]$31 š Beea[Z\ehY[_b_d][÷[YjiXkjdejZ[j[Yj[Z$31 š .njhWYj[Z Content (including š 9ecfWh_iede\_j[ciWYheiiD:?"DFG"WdZj^[9ef[d^W][dD[Ya<kdYj_edWb:_iWX_b_jo?dZ[n1_j[ciedWbb)0ib[[f_d]WdZh[WZ_d]1_j[cied(iYWb[i0fW_d" analyses of ques. For personal use only. More than half of subjects identified difficulties with frustration. Because of previous reports of a 1-factor solution."'&53 š <WYjeh("d[YafW_d_j[ci0'"*"+"/53 š <WYjeh'"fW_dWdZ_dj[h\[h[dY[m_j^Ye]d_j_l[\kdYj_ed_d]33 š <WYjeh("\kdYj_edWbZ_iWX_b_jo š Ki_d]WdeXb_gk[hejWj_edj[Y^d_gk["'\WYjehÇfW_dWdZ_dj[h\[h[dY[e\Ye]d_j_l[\kdYj_ed_d]"È[nfbW_d[Z*(e\j^[lWh_WdY[WdZYedjW_d[Zj^[\ebbem_d]_j[ci0 pain intensity. The highest mean severity scores were found Downloaded from www. concentration.47 Factor validity Support for 1 factor š '\WYjehYedÓhc[Z. š 7kj^ehiki[ZWfWj_[dj_dj[hl_[mje[b_Y_jfheXb[ciYWbb[Zj^[fheXb[c[b_Y_jWj_edj[Y^d_gk[je_Z[dj_\o\kdYj_edWbfheXb[ci_dfWj_[djim_j^Y^hed_YdedjhWk# ness.53 š L7IZ_iWX_b_jo"F9II"C9II"DFGXWi[b_d["9IGXWi[b_d[ š L7IZ_iWX_b_jo"f^oi_YWbhWj_d]"ckiYb[ifWic"d[YaHEC"d[Yai[di_j_l_jo21 š =[d[hWb^[Wbj^"l_jWb_jo"ieY_Wb\kdYj_ed"f^oi_YWb\kdYj_ed31 š 9EI0fW_d_dd[Ya"Whci"f^oi_YWbiocfjeci"\kdYj_ed$FioY^ebe]_YWbZ_ijh[ii"^[Wbj^YWh[ki["39 a Patient-Specific Problem Elicitation Technique7. driving. reading.51 and 0.24. 
Construct.$19 š D:?Z_Zdej_dYbkZ[j^[[njh[c[ie\l[ho[Wioehl[hoZ_øYkbj_j[ci"cWdom[h[i_c_bWh_dZ_øYkbjoWdZj^[h[\eh[fej[dj_Wbboh[ZkdZWdj$24 š IeY_WbYedi[gk[dY[iWh[Wd_cfehjWdjfWhje\ikX`[YjÊii_jkWj_edj^Wj_idejc[dj_ed[Z_dD:?$1 š LWb_ZWj[Z\ehckbj_fb[eh_]_die\d[YafW_d"\kdYj_ed"WdZYb_d_YWbi_]diWdZiocfjeci$34 š .623 Construct.1i[l[h[". and symptoms. driving. convergent Correlations with other scales reported 0.70): š I<#). mobility. analyses were completed to deter- mine differences between the 1.43. and sleeping.21L7IfW_d"9IG"DFG22 š L7IXeZ_bofW_d31 Correlation with other scales reported as moderate (0. sleep disturbance.1ib_]^jbo_cfhel[Z")$-1deY^Wd]["&$(.and 2-factor solutions.1_cfehjWdjY^Wd]["/35 š H[Yel[h[Z". and recreation.F9I"I<#). Less common but mentioned by more than 30% of effects) patients were looking into cupboards. in depression.$11. emotion. š '\WYjeh2 Support for 2 factors š <WYjehWdWboi_i0(\WYjehi[njhWYj[Z[_][dlWbk[i"1.29. A scree plot confirmed a 2-factor solution as optimal.'22 š 7cekdje\Y^Wd][0'&$+"Yecfb[j[boh[Yel[h[Z1ckY^_cfhel[Z". headaches. All rights reserved. [ LITERATURE REVIEW ] J78B. 2014. The correlation between these factors was 0.17.* Summary of Validity Properties LWb_Z_jo :WjW.$+e\fWj_[dji^WZ_d_j_WbiYeh[im_j^_d'C:9Z_ijWdY[\hecj^[X[ijfeii_Xb[Wdim[h&h[l[Wb_d]deY[_b_d][÷[YjiWYYehZ_d]jeW'+Yh_j[h_ed$44 š '&%+&Wia[Z\ehYbWh_ÓYWj_edZkh_d]IfWd_i^WZWfjWj_ed$22 š CeijYeccedboc_ii_d]_j[ciedXWi[b_d[Wii[iic[djWdZ'mabWj[h_dYbkZ[Zb_\j_d]''"h[WZ_d]/"Zh_l_d]*+"h[Yh[Wj_ed'44 Internal consistency š 9edi_ij[djbo^_]^9hedXWY^Wbf^W0&$-&#&$/. cooking. ceiling/floor the highest prevalence. known Detected differences between group differences š :_iWX_b_job[l[bi0deZ_iWX_b_jo"'.70 Journal of Orthopaedic & Sports Physical Therapy® š FI<I"50DFG"DF:I":H?"L7IWYj_l_jo#Y^hed_Y"L7IfW_dWdZWYj_l_jo"WYkj[$1. work.0)53 š <WYjeh'"\kdYj_edWdZZ_iWX_b_jo_j[ci0(".*lWh_WdY[edÓhij\WYjeh11.66. personal care. lifting.30-0. 
r for VAS test and retest groups of 0.17 Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. work. gardening. driving.WdZl[hoi[l[h[".15. and frustration. housework. Individual problems ranked most important by subjects were driving. working overhead.16. No other uses without permission. role activity. Sleep disturbance had items. and sitting upright. lifting. Of the 11 problems identified by most subjects. 6 were included on the NDI.

Short-Form 36 mental component scale. Pain Coping Strategy Scale. Patient-specific Functional Scale. SF-36 MCS. range of motion. NPDS.8. mobilization. a summary of therapy. SF-36 PCS. NPQ. 408 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy .32. Short-Form 36 physical component scale. appraisal. Cervical Spine Outcomes Questionnaire. pain ent studies. DRI. Multi- country Survey Study. exercise. acupuncture. standing/running. PSFS. 10-28. common content items on neck disability ed a historical perspective. and driving. work. Neck Pain and Disability Score. MCSS. HADS. NPAD. and relaxation intensity. This publication provided the Most recently. a summary of ments.46 Abbreviations: CSQ. PCSS. visual analogue scale. 3041 š 7j(*ma"h[Yel[h[Z"'*1f[hi_ij[djfW_d"(. VAS. Disability Rating Index. Neck Pain and Disability Scale. scales were self-care/activities of daily liv.46 This review includ. ROM. scores in effect sizes with different treat. Northwick Park Neck Pain Questionnaire. Hospital Anxiety and Depression Scale. ing (ADL). mild disability.37 from the MAPI website. medication. laser. a list of translations available cervical pillow. the developer of the results from prognostic studies that use most extensive review of the NDI to date NDI published a comprehensive 17-year the NDI. and tables listing mean change but did not include independent critical review of the NDI. moderate/severe disability. psychometric properties from 22 differ. including manipulation.

and subsequently examples in these areas to make the while Wlodyka-Demaille et al53 reported defined a score of less than 15 as a recovery questionnaire more understandable for that it took a mean (SD) of 7. :_÷[h[dj_WbEkjYec[i specifically analyzed how the MDC varies tic equivalence. IkccWhoe\Fh_cWhoIjkZ_[i French (Canadian/Switzerland/ validated. experiential equivalence.34. 2014. is considered “complete disability. sure” and “social activities” had differ.53 Through a series of utes (range. for level and reliability was excellent. demonstrated a ceiling effect when using Ackelman and Lindgren1 changed each inally. patients defined this population. reading. All rights reserved. ().”47 Orig. com. 15 dence to support appropriate retest re- Not all translations have been to 24. as they may have attained a the items. and 20 points. However. their status.9 for scores that fall in the “severe disabil- a high rate of nonresponse in an Iranian provide a percentage score as a strategy to ity” interval. all agree that only a short as having scores of greater than 15 (30%). if a subject population. where reliability co- The developer reported that he worked plete disability. Finnish. (Australian/UK/US). 16% of patients had items required for validity. Possible reasons translation methodology (although the “normal limit” of the NDI between 0 for variation in reliability between stud- psychometric testing was not performed). ac- These translations included English behind setting this benchmark was not tual difference in clinical subpopulations.31 Overall readability in the account for questions that are left unan.” Some authors report that it comprehension difficulties. NDI. cess retained the sound measurement 5 minutes to analyze the (Dutch) NDI. and definitions of “stable.38 For example. recovered as having NDI scores of less addressed issues around readability. and none have crepancies in 4 areas. 
liability of the NDI in both acute and published in peer-reviewed literature.8) min. Wlodyka-Demaille measured the time taken to complete the those with moderate to severe disability et al53 reported that the concepts of “lei. Sterling et al40 used data from Readability/Language and Cultural Germany). 5 to 14. those with mild disability as usually in the context of language and specifically address or state how they having scores of 5 to 14 (10%-28%).30 Again. Vernon and Mior47 offered the follow. no disability. near maximum score. sion in the Korean version. although some Lindgren. Nederland32 found that pa- ent meanings in French and American al43 reported that the time for patients to tients defined as recovered at 24 weeks Downloaded from www. Danish. it may be difficult to deemed to be acceptable. and cultural translations. described for how these ratings were de.jospt. In the same study. cultures. cutoff. se. For personal use only. no process was efficients above 0. as well as ers identified that the driving item had of a total of 50. Reliability There is considerable evi- lesser extent.46 the extreme ends of the scale. Lee et a single report of therapist burden stating 28 or greater. ?dj[hfh[jWX_b_jo%IkX]hekf_d]e\ cally addressed this issue. where 0 is over the spectrum of possible NDI scores. with a severe pathology scored 48 out English and all translated versions is swered. and nonresponse is more com. NDI scores vary from 0 to 50. One group of authors set reliability9. it centration. clinical studies to define patients who had Translation A number of papers have Spanish (Spain/US). Stratford et Conversely. to a ing interpretation of NDI scores: 0 to 4. chronic populations. or minimum number of items regarding “personal care” and “con- Journal of Orthopaedic & Sports Physical Therapy® the Spanish version. the score may not be valid. 
and conceptual considered “no activity limitation” and 50 Riddle et al38 reported that 6% of patients equivalence.11 In the Swedish version.90 have been reported with an independent organization (www. but these did has been suggested that if 3 or more items is difficult to observe change in scores at not appear to be related to educational are missing. Polish.10 In the study by Ackelman and of 50 on the NDI. Norwegian. that is. Cook et al11 questionnaire could be completed in the as they represent patients for whom ensured good readability of the Brazilian/ waiting room and thus did not add any pain and disability estimates may be Portuguese version of the NDI through additional time to the patient’s visit. 25 to 34. No other uses without permission. invalid and for whom changes may not Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. a committee who reviewed the translator be measurable.18 Floor-Ceiling Effect Floor and ceiling properties of the original English ver.” journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | 409 . that it takes 8-10 minutes to complete and cutoff of 20/50 at 3 years. amount of time is required.1.4 (6. mon for task items like driving and. mild disability. and greater than 35. The majority of authors report the NDI out both “low” or “normal” scores. made by the developers on the handling al49 reported a floor-ceiling effect on the ability due to neck pain was of interest. however. the subjects were not included. For that reason. Stratford et al43 commented that the effects have practical clinical relevance. whereas Hains et al17 and Vos et item of the tool to specify that only dis.1 if more than 2 items were left objectively distinguish further decline in patients require support to understand unscored. they provided complete the NDI was about 3 minutes. including seman. idiomatic equivalence. some authors1. in a number of studies. Administration Burden Few authors than 4 (8%). 
In of missing items. Few studies have specifi- reports. More recently. recent proqolid. and this cutoff has not been study process. There was only as having persistent pain had a score of translating and back translating. vere disability.org) to produce additional rived and no validation of these categories large high-quality studies indicated lower translations through standardized was performed.2 Oth. Italian. the methodology ies are random differences in samples. They reached consensus on dis. In contrast. had a score of 14 or less. However. described. Miettinen30 used a recovery al24 concluded that their translation pro. 1-60 minutes). no explicit recommendations were the NDI.org at University of Washington on December 1. moderate disability.10 (J78B.
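The scoring conventions described in this passage can be sketched in a few lines. The interpretation bands are those attributed to Vernon and Mior (0 to 4, 5 to 14, 15 to 24, 25 to 34, and greater than 35), and the percentage score follows the strategy some authors use for unanswered questions. Two points are assumptions made here for illustration: a score of exactly 35 is placed in the top band, because the published bands leave it unassigned, and a questionnaire with 3 or more missing items is rejected, which follows a suggestion reported in the text rather than a developer recommendation. The function names are hypothetical.

```python
from typing import Optional, Sequence

N_ITEMS = 10      # the NDI has 10 items
MAX_ITEM = 5      # each item is scored 0 to 5, for a raw total of 0 to 50
MAX_MISSING = 2   # assumption: 3 or more missing items makes the score invalid


def ndi_percent(items: Sequence[Optional[int]]) -> float:
    """Percentage score over answered items; None marks an unanswered item."""
    if len(items) != N_ITEMS:
        raise ValueError("the NDI has exactly 10 items")
    answered = [score for score in items if score is not None]
    if N_ITEMS - len(answered) > MAX_MISSING:
        raise ValueError("too many missing items for a valid score")
    if any(not 0 <= score <= MAX_ITEM for score in answered):
        raise ValueError("item scores must be between 0 and 5")
    return 100 * sum(answered) / (MAX_ITEM * len(answered))


def interpret_raw(score: int) -> str:
    """Map a complete raw score (0 to 50) to the Vernon and Mior bands."""
    if not 0 <= score <= N_ITEMS * MAX_ITEM:
        raise ValueError("raw NDI scores range from 0 to 50")
    if score <= 4:
        return "no disability"
    if score <= 14:
        return "mild disability"
    if score <= 24:
        return "moderate disability"
    if score <= 34:
        return "severe disability"
    return "complete disability"  # assumption: 35 itself is included here


# Hypothetical patient who skipped two items (for example driving and reading).
responses = [3, 2, None, 1, 4, 2, 3, None, 2, 1]
print(ndi_percent(responses))
print(interpret_raw(14))
```

As the text notes, these bands were never validated, so the labels are descriptive conveniences rather than established cutoffs.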

Improved 0.72)9 with acute and chronic neck pain.17 estimate for MDC is around 5/50 (10%).67 –0.53 It was also reported by Hains et adequate convergent validity has been detected between the Hospital Anxiety al17 that the single pain item or the total demonstrated for a variety of measures and Depression Scale and the NDI indi. making it difficult to deter.50. mine why variations in reliability exist. analogue scale (VAS) pain ratings. Summary of Responsiveness.91 study sample. to CID CID: 5 points43 CID: 7 points10 0.77 0. In a study of the Neck Pain and Disability Index response to botulin toxin Validity Construct convergent validity measured by Cohen d was 0.87-1.49 De.25 0.53 Others provided some FioY^ec[j_YFhef[hj_[i :WjW.I IHC in a different order have been exception. reported when the retest was based on a 7Ykj[ ?dikhWdY[BWj[ single occasion but administered in dif- ferent languages31 or presented questions . 84% CI42 acute condition when the test-retest in.46.99 in patients Downloaded from www.16.0 12. 0. high-quality ES (total cohort): 0. tients’ perceived pain and psychological purported constructs (J78B.55 0.24. Deteriorated –0.66 points for their Overall 0. weaker correla.93 0.60/0.40 for former and 0.77 the lowest comes from Voss et al. J78B.95 1. ES or SRM ES: 0.17.98 on NDI versus 8-10 Park Neck Pain Questionnaire. the Northwick Exercise 0.03 1.1. For example.66 ally high (0.53 The Patient- Manipulation 1.91.1-1.77.77 0.jospt.2/12.1.19.I IHC comes from Cleland et al. the more common Overall 1. convergent validity status. the Neck Mobilization 7 Acupuncture 0.3/–4. 84% CI24 SRM: 1.53 Lower was reli. These Mean change –4. ES (improved cohort): 0.11. specificity. cates that there is a link between the pa. [ LITERATURE REVIEW ] Some studies did not define these impor- tant factors. respec. *). ES: 1.njhWYj[Z information on acuity and found test-re.85-0.68). Responsiveness test reliability varied from an r of 0. 
The 410 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy .88 at 4-6 wk5 tively.16 tients with recurrent neck pain.25 –0.I IHC .93. Short term 0.5/4. CID: 19 (sensitivity.92 7 Pain and Disability Score.20.org at University of Washington on December 1.2 points for their patient Stable –0.47.04 0. SRM (improved cohort): 1. For personal use only.1.0 studies based their definition of stable on ES –0.92 Long term 1.53 which is consistent with the similarity in their tions were observed with less similar in.73 0. many studies did not sepa. 84% CI24 ability reported for individuals with an ES (total cohort): 0. Change24 ES SRM spite a range of 2 to 10. No other uses without permission. 84% CI42 tration.15/0.93 Journal of Orthopaedic & Sports Physical Therapy® an MDC of 10. score of the NDI both predicted visual with related constructs.19.51. 2014.44 for patient and physician GRC16 has been established between the NDI and a variety of other questionnaires In a review of randomized controlled trials using the NDI46 that can be used for patients with neck .47. For example.805 studies by Cleland et al9.24.73 0. Administrative rate patients with acute or chronic neck pain.90).50 or 0.34 population with cervical radiculopathy. 0. 84% CI42 terval was 72 hours from initial adminis. and an r of 0.)).+ Cultural Adaptation. ICCs Change in 2 cohorts36 (similar to CWOM or FRS) Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®.11. Only a few studies presented the SEM Change in group after 11 mo (no standardized Rx during gap) (compared   DF:?"DFG"L7IfW_d"L7I^WdZ_YWf"ceh[i_]d_ÓYWdjP value than all but or the MDC.89 to 0.83. The developer suggests that   DFG3 MDC be 5 points.48.49.24/–0. Although dices.10 suggested that Mehi[ KdY^Wd][Z ?cfhel[Z reliability may be lower than generally    FW_d%:_iWX_b_jo FW_d%:_iWX_b_jo FW_d%:_iWX_b_jo reported (ICCs of 0.95.50 Two well-powered. 
and Disabil- Medication 5 ity Rating Index have the strongest cor- relation with the NDI. Response Burden.I 9^Wd][_dD:?IYeh[i%+& pain (J78B.38.10 who reported Improved 0.38.67 a –3 to +3 global rating of change.50. All rights reserved.40 and 0. The highest estimate .04 1. 84% CI42 SRM (total cohort): 0.1 2.49 who Change42 ES SRM reported an MDC of 1.80/0. which included stable pa.6 9-10 Specific Functional Scale.
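Because the review contrasts reliability coefficients with absolute error indices, it may help to see how the two are conventionally linked. The sketch below uses the standard relationships SEM = SD × sqrt(1 − ICC) and MDC = z × sqrt(2) × SEM, with z = 1.645 for 90% confidence. These are textbook formulas, not calculations taken from any study in the review, and the SD and ICC values in the example are hypothetical.

```python
import math


def sem(sd: float, icc: float) -> float:
    """Standard error of measurement from a sample SD and a reliability coefficient."""
    if not 0 <= icc <= 1:
        raise ValueError("the reliability coefficient must lie between 0 and 1")
    if sd < 0:
        raise ValueError("the standard deviation cannot be negative")
    return sd * math.sqrt(1 - icc)


def mdc(sem_value: float, z: float = 1.645) -> float:
    """Minimal detectable change; z = 1.645 corresponds to 90% confidence."""
    return z * math.sqrt(2) * sem_value


# Hypothetical sample: SD of 8 NDI points and an ICC of 0.90.
error = sem(sd=8.0, icc=0.90)
print(round(error, 2), round(mdc(error), 2))
```

The same ICC yields a larger MDC in a more variable sample, and a lower ICC inflates the MDC for a fixed SD, which is one reason MDC estimates can differ widely between studies.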

and in 7 min 59 s (1 min 26 s) by those with low one (p . rate concepts across different pathologies German.22 Iranian.   š F[Whiedr.88. For personal use only. items are positively correlated with the Summary of Responsiveness. 70% Nieto et al33 conducted a factor analysis With COM. French. 0.83 change scores50 ing” and “functional disability.839 function) in some subgroups.36 0. 0. Italian. SF-36 physical.5.733 the NDI (n = 521 patients) was available Response burden Time to complete: 9 min. No other uses without permission. 0.21 4 min.21 Spanish.84). Correlation of English and Turkish NDI the r for test and retest was 0. Italian for Switzerland.5450 Correlation to PSFS.77). VAS pain. English for the UK.344 and interference with cognitive function- Correlation to clinicians prediction of change. Translated into Greek.49   š HE9Ykhl["&$-.73-0. total score (0. All rights reserved. Wlodyka-  m_j^ej^[hY^Wd][iYeh[i š =F.88 tions about pain and disability treated as Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®.66 and A highly powered Rasch analysis of 0. the items Administrative (continued) of “work” and “driving” had the strongest correlation (0. ICC = 0.5 (pooled).54-0.+ Cultural Adaptation. Spanish for the United States and samples.31 Portugese-Brazilian. and items of “personal FioY^ec[j_Yfhef[hj_[i :WjW.com: English for Australia.” suggesting that the   š 9ehh[bWj_edD:?Y^Wd][jeD8G&$. tent to which pain and disability are sepa- Dutch.76 Correlation to Disability Rating Index.33). 0. Norwe- gian. specificity. French Canadian. r = 0.601 found 2 factors that they termed “pain Correlation of NDI change to GRC change.95. Finnish. English for the United States. Interitem corre- J78B. Portuguese. lations perform similarly well. –0. Spanish for Spain.njhWYj[Z care” and “headaches” had the weakest Longitudinal validity correlation Comparisons: correlation (0. 
not clear how/if for appraisal.45 “fulfilled in 6 min 08 s (54 s) by those with middle-high cultural level. 0. 3.$49 ity” and “Neck pain. ROC. German for Switzerland.jospt.WdZL7I&$+ scale may have 2 dimensions (pain and AUC.38.or 2-factor effect has been observed on other Cultural adaptation Language/cultural translation/validation Turkish. Similarly.86. 2014.proqoulid. with input from the developer44 one-dimensional and may reflect the ex- Translations available on the MAPI website www."&$*&"/+9?53 Demaille et al53 extracted 2 domains in   š 7K9"&$-/"/+9?53 the French version. MID 10. Response Burden. “Functional Disabil- Downloaded from www.11 French53 brief disability scales which include ques- Agreement between translated version 69% identical answers.org at University of Washington on December 1.17 Conversely. 0. sensitivity. 90%.” This 1. based on a manuscript now measured)29 in press that was shared by the authors. VAS in 150 patients with whiplash injury and activity. Danish. 0.22 5 min (stated. Polish. French for Switzerland.

sional scale. NDI. suggesting and chronic neck conditions. an acute population. indicating that some items did not have the NDI pertaining to pain and activity in tural validity or dimensionality. In particular. Pain-Specific Functional Scale. operating characteristic curve. model. with neck pain. visual analog scale ably interpreted as numbers. that the NDI items did not contribute to from a musculoskeletal or neural source. Overall.001)”2 An advantage of this approach is that it Journal of Orthopaedic & Sports Physical Therapy® formally assesses whether an instrument Adminstrative burden 5 min to analyze18 satisfies rules for interval scale mea- Abbreviations: AUC. However. NDI to the model and that 4 items had sions of the NDI. COM. relation between the VAS and items on vides a relatively weak indication of struc. Items man and Lindgren1 included individuals ally exceeded . this approach is Northwick Park Neck Pain Questionnaire. Hains et al17 observed that all on headaches did not fit with the trait journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | 411 . GRC. ROC. NPQ. SRM. Items on changing the order of the questions does sional disability instrument for patients headaches and recreation had significant not change the validity of the tool. global perceived effect.85. This study determined that there was a lack of fit of same study administered 7 different ver. Dif- demonstrated that there is a higher cor. CWOM. which is consistent with on lifting. Some studies different person estimates. Rx. and work also with acute and chronic conditions. scales. GPE. Unidimensionality tests showed that dif- of the NDI has demonstrated validity in tor analysis in a small number of studies ferent subsets of items gave significantly evaluation of pain and disability in acute with inconsistent findings. personal care. this statistic alone pro. and concluded that The NDI was developed as a unidimen. neck disability index. 
tions are ordered and might be reason- standardized response mean. clinically important difference. receiver operat. SF-36 physical. disordered response thresholds. CID. 0. global rating of change. the content validity has also been evaluated using fac. core surement by examining fit with a Rasch outcome measure. Short-Form 36 physical component scale. PSFS. The item and whether stemming from a traumatic 2 factors. able to determine whether response op- ing characteristic. ferential item functioning was detected. while others suggest it has a single underlying construct. VAS. Structural the same meaning across subgroups. Neck Pain Disability Index. treatment. NPDI. and values reported for other unidimensional demonstrated disordered thresholds. deviation from model expectations. core whiplash outcome measure. Cronbach’s B has gener. or nontraumatic injury. ES. Ackel. whether have supported the NDI as a 1-dimen. effect size.

and how “sta- Neck Pain and Disability Visual Analog Scale 1999 ble” is defined within a given time win- Functional Rating Index 2001 dow.47 When neck NDI is stable over short test-retest time important to consider when designing pain is of musculoskeletal origin. 9[hl_YWbIf_d[EkjYec[iGk[ij_eddW_h[(&&( While it is important to understand short- M^_fbWi^:_iWX_b_joGk[ij_eddW_h[(&&* term and long-term stability of clinical measurements. Even if a patient’s severity of Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®.jospt. reliability studies ies calculated the clinically important dif.90). Only a few studies presented more cal recommendations regarding its use. They concluded within the limits prescribed by the avail.10. cal studies. be appropriate for consideration when compared to a 7-point change in patients tervals. Therefore. Some studies have used as much as Extended Aberdeen Spine Pain Scale 2001 a 5-week interval in patients defined as 8ekhd[cekj^D[YaGk[ij_eddW_h[(&&( stable to determine test-retest reliability. Conversely. ate self-reported changes are defined as clinically relevant indicators of measure- 412 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . although it is less clinical research studies and reliability can be said to have occurred when there likely that patients defined as stable will studies.10 researchers often assume that most un. when using the NDI support may contribute to alterations in ering the typical reported effect sizes or to evaluate clinical change in individual perceived disability over a 5-week period. tant to consider that these variables do indicates that the NDI has good ability to ences between known subgroups. pose. ICCs English Language (Specific-to-Neck) and 3 reported when the test-retest was based J78B. For personal use only. changes in Responsiveness The NDI appears to properties will vary according to pur. patients. it is impor- the improved cohort (J78B. 
whereas shorter intervals may is a change of greater than 5 points. definition. No other uses without permission. Conversely. to 0. [ LITERATURE REVIEW ] stable in this analysis.42 This discriminative validation that tests differ. pain or task difficulty remains consis- these requirements and provide interval. (Nonspecific Validated for the Neck) Scales on a single occasion. chometric properties supporting use of tent over the course of several weeks. a CID intervals (0-3 days). In longer test-retest in. +). Journal of Orthopaedic & Sports Physical Therapy® standardized response mean (SRM). All rights reserved. with the NDI ad- Reviewed in a Previous Systematic Review 37 ministered in different languages31 or different question order. the absolute measurement error Recalibration of the disability experience varying from 0. However. affect scores as test-retest intervals are that the NDI is not unidimensional. would contribute to measurement error in clini- detect change over time. or social have good responsiveness when consid. it should be appreciated that an increasing number of factors may captured by other items. in different clinical subgroups.5. factors like coping. a form of test-retest intervals. the NDI to differentiate different levels of timates of reliability in studies with larger orders. For example. Patient-Specific Functional Scale 1998 Copenhagen Neck Functional Disability Scale 1998 neck pain.60 when measured in the and responsiveness should be considered in the absence of changes in pain intensi- total cohort of subjects with “mildly” dis.95 when measured in only disability. by to their situation and use. patients with small to moder. re-evaluations.43 have stable scores in longer test-retest in. importance of different psychometric external references. when using ty is a plausible explanation for lower es- abling chronic whiplash-associated dis. Overall.org at University of Washington on December 1. Thus. be more important. Only a few stud. 
This suggests Dehj^m_YaFWhaD[YaFW_dGk[ij_eddW_h['//* that lower test-retest reliability across Disability Rating Index 1994 longer test intervals may reflect changes in scores due to the episodic nature of Downloaded from www. early recovery. self-efficacy. mediated by psychosocial influences and item version and had similar correlations eletal or neurogenic origin. most relevant. NDI and was able to provide some clini. the level measurement of neck disability. but able evidence. 2014.23. The relative calibrated against multiple internal and to measurements of neck pain intensity. In short-term test-retest intervals estimating error across shorter clinical with neck pain of neural origin.43. known group validity. that a revised 8-item NDI would satisfy to strong evidence for a spectrum of psy. The the NDI in patients with acute or chronic experience of pain and disability will be 8-item NDI was consistent with the 10.. neck pain with symptoms of musculosk. This is with different test-retest intervals allow- J his study synthesized current research in 37 studies addressing often based on a –3 to +3 score on global ing users to select estimates most similar the psychometric properties of the rating of change instrument. estimates that reflect different definitions tervals it is more necessary to establish of stable. The reliability data cur- treated patients remain relatively stable rently published in the literature provide :?I9KII?ED over 1 to 3 days. have been ex- Neck Disability Index 1991 ceptionally high (0.+). there is moderate extended. the patient has remained stable. There is adequate evidence that the with longer test-retest intervals may be ference (CID) (J78B. Thus.
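The two responsiveness indices named in this discussion differ only in their denominator, which the following sketch makes explicit. It uses the conventional definitions: effect size (ES) is the mean change divided by the standard deviation of baseline scores, and standardized response mean (SRM) is the mean change divided by the standard deviation of the change scores. Change is taken as baseline minus follow-up so that improvement on the NDI is positive; that sign convention, the function names, and the example scores are all assumptions made for illustration.

```python
from statistics import mean, stdev
from typing import List, Sequence


def _changes(baseline: Sequence[float], follow_up: Sequence[float]) -> List[float]:
    """Per-patient change, positive when the NDI score decreases."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must be paired")
    if len(baseline) < 2:
        raise ValueError("at least two patients are needed")
    return [b - f for b, f in zip(baseline, follow_up)]


def effect_size(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """ES: mean change divided by the SD of baseline scores."""
    return mean(_changes(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(
    baseline: Sequence[float], follow_up: Sequence[float]
) -> float:
    """SRM: mean change divided by the SD of the change scores."""
    changes = _changes(baseline, follow_up)
    return mean(changes) / stdev(changes)


# Hypothetical NDI scores (out of 50) for three patients.
before = [20, 30, 40]
after = [10, 25, 30]
print(round(effect_size(before, after), 2))
print(round(standardized_response_mean(before, after), 2))
```

When patients change by similar amounts, the SD of change is small and the SRM exceeds the ES, so the two indices should not be compared directly across studies.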

For personal use only. Despite a range of 2 to 10. Further. This reviews only high-quality studies are research and practice. The most recent sug- general. making the scale more amenable to that can be applied across different cul- reliability estimates. was not an outcome measure. responses. estimates are highly variable review incorporated critical appraisal. There was agreement across studies on the findings from high-quality studies. 10.4 Due to the brief must be taken within the context of the cient and the sample variability. Neither previous systematic these variations are reflected on the NDI responsive. Although the first author of this review It is certainly true that no other in- ber of fully powered studies that vary on has addressed the critical appraisal of strument has undergone sufficient de- factors such as the type of intervention. tine evaluation of neck disability. Ceiling/floor effects have recommendations.10 who reported an MDC of for confusion.50 which is able to sample ness to detection of clinical change. ently subjective elements. any suggestions of that the NDI is easy to read and under. although pose. then scores have a substantial impact on our results. introducing new instruments into clini- different neck disability scales indicated Therefore. While we tried to make the process gestion is that an 8-item version of the journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | 413 . method to synthesize the extracted psy. ment error. suggesting that a num. it is important to patients who score at the extremes may designed to comment on whether the see how the instrument performs across benefit from supplemental scales. The de. like the Patient-Specific Func. change for patients with this type of clini. like NDI is the most responsive neck scale.26 there is no clear NDI as the instrument of choice in rou- nature of the neck condition will be re. as study results score and based on the reliability coeffi. 
The reasons for this ing worsening or improvement. No other uses without permission. and pur- veloper of the NDI suggests that MDC is 5 minimal administrative burden. individual studies by developing a tool velopment or validation to replace the length of follow-up. estimates come from studies with lower struments. interventions. chometrics and usefulness by adapting the NDI itself should be balanced with as well as in subsequent translations. and around 5/50 (10%). tural or language subgroups. although either larger standard changes are relevant at these ends of the Overall. the fact that it has been translated into a Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®. although evidence review did not attempt to evaluate differ. this consideration. of 40 or higher and 10 or lower might as the majority of translation and valida- the more common estimate for MDC is be considered problematic for detect. the NDI has study population. making it difficult to synthesize re- points. the NDI has a number of fea- deviations or lower reliability coefficients scale. respec. we were able to extract data from English large range of MDC values are largely tively. These include its brevity and is not surprising that the larger MDC supplementing the NDI with other in. there are inher- being expressed in unit of the original dures for valid translation and demon.66 for their study. 2014. teria for synthesis process for psychomet. All rights reserved.49 who reported of what constitutes a ceiling or floor. strated equivalence. the SEM or the MDC.jospt. in clinical situations appears to be appro. varies across the subgroups and how Journal of Orthopaedic & Sports Physical Therapy® While the NDI has been shown to be ric studies. this cross-validation is well under way. A second limitation in our review is with cervical radiculopathy. none of the studies that per. 
In some systematic instrument is suitable for both clinical in different clinical circumstances.org at University of Washington on December 1. among studies. We don’t expect this limitation to included stable patients with recurrent points (and up to 10 points). For these reasons. and the low. in both its original English format. different contexts and purposes. Across the different studies. which one assumes that the MDC is usually at 5 French. If full-text papers written in only English or an MDC of 1. unclear. The quired to provide more precise estimates chometric evidence. est was from Voss et al. MDC tional Scale. when evaluating and measurement principles suggest that ent instruments and. The highest estimate comes from scoring variations can create potential sults from individual studies into global Cleland et al. as objective as possible. Although the value of 5/50 or 10% commonly used cal presentation. One limitation of our review stems future studies may focus on whether the priate for most clinical comparisons that from the lack of agreed upon quality cri. both least some of the recommended proce. pain/disability experience in neck pain tend to occur over 2 weeks or less. clinicians should consider cal utility.10. a mechanism to place greater emphasis adaptation and knowledge translation. change to different instruments or in stand. comorbidity. We would also suggest that within tures that suggest that it has good clini- can increase the value of the MDC. and the for this purpose. neck pain. We summarized the information on psy. Despite variability. more. Evidence suggests that smaller abstracts in non-English text. published translations used at ers. In and expanding a framework used by oth. we rank ordered studies by cal practice and research requires tre- that another instrument was superior to quality to allow the reader and ourselves mendous efforts in translation/cultural the NDI in terms of responsiveness. However. 
the Patient-Specific Functional Scale. However. It variations reflect the effects of test-retest items that are of high or low difficulty. number of languages and its responsive- ICCs. although there is no consistent definition that the scope of our search retrieved Downloaded from www. there are no levels of evidence that It is an important consideration that formed head-to-head comparisons of create clear categories for study quality. synthesized. so it these ranges. is important to have outcome measures interval and the definition of stability on thus.2 points for their patient population been suggested as a potential problem. tion articles were printed in English.25. therefore. nature of the questionnaire.

valid.and long.24. guese. The NDI should be scored out of 50 ability benchmarks have not been suf- establish valid benchmarks. The CID is approximately 7 points. chronic conditions. sites provide translations in English developers. 40 range.BEFC. languages: English. moves to improve and CID in different subgroups are for Australia. commercial cian to seek out these tools by contacting 2. For personal use only. No other uses without permission. The MDC is accepted as 5 points. French Cana- While the NDI may be considered the prognostication/outcome evaluation. French for Switzerland. as well as those Scale should be considered. Published cultural validation has been as these are not usually published with state hypotheses that are to be evalu. properties in different clinical groups ric was used. but varies considerably across stud- the NDI captures all of the important 4. metric property being analyzed. well as the expected outcome. Swedish. Studies that determine SEM. cau.1. and up to 10 points might enhance performance. Qualitative studies and cognitive be set for a minimum 7-point reduc- Perhaps. original pilot testing was conducted on a official manuscripts related to the 4. ficulty of obtaining official translations. suggested that removal of items might im. Studies that compare the proposed re.1. Others have return to work. 9ED9BKI?ED%A. 3. thus. and respon. French. French.49 tion should be exercised when presenting studies are required to determine the 5. benefit clinical practice. would inform our understanding of 6. Spanish for Spain.45 <KJKH. there are interviewing that evaluate how pa. ies and may be up to 10 for cervical concepts for patients with neck pain or marks that clinicians could use with radiculopathy. including the specific psycho. The NDI is reliable. and in the literature. German for Switzerland. it is the responsibility of the clini.jospt. 
variations visions to the 8-item NDI in different timeframe where this change could in scoring exist in the literature.D:7J?EDI<EH percentage score used to adjust for un- and can be considered as interval data.L. Because the NDI was not de. When the allow clinicians to provide more accurate 1. respectively). are needed. and Portu- script. baseline score is outside of the 10-to- prognosis and outcome evaluation. as Dutch.org at University of Washington on December 1. Danish.:.45 Currently. English for the UK. dian. Because these are not ated.9ECC. term goals. and to communicate more suffering from neck pain associated 7. clinical subgroups and the impact of dif. Future studies on the psychometric clinical reports to ascertain what met- a practical issue for clinicians is the dif. a ficiently validated nor occupational 414 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy . Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®.53 In addition. very small sample. 5. These sive in numerous patient populations. Finally. cervical radiculopathy. All rights reserved. Patients who score in the range of ei- the work on MDC and CID is sparse and self-reported disability as reflected ther 40 to 50 or 0 to 10 on the NDI inconsistent. tients respond to items of the NDI benefits. 6. However. English for the United the accessibility of translations would needed to resolve confusion in the States. Iranian. should be considered as approaching a and important difference for different ceiling/floor effect (45/5.49. veloped using a clinimetric process and by interested users when publishing and Spanish for the United States. goals should require a minimum of 5 their relative priority. Norwegian. Clinicians may relate an NDI score effectively with payers. published with the validation manu. MDC. and points change for whiplash-associated the possibility that the addition of items targets for important milestones. 
it is easier to change to a reduced-item Caution should be used when reading instrument than a different one.43 ative data and benchmarks. Downloaded from www. Importantly. supplementation of the NDI would be important in assisting clinicians including patients with acute and with a Patient-Specific Functional in using the NDI to set short. “gold standard” among outcome measures. longer-term treatment goals should are out of 50 points or 100%. mechanisms for these to be obtained ish.10. are warranted and should specifically 3. most importantly. tion in score to demonstrate treatment gaps in defining clinically useful compar. This leaves open sus different levels of disability. For example. Pol- is needed. Finnish. clinical contexts and in longitudinal be anticipated (2 weeks). this should be re-evaluated within a Journal of Orthopaedic & Sports Physical Therapy® prove performance. literature and provide more accurate Dutch.OFE?DJI making it difficult to detect subsequent ferent prognostic variables on these would worsening or improvement. the with musculoskeletal dysfunction. for whiplash-associated disorder III. 1.23. performed for the NDI in the following validation studies. Ital- this review suggests further investigation ity of their translations and provide ian for Switzerland. to the subjective categories reported current benchmarks set for normality whiplash-associated disorders.D:? by 2 to restore the expected metric. 2014. Portuguese. Short-term therapy weighs pain and disability according to confidence to indicate “normal” ver. [ LITERATURE REVIEW ] NDI performs as well as the original NDI H.43 a statement reflecting that these dis- and specific data analyses are needed to 2. as recommended by the developer.1. but should include and for levels of disability are arbitrary.11. it is not clear whether translation process. Italian. Defining detectable change on the NDI. so or comparing scores as to whether scores need for this revision. Korean.48. 
like disorder I and II.10. German. Studies that determine NDI bench.DJ answered questions must be divided An advantage of this suggestion is that E<J>. Authors should consider the availabil.

doi. Bolton JE.1016/j. for psychometric articles. Chiro Med. http://dx. disability associated with whiplash-associated whiplash patients: usefulness of the neck dis- ing clinically significant improvement. Karaduwman A.1097/01.jmpt. Refshauge KM. of 4 neck pain and disability questionnaires. ability index.0b013e31817144e1 patients with chronic whiplash. All rights reserved.0b013e31815ce6dd BRS. Downloaded from www. Gretz SS. 2000.doi. 2007. 2008:389-392. Mior S. 2007.30:259-262.30 Vernon and Mior48 suggest ian Portuguese version of the Neck Disability M. Zilvold G. Predictive value of fear avoid- MB.19:1307-1309. Leino E.org/10. 5 and 14 mild disability. [Validation of a Spanish version pain and disability scale: test-retest reliability and naires to predict long-term health problems of the Neck Disability Index].31:1841-1845.53069. Psychometric proper.$ Makela M. Lindgren KA. Spine. have a score greater than 30. Prevalence. Lindgren U. Spine.doi. Risk guide.doi. Hobbs H. Neck disability index Dutch 2007. et al. 2003. eds. Arch 2000. Green S.39:358-362. and Phys Med Rehabil. Hoving JL.1186/1471-2474-9-42 ). No other uses without permission. M. Whitman JM. Documenting out. (.doi. J Manipulative al.org/10. Minimal clinically important change of math. ance in developing chronic neck pain disability: adaptation of self-report measures.52 2008. Evaluation of the core outcome measure in and Numeric Pain Rating Scale in patients with et al. http://dx.2340/16501977-0060 Vet HC.” has been suggested by Miet. construct validity. Comparison G. Hermens HJ.9:42. The reliability of the Vernon and Mior neck of the Neck Disability Index and the Neck disability index.03. Madson TJ. Spine. Subjective outcome assessments apmr. Cieslak KR.doi. SAS Global Forum 2008. The clinimetric qualities of patient. Rehabil. J Manipulative Physiol Ther. interpretation and 24 moderate disability. Thorofare. Oder G. Schipholt HJA. the Neck Disability Index and the Numerical  .doi. 
Validity and reliability patients with chronic uncomplicated neck pain. Neck pain in the comes in neck pain patients. (+$ MacDermid JC. Sievers K. Med Clin (Barc).brs. http://dx. Fritz JM. (*$ Lee H. DC.$ MacDermid JC. et al.1097/ Physiotherapy Singapore. which represents “none to mild ''$ Cook C. Evidence- vere disability.4:113-134.jht.27:515-522. San Antonio. Stratford PW. chbinder R. evaluation form. Silcocks P. Yagly N.34:284-287. 2001. Grevitt MP.112:94-99. on outcome measures to hand therapy practice. Turkish patients with neck pain. BMC Musculoskelet Disord. 2008:387-388.org/10. ated with chronic. http://dx. those with mild assessed instruments for measuring chronic J Hand Ther. Disability in subacute measures in patients with neck pain: detect.D9.doi. '($ Croft PR.21:75-80. non-traumatic neck pain. Wheeler AH.doi. J Rehabil Med. Spine. Balsor B. Thorofare. Spine.doi.1016/j. Nederlands Tijd. determinants.75367. general population. et  )$ Aslan E.0b013e31815cf75b  /$ Cleland JA. Bouter LM. Disability Scale for measuring disability associ. mechanical neck pain. scale in patients with cervical radiculopa. of the Neck Disability Index in physiotherapy.32:3047-3051. http://dx. Asman S.22 org/10. org/10.doi.org/10.2008. that a score between 0 and 4 represents 2006. thy.102:273-281.org/10.32:E825-831. Palmer JA. those with moderate to severe disability org/10. Ferraz schrift Voor Fysiotherpie.$ Chok B. Disorders. MacDermid JC. http://dx. 2008. The neck The possibility to use simple validated question- mecija Ruiz R.doi. Am J Epidemiol. with neck pain: a Turkish version study.jospt. 2002. The cultural adaptation. Spine.16 quality for psychometric articles. The reliability and construct validity of the Neck Halaki M.0b013e31817eb836  . 2004. Bolton JE. Validity of the neck disability index.90914. NJ: Slack Inc.02. Stratford P. Spine. and ankle instability: a systematic review. 5. WJ. 
The reliability and application metric characteristics of the Spanish version Rating Scale for patients with neck pain. http://dx. Sterling et al41 suggest ')$ Eechaute C. eds. Elvers JWH. http://dx.29:2410-2417.1007/s00586-007-0503-y dex. Eur Spine J. 2005. Sensitivity and specificity of outcome problem elicitation technique for measuring ))$ Nieto R. 1994. http:// 2004.33:E630-635.org at University of Washington on December 1. Gomez E.brs. Adams RD. )($ Nederhand MJ. Royuela A.32:696-702. 25 and 34 se. 1991. Clair DA.org/10. Standard scales for measure-  -$ Chan Ci En M.2007. Aromaa A. Evidence-Based Rehabili- Index and Neck Pain and Disability Scale. cultural adaptation and validation of the Brazil. 2008. of a modified version of the neck disability in- Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®.$ Rebbeck TJ. Development and psycho. dx. endurance following whiplash.35035. http://dx. Yakut Y. 2004.1:5-22. 2007. http://dx.126 org/10. 2014.16:2111-2117. J '&$ Cleland JA. Spine. Arch Phys Med Rehabil. Applying evidence that individuals who have recovered have Duquet W. http:// BRS. dx.1097/ 2008.<. Pain. Psycho.doi. '$ Ackelman BH. (-$ MacDermid JC.18:245-250. http://dx. Parkinson WL. Simsek ties of the neck disability index. 2002. 2002. Parnianpour M. Northwick Park neck pain questionnaire.29:E47-51. plete disability. after whiplash injury.0000221989.93:317-325.doi. a NDI score of 8 or less.31:1621-1627. Van Aerschot L.89:69-74.$ Heijmans WPG. Vaes P. Critical appraisal of study quality disability. Miro J. )+$ Pool JJ. Hepguler S.33:E362-365. Guillemin F.009 )&$ Miettinen T. Whitman JM.$ Riddle DL.07. bil Med.25:3186-3191. Guidelines for the process of cross-cultural '/$ Hoving JL. Montazeri A. of instruments to measure neck pain disability. 2006. Ijzerman MJ. Spine. disability have a score of 10 to 28.0000201241. http://dx.  ($ Andrade Ortega JA. Spine. Disabil org/10.a5 points. demands considered. Schrader H. et al.1097/01.doi. 
Goldsmith CH. MacDermid JC. 2002. reliability Physiol Ther. (. Maher CG.134:1356-1367. de Man Ther. consequences for clinical decision making.31:598-602. Validity ('$ Kose G. J Whiplash Rel )*$ Pietrobon R. Psycho. Horizon Estimation and consequences of chronic neck pain in Fin- Using SAS Software. disorders. http://dx. (&$ Humphreys H. 2002. Richardson JK. 1998. Bu. Papageorgiou AC. Niere KR. 2007. Critical appraisal of study no disability. (/$ McCarthy MJ. In: Law tinen et al.org/10. Heliovaara M.1016/j. Ostelo RW. Al- '.1097/01. 2007. Spine.2004. T '*$ Foster GA. BMC org/10.2007. and the Neck Pain and Disability Scale. Oostendorp RAB. Lewis M. 2008. Huguet A. Turk Journal of Orthopaedic & Sports Physical Therapy®  *$ Beaton DE.005 (($ Kovacs FM. Stewart metric properties of the Neck Disability Index ()$ Kumbhare DA. Braga L. Sand T. Spine. Childs JD.17:165-173. Bae SS. A ment of functional outcome for cervical pain of the Neck Disability Index and Neck Pain and comparison of four disability scales for or dysfunction: a systematic review. J Reha. For personal use only.1097/ version (NDI-DV): investigation of reliability in BRS.1097/BRS.0000257595. Translation and validation study of the IE. Aras B. discussion 2418.1097/01. 2008.brs.http://dx. 2006. Edmondston SJ.doi.0000227268. and greater than 35 com.08.85:496-501. )-$ Resnick DN. 2007. A “normal” score of between 0 to 20 org/10. Airaksinen O.005 Musculoskelet Disord.$ Goolkasian P. Pain.  '-$ Hains F. Waalen J.$ Bovim G. Bombardier C. Carey TS. NJ: Slack Inc. O’Leary EF. 15 brs.org/10. Iranian versions of the Neck Disability Index and validity of neck disability index in patients '. Spine.org/10. Maher CG. DeVellis RF. Measurement of cervical flexor whiplash. 2004. ). Delgado Martinez AD. Bago J.1197/j.H.1080/09638280400020615 for cervical spine pathology: a narrative review. Cross. land.8:6. Nicholson LL.27:801-807. http://dx. org/10. 2005. Impivaara O.3:16-19. TX: 2008.130:85-89. 
Richardson general population. Coeytaux RR. )'$ Mousavi SJ. tation. Clin J Pain. factors for neck pain: a longitudinal study in the Based Rehabilitation. Atamaz F. and its validity compared with 8ekhd[cekj^Gk[ij_eddW_h[_dWiWcfb[e\ the short form-36 health survey questionnaire. Use of generic versus Disability Index and patient specific functional metric testing of Korean language versions region-specific functional status measures on journal of orthopaedic & sports physical therapy | volume 39 | number 5 | may 2009 | 415 . In: Law M.doi.1186/1471-2474-8-6 Knekt P.I '+$ Gay RE.

1991.doi. click “Take Exam” for that article. impairment and sincerity of effort in ity scales for neck pain. The abil- *'$ Sterling M.122:102-108.01. Take the quiz.doi. Joint Bone Spine. 3rd.1002/ Downloaded from www.spinee. http://dx.doi.83:376-382.24399 pain.$ Vernon H. Maher CG. Koes BW. Mior S.04. Spine.JOSPT.jospt. Paganas AN.org and click the link in the “Read for Credit” box in the right-hand column of the home page.2006. Disability Index in patients with acute neck pain 1999.7:174-179. Using the Neck */$ Vos CJ. Kenardy J.org/10.2 CEUs for each quiz passed. For personal use only. Vernon HT. [ LITERATURE REVIEW ] patients with cervical spine disorders. *+$ van der Velde G. translation and validation of 3 functional disabil- 2007.31:491-502. Beaton D.ORG EARN CEUs With JOSPT’s Read for Credit Program Journal of Orthopaedic & Sports Physical Therapy® JOSPT’s Read for Credit (RFC) program invites Journal readers to study and analyze selected JOSPT articles and successfully complete online quizzes about them for continuing education credit. Physiother Canada. The Neck Disability Index: state-of.0000256380. Catanzariti *($ Stewart M. http:// to standard assessment tools used in spine dx. Hogg-Johnston S. 3. Eur Spine J. Kakavelakis KN. The RFC program offers you 2 opportunities to pass the quiz. *)$ Stratford PW.2003.1186/1471-2474-9-106 use in persons with neck dysfunction. Riley LH. *. 1998. Physiol Ther. 2004. http:// orders.61:544-551.71056. 5.6d whiplash-associated disorder. Nicholas M.$ Vernon HT. WWW. 416 | may 2009 | volume 39 | number 5 | journal of orthopaedic & sports physical therapy .0000105535.doi. jmpt. org/10. Fermanian J.1007/s00586-006- Ther. No other uses without permission. Revel M.doi.1097/01.org/10.78:951-963. Psy. http://dx.29:182-188. 2009. Go to www.1016/j. Choose an article to study and when ready. Padfield B. Jull G. Rasch analysis provides +'$ White P. J Muscskel Pain.15:1729. Responsiveness of pain and dis. Riddle DL. Spine. 
Lewith G. http://dx. Arthritis Rheum. 4. Rannou F. ability. ability measures for chronic whiplash.014 *. pain. responsiveness of the Dutch version of the Neck CEH.2006. http://dx. Vicenzino B.29:1923-1930. The core outcomes *&$ Sterling M. brs. ity to change of three questionnaires for neck acterization of acute whiplash-associated dis. Physical and new insights into the measurement properties for neck pain: validation of a new outcome mea- psychological factors maintain long-term of the neck disability index. 2. J Manipulative Physiol Ther. 2004.?D<EHC7J?ED Copyright © 2009 Journal of Orthopaedic & Sports Physical Therapy®.12598. 1736.51:107-112. JF. http://dx.004 org/10. Spadoni 2000. The chometric properties of the Cervical Spine Greek version in a sample of neck pain patients. Catanzariti 2006.org/10. 2002. 1998. Reliability and @ Disability Index to make decisions concern. Phys **$ Trouli MN. G.2008. Rannou F. Evaluate the RFC experience and receive a personalized certificate of continuing education credits.32:580-585. Tennant A. All rights reserved.org/10.BRS. Spine J.14:409-415. predictive capacity post-whiplash injury. http://dx.AE *-$ Vernon H. You may review all of your answers—including the questions you missed. To participate in the program: 1.org at University of Washington on December 1. 2008. Refshauge KM. Prescott P.71:317-326. Binkley JM. art.org/10.org/10. patient-specific functional scale: validation of its EkjYec[iGk[ij_eddW_h[WdZ_jih[bWj_edi^_f BMC Musculoskelet Disord. Kenardy J. 2004.jbspin.8:155-167.08. in general practice. 2008. French N. Stratford PW. sure. Spine. You receive 0. +($ Wlodyka-Demaille S. http://dx.006 dx. Antono. Translation 0119-7 )/$ Skolasky RL.doi. Fermanian J. Bogduk study of reliability and validity. Binkley JM. Sports Phys Ther. Login and pay for the quiz by credit card.1016/j. The Neck Disability Index: a +)$ Wlodyka-Demaille S. Verhagen AP. 1991-2008.005 Hurwitz E.1016/j. 2014.07. Poiraudeau S.jospt. poulou MD. 
Westaway MD. Albert TJ.27:331-338. Arch Phys Med Rehabil.doi.9:106.doi. and the Journal website maintains a history of the quizzes you have taken and the credits and certificates you have been awarded in the “My CEUs” section of your “My JOSPT” account. of the Neck Disability Index and validation of the +&$ Westaway MD. the-art. ing individual patients. Assessment of self-rated dis. Char. Revel M. Lionis CD.org/10.1097/01.1016/j. J Orthop research. J Manipulative JF. Jull G.doi. 2007. 2006. Poiraudeau S. Pain.

APPENDIX A

NECK DISABILITY INDEX

This questionnaire is designed to help us better understand how your neck pain affects your ability to manage everyday-life activities. Please mark in each section the one box that applies to you. Although you may consider that two of the statements in any one section relate to you, please mark the box that most closely describes your present-day situation.

Patient Name _______________________________________ Date _____________

Section 1: Pain Intensity
□ I have no neck pain at the moment.
□ The pain is very mild at the moment.
□ The pain is moderate at the moment.
□ The pain is fairly severe at the moment.
□ The pain is very severe at the moment.
□ The pain is the worst imaginable at the moment.

Section 2: Personal Care
□ I can look after myself normally without causing extra neck pain.
□ I can look after myself normally, but it causes extra neck pain.
□ It is painful to look after myself, and I am slow and careful.
□ I need some help but manage most of my personal care.
□ I need help every day in most aspects of self-care.
□ I do not get dressed. I wash with difficulty and stay in bed.

Section 3: Lifting
□ I can lift heavy weights without causing extra neck pain.
□ I can lift heavy weights, but it gives me extra neck pain.
□ Neck pain prevents me from lifting heavy weights off the floor but I can manage if items are conveniently positioned, ie, on a table.
□ Neck pain prevents me from lifting heavy weights, but I can manage light weights if they are conveniently positioned.
□ I can lift only very light weights.
□ I cannot lift or carry anything at all.

Section 4: Work
□ I can do as much work as I want.
□ I can only do my usual work, but no more.
□ I can do most of my usual work, but no more.
□ I can’t do my usual work.
□ I can hardly do any work at all.
□ I can’t do any work at all.

Section 5: Headaches
□ I have no headaches at all.
□ I have slight headaches that come infrequently.
□ I have moderate headaches that come infrequently.
□ I have moderate headaches that come frequently.
□ I have severe headaches that come frequently.
□ I have headaches almost all the time.

Section 6: Concentration
□ I can concentrate fully without difficulty.
□ I can concentrate fully with slight difficulty.
□ I have a fair degree of difficulty concentrating.
□ I have a lot of difficulty concentrating.
□ I have a great deal of difficulty concentrating.
□ I can’t concentrate at all.

Section 7: Sleeping
□ I have no trouble sleeping.
□ My sleep is slightly disturbed for less than 1 hour.
□ My sleep is mildly disturbed for up to 1-2 hours.
□ My sleep is moderately disturbed for up to 2-3 hours.
□ My sleep is greatly disturbed for up to 3-5 hours.
□ My sleep is completely disturbed for up to 5-7 hours.

Section 8: Driving
□ I can drive my car without neck pain.
□ I can drive my car with only slight neck pain.
□ I can drive as long as I want with moderate neck pain.
□ I can’t drive as long as I want because of moderate neck pain.
□ I can hardly drive at all because of severe neck pain.
□ I can’t drive my car at all because of neck pain.

Section 9: Reading
□ I can read as much as I want with no neck pain.
□ I can read as much as I want with slight neck pain.
□ I can read as much as I want with moderate neck pain.
□ I can’t read as much as I want because of moderate neck pain.
□ I can’t read as much as I want because of severe neck pain.
□ I can’t read at all.

Section 10: Recreation
□ I have no neck pain during all recreational activities.
□ I have some neck pain with all recreational activities.
□ I have some neck pain with a few recreational activities.
□ I have neck pain with most recreational activities.
□ I can hardly do recreational activities due to neck pain.
□ I can’t do any recreational activities due to neck pain.

Score __________ [50]

Copyright: Vernon H & Hagino C, 1991. For permission to use the NDI, please contact Dr. Howard Vernon at hvernon@cmcc.ca
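The scoring rule for the form in Appendix A can be illustrated with a short sketch. It assumes the conventional scoring in which the six boxes of each section are worth 0 (first box) to 5 (last box), giving a total out of 50, and it follows the review's note that a percentage score used to adjust for unanswered sections is divided by 2 to restore the 0-to-50 metric. The function name `score_ndi`, the dictionary keys, and the use of `None` for a blank section are illustrative choices, not part of the published instrument.

```python
from typing import Optional, Sequence

ITEMS = 10         # sections on the form
MAX_PER_ITEM = 5   # assumed scoring: first box = 0, last box = 5


def score_ndi(responses: Sequence[Optional[int]]) -> dict:
    """Score one completed NDI form (hypothetical helper).

    `responses` holds one entry per section: 0-5 for the box marked,
    or None when the section was left blank.
    """
    if len(responses) != ITEMS:
        raise ValueError(f"expected {ITEMS} responses, got {len(responses)}")
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("no sections answered")
    if any(r < 0 or r > MAX_PER_ITEM for r in answered):
        raise ValueError("each response must be between 0 and 5")

    raw = sum(answered)
    # Percentage of the maximum possible for the sections actually answered.
    percent = 100.0 * raw / (MAX_PER_ITEM * len(answered))
    # Dividing the percentage by 2 restores the 0-to-50 metric.
    out_of_50 = percent / 2.0
    # The review treats scores of 0-10 or 40-50 as approaching floor/ceiling.
    near_extreme = out_of_50 <= 10 or out_of_50 >= 40
    return {
        "raw": raw,
        "answered": len(answered),
        "percent": percent,
        "out_of_50": out_of_50,
        "near_floor_or_ceiling": near_extreme,
    }


if __name__ == "__main__":
    # Made-up responses, with the driving section left blank.
    print(score_ndi([2, 2, 2, 2, 2, 2, 2, None, 2, 2]))
```

Reporting both `percent` and `out_of_50` side by side is one way to make explicit which metric a clinical report is using, since the review notes that both appear in the literature.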

APPENDIX B

DATA EXTRACTION FORM FOR STUDIES EVALUATING THE PSYCHOMETRICS OF OUTCOME MEASURES

Authors: _______________________ Year: ____________ Rater: ___________

Instructions
When using the data extraction form, it is important to realize that the purpose of data extraction is to remove or extract the specific information reported by authors within a study, not to evaluate the validity or value of that piece of information. To make data extraction as useful as possible and to avoid the need for repeated data extractions, read the accompanying guide and be as specific as possible when extracting information.

Data Extracted

Population studied
    Population
    Intervention

Reliability
    Reliability (relative)
    Reliability (absolute)
    Minimum detectable change (MDC)

Content/structural validity
    Internal consistency
    Content validity
    Floor/ceiling effects
    Factorial validity
    Item response/Rasch analyses

Construct/criterion validity
    Known groups
    Convergent
    Divergent
    Longitudinal validity
    Concurrent criterion
    Predictive criterion

Responsiveness/clinical change
    Responsiveness
    Minimal clinically important difference (MCID)

Usefulness/practicality
    Readability
    Interpretability
    Time to administer
    Administration burden
    Cultural applicability

© JC MacDermid 2008
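The absolute reliability and MDC entries on the extraction form are usually derived from a reliability coefficient and the sample variability, which is why the review notes that larger standard deviations or lower reliability coefficients increase the MDC. A minimal sketch of that arithmetic follows, assuming the commonly used formulas SEM = SD × √(1 − ICC) and MDC = z × √2 × SEM (z = 1.645 for 90% confidence). The function names and the example values (SD of 8 points, ICC of 0.90) are hypothetical and are not taken from any study in the review.

```python
import math


def sem(sd: float, icc: float) -> float:
    """Standard error of measurement, in the units of the original score."""
    if sd < 0:
        raise ValueError("sd must be non-negative")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def mdc(sem_value: float, z: float = 1.645) -> float:
    """Minimum detectable change at the confidence level implied by z.

    The sqrt(2) term reflects that a change score combines the
    measurement error of two separate administrations.
    """
    return z * math.sqrt(2.0) * sem_value


if __name__ == "__main__":
    # Hypothetical sample: SD of 8 NDI points, test-retest ICC of 0.90.
    error = sem(sd=8.0, icc=0.90)
    print(f"SEM   = {error:.2f} points")
    print(f"MDC90 = {mdc(error):.2f} points")
    # Lower reliability with the same SD gives a larger MDC.
    print(f"MDC90 at ICC 0.70 = {mdc(sem(8.0, 0.70)):.2f} points")
```

Running the sketch with different SD and ICC values shows how the same instrument can yield quite different MDC estimates across samples, which is one plausible reading of the wide range the review reports.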

DATA EXTRACTION GUIDE FOR STUDIES EVALUATING THE PSYCHOMETRICS OF OUTCOME MEASURES

Instructions

Psychometric studies may evaluate a wide spectrum of potential psychometric properties and aspects that relate to the utility or usability of outcome measures. Studies will not address every aspect. In summarizing what is known about the psychometrics and utility of outcome measures, it is useful to collect information within standardized domains so that this information might be more easily synthesized to provide a summary of our collective knowledge about psychometrics and the utility of any given outcome measure. To this end, the following guide has been devised to describe the separate properties that might be evaluated in a psychometric study. Listed here are an explanation of each property and how it might be measured within a study. The accompanying extraction form can then be used to collect specific information about these psychometric or utility properties from particular studies.

When using the data extraction form, it is important to realize that the purpose of data extraction is to remove or extract the specific information reported by authors within a study, not to evaluate the validity or value of that information. Evaluation of the quality of articles (also called critical appraisal) is performed in a separate step. To make data extraction as useful as possible and to avoid the need for repeated data extractions, it is best to be as specific as possible when extracting information.

Based on the findings of extraction, you may elect to present the synthesized data in a descriptive way by creating a summary table in each category. There is no clear or easy method to synthesize this information. If you find some studies with similar designs, you may be able to conduct a meta-analysis of some properties such as minimal detectable change (MDC), clinically important difference (CID), or standard error of measurement (SEM).

Population studied

Population
  Description: A description of the study population.
  Methods/relevant statistics: Report meaningful demographics and indicators of the population studied: sample size, how and where subjects were chosen, setting, demographics, pathology/disorder, acute versus chronic.

Intervention
  Description: Interventions (if applicable) applied for longitudinal studies.
  Methods/relevant statistics: Description of the nature, frequency, and intensity of the intervention, and the follow-up interval.

Reliability

Reliability description
  Description: The extent to which the same results are obtained on repeated administrations of the same measure when no change in status has occurred (reliability) or how precise the scores are on repeated measurements (agreement). Relative reliability evaluates the extent to which variations on repeated assessment of the same individual compare to the variability between individuals.
  Methods/relevant statistics: Test procedures or measures are typically reapplied on repeated occasions in individuals considered to have a stable condition during that timeframe in which repeated testing occurs. Some consider that sufficient time between tests is within 48 hours for acute conditions, and 4–14 days for chronic conditions. Repeated testing may be performed on different occasions (test-retest) for self-report measures, OR by the same rater (intrarater), or different raters (interrater) if an observer-based scale is used. In some cases, different test instruments (inter-instrument) are evaluated. Report the type of reliability evaluated and coefficients obtained. The most common statistic used is the intraclass correlation coefficient (ICC) for quantitative data and Kappa for nominal data. SEM is used to present a quantitative estimate of the reliability in the original units of measure.

Reliability (relative)
  Description: The extent to which variability in test scores on repeated tests of the same person are related to variability overall (including between people).
  Methods/relevant statistics: ICCs and their associated confidence intervals.
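The guide names the ICC as the usual relative reliability statistic but does not say which form to compute. As a minimal sketch, assuming the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1), the coefficient can be derived from the mean squares of a subjects-by-occasions table. The function name `icc_2_1` and the example scores are hypothetical and are not taken from any study in this review.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `scores` holds one row per subject; each row lists that subject's score
    on every occasion (or from every rater). The choice of this ICC form is
    an assumption; a study may report a different one.
    """
    n = len(scores)       # number of subjects
    k = len(scores[0])    # number of occasions or raters
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 subjects and 2 complete occasions")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    occasion_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares into subjects, occasions, and error.
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_occasions = n * sum((m - grand_mean) ** 2 for m in occasion_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_subjects - ss_occasions

    ms_subjects = ss_subjects / (n - 1)
    ms_occasions = ss_occasions / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_occasions - ms_error) / n
    )


# Hypothetical test-retest scores for five patients considered stable.
example_scores = [[12, 14], [30, 28], [22, 25], [8, 9], [40, 37]]
print(round(icc_2_1(example_scores), 2))
```

When extracting, the ICC form and its confidence interval should be recorded as reported by the authors; a calculation like this is only useful for checking a value when raw scores are available.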

Reliability (absolute)
  Description: The extent to which the same results are obtained on repeated administrations of the same measure when no change in status has occurred (reliability) or how precise the scores are on repeated measurements (agreement). Specific quantitative estimates of reliability in individual units portray the actual amount of variation expected between repeated applications, reported in the original units of measure.
  Methods/relevant statistics: This may be reported as
    1. SEM (possibly expressed as coefficient of variation in older articles)
    2. Altman and Bland graphical technique where the difference on repeated tests for each individual is plotted versus their mean score (and the overall mean and limits of 2 standard deviations are shown)

MDC
  Description: Based on the reliability and defined level of confidence, this measure reflects the amount required to change before achieving a level of confidence that exceeds the random error that occurs in stable patients.
  Methods/relevant statistics: Extract the number and level of confidence.

Content/structural validity

Internal consistency
  Description: The extent to which items on a test or subscale are related (an indication of the consistency of the concept measured).
  Methods/relevant statistics: Cronbach's alpha is the inter-item correlation usually reported. Report alpha and whether it relates to the entire instrument or specific subscales.

Content validity
  Description: The extent to which the domain of interest is adequately sampled by the items in the measure. In assessing content validity, it is important to consider the population to which the measure applies, the completeness of the content, the content's relevance, and emphasis.
  Methods/relevant statistics: A variety of techniques can be used to assess the extent to which items on a given measure reflect the necessary content to capture the concept of interest. Some of the techniques typically found are listed. Extract what was done and what was found.
    1. Expert panels or Delphi procedures were used to select items or evaluate the validity of the instrument.
    2. Patients and experts were involved during item selection/reduction.
    3. Patients were consulted for reading and comprehension.
    4. Cognitive interviews were done with patients to determine the meaning of items.
    5. During translation, specific study of the meaning of the questions to another cultural or language group was studied.

Floor-ceiling effects
  Description: When the measure fails to demonstrate a worse score in patients who clinically deteriorated and/or an improved score in patients who clinically improved.
  Methods/relevant statistics: Descriptive statistics of the distribution of scores were presented graphically or numerically and indicate this. Report the percentage of patients who sustained a floor or ceiling effect. Note that different studies may define floor/ceiling differently, so extract how these attributes were determined as well as the percentage.

Factorial validity
  Description: The extent to which factor analysis supports assumptions surrounding constructs measured as defined by the measure or as indicated by subscale structure.
  Methods/relevant statistics: Factor analysis is conducted and compared to the inherent structure of the instrument or factor analysis on which its construction is based. Report the number of factors derived and the extent to which these findings agree with the instrument structure or original factor structure.
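The absolute reliability and MDC entries above describe the statistics but give no formulas. The sketch below uses the conventional definitions, which are assumptions here rather than something the guide specifies: SEM = SD × √(1 − reliability coefficient), MDC = z × √2 × SEM (z = 1.96 for 95% confidence), and Altman and Bland limits set at the mean difference ± 2 standard deviations of the differences. All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def standard_error_of_measurement(sd, reliability):
    """SEM in the original units: SD * sqrt(1 - reliability coefficient)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability coefficient must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def minimal_detectable_change(sem, z=1.96):
    """MDC at the confidence level implied by z (1.96 corresponds to 95%)."""
    return z * math.sqrt(2.0) * sem


def bland_altman_limits(test, retest, sd_multiplier=2.0):
    """Return (lower limit, mean difference, upper limit) for paired scores.

    The 2 SD multiplier follows the wording of the guide; many papers
    use 1.96 instead.
    """
    differences = [second - first for first, second in zip(test, retest)]
    mean_difference = mean(differences)
    spread = sd_multiplier * stdev(differences)
    return mean_difference - spread, mean_difference, mean_difference + spread


# Hypothetical inputs: baseline SD of 10 points and an ICC of 0.75.
sem = standard_error_of_measurement(10.0, 0.75)
print(sem, round(minimal_detectable_change(sem), 2))
```

Because the MDC depends on the chosen confidence level, the extraction should record both the number and the level of confidence, as the MDC entry instructs.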

Item response/Rasch analyses
  Description: The extent to which items cross a range of difficulty, or a spectrum of the concept measured.
  Methods/relevant statistics: Items can be placed along a spectrum using item response theory or Rasch analysis. Analyses might address item difficulty, individual ability curves, differential item functioning (DIF; report if done and whether items demonstrated DIF), and comparison of ability estimation. Report whether these analyses were conducted and if they determined that items crossed a range of difficulty. In other cases, Rasch analysis can be used to evaluate the order of item placement in a test and/or to define the discriminative ability of specific items.

Construct/criterion validity

Construct validity description
  Description: Constructs are artificial frameworks that are not directly observable. Construct validity is the extent to which measures perform according to a priori defined constructs. Construct validity can be cross-sectional or longitudinal (predictive). Constructed hypotheses can assess convergent validity where measures are thought to represent similar constructs or divergent validity where it is assumed they measure different constructs.
  Methods/relevant statistics: The extent to which scores relate to other measures in a manner consistent with theoretically derived hypotheses concerning the domains that are measured OR the expected differences between "known groups" OR in relation to other indices that are similar (convergent) or different (divergent). In each case, hypotheses are formulated. Results are evaluated to determine whether they are acceptable in accordance with the hypotheses.

Known groups
  Description: Known groups are constructed on theoretical assumptions that they should be different, and the difference is assessed.
  Methods/relevant statistics: Extract the groups, their mean scores, and level of significance.

Convergent
  Description: Relationship between similar scales/tests.
  Methods/relevant statistics: Extract test names and correlations; when summarizing, consider grouping into strong (>0.70) and moderate (0.40–0.70) correlations.

Divergent
  Description: Relationship to scales/tests assumed unrelated.
  Methods/relevant statistics: Extract test names and correlations.

Longitudinal validity
  Description: Relationship between change scores among similar scales/tests measured on different time points where change is expected.
  Methods/relevant statistics: Extract test names and correlations.

Criterion validity description
  Description: Criterion validation is determined by comparing a given outcome measure to an accepted standard of measure. For subjective constructs such as pain and disability, it is sometimes argued that there is no criterion and, therefore, validation focuses on construct validity. Most commonly, the term comparability is used when scales measure the same subjective construct, particularly if one is considered more accepted or rigorous. Concurrent validity is assessed by comparing a scale and its criterion at a single point in time. Predictive validity is evaluated by determining the extent to which the results of administering an outcome measure at one point in time are associated with a future status or outcome.
  Methods/relevant statistics: Authors will state that their measure is being compared against a specific instrument and report the correlation or agreement between the measures. Extract the test names and correlations.

Concurrent criterion
  Description: Comparing a scale and its criterion at a single point in time assesses concurrent validity.
  Methods/relevant statistics: Extract the test names and correlations.

Predictive criterion
  Description: Predictive validity is evaluated by determining the extent to which the results of administering an outcome measure at 1 point in time are associated with a future status or outcome.
  Methods/relevant statistics: Extract the test names and correlations and time interval.
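The convergent entry above suggests grouping extracted correlations into strong (>0.70) and moderate (0.40–0.70) bands when summarizing. A minimal sketch of that grouping follows. Two points are assumptions rather than part of the guide: the sign of the correlation is ignored, and values below 0.40 receive the neutral label "below 0.40" because the guide does not name that band. The function name is hypothetical.

```python
def correlation_band(r):
    """Classify a correlation coefficient using the bands named in the guide."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    magnitude = abs(r)  # assumption: direction is ignored when grouping
    if magnitude > 0.70:
        return "strong"
    if magnitude >= 0.40:
        return "moderate"
    return "below 0.40"  # the guide does not name this band


# Hypothetical extracted correlations between a measure and comparison scales.
for value in (0.85, 0.70, -0.55, 0.20):
    print(value, correlation_band(value))
```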

Responsiveness/clinical change

Responsiveness
  Description: The ability to detect important change over time in the concept being measured. Subjects are evaluated at 2 points in time and change in patients who had improved (or have improved an important amount) is compared across different measures (and/or to patients who have not improved). Hypotheses are formulated and results are in agreement.
  Methods/relevant statistics: Extract indicators of responsiveness, including effect size, standardized response mean, and the method for assessing whether patients were improved, stable, or worse.

MCID, MID, CID
  Description: The difference in scores in the domain of interest that patients perceive to be beneficial and that mandates a change in patients' management. MCID, MID, and CID are 3 different terms for the same concept. Information is provided about what difference in score would be clinically meaningful.
  Methods/relevant statistics: Extract the minimally important difference (MID), minimally clinical important difference (MCID), or CID and the method/cut-off used to define importance. How clinically meaningful is defined may vary (global rating of change with different methods of rating or establishing cutoffs for importance are often used).

Usefulness/practicality

Readability
  Description: The questionnaire is understandable by all patients. Authors provide specific information on readability as evaluated by the target population or specific tools are applied to evaluate readability (eg, grade level can be assigned in some software packages).
  Methods/relevant statistics: State method and results obtained.

Interpretability
  Description: The degree to which one can assign qualitative meaning to quantitative scores or define subgroups by scores. Authors provide information on the interpretation of scores:
    1. Comparative data for relevant subgroups (acute versus chronic, etc)
    2. Validation of subjective categories of outcome: excellent/good/fair, normal/abnormal
  Methods/relevant statistics: State the type of categorization and relevant scores and, if appropriate, the method used to validate/analyze subgroup differences.

Time to administer
  Description: Authors provide specific information on time to administer.
  Methods/relevant statistics: Extract time.

Administration burden
  Description: Ease of method used to calculate the questionnaire's score. Authors provide specific information on the ease of scoring.
  Methods/relevant statistics: Extract study data on scoring method burden. If this is not specifically reported in the text of the study, it cannot be extracted. However, during a systematic review, if appropriate, compare scoring algorithms and administrative burden and report this. This must be done in a systematic way and the method should be documented. For example, to extract the specific scoring algorithm for an instrument and compare it to others, multiple raters should evaluate the relative complexity of these scoring algorithms.
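The responsiveness entry above asks for effect size and standardized response mean without defining them. The sketch below uses the usual conventions, stated here as assumptions: effect size = mean change / SD of baseline scores, and standardized response mean = mean change / SD of the change scores. Change is taken as follow-up minus baseline, so on a disability scale where lower scores are better, improvement yields negative values. Function names and data are hypothetical.

```python
from statistics import mean, stdev


def change_scores(baseline, follow_up):
    """Follow-up minus baseline for each patient (paired lists)."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must be paired")
    return [after - before for before, after in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of the baseline scores."""
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the change scores."""
    changes = change_scores(baseline, follow_up)
    return mean(changes) / stdev(changes)


# Hypothetical disability scores for three patients before and after care.
before = [20, 30, 40]
after = [10, 25, 25]
print(effect_size(before, after), standardized_response_mean(before, after))
```

Studies differ in sign convention and in which SD they use, so the extraction should note the index exactly as the authors defined it.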

Cultural applicability
  Description: The extent to which an instrument contains items that will be meaningful across different cultural groups or subgroups for which the measure is relevant or may be applied. Cross-cultural adaptation may be used to translate measures into different languages and to convert any items having specific meaning to culturally relevant items. In cross-cultural adaptation, a number of analyses may be performed, including a structured translation process that includes forward and backward translation and retesting reliability of the translated version, and a formal evaluation of the meaning of items in different cultures.
  Methods/relevant statistics: Extract language/cultural translation performed (and relevant findings). Highlight the results of cross-cultural adaptation as well as any languages/cultural adaptations that have been reported. If specific reliability and validity values are reported, they can be documented in the appropriate data collection boxes above; however, an annotation should be made to note that the psychometric property applies to an adapted version.

© JC MacDermid 2008

CRITICAL APPRAISAL OF STUDY DESIGN FOR PSYCHOMETRIC ARTICLES: EVALUATION FORM

Authors: _______________________ Year: ____________ Rater: ___________

Evaluation Criteria (each item is scored 2, 1, or 0)

Study question
1. Was the relevant background research cited to define what is currently known about the psychometric properties of the measures under study and the need or potential contributions of the current research question?

Study design
2. Were appropriate inclusion/exclusion criteria defined?
3. Were specific psychometric hypotheses identified?
4. Was an appropriate scope of psychometric properties considered?
5. Was an appropriate sample size used?
6. Was appropriate retention/follow-up obtained? (Studies involving retesting or follow-up only)

Measurements
7. Documentation: Were specific descriptions provided or referenced that explain the measures and their correct application/interpretation (to a standard that would allow replication)?
8. Standardized methods: Were administration and application of measurement techniques within the study standardized, and did they consider potential sources of error/misinterpretation?

Analyses
9. Were analyses conducted for each specific hypothesis or purpose?
10. Were appropriate statistical tests conducted to obtain point estimates of the psychometric property?
11. Were appropriate ancillary analyses done to describe properties beyond the point estimates (confidence intervals [CI], benchmark comparisons, standard error of measurement [SEM], minimally important difference [MID])?

Recommendations
12. Were the conclusions/clinical recommendations supported by the study objectives, analysis, and results?

Subtotals (of columns 1 and 2)

Total score % (sum of subtotals/24 × 100); or, if for a specific paper an item is deemed inappropriate, then sum items, divide by 2 times the number of items, and multiply by 100 to get the percentage score.

© JC MacDermid 2008
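The total score rule above can be sketched as follows. Each of the 12 items is scored 2, 1, or 0; an item deemed inappropriate for a paper is represented here as `None` (a convention assumed for this example) and is dropped from both the sum and the denominator. The function name is hypothetical.

```python
def appraisal_percentage(item_scores):
    """Percentage score: sum of item scores / (2 * number of scored items) * 100.

    `item_scores` lists one entry per checklist item: 0, 1, or 2, or None
    when the item is deemed inappropriate for the paper being rated.
    """
    applicable = [score for score in item_scores if score is not None]
    if not applicable:
        raise ValueError("at least one item must be scored")
    if any(score not in (0, 1, 2) for score in applicable):
        raise ValueError("item scores must be 0, 1, or 2")
    return sum(applicable) / (2 * len(applicable)) * 100


# Hypothetical ratings for one paper; item 6 (retention) is not applicable.
ratings = [2, 1, 0, 2, 1, None, 2, 2, 1, 1, 0, 2]
print(round(appraisal_percentage(ratings), 1))
```

With all 12 items scored, the denominator is 24, which matches the first form of the rule.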

CRITICAL APPRAISAL OF STUDY DESIGN FOR PSYCHOMETRIC ARTICLES
Interpretation Guide

To decide which score to provide for each item on a quality checklist, read the following descriptors. Pick the descriptor that sounds most like the study being evaluated with respect to a given item. If there is no documentation of an action, treat it as not done.

Study question

Question 1
  Score 2: The authors:
    1. Performed a thorough literature review, indicating what is currently known about the psychometric properties of the instruments or tests under study from previous research studies.
    2. Presented a critical and unbiased view of the current state of knowledge.
    3. Indicated how the current research question evolves from a gap in the current knowledge base.
    4. Established a research question based on the above.
  Score 1: All of the above criteria were not fulfilled (little reference to previous research and present gaps in knowledge), but a clear rationale was provided for the research question.
  Score 0: A foundation for the current research question was not clear, and the rationale was not founded on previous literature.

Study design

Question 2
  Score 2: Specific inclusion/exclusion criteria for the study were defined; the practice setting was described, and appropriate demographic information was presented, yielding a study group generalizable to a clinical situation.
  Score 1: Some, but not all information on participants and place is provided. Information on the type of patients is briefly defined, but is insufficient to allow the reader to generalize the study to a specific population. For example, age/sex/diagnosis and the name or type of the practice is given, but without additional information.
  Score 0: No information on the type of clinical settings or study participants is provided.

Question 3
  Score 2: Authors identified specific hypotheses, which included the specific type of reliability (intra/interrater or test-retest) or validity (construct/criterion/content, longitudinal/concurrent, convergent/divergent) being tested, as well as expected relationships (strength of associations) or constructs. A priori hypothesis definitions are provided of level of reliability and for validity.
  Score 1: Types of reliability and validity being tested were stated, but not clearly defined in terms of specific hypotheses.
  Score 0: Specific types of reliability or validity under evaluation were not clearly defined, and specific hypotheses on reliability and validity were not stated. ("The purpose of this study was to investigate the reliability and validity of…" can be rated as zero if no further detail on the types of reliability and validity or the nature of specific hypotheses is provided.)

Question 4
  Score 2: An appropriate scope of psychometric properties would be indicated by:
    1. A detailed focus on reliability that includes multiple forms of reliability (at least 2 of intrarater, interrater, test-retest), as well as both relative and absolute reliability [eg, intraclass correlation coefficients (ICC), standard error of measurement (SEM), and minimally important difference (MID)]
    2. A detailed focus on validity that includes multiple forms of validity: content, whether judgmental (structured, eg, expert review/survey or qualitative interviews) or statistical (eg, factor analyses); construct (known group differences, convergent/divergent associations); criterion (concurrent/predictive); responsiveness; and established predictive, evaluative, or discriminative properties
    3. Some aspects of both reliability and validity were examined concurrently.
  Score 1: Two or more psychometric properties were evaluated; however, scope was narrow and did not meet the above criteria (eg, internal consistency and 1 test/form of validity).
  Score 0: The scope of psychometric properties was narrow as indicated by evaluation of only 1 form of reliability or validity.

Question 5
  Score 2: Authors performed a sample size calculation and obtained their recruitment targets. Post-hoc power analyses and/or confidence intervals (CI) confirm that the sample size was sufficient to define relatively precise estimates of reliability or validity.
  Score 1: The authors provided a rationale for the number of subjects included in the study, but did not present specific sample size calculations or post-hoc power analyses. This score also applies to simple reliability/validity statistics if the sample is greater than 100 but no justification for sample size is given.
  Score 0: Size of the sample was not rationalized or is clearly underpowered.

Question 6
  Score 2: Ninety percent or more of the patients enrolled for study were reevaluated.
  Score 1: More than 70% of the patients eligible for study were reevaluated.
  Score 0: Less than 70% of the patients eligible for study were reevaluated.

Measurements

Question 7
  Score 2: The authors provided or referenced a published manual/article that outlines specific procedures for administration, scoring (including how scoring algorithms handle missing data), and interpretation of test procedures that includes any necessary information about positioning/active participation of the subject, any special equipment required, calibration of equipment if necessary, training required, cost, and examiner procedures/actions. If no manual is referenced, then the text describes the procedures in sufficient detail so they can be replicated.
  Score 1: Procedures are referenced without any details, or a limited description of procedures is included in the text.
  Score 0: Minimal description of procedures is provided and without appropriate references.

Question 8
  Score 2: All of the measurements, including test administration and scoring, were performed in a standardized way. For impairment measures, this includes calibration of any equipment, use of consistent measurement tools and scoring, a priori exclusion of any participants likely to give invalid results or unable to complete testing (not exclusion after enrollment), use of standardized instructions, and test procedures. For self-report, this is characterized by a statement of who administered the forms and by what process (definition of rules for exclusion of forms [eg, too many missing items] or individuals [language, comprehension, etc]).
  Score 1: No obvious sources of bias in how tests were performed/administered, but minimal attention or description of the extent to which the above standards were maintained.
  Score 0: No description of the extent to which the above standards were maintained or an obvious source of bias in data collection methods.

Analyses

Question 9
  Score 2: Authors clearly defined which specific analyses were conducted for each of the stated specific hypotheses of the study. This may be accomplished through organization of the results under specific subheadings, or by demarcating which analyses addressed specific psychometric properties. Data were presented for each hypothesis/research question.
  Score 1: Data were presented for each hypothesis, but authors did not clearly link analyses to hypotheses.
  Score 0: Data were not presented for each hypothesis or psychometric property outlined in the purposes or methods.

Question 10
  Score 2: Appropriate statistical tests were conducted:
    1. Reliability (relative: ICCs for quantitative data, Kappa for nominal data; absolute: SEM or plot of score differences versus average score showing mean and 2 standard deviation (SD) limit, Altman and Bland)
    2. Clinical relevance: minimal detectable change (MDC), MID
    3. Validity
      a. Validity associations: Pearson correlations for normally distributed data, Spearman rank correlations for ordinal data, or other correlations if appropriate
      b. Validity tests of significant difference: an appropriate global test such as analysis of variance was used where indicated, with post-hoc tests that adjusted for multiple testing
    4. Responsiveness: standardized response means or effect sizes or other recognized responsiveness indices were used
  Score 1: Appropriate statistical tests were used in some instances, but suboptimal choices were made in other analyses.
  Score 0: Inappropriate use of statistical tests.

Question 11
  Score 2: For key indicators such as reliability coefficients/indices, at least 2 of the following were presented:
    1. Appropriate confidence intervals (CI)
    2. Comparison to appropriate benchmarks or standards
    3. SEM
  Correlation matrices for validity analysis may not require that each individual correlation be presented with its associated CI; however, CIs and benchmarks should be used according to standards for this type of analysis.
  Score 1: Either CIs or appropriate benchmarks were used, but not both.
  Score 0: Benchmarks or CIs were either used inappropriately or not included.

Recommendations

Question 12
  Score 2: Authors made specific conclusions and clinical recommendations that were clearly related to the specific hypotheses stated at the beginning of the study and supported by the data presented.
  Score 1: Authors made conclusions and clinical recommendations that were general, but basically supported by the study data, OR authors made conclusions and clinical recommendations for only some of the study hypotheses.
  Score 0: Authors made vague conclusions without any clinical recommendations, and conclusions OR recommendations contradicted the actual data presented.

© JC MacDermid 2008