
[ evidence in practice ]

STEVEN J. KAMPER, PhD1

Reliability and Validity: Linking Evidence to Practice
J Orthop Sports Phys Ther 2019;49(4):286-287. doi:10.2519/jospt.2019.0702

The previous Evidence in Practice article introduced the idea of the "construct," or what you are interested in measuring, for example, pain, disability, or strength. As there are often numerous measures available for any given construct, how do you choose which to use? Of the number of considerations that go into this, none are more important than reliability and validity. It is no overstatement to say that if a measure is not both sufficiently reliable and valid, then it is not fit for purpose.

Reliability

Formally, reliability is the extent to which a measurement is free from error. In practice, a reliable measure is one that gives you the same answer when you measure the same construct several times. Consider the example of measuring height (the construct) with a tape measure (the measure). You might measure a person's height in millimeters 3 times with the tape measure; the extent to which the number of millimeters is the same on each occasion is the reliability of the measure.

The implications of unreliable measures are serious. If an unreliable diagnostic test (measure) was applied to a patient several times, then the same patient might be diagnosed as both having and not having the condition on different occasions or by different people. If an unreliable measure of symptom severity was collected from a patient before and after an intervention, then it would be impossible to tell whether that symptom improved, stayed the same, or got worse. Essentially, data collected from unreliable measures do not provide useful information; a measure that is not reliable cannot be valid.

There are several different types of reliability, each uniquely relevant to situations in which the measures might be used. Intrarater reliability refers to the situation where the same rater takes the measure on one patient on several occasions, and reliability is the extent to which the scores from the successive measurements are the same. Interrater reliability is relevant when multiple raters use the same measure on a single person, and reliability is the extent to which scores from the different raters are the same.

Validity

Validity is the extent to which the score on a measure truly reflects the construct it is supposed to measure. This is relatively straightforward when it comes to things like height or strength, but the waters quickly become murky when we consider unobservable or "latent" constructs such as pain, quality of life, or disability. For these sorts of constructs, we collect indirect measures, such as self-reported experiences and behaviors, or recall of beliefs and emotions, and assume that these reflect the construct. For example, we might ask a patient to answer the 24 questions of the Roland-Morris Disability Questionnaire (the measure) and score his or her level of back pain–related disability (construct) by adding up the number of "yes" responses. The patient's score out of 24 is valid to the extent that the questions really reflect the construct of disability and to the extent that having difficulty with more of the items reflects greater disability.

There are several different types of validity relevant to clinical measures, the most commonly assessed being construct validity. When researchers assess the construct validity of a measure, they are ideally able to compare their measure to a "gold standard." For example, arthroscopic visualization of the anterior cruciate ligament is considered a gold standard of anterior cruciate ligament rupture, so a study might compare results from Lachman's test to the findings from arthroscopy to assess the validity of Lachman's test.

Unfortunately, there are no gold standards for many constructs in which we are interested (eg, latent constructs such as disability and pain). In these cases, construct validity is tested against a "reference standard," which is a sort of imperfect gold standard. When there is no gold standard, the best way to test validity is via hypothesis testing. This involves setting out a series of hypotheses before collecting the data. These hypotheses are theorized relationships between a score on the measure and other characteristics, for example, that the score will be strongly correlated with scores on another measure of the same construct and less strongly correlated with scores on a different but related construct. The extent to which the data are consistent with the predetermined hypotheses will be evidence supporting validity of the measure.

Statistics

Testing reliability and validity generally involves assessing agreement between 2 scores, either scores on the same measure collected twice (reliability) or scores on different measures (validity). The statistics used to describe agreement depend on whether the measures are dichotomous (eg, kappa, sensitivity/specificity) or continuous (eg, intraclass correlation coefficients, correlations, limits of agreement, R2).

Conclusion

There are a couple of important general points to note about reliability and validity. First, both are on a spectrum, so measures are not "unreliable" or "reliable," but more or less reliable and more or less valid. Of course, this makes choosing measures more difficult; we need to make a subjective judgment as to whether a measure is "reliable enough" and "valid enough" in a particular situation. General guidelines exist to help interpret reliability and validity statistics, but these guidelines do not and should not replace clinical judgment. The second point is that no measure sits on the very end of the spectrum; there is no perfectly reliable and perfectly valid measure. Even in the case of measuring height with a tape measure, successive measures are likely to differ by a few millimeters here and there. Finally, there are practical concerns when it comes to choosing a measure, including how long it takes to administer, whether the patient can comprehend text or instructions, and how data will be stored and used.

Measurement is an entire field of research by itself. Although the general concepts are quite straightforward, you do not have to scratch too far below the surface before things become complicated. When reading research, you should look for information that reassures you that the measures used are sufficiently reliable and valid. The take-home message: be very cautious about using, or trying to interpret, information from a measure if you have no information about its reliability and validity.

1 School of Public Health, University of Sydney, Camperdown, Australia; Centre for Pain, Health and Lifestyle, Australia. Copyright ©2019 Journal of Orthopaedic & Sports Physical Therapy®
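To make the agreement statistics above concrete, here is a minimal illustrative sketch (not from the article; all data are hypothetical): Cohen's kappa for agreement between two dichotomous ratings, and a Pearson correlation for two sets of continuous scores. In practice, continuous test–retest reliability is usually reported with an intraclass correlation coefficient rather than a simple correlation; Pearson's r is shown here only because it is the simplest of the statistics the article names.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two dichotomous ratings (e.g., test positive/negative).

    Kappa compares observed agreement with the agreement expected by chance
    from each rater's marginal proportions.
    """
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed proportion of agreement
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal proportions
    p_e = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)


def pearson_r(x, y):
    """Pearson correlation between two sets of continuous scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical data: two raters applying the same dichotomous test (1 = positive)
rater_1 = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
rater_2 = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
print(f"kappa = {cohens_kappa(rater_1, rater_2):.2f}")

# Hypothetical data: the same person's height (mm) measured twice
first = [1702, 1688, 1745, 1630, 1711]
second = [1703, 1690, 1744, 1631, 1710]
print(f"r = {pearson_r(first, second):.3f}")
```

A kappa near 1 indicates agreement well beyond chance, while a kappa near 0 means the raters agree no more often than chance would predict, which is the sense in which such a measure "does not provide useful information."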


