ABSTRACT
This study determined reliability and concurrent validity of measurements of the single leg squat made by novice
examiners. Twelve video recordings of individuals performing a single leg squat were evaluated by six student
physical therapists. Students assessed movement quality on an ordinal scale and manually measured frontal
plane knee movement quantity on a video monitor. Inter- and intrarater reliability of ordinal scale ratings were
determined via quadratically weighted kappa. Inter- and intrarater reliability of frontal plane knee measures
were determined through intraclass correlation coefficient models 2,k and 3,k (k = 3 ratings), respectively.
Concurrent validity of frontal plane knee measures was examined by comparison with Vicon-Peak motion-
tracking system measures via Bland-Altman scatterplots. Ordinal scale measures displayed intrarater reliability
ranging from 0.38 to 0.94 and interrater reliability of 0.68 (0.46–0.87). Intrarater reliability of frontal plane knee
For personal use only.
measures ranged from 0.88 to 0.98, and interrater reliability was 0.99 (0.97–1.00). Difference scores between
student and computer-generated measures of frontal plane knee movement were significantly different as
determined through Bland-Altman scatterplots and calculation of the upper and lower limits of agreement.
Physiotherapy Theory and Practice 587
report that the relatively low reliability might have been due to a lack of heterogeneity in the subjects being assessed and the use of a nonintuitive rating system that consisted of a set of arbitrary symbols. Chmielewski et al (2007) concluded that this method of assessment can be used by clinicians to produce ratings of movement with reliability greater than would be found by chance, but less than what is necessary for clinical use.
These newer methods for assessing movement hold promise as useful clinical tools for measuring patient motion while performing a functional task. However, a review of literature indicates that nothing is known

measures. Concurrent validity of student-generated FPPA measures was determined by comparison to criterion standard values generated by a computerized motion-tracking system.

Subjects performing single leg squat

Twenty-two individuals were recruited from a sample of convenience at a local university through personal communication. A solicitation script was used to recruit these individuals. The inclusion criterion was
Physiother Theory Pract Downloaded from informahealthcare.com by Michigan University on 11/05/14
cutoff). Elgon data were synced with reflective marker data automatically through the Vicon-Peak system. This was done to ensure that the criterion standard FPPA measure was always performed at the lowest point of knee flexion during the SLS performance. This timing of the FPPA measure also correlated with when clinician subjects were instructed to perform measurement of the FPPA on the video images. The FPPA was calculated as the angle between the midline of the leg and midline of the thigh in the frontal plane. The convention used to interpret FPPA measures was as follows: a positive FPPA indicated 2D medial position of the knee and a negative FPPA indicated a 2D lateral position of the knee.
A digital video camera (model DCR-HC40; Sony Electronics, Inc., Oradell, NJ, USA) was positioned approximately 4.5 meters in front of the subject and set to record (30 Hz) a frontal plane view of the SLS. The image was adjusted so that the subject was visible in at least two thirds of the viewing area when the person was in a neutral standing position. The digital video camera and the infrared motion-tracking camera were placed by using a floor reference grid to ensure that they were both recording in an identical distance and angle perpendicular to the frontal plane of subjects performing the SLS maneuver. Video files captured by the digital video camera were later used by novice clinician raters to perform movement measures (described below) via the ordinal scale and FPPA methods.

to maintain an upright trunk posture while the video images and elgon data were recorded.

Protocol for creating single leg squat video bank

From the original pool of 22 subject videos, video files of 12 individuals performing the SLS test were selected. These 12 videos were categorized by the investigators on the basis of objective quantitative measures (digital computerized methods, described previously) of their SLS performance derived through computerized measure of reflective marker motion described previously. Approximately one third of the individuals displayed the least amount of 2D-medial knee movement (FPPA < 5°), one third displayed an average amount of 2D-medial knee movement (FPPA ≥ 5° to <10°), and one third displayed the most 2D-medial knee movement (FPPA ≥ 10°). The FPPA values used to define the categories were based on previous research findings that reported normal and abnormal FPPA values displayed during the SLS maneuver by healthy adults (DiMattia et al, 2005; Kralj, Jaeger, and Munih, 1990; McKinley and Horowitz, 1992; Nguyen and Shultz, 2007; Salem, Salinas, and Harding, 2003; Willson and Davis, 2007; Zeller, McCrory, Kibler, and Uhl, 2003; Zeller et al, 2005). The neutral category is considered representative of the average knee frontal plane movement in the normal healthy adult population. The 2D-medial knee movement category individuals have a higher degree of 2D knee medial movement relative to the average healthy adult. The 2D-lateral knee movement category individuals display greater 2D lateral knee movement relative to the average healthy adult. The aim of creating these categories was to develop a bank of patient simulation video files that displayed varying levels of movement quality and quantity. Former research has cited the lack of heterogeneity of movement displayed as a possible cause of limited visual assessment reliability (Chmielewski et al, 2007; DiMattia et al, 2005).

functional assessment training. Raters were recruited during regular lecture and lab instruction.

Evaluation of SLS performance by novice clinician raters

Each novice clinician rater was asked to observe each video and assess movement by using the ordinal scale (Table 1) and to measure each subject’s FPPA by using a manual goniometer. Video data collected through use of the Sony DV Camera provided the video images that clinician raters analyzed. All video files were observed on a standard 19-inch flat panel LCD computer display (UltraSharp-1907FPt, Dell Computer Inc.). Quick Time Player (7.4 Pro; Apple, Inc.) software was used to display video data.

Ordinal scale assessment

Immediately after watching each individual video, the rater was asked to measure the SLS performance for the subject by using the modified specific quality assessment tool (Table 1). The term “modified” is used here to describe the scale because the original ordinal scale used by Chmielewski et al (2007) did not ask

in direct correlation with the descriptions of increasing abnormal movement in the ordinal scale definitions. Novice clinician raters were not limited to the number of times they were allowed to view each video. Each rater viewed each individual video an average of three times.

TABLE 1 Modified qualitative scoring method for evaluating single leg squat

Point value   Definition
0             No deviation from neutral alignment
5             A small-magnitude or barely observable movement out of a neutral position and/or moderate frequency of segment oscillation
10            A moderate-magnitude or marked movement out of a neutral position and/or moderate frequency of segment oscillation
15            Excessive or severe magnitude of movement out of a neutral position and/or high frequency of segment oscillation

Each anatomical segment (trunk, pelvis, and hips/thigh) being assessed is given a value that correlates with the observed level of movement quality. Simulated patient’s score for quality of movement is then derived by summing the scores (Chmielewski et al, 2007).

Frontal plane projection angle assessment

Following the ordinal scale measurement of the SLS performance, novice clinician raters were instructed to view the video again and to pause the video at the time when the subject reached the lowest position while performing the single leg squat. While the video was paused, the clinician raters then measured FPPA three times directly from the video image using a 15-centimeter manual goniometer (E-Z Read Manual Goniometer, Jamar Inc.). The average of these three ratings was used for all reliability calculations. The convention used to describe movement with the FPPA was as follows: negative value indicates a medial knee movement and a positive value indicates a lateral knee movement (Willson and Davis, 2007; Willson, Dougherty, Ireland, and Davis, 2005).
The novice clinician raters were asked to return 4 weeks following the initial measurement session to repeat all ordinal scale and FPPA measures. These data were subsequently entered manually into a spreadsheet by one of the principal investigators. Both the principal investigator entering the data and the novice clinician raters were blinded to measures made at the first data collection session.
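The FPPA definition given in the methods (the angle between the midline of the thigh and the midline of the leg in the frontal plane), the averaging of three goniometer readings, and the three video-bank categories can be illustrated with a minimal sketch. This is not the procedure the raters or the Vicon-Peak system used; the function names, the (x, y) point format, and the axis convention (x pointing toward the body midline, y pointing up) are assumptions made only for illustration. The sign is chosen so that a medially positioned knee gives a positive value; the opposite convention is obtained by negating the result.

```python
import math
from statistics import mean


def fppa_degrees(hip, knee, ankle):
    """Signed 2D angle (degrees) between the thigh midline (hip -> knee)
    and the leg midline (knee -> ankle).

    Assumption: points are (x, y) with x toward the body midline and y up,
    so a medially displaced knee yields a positive angle.
    """
    thigh = (knee[0] - hip[0], knee[1] - hip[1])
    leg = (ankle[0] - knee[0], ankle[1] - knee[1])
    cross = thigh[0] * leg[1] - thigh[1] * leg[0]
    dot = thigh[0] * leg[0] + thigh[1] * leg[1]
    # atan2 is negative for a medial knee under this axis convention,
    # so the sign is flipped to report medial as positive.
    return -math.degrees(math.atan2(cross, dot))


def mean_of_trials(readings):
    """Average of repeated readings (three per video in the study)."""
    return mean(readings)


def fppa_category(fppa):
    """Video-bank category from the thresholds stated in the text
    (FPPA < 5, 5 to < 10, >= 10 degrees). Labels are hypothetical."""
    if fppa < 5:
        return "least"
    if fppa < 10:
        return "average"
    return "most"


if __name__ == "__main__":
    # Hypothetical coordinates: knee displaced 0.1 units medially.
    angle = fppa_degrees((0.0, 1.0), (0.1, 0.5), (0.0, 0.0))
    print(round(angle, 2), fppa_category(angle))
```

A collinear hip, knee, and ankle give an angle of zero, and the magnitude grows as the knee moves away from the hip-ankle line in either direction.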
Intrarater reliability of FPPA measures ranged from 0.88 to 0.98 (Table 3). Interrater reliability of frontal plane projection angle measures with two-sided 95% confidence interval was 0.99 (0.97–1.00).

Ordinal scale assessment of movement

Results of this study are mixed for the hypothesis that clinicians with minimal clinical experience can display
FIGURE 2 Bland-Altman scatterplot for clinician subject 1. Upper and lower limits of agreement (ULOA & LLOA) are represented by the upper and lower dashed lines, respectively.
FIGURE 3 Bland-Altman scatterplot for clinician subject 2. Upper and lower limits of agreement (ULOA & LLOA) are represented by the upper and lower dashed lines, respectively.
FIGURE 4 Bland-Altman scatterplot for clinician subject 3. Upper and lower limits of agreement (ULOA & LLOA) are represented by the upper and lower dashed lines, respectively.
FIGURE 5 Bland-Altman scatterplot for clinician subject 4. Upper and lower limits of agreement (ULOA & LLOA) are represented by the upper and lower dashed lines, respectively.
FIGURE 6 Bland-Altman scatterplot for clinician subject 5. Upper and lower limits of agreement (ULOA & LLOA) are represented by the upper and lower dashed lines, respectively.
FIGURE 7 Bland-Altman scatterplot for clinician subject 6. Upper and lower limits of agreement (ULOA & LLOA) are represented by the upper and lower dashed lines, respectively.
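The upper and lower limits of agreement drawn in Bland-Altman plots such as these are conventionally the mean of the paired differences plus and minus 1.96 standard deviations of those differences. The sketch below shows that arithmetic for one rater against a criterion measure. The text does not state the multiplier the authors used, so 1.96 is an assumption, and the function name and the example values are hypothetical rather than study data.

```python
from statistics import mean, stdev
from typing import NamedTuple, Sequence


class LimitsOfAgreement(NamedTuple):
    bias: float   # mean of (rater - criterion) differences
    lower: float  # lower limit of agreement (LLOA)
    upper: float  # upper limit of agreement (ULOA)


def bland_altman_limits(
    rater: Sequence[float],
    criterion: Sequence[float],
    z: float = 1.96,  # assumed conventional multiplier
) -> LimitsOfAgreement:
    """Bias and limits of agreement for paired measurements."""
    if len(rater) != len(criterion) or len(rater) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    diffs = [r - c for r, c in zip(rater, criterion)]
    bias = mean(diffs)
    spread = z * stdev(diffs)  # sample standard deviation (n - 1)
    return LimitsOfAgreement(bias, bias - spread, bias + spread)


if __name__ == "__main__":
    # Hypothetical FPPA values in degrees, not data from the study.
    manual = [10.0, 12.0, 14.0, 16.0]
    computed = [8.0, 11.0, 12.0, 15.0]
    print(bland_altman_limits(manual, computed))
```

A bias far from zero, or limits wider than a clinically tolerable error, would each point to disagreement between the two measurement methods even when both are individually reliable.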
One possible reason for this mixed result in rater reliability is lack of a more thorough education of raters on consistent performance of measurement using the ordinal scale. In this study the instruction provided to raters prior to the initial rating session was only a brief familiarization to the ordinal scale, which described the scale's purpose and instruction to apply one definition from the scale to each body segment. At this instruction period, raters were not given a standard interpretation of the ordinal scale; it was left to each individual rater to interpret the definition of each level of the ordinal scale, leaving room for variation in the application of the ordinal scale instrument when rating subjects. Intrarater reliability may improve, and exceed an acceptable level, if all novice clinician raters are instructed in what each ordinal scale definition looks like through a preliminary clinical education that includes examples of varying levels of stability during a single leg squat. Education of this type might allow an opportunity for students to develop a static memory of the standard method of application of the ordinal scale. Building such a memory of consistent interpretation and application of this assessment method might also result in a more reliable measurement from one rating session to another.
Interrater reliability of ordinal scale measures failed to reach a level considered adequate for clinical use. This is in agreement with previous findings of Chmielewski et al (2007). As described above for intrarater reliability, a more thorough and uniform education in the assessment method may improve levels of interrater reliability. However, results may also indicate that this ordinal scale does not lend itself to agreement between raters due to the scale's inherent subjectivity, and a more refined scale or uniform education may not change this.
Intra- and interrater reliability of ordinal scale measures in this study was higher than in the single previous study that used a similar specific ordinal scale (Chmielewski et al, 2007). Possible explanations for this include the following: 1) In the study conducted by Chmielewski et al (2007), the raters' first scoring session was completed by observing subjects in person while they performed the functional task, and the second round of observations was completed by the raters observing a video of the prior performance. In our study, mode of visualizing the patient performance was controlled (video only). 2) Chmielewski’s group did not control for duration between the first and second round of assessment (10 ± 1.5 weeks), thus introducing the possibility of greater variability between raters or a learning or maturation effect; the current study controlled precisely for duration between the first and second round of rater measures. 3) Chmielewski et al (2007) discuss that a limitation of their study was the use of an arbitrary, nonintuitive symbol system to rank quality of movement; the current study used a more intuitive numerical ranking system with equidistant measures that may have contributed to a more accurate representation of novice clinician perception of movement.

Quantitative measures

Results suggest that a novice clinician who desires to track an individual patient’s FPPA can do so with a degree of reliability that exceeds the minimal clinical standard. All novice clinician quantitative measures exhibited intrarater reliability that was above the level considered adequate for clinical use (≥0.75) (Portney and Watkins, 2009). However, examination of measures of FPPA does not support the hypothesis that measures made by novice physical therapists are similar to computer-generated criterion standard FPPA measures.
Because clinician measures of FPPA are reliable, both within raters and between raters, there is increased confidence that these measures can be used in the clinical setting. Previous research by Willson et al (2008b) using FPPA measures generated by hand indicates that the FPPA is a valuable measure of human motion that may be used in the diagnosis and treatment of individuals with lower extremity pathology. Similar findings by Levinger and Gilleard (2007) are also in agreement.
These results also suggest caution to researchers who may try to apply a computerized method of measuring the FPPA with the intention to interpret them in the same manner as Willson et al (2007; 2008a; 2008b) have in previous publications. It is not readily apparent why computer-generated measures of the FPPA were different from those generated by humans. The hypothesis that they would be concurrently valid was based on the fact that both the computer and the human raters were “instructed” to measure the FPPA based on the placement of the reflective markers. It is possible that human error in placing and reading the goniometer or recording the FPPA values resulted in significantly different measures than were produced by the computer. However, the high levels of inter- and intrarater reliability in human measures indicate that human error was not likely a substantial source of human and computer disagreement.
Therefore, it is more likely that a difference in the manner in which the computer and the human measured FPPA existed, causing this difference in measures. One possible explanation includes the small aspect difference created by the different placement of the infrared camera and the digital video camera. This difference in camera placement may have created enough of a difference in the images presented to the computer and the human to undermine concurrent validity between the two measurement methods. The effect could be similar to two humans standing in two different positions when measuring the projection angle of any object. A small difference in perspective between the infrared-computer camera and the standard video camera could cause a significant difference in measures made from the two different videos generated.
The current study indicates that the methods of movement measurement described show levels of reliability near or above that considered adequate for clinical use. However, results also indicate that quantitative measures of the FPPA derived through inexpensive manual goniometric methods are not concurrently valid compared to criterion standard measures derived through 2D computerized motion-tracking equipment.

this artificially improved the ICC values because the within- and between-rater variability might have been made artificially small relative to the sample variability. It is recommended that future study not purposefully create heterogeneity through sample selection, but rather allow natural variability to occur in the pool of subjects being assessed.
Results indicate that FPPA measures of movement generated by novice clinicians display intra- and interrater reliability necessary for clinical application. However, concurrent validity analysis indicates that what novice clinicians are measuring during FPPA analysis may not be the same as the FPPA measures generated by 2D computerized motion analysis systems. This is encouraging because it indicates that even novice clinicians can reliably generate measures of 2D knee movement. It also highlights a piece of information not previously known; it is possible that what clinicians measure when asked to assess knee FPPA during a single leg squat may be different from measures generated by a high-speed 2D motion analysis system. Therefore, until the source of this disagreement is determined and ameliorated, researchers should proceed with caution when interpreting
overlooked coordination deficits in the frontal plane will be more readily treated through neuromuscular coordination and therapeutic exercise.
Future investigation should also analyze if there is a significant difference between novice raters and advanced professionals to determine if level of experience affects validity of measures made. If this is so, it is possible that experience and training play a role in validity of clinician measures indicating a need for a period of training or experience in clinical practice before measures of FPPA can be considered valid.
In conclusion, the reliability of novice clinician ordinal scale measures of human motion failed to

Yu B 2006 Understanding and preventing noncontact anterior cruciate ligament injuries: A review of the Hunt Valley II meeting, January 2005. American Journal of Sports Medicine 34: 1512–1532
Hanneman SK 2008 Design, analysis, and interpretation of method-comparison studies. AACN Advanced Critical Care 19: 223–234
Hewett TE, Myer GD, Ford KR, Heidt RS, Colosimo AJ, McLean SG, van den Bogert AJ, Paterno MV, Succop P 2005 Biomechanical measures of neuromuscular control and valgus loading of the knee predict anterior cruciate ligament injury risk in female athletes: A prospective study. American Journal of Sports Medicine 33: 492–501
Kralj A, Jaeger RJ, Munih M 1990 Analysis of standing up and sitting down in humans: Definitions and normative data presen-