Original Paper

Cerebrovasc Dis 2010;29:188–193 DOI: 10.1159/000267278


Received: February 17, 2009 Accepted: September 3, 2009 Published online: December 18, 2009

The Modified Rankin Scale in Acute Stroke Has Good Inter-Rater-Reliability but Questionable Validity
Henry Zhao a, b Janice M. Collier b Dorcas M. Quah b Tara Purvis a, c Julie Bernhardt a, b, d
a School of Medicine, University of Melbourne, b National Stroke Research Institute, c Department of Physiotherapy, Austin Health, and d La Trobe University, Melbourne, Vic., Australia

Key Words: Modified Rankin Scale · Acute stroke · Validity

Abstract
Background and Purpose: The modified Rankin Scale (mRS), designed as a measure of disability in the community, has increasingly been administered in the acute stroke setting but has been poorly studied within the hospital environment. We prospectively studied the interrater reliability of the mRS in acutely hospitalised stroke patients and examined the effect of prior experience with the scale and use of a decision tool on the interrater agreement of trained raters. Methods: Patients <4 days after stroke were recruited. Individuals from 3 trained rater groups (experienced, inexperienced and inexperienced with decision tool) independently scored each patient within 6 h of each other. Agreement was measured with the intraclass correlation (ICC) and the weighted kappa statistic (κw), with systematic bias evaluated using the bias index. Results: Twelve raters scored 56 patients with overall agreement of ICC = 0.675. Agreement of κw = 0.686 was found between experienced and inexperienced raters but a modest systematic bias was present. Experience in rating patients appeared to play some role in affecting agreement but the decision tool did not improve the performance of inexperienced raters. Conclusions: Trained raters were found to have good interrater agreement overall when the mRS was scored in acute stroke patients but obvious problems with the interpretation and relevancy of the scale in this setting raise concerns about validity. The use of the mRS to rate disability in the acute hospital environment should be questioned.
Copyright © 2009 S. Karger AG, Basel

Introduction

In stroke clinical trials, one of the most commonly used endpoint outcome measures is the modified Rankin Scale (mRS), first documented by van Swieten et al. [1] in 1988. The mRS is an ordinal 6-point (7 if death is included) scale ranging from no residual symptoms (score of 0) to severely disabled (score of 5), or dead (score of 6). The scale items incorporate aspects of impairment, activity and participation, so that the mRS could be considered a global disability index [2], similar to the New York Heart Association Functional Classifications [3] in heart failure.
Correspondence: Associate Prof. Julie Bernhardt, National Stroke Research Institute, Level 1 Neurosciences Building, 300 Waterdale Road, Heidelberg Heights, Vic. 3081 (Australia). Tel. +61 3 9496 2783, Fax +61 3 9496 2650, E-Mail j.bernhardt@unimelb.edu.au

Although the mRS is widely used as a measure of long-term functional outcome in the community setting after stroke, it is becoming increasingly common to see the mRS used within the published literature as a measure of acute disability while patients are in the hospital environment [4–6]. The mRS has also been used in the acute setting to determine eligibility for carotid endarterectomy, with an mRS <3 needed before patients can be considered for surgery, according to the NASCET (North American Symptomatic Carotid Endarterectomy Trial) criteria [7].

There is, however, very limited research into the use of the mRS in the acute setting. The only study, reported in the original paper by van Swieten et al. [1] in 1988, described excellent agreement (κw = 0.89) between 24 randomly paired raters assessing 86 inpatients within 1 week of stroke onset. There are, however, methodological uncertainties associated with this study, including whether the raters independently assessed each patient and the nature of the training regime employed. We were unable to find any validity studies in the acute setting.

Given the increasing use of the mRS in acute care despite its development as a long-term measure of disability, we sought to investigate whether the excellent interrater agreement found by van Swieten et al. [1] was reproducible within a busy Australian acute stroke unit. We also aimed to investigate whether agreement was influenced by rater experience or by the use of a decision tool that sets out key discriminators between mRS scores and is taught as part of a standardised mRS DVD training and certification procedure [8]. The tool is shown in Appendix 1. Experience has been found to influence the reliability of clinical assessment [9], and although training through a video medium should improve rater accuracy, the actual interview and scoring process is complex, so it cannot be assumed that newly trained raters perform as well as those who are highly experienced.

Methods

The prospective study of interrater reliability was carried out with 3 rater groups: (1) experienced (Exp); (2) inexperienced unaided (Inexp), and (3) inexperienced with decision tool (Inexp+DT). The study was conducted within an urban acute stroke unit in Melbourne, Australia, and approved by the institutional ethics committee.

Participants
Eligible raters were non-medical health professionals (physiotherapists, occupational therapists and trials nurses) or medical students at the hospital. Experienced raters were considered to be those who had completed >300 mRS interviews in the preceding 2 years. Inexperienced raters had no prior experience using the mRS. Patients 18 years of age with first or recurrent, ischaemic or haemorrhagic stroke (according to the WHO definition [10]) within 4 days of stroke onset were eligible for inclusion. Patients were excluded if they had a prestroke mRS score of >2 (indicating significant comorbidity obscuring symptoms due to stroke).

Procedure
All raters completed training and certification using the DVD package [8] before study commencement. In addition, those in the Inexp+DT group received brief training with the decision tool. Each patient was interviewed and scored by a rater from each of the 3 groups. All 3 interviews were completed independently, in a computer-generated random order and within 6 h of each other, as stipulated within a written protocol. The participants were interviewed directly except where communication or cognitive difficulties (e.g. confusion or aphasia) were present, whereupon an appropriate attending proxy was interviewed. All inexperienced raters were replaced after scoring a maximum of 20 patients to limit the effect of inexperienced raters becoming experienced. The raters were asked to write down any difficulties they experienced during the rating process.

Prespecified Data
Demographic data were collected from the participants at the time of consent. The stroke severity was measured using the NIHSS (National Institutes of Health Stroke Scale) [11], and the stroke subtype was classified using the Oxfordshire Community Stroke Project Classifications [12].

Data Management and Analysis
Data collection forms were Teleforms [13], which allowed data recorded on paper to be saved as a digital image, checked visually and transferred automatically to an electronic database.
Overall agreement between the 3 rater groups was calculated using the intraclass correlation (ICC) [14], while paired agreements between groups utilised the weighted kappa statistic (κw) [15] with quadratic weighting. Both the ICC and κw are equivalent indices of agreement that account for chance and penalise greater disagreement between categories. In previous studies of the mRS, systematic bias was found to be an issue [16]. Therefore, we also used the bias index as a test of systematic bias for paired ratings on an ordinal scale, recommended as it takes into account overall agreement [17]. All calculations were carried out in Stata version 10 [18].

Results

Raters
The composition of the rater groups was as follows: (1) Exp: 2 non-medical stroke researchers; (2) Inexp+DT: 6 fourth-year medical students, and (3) Inexp: 4 neuroscience non-medical healthcare professionals. All the inexperienced raters completed between 5 and 20 ratings (median: 12) before being replaced.

Patients
Of the 211 patients screened over 6 months, 56 acute stroke patients were recruited. The main reasons for exclusion were time from stroke onset >4 days, high prestroke morbidity and inability to consent. Four patients refused to participate in the study. The participant demographics (table 1) were typical of stroke in this region [19] except for a higher proportion of partial anterior circulation infarcts [20]. All patients were diagnosed with acute stroke at the time of the interview; however, the diagnoses were later revised for 5 patients to transient ischaemic attack (n = 2), conversion disorder (n = 2) and Bell's palsy (n = 1). These patients were not excluded as they still exhibited characteristics of acute stroke at the time of the interview.

Twenty-nine patients (52%) were able to be interviewed directly by raters. The remainder required a proxy to answer on their behalf; proxies were nursing staff (39%), physiotherapists (4%) and occupational therapists (5%) involved directly in the care of the patient. In the majority of cases, the same proxy was interviewed by all 3 rater groups. The interviews were carried out within 2–3 days of stroke onset for most patients (median: 52 h; interquartile range: 38–74 h; range: 4–110 h). The median time from first to last interview for individual patients was 104 min (interquartile range: 41–170 min; range: 10–360 min).

Interrater Agreement
The overall agreement for all 3 rater groups was ICC = 0.675 (0.559–0.791), while agreement between the Exp and Inexp groups was κw = 0.686 (0.541–0.819) (table 2). Measured against the Exp group, the Inexp+DT group agreed less well than the Inexp group did, but the difference was not statistically significant. A modest bias was also present between the Exp and Inexp rater groups (bias index: 0.18; p = 0.032), where inexperienced raters consistently rated patients as being less disabled. Scores 4 and 5 (moderately severe to severe disability) accounted for >50% of all the scores assigned, while together, scores 0, 1 and 2 (no to slight disability) only accounted for around 25%.

The bubble plot shown in figure 1 illustrates the extent of disagreement between the Exp and Inexp raters. The plot shows that, although most disagreements between raters were of one level only, the severity and spread of score discrepancies varied with each mRS score. At mRS scores of 4 and 5 (more severe disability), the disagreements were fewer and less severe than at scores of 0–3 (milder disability). A strikingly large variation in disagreement is also seen when the experienced raters assigned a score of 3, where inexperienced raters could instead score anywhere from 0 to 5.
A similar distribution was found between the Exp and Inexp+DT groups (data not shown).
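The paired agreement figures reported here rest on the quadratic-weighted kappa, which penalises a disagreement of several mRS grades more heavily than a disagreement of one grade. The Python below is a minimal sketch of that calculation, assuming Cohen's weighted kappa with disagreement weights (i − j)²/(k − 1)² over the seven mRS grades (0–6). The function names and the toy scores are illustrative assumptions; they are neither the study data nor the Stata routines the authors used.

```python
import numpy as np


def quadratic_weighted_kappa(rater_a, rater_b, n_categories=7):
    """Cohen's weighted kappa with quadratic disagreement weights.

    Scores must be integers in 0..n_categories-1 (mRS grades 0-6 by default).
    """
    a = np.asarray(rater_a, dtype=int)
    b = np.asarray(rater_b, dtype=int)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("both raters must score the same patients")

    # Observed joint proportions: rows = rater A grade, columns = rater B grade.
    observed = np.zeros((n_categories, n_categories))
    for i, j in zip(a, b):
        observed[i, j] += 1
    observed /= observed.sum()

    # Proportions expected by chance, from the two raters' marginal distributions.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))

    # Quadratic disagreement weights: 0 on the diagonal, 1 at maximum distance.
    grades = np.arange(n_categories)
    weights = (grades[:, None] - grades[None, :]) ** 2 / (n_categories - 1) ** 2

    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


def exact_agreement(rater_a, rater_b):
    """Proportion of patients given the identical grade by both raters."""
    return float(np.mean(np.asarray(rater_a) == np.asarray(rater_b)))


if __name__ == "__main__":
    # Hypothetical scores for eight patients (illustration only).
    experienced = [4, 5, 3, 2, 4, 1, 5, 3]
    inexperienced = [4, 4, 3, 1, 4, 0, 5, 1]
    print("exact agreement:", exact_agreement(experienced, inexperienced))
    print("weighted kappa:", quadratic_weighted_kappa(experienced, inexperienced))
```

Reporting exact agreement alongside the weighted index is useful because the two can diverge: many one-grade disagreements lower exact agreement sharply while leaving the quadratic-weighted kappa comparatively high.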

Table 1. Participant demographics (n = 56)

                            n            %
Mean age ± SD, years        71 ± 13.6
Age range, years            34–93
Male gender                 31           55.4
Australian ethnicity        38           67.9
Previous stroke             20           35.7
Stroke severity
  Mild                      39           69.6
  Moderate                  7            12.5
  Severe                    10           17.9
Stroke subtype
  TACI                      5            8.9
  PACI                      28           50.0
  POCI                      7            12.5
  LACI                      8            14.3
  Haemorrhage               8            14.3

TACI = Total anterior circulation infarct; PACI = partial anterior circulation infarct; POCI = posterior circulation infarct; LACI = lacunar infarct.

Table 2. Interrater agreement

                      Exact agreement   Agreement index   95% CI         Systematic bias, p
All groups            30%               ICC = 0.675       0.559–0.791
Exp vs. Inexp         50%               κw = 0.686        0.541–0.819    0.032
Exp vs. Inexp+DT      52%               κw = 0.568        0.359–0.724    0.553
Inexp vs. Inexp+DT    55%               κw = 0.736        0.557–0.859    0.121
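The agreement indices in table 2 are interpreted in the Discussion against the Landis and Koch [21] classification. As a reading aid, the sketch below maps an index onto those bands, assuming the published cut-offs (below 0 poor, up to 0.20 slight, 0.40 fair, 0.60 moderate, 0.80 substantial, 1.00 almost perfect). Landis and Koch label the 0.61–0.80 band "substantial", which corresponds to what this paper describes as good agreement. The function name and the handling of boundary values as inclusive upper limits are assumptions made for illustration.

```python
def landis_koch_band(value):
    """Return the Landis and Koch (1977) label for an agreement index."""
    if value < 0.0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper_limit, label in bands:
        if value <= upper_limit:
            return label
    raise ValueError("agreement indices cannot exceed 1")


if __name__ == "__main__":
    # Point estimates copied from table 2.
    table2 = {
        "All groups (ICC)": 0.675,
        "Exp vs. Inexp": 0.686,
        "Exp vs. Inexp+DT": 0.568,
        "Inexp vs. Inexp+DT": 0.736,
    }
    for comparison, index in table2.items():
        print(f"{comparison}: {index:.3f} -> {landis_koch_band(index)}")
```

Applying the same function to the confidence limits rather than the point estimates shows how wide the plausible range is: the lower limit for Exp vs. Inexp+DT (0.359) falls in the fair band.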


Rater Feedback
Feedback from raters indicated that scoring the mRS in the acute setting could be accomplished quickly and with a minimal burden. However, many raters reported difficulties in the interpretation of the scale within the hospital environment. One problem was that the description of some scale items makes reference to the patient's ability to carry out community activities. This led to a discrepancy whereby some raters chose to score the patients on their predicted capability in their usual place of residence, while others felt that scoring the actual capability within the confines of the hospital environment was more appropriate. The data provide no clear indication of whether experience altered the approach taken.

Raters also questioned whether the wording of the scale items was suited to the inpatient environment. For example, one experienced rater questioned whether the regular neurological observation for ward patients in the early phase of care constituted the 'constant nursing care and attention' that determines whether a patient is scored at 4 or 5. Another rater reported difficulty scoring an independently mobile patient forced to rest in bed after thrombolytic therapy, since walking ability would normally help distinguish between scores 3 and 4.

Fig. 1. Bubble plot of agreement between Exp and Inexp groups. Width, not area, of bubble represents count at each point. [Horizontal axis: grade assigned by Exp group; vertical axis: grade assigned by Inexp group.]

Discussion

Although the mRS is already a well-established instrument, its use in a significantly different setting cannot be justified without careful study. In an acute stroke unit, we found that trained raters scoring the mRS achieved good interrater agreement overall (under the Landis and Koch [21] classification) but the result was well short of the excellent agreement found by van Swieten et al. [1]. While it is difficult to pinpoint a precise cause for this discrepancy, we speculate it may well be related to the differences in rater training and the earlier time of assessment in our study. The participant sample used in this study may also have influenced our findings. Nonetheless, the lower agreement, combined with cases of marked discrepancy between raters, casts doubt on the use of the mRS acutely.

Experience appeared to affect reliability. All raters completed a standardised training, yet a systematic bias was found when comparing experienced and inexperienced groups, indicating that newly trained raters consistently differed in their scoring patterns. If experience does contribute to an improved scoring of the mRS, what remains unknown is how many interviews lead to enough experience. We readily acknowledge that classifying those with >300 interviews as experienced was arbitrary.

On the other hand, the decision tool appeared to have no effect and may even have worsened the performance of inexperienced raters. This contrasts with several previous studies of the mRS which have shown significant improvements in rater agreement when a similar but more comprehensive written guide has been used [16, 22, 23]. Although the differing professions of the two inexperienced rater groups may have played some role, we speculate that the ineffectiveness of the decision tool relates instead to its simplicity and the fact that the training DVD (on which it was based) was intended for the community setting. Inexperienced raters using the tool therefore still faced uncertainties in the acute stroke setting.

Raters reported significant difficulties with scale interpretation in the inpatient setting, raising concerns regarding content validity [24]. The hospital environment intrinsically precludes an assessment of participation, with some scale items difficult, if not impossible, to score outside a community setting. If a patient is to be rated on actual functioning in the inpatient ward, an mRS score of <3 (which assumes the ability to carry out usual duties and activities) theoretically cannot be given, effectively rendering the mRS a 3-item scale. The alternative approach of choosing to predict the potential ability in the community setting, on the other hand, is also undesirable. In either case, the validity of the scale comes into question.

We found a high prevalence of mRS scores of 4 and 5 (moderately severe to severe disability) in our sample. This may reflect a true disability or, alternatively, problems associated with rating the mRS in the acute setting, as highlighted above. Acute mRS scores are likely to be skewed towards more severe disability. Consequently, if the mRS were used to measure recovery (e.g. from acute stroke admission to discharge or follow-up), change scores would likely be exaggerated.

Although the mRS is generally considered a simple scale, this study has demonstrated that the process of rating it in a significantly different environment from its traditional use is by no means simple. For the reliability of the mRS in acute stroke to achieve an acceptable level, we believe the underlying issues of relevancy and subjective interpretation need to be addressed. There is scope to modify the mRS to make it more suited for use in an acute setting, such as the removal of community references from the scale items and their replacement with descriptions that better reflect the hospital environment. In addition, raters need to be given uniform instructions as to whether to assess the predicted or actual capability of patients to avoid confusion. Such measures would in turn create a modified acute scale, which would need assessment of clinimetric properties before any adoption is possible. However, given the widespread popularity of the mRS and the absence of an alternative instrument for the acute setting, we believe further exploration of this issue is warranted.

Acknowledgements
We would like to thank all raters, patients and families for kindly volunteering their time for this study. We would also like to thank Prof. John Ludbrook, Ms. Dora Pearce and Associate Prof. Leonid Churilov for their valuable assistance with statistics.

Appendix 1: Decision Tool for Completing the Modified Rankin Scale (mRS)

Grades (columns; description of patient/current subject):
0 No symptoms/limitations
1 No significant disability
2 Slight disability
3 Moderate disability
4 Moderate severe disability
5 Severe disability

Key discriminators (rows):
Symptom-free
Able to do all pre-stroke activities (e.g. driving, reading, working)
Independent with ADLs (e.g. shopping, cooking, cleaning, managing $)
Independent with walking/mobility +/- aid
Can be left alone (graded as: for a week or more without concern; for a few hours a day; No, not even for a few hours)

Cell entries: Yes, No or 'Requires Assistance' for each key discriminator at each grade. [The individual cell markings of the original grid are not reproduced here.]

KEY
Yes: this description applies to the patient
No: this description does not apply to the patient
NB: Assistance may be verbal or physical
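The key discriminators listed in the tool lend themselves to a top-down decision rule. The sketch below is one plausible reading only: it assumes the discriminators are checked in the order listed and that the first one that applies fixes the grade, following the standard mRS grade descriptions and the distinctions mentioned in the text (walking ability separating grades 3 and 4, the need for constant care separating grades 4 and 5). The class name, field names and the exact mapping are hypothetical, because the cell-by-cell markings of the tool are not available here, and the "can be left alone" item is reduced to a single yes/no for simplicity.

```python
from dataclasses import dataclass


@dataclass
class KeyDiscriminators:
    """Yes/No answers to the tool's key discriminators (hypothetical field names)."""
    symptom_free: bool
    all_prestroke_activities: bool
    independent_adls: bool
    independent_walking: bool
    can_be_left_alone_for_hours: bool


def suggest_mrs(answers: KeyDiscriminators) -> int:
    """Suggest an mRS grade (0-5) under the assumed top-down reading of the tool."""
    if answers.symptom_free:
        return 0  # no symptoms or limitations
    if answers.all_prestroke_activities:
        return 1  # symptoms present, but all pre-stroke activities still possible
    if answers.independent_adls:
        return 2  # some activities lost, still independent with ADLs
    if answers.independent_walking:
        return 3  # needs help with ADLs, walks without another person's assistance
    if answers.can_be_left_alone_for_hours:
        return 4  # needs help to walk, can be left alone for a few hours
    return 5  # cannot be left alone, even for a few hours


if __name__ == "__main__":
    # Hypothetical patient: needs help with ADLs but walks independently with a stick.
    patient = KeyDiscriminators(
        symptom_free=False,
        all_prestroke_activities=False,
        independent_adls=False,
        independent_walking=True,
        can_be_left_alone_for_hours=True,
    )
    print("suggested mRS grade:", suggest_mrs(patient))
```

A rule of this kind makes the difficulty raised by the raters concrete: a patient confined to bed after thrombolysis would answer "No" to independent walking in hospital but "Yes" on predicted capability at home, and the two answers lead to different grades.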


References
1 van Swieten JC, Koudstaal PJ, Visser MC, Schouten HJ, van Gijn J: Interobserver agreement for the assessment of handicap in stroke patients. Stroke 1988;19:604–607.
2 de Haan R, Limburg M, Bossuyt P, van der Meulen J, Aaronson N: The clinical meaning of Rankin 'handicap' grades after stroke. Stroke 1995;26:2027–2030.
3 Criteria Committee of the New York Heart Association: Nomenclature and Criteria for Diagnosis of Diseases of the Heart and Great Vessels. Boston, Little Brown, 1994, pp 253–256.
4 Wang DZ, Rose JA, Honings DS, Garwacki DJ, Milbrandt JC: Treating acute stroke patients with intravenous tPA: the OSF stroke network experience. Stroke 2000;31:77–81.
5 Weimar C, Ziegler A, König IR, Diener HC: Predicting functional outcome and survival after acute ischemic stroke. J Neurol 2002;249:888–895.
6 Johnston SC, Wilson CB, Halbach VV, Higashida RT, Dowd CF, McDermott MW, Applebury CB, Farley TL, Gress DR: Endovascular and surgical treatment of unruptured cerebral aneurysms: comparison of risks. Ann Neurol 2000;48:11–19.
7 Ferguson GG, Eliasziw M, Barr HW, Clagett GP, Barnes RW, Wallace MC, Taylor DW, Haynes RB, Finan JW, Hachinski VC, Barnett HJ: The North American Symptomatic Carotid Endarterectomy Trial: surgical results in 1,415 patients. Stroke 1999;30:1751–1758.
8 Quinn TJ, Lees KR, Hardemark HG, Dawson J, Walters MR: Initial experience of a digital training resource for modified Rankin Scale assessment in clinical trials. Stroke 2007;38:2257–2261.
9 Hand PJ, Haisma JA, Kwan J, Lindley RI, Lamont B, Dennis MS, Wardlaw JM: Interobserver agreement for the bedside clinical assessment of suspected stroke. Stroke 2006;37:776–780.
10 Stroke 1989. Recommendations on stroke prevention, diagnosis, and therapy. Report of the WHO Task Force on Stroke and other Cerebrovascular Disorders. Stroke 1989;20:1407–1431.
11 Brott T, Adams HP Jr, Olinger CP, Marler JR, Barsan WG, Biller J, Spilker J, Holleran R, Eberle R, Hertzberg V, et al: Measurements of acute cerebral infarction: a clinical examination scale. Stroke 1989;20:864–870.
12 Bamford J, Sandercock P, Dennis M, Burn J, Warlow C: Classification and natural history of clinically identifiable subtypes of cerebral infarction. Lancet 1991;337:1521–1526.
13 TELEform Elite Version 9. Sunnyvale, Verity, 2005.
14 Shrout PE, Fleiss JL: Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979;86:420–428.
15 Cohen J: Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull 1968;70:213.
16 Wilson JT, Hareendran A, Grant M, Baird T, Schulz UG, Muir KW, Bone I: Improving the assessment of outcomes in stroke: use of a structured interview to assign grades on the modified Rankin Scale. Stroke 2002;33:2243–2246.
17 Ludbrook J: Detecting systematic bias between two raters. Clin Exp Pharmacol Physiol 2004;31:113–115.
18 Stata Statistical Software Release 10.0. College Station, StataCorp, 2007.
19 Thrift AG, Dewey HM, Macdonell RA, McNeil JJ, Donnan GA: Stroke incidence on the east coast of Australia: the North East Melbourne Stroke Incidence Study (NEMESIS). Stroke 2000;31:2087–2092.
20 Dewey HM, Sturm J, Donnan GA, Macdonell RA, McNeil JJ, Thrift AG: Incidence and outcome of subtypes of ischaemic stroke: initial results from the North East Melbourne Stroke Incidence Study (NEMESIS). Cerebrovasc Dis 2003;15:133–139.
21 Landis JR, Koch GG: The measurement of observer agreement for categorical data. Biometrics 1977;33:159–174.
22 Wilson JT, Hareendran A, Hendry A, Potter J, Bone I, Muir KW: Reliability of the modified Rankin Scale across multiple raters: benefits of a structured interview. Stroke 2005;36:777–781.
23 Shinohara Y, Minematsu K, Amano T, Ohashi Y: Modified Rankin Scale with expanded guidance scheme and interview questionnaire: interrater agreement and reproducibility of assessment. Cerebrovasc Dis 2006;21:271–278.
24 Bland M, Altman G: Statistics notes: validating scales and indexes. BMJ 2002;324:606–607.
