DOI 10.1007/s00134-006-0516-8
Christian P. Subbe
Haiyan Gao
David A. Harrison
C. P. Subbe
Wrexham Maelor Hospital,
Department of Medicine,
Wrexham LL13 4TX, UK
H. Gao · D. A. Harrison (✉)
Tavistock House, Intensive Care National
Audit and Research Centre,
Tavistock Square, London WC1H 9HR, UK
e-mail: david.harrison@icnarc.org
Tel.: +44-20-73882856
Fax: +44-20-73883759
ORIGINAL
Reproducibility of physiological
track-and-trigger warning systems
for identifying at-risk patients on the ward
Introduction
Physiological track-and-trigger warning systems are used
to identify, as early as possible, patients on acute wards who are at risk of deterioration. There are three main types in
use [1]:
1. Single- and multiple-parameter systems identify patients by comparing bedside observations with a simple
Methods
Design and data collection
A prospective observational study was conducted at
Wrexham Maelor Hospital, a district general hospital
in North Wales. The study was approved by the local
research ethics committee. Participants were adult patients
from general medical and surgical wards. A number of
wards were selected to satisfy the sample size calculation
(below) and all patients on these wards able to give
informed consent were invited to participate. Patients were
informed about the purpose of the study and received an
information leaflet. Verbal consent was obtained.
Based on assumptions for inter-rater reliability
(kappa = 0.8, proportion of positive results = 0.07) with
four raters, a sample of 93 patients was required to
estimate kappa with a standard error of 0.1. For the intra-rater reliability, with an assumed value of kappa = 0.9,
the required sample size was 44 patients. Sample size
calculations were performed using a custom-designed
module [8].
Data were collected by four members of hospital staff
on 3 days. All four raters were familiar with the scoring
methods in their clinical practice and received an induction
prior to the study. Two investigators prepared the consent
and patient identification data prior to the study.
For inter-rater reliability, data were collected on two
acute medical and two acute surgical wards. A senior doctor (Certificate of Completion of Specialist Training equivalent in Intensive Care Medicine), junior doctor (Senior
House Officer level), registered nurse (E-grade; 5 years of
experience) and student nurse, who had previously worked
as a health care assistant (nursing auxiliary), collected the
data. The order of the raters taking the measurements was
randomized for each ward from a set of possible permutations. Raters were blinded to the results of their colleagues.
For the intra-rater study, the same raters examined a separate set of patients from one medical and one surgical ward, each rater examining the same patients four times at 15-min intervals,
blinded to their previous scores. There were no interventions between the four sets of measurements.
Age and normal blood pressure, derived from an
average of the previous 48 h, were collected first. Raters
then measured the remaining parameters: systolic blood
pressure; temperature; respiratory rate; pulse rate; and
level of consciousness. Blood pressure was measured
electronically (Dinamap, Critikon, Tampa, Fla.) and
checked manually where appropriate. Blood pressure was
measured by all four raters on the first 18 patients, but the
repeated measurement was found to be unacceptable to
the patients. For subsequent patients, blood pressure was
measured only once, noted on the patient's bedside sheet,
and copied by subsequent raters. Temperature was taken
orally (Temp-PlusII, IVAC Corp., San Diego, Calif.),
measured only once, noted and copied by subsequent
raters. All other parameters were measured by each rater
in turn. Pulse rate was counted over 15 s in regular heart
rhythm and 1 min in irregular heart rhythm; respiratory
rate was counted over 30 s. Raters calculated urine output
per kilogram and hour from the output over the last 4 h.
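The count-to-rate conversions described above are simple scalings; a minimal sketch (the function names are ours, not from the study):

```python
def pulse_rate(beats_counted: int, seconds: int) -> float:
    """Scale a pulse count to beats/min (counted over 15 s for regular,
    60 s for irregular heart rhythm)."""
    return beats_counted * 60.0 / seconds

def respiratory_rate(breaths_counted: int, seconds: int = 30) -> float:
    """Scale a breath count (taken over 30 s) to breaths/min."""
    return breaths_counted * 60.0 / seconds

def urine_output_ml_kg_h(output_ml: float, weight_kg: float, hours: float = 4.0) -> float:
    """Urine output over the last 4 h, normalised per kg body weight per hour."""
    return output_ml / (weight_kg * hours)
```

For example, 18 beats counted over 15 s corresponds to 72 beats/min, and 200 ml of urine over 4 h in a 70-kg patient to roughly 0.71 ml/kg per hour.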
Raters scored the observations according to the three
systems. The MET criteria were scored as one if any
criterion was fulfilled and otherwise as zero. The MEWS
and ASSIST were scored according to scoring charts.
Blood pressure in MEWS was scored differently from
the published scoring method, by deviation from the
patient's norm (C. Stenhouse, pers. commun.). Details of
the scoring systems, including the modification to MEWS,
are contained in the Electronic Supplementary Material.
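Chart-based systems such as MEWS and ASSIST assign points per parameter band and sum them, whereas MET is scored one if any single criterion is met. A minimal sketch of this logic, with illustrative integer thresholds that are not the published charts (those are in the Electronic Supplementary Material):

```python
# Illustrative sketch of chart-based aggregate scoring; the bands below are
# invented for demonstration and are NOT the published MEWS/ASSIST charts.

def band_score(value, bands):
    """Return the points of the first (low, high, points) band containing value.
    Bands here assume integer observations, so adjacent bands abut at integers."""
    for low, high, points in bands:
        if low <= value <= high:
            return points
    raise ValueError(f"value {value} outside all bands")

# Hypothetical respiratory-rate bands: (low, high, points)
RESP_BANDS = [(0, 8, 2), (9, 14, 0), (15, 20, 1), (21, 29, 2), (30, 200, 3)]

def aggregate_score(observations, charts):
    """Sum the per-parameter points, as MEWS- and ASSIST-style systems do."""
    return sum(band_score(observations[p], bands) for p, bands in charts.items())

def met_trigger(criteria_met):
    """MET is scored one if any single criterion is fulfilled, otherwise zero."""
    return 1 if any(criteria_met) else 0
```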
Data were entered into a spreadsheet by a data-entry
clerk not involved in data collection. Logic, range and consistency checks were applied to all variables. Outliers and
missing data were checked against original data collection
sheets.
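The range checks might look like the following sketch; the plausibility limits shown are illustrative assumptions, not those used in the study:

```python
# Sketch of range checks applied at data entry; limits are illustrative only.
RANGES = {
    "systolic_bp": (40, 300),     # mmHg
    "pulse": (20, 250),           # beats/min
    "resp_rate": (4, 60),         # breaths/min
    "temperature": (30.0, 43.0),  # degrees Celsius
}

def check_record(record):
    """Return the fields whose values are missing or outside plausible ranges,
    flagging them for comparison against the original collection sheets."""
    problems = []
    for field, (lo, hi) in RANGES.items():
        value = record.get(field)
        if value is None or not (lo <= value <= hi):
            problems.append(field)
    return problems
```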
Statistical analysis
Statistical analysis was performed using intra-class correlation coefficients for continuous variables (systolic
blood pressure, heart rate, respiratory rate, temperature
and aggregate scores), and kappa statistics for categorical
variables (conscious level, trigger events and aggregate
scores). Two-way and one-way analysis of variance was
used in calculating the intra-class correlation coefficient
for inter-rater and intra-rater studies, respectively [9].
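For the one-way model used in the intra-rater analysis, the intra-class correlation coefficient follows from the between- and within-subject mean squares; a simplified sketch, together with a percentile bootstrap over subjects (the study itself used bias-corrected intervals and a two-way model for the inter-rater analysis):

```python
# Minimal sketch: one-way ANOVA intra-class correlation, ICC(1), plus a
# percentile bootstrap over subjects. Illustrative only.
import random

def icc_oneway(data):
    """ICC(1) from n subjects, each with a list of k repeated measurements."""
    n, k = len(data), len(data[0])
    grand = sum(x for row in data for x in row) / (n * k)
    means = [sum(row) / k for row in data]
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)      # between-subject mean square
    msw = sum((x - m) ** 2 for row, m in zip(data, means)
              for x in row) / (n * (k - 1))                       # within-subject mean square
    return (msb - msw) / (msb + (k - 1) * msw)

def bootstrap_ci(data, stat, reps=2000, alpha=0.05, seed=1):
    """Percentile bootstrap interval, resampling subjects with replacement."""
    rng = random.Random(seed)
    stats = sorted(stat([rng.choice(data) for _ in data]) for _ in range(reps))
    return stats[int(reps * alpha / 2)], stats[int(reps * (1 - alpha / 2))]
```

Perfectly reproduced measurements give a coefficient of 1; for k repetitions the lower bound is −1/(k − 1), so with two measurements per subject the coefficient ranges from −1 to 1.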
Bootstrap methods were used to provide bias-corrected
confidence intervals. For the inter-rater study, we also
calculated kappa and phi statistics [10] for each of the
six possible pairings among the raters.
Results
Inter-rater reliability
In the inter-rater study, 114 patients were examined. The
four raters were not able to perform four sets of measurements on all 114 patients, as some patients were called for
clinical investigations or were otherwise unavailable. In total, 433 sets of measurements were obtained.
Nine sets of observations from three patients were excluded as their normal blood pressures were missing, leaving 424 sets of observations included in the study. One
hundred nine, 102, 107 and 106 patients were examined, respectively, by the senior doctor, junior doctor, registered nurse and student nurse (Table 1).
Table 1 Number of observations and correctly calculated scores by each rater in the inter-rater study

                 Student     Registered  Junior      Senior
                 nurse       nurse       doctor      doctor      Total       p-value
Observations, n  106         107         102         109         424
MET, n (%)       98 (92.5)   106 (99.1)  101 (99.0)  109 (100)   414 (97.6)  0.001
MEWS, n (%)      81 (76.4)   81 (75.7)   92 (90.2)   94 (86.2)   348 (82.1)  0.01
ASSIST, n (%)    78 (73.6)   90 (84.1)   86 (84.3)   90 (82.6)   344 (81.1)  0.15
The p-value indicates statistical significance of difference in correctly calculated scores among raters
Table 2 Level of agreement of aggregate scores and triggers among the four raters for the inter-rater study

                         Triggered, n (%)/score,                All agreed,   Three agreed,   Intra-class correlation coefficient
                         median (interquartile range) [range]   n (%)         n (%)           (95% confidence interval)
Calculated by raters
  MET trigger            11 (2.6)                               86 (77.5)     106 (95.5)      −0.03 (−0.05, 0.00)
  MEWS score             1 (1, 2) [0, 8]                        17 (15.3)     53 (47.8)       0.20 (0.13, 0.27)
  MEWS trigger           60 (14.2)                              62 (55.9)     94 (84.7)       0.18 (0.09, 0.27)
  ASSIST score           1 (0, 1) [0, 8]                        41 (36.9)     80 (72.1)       0.46 (0.38, 0.55)
  ASSIST trigger         19 (4.5)                               84 (75.7)     104 (93.7)      0.20 (0.04, 0.38)
Corrected calculations
  MET trigger            7 (1.7)                                90 (81.1)     106 (95.5)      −0.02 (−0.04, 0.05)
  MEWS score             1 (1, 2) [0, 8]                        18 (16.2)     55 (49.6)       0.22 (0.15, 0.30)
  MEWS trigger           69 (16.3)                              64 (57.7)     101 (91.0)      0.37 (0.25, 0.51)
  ASSIST score           1 (0, 2) [0, 8]                        43 (38.7)     83 (74.8)       0.50 (0.42, 0.58)
Intra-rater reliability
There were 180 sets of observations from 45 patients in the intra-rater study. All observations were used in the analyses. Urine output was missing in 170 (94.4%) of the observation sets and was therefore excluded. All other parameters were 100% complete. There were copying errors for temperature in 0.6% of observations and for blood pressure in 1.1%.
There was 100% agreement on conscious level, with
all patients scored as Alert. Intra-rater agreements on
respiratory rate, heart rate and systolic blood pressure
were similar to those of the inter-rater study. Agreement
on temperature (intra-class correlation coefficient 0.98, 95% confidence interval 0.94–1.00) was better in the intra-rater study than in the inter-rater study.
The proportions of scores calculated correctly were
similar to those from the inter-rater study (Table 3). In
MET, patients were 100% correctly scored by all raters.
In MEWS, 17 (9.4%) patients were scored higher and
14 (7.8%) lower than correct scores, and in ASSIST 11
(6.1%) patients were scored higher and 22 (12.2%) lower
than correct scores.
The agreement indices (Table 4) suggest intra-rater
agreement on score was similar for MEWS and ASSIST.
There was good agreement on triggers for MEWS and
ASSIST, although the confidence intervals for ASSIST
were very wide due to the low number of events. Only
1 patient triggered the MET calling criteria on a single
observation.
Table 3 Number of observations and correctly calculated scores by each rater in the intra-rater study

                 Student     Registered  Junior      Senior
                 nurse       nurse       doctor      doctor      Total       p-value
Observations, n  48          24          84          24          180
MET, n (%)       48 (100)    24 (100)    84 (100)    24 (100)    180 (100)   1
MEWS, n (%)      40 (83.3)   24 (100)    66 (78.6)   19 (79.2)   149 (82.8)  0.05
ASSIST, n (%)    33 (68.8)   24 (100)    72 (85.7)   18 (75.0)   147 (81.7)  0.003
The p-value indicates statistical significance of difference in correctly calculated scores among raters
Table 4 Level of agreement of total scores and triggers among the four raters for the intra-rater study

                         Triggered, n (%)/score,                All agreed,   Three agreed,   Intra-class correlation coefficient
                         median (interquartile range) [range]   n (%)         n (%)           (95% confidence interval)
Calculated by raters
  MET trigger            1 (0.6)                                44 (97.8)     45 (100)        −0.01 (−0.02, 0.01)
  MEWS score             1 (1, 2) [0, 6]                        24 (53.3)     37 (82.2)       0.53 (0.39, 0.68)
  MEWS trigger           26 (14.4)                              37 (82.2)     45 (100)        0.64 (0.46, 0.84)
  ASSIST score           1 (1, 1) [0, 5]                        27 (60.0)     40 (88.9)       0.59 (0.46, 0.74)
  ASSIST trigger         6 (3.3)                                43 (95.6)     45 (100)        0.66 (0.02, 1.00)
Corrected calculations
  MET trigger            1 (0.6)                                44 (97.8)     45 (100)        −0.01 (−0.02, 0.01)
  MEWS score             1 (1, 2) [0, 5]                        23 (51.1)     37 (82.2)       0.56 (0.42, 0.68)
  MEWS trigger           23 (12.8)                              37 (82.2)     44 (97.8)       0.58 (0.31, 0.81)
  ASSIST score           1 (1, 1) [0, 5]                        25 (55.6)     35 (77.8)       0.54 (0.42, 0.68)
  ASSIST trigger         8 (4.4)                                41 (91.1)     45 (100)        0.48 (0.03, 1.00)
Discussion
Scoring systems such as the ones used in this study have
become an important tool of clinical risk management for
critically ill patients on general wards. Thus far, it is not
known whether these assessments are reproducible and
how large the likely errors are if different members of staff
perform what is meant to be an identical assessment. In the
present study we have provided some data on how three
systems used in the U.K. perform. There was only fair to
moderate agreement on measurements of the parameters
used to generate the scores, and only fair agreement on
the scores. Reassuringly, there was better percentage
agreement on the decision whether a patient had triggered
or not.
As one would expect, reproducibility was partially
a function of simplicity: MET achieved higher percentage
agreement than ASSIST, and ASSIST higher than MEWS.
Intra-rater reliability was better than inter-rater reliability.
Using corrected calculations improved the level of inter-rater agreement but not intra-rater agreement, suggesting
that if scoring systems were misapplied, each rater was
doing so in a consistent manner.
The systems were selected because they represent three
levels of complexity. MET is very simple but does not
allow a patient's progress to be tracked. MEWS is a complete assessment that takes into account urine output and
relative changes in blood pressure as compared with previous measurements. ASSIST is a simplified version with
only four parameters and an age constant. Both ASSIST
and MEWS allow monitoring of clinical progress. The
chosen systems are representative of the wide range of
scoring systems currently in use, but any system should be
assessed in the setting where it is used.
There were a number of potential weaknesses in this
study. Firstly, repeated measurements were taken within
an hour, but it is possible that patients could have deteriorated or improved during this time. We did not assess
whether there was systematic drift in the measured values between repetitions.
A small number of patients were not able or willing to
give consent. In particular, patients with reduced neurological function (approximately 5% of all patients) could not
be included, and were likely to be generally sicker patients.
Inclusion might have led to different results with regard
to reliability of the trigger mechanism; however, abnormal
neurological scores have been found to be rare in previous
studies [3, 12].
It was our aim to assess the reliability of the scoring process in clinical practice. The reliability depends
partially on the reliability of the electronic measurement devices used for blood pressure and temperature.
This could not be assessed directly as repeated measurement was unacceptable to the patients. Our results
therefore represent the human element of reliability
only.
Conclusion
There was significant variation in the reproducibility of
physiological track-and-trigger warning systems used by
different health care professionals. All three systems examined showed better agreement on triggers than aggregate scores. Simpler systems had better reliability. Further
research should examine how reliability can be improved.
Acknowledgements. This study was funded by the UK National
Health Service Research and Development Service Delivery and
Organisation Programme (SDO/74/2004). The authors thank
S. Ameeth, S. Collins, K. Ghosh, C. Rincon and J. Tobler for
their help in preparing the study, obtaining consent from patients
and collecting the data. We thank A. Pawley for entering data into
electronic format and L. Gemmell for advising on the format and
facilitating the setup of the study.
References
1. Department of Health and NHS Modernisation Agency (2003) The National Outreach Report. Department of Health, London
2. Lee A, Bishop G, Hillman K, Daffurn K (1995) The medical emergency team. Anaesth Intensive Care 23:183–186
3. Subbe CP, Kruger M, Rutherford P, Gemmell L (2001) Patients at risk: validation of a modified Early Warning Score in medical admissions. Q J Med 94:521–526
4. Buist MD, Moore GE, Bernard SA, Waxman BP, Anderson JN, Nguyen TV (2002) Effects of a medical emergency team on reduction of incidence of and mortality from unexpected cardiac arrests in hospital: preliminary study. Br Med J 324:387–390
5. Pittard AJ (2003) Out of our reach? Assessing the impact of introducing a critical care outreach service. Anaesthesia 58:882–885
6. Stenhouse C, Coates S, Tivey M, Allsop P, Parker T (2000) Prospective evaluation of a Modified Early Warning Score to aid earlier detection of patients developing critical illness on a general surgical ward. Br J Anaesth 84:663P
7. Subbe CP, Hibbs R, Williams E, Rutherford P, Gemmel L (2002) ASSIST: a screening tool for critically ill patients on general medical wards. Intensive Care Med 28:S21