
HEALTH ECONOMICS

Health Econ. 16: 531-536 (2007)


Published online 26 September 2006 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/hec.1175

HEALTH ECONOMICS LETTERS


COMPARISON OF ALTERNATIVE METHODS OF COLLECTION
OF SERVICE USE DATA FOR THE ECONOMIC EVALUATION
OF HEALTH CARE INTERVENTIONS
SARAH BYFORD(a,*), MORVEN LEESE(a), MARTIN KNAPP(a,b), HELEN SEIVEWRIGHT(c),
SUSAN CAMERON(d), VANESSA JONES(c), KATE DAVIDSON(e) and PETER TYRER(c)
(a) King's College London, Institute of Psychiatry, UK
(b) London School of Economics, PSSRU, UK
(c) Imperial College of Science, Technology and Medicine, London, UK
(d) NHS Greater Glasgow, Primary Care Division, UK
(e) University of Glasgow, UK

SUMMARY
Economic evaluation of health care interventions usually requires the collection of service use data to estimate the
total cost of participants in an evaluation. There are a number of methods available to measure the quantity of
services used but little is known about the relative accuracy of alternative methods. In a multicentre randomised
controlled trial of interventions for the treatment of adults with recurrent episodes of deliberate self-harm (the
POPMACT trial), health service data were collected by patient self-report after six and twelve months and also
from GP records by independent investigators. Agreement for overall costs was relatively high. However, this hides
substantial variation in agreement between the two sources of information for different services. The results suggest
that GP records provide more accurate data on the use of general practice-based contacts than patient report, but
less reliable information on contacts with other health services. Thus reliance on GP records for data on hospital
services and other community health services based outside of general practice surgeries is not recommended.
Future research should explore the level of agreement between patient report and other providing sector records,
such as hospital records. Copyright © 2006 John Wiley & Sons, Ltd.
Received 25 April 2005; Accepted 14 July 2006
KEY WORDS: economic evaluation; service utilisation; data collection; costs

INTRODUCTION
Health economic evaluation usually requires the collection of service use data in order to estimate the
total cost of caring for participants in an evaluation. A number of methods are available to measure the
quantity of services used, including service use questionnaires, service use diaries and record searches
(Bowling, 2002; Johnston et al., 1999; Mauskopf et al., 1996) but little is known about the relative
accuracy of these alternatives.
Service use questionnaires are a commonly used method of measuring service components in clinical
trials, particularly where a broad cost perspective is needed. The main disadvantage is the need to rely
on the memory of interviewees. Few such questionnaires have been systematically tested for accuracy,
mainly because of the lack of an appropriate gold standard to act as a valid yardstick (Mirandola
et al., 1999). Service use diaries require participants to record their service use prospectively over the
study period. Their prospective nature makes diaries more accurate than patient recall but the more
complex and broad the range of services used, the harder it becomes to keep diaries simple. In addition,
research suggests that respondents dislike the burden of keeping diaries (Jarbrink et al., 2003) and that
people with lower educational attainment under-report in diaries (Bruijnzeels et al., 1998). Case notes or
electronic databases may also be more accurate than patient recall but record searches can be time
consuming, may not record the information needed and may be hampered by poor completion, missing
files and illegible entries. Data will tend to be limited to services provided by the agency to which the
records belong so a multi-sector picture requires exploration of the records of many different agencies.

*Correspondence to: Centre for the Economics of Mental Health, Box P024, Institute of Psychiatry, De Crespigny Park, London
SE5 8AF, UK. E-mail: s.byford@iop.kcl.ac.uk

This paper reports the results of a comparison of service use data collected using a patient report
questionnaire and data collected from general practitioner (GP) records within a randomised controlled
trial (RCT) of interventions for the treatment of adults with recurrent episodes of deliberate self-harm
(DSH).

METHODS
Study design
The Prevention of Parasuicide by Manual Assisted Cognitive-behaviour Therapy (POPMACT) trial
was a multi-centre RCT of manual assisted cognitive-behaviour therapy (MACT) for the treatment of
adults with recurrent DSH compared to treatment as usual (TAU). The trial was carried out in five
centres in England and Scotland: Glasgow, Edinburgh, Nottingham, West London, and South
London. Assessments were carried out by research assistants blind to treatment allocation at baseline,
six and twelve months after trial entry. More detailed information on the methods of this trial has been
published elsewhere (Tyrer et al., 2003a,b; Byford et al., 2003).
Economic data
A broad economic perspective was taken, including that of all service-providing sectors, productivity
losses, accommodation costs and living expenses. Economic data were collected at each follow-up
assessment to cover the period from baseline to final follow-up. Information was collected in interview
using a modified version of the Client Service Receipt Inventory (CSRI) (Beecham and Knapp, 2000), a
measure commonly used for the collection of service use information in economic evaluations of
interventions for people with mental health problems (CSRI available from www.hsr.iop.kcl.ac.uk/cemh).
Since interviews were being undertaken in the study for the assessment of outcomes, this was selected
as the primary method of service use data collection. Additional methods were also considered to assess
the accuracy of data collected. Given the additional burden service use diaries place on patients, the
large range of services potentially accessed by mental health populations and concerns regarding the
likelihood of comprehensive completion by a population with chronic mental health problems, we
rejected the use of diaries. With regard to records, we felt that it would be infeasible, given available
resources, to collect service use data from the case notes of a wide range of providers. Instead, data on
health services were collected from GP records. Although not providing such a broad cost perspective as
the CSRI, this source does at least allow a partial assessment of the accuracy of patient report in
comparison to case notes to be made. Data available from both sources included inpatient days,
outpatient and accident and emergency (A&E) attendance, and contacts with GPs, practice nurses,
community psychologists, community psychiatric nurses and occupational therapists.
National unit costs were applied to all service use data included (Netten and Curtis, 2000). Unit costs
were calculated for the financial year 1999/2000, in common with the original POPMACT trial.
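As a rough illustration of this costing step, the sketch below multiplies each service quantity by a unit cost and sums the result for one patient. The unit costs, the service names and the `total_cost` helper are hypothetical placeholders; they are not the values published by Netten and Curtis (2000), nor the code used in the trial.

```python
from typing import Mapping

# Hypothetical unit costs in pounds, for illustration only.
UNIT_COSTS = {
    "gp_contact": 20.0,
    "ae_attendance": 60.0,
    "inpatient_day": 200.0,
}


def total_cost(service_use: Mapping[str, float],
               unit_costs: Mapping[str, float] = UNIT_COSTS) -> float:
    """Sum quantity x unit cost over every service recorded for a patient."""
    missing = set(service_use) - set(unit_costs)
    if missing:
        raise KeyError(f"no unit cost for: {sorted(missing)}")
    return sum(qty * unit_costs[name] for name, qty in service_use.items())


# One invented patient, as recorded by the two data sources.
csri = {"gp_contact": 8, "ae_attendance": 2, "inpatient_day": 3}
gp_records = {"gp_contact": 10, "ae_attendance": 2, "inpatient_day": 1}

print(total_cost(csri))        # 880.0
print(total_cost(gp_records))  # 520.0
```

Because the same unit costs are applied to both sources, any difference in total cost comes only from differences in the recorded quantities.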
Statistical analysis
The level of agreement between the two data collection methods was estimated using the Lin
concordance correlation coefficient (Lin, 1989), which measures the agreement between two continuous
variables obtained by two methods or persons. The Lin coefficient combines measures of precision and
accuracy to determine whether the observed data significantly deviate from perfect concordance
(Steichen and Cox, 1998). The value of the Lin coefficient lies between +1 (perfect agreement) and -1 (perfect
inverse agreement). It can be applied to data from non-normal distributions, such as cost data, and is
more appropriate than a Pearson correlation since, like the intraclass correlation coefficient, it takes into
account systematic bias. Generally, the intraclass correlation coefficient (the standard method for
assessing agreement) and Lin's concordance correlation coefficient have been shown to give very similar
values (Schuck, 2004). Also reported are mean differences and 95% confidence intervals, indicating
systematic bias, and the 95% limits of agreement, indicating random variation between individual
measurements (Bland and Altman, 1986). The level of agreement was estimated for each service
category and, to test for overall agreement, for total costs.
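The two agreement statistics described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes Lin's (1989) point estimate with 1/n variances and covariance, and Bland and Altman limits computed as the mean difference plus or minus 1.96 standard deviations of the paired differences. The function names and the example data are invented.

```python
from statistics import fmean, stdev


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient for paired measurements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # The squared mean difference penalises systematic bias,
    # which a Pearson correlation would ignore.
    denominator = sxx + syy + (mx - my) ** 2
    if denominator == 0:
        raise ValueError("coefficient undefined: both series constant and equal")
    return 2 * sxy / denominator


def limits_of_agreement(x, y, z=1.96):
    """Return (mean difference, lower limit, upper limit) for x minus y."""
    diffs = [a - b for a, b in zip(x, y)]
    mean_diff = fmean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return mean_diff, mean_diff - z * sd, mean_diff + z * sd


# Invented example: the second source records exactly two fewer contacts.
gp = [10, 12, 8, 15, 9]
csri = [g - 2 for g in gp]

print(round(lin_ccc(gp, csri), 3))    # 0.755
print(limits_of_agreement(gp, csri))  # (2.0, 2.0, 2.0)
```

In this example a Pearson correlation would equal 1 because the two series move together perfectly, whereas the Lin coefficient falls to about 0.755 because of the constant offset of two contacts.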

RESULTS
480 patients were recruited to the POPMACT trial between May 1998 and April 2000. Service use data
collected using the CSRI in the original trial were available for 397 patients (83%). Data collected from
GP records were available for 272 of the patients with full CSRI data; thus 56% of the original sample
were included in the current study. A comparison of the baseline characteristics of the original sample
with those patients with economic data from both sources revealed a significant centre difference
(p<0.001). This was due to one of the five centres (Edinburgh) being unable to participate fully in the
collection of GP records data. Those with missing economic data were also significantly more likely to
be white (p=0.004), although this result was due to the exclusion of Edinburgh, since Edinburgh had a
very low proportion of non-white participants (2%) in comparison to the London centres in particular
(South London 22%, West London 35%). No significant differences were found for any other
sociodemographic characteristics or baseline measures of total cost or outcome, including parasuicide risk,
depression, anxiety, social functioning and utility.
Table I details the mean number of contacts reported by the two methods of recording service use
data and the overall level of agreement (Lin's coefficient), denoted rc. Agreement between the two data
sources varied greatly by service. Agreement was relatively high for GP contacts and A&E attendances
but relatively poor for all other service types, with agreement levels below the threshold 0.40 level
commonly taken to indicate acceptable clinical or practical significance (Cicchetti, 2001). Agreement for
inpatient days was relatively low when separated into psychiatric versus other specialities, but increased
to more reliable levels when summed. Agreement for overall total costs was also found to be reliable.
The 95% limits of agreement, exploring the amount of random variation between the two data
sources, suggest that large individual differences are likely to be encountered for all categories of service.
The mean difference and 95% confidence intervals for the difference, reflecting systematic relative bias,
demonstrate that the reported number of contacts with GPs and practice nurses is higher from GP
records than from the CSRI, on average. All other comparisons show contact data being lower from GP
records than from the CSRI, on average. The net effect is that mean total cost per patient is lower when
calculated using GP records data than the CSRI. If the higher of two alternative methods of
measurement can be assumed to be the more accurate, this would indicate negative systematic bias for
the GP and nurse contacts as measured by the CSRI, and negative systematic bias for all other contacts
as measured by GP records.

Table I. Level of agreement between health service use data collected from GP records and patient self-report

                                      GP records    CSRI mean
                                      mean number   number of              95% limits of      Mean        95% CI for the
Service                               of contacts   contacts      rc       agreement          difference  difference
GP contacts                           10.66         8.78          0.631    -13.71 to 16.94    1.88        0.96 to 2.81
Practice nurse contacts               0.73          0.03          0.056    -2.56 to 3.98      0.70        0.50 to 0.90
A&E attendances                       1.68          1.84          0.760    -6.58 to 6.27      -0.16       -0.55 to 0.24
Psychiatric inpatient days            3.41          4.28          0.268    -42.89 to 41.15    -0.87       -3.44 to 1.70
Other inpatient days                  1.71          3.31          0.273    -30.48 to 27.29    -1.60       -3.36 to 0.17
Total inpatient days                  5.12          7.59          0.658    -39.07 to 34.13    -2.47       -4.70 to -0.23
Psychiatric outpatient appointments   1.77          4.01          0.169    -18.20 to 13.72    -2.24       -3.22 to -1.27
Other outpatient appointments         1.26          1.31          0.273    -9.42 to 9.31      -0.06       -0.63 to 0.52
Total outpatient appointments         3.02          5.29          0.231    -20.53 to 15.99    -2.27       -3.39 to -1.15
Community psychologist contacts       1.20          1.51          0.063    -21.44 to 20.82    -0.31       -1.60 to 0.98
Community psychiatric nurse contacts  1.00          2.25          0.349    -14.40 to 11.90    -1.25       -2.06 to -0.44
Occupational therapist contacts       0.21          0.36          -0.001   -9.09 to 8.80      -0.15       -0.69 to 0.40
Mean total cost per patient (£)       1371          2185          0.672    -7088 to 5461      -813        -1204 to -423

Despite the overall relative bias, the mean difference in cost between the two treatment groups did not
differ substantially between calculations based on GP records (MACT £1198 vs TAU £1541; mean
difference £344, p=0.342) and those based on the CSRI (MACT £1974 vs TAU £2391; mean
difference £417, p=0.494).

DISCUSSION
The results of this study suggest significant variability in agreement between service use data collected
from GP records and those collected from patient report. Overall agreement was relatively high for
contacts with GPs, as supported by previous research (Patel et al., 2005), and for A&E attendances but
low for all other health services. When converted to costs and summed, overall agreement was relatively
high. Rather than providing evidence of the reliability of patient report, the existence of changes in
absolute, but not relative, mean costs simply suggests that variability between the two measures had a
similar impact on each arm of the trial. Thus this finding will not necessarily be replicated in other
studies.
The 95% limits of agreement suggest wide variation in the range of possible values for any one
individual. This is of particular concern for individual level analysis, such as econometric modelling, but
is less of a problem in evaluations concerned with averages, such as conventionally conducted RCTs,
since this random variation is likely to be averaged out.
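A short worked sketch may help show why wide individual-level limits can coexist with a fairly narrow interval for the average. It assumes the limits of agreement are the mean difference plus or minus 1.96 standard deviations and that the confidence interval uses a normal approximation with all 272 patients; neither assumption is confirmed by the paper. The figures are those of the GP contacts row of Table I, with the lower limit of agreement taken as negative. The helper names are hypothetical.

```python
from math import sqrt

Z = 1.96  # normal quantile assumed for both the limits and the interval


def sd_from_limits(lower, upper, z=Z):
    """Recover the SD of the paired differences from the limits of agreement."""
    return (upper - lower) / (2 * z)


def ci_for_mean_difference(mean_diff, sd, n, z=Z):
    """Approximate 95% CI for the mean difference across n patients."""
    se = sd / sqrt(n)  # random variation shrinks with the square root of n
    return mean_diff - z * se, mean_diff + z * se


# GP contacts: limits of agreement -13.71 to 16.94, mean difference 1.88.
sd = sd_from_limits(-13.71, 16.94)
low, high = ci_for_mean_difference(1.88, sd, n=272)

print(round(sd, 2))                    # 7.82
print(round(low, 2), round(high, 2))   # 0.95 2.81
```

Under these assumptions the interval of roughly 0.95 to 2.81 is close to the 0.96 to 2.81 reported in Table I: an individual patient's two records may differ by 15 or more contacts, yet the average difference across the sample is estimated to within about one contact.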
Systematic bias was also evident. The positive bias seen for GP and practice nurse contacts supports
the hypothesis that GP records are a more accurate record of general practice-based contacts than
patient report. There is little reason to believe that GP records systematically overestimate these
contacts. It is more likely that patients systematically underestimate them by, for example, failure to
recall contacts or confusion over terminology. Indeed, the large differences seen for practice nurse
contacts may be explained by a lack of understanding of the term 'practice nurse', highlighting the need
to clearly define all services and professionals in service use questionnaires.
For all other health services, negative systematic bias was evident from the GP records, with patients
consistently reporting higher numbers of contacts than GP records. There is little reason to assume that
the problem is systematic overestimation by patients. Since GPs in the UK health system are reliant on
other service providers to inform them of patient contacts, systematic underestimation in the records is
more likely. Whilst GPs will be aware of any referrals they make, it is less clear whether they would be
kept informed of every contact, particularly where contact was ad hoc or frequent. Thus for services
provided in hospital or by community mental health teams, GP records may be less accurate than
patient self-report. Evidence of more reliable estimates for total inpatient days than for days separated
by speciality, suggests additional problems with the classification of hospital contacts in GP records,
perhaps as a result of inaccuracies in reports from hospitals. Alternatively, this could be due to patient
uncertainty regarding speciality.
Systematic bias in RCTs is less of a concern if it is evident to a similar degree in all groups under
investigation, as seems to be the case in the POPMACT trial. However, for the evaluation of
interventions anticipated to have differential impacts on the use of services, particularly high cost
services, there is more danger of producing misleading conclusions.
Exploration of baseline characteristics supports the generalisability of these results to the broader
population of patients with recurrent DSH. Generalisability to other patient populations is less certain.
It could be hypothesised that patient report may be more reliable in non-mental health populations but
perhaps less reliable in more severely ill mental health populations, such as those with psychoses or
personality disorders. Further research is needed to explore the implications for other populations.
GP records appear to provide more accurate information on contacts with GPs and practice nurses
than patient report, but less reliable information on contacts with other health services. Thus, reliance
on GP records for non-practice based health service data is not recommended. Future research should
explore agreement between patient report and other providing sector records, such as hospitals, in order
to determine whether patients systematically underestimate these contacts in the same way they appear
to systematically underestimate general practice-based contacts.

ACKNOWLEDGEMENTS

We thank Freya Tyrer for data editing, Professor Domenic Cicchetti for statistical advice and the
members of the POPMACT group. The POPMACT study was funded by the UK Medical Research
Council (G9702283).

REFERENCES

Beecham J, Knapp M. 2000. Costing psychiatric interventions. In Measuring Mental Health Needs (2nd edn),
Thornicroft G (ed.). Gaskell: London, 200-224.
Bland JM, Altman DG. 1986. Statistical methods for assessing agreement between two methods of clinical
measurement. Lancet 1: 307-310.
Bowling A. 2002. Research Methods in Health (2nd edn). Open University Press: Oxford.
Bruijnzeels MA, Foets M, van der Wouden JC, Prins A, van der Heuvel. 1998. Measuring morbidity of children in
the community: a comparison of interview and diary data. International Journal of Epidemiology 27: 96-100.
Byford S, Knapp M, Greenshields J. 2003. Cost-effectiveness of brief cognitive behaviour therapy versus treatment
as usual in recurrent deliberate self-harm: a rational decision making approach. Psychological Medicine 33:
977-986.
Cicchetti DV. 2001. The precision of reliability, validity estimates revisited: distinguishing between clinical and
statistical significance of sample size requirements. Journal of Clinical and Experimental Neuropsychology 23:
695-700.
Jarbrink K, Fombonne E, Knapp M. 2003. Measuring the parental, service and cost impacts of children with
autistic spectrum disorder: a pilot study. Journal of Autism and Developmental Disorders 33: 395-402.
Johnston K, Buxton MJ, Jones DR, Fitzpatrick R. 1999. Assessing the costs of healthcare technologies in clinical
trials. Health Technology Assessment 3(6): 1-76.
Lin LI-K. 1989. A concordance correlation coefficient to evaluate reproducibility. Biometrics 45: 255-268.
Mauskopf J, Schulman K, Bell L, Glick H. 1996. A strategy for collecting pharmacoeconomic data during phase
II/III clinical trials. Pharmacoeconomics 9(3): 264-277.
Mirandola M, Biso G, Bonizzato P, Amaddeo F. 1999. Collecting psychiatric resources utilisation data to
calculate costs of care: a comparison between a service receipt interview and a case register. Social Psychiatry and
Psychiatric Epidemiology 34: 541-547.
Netten A, Curtis L. 2000. Unit Costs of Health and Social Care. Personal Social Services Research Unit, University
of Kent: Canterbury.
Patel A, Rendu A, Moran P, Leese M, Mann A, Knapp M. 2005. A comparison of two methods of collecting
economic data in primary care. Family Practice 22(3): 323-327.
Schuck P. 2004. Assessing reproducibility for interval data in health-related quality of life questionnaires: which
coefficient should be used? Quality of Life Research 13: 571-586.
Steichen TJ, Cox NJ. 1998. Concordance correlation coefficient. Stata Technical Bulletin Stb43(sg84): 35-39.
Tyrer P, Jones V, Thompson S. 2003a. Service variation in baseline variables and prediction of risk in a randomised
controlled trial of psychological treatment in repeated parasuicide: the POPMACT study. International Journal
of Social Psychiatry 49(1): 58-69.
Tyrer P, Thompson S, Schmidt U. 2003b. Randomized controlled trial of brief cognitive behaviour therapy versus
treatment as usual in recurrent deliberate self-harm: the POPMACT study. Psychological Medicine 33(6):
969-976.
