
The Scientific World Journal

Volume 2012, Article ID 606154, 13 pages


doi:10.1100/2012/606154

Research Article
Benchmarking Strategies for Measuring
the Quality of Healthcare: Problems and Prospects

Pietro Giorgio Lovaglio


CRISP and Department of Quantitative Methods, University of Bicocca-Milan, V. Sarca 202, 20146 Milan, Italy

Correspondence should be addressed to Pietro Giorgio Lovaglio, piergiorgio.lovaglio@unimib.it

Received 12 October 2011; Accepted 29 November 2011

Academic Editors: V. Brusic, W. D. Evans, M. Fanucchi, and A. S. Levin

Copyright © 2012 Pietro Giorgio Lovaglio. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly
cited.

Over the last few years, increasing attention has been directed toward the problems inherent to measuring the quality of healthcare
and implementing benchmarking strategies. Besides offering accreditation and certification processes, recent approaches measure
the performance of healthcare institutions in order to evaluate their effectiveness, defined as the capacity to provide treatment that
modifies and improves the patient's state of health. This paper, dealing with hospital effectiveness, focuses on research methods
for effectiveness analyses within a strategy comparing different healthcare institutions. After introducing readers
to the principal debates on benchmarking strategies, which depend on the perspective and type of indicators used, the paper focuses on the
methodological problems related to performing consistent benchmarking analyses. In particular, statistical methods suitable for
controlling case-mix and for analyzing aggregate data, rare events, and continuous outcomes measured with error are examined. Specific
challenges of benchmarking strategies, such as the risk of risk adjustment (case-mix fallacy, underreporting, risk of comparing
noncomparable hospitals), selection bias, and possible strategies for the development of consistent benchmarking analyses, are
discussed. Finally, to demonstrate the feasibility of the illustrated benchmarking strategies, an application focused on determining
regional benchmarks for patient satisfaction (using the 2009 Lombardy Region Patient Satisfaction Questionnaire) is proposed.

1. Introduction

Over the last few years, increasing attention has been directed toward the problems inherent to measuring the quality of healthcare. Accreditation and certification procedures have acted as stimulating mechanisms for the discovery of skills and technology specifically designed to improve performance. Total Quality Management (TQM) and Continuous Quality Improvement (CQI) are the most widespread and recent approaches to implementing and improving healthcare quality control [1].

Besides offering accreditation and certification processes, recent approaches measure the performance of health structures in order to evaluate National Health Systems. For example, various international agencies [2–4] measure the performance of health structures in different countries, considering three main dimensions: effectiveness, efficiency, and customer satisfaction.

In this perspective, performance measurement for healthcare providers, structures, or organizations (hereafter, hospitals) is becoming increasingly important for the improvement of healthcare quality.

However, the debate over which types of performance indicator are the most useful for monitoring healthcare quality remains a question of international concern [5].

In a classic formulation, Donabedian [6] asserted that quality of care includes (i) structure (characteristics of the resources in the healthcare system, including organization and system of care, accessibility of services, licensure, physical attributes, safety, and policies and procedures, viewed as the capacity to provide high quality care), (ii) process (measures related to evaluating the process of care, including the management of disease, the existence of preventive care such as screening for disease, accuracy of diagnosis, the appropriateness of therapy, complications, and interpersonal aspects of care, such as service, timeliness, and coordination of care across settings and professional disciplines), and (iii) clinical outcomes.

A clinical outcome is defined as the technical result of a diagnostic procedure or specific treatment episode [7],
or as the result, often long term, on the state of patient well-being, generated by the delivery of a health service [8].

Specifically, ongoing attention has been placed on the importance of combining structural aspects (such as governance and the healthcare workforce) with measures of outcomes to assess the quality of care [6, 9]. This consideration was taken into account by the Institute of Medicine, which, in 1990, stated that quality of care is the degree to which health services for individuals and populations increase the likelihood of desired health outcomes and are consistent with current professional knowledge [10].

This definition has been widely accepted and has proven to be a robust and useful reference in the formulation of practical approaches to quality assessment and improvement, emphasizing that the process of care increases the probability of desirable outcomes for patients and reduces the probability of undesired outcomes.

This paper deals with hospital effectiveness, defined as the capacity of hospitals to provide treatment that modifies and improves the patient's state of health. Of particular importance in this perspective is the concept of relative effectiveness, that is, the effectiveness of each specific hospital in modifying the patient's state of health within a strategy comparing different healthcare institutions; in short, effectiveness evaluation in a benchmarking framework [6].

Benchmarking in healthcare is defined as the continual and collaborative discipline of measuring and comparing the results of key work processes with those of the best performers in evaluating organizational performance [11].

Two types of benchmarking can be used to evaluate patient safety and quality performance. Internal benchmarking is used to identify best practices within an organization, to compare best practices within the organization, and to compare current practice over time. Competitive or external benchmarking involves using comparative data between organizations to judge performance and identify improvements that have proven to be successful in other organizations.

Our aim is to discuss the statistical aspects and possible strategies for the development of hospital benchmarking systems.

The paper is structured as follows: the next section introduces readers to the principal debates on benchmarking strategies, which depend on the perspective and type of indicators used. Section 3 presents statistical methods, while Section 4 explores the methodological problems related to performing consistent benchmarking analyses. Section 5 describes an application based on patient satisfaction that demonstrates the feasibility of the illustrated benchmarking strategies. Section 6 offers conclusions.

2. Perspective and Type of Indicators

The conceptual definition and assessment of effectiveness rests on a conceptual and operational definition of quality of care, which is an exceptionally difficult notion to define. An important contextual issue is the purpose for which a performance indicator is to be used and by whom.

Performance indicators can be used for various objectives: to gain information for policy making or strategy development at a regional or national level, to improve the quality of care of a hospital, to monitor the performance of healthcare, to identify poor performers in order to protect public safety, and to provide information to consumers to facilitate the choice of hospital.

In general, the broader the perspective required, the greater the relevance of outcome measures, as they reflect the interplay of a wide variety of factors, some directly related to healthcare, others not. Because outcome measures are an indicator of health, they are valid as performance indicators inasmuch as the quality of health services has an impact on health. As the perspective narrows, to hospitals, to specialties, or indeed to individual doctors, outcome measures become relatively less indicative and process measures relatively more useful.

Process measures have two important advantages over outcome measures. First, if differences in outcome are observed, alternative explanations need to be considered before one can conclude that the difference reflects true variations in the quality of care. In contrast, a process measure lends itself to a straightforward interpretation (e.g., the more people without contraindications who receive a specific treatment, the better). Second, the necessary remedial action is clearer (use the treatment more often), whereas for an outcome measure (e.g., a higher mortality rate) it is not immediately obvious what action needs to be taken.

Despite these limitations, outcome measures have a role in the monitoring of the quality of healthcare that is important per se. To know that death rates from a specific diagnosis vary across hospitals is an essential finding, even if the reasons for the differences cannot be explained through the quality of care. Further, outcome measurement will reflect all aspects of the processes of care, although only a subset is measurable or measured (e.g., technical expertise and medical skill). Such aspects are likely to be important determinants of outcome in some situations and describe not only that a correct procedure is performed, but also the results for the patients.

Another possible reason why outcome indicators are often used in some countries is that available data refer to routine information systems (administrative archives), which regularly record clinical aspects and other dimensions useful for case-mix adjustment.

In the Italian context, at patient level, the Hospital Discharge Card (HDC) is the only available administrative archive in the health sector. The HDC, introduced in Lombardy in 1975 with the introduction of the Diagnosis-Related Group (DRG) reimbursement system, collects clinical information about patient discharge.

In this perspective, the debate on the use of clinical administrative data to furnish useful information on quality assessment remains open.

Many authors have criticized the use of clinical outcomes, and mortality rates in particular, in the evaluation of the quality of care [12, 13]. According to Vincent and colleagues [14], administrative data does not provide a suitably transparent perspective on quality or improvement.
Others suggest that limited clinical content may compromise its utility for this purpose, posing serious caveats against drawing definitive conclusions [15, 16].

Despite such concerns, broad consensus exists on the use of clinical outcomes from administrative data as a useful screening tool for identifying quality problems and targeting areas in which quality should be investigated in greater depth [4, 16, 17]. Besides mortality, various clinical outcomes that could indicate malpractice are widely accepted by private or public agencies [13, 18, 19] that evaluate national health sectors, for example, unscheduled surgical returns to the operating room within 48 hours, discharges against medical advice, death in low mortality DRGs, or failure to rescue (indicating deaths among patients developing specified complications during hospitalization).

2.1. Outcome Variability. In order to consider the methodological problems that may limit benchmark strategies, it is necessary to explore the possible causes of variation in an outcome. Four major categories of explanation need to be considered. The first of these is whether observed differences might be due to differences in the type of patient cared for in the different hospitals (e.g., age, gender, comorbidity, severity of disease, etc.).

The importance of this cause of variation is illustrated by studies where differences in crude outcome disappear when the outcomes are adjusted to take account of these confounding factors. To this end, researchers propose risk-adjustment methodologies as proper methods of equitable comparison for evaluating the quality and effectiveness of hospitals [12, 15, 20].

A second cause of variation in outcome (or its risk-adjusted version) is differences in the way data is collected. Differences in the measurement of events of interest (e.g., deaths) or in the population at risk (typically the denominator of an event rate), arising from different inclusion criteria for determining denominators or from the use of different case-mix data to adjust for potential confounding, will lead to apparent differences in outcome.

Thirdly, observed differences may be due to chance. Random variation is influenced both by the number of cases included and by the frequency with which the outcome occurs. To this end, a fundamental issue is whether the outcome indicator is likely to have the statistical power to detect differences in quality. Statistical power depends upon how common the occurrence of the outcome is. For some rare events, the limited number of patients experiencing the events limits the power of the study [21].

Finally, differences in outcome may reflect real, although unobservable, differences in quality of care. This may be due to variations in different measurable or less measurable aspects such as the interventions performed or the skill of the medical team.

Hence, since an outcome variation can have these different causes, the conclusion that a variation in outcome is due to a difference in quality of care among hospitals is essentially a diagnosis through exclusion: if variation cannot be explained in terms of the previous components (case-mix, data collection, chance), then hospital quality of care (relative effectiveness) becomes a possible explanation.

3. Statistical Methods

As described above, if one cannot explain the variation in terms of differences in type of patient, in how data is collected, or in terms of chance, then quality of care becomes a possible explanation. Following the perspective that variations in outcome are attributed to a difference in quality of care only as a diagnosis through exclusion, institutional agencies gather larger data sets from administrative archives and apply risk-adjustment in order to validate quality indicators and to benchmark hospitals.

Administrative archives are less prone to the problem related to how the data is collected, and reduce the possibility that differences in outcome may be due to chance (although this risk increases when analyzing rare outcomes). Usually, such databases cover the entire population of hospitalizations, enhancing their statistical power to detect important differences in outcomes.

Therefore, the last exclusion criterion calls for a consistent statistical model allowing comparisons between hospitals, in order to estimate relative effectiveness [22]. To this end, statistical methods for risk-adjustment identify and adjust for variations in patient outcomes stemming from differences in patient characteristics (or risk factors) across hospitals and, therefore, allow fair and accurate interhospital comparisons.

However, the kind of adjustment required for assessing effectiveness is not the same for the various subjects interested in the results. In this regard, it is useful to distinguish between two types of effectiveness. In fact, potential patients (users) and institutional stakeholders (agents) are interested in different types of hospital effectiveness.

Following the approach of Raudenbush and Willms [23], in a comparative setting, relative effectiveness is usually assessed through a measure of performance adjusted for the factors outside the control of the hospital, so the difference between the types of effectiveness simply lies in the kind of adjustment. The authors identify Type A and Type B relative effectiveness. Type A effectiveness concerns users interested in comparing the results they can obtain by enrolling in different hospitals, irrespective of the way such results are yielded; the performance of the hospital adjusted for the features of its users is evaluated. Type B effectiveness concerns stakeholders interested in assessing the production process in order to evaluate the ability of the hospitals to exploit the available resources; in this case, the performance of the hospital is adjusted according to the features of its users, the features of the hospital itself, and the context in which it operates.

In the nineties, numerous authors proposed estimating relative effectiveness by means of multilevel or hierarchical models [24, 25]. In fact, when the behaviour of individuals within organizations is studied, the data have a nested structure. Individuals/patients constitute the sampling units at the first and lowest level of the nested
hierarchy. Organizations/hospitals constitute the sampling units at the second level.

Several recent statistical papers deal with risk-adjusted comparisons of mortality or morbidity outcomes by means of multilevel models, in order to take into account the different case-mixes of patients (for a review, see Goldstein and Leyland [26] and Rice and Leyland [27]).

One of the most attractive features of multilevel models is the production of useful results on healthcare effectiveness by linking individual (patient) and organizational (hospital) characteristics (covariates). Multilevel models overcome small sample problems by appropriately pooling information across organizations, introducing some correction or shrinkage, and providing a statistical framework that quantifies and explains variability in outcomes through the investigation of patient- and hospital-level covariates [27].

Quality indicators are typically calculated and disseminated at hospital level, dividing the number of events (in-hospital death, or an adverse event such as a clinical error that results in disability, death, or prolonged hospital stay) by the number of discharged patients at risk.

However, at the patient/individual level, the event of interest is typically a dichotomous variable, and the multilevel model for this kind of outcome is the Logistic Multilevel Model (LMM, [25]).

For patient $i$ nested in hospital $j$, let $\pi_{ij}$ be the probability of occurrence of a dichotomous adverse event $Y_{ij}$, where $Y_{ij}$ is Bernoulli distributed with expected value $E(Y_{ij}) = P(Y_{ij} = 1) = \pi_{ij}$. Instead of $\pi_{ij}$, the LMM specifies, as dependent outcome, its logistic transformation $\eta_{ij} = \log(\pi_{ij}/(1 - \pi_{ij}))$ as a function of possible covariates, where $\log$ is the logarithmic transformation and $\pi_{ij}/(1 - \pi_{ij})$, the ratio of the probability that the adverse event occurs to the probability that it does not, is called the odds of the modelled event.

The LMM without patient and hospital covariates (intercept-only LMM) assumes that $\eta_{ij}$ depends only on the particular hospital treating patient $i$, specified by $\beta_{0j}$, a nominal variable designating the $j$th hospital; the hospital effect is assumed to be random, meaning that hospitals are assumed to be randomly sampled from a large population of hospitals. Equations (1) and (2) define the intercept-only LMM:

$$\eta_{ij} = \beta_{0j}, \tag{1}$$

$$\beta_{0j} = \gamma_{00} + u_{0j}, \qquad u_{0j} \sim N(0, \sigma_0^2), \tag{2}$$

where $\beta_{0j}$ is the intercept (effect) for the $j$th hospital, which can be decomposed into $\gamma_{00}$, representing the average probability of adverse events (in the logit metric) across hospitals, and $u_{0j}$, a specific effect capturing the difference between the probability of adverse event for hospital $j$ and the average probability of adverse event across hospitals. These random effects are assumed to be independent and normally distributed with zero mean and variance $\sigma_0^2$, which describes the variability of hospital effects. The intercept-only model constitutes a benchmark value of the degree of misfit of the model and can be used to compare models involving different covariates at different levels. Further, this model allows decomposing the total variance of the outcome into different variance components for each hierarchical level. Specifically, the Intraclass Correlation Coefficient (ICC), defined as the ratio between the variability among hospitals $\sigma_0^2$ and the total variability ($\sigma_0^2$ plus the variability among patients within the hospitals, $\sigma_e^2$), captures the proportion of total variability of a given risk factor that is due to systematic variation between hospitals. Nevertheless, in the case of a dichotomous outcome $Y_{ij}$, the usual first-level residuals $e_{ij}$, and hence their variance $\sigma_e^2$, are not in model (1). This occurs because the outcome variance $\pi_{ij}(1 - \pi_{ij})$, being part of the specification of the error distribution, depends on the mean $\pi_{ij}$ and thus does not have to be estimated separately. However, approximating the variability of the first level with the variance of the standard logistic distribution ($\pi^2/3$, where $\pi$ here is the mathematical constant) and summing this variance with the variability of the second level ($\sigma_0^2$) allows separating the total variance into two components, giving the intercept-only model $\mathrm{ICC} = \sigma_0^2/(\sigma_0^2 + \pi^2/3)$. This measure is used to assess the percentage of outcome heterogeneity existing between the hospitals involved in the analysis.

As the second step, the probability (in the logit metric $\eta_{ij}$) of an adverse event occurrence for patients can be a function of patient characteristics (case-mix), in addition to the hospital effect. Hence (1) can be extended by assuming that $\eta_{ij}$ depends on $P$ patient covariates $x_{pij}$ ($p = 1, \ldots, P$):

$$\eta_{ij} = \beta_{0j} + \sum_{p=1}^{P} \beta_{pj} x_{pij}, \tag{3}$$

$$\beta_{0j} = \gamma_{00} + u_{0j}, \tag{4}$$

$$\beta_{pj} = \gamma_{p0} + u_{pj}, \tag{5}$$

where $\beta_{pj}$ is the slope (regression coefficient) of the $p$th person characteristic in hospital $j$, which is allowed to vary randomly across hospitals (e.g., the effect of length of stay on adverse event occurrence varies among hospitals). In formulation (4), the specific effect of the $j$th hospital on the outcome ($u_{0j}$) is adjusted for the effects of the $P$ person-level characteristics ($x_{pij}$). In (5), $\gamma_{p0}$ represents the average slope across hospitals and $u_{pj}$ the specific deviation of hospital $j$ from the average slope (random effect). However, in effectiveness analyses, slope parameters ($\beta_{pj}$) are assumed to be fixed (putting $u_{pj} = 0$ in (5) for $p = 1, \ldots, P$), whereas only the intercept residual $u_{0j}$ is allowed to vary randomly across hospitals. Such models, in which the regression slopes are assumed fixed, are denoted as variance component models.

In the model composed of (3)-(4) and (5) with $u_{pj} = 0$, $u_{0j}$ reflects the relative effectiveness of the $j$th hospital, adjusted only for individual case-mix characteristics, and thus potentially depending on different hospital characteristics (Type A effectiveness).

For Type B effectiveness, one can move to the next step, accounting for variation in intercept parameters across hospitals by adding $Q$ hospital variables $z_{qj}$ ($q = 1, \ldots, Q$) to the level 2 equations. Hence, (4)-(5) become

$$\beta_{0j} = \gamma_{00} + \sum_{q=1}^{Q} \gamma_{0q} z_{qj} + u_{0j}, \tag{6}$$

$$\beta_{pj} = \gamma_{p0} + \sum_{q=1}^{Q} \gamma_{pq} z_{qj} \tag{7}$$
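As an illustration of how the level 1 equation (3) and the level 2 equations (6)-(7) fit together, the following minimal Python sketch builds the logit of an adverse event in two ways: level by level, and after substituting the level 2 equations into (3). It is only a sketch under stated assumptions: the function names, the coefficient names (`gamma00`, `gamma_p0`, `gamma_0q`, `gamma_pq`, `u0j`), and all numeric values are hypothetical and are not taken from the paper or from any fitted model.

```python
import math

def logit_two_stage(x, z, gamma00, gamma_p0, gamma_0q, gamma_pq, u0j):
    """Logit built level by level: level 2 equations (6)-(7), then equation (3)."""
    # Level 2: hospital intercept (6) and fixed, hospital-dependent slopes (7)
    beta_0j = gamma00 + sum(g * zq for g, zq in zip(gamma_0q, z)) + u0j
    beta_pj = [gp0 + sum(gpq * zq for gpq, zq in zip(row, z))
               for gp0, row in zip(gamma_p0, gamma_pq)]
    # Level 1: patient-level logit (3)
    return beta_0j + sum(b * xp for b, xp in zip(beta_pj, x))

def logit_compact(x, z, gamma00, gamma_p0, gamma_0q, gamma_pq, u0j):
    """Same logit after substituting (6)-(7) into (3): main effects plus cross-level products."""
    main_x = sum(g * xp for g, xp in zip(gamma_p0, x))
    main_z = sum(g * zq for g, zq in zip(gamma_0q, z))
    cross = sum(gamma_pq[p][q] * x[p] * z[q]
                for p in range(len(x)) for q in range(len(z)))
    return gamma00 + main_x + main_z + cross + u0j

def inv_logit(eta):
    """Map a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients and covariates, for illustration only
params = dict(gamma00=-3.0, gamma_p0=[0.04, 0.8], gamma_0q=[-0.5],
              gamma_pq=[[0.01], [0.0]], u0j=0.2)
x = [12.0, 1.0]  # e.g. length of stay (days), urgent admission (0/1)
z = [1.0]        # e.g. specialisation level of the hospital

eta_two_stage = logit_two_stage(x, z, **params)
eta_compact = logit_compact(x, z, **params)
prob_adverse_event = inv_logit(eta_two_stage)
print(eta_two_stage, eta_compact, prob_adverse_event)
```

Setting every entry of `gamma_0q` and `gamma_pq` to zero reduces the predictor to the case-mix-only adjustment of (3)-(5) with fixed slopes, which corresponds to the Type A setting.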
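The intercept-only ICC described in this section (between-hospital variance divided by the sum of that variance and the standard logistic variance, π²/3) is simple to compute once the hospital-level variance has been estimated. The sketch below is a hedged illustration: the function name and the example variance of 0.35 are assumptions made for demonstration, not values reported in the paper.

```python
import math

# Variance of the standard logistic distribution, used as the level 1 variance
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3

def intercept_only_icc(hospital_variance: float) -> float:
    """Latent-scale ICC of an intercept-only logistic multilevel model."""
    if hospital_variance < 0:
        raise ValueError("a variance cannot be negative")
    return hospital_variance / (hospital_variance + LOGISTIC_RESIDUAL_VARIANCE)

# Hypothetical between-hospital variance, for illustration only
example_icc = intercept_only_icc(0.35)
print(f"ICC = {example_icc:.3f}")
```

A hospital-level variance equal to π²/3 would give an ICC of exactly 0.5, which can serve as a quick sanity check of any implementation.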

In (6)-(7), the slope parameters ($\beta_{pj}$) of (3) are specified as nonrandom across hospitals, but possibly varying with the characteristics of hospital $j$ ($z_{qj}$).

Methodologically, this step is justified when, in model (3)-(4), the intercept residuals $u_{0j}$ vary significantly across hospitals (as assessed through the associated residual ICC) once the patients' characteristics are controlled for.

The compact form of (3)-(6)-(7) is

$$\eta_{ij} = \gamma_{00} + \sum_{p=1}^{P} \gamma_{p0} x_{pij} + \sum_{q=1}^{Q} \gamma_{0q} z_{qj} + \sum_{p=1}^{P} \sum_{q=1}^{Q} \gamma_{pq} x_{pij} z_{qj} + u_{0j}, \tag{8}$$

where the double sum in (8) captures possible cross-level interactions between covariates at different levels (e.g., $\gamma_{pq}$ indicates that, for hospital $j$, the effect of length of stay ($x_{pij}$) on adverse event occurrence ($\eta_{ij}$) may depend on the specialisation level $z_{qj}$ of the hospital).

In model (8), the parameters $u_{0j}$, called level 2 residuals, specify the relative effectiveness of hospital $j$ (Type B effectiveness): they show the specific managerial contribution of the $j$th hospital to the risk of adverse event, net of overall risk ($\gamma_{00}$), individual case-mix ($\sum_p \gamma_{p0} x_{pij}$), structural/process characteristics of the hospitals ($\sum_q \gamma_{0q} z_{qj}$), and their interactions ($\sum_p \sum_q \gamma_{pq} x_{pij} z_{qj}$). To make this interpretation clear, (8) can be rewritten by isolating $u_{0j}$: the effectiveness parameter $u_{0j}$ is thus the hospital's unexplained deviation of the actual outcome ($\eta_{ij}$) from the expected outcome ($\gamma_{00} + \sum_p \gamma_{p0} x_{pij} + \sum_q \gamma_{0q} z_{qj} + \sum_p \sum_q \gamma_{pq} x_{pij} z_{qj}$). The expected outcome is the outcome predicted by the model based on the available hospital- and patient-level covariates. For patient $i$ of hospital $j$, the difference between actual and expected outcome has a hospital-level component $u_{0j}$ (the effectiveness). Notice that, since the expected outcome depends on the covariates, the meaning of effectiveness depends on how the model adjusts for the covariates (Type A or Type B).

One method of estimating $u_{0j}$ is to use the empirical Bayes (EB) residual estimator [24]. The EB estimator can be interpreted as the difference between the average we actually observe for a hospital (average of the actual outcome for a hospital) and the average that is expected for the hospital after controlling for the individual and hospital factors that influence the average (average of the expected outcome for a hospital). Hence, adjusting for both individual and hospital level sources of variation, the EB residual is that part of the evaluation of the variable at hand (adverse event occurrence) that we believe to be due to management practices. The exponential of the estimated hospital-specific random effect $u_{0j}$ is the odds ratio (OR): the odds of experiencing an adverse event at the $j$th hospital divided by the odds at an average hospital, after controlling for the individual and hospital factors. Patients who are treated at hospitals with positive random effects (OR > 1.0) have greater odds of adverse event than patients who are treated at an average hospital, whereas patients who are treated at hospitals with negative random effects (OR < 1.0) have lower odds of adverse event than patients who are treated at an average hospital.

However, since the residuals are affected by sampling variability and other sources of error, the corresponding ranking has a degree of uncertainty. Such uncertainty is difficult to represent, since it involves multiple comparisons. If Hospital A's risk-adjusted outcomes are significantly better than those of Hospital B, then we are more confident that Hospital A offers high quality of care, but we cannot assume that Hospital A is actually better than Hospital B. Therefore, several authors [4, 8, 27] suggest avoiding hospital rankings based on risk-adjusted outcomes and instead placing hospitals into a limited number of groups, based on statistical criteria. In a conservative approach, the usual procedure is to build 95% pairwise confidence intervals (CIs) of level 2 residuals, or their exponentiated values, and to place hospitals into three groups: effective (problematic) hospitals are those with CIs entirely under (over) the risk-adjusted (e.g., regional) mean of the warning event, whereas CIs that cross the risk-adjusted mean define the intermediate group. Further, the effectiveness of two hospitals is statistically different if the 95% pairwise CIs of $u_{0j}$ do not overlap.

3.1. Case-Mix Adjustment. Typically, appropriate adjustment instruments must control for the principal diagnosis within a Diagnosis-Related Group (DRG; a categorization of each hospitalization based on the average resources used to treat patients), contain demographics as proxies for preexisting physiological reserve (e.g., gender, age, marital status, socioeconomic status), and measure the number and severity of comorbidities [28].

Comorbidities, or coexisting diseases, are obtained from the DRG and the principal and secondary diagnoses, whereas comorbidity severity is measured with different strategies: among others, (i) aggregating comorbidities reflecting different conditions leading to hospitalization [29], (ii) aggregating DRGs reflecting admission severity (disease staging, [4, 30]). For example, disease staging maps from the list of comorbid diagnoses to a severity scale that ranges from 1 to 4, where stage one is the least severely ill and stage four is death.

In the absence of institutional software measuring severity, possible alternatives contained in Hospital Discharge Card data are length of stay, admission type (planned/urgent), hospitalization type (surgical/other), DRG, and DRG weight, a numeric value assigned to each discharge based on the average resources consumed to treat patients in that DRG.

In this sense, risk-adjustment methods that use only administrative data appear to be a viable alternative to widely accepted severity adjustment methods when the additional clinical data (medical chart, laboratory values, etc.) required by existing severity adjustment strategies are not available [31].

3.2. Decomposing Total Variance. Various approaches have been proposed to examine the proportion of explained variance and to indicate how well the outcome is predicted in a multilevel model. A straightforward approach consists
of examining the residual error variances and residual ICC in a sequence of models.

In an LMM, if we start with an intercept-only model and then estimate a second model where we add a number of covariates (the linear predictor in (3)), we normally expect the variance components to become smaller. However, in logistic regression the first-level residual variance is again $\pi^2/3$. These implicit scale changes make it impossible to compare regression coefficients across models, or to investigate how variance components change [25]. One possible solution is a multilevel extension of a method proposed by McKelvey and Zavoina [32] that is based on the explained variance of a latent outcome in a generalized linear model. In this formulation, for a specific model with $m$ covariates, the variance of $\eta_{ij}$ is decomposed into the first-level residual variance, which is fixed to $\pi^2/3$, the second-level intercept variance $\sigma_0^2$, and the variance $\sigma_F^2$ of the linear predictor (obtained by calculating the variability of the predictions arising from the fixed part of the model). The variance of the linear predictor is the systematic variance in the model, whereas the other two variances are the residual errors at the two levels. In this specification, we can rescale the variance estimates $\sigma_0^2$ and $\pi^2/3$ of a specified model with $m$ covariates by an appropriate scale correction factor that rescales the model to the same underlying scale as the intercept-only model. Let $\sigma^2 = \sigma_0^2 + \pi^2/3$ denote the total variance of the intercept-only model, and $\sigma_m^2 = \sigma_0^2 + \pi^2/3 + \sigma_F^2$ that of model $m$, which includes $m$ first-level covariates. Applying the scale correction factor $\sigma^2/\sigma_m^2$ to the variance components of model $m$, the corrected variance components can be used for assessing the ICC and the amount of variance explained at the two levels.

3.3. Aggregate Data and Rare Events. Often dichotomous data may be available at higher levels than the patient level (e.g., aggregated adverse events occurring in the $k$th Specialty belonging to hospital $j$). In that case, the individual dichotomous outcome $Y_{ij}$ becomes a proportion or an event rate, defined as the number of events divided by the total person experience ($\pi_{kj}$). Specifically, $\pi_{kj}$ is the ratio

where each individual's probability varies with individual-level covariates in a three-level model.

With aggregate data, another possible way to model proportions is to use regression count models. Count data is increasingly common in clinical research [33]. Examples include the number of adverse events occurring during a follow-up period or the number of hospitalizations. Poisson Regression (PR, [34]) is the simplest regression model for count data and assumes that each observed count $Y_{kj}$ is drawn from a Poisson distribution with conditional mean $\mu_{kj}$, given a covariate vector $x_{kj}$ for stratum $kj$. If $Y_{kj}$ is assumed to be drawn from a Poisson distribution, the mixed Poisson regression is useful if researchers are interested in whether the (logarithm of the) expected rates ($\mu_{kj}/n_{kj}$), which are incidence densities, vary across Specialty and hospital characteristics. Here, $n_{kj}$ may denote either the size of the population at risk in stratum $kj$ or the length of the time interval over which the events are counted.

Letting $\eta_{kj} = \log(\mu_{kj}/n_{kj})$ and substituting index $k$ for index $i$, (8) identifies the Poisson Multilevel Model. It involves, as the dependent variable, an event rate, such as the ratio of clinical errors resulting in patient death to the total discharges in the $k$th Specialty belonging to hospital $j$, or the number of clinical errors resulting in patient death per charge period. The random error $u_{0j}$ continues to represent the specific managerial contribution of hospital $j$ to the rate of clinical errors, once Specialty characteristics (case-mix) and hospital structural characteristics are taken into account.

The main feature of the Poisson model is that the expected value of the random variable $Y_{kj}$ for stratum $kj$ is equal to its variance. However, its assumption of equidispersion, resulting in an underestimation of the outcome variability, is too restrictive for many empirical applications. In practice, the variance of observed count data usually exceeds the mean (overdispersion), due to unobserved heterogeneity and/or when modelling rare events. In this situation, one classic cause of overdispersion is the presence of an excess of zeroes in the analyzed outcome distribution (e.g., when many hospitals are not responsible for adverse events). Ignoring overdispersion seriously compromises the goodness of fit of the model, which also leads to an
between Yk j , the counts of adverse events occurring in kth overestimation of the statistical significance of the explicative
Specialty of the jth hospital (stratum k j), and nk j , the size variables.
of the population at risk in stratum k j. Conditional on In this perspective, as described in the previous sections,
the covariates, Yk j is assumed to have a binomial error a fundamental issue for statistical models is whether the
distribution, with expected value k j and variance k j (1 outcome indicator is likely to have the statistical power to
k j )/nk j , where nk j is the number of trials or the population detect dierences in quality. In the presence of a rare event,
at risk (e.g., discharged patients) in stratum k j. the small number of patients experiencing said event limits
In this case, with aggregate data, we can continue to use the power of the study (at a given significance level) and
LMM. Here, first level refers to Specialty k, instead of patient one cannot conclude that some hospitals are better than the
i. In each stratum k j, we have a number of patients who may rest, or that a specific hospital with low performance (high
or may not experiment the adverse event. For each patient i complication rate) is worse, as these dierences might have
in stratum k j, the probability of a warning event is the same, arisen by chance.
and the proportion of respondents in the kth Specialty of When the data show over-dispersion and excess of zeros
the jth hospital is k j , which is the dependent outcome to (rare events) compared to the expected number under
be modelled. This formulation does not model individual the Poisson distribution, other count models, such as the
probability and does not use individual-level covariates. Negative Binomial Regression model (NBR, [34]) and Zero-
However, in the presence of individual dichotomous data Inflated regression models, appear to be more flexible. NBR
(Yik j for patient i in the stratum k j), we could have a model is able to model count data with over-dispersion, because
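The contrast between equidispersion and over-dispersion can be illustrated with a small simulation. The sketch below is not part of the original analysis: the number of strata, the mean count, and the gamma shape are hypothetical values, and the helper names `poisson_draw` and `dispersion_index` are introduced here only for illustration. It compares counts that share one Poisson mean with counts whose mean varies across strata (a gamma mixture, which yields negative binomial counts).

```python
import math
import random
import statistics


def poisson_draw(rng, mean):
    """Draw one Poisson count with Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def dispersion_index(counts):
    """Variance-to-mean ratio: close to 1 under a Poisson model."""
    return statistics.pvariance(counts) / statistics.fmean(counts)


rng = random.Random(2012)
n_strata, mean_events, shape = 20000, 2.0, 0.5  # hypothetical values

# Equidispersed counts: every stratum shares the same Poisson mean.
poisson_counts = [poisson_draw(rng, mean_events) for _ in range(n_strata)]

# Over-dispersed counts: each stratum has its own gamma-distributed mean
# (unobserved heterogeneity), which yields negative binomial counts.
mixed_counts = [
    poisson_draw(rng, rng.gammavariate(shape, mean_events / shape))
    for _ in range(n_strata)
]

poisson_index = dispersion_index(poisson_counts)
mixed_index = dispersion_index(mixed_counts)
zero_share_poisson = poisson_counts.count(0) / n_strata
zero_share_mixed = mixed_counts.count(0) / n_strata

print(f"Poisson: index={poisson_index:.2f}, zeros={zero_share_poisson:.2%}")
print(f"Mixture: index={mixed_index:.2f}, zeros={zero_share_mixed:.2%}")
```

Under these assumptions the mixture has the same mean as the Poisson sample but a much larger variance and a larger share of zero counts, which is the pattern that motivates the more flexible count models.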

NBR is an extension of PR with a more liberal variance assumption, modelled by means of a dispersion parameter. Zero-Inflated regression models, instead, address the issue of excess zeros in their own right, explicitly modelling the production of zero counts. Specifically, it is assumed that two processes produce the data: some of the zeros are part of the event count and are assumed to follow a Poisson (or negative binomial) model. Other zeros are part of the event taking place or not, a binary process modelled by a binomial model (logistic equation). These zeros are not part of the count; they are structural zeros, indicating that the event never takes place.

Thus, for count data with evidence of over-dispersion, and when over-dispersion results from a high frequency of zero counts (rare events), several modelling strategies give satisfactory fitting measures.

3.4. Continuous Outcomes. The rationale underlying the specification of (8) can be generalized to the case in which the outcome variable is assumed to be continuous (or is a scale in which the responses to a large number of questions are summed into one score) with a normal error distribution. However, two main differences arise. Firstly, in a Linear Multilevel Model [24], instead of modelling the logit of $Y_{ij}$, we directly model $Y_{ij}$ and, secondly, the model now involves the level 1 residuals $e_{ij}$ (assumed to have a normal distribution with zero mean and variance $\sigma_e^2$ and to be independent of the level 2 residuals $u_{0j}$). The parameters can be estimated by the full or restricted maximum likelihood method [24].

In the intercept-only model, the ICC ($= \sigma_0^2/(\sigma_0^2 + \sigma_e^2)$) indicates the proportion of the variance explained by the grouping structure in the population. Since, with additional covariates, all residual variance components become smaller, at each step we can decide which regression coefficients or variances to keep based on the significance tests, the change in the deviance, and changes in the variance components (residual ICC).

When the response variable does not have a normal distribution, the parameter estimates produced by the maximum likelihood method are still consistent and asymptotically unbiased, meaning that they tend to get closer to the true population values as the sample size becomes larger. However, the asymptotic standard errors (variance-covariance matrix of the estimated regression coefficients) are incorrect, and they cannot be trusted to produce accurate significance tests or confidence intervals for fixed effects [24, page 60]. One available correction method is the so-called Huber/White or sandwich estimator [35], in which the variances of the estimated regression coefficients are obtained from the empirical residuals of the model (robust standard errors). This makes inference less dependent on the assumption of normality.

Further, when the problem involves violations of assumptions and the aim is to establish bias-corrected estimates and valid confidence intervals for variance components, a viable alternative to asymptotic estimation methods is the bootstrap [25].

3.5. Outcomes Measured with Error. In specific circumstances, effectiveness analyses may be conducted by using quality of life outcomes (or patient satisfaction), which can constitute the basis for assessing different hospitals in a comparative setting. Quality of life indicators refer to the general condition of health of the patient (physical and mental health, functional state, independence in daily living, etc.) and describe the conditions in which services are distributed.

Although such variables are not directly observable, they can be estimated by analyzing tests administered to patients. Suppose we wish to analyze the data of a given class of $n$ independent subjects. Let $\theta$ denote the latent outcome (or patient satisfaction). The associated Linear Multilevel Model is

$$\theta_{ij} = \beta_{00} + u_{0j} + \sum_{p=1}^{P} \beta_{pj} x_{pij} + e_{ij}, \qquad (9)$$

where the $e_{ij}$, conditional on the variables in the linear predictor and on $\varepsilon$, have zero mean and variance $\sigma_e^2$, and the $u_{0j}$, conditional on the covariates and on $\varepsilon$, are independent normal variables with zero mean and variance $\sigma_0^2$. However, $\theta_{ij}$ is latent and we only observe a fallible measured version ($Y^o_{ij}$). In accordance with Classical Test Theory, which assumes that the observed scores for $K$ tests measure the same true latent outcome score plus an error term, this defines an explicit measurement model for the latent outcome:

$$Y^o_{ij} = \theta_{ij} + \varepsilon_{ij}, \qquad \varepsilon_{ij} \mid \theta_{ij} \sim N(0, \sigma_i^2) \qquad (10)$$

in which the error term $\varepsilon_{ij}$ is normally distributed with zero mean and variance $\sigma_i^2$, which varies across subjects ($i = 1, \ldots, n$) in the same manner across hospitals. For example, $Y^o_{ij}$ can be thought of as the total score obtained by summing the scores for patient $i$ in hospital $j$ over $K$ administered tests, or as a composite score estimated by using one of the known models for continuous latent variables. From (10) we can decompose the variance (Var) of $Y^o_{ij}$ as the sum of its orthogonal variance components:

$$\mathrm{Var}(Y^o_{ij}) = \mathrm{Var}(\theta_{ij}) + \sigma_\varepsilon^2, \qquad (11)$$

where $\sigma_\varepsilon^2 = N^{-1} \sum_i \sigma_i^2$ denotes the average of the squared individual standard errors of measurement.

In such circumstances, when the variable measured with error is the response variable of the model, its measurement error is captured by the model error and there are no consequences for the estimated parameters, but there are serious consequences for the variance components. In fact, (11) illustrates that, due to measurement error, the variance of the estimated latent variable overestimates the variance of the true latent variable.

Therefore, since instead of $\theta_{ij}$ we observe an error-contaminated estimate $Y^o_{ij}$, by adding $\varepsilon_{ij}$ to both sides, model (9) becomes

$$Y^o_{ij} = \theta_{ij} + \varepsilon_{ij} = \beta_{00} + u_{0j} + \sum_{p=1}^{P} \beta_{pj} x_{pij} + e_{ij} + \varepsilon_{ij} \qquad (12)$$
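The variance decomposition in (11) can be checked numerically. The following is a minimal sketch under stated assumptions, not the author's procedure: the intercept and covariates of (9) are omitted, the number of hospitals and patients and all standard deviations are hypothetical, and each subject receives its own measurement-error standard deviation drawn from a uniform range chosen only for illustration.

```python
import random
import statistics

rng = random.Random(11)
n_hospitals, n_patients = 50, 200      # hypothetical sample layout
sigma0, sigma_e = 0.4, 1.0             # hypothetical SDs of u_0j and e_ij

true_scores, observed_scores, error_variances = [], [], []
for _ in range(n_hospitals):
    u0j = rng.gauss(0.0, sigma0)                 # hospital random effect
    for _ in range(n_patients):
        theta = u0j + rng.gauss(0.0, sigma_e)    # latent outcome
        sigma_i = rng.uniform(0.3, 0.9)          # subject-specific error SD
        true_scores.append(theta)
        # Observed score = latent outcome + measurement error, as in (10).
        observed_scores.append(theta + rng.gauss(0.0, sigma_i))
        error_variances.append(sigma_i ** 2)

var_true = statistics.pvariance(true_scores)
var_observed = statistics.pvariance(observed_scores)
mean_error_variance = statistics.fmean(error_variances)

print(f"Var(theta)                 = {var_true:.3f}")
print(f"average sigma_i^2          = {mean_error_variance:.3f}")
print(f"Var(theta) + average       = {var_true + mean_error_variance:.3f}")
print(f"Var(observed score)        = {var_observed:.3f}")
```

In this simulated setting the variance of the observed scores is close to the sum of the latent variance and the average squared standard error, and it exceeds the latent variance, which is the overestimation that (11) describes.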

In (12), $\sigma_\varepsilon^2$ (the variance of measurement errors) enters as an additional random component in the total variance of $Y^o_{ij}$, thus modifying the formula for the ICC.

For the intercept-only model, ICC $= \sigma_0^2/(\sigma_0^2 + \sigma_e^2 + \sigma_\varepsilon^2)$, which is an attenuated version of the true ICC and thus underestimates the variability of the outcome across hospitals. Hence, when the outcome is measured with error, the ICC must be disattenuated (ICC$^*$) by subtracting the term $\sigma_\varepsilon^2$ from the denominator of the attenuated ICC.

To this end, different approaches can be utilized to estimate $\sigma_\varepsilon^2$ (and thus ICC$^*$). These concerns can be addressed within the context of Rasch measurement models [36], which provide measures underlying Likert scales with optimal characteristics. The Rasch model directly furnishes individual estimates of $\sigma_i^2$ (the squared standard error of the estimated outcome for person $i$, measured across $K$ items), and averaging them provides an estimate of $\sigma_\varepsilon^2$.

Another possibility deals with factor analysis (FA). Without loss of generality, let us consider $K$ congeneric tests, allowing different error variances for the $K$ tests and removing the assumption that all tests are based on the same units of measurement. Suppose that the scores of $K$ items for $n$ subjects are embedded in the vector of $K$ variables $Y^o = (Y^o_1, \ldots, Y^o_K)'$, and let $Y^o = \lambda\theta + \varepsilon$ denote a single-factor analysis model for the $K$ items, where $\lambda = (\lambda_1, \ldots, \lambda_K)'$ and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_K)'$ indicate the vector of partial regression coefficients in the regression of $Y^o$ on $\theta$, and the error terms, respectively. In the FA model, $\sigma_\varepsilon^2$ can be estimated once the reliability of the composite $\rho = 1 - (\sigma_\varepsilon^2/\sigma^2)$, defined as the ratio of true variance to observed score variance $\sigma^2$, is estimated.

Unlike traditional methods for computing composites as total scores, the use of maximally reliable composite scores [24] minimizes measurement error in the items contributing to each scale, thus increasing the reliability of the computed scale scores. More specifically, let $\hat{\theta} = \hat{\lambda}' \hat{\Sigma}^{-1} Y$ denote the factor score estimates for the individuals, where $\hat{\Sigma}$ is the estimated covariance matrix of the observed indicators and $\hat{\lambda}$ the estimated vector of regression coefficients; the reliability of the composite is estimated as

$$r = \frac{w'(\hat{\Sigma} - \hat{\Theta})w}{w'\hat{\Sigma}w}, \qquad (13)$$

where $w$ is the estimated vector of factor score regression weights ($w = \hat{\Sigma}^{-1}\hat{\lambda}$) that maximize $r$, and $\hat{\Theta}$ is the diagonal matrix of estimated error variances of the $K$ items.

Finally, measurement error bias becomes more serious when the model involves a covariate measured with error (e.g., when the outcome at baseline is used as a covariate to estimate performances), causing bias in the estimated parameters. This arises because the measurement error of the outcome at baseline is correlated with $e_{ij} + \varepsilon_{ij}$ in (12).

4. Methodological Problems

As described, the proposed analyses on large administrative archives can be used for benchmarking purposes. Notwithstanding the illustrated advantages, these analyses also present specific challenges, due to the following potential areas for bias.

4.1. The Risk of Risk Adjustment. Firstly, risk adjustment can only adjust for factors that can be identified and measured accurately (case-mix fallacy). Consequently, risk-adjusted benchmarking using administrative data can be hampered by underreporting, that is, the potential endogeneity of the recorded patient-level covariates (outcomes are correlated with the propensity to record information across hospitals), and by the potential for nonconsidered covariates (misspecification). For example, if an important severity measure is missing from the database, and assuming that the distribution of this unmeasured covariate varies across hospitals, the variability of adjusted outcomes among hospitals may be overestimated [30].

Furthermore, when using administrative archives for adverse events, claims data are problematic in nature, given the limited number of claims generally emerging from administrative sources (underreporting, or lack of close calls or near misses/errors that do not result in injury) and the lack of information on the causes of medical errors causing injury to patients (e.g., processes and systems of care that may be responsible).

Secondly, unmeasured risk factors are not randomly distributed across hospitals, due to the clustering of certain types of patients in certain hospitals' practices. Users can easily draw incorrect conclusions, because the hospitals that appear to have the worst outcomes may simply have the most seriously ill patients. For this reason, the practice of routinely disseminating risk-adjusted hospital comparisons has been strongly criticized, since an institution's position in rankings strongly depends on the method of risk adjustment used [37].

Third, since differences in the quality of care within hospitals (e.g., DRGs and/or Specialties) may be greater than differences between hospitals, there is no clear evidence of high correlation between how well a hospital performs on one standard of effective care and how well it performs on another. After risk adjustment, the remaining hospital variability (Type B effectiveness) may be attributable to complex factors, typically depending on a reciprocal interaction between patient case mix (pathologies, clinical severity) and the institutional form of the hospital (profit, not-for-profit/public, private, University hospital, etc.). Therefore, the unexplained hospital variability appears to be physiological and not possible to eliminate completely [8, 22, 37]. In this perspective, it has become imperative to evaluate which benchmarks keep the risk of comparing noncomparable hospitals to a minimum.

To this end, some authors [38] propose to use additional factors, namely those that contribute most to variability in patient experience, as supplementary adjustment variables for patient mix or as stratification variables in order to present transparent benchmarking analyses.

4.2. Selection Bias. Patient selection bias is a distortion of results due to the way subjects are selected for inclusion in
the study population. Patients are not randomly assigned to hospitals. Whereas randomized controlled trials reduce self-selection bias by evenly distributing subjects among treatments/hospitals, observational studies based on administrative databases are nonrandomized, and effectiveness results may be confounded by selection bias due to systematic differences in admission practices between (private/public) hospitals or differences in hospital referral patterns. Such selection biases may result in the preferential admission (or exclusion) of patients with different underlying prognoses, independently of the severity of the patient's illness.

Estimates of the effects and outcomes can be biased due to a correlation between factors (such as baseline health status) associated with hospital selection and outcomes (endogeneity). In fact, the effectiveness random parameters $u_{0j}$ are assumed to be independent of and uncorrelated with the fixed explanatory variables. When this correlation occurs (e.g., because patients are selected into hospitals), the hypothesis is not valid and the model is not appropriate. Such a correlation can result in erroneous inferences about the magnitude and statistical significance of hospital effects [25]. Assessment of such bias, which limits a suitable assessment of the relative effectiveness of hospitals [39], would be extremely difficult and would require information about all possible hospital admissions.

A straightforward remedy to endogeneity due to a possible covariate $x_p$ is to add the hospital mean of $x_p$ to the model equation: this makes the patient-level covariate $x_p$ uncorrelated with the hospital effects, so valid estimates of the Type A effects can be obtained. In this sense, the bias is shifted to Type B effects by the endogeneity of hospital-level covariates, which typically arises from the omission of relevant covariates at this level.

Furthermore, to control for selection bias in observational data, different statistical techniques can be used for evaluating hospital effectiveness that adjust for observed and unknown differences in the baseline characteristics and prognostic factors of patients across hospitals. Propensity Score (PS), Instrumental Variable (IV), and Sample Selection Models (SSM) are three techniques developed to minimize this potential bias [39, 40].

PS is the individual probability that a patient will receive a particular treatment (i.e., chooses hospital $j$) and is estimated by a logistic regression that predicts a patient's choice as a function of covariates, including the patient's pretreatment characteristics (sociodemographic, comorbidities, diagnosis, and urgency-related factors). Using PS, potential bias due to hospital choice is minimized if the choice and the outcome being evaluated are conditionally independent given the measured pretreatment characteristics.

Further, in a second stage, ad hoc models (e.g., LMM or a multilevel version of count regression models when data are aggregated) are used to estimate relative effectiveness across hospitals in the outcome equation, adjusting for posttreatment characteristics and propensity scores. This can be done by adding PS as an additional continuous covariate or by estimating hospitals' effectiveness in the outcome equation within propensity score strata, typically quintiles.

Sample Selection Models (SSMs) attempt to control the bias introduced by unobserved variables in hospital selection, which are also correlated with the outcome of interest. SSMs, widely used in the econometrics literature, are a special case of Instrumental Variable (IV) Models. The concept behind an IV is to identify a variable, the instrument, that is associated with a subset of the variables that predict hospital choice but is independent of the patient's baseline characteristics. If a good IV is identified, both measured and unmeasured confounders can be accounted for in the analysis.

Typical instruments include severity of illness, the territorial supply of healthcare providers that may or may not offer specific treatments, and the distance from each patient's home to either the nearest hospital that provides specific treatments or the nearest hospital, which may or may not provide specific treatments [41].

SSMs are two-stage methods. Before estimating the outcome equation (second-stage model), the probability that patient $i$ has chosen hospital $j$ is predicted as an endogenous variable, as a function of observed patient and hospital characteristics, including instrumental variables. Further, all instrumental variables are excluded from the second-stage model.

The residual from the first stage is then added as an explanatory variable to the outcome equation. It captures the unobservable nonrandom component and allows one to control for selection bias. IV techniques, in contrast to SSMs, use a single equation to estimate the relative effectiveness without estimating the choice equation, which is replaced by the presence of instruments in the outcome equation.

5. Application

To clarify the potential of the presented methods, this section focuses on hospital effectiveness concerning patient satisfaction. In Lombardy, the monitoring of patient satisfaction, mandatory for hospitals, is performed using the Official Customer Satisfaction (OCS) questionnaire of the Lombardy region. It contains 12 items regarding acceptance, healthcare performance, satisfaction with physicians and nurses, accommodation, discharge, and two items asking for an overall judgement of satisfaction. Each item is scored on a seven-point Likert scale ranging from 1 to 7. Scores of 5 and over indicate increasing levels of satisfaction, whereas scores of 3 and below indicate dissatisfaction.

The available data, provided by the regional Directorate of Healthcare, refer to all Lombard hospitals in 2009, which, between April and November 2009, delivered the OCS questionnaires to a random sample of discharged patients, proportional to their annual number of discharges in 2009. For the analysis, we select only patients with planned admissions to general hospitals (excluding urgent admissions and specialist hospitals) in order to minimize the risk of patient selection for the analysed hospitals. Overall, the sample is composed of 46,096 patients, nested in 64 hospitals (an average of 720 patients per hospital). Exploring the patient covariates embedded in the OCS, patients differ by gender (46% are female), age class (7% < 24 years, 37% in the

Table 1: Item analysis: missing values, percentage of patients satisfied, and item-component correlations (n = 46,096).

Item description                               Missing   % Satisfied    Y1        Y2        Y3
                                               values    (scores 6+7)   ClinSAT   GenSAT    WaitLists
Nurses' courtesy, attention, availability      370       88.5%          0.70      0.04      0.09
Doctors' courtesy, attention, availability     606       89.4%          0.83      0.19      0.08
Satisfaction with the care provided            1309      89.3%          0.81      0.06      0.07
Health status (and discharge) information      609       85.1%          0.79      0.06      0.05
Privacy and consent information                635       88.6%          0.72      0.11      0.05
Comfort, bed, food, cleanliness                2150      83.6%          0.12      0.78      0.11
Organisation of the process of care            627       81.6%          0.11      0.78      0.02
Recommend hospital (friends or relatives)      1346      85.2%          0.12      0.73      0.03
Overall satisfaction                           704       85.3%          0.05      0.72      0.01
Waiting time to be admitted to the hospital    1417      75.7%          0.01      0.02      0.99

age class 25-54, and 55% > 54 years), schooling level (5% primary school, 50% middle school, 36% high school, 9% university degree), and nationality (94% are Italian).

Available hospital structural characteristics involve sector (private/public), typology (University or not), size (in three bed-size categories), and whether the hospital has an emergency unit. Hospital process measures (all measured in 2009 and obtained from Hospital Discharge Cards) involve the number of specialties in the hospital (N Specialties), the percentage of beds utilized (% Beds), the number of operating rooms utilized (N OpRoom), the total number of hours of operating room use (Hours OpRoom), the average monthly hours per operating room (Ave MH OpRoom), and the case-mix of charged patients during 2009.

The case-mix is measured as the percentage of (surgical and medical) discharges having a DRG weight above (High case mix) or below (Low case mix) the regional median DRG weight. In the analyzed sample, 52% are public hospitals, 85% have an emergency unit, 8% are University hospitals, and 36% have more than 250 beds (5% < 50 beds).

Analyzing item scores (Table 1) with Confirmatory Factor Analysis, we found three orthogonal (Varimax rotation) composites: the first deals with satisfaction with clinical aspects (Y1: ClinSAT), the second with general and accommodation aspects of satisfaction (Y2: GenSAT), and the third coincides with the single item dealing with satisfaction with the waiting time to be admitted to hospital (Y3: WaitLists). For the first two, coefficient alphas ($\alpha_{Y1}$ = 0.92; $\alpha_{Y2}$ = 0.90) and composite reliability ($r_{Y1}$ = 0.89; $r_{Y2}$ = 0.84) indicate acceptable internal consistency and reliability for the estimated composites.

Despite many patients being very satisfied in many domains (column 3 of Table 1), a multilevel analysis is performed to assess whether there are meaningful differences between hospitals in evaluations of patient satisfaction and whether these differences remain after controlling for patient and hospital characteristics (hospital effectiveness). Specifically, we specify a Linear Multilevel Model for the composites Y1 and Y2, whereas a Logistic Multilevel Model is used for predicting the probability of being dissatisfied with waiting time, using as dependent outcome Y3d (WaitDISSAT), a dichotomous variable that is equal to 1 when the score on the waiting time item is $\le 3$ and equal to 0 otherwise.

Table 2: ICC and significant hospital characteristics.

                             Y1        Y2        Y3d
                             ClinSAT   GenSAT    WaitDISSAT
ICC                          13.0%     14.8%     12.2%
Residual ICC                 2.7%      9.5%      1.2%#
Hospital characteristics (model coefficients and significance)
Private Hosp                 n.s.      2.068     0.0420
University Hosp              1.729     n.s.      n.s.
% Beds                       0.020     0.056     n.s.
N Specialties                0.079     0.281     0.0040
N OpRoom                     0.072     0.102     n.s.
% High medical casemix       3.515     n.s.      n.s.
Hours OpRoom                 n.s.      0.001     n.s.
Ave MH OpRoom                n.s.      0.058     0.0004
Notes: ICCs for Y1 and Y2 are corrected for measurement error; # rescaled with the scale correction factor. Significance thresholds: P-value < 0.01, P-value < 0.05, P-value < 0.10; n.s. = not significant.

The upper part of Table 2 exhibits, for Y1 and Y2, the corrected (disattenuated) ICCs in the intercept-only model and the residual ICCs (the remaining proportion of variability due to differences between hospitals, once covariates are inserted in the models). For Y3d, the residual ICC is rescaled with the scale correction factor, in order to be comparable to the ICC of the intercept-only model.

The three patient outcomes appear to be highly influenced by the hospital in which the patient is treated; for continuous outcomes, the disattenuated ICCs (higher than the attenuated versions, which equal 8.2% and 10.4% for Y1 and Y2, resp.) demonstrate that a high proportion of the differences in the outcomes is attributable to differences between hospitals. This especially occurs for Y2, meaning that almost 15% of the variance in overall satisfaction (14.8%) is across hospitals.

To explain these differences, the available covariates are used. The lower part of Table 2 exhibits the covariates that are significant for at least one outcome. Firstly, individual patient characteristics and the other hospital characteristics (such as the surgical case-mix, hospital size, and presence of an emergency unit) are found to be not significant (at the 0.05 significance level).
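The gap between attenuated and disattenuated ICCs follows directly from the formulas of Section 3.5. As a minimal sketch, the two functions below (whose names are introduced here for illustration) compute both versions from variance components; the numerical values are hypothetical and are not estimates from the OCS data.

```python
def attenuated_icc(var_hospital, var_residual, var_measurement):
    """ICC computed on the observed, error-contaminated outcome."""
    return var_hospital / (var_hospital + var_residual + var_measurement)


def disattenuated_icc(var_hospital, var_residual):
    """ICC after removing the measurement-error variance from the denominator."""
    return var_hospital / (var_hospital + var_residual)


# Hypothetical variance components, not estimates from the OCS data.
var_hospital, var_residual, var_measurement = 0.15, 0.85, 0.50

icc_observed = attenuated_icc(var_hospital, var_residual, var_measurement)
icc_corrected = disattenuated_icc(var_hospital, var_residual)

print(f"attenuated ICC    = {icc_observed:.3f}")
print(f"disattenuated ICC = {icc_corrected:.3f}")
```

With these illustrative values, the between-hospital share rises from 10% to 15% once the measurement-error variance is removed from the denominator, the same direction of change as reported for Y1 and Y2.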

This highlights that the three patient satisfaction dimensions are not affected by patient characteristics and do not vary significantly with these hospital characteristics.

In contrast, for Clinical Satisfaction (Y1), most of the variation is associated with differences between hospitals in the number of specialties (inversely linked with Y1) and in the number of operating rooms (positively linked with Y1), with higher levels of Y1 for university hospitals, demonstrating that clinical satisfaction is higher in specialized university hospitals.

The overall satisfaction (Y2) is higher for private hospitals with high volumes of operating room hours utilized and decreases for hospitals with several specialties and high utilization rates of operating rooms. Observing Y3d, it is of note that the significant covariates for predicting overall satisfaction (Y2) act in exactly the same manner in predicting dissatisfaction with waiting time (higher for private hospitals with several specialties and high utilization rates of operating rooms).

After controlling for hospital characteristics, the residual ICCs become very small, except for Y2, which decreases to 9.5% from 14.8%. Overall, the significant hospital covariates explain 81%, 34%, and 90% of the outcome variability among hospitals for Y1, Y2, and Y3d, respectively.

The remaining hospital differences (residuals) are taken to represent the effects of management practices (Type B effectiveness) in increasing patient satisfaction in the three domains.

Before investigating the obtained rankings, we explore possible covariate endogeneity by means of three generalized linear models which specify, for each outcome, the hospital residuals ($u_{0j}$) as the dependent variable. In these models, the effects of hospital covariates are found to be not significant (at the 0.01 significance level).

The global F-tests, referring to the hypothesis that all covariate coefficients are equal to zero versus the alternative that at least one is not, are far from significant ($F_{Y1}$ = 0.41, P-value = 0.954; $F_{Y2}$ = 0.49, P-value = 0.913; $F_{Y3d}$ = 0.51, P-value = 0.987), meaning that no serious endogeneity is found, so valid effectiveness parameters are obtained.

As a last step of the analysis, we check the concordance of the three hospital rankings based on the estimated $u_{0j}$ (Type B effectiveness). Spearman correlations ($r$) exhibit weak agreement between the estimated rankings for all outcomes, showing three largely independent dimensions. Specifically, the ranking based on overall satisfaction is significantly and positively correlated with the ranking based on clinical satisfaction ($r$ = 0.375, P-value = 0.002) and with that based on satisfaction with waiting time ($r$ = 0.304, P-value = 0.014), although the correlations are of modest strength. The correlation between the rankings of clinical and waiting time satisfaction is positive, but at the limit of statistical significance ($r$ = 0.252, P-value = 0.045).

6. Conclusion

Using clinical outcomes for quality assessment represents an important approach to documenting the quality of care. Consumers of indicator information (stakeholders, clinicians, and patients) need reliable and valid information for benchmarking, making judgments, and determining priorities, accountability, and quality improvement.

Where health services have effects on outcome, the use of outcome measures as performance indicators is appropriate, and efforts should be made to ensure that the benchmarking strategies can be interpreted reliably. However, the conclusion that differences in outcome are due to differences in quality of care will always be tentative and open to the possibility that the apparent association between a given unit and poor outcome is due to the confounding effect of some other factor that has not been measured, has been measured inadequately, or has been misspecified.

As the empirical application has shown, estimated hospital rankings must be interpreted with scrupulous care. Despite such limitations, clinical administrative data are broadly considered a useful screening tool for identifying quality-related problems and for targeting areas that potentially require in-depth investigation. The simultaneous monitoring of several outcomes that indicate malpractice appears to offer a useful strategy for helping hospitals and stakeholders to detect trends and identify extreme outliers. Once a benchmark for each performance measure is determined, analyzing data results becomes more meaningful.

However, moving from the evaluation step towards the phase of statistical implications mainly depends on the way in which monitored (e.g., adverse) events are distributed among hospitals. If a large proportion of adverse events is concentrated among relatively few hospitals, the traditional quality control approach of targeting error-prone, ineffective health structures for specific attention has high potential value. When variation is discovered through continuous monitoring, or when unexpected events suggest performance problems, members of the organization may decide that there is an opportunity for improvement.

The opportunity may involve a process or an outcome that could be changed to better meet customer feedback, needs, or expectations.

In contrast, when ineffective hospitals are more diffusely distributed, targeting specific hospitals may be a less efficient strategy than investigating the clinical processes in the framework of continuous quality improvement, with an emphasis on careful examination, rigorous scientific testing methods, statistical analysis, and the transparent adjustment of clinical processes.

To this end, exhaustive and exclusive measure specifications should be described, including specific definitions of the clinical indicators and standards and identification of the target population and data sources.

Steps can be taken to minimize the possibility of a false conclusion being drawn on the quality of care based on outcome measurement.

Standardising how data are collected can reduce the extent to which differences in measurement can potentially cause observed variation. Including sufficient numbers of patients will reduce the possibility of random variation masking real differences or making spurious differences appear. Development of sophisticated case-mix adjustment systems
can reduce the possibility that observed differences are due to differences in the types of patient. Developing an analytical plan that describes the statistical and clinical significance of the results to be assessed when comparing groups, or comparing a group to a standard, is a further step.

As part of the development process, indicator measurement can be made more efficient when incorporated into routine patient care as part of clinicians' and administrators' documentation of required information on patient characteristics and care delivery, already being recorded for clinical purposes (medical record data). This would eliminate duplicative clinical data collection for the purposes of clinical care and quality assessment.

In conclusion, another important topic that affects the evaluation of the quality of care in a benchmarking perspective is the institutional condition of the healthcare system and its modifications over time.

For example, from 2002 onwards the English National Health Service (NHS) has developed a new hospital market era (New Labour). Under this model, competition arises from patient choice, from selective contracting of purchasers (primary care trusts) with providers, and from competition between different providers (NHS trusts, private providers, independent sector treatment centres, and NHS foundation trusts).

In Italy, since 2001, the healthcare system has moved in the direction of a welfare-mix system, characterized by freedom of choice for the consumer and by the joint presence of state agents (operating with functional financial autonomy) and private for-profit or nonprofit accredited companies endowed with autonomous decision-making and managerial procedures.

Hence, the specific question is to evaluate the relation between hospital competition and hospital quality. To this end, some recent econometric studies focusing on the NHS find causal effects of hospital competition on care quality. Specifically, they show that competition improves clinical quality (as measured by reduction in hospital mortality rates after myocardial infarction) and also reduces waiting times [42, 43].

In this perspective, other open questions remain crucial: does the available evidence support institutional proposals to extend competition? How does competition compare with other policies to increase hospital quality? More applied research is required on these topics.

Overall, the present paper offers a springboard for discussions with experts in the field of administrative data, risk adjustment, and performance measurement reporting. Clinicians and researchers should actively participate in designing future administrative databases to ensure that they are clinically meaningful and useful for quality measurement, offering regional stakeholders the opportunity to gain a deeper understanding of the problematic areas in clinical risk assessment.

References

[1] J. Øvretveit and D. Gustafson, Improving the quality of health care: using research to inform quality programmes, British Medical Journal, vol. 326, no. 7392, pp. 759–761, 2003.
[2] JCAHO Joint Commission on Accreditation of Healthcare Organization, A Guide to Establishing Programs for Assessing Outcomes in Clinical Settings, Oakbrook Terrace, Ill, USA, 1994.
[3] CIHI Canadian Institute for Health Information, Executive Summary: Data Quality Documentation, Discharge Abstract Database 2005-2006, CIHI Press, Ottawa, Canada, 2006.
[4] AHRQ Agency for Healthcare Research and Quality, Guidance for using the AHRQ Quality Indicators for hospital-level public reporting or payment, 2006, http://www.qualityindicators.ahrq.gov/.
[5] S. F. Jencks, T. Cuerdon, D. R. Burwen et al., Quality of medical care delivered to medicare beneficiaries: a profile at state and national levels, Journal of the American Medical Association, vol. 284, no. 13, pp. 1670–1676, 2000.
[6] A. Donabedian, The quality of care. How can it be assessed? Journal of the American Medical Association, vol. 260, no. 12, pp. 1743–1748, 1988.
[7] L. J. Opit, The Measurement of Health Service Outcomes, Oxford Textbook of Health Care, 10, OLJ, London, UK, 1993.
[8] H. Goldstein and D. J. Spiegelhalter, League tables and their limitations: statistical issues in comparisons of institutional performance, Journal of the Royal Statistical Society. Series A, vol. 159, no. 3, pp. 385–443, 1996.
[9] L. M. Koran, The reliability of clinical methods, data and judgments. Part II, New England Journal of Medicine, vol. 293, no. 14, pp. 695–701, 1975.
[10] K. Lohr, Medicare: A Strategy for Quality Assurance, National Academy Press, Washington, DC, USA, 1990.
[11] R. G. Gift and D. Mosel, Benchmarking in health care, American Hospital Publishing, Chicago, Ill, USA, 1994.
[12] L. I. Iezzoni, A. S. Ash, M. Shwartz, J. Daley, J. S. Hughes, and Y. D. Mackiernan, Judging hospitals by severity-adjusted mortality rates: the influence of the severity-adjustment method, American Journal of Public Health, vol. 86, no. 10, pp. 1379–1387, 1996.
[13] W. R. Best and D. C. Cowper, The ratio of observed-to-expected mortality as a quality of care indicator in non-surgical VA patients, Medical Care, vol. 32, no. 4, pp. 390–400, 1994.
[14] C. Vincent, P. Aylin, B. D. Franklin et al., Is health care getting safer? British Medical Journal, vol. 337, no. 7680, pp. 1205–1207, 2008.
[15] L. I. Iezzoni, The risks of risk adjustment, Journal of the American Medical Association, vol. 278, no. 19, pp. 1600–1607, 1997.
[16] L. I. Iezzoni, Assessing quality using administrative data, Annals of Internal Medicine, vol. 127, no. 8, pp. 666–673, 1997.
[17] A. Epstein, Performance reports on quality – prototypes, problems, and prospects, New England Journal of Medicine, vol. 333, no. 1, pp. 57–61, 1995.
[18] NHS, National Health Service, Commission for Health Improvement. A commentary on Star Ratings 2002-2003, 2004, http://www.chi.nhs.uk./ratings/.
[19] IQIP International Quality Indicator Project, 2004, http://www.internationalqip.com/.
[20] R. W. Dubois, R. H. Brook, and W. H. Rogers, Adjusted hospital death rates: a potential screen for quality of medical care, American Journal of Public Health, vol. 77, no. 9, pp. 1162–1167, 1987.
[21] P. M. Rothwell and C. P. Warlow, Interpretation of operative risks of individual surgeons, Lancet, vol. 353, no. 9161, p. 1325, 1999.
[22] C. Damberg, E. A. Kerr, and E. A. McGlynn, Description of data sources and related issues, in Health Information Systems. Design Issues and Analytical Application, E. A. McGlynn, C. Damberg, E. A. Kerr, and R. A. Brook, Eds., pp. 43–76, RAND Health, Santa Monica, Calif, USA, 1998.
[23] S. W. Raudenbush and J. D. Willms, The estimation of school effects, Journal of Educational and Behavioral Statistics, vol. 20, pp. 307–335, 1995.
[24] H. Goldstein, Multilevel Statistical Models, Edward Arnold, London, UK, 1995.
[25] T. A. B. Snijders and R. J. Bosker, Multilevel Analysis. An Introduction to Basic and Advanced Multilevel Modelling, Sage, London, UK, 1999.
[26] H. Goldstein and A. H. Leyland, Multilevel Modelling of Health Statistics, Wiley, New York, NY, USA, 2001.
[27] N. Rice and A. Leyland, Multilevel models: applications to health data, Journal of Health Services Research & Policy, vol. 1, no. 3, pp. 154–164, 1996.
[28] N. P. Wray, J. C. Hollingsworth, N. J. Petersen, and C. M. Ashton, Case-mix adjustment using administrative databases: a paradigm to guide future research, Medical Care Research and Review, vol. 54, no. 3, pp. 326–356, 1997.
[29] A. Elixhauser, C. Steiner, D. R. Harris, and R. M. Coffey, Comorbidity measures for use with administrative data, Medical Care, vol. 36, no. 1, pp. 8–27, 1998.
[30] L. I. Iezzoni, Risk Adjustment for Measuring Health Care Outcomes, Health Administration Press, Ann Arbor, Mich, USA, 1994.
[31] T. Lagu, P. K. Lindenauer, M. B. Rothberg et al., Development and validation of a model that uses enhanced administrative data to predict mortality in patients with sepsis, Critical Care Medicine, vol. 39, no. 11, pp. 2425–2430, 2011.
[32] R. McKelvey and W. Zavoina, A statistical model for the analysis of ordinal level dependent variables, Journal of Mathematical Sociology, vol. 4, pp. 103–120, 1975.
[33] R. J. Glynn and J. E. Buring, Ways of measuring rates of recurrent events, British Medical Journal, vol. 312, no. 7027, pp. 364–367, 1996.
[34] D. Lambert, Zero-inflated poisson regression, with an application to defects in manufacturing, Technometrics, vol. 34, no. 1, pp. 1–14, 1992.
[35] H. White, Maximum likelihood estimation of misspecified models, Econometrica, vol. 50, pp. 1–25, 1982.
[36] B. D. Wright and M. Mok, Rasch models overview, Journal of Applied Measurement, vol. 1, no. 1, pp. 83–106, 2000.
[37] R. Lilford, M. A. Mohammed, D. Spiegelhalter, and R. Thomson, Use and misuse of process and outcome data in managing performance of acute medical care: avoiding institutional stigma, Lancet, vol. 363, no. 9415, pp. 1147–1154, 2004.
[38] B. M. Holzer and C. E. Minder, A simple approach to fairer hospital benchmarking using patient experience data, International Journal for Quality in Health Care, vol. 23, no. 5, pp. 524–530, 2011.
[39] N. Zohoori and D. A. Savitz, Econometric approaches to epidemiologic data: relating endogeneity and unobserved heterogeneity to confounding, Annals of Epidemiology, vol. 7, no. 4, pp. 251–257, 1997.
[40] P. Rosenbaum and D. Rubin, Reducing bias in observational studies using subclassification on the propensity score, Journal of the American Statistical Association, vol. 79, pp. 516–524, 1984.
[41] H. S. Luft, D. W. Garnick, D. H. Mark et al., Does quality influence choice of hospital? Journal of the American Medical Association, vol. 263, no. 21, pp. 2899–2906, 1990.
[42] Z. Cooper, S. Gibbons, S. Jones, and A. McGuire, Does hospital competition save lives? Evidence from the English NHS patient choice reforms, Economic Journal, vol. 121, pp. F228–F260, 2011.
[43] G. Bevan and M. Skellern, Does competition between hospitals improve clinical quality? A review of evidence from two eras of competition in the English NHS, British Medical Journal, vol. 343, no. 7830, article d6470, 2011.
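
As a supplementary illustration of the ranking-concordance check reported at the end of Section 5, the following is a minimal sketch of how a Spearman correlation between two sets of hospital residuals could be computed. It is not the code used in the study: the residual values, the variable names, and the helper functions `average_ranks` and `spearman` are hypothetical and serve only to show the calculation, namely a Pearson correlation applied to the ranks, with tied values sharing an average rank.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the block
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical hospital residuals (u_0j) for two outcomes; NOT data from the paper.
u_overall = [0.30, -0.10, 0.05, 0.20, -0.25]
u_clinical = [0.10, -0.20, 0.15, 0.05, -0.05]

r = spearman(u_overall, u_clinical)
print(round(r, 3))  # 0.6
```

The sketch returns only the coefficient; a P-value such as those reported in Section 5 would require an additional significance test, for which a statistical library (for example `scipy.stats.spearmanr`) would normally be used.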